
Useful Stata Commands (for Stata version 12)

Kenneth L. Simons

This document briefly summarizes Stata commands useful in Financial Econometrics and Advanced Research Methods.

This presumes a basic working knowledge of how to open Stata, use the menus, use the data editor, and use the do-file editor. We will cover these topics in early Stata sessions in class. If you miss the sessions, you might ask a fellow student to show you through basic usage of Stata, and get the recommended text about Stata for the course and use it to practice with Stata.

More replete information is available in Lawrence C. Hamilton's Statistics with Stata, Christopher F. Baum's An Introduction to Modern Econometrics Using Stata, and A. Colin Cameron and Pravin K. Trivedi's Microeconometrics using Stata. See: http://www.stata.com/bookstore/books-on-stata/ .

Readers on the Internet: I apologize but I cannot generally answer Stata questions. Useful places to direct questions are: (1) built-in help and manuals (see Stata's Help menu), (2) your friends and colleagues, (3) Stata's technical support staff (you will need your serial number), (4) Statalist (http://www.stata.com/statalist/) (but check the Statalist archives before asking a question there).

Throughout, estimation commands specify robust standard errors (Eicker-Huber-White heteroskedastic-consistent standard errors). This does not imply that robust rather than conventional estimates of Var[b|X] should always be used, nor that they are sufficient. Other estimators shown here include Davidson and MacKinnon's improved small-sample robust estimators for OLS, cluster-robust estimators useful when errors may be arbitrarily correlated within groups (one application is across time for an individual), and the Newey-West estimator to allow for time series correlation of errors. Selected GLS estimators are listed as well. Hopefully the constant presence of "vce(robust)" in estimation commands will make readers sensitive to the need to account for heteroskedasticity and other properties of errors typical in real data and models.

Contents

Preliminaries for RPI Dot.CIO Labs
A. Loading Data
  A1. Memory in Stata Version 11 or Earlier
B. Variable Lists, If-Statements, and Options
C. Lowercase and Uppercase Letters
D. Review Window, and Abbreviating Command Names
E. Viewing and Summarizing Data
  E1. Just Looking
  E2. Mean, Variance, Number of Non-missing Observations, Minimum, Maximum, Etc.
  E3. Tabulations, Histograms, Density Function Estimates
  E4. Scatter Plots and Other Plots
  E5. Correlations and Covariances
F. Generating and Changing Variables
  F1. Generating Variables
  F2. Missing Data
  F3. True-False Variables
  F4. Random Numbers
  F5. Replacing Values of Variables
  F6. Getting Rid of Variables
  F7. If-then-else Formulas
  F8. Quick Calculations
  F9. More
G. Means: Hypothesis Tests and Confidence Intervals
  G1. Confidence Intervals
  G2. Hypothesis Tests
H. OLS Regression (and WLS and GLS)
  H1. Variable Lists with Automated Category Dummies and Interactions
  H2. Improved Robust Standard Errors in Finite Samples
  H3. Weighted Least Squares
  H4. Feasible Generalized Least Squares
I. Post-Estimation Commands
  I1. Fitted Values, Residuals, and Related Plots
  I2. Confidence Intervals and Hypothesis Tests
  I3. Nonlinear Hypothesis Tests
  I4. Computing Estimated Expected Values for the Dependent Variable
  I5. Displaying Adjusted R² and Other Estimation Results
  I6. Plotting Any Mathematical Function
  I7. Influence Statistics
  I8. Functional Form Test
  I9. Heteroskedasticity Tests
  I10. Serial Correlation Tests
  I11. Variance Inflation Factors
  I12. Marginal Effects
J. Tables of Regression Results
  J0. Copying and Pasting from Stata to a Word Processor or Spreadsheet Program
  J1. Tables of Regression Results Using Stata's Built-In Commands
  J2. Tables of Regression Results Using Add-On Commands
    J2a. Installing or Accessing the Add-On Commands
    J2b. Storing Results and Making Tables
    J2c. Near-Publication-Quality Tables
    J2d. Understanding the Table Command's Options
    J2e. Saving Tables as Files
    J2f. Wide Tables
    J2g. Storing Additional Results
    J2h. Clearing Stored Results
    J2i. More Options and Related Commands
K. Data Types, When 3.3 ≠ 3.3, and Missing Values
L. Results Returned after Commands
M. Do-Files and Programs
N. Monte-Carlo Simulations
O. Doing Things Once for Each Group
P. Generating Variables for Time-Series and Panel Data
  P1. Creating a Time Variable
    P1a. Time Variable that Starts from a First Time and Increases by 1 at Each Observation
    P1b. Time Variable from a Date String
    P1c. Time Variable from Multiple (e.g., Year and Month) Variables
  P2. Telling Stata You Have Time Series or Panel Data
  P3. Lags, Forward Leads, and Differences
  P4. Generating Means and Other Statistics by Individual, Year, or Group
Q. Panel Data Statistical Methods
  Q1. Fixed Effects - Using Dummy Variables
  Q2. Fixed Effects - De-Meaning
  Q3. Other Panel Data Estimators
  Q4. Time-Series Plots for Multiple Individuals
R. Probit and Logit Models
  R1. Interpreting Coefficients in Probit and Logit Models
S. Other Models for Limited Dependent Variables
  S1. Censored and Truncated Regressions with Normally Distributed Errors
  S2. Count Data Models
  S3. Survival Models (a.k.a. Hazard Models, Duration Models, Failure Time Models)
T. Instrumental Variables Regression
  T1. GMM Instrumental Variables Regression
  T2. Other Instrumental Variables Models
U. Time Series Models
  U1. Autocorrelations
  U2. Autoregressions (AR) and Autoregressive Distributed Lag (ADL) Models
  U3. Information Criteria for Lag Length Selection
  U4. Augmented Dickey Fuller Tests for Unit Roots
  U5. Forecasting
  U6. Newey-West Heteroskedastic-and-Autocorrelation-Consistent Standard Errors
  U7. Dynamic Multipliers and Cumulative Dynamic Multipliers
V. System Estimation Commands
  V1. GMM System Estimators
  V2. Three-Stage Least Squares
  V3. Seemingly Unrelated Regression
  V4. Multivariate Regression
W. Flexible Nonlinear Estimation Methods
  W1. Nonlinear Least Squares
  W2. Generalized Method of Moments Estimation for Custom Models
  W3. Maximum Likelihood Estimation for Custom Models
X. Data Manipulation Tricks
  X1. Combining Datasets: Adding Rows
  X2. Combining Datasets: Adding Columns
  X3. Reshaping Data
  X4. Converting Between Strings and Numbers
  X5. Labels
  X6. Notes
  X7. More Useful Commands

Useful Stata (Version 12) Commands

A. Loading Data

edit
    Opens the data editor, to type in or paste data. You must close the data editor before you can run any further commands.
use "filename.dta"
    Reads in a Stata-format data file.
insheet using "filename.txt"
    Reads in text data.
import excel "filename.xlsx", firstrow
    Reads data from an Excel file's first worksheet, treating the first row as variable names.
import excel "filename.xlsx", sheet("price data") firstrow
    Reads data from the worksheet named "price data" in an Excel file, treating the first row as variable names.
save "filename.dta"
    Saves the data.

Before you load or save files, you may need to change to the right directory. Under the File menu, choose "Change Working Directory...", or use Stata's "cd" command.

A1. Memory in Stata Version 11 or Earlier

As of this writing, Stata is in version 12. If you are using Stata version 11 or earlier, and you will read in a big dataset, then before reading in your data you must tell Stata to make available enough computer memory for your data. For example:

set memory 100m
    Sets memory available for data to 100 megabytes. Clear before setting.

If you get a message while using Stata 11 or earlier that there is not enough memory, then clear the existing data (with the "clear" command), set the memory to a large enough amount, and then re-do your analyses as necessary (you should be saving your work in a do-file, as noted below in section M).

B. Variable Lists, If-Statements, and Options

Most commands in Stata allow (1) a list of variables, (2) an if-statement, and (3) options.

1. A list of variables consists of the names of the variables, separated with spaces. It goes immediately after the command. If you leave the list blank, Stata assumes where possible that you mean all variables. You can use an asterisk as a wildcard (see Stata's help for varlist). Examples:

edit var1 var2 var3
    Opens the data editor, just with variables var1, var2, and var3.
edit
    Opens the data editor, with all variables.

In later examples, varlist means a list of variables, and varname (or yvar etc.) means one variable.

2. An if-statement restricts the command to certain observations. You can also use an in-statement. If- and in-statements come after the list of variables. Examples:

edit var1 if var2 > 3
    Opens the data editor, just with variable var1, only for observations in which var2 is greater than 3.
edit if var2 == var3
    Opens the data editor, with all variables, only for observations in which var2 equals var3.
edit var1 in 10
    Opens the data editor, just with var1, just in the 10th observation.
edit var1 in 101/200
    Opens the data editor, just with var1, in observations 101-200.
edit var1 if var2 > 3 in 101/200
    Opens the data editor, just with var1, in the subset of observations 101-200 that meet the requirement var2 > 3.

3. Options alter what the command does. There are many options, depending on the command; get help on the command to see a list of options. Options go after any variable list and if-statements, and must be preceded by a comma. Do not use an additional comma for additional options (the comma works like a toggle switch, so a second comma turns off the use of options!). Examples:

use "filename.dta", clear
    Reads in a Stata-format data file, clearing all data previously in memory! (Without the clear option, Stata refuses to let you load new data if you haven't saved the old data. Here the old data are forgotten and will be gone forever unless you saved some version of them.)
save "filename.dta", replace
    Saves the data, replacing a previously-existing file if any.

You will see more examples of options below.

C. Lowercase and Uppercase Letters

Case matters: if you use an uppercase letter where a lowercase letter belongs, or vice versa, an error message will display.

D. Review Window, and Abbreviating Command Names

The Review window lists commands you typed previously. Click in the Review window to put a previous command in the Command window (then you can edit it as desired). Double-click to run a command. Another shortcut is that many commands can have their names abbreviated. For example, below, instead of typing "summarize", "su" will do, and instead of "regress", "reg" will do.

E. Viewing and Summarizing Data

Here, remember two points from above: (1) leave a varlist blank to mean all variables, and (2) you can use if-statements to restrict the observations used by each command.

E1. Just Looking

If you want to look at the data but not change them, it is bad practice to use Stata's data editor, as you could accidentally change the data! Instead, use the browser via the button at the top, or by using the following command. Or list the data in the main window.

browse varlist
    Opens the data viewer, to look at data without changing them.
list varlist
    Lists data. If there's more than 1 screenful, press space for the next screen, or q to quit listing.
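
The pieces described in sections B and E1 (a variable list, if- and in-statements, and options) can be combined in a single command. The sketch below is only an illustration: it assumes the example dataset "auto" that ships with Stata (loaded with sysuse), and the file name auto_copy.dta is made up for this example.

```stata
* Load Stata's bundled example data (an assumption; substitute your own file)
sysuse auto, clear

* Options follow a single comma: replace overwrites any existing file.
* "auto_copy.dta" is a hypothetical file name used only for illustration.
save "auto_copy.dta", replace
use "auto_copy.dta", clear

* Variable list plus an in-statement
list make price mpg in 1/5

* Variable list, if-statement, and in-statement together
list make price if mpg > 25 in 1/40

* Asterisk wildcard: every variable whose name starts with "m"
list m* in 1/3

* No variable list: all variables, first observation only
list in 1
```

Using list rather than edit here keeps the data safe from accidental changes, in line with the advice in section E1.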

E2. Mean, Variance, Number of Non-missing Observations, Minimum, Maximum, Etc.

summarize varlist
    See summary information for the variables listed.
summarize varlist, detail
    See detailed summary information for the variables listed.
by byvars: summarize varlist
    See summary information separately for each group of unique values of the variables in byvars. For example, "by gender: summarize wage". (The data must already be sorted by byvars; "bysort byvars:" sorts first.)
inspect varlist
    See a mini-histogram, and numbers of positives / zeroes / negatives, integers / non-integers, and missing data values, for each variable.
codebook varlist
    Another view of information about variables.

E3. Tabulations, Histograms, Density Function Estimates

tabulate varname
    Creates a table listing the number of observations having each different value of the variable varname.
tabulate var1 var2
    Creates a two-way table listing the number of observations in each row and column.
tabulate var1 var2, exact
    Creates the same two-way table, and carries out a statistical test of the null hypothesis that var1 and var2 are independent. The test is exact, in that it does not rely on convergence to a distribution.
tabulate var1 var2, chi2
    Same as above, except the statistical test relies on asymptotic convergence to a chi-squared distribution. If you have lots of observations, exact tests can take a long time and can run out of available computer memory; if so, use this test instead.
histogram varname
    Plots a histogram of the specified variable.
histogram varname, bin(#) normal
    The bin(#) option specifies the number of bars. The normal option overlays a normal probability distribution with the same mean and variance.
kdensity varname, normal
    Creates a "kernel density plot", which is an estimate of the pdf that generated the data. The "normal" option lets you overlay a normal probability distribution with the same mean and variance.

E4. Scatter Plots and Other Plots

scatter yvar xvar
    Plots data, with yvar on the vertical axis and xvar on the horizontal axis.
scatter yvar1 yvar2 xvar
    Plots multiple variables on the vertical axis and xvar on the horizontal axis.

Stata has lots of other possibilities for graphs, with an inch-and-a-half-thick manual. For a quick web-based introduction to some of Stata's graphics commands, try the "Graphics" section of this web page: http://www.ats.ucla.edu/stat/stata/modules/ . Or go to Stata's pdf manuals and look at [G] Graph intro, viewing especially the section labeled "A quick tour." Or use Stata's Help menu and choose "Stata Command...", type "graph_intro", and press return. Scroll down past the table of contents and read the section labeled "A quick tour."
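
As a rough illustration of the summary, tabulation, and plotting commands in sections E2 to E4, the sketch below runs them on Stata's bundled "auto" example dataset. The dataset and its variable names (price, mpg, rep78, foreign) are assumptions made for this example and do not come from the text above.

```stata
* Bundled example data (an assumption; any dataset with numeric variables works)
sysuse auto, clear

* E2: summary statistics, plain and detailed
summarize price mpg
summarize price, detail

* Group-wise summaries; bysort sorts by foreign before running by
bysort foreign: summarize price

* Quick overviews of one variable
inspect mpg
codebook rep78

* E3: one-way and two-way tables, with a chi-squared test of independence
tabulate rep78
tabulate rep78 foreign, chi2

* Histogram with 10 bars and a normal overlay, then a kernel density plot
histogram mpg, bin(10) normal
kdensity mpg, normal

* E4: scatter plot with price on the vertical axis and mpg on the horizontal axis
scatter price mpg
```

Each graph command replaces the previous graph window, so when running these lines from a do-file it may help to run the plotting commands one at a time.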

E5. Correlations and Covariances

The following commands compute the correlations and covariances between any list of variables. Note that if any of the variables listed have missing values in some rows, those rows are ignored in all calculations.

correlate var1 var2
    Computes the sample correlations between variables.
correlate var1 var2, covariance
    Computes the sample covariances between variables.

Sometimes you have missing values in some rows, but want to use all available data wherever possible, i.e., for some correlations but not others. For example, if you have data on health, nutrition, and income, and income data are missing for 90% of your observations, then you could compute the correlation of health with nutrition using all of the observations, while computing the correlations of health with income and of nutrition with income for just the 10% of observations that have income data. These are called "pairwise" correlations and can be obtained as follows:

pwcorr var1 var2
    Computes pairwise sample correlations between variables.

F. Generating and Changing Variables

A variable in Stata is a whole column of data. You can generate a new column of data using a formula, and you can replace existing values with new ones. Each time you do this, the calculation is done separately for every observation in the sample, using the same formula each time.

F1. Generating Variables

generate newvar = ...
    Generate a new variable using the formula you enter in place of "...". Examples follow.
gen f = m * a
    Remember, Stata allows abbreviations: "gen" means "generate".
gen xsquared = x^2
gen logincome = log(income)
    Use log() or ln() for a log-base-e, or log10() for log-base-10.
gen q = exp(z) / (1 - exp(z))
gen a = abs(cos(x))
    This uses functions for absolute value, abs(), and cosine, cos(). Many more functions are available; get help for "functions" for a list.

F2. Missing Data

Be aware of missing data in Stata. Missing data can result when you compute a number whose answer is not defined; for example, if you use "gen logincome = log(income)" then logincome will be missing for any observation in which income is zero or negative. Missing data can also result during data collection; for example, in data on publicly listed companies often R&D expenditures data are unavailable. Missing data can be entered in Stata by using a period instead of a number. When you list data, a period likewise indicates a missing datum. Missing data can be used in Stata calculations. For example, you can check whether logincome is missing, and only list the data for observations where this is true:

list if logincome==.
    List only observations in which logincome is missing.

A missing datum counts as infinity when making comparisons. For example, if logincome is not missing, then it is less than infinity, so you could create a variable that tells whether logincome is non-missing by checking whether logincome is less than the missing value code:

gen notmiss = logincome<.
    If logincome is less than infinity, then notmiss equals true which is recorded as 1, but otherwise notmiss equals false which is recorded as 0.
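
To make the behavior of missing values concrete, here is a minimal sketch using a four-observation toy dataset typed in with the input command. The income values are invented for illustration; only the commands themselves follow the text above.

```stata
* Toy data, invented for this example: one income is zero and one is missing
clear
input income
50000
0
.
12000
end

* log(0) and log(.) are undefined, so two values of logincome become missing
gen logincome = log(income)

* Show only the observations in which logincome is missing
list if logincome==.

* 1 where logincome is non-missing (less than "infinity"), 0 otherwise
gen notmiss = logincome<.
list

* Two of the four observations have non-missing logincome in this toy data
count if notmiss==1
```

Note that notmiss itself is never missing: the comparison always evaluates to 0 or 1, even when logincome is missing.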

Should you need to distinguish reasons why data are missing, you could use Stata's "extended missing value" codes. These codes are written .a, .b, .c, ..., .z. They all count as infinity when compared versus normal numbers, but compared to each other they are ranked as . < .a < .b < .c < ... < .z. For this reason, to check whether a number in variable varname is missing you should use not "varname==." but "varname>=."

F3. True-False Variables

Below are examples of how to create true-false variables in Stata. When you create these variables, true will be 1, and false will be 0. When you ask Stata to check whether a number means true or false, then 0 will mean false and anything else (including a missing value) will mean true.

The basic operators used when creating true-false values are == (check whether something is equal), <, <=, >, >=, ! ("not" which changes false to true and true to false), and != (check whether something is not equal). You can also use & and | to mean logical "and" and "or" respectively, and you can use parentheses as needed to group parts of your expressions or equations.

When creating true-false values, as noted above, missing values in Stata work like infinity. So if age is missing and you use "gen old = age >= 18", then old gets set to 1 when really you don't know whether or not someone is old. Instead you should "gen old = age >= 18 if age<.". This is discussed more in section K below.

When using true-false values, 0 is false and anything else, including missing values, counts as true. So !(0) = 1, !(1) = 0, !(3) = 0, and !(.) = 0. Again, use an if-statement to ensure you generate non-missing values only where appropriate.

With these fundamentals in mind, here are examples of how to create true-false data in Stata:

gen young = age < 18 if age<.
    If age is less than 18, then young is "true", represented in Stata as 1. If age is 18 or over, then young is "false", represented in Stata as 0. The calculation is done only if age is "less than missing," i.e. nonmissing (see above), so that no answer (a missing value) will be generated if the age is unknown.
gen old = age >= 18 if age<.
    If age is 18 or higher, this yields 1, otherwise this yields 0 (but missing-value ages result in missing values for old).
gen age18 = age == 18 if age<.
    Use a single equal sign to set a variable equal to something. Use a double equal sign to check whether the left hand side equals the right hand side. In this case, age18 is created and equals 1 if the observation has age 18 and 0 if it does not.
gen youngWoman = age < 18 & female==1 if age<. & female<.
    Here the ampersand, "&", means a logical and. The variable youngWoman is created and equals 1 if and only if age is less than 18 and also female equals one; otherwise it equals 0. Here, the "if" condition ensures that the answer will be missing if either age or female is missing.
gen youngOrWoman = age<18 | female==1 if age<. & female<.
    Here the vertical bar, "|", means a logical or. The variable youngOrWoman is created and equals 1 if age is less than 18 or if female equals one; otherwise it equals 0. Here, the "if" condition ensures that the answer will be missing if either age or female is missing. You could improve on this if-condition to make the answer non-missing if the person is known to be young but has a missing value for female, or if the person is known to be female but has a missing value for age. To do so you could use: "gen youngOrWoman = age<18 | female==1 if (age<. & female<.) | (age<18) | (female==1)".
gen ageNot18 = age != 18 if age<.
    The "!=" symbol means "not equal to".
gen notOld = !old if old<.
    The "!" symbol is pronounced "not" and switches true to false or false to true. The result is the same as the variable young above.

F4. Random Numbers

gen r1 = runiform()
    Random numbers, uniformly distributed between 0 and 1.
gen r2 = rnormal()
    Random numbers, with a standard normal distribution.
gen r3 = rnormal(5,2)
    Random numbers, with a normal distribution using mean 5 and standard deviation 2. Alternatively, you could use "gen r3 = 5 + 2 * rnormal()", or "gen r3 = 5 + 2 * invnorm(runiform())".
gen r4 = rchi2(27)
    Random numbers, with a chi-squared distribution with 27 degrees of freedom.
gen r5 = rt(27)
    Random numbers, with a t-distribution with 27 degrees of freedom.

For other random number distributions use Stata's menu to get help for "functions". You can also set the "seed" for random number generation (e.g., "set seed 1234"), to ensure that a reproducible sequence of random numbers will result thereafter; that way if you rerun your analyses later you can get exactly the same results.

F5. Replacing Values of Variables

replace agesquared = age^2
    Changes the value of the variable agesquared, to equal age squared. This would be useful if you had made a mistake when you first created the variable.
replace young = age < 16 if age<.
    Changes the value of the variable young, to equal 1 if and only if age is less than 16, and 0 otherwise. The "if age<." ensures that replacements are only made when values of age are nonmissing; see the comments about missing values in sections F2 and F3 above.
replace young = cond(age<., age < 16, .)
    Here is another way to ensure that the answer is missing if age is missing. To do this, we use Stata's conditional function, cond(a,b,c), which checks whether a is true and then returns b if a is true or c if a is not true (see F7 below).
replace young = 0 if age>=16 & age<18
    Changes the value of the variable young to 0, but only if age is at least 16 and less than 18. That is, no change is made if age is less than 16 or if age is at least 18.

F6. Getting Rid of Variables
drop varlist
    Gets rid of all variables in the list.
clear
    Gets rid of all variables, as well as labels, which are discussed in section L5.
clear all
    Gets rid of not just variables and labels, but also all sorts of things that we haven't discussed yet: matrices, scalars, constraints, clusters, postfile declarations, returned results, programs, mata contents, and timer settings, and closes all open files.
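The missing-value guards used with generate and replace above can be tried on a small made-up dataset. This is only a sketch: the five observations and the variable names old2 and young are hypothetical, chosen so that one person has an unknown age.

```stata
* Hypothetical toy data: five people, one with a missing age
clear
input age female
15 1
17 0
18 1
25 0
.  1
end

* Missing counts as larger than any number, so guard with "if age<."
generate old = age >= 18 if age < .

* Equivalent guard using cond(): the result is missing when age is missing
generate old2 = cond(age < ., age >= 18, .)

* Logical "not": young is 1 when old is 0, and missing when old is missing
generate young = !old if old < .

list
```

Without the "if age < ." guard, the person with a missing age would be coded old = 1, because Stata treats a missing value as larger than any number.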


F7. If-then-else Formulas
gen val = cond(a, b, c)
    Stata's cond(if, then, else) works much like Excel's IF(if, then, else). With the statement cond(a,b,c), Stata checks whether a is true and then returns b if a is true or c if a is not true.
gen realwage = cond(year==1992, wage*(188.9/140.3), wage)
    Creates a variable that uses one formula for observations in which the year is 1992, or a different formula if the year is not 1992. This particular example would be useful if you have data from two years only, 1992 and 2004, and the consumer price index was 140.3 in 1992 and 188.9 in 2004; then the example given here would compute the real wage by rescaling 1992 wages while leaving 2004 wages the same.

F8. Quick Calculations
display …
    Calculate the formula you type in, and display the result. Examples follow.
display (52.3-10.0)/12.7
display normal(1.96)
    Compute the probability to the left of 1.96 using the cumulative standard normal distribution.
display F(10,9000,2.32)
    Compute the probability that an F-distributed number, with 10 and 9000 degrees of freedom, is less than or equal to 2.32. Also, there is a function Ftail(n1,n2,f) = 1 - F(n1,n2,f). Similarly, you can use ttail(n,t) for the probability that T > t, for a t-distributed random variable T with n degrees of freedom.

F9. More
For functions available in equations in Stata, use Stata's Help menu, choose Stata Command…, and enter "functions". To generate variables separately for different groups of observations, see the commands in sections O and P4. For time-series and panel data, see section P, especially the notations for lags, leads, and differences in section P3. If you need to refer to a specific observation number, use a reference like x[3], meaning the value of the variable x in the 3rd observation.
In Stata "_n" means the current observation (when using generate or replace), so that for example x[_n-1] means the value of x in the preceding observation, and "_N" means the number of observations, so that x[_N] means the value of x in the last observation.

G. Means: Hypothesis Tests and Confidence Intervals

G1. Confidence Intervals
ci varname
    Confidence interval for the mean of varname (using asymptotic normal distribution).
ci varname, level(#)
    Confidence interval at #%. For example, use 99 for a 99% confidence interval.
by varlist: ci varname
    Compute confidence intervals separately for each unique set of values of the variables in varlist.
by female: ci workhours
    Compute confidence intervals for the mean of workhours, separately for people who are males versus females.
Other commands also report confidence intervals, and may be preferable because they do more, such as computing a confidence interval for the difference in means between "by" groups (e.g., between men and women). See section G2. (Also, Stata's "mean" command reports confidence intervals.)

G2. Hypothesis Tests
ttest varname == #
    Test the hypothesis that the mean of a variable is equal to some number, which you type instead of the number sign #.
ttest varname1 == varname2
    Test the hypothesis that the mean of one variable equals the mean of another variable.
ttest varname, by(groupvar)
    Test the hypothesis that the mean of a single variable is the same in two groups. The groupvar must be a variable with a distinct value for each group, and ttest requires that it take exactly two values. For example, groupvar might be year in data covering two years, to see if the mean of a variable is the same in both years.

H. OLS Regression (and WLS and GLS)
regress yvar xvarlist
    Regress the dependent variable yvar on the independent variables xvarlist. For example: "regress y x", or "regress y x1 x2 x3".
regress yvar xvarlist, vce(robust)
    Regress, but this time compute robust (Eicker-Huber-White) standard errors. We are always using the vce(robust) option in ECON-4570 Econometrics, because we want consistent (i.e., asymptotically unbiased) results, but we do not want to have to assume homoskedasticity and normality of the random error terms. So if you are in ECON-4570 Econometrics, remember always to specify the vce(robust) option after estimation commands. The "vce" stands for variance-covariance estimates (of the estimated model parameters).
regress yvar xvarlist, vce(robust) level(#)
    Regress with robust standard errors, and this time change the confidence interval to #% (e.g. use 99 for a 99% confidence interval).
Occasionally you will need to regress without vce(robust), to allow post-regression tests that assume homoskedasticity. Notably, Stata displays adjusted R² values only under the assumption of homoskedasticity, since the usual interpretation of R² presumes homoskedasticity. However, another way to see the adjusted R² after using "regress …, vce(robust)" is to type "display e(r2_a)"; see section I5.

H1. Variable Lists with Automated Category Dummies and Interactions
Stata (beginning with Stata 11) allows you to enter variable lists that automatically create dummies for categories as well as interaction variables. For example, suppose you have a variable named usstate numbered 1 through 50 for the fifty U.S. states, and you want to include forty-nine 0-1 dummy variables that allow for differences between the first state (Alabama, say) and other states. Then you could simply include i.usstate in the xvarlist for your regression. Similarly, suppose you want to create the interaction between two variables, named age (a continuous variable) and male (a 0-1 dummy variable). Then, including c.age#i.male includes the interaction (the product of the two variables) in the regression. The "c." in front of age indicates that it is a continuous variable, whereas the "i." in front of male indicates that it is a 0-1 dummy variable. Including c.age#i.usstate adds 49 variables to the model, age times each of the 49 state dummies. Use "##" instead of "#" to add full interactions; for example, c.age##i.male means age, male, and age×male. Similarly, c.age##i.usstate means age, 49 state dummies, and 49 state dummies multiplied by age.
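As a minimal sketch of the factor-variable notation described above, the commands below use Stata's bundled auto dataset as a stand-in, since the wage, age, male, and usstate variables in the text are not available here. That substitution is an assumption: price plays the role of the dependent variable, weight the continuous regressor, and foreign the 0-1 dummy.

```stata
* Assumption: the auto dataset shipped with Stata stands in for the wage data
sysuse auto, clear

* Category dummies created on the fly: rep78 takes values 1 through 5,
* so i.rep78 adds four 0-1 dummies (the lowest category is the base)
regress price weight i.rep78, vce(robust)

* Full interaction with ##: weight, foreign, and weight x foreign
regress price c.weight##i.foreign, vce(robust)

* The same model written out term by term with #
regress price c.weight i.foreign c.weight#i.foreign, vce(robust)
```

The last two regressions should report identical coefficients, which is a quick way to check that "##" expands the way you expect.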

You can use "#" to create polynomials. For example, "c.age c.age#c.age c.age#c.age#c.age" is a third-order polynomial, with variables age, age², and age³ (the "c." prefix is needed because "#" otherwise treats a variable as categorical). Having done this, you can use Stata's "margins" command to compute marginal effects: the average value of the derivatives d(y)/d(age) across all observations in the sample. This works even if your regression equation includes interactions of age with other variables.
Here are some examples using automated category dummies and interactions, termed "factor variables" in the Stata manuals (see the User's Guide section 11.4 for more information):
reg yvar x1 i.x2, vce(robust)
    Includes 0-1 dummy variables for the groups indicated by unique values of variable x2.
reg wage c.age i.male c.age#i.male, vce(robust)
    Regress wage on age, male, and age×male.
reg wage c.age##i.male, vce(robust)
    Regress wage on age, male, and age×male.
reg wage c.age##i.male c.age#c.age, vce(robust)
    Regress wage on age, male, age×male, and age².
reg wage c.age##i.male c.age#c.age c.age#c.age#i.male, vce(robust)
    Regress wage on age, male, age×male, age², and age²×male.
reg wage c.age##i.usstate c.age#c.age c.age#c.age#i.usstate, vce(robust)
    Regress wage on age, 49 state dummies, 49 variables that are age×statedummy_k, age², and 49 variables that are age²×statedummy_k (k=1,…,49).
Speed Tip: Don't "generate" lots of dummy variables and interactions. Instead use this "factor notation" to compute your dummy variables and interactions "on the fly" during statistical estimation. This usually is much faster and saves lots of memory, if you have a really big dataset.

H2. Improved Robust Standard Errors in Finite Samples
For robust standard errors, an apparent improvement is possible. Davidson and MacKinnon* report two variance-covariance estimation methods that seem, at least in their Monte Carlo simulations, to converge more quickly, as sample size n increases, to the correct variance-covariance estimates.
Thus their methods seem better, although they require more computational time. Stata by default makes Davidson and MacKinnon's recommended simple degrees of freedom correction by multiplying the estimated variance matrix by n/(n-K). However, students in ECON-6570 Advanced Econometrics learn about an alternative in which the squared residuals are rescaled. To use this formula, specify "vce(hc2)" instead of "vce(robust)", to use the approach discussed in Hayashi p. 125 formula 2.5.5 using d=1 (or in Greene's text, 6th edition, on p. 164). An alternative is "vce(hc3)" instead of "vce(robust)" (Hayashi page 125 formula 2.5.5 using d=2 or Greene p. 164 footnote 15).

H3. Weighted Least Squares
Students in ECON-6570 Advanced Econometrics learn about (variance-)weighted least squares. If you know (to within a constant multiple) the variances of the error terms for all observations, this yields more efficient estimates (OLS with robust standard errors works properly using asymptotic methods but is not the most efficient estimator). Suppose you have, stored in a variable sdvar, a reasonable estimate of the standard deviation of the error term for each observation. Then weighted least squares can be performed as follows:
vwls yvar xvarlist, sd(sdvar)
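The options discussed in H2 and H3 can be compared side by side. The sketch below again assumes Stata's bundled auto dataset in place of your own data, and the formula used to build sdvar is entirely made up for illustration; in practice sdvar should hold a defensible estimate of each observation's error standard deviation.

```stata
* Assumption: the auto dataset shipped with Stata stands in for real data
sysuse auto, clear

* Default robust standard errors, then the two finite-sample alternatives
regress price weight mpg, vce(robust)
regress price weight mpg, vce(hc2)
regress price weight mpg, vce(hc3)

* Hypothetical error standard deviation that grows with car weight
generate sdvar = 1000 + 0.5*weight

* Variance-weighted least squares using that per-observation standard deviation
vwls price weight mpg, sd(sdvar)
```

The coefficient estimates are the same across the three regress commands; only the standard errors change. The vwls estimates differ because observations are reweighted.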

* R. Davidson and J. MacKinnon, Estimation and Inference in Econometrics, Oxford: Oxford University Press, 1993, section 16.3.

H4. Feasible Generalized Least Squares
Students in ECON-6570 Advanced Econometrics learn about feasible generalized least squares (Greene pp. 156-158 and 169-175). The groupwise heteroskedasticity model can be estimated by computing the estimated standard deviation for each group using Greene's (6th edition) equation 8-36 (p. 173): do the OLS regression, get the residuals, and use "by groupvars: egen estvar = mean(residual^2)" with appropriate variable names in place of groupvars, estvar, and residual (the data must first be sorted by groupvars, or use "bysort" in place of "by"), then "gen estsd = sqrt(estvar)", then use this estimated standard deviation to carry out weighted least squares as shown above. (To get the residuals, see section I1 below.) Or, if your independent variables are just the group variables (categorical variables that indicate which observation is in each group), you can use the command:
vwls yvar xvarlist
The multiplicative heteroskedasticity model is available via a free third-party add-on command for Stata. See section J2a of this document for how to use add-on commands. If you have your own copy of Stata, just use the help menu to search for "sg77" and click the appropriate link to install. A discussion of these commands was published in the Stata Technical Bulletin volume 42, available online at: http://www.stata.com/products/stb/journals/stb42.pdf . The model then can be estimated like this (see the help file and Stata Technical Bulletin for more information):
reghv yvar xvarlist, var(zvarlist) robust twostage

I. Post-Estimation Commands
Commands described here work after OLS regression. They sometimes work after other estimation commands, depending on the command.

I1. Fitted Values, Residuals, and Related Plots
predict yhatvar
    After a regression, create a new variable, having the name you enter here, that contains for each observation its fitted value ŷ_i.
predict rvar, residuals
    After a regression, create a new variable, having the name you enter here, that contains for each observation its residual û_i (in the notation of Hayashi and most books û_i is written ê_i).
scatter y yhat x
    Plot variables named y and yhat versus x.
scatter resids x
    It is wise to plot your residuals versus each of your x-variables. Such "residual plots" may reveal a systematic relationship that your analysis has ignored. It is also wise to plot your residuals versus the fitted values of y, again to check for a possible nonlinearity that your analysis has ignored.
rvfplot
    Plot the residuals versus the fitted values of y.
rvpplot varname
    Plot the residuals versus a "predictor" (x-variable).
For more such commands, see the nice "[R] regress postestimation" section of the Stata manuals. This manual section is a great place to learn techniques to check the trustworthiness of regression results, which is always a good idea!

I2. Confidence Intervals and Hypothesis Tests
For a single coefficient in your statistical model, the confidence interval is already reported in the table of regression results, along with a 2-sided t-test for whether the true coefficient is zero. However, you may need to carry out F-tests, as well as compute confidence intervals and t-tests for "linear combinations" of coefficients in the model. Here are example commands. Note that when a variable name is used in this subsection, it really refers to the coefficient (the β_k) in front of that variable in the model equation.
lincom logpl+logpk+logpf
    Compute the estimated sum of three model coefficients, which are the coefficients in front of the variables named logpl, logpk, and logpf. Along with this estimated sum, carry out a t-test with the null hypothesis being that the linear combination equals zero, and compute a confidence interval.
lincom 2*logpl+1*logpk-1*logpf
    Like the above, but now the formula is a different linear combination of regression coefficients.
lincom 2*logpl+1*logpk-1*logpf, level(#)
    As above, but this time change the confidence interval to #% (e.g. use 99 for a 99% confidence interval).
test logpl+logpk+logpf==1
    Test the null hypothesis that the sum of the coefficients of variables logpl, logpk, and logpf totals to 1. This only makes sense after a regression involving variables with these names. After OLS regression, this is an F-test. More generally, it is a Wald test.
test (logq2==logq1) (logq3==logq1) (logq4==logq1) (logq5==logq1)
    Test the null hypothesis that four equations are all true simultaneously: the coefficient of logq2 equals the coefficient of logq1, the coefficient of logq3 equals the coefficient of logq1, the coefficient of logq4 equals the coefficient of logq1, and the coefficient of logq5 equals the coefficient of logq1; i.e., they are all equal to each other. After OLS regression, this is an F-test. More generally, it is a Wald test.
test x3 x4 x5
    Test the null hypothesis that the coefficient of x3 equals 0 and the coefficient of x4 equals 0 and the coefficient of x5 equals 0. After OLS regression, this is an F-test. More generally, it is a Wald test.

I3. Nonlinear Hypothesis Tests
Students in ECON-6570 Advanced Econometrics learn about nonlinear hypothesis tests. After estimating a model, you could do something like the following:
testnl _b[popdensity]*_b[landarea] = 3000
    Test a nonlinear hypothesis.
    Note that coefficients must be specified using _b, whereas the linear "test" command lets you omit the _b[].
testnl (_b[mpg] = 1/_b[weight]) (_b[trunk] = 1/_b[length])
    For multi-equation tests you can put parentheses around each equation (or use multiple equality signs in the same equation; see the Stata manual, [R] testnl, for examples).

I4. Computing Estimated Expected Values for the Dependent Variable
di _b[xvarname]
    Display the value of an estimated coefficient after a regression. Use the variable name "_cons" for the estimated constant term. Of course there's no need just to display these numbers, but the good thing is that you can use them in formulae. See the next example.
di _b[_cons] + _b[age]*25 + _b[female]*1
    After a regression of y on age and female (but no other independent variables), compute the estimated value of y for a 25-year-old female. See also the predict command mentioned above in section I1, and the margins command.
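The post-estimation commands in sections I2 to I4 can be strung together after a single regression. The following is a sketch under the assumption that Stata's bundled auto dataset is an acceptable stand-in; the particular hypotheses tested and the values plugged into the prediction (3000 pounds, 190 inches, 20 mpg) are arbitrary illustrations, not economically meaningful claims.

```stata
* Assumption: the auto dataset shipped with Stata stands in for real data
sysuse auto, clear
regress price weight length mpg, vce(robust)

* Linear combination of coefficients, with t-test and confidence interval
lincom weight + length
lincom 2*weight - mpg, level(99)

* Single linear restriction, then a joint test of two zero restrictions
test weight + length == 0
test length mpg

* Nonlinear restriction: coefficients must be written with _b[]
testnl _b[weight]*_b[mpg] = 0

* Estimated expected price for a hypothetical car
display _b[_cons] + _b[weight]*3000 + _b[length]*190 + _b[mpg]*20
```

Because the regression uses vce(robust), the lincom and test results are based on the robust variance-covariance estimates.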


I5. Displaying Adjusted R² and Other Estimation Results
display e(r2_a)
    After a regression, the adjusted R-squared can be looked up as "e(r2_a)". Or get the adjusted R² as in section J below. (Stata does not report the adjusted R² when you do regression with robust standard errors, because robust standard errors are used when the variance (conditional on your right-hand-side variables) is thought to differ between observations, and this would alter the standard interpretation of the adjusted R² statistic. Nonetheless, people often report the adjusted R² in this situation anyway. It may still be a useful indicator, and often the (conditional) variance is still reasonably close to constant across observations, so that it can be thought of as an approximation to the adjusted R² statistic that would occur if the (conditional) variance were constant.)
ereturn list
    Display all results saved from the most recent model you estimated, including the adjusted R² and other items. Items that are matrices are not displayed; you can see them with the command "matrix list e(matrixname)".
Study Tip: Students are strongly advised to understand the meanings of the two main sets of estimates that come out of regression models, (a) the coefficient estimates, and (b) the estimated variances and covariances of those coefficient estimates:
matrix list e(b)
    List the coefficient estimates of your recent regression.
matrix list e(V)
    List the estimated variances and covariances of your coefficient estimates in your recent regression. This is a symmetric matrix, so the part above the diagonal is not shown. The diagonal entries are estimated variances of your coefficient estimates (take square roots to get the standard errors), and the off-diagonal entries are estimated covariances.
Once you understand what both of these are, you'll have a much better understanding of what regression does (and you'll probably never need these particular "matrix list" commands!).

I6. Plotting Any Mathematical Function
twoway function y=exp(-x/6)*sin(x), range(0 12.57)
    Plot a function graphically, for any function (of a single variable x) that you specify. A command like this may be useful when you want to examine how a polynomial in one regressor (which here must be called x) affects the dependent variable in a regression, without specifying values for other variables.

I7. Influence Statistics
Influence statistics give you a sense of how sensitive your estimates are to particular observations in the data. This may be particularly important if there might be errors in the data. After running a regression, you can compute how different the estimated coefficient of any given variable would be if any particular observation were dropped from the data. To do so for one variable, for all observations, use this command:
predict newvarname, dfbeta(varname)
    Computes the influence statistic ("DFBETA") for varname: how much the estimated coefficient of varname would change if each observation were excluded from the data. The change divided by the standard error of varname, for each observation i, is stored in the ith observation of the newly created variable newvarname. Then you might use "summarize newvarname, detail" to find out the largest values by which the estimates would change (relative to the standard error of the estimate). If these are large (say close to 1 or more), then you might be alarmed that one or more observations may completely change your results, so you had better make sure those results are valid or else use a more robust estimation technique (such as "robust regression," which is not related to robust standard errors, or "quantile regression," both available in Stata).
If you want to compute influence statistics for many or all regressors, Stata's "dfbeta" command lets you do so in one step.

I8. Functional Form Test
It is sometimes important to ensure that you have the right functional form for variables in your regression equation. Sometimes you don't want to be perfect, you just want to summarize roughly how some independent variables affect the dependent variable. But sometimes, e.g., if you want to control fully for the effects of an independent variable, it can be important to get the functional form right (e.g., by adding polynomials and interactions to the model). To check whether the functional form is reasonable and consider alternative forms, it helps to plot the residuals versus the fitted values and versus the predictors, as shown in section I1 above. Another approach is to formally test the null hypothesis that the patterns in the residuals cannot be explained by powers of the fitted values. One such formal test is the Ramsey RESET test:
estat ovtest
    Ramsey's (1969) regression equation specification error test.

I9. Heteroskedasticity Tests
Students in ECON-6570 Advanced Econometrics learn about heteroskedasticity tests. After running a regression, you can carry out White's test for heteroskedasticity using the command:
estat imtest, white
    Heteroskedasticity tests including White test.
You can also carry out the test by doing the auxiliary regression described in the textbook; indeed, this is a better way to understand how the test works. Note, however, that there are many other heteroskedasticity tests that may be more appropriate. Stata's imtest command also carries out other tests, and the commands hettest and szroeter carry out different tests for heteroskedasticity.
The Breusch-Pagan Lagrange multiplier test, which assumes normally distributed errors, can be carried out after running a regression, by using the command:
estat hettest, normal
    Heteroskedasticity test: Breusch-Pagan Lagrange multiplier.
Other tests that do not require normally distributed errors include:
estat hettest, iid
    Heteroskedasticity test: Koenker's (1981) score test, assumes iid errors.
estat hettest, fstat
    Heteroskedasticity test: Wooldridge's (2006) F-test, assumes iid errors.
estat szroeter, rhs mtest(bonf)
    Heteroskedasticity test: Szroeter (1978) rank test for null hypothesis that variance of error term is unrelated to each variable.
estat imtest
    Heteroskedasticity test: Cameron and Trivedi (1990), also includes tests for higher-order moments of residuals (skewness and kurtosis).
For further information see the Stata manuals.

See also the ivhettest command described in section T1 of this document. This makes available the Pagan-Hall test, which has advantages over the results from "estat imtest".

I10. Serial Correlation Tests
Students in ECON-6570 Advanced Econometrics learn about tests for serial correlation. To carry out these tests in Stata, you must first "tsset" your data as described in section P of this document (see also section U). For a Breusch-Godfrey test where, say, p = 3, do your regression and then use Stata's "estat bgodfrey" command:
estat bgodfrey, lags(1 2 3)
    Breusch-Godfrey tests for serial correlation, reported for lag orders 1, 2, and 3.
Other tests for serial correlation are available. For example, the Durbin-Watson d-statistic is available using Stata's "estat dwatson" command. However, as Hayashi (p. 45) points out, the Durbin-Watson statistic assumes there is no endogeneity even under the alternative hypothesis, an assumption which is typically violated if there is serial correlation, so you really should use the Breusch-Godfrey test instead (or use Durbin's alternative test, "estat durbinalt"). For the Box-Pierce Q in Hayashi's 2.10.4 or the modified Box-Pierce Q in Hayashi's 2.10.20, you would need to compute them using matrices. The Ljung-Box test is available in Stata by using the command:
wntestq varname, lags(#)
    Ljung-Box portmanteau (Q) test for white noise.

I11. Variance Inflation Factors
Students in ECON-6570 Advanced Econometrics may use variance inflation factors (VIFs), which show the multiple by which the estimated variance of each coefficient estimate is larger because of non-orthogonality with other variables in the model. To compute the VIFs, use:
estat vif
    After a regression, display variance inflation factors.

I12. Marginal Effects
After using "regress" or almost any other estimation command, you can compute marginal effects using the "margins" command (available beginning in Stata 11).
$ar(inal effects are d(y)3d(- ' ) for continuous variables - ' ) or delta*y3delta*- ' for discrete variables - ' . ,n %articular) these ar e re%orted for the avera(e individual in the sam%le. Use factor variables &hen &ritin( the list of variables in the model) so that Stata 'no&s the &ay in &hich each variable contributes to the model K see section .1 above. .ere is a sim%le e-am%le) but you should read the Stata manual entry 8#: mar(ins if you %lan to use the mar(ins command much. mar(ins a(e "fter a re(ression &here the -*variables involve a(e) com%ute d(y)3d(a(e) on avera(e amon( individuals in the sam%le. C. Tables of #e(ression #esults This section &ill ma'e your &or' much easierQ Jou can store results of re(ressions) and use %reviously stored results to dis%lay a table. This ma'es it much easier to create tables of re(ression results in +ord. 0y co%yin( and %astin() most of the &or' of creatin( the table is trivial) &ithout errors from ty%in( &ron( numbers. Stata has built*in commands for ma'in( tables) and you should try them to see ho& they &or') as described in section C1. ,n %ractice it &ill be much easier to use add*on commands) that you install) discussed in section C2. CG. Co%yin( and 1astin( from Stata to a +ord 1rocessor or S%readsheet 1ro(ram To %ut results into !-cel or +ord) the follo&in( method is fiddly but sometimes hel%s. Select the table you &ant to co%y) or %art of it) but do not select anything additional . Then choose Co%y Table from the !dit menu. Stata &ill co%y information &ith tabs in the ri(ht %laces) to %aste easily 1E

"

into a s%readsheet or &ord %rocessin( %ro(ram. or this to &or') the %art of the table you select must be in a consistent format) i.e.) it must have the same columns every&here) and you must not select any e-tra blan' lines. (Stata fi(ures out &here the tabs (o based on the &hite s%ace bet&een columns.) "fter %astin( such tab*delimited te-t into +ord) use +ord/s ?Convert Te-t to TableN@ command to turn it into a table. ,n +ord 2GGD) from the ,nsert tab) in the Tables (rou%) clic' Table and select Convert Te-t to Table... (see2 htt%233&&&.u&ec.edu3hel%3+ordGD3tb* t-ttotable.htm )R choose <elimited data &ith Tab characters as delimiters. ;r if in Stata you used Co%y instead of Co%y Table) you can Convert Te-t to Table... and choose i-ed +idth data and indicate &here the columns brea' K but this ?fi-ed &idth@ a%%roach is dan(erous because you can easily ma'e mista'es) es%ecially if some numbers s%an multi%le columns. ,n either case) you can then adIust the font) borderlines) etc. a%%ro%riately. ,n section C2) you &ill see ho& to save tables as files that you can o%en in +ord) !-cel) and other %ro(rams. These files are often easier to use than co%yin( and %astin() and &ill hel% avoid mista'es. C1. Tables of #e(ression #esults Usin( Stata/s 0uilt*,n Commands 1lease use the more %o&erful commands in section C2 belo&. .o&ever) the commands sho&n here also &or') and are a 4uic' &ay to (et the idea. .ere is an e-am%le of ho& to store results of re(ressions) and then use %reviously stored results to dis%lay a table2 re(ress y -1) vce(robust) estimates store model1 re(ress y -1 -2 -5 -6 -A -B -D) vce(robust) estimates store model2 re(ress y -1 -2 -5 -6 -B -E -F) vce(robust) estimates store model5 estimates table model1 model2 model5 The last line above creates a table of the coefficient estimates from three re(ressions. Jou can im%rove on the table in various &ays. .ere are some su((estions2 estimates table model1 model2 model5) se ,ncludes standard errors. 
estimates table model1 model2 model3, star
    Adds asterisks for significance levels. Unfortunately "estimates table" does not allow the star and se options to be combined, however (see section J2 for an alternative that lets you combine the two).
estimates table model1 model2 model3, star stats(N r2 r2_a rmse)
    Also adds information on number of observations used, R², adjusted R², and root mean squared error. (The latter is the estimated standard deviation of the error term.)
estimates table model1 model2 model3, b(%7.2f) se(%7.2f) stfmt(%7.4g) stats(N r2 r2_a rmse)
    Similar to the above examples, but formats numbers to be closer to the appropriate format for papers or publications. The coefficients and standard errors in this case are displayed using the "%7.2f" format, and the statistics below the table are displayed using the "%7.4g" format. The "%7.2f" tells Stata to use a fixed width of (at least) 7 characters to display the number, with 2 digits after the decimal point. The "%7.4g" tells Stata to use a general format where it tries to choose the best way to display a number, trying to fit everything within at most 7 characters, with at most 4 characters after the decimal point. Stata has many options for how to specify number formats; for more information get help on the Stata command "format".
You can store estimates after any statistical command, not just regress. The estimates commands have lots more options; get help on "estimates table" or "estimates" for information. Also, for items you can include in the stats(…) option, type "ereturn list" after running a statistical command. You can use any of the scalar results (but not macros, matrices, or functions).

J2. Tables of Regression Results Using Add-On Commands
In practice you will find it much easier to go a step further. A free set of third-party add-on commands gives much needed flexibility and convenience when storing results and creating tables. What is an add-on command? Stata allows people to write commands (called "ado files") which can easily be distributed to other users. If you ever need to find available add-on commands, use Stata's help menu and Search, choosing to search resources on the internet, and also try using Stata's "ssc" command.

J2a. Installing or Accessing the Add-On Commands
On your own computer, the add-on commands used here can be permanently installed as follows:
ssc install estout, replace
    Installs the estout suite of commands.
In RPI's Dot.CIO labs, use a different method (because in the installation folder for add-on files, you don't have file write permission). I have put the add-on commands in the course disk space in a folder named "stata extensions". You merely need to tell Stata where to look (you could copy the relevant files anywhere, and just tell Stata where). Type the command listed below in Stata. You only need to run this command once after you start or restart Stata. Put the command at the beginning of your do-files (you also may need to include the command "eststo clear" to avoid any confusion with previous results; see section J2h).
adopath + folderToLookIn
Here, replace folderToLookIn with the name of the folder, by using one of the following two commands (the first for ECON-4570 or -6560, the second for ECON-6570):
adopath + "//hass11.win.rpi.edu/classes/ECON-4570/stata extensions"
adopath + "//hass11.win.rpi.edu/classes/ECON-6570/stata extensions"
(Note the use of forward slashes above instead of the Windows standard of backslashes for file paths. If you use backslashes, you will probably need to use four backslashes instead of two at the front of the file path. Why? In certain settings, including in do-files, Stata converts two backslashes in a row into just one. For Stata \$ means $, \` means `, and \\ means \, in order to provide a way to tell Stata that a dollar sign is not the start of a global macro but is just a dollar sign, or a backquote is not the start of a local macro but is just a backquote. (A local macro is Stata's name for a local variable in a program or do-file, and a global macro is Stata's name for a global variable in a program or do-file.))

J2b. Storing Results and Making Tables
Once this is done, you can store results more simply, store additional results not saved by Stata's built-in commands, and create tables that report information not allowed using Stata's built-in commands.

2G

"

  eststo: reg y x1 x2, vce(robust)
      Regress y on x1 and x2 (with robust standard errors) and store the results. Estimation results will be stored with names like "est1", "est2", etc.; the name will be printed out after each command.
  eststo modelname: reg y x1 x2, vce(robust)
      Same as above, but you choose the name to use when storing results, instead of just using "est1", etc. The modelname could be for example myreg1 (begin your names with a letter, after which you can use letters, digits 0 through 9, or underscores, up to 32 total characters).
  eststo: quietly reg y x1 x2 x3, vce(robust)
      Similar to above, but "quietly" tells Stata not to display any output.

J2c. Near-Publication-Quality Tables

Here is how to make a near-publication-quality table. In place of the "est1 est2" below, type the names of the stored estimates that you want in the table.

  esttab est1 est2, b(a3) se(a3) star(+ 0.10 * 0.05 ** 0.01 *** 0.001) r2(3) ar2(3) scalars(F) nogaps
      Make a near-publication-quality table.

You will still need to make the variable names more meaningful, change the column headings, and set up the borders appropriately.

Here is how to save that table in a file that you can open in Word. Put "using filename" just before the comma in the above command, and add the "rtf" option after the comma. Make sure you change directory first, so the file will save in the right folder. To change directory, under the File menu, choose "Change Working Directory…", or use Stata's "cd" command.

  esttab est1 est2 using mytable, rtf b(a3) se(a3) star(+ 0.10 * 0.05 ** 0.01 *** 0.001) r2(3) ar2(3) scalars(F) nogaps
      Save a near-publication-quality table, putting it in a rich text file ("mytable.rtf") that can be opened by Word.

J2d. Understanding the Table Command's Options

The esttab commands for near-publication-quality tables had a lot in them, so it may help to look at simpler versions of the command to understand how esttab works:

  esttab
      Displays a table with all stored estimation results, with t-statistics (not standard errors). Numbers of observations used in estimation are at the bottom of each column.
  esttab, se
      Displays a table with standard errors instead of t-statistics.
  esttab, se ar2
      Displays a table with standard errors and adjusted R-squared values.
  esttab, se ar2 scalars(F)
      Like the previous table, but also displays the F-statistic of each model (versus the null hypothesis that all coefficients except the constant term are zero).
  esttab, b(a3) se(a3) ar2(2)
      Like "esttab, se ar2", but this controls the display format for numbers. The "(a3)" ensures at least 3 significant digits for each estimated regression coefficient and for each standard error. The "(2)" gives 2 decimal places for the adjusted R-squared values. You can also specify standard Stata number formats in the parentheses, e.g., "%9.0g" or "%8.2f" could go in the parentheses (use Stata's Help menu, choose Command, and get help on "format").
  esttab, star(+ 0.10 * 0.05 ** 0.01 *** 0.001)
      Sets the p-values at which different asterisks are used.
  esttab, nogaps
      Gets rid of blank spaces between rows. This aids copying of tables to paste into, e.g., Word.
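The options above can be tried out end to end with a short do-file. The following is only a minimal sketch: it assumes the estout suite is already installed or on the adopath (see section J2a), and it uses Stata's built-in auto example dataset and the model names m1 and m2, which are illustrative choices and not part of the course materials.

```stata
* Sketch only: requires the estout add-on commands (eststo, esttab).
* The auto dataset and the names m1, m2 are illustrative assumptions.
sysuse auto, clear
eststo clear

* Store two regressions under chosen names, without showing their output.
eststo m1: quietly regress price mpg, vce(robust)
eststo m2: quietly regress price mpg weight, vce(robust)

* Default table: coefficients with t-statistics in parentheses.
esttab m1 m2

* Standard errors, adjusted R-squared, F-statistic, custom stars, no blank rows.
esttab m1 m2, b(a3) se(a3) ar2(2) scalars(F) star(+ 0.10 * 0.05 ** 0.01 *** 0.001) nogaps
```

Comparing the two tables side by side shows what each option contributes before you add "using filename" to write the table to a file.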
J2e. Saving Tables as Files

It can be helpful to save tables in files, which you can open later in Word, Excel, and other programs. Although they are not used here, you can use all the options discussed above (like in the near-publication-quality example that saved a rich text file for Word):

  esttab est1 est2 using results.txt, tab
      Save the table, with columns for the stored estimates named "est1" and "est2", into a tab-delimited text file named "results.txt".
  esttab est1 est2 using results, rtf
      Saves a rich-text format file, good for opening in Word.
  esttab est1 est2 using results, csv
      Save a comma-separated values text file, named "results.csv", with the table. This is good for opening in Excel. However, numbers will appear in Excel as text.
  esttab est1 est2 using results, csv plain
      Saves a file good for use in Excel. The "plain" option lets you use the numbers in calculations.

J2f. Wide Tables

If you try to display estimates from many models at once, they may not all fit on the screen. The solution is to drag the Results window to the right to allow longer lines. If you are using Stata 10 or earlier, you must also use the "set linesize #" command as in the example below to actually use longer lines:

  set linesize 140
      Tell Stata to allow 140 characters in each line of Results window output.

In any case, you can now make very wide tables with lots of columns. Another way to fit more in the Results window is to reduce the font size: right-click or control-click in the Results window and change your preference for the font size.

In Microsoft Word, wide tables may best fit on landscape pages: create a Section Break beginning on a new page, then format the new section of the document to turn the page sideways in landscape mode. You can create a new section break beginning on a new page to go back to vertical layout on later pages. Also, Microsoft Word has commands to auto-fit tables to their contents or to the "window" of available space, and to auto-format tables, though you will need to edit the automatic formatting appropriately.

J2g. Storing Additional Results

After estimating a statistical model, you can add additional results to the stored information. For example, you might want to do an F-test on a group of variables, or analyze a linear combination of coefficient estimates. Here is an example of how to compute a linear combination and add information from it to the stored results. You can display the added information at the bottom of tables of results by using the scalars() option:

  eststo: reg y x1 x2, vce(robust)
      Regress.
  lincom x1 - x2
      Get estimated difference between the coefficients of x1 and x2.
  estadd scalar xdiff = r(estimate)
      Store the estimated difference along with the regression result. Here it is stored as a scalar named xdiff.
  estadd scalar xdiffSE = r(se)
      Store the standard error for the estimated difference too. Here it is stored as a scalar named xdiffSE.
  esttab, scalars(xdiff xdiffSE)
      Include xdiff and xdiffSE in a table of regression results.

J2h. Clearing Stored Results

Results stored using eststo stay around until you quit Stata. To remove previously stored results, do the following:
  eststo clear
      Clear out all previously stored results, to avoid confusion (or to free some memory).

J2i. More Options and Related Commands

For more examples of how to use this suite of commands, use Stata's on-line help after installing the commands, or better yet, use this website: http://fmwww.bc.edu/repec/bocode/e/estout/ . On the website, look under Examples at the left.

K. Data Types, When 3.3 ≠ 3.3, and Missing Values

This section is somewhat technical and may be skipped on a first reading.

Computers can store numbers in more or less compact form, with more or fewer digits. If you need extra precision, you can use "double" precision variables instead of the default "float" variables (which are single-precision floating-point numbers). If you need compact storage of integers, to save memory (or to store precise values of big integers), Stata provides other data types, called "byte", "int", and "long". Also, a string data type, "str", is available.

  gen type varname = …
      Generate a variable of the specified data type, using the specified formula. Examples follow.
  gen double bankHoldings = 1234567.89
      Double-precision numbers have 16 digits of accuracy, instead of about 7 digits for regular float numbers.
  gen byte young = age<16
      Here since the result is a 0 or 1, using the "byte" number format accurately records the number in a small amount of memory.
  gen str name = firstname + " " + lastname
      Generates a variable involving strings.

The following commands help deal with data types.

  describe varlist
      Lists technical information about variables, including data types.
  compress varlist
      Changes data to most compact form possible without losing information.

If you compare a floating-point number, accurate to about 7 digits, to a double-precision number, accurate to 16 digits, don't expect them to be equal. The actual calculations Stata carries out are in double-precision, even though variables are ordinarily "float" (single-precision) to save space.
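The float-versus-double mismatch is easy to reproduce. The lines below are a minimal sketch on a one-observation dataset; the variable names rating and ratingD are made up for illustration.

```stata
* Sketch only: a one-observation dataset with illustrative variable names.
clear
set obs 1
generate float  rating  = 3.3    // stored with about 7 digits of accuracy
generate double ratingD = 3.3    // stored with about 16 digits of accuracy

count if rating == 3.3           // counts 0: the float value differs from double 3.3
count if rating == float(3.3)    // counts 1: both sides rounded to float precision
count if ratingD == 3.3          // counts 1: both sides are double precision

* Show the digits actually stored in the float variable.
display %20.16f rating[1]
```

Rounding the constant with float() on the right-hand side is the usual fix; storing the variable as double is the alternative when the extra precision matters.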
Suppose you generate a float-type variable named rating, equal to 3.3 in the first observation. Stata stores the number as 3.3 accurate to about 7 digits. Then typing "list if rating==3.3" will fail to list the first observation. Why? Stata looks up the value of rating, which in the first observation is 3.3 accurate to about 7 digits, and compares it to the number 3.3, which is immediately put into double-precision for the calculation and hence is accurate to 16 digits, and hence is different from the rating. Hence the first observation will not be listed. Instead you could do this:

  list if rating == float(3.3)
      The float(3.3) converts 3.3 to a number accurate to only about 7 digits, the same as the rating variable.

Missing values in Stata are written as a period. They occur if you enter missing values to begin with, or if they arise in a calculation that has for example 0/0 or a missing number plus another number. For comparison purposes, missing values are treated like infinity, and when you're not used to this you can get some weird results. For example, "replace z = 0 if y>3" causes z to be replaced with 0 not only if y has a known value greater than 3 but also if the value of y is missing. Instead use something like this: "replace z = 0 if y>3 & y<.". The same caution applies when generating variables, anytime you use an if-statement, etc. (see sections 2, 3, and 5).

L. Results Returned after Commands

Commands often return results that can be used by programs you might write. To see a list of the results from the most recent command that returned results, type:
  return list
      Shows returned results from a general command, like summarize.
  ereturn list
      Shows returned results from an estimation command, like regress.

M. Do-Files and Programs

You should become well used to the do-file editor, which is the sensible way to keep track of your commands. Using the do-file editor, you can save previously used lists of commands and reopen them whenever needed. If you are analyzing data (for class work, for a thesis, or for other reasons), keeping your work in do-files both provides a record of what you did, and lets you make corrections easily. This document mainly assumes you are used to the do-file editor, but below are two notes on using and writing do-files, plus an example of how to write a program.

At the top of the do-file editor are icons for various purposes. Move the mouse over each icon to display what it does. The set of icons varies across computer types and versions of Stata, but might include: new do-file, open do-file, save, print, find in this do-file, show white-space symbols, cut, copy, paste, undo, redo, preview in viewer, run, and do. The "preview in viewer" icon you won't need (it's useful when writing documents such as help files for Stata's viewer). The "do" icon, at the right, is most important. Click on it to "do" all of the commands in the do-file editor: the commands will be sent to Stata in the order listed. However, if you have selected some text in the do-file editor, then only the lines of text you selected will be done, instead of all of the text. (If you select part of a line, the whole line will still be done.) The "run" icon has the same effect, except that no output is printed in Stata's results window. Since you will want to see what is happening, you should use the "do" icon not the "run" icon.

You will want to include comments in the do-file editor, so you remember what your do-files were for.
There are three ways to include comments: (1) put an asterisk at the beginning of a line (it is okay to have white space, i.e., spaces and tabs, before the asterisk) to make the line a comment; (2) put a double slash "//" anywhere in a line to make the rest of the line a comment; (3) put a "/*" at the beginning of a comment and end it with "*/" to make anything in between a comment, even if it spans multiple lines. For example, your do-file might look like this:

  * My analysis of employee earnings data.
  * Since the data are used in several weeks of the course, the do-file saves work for later use!
  clear  // This gets rid of any pre-existing data!
  adopath + "//hass11.win.rpi.edu/classes/ECON-4570/stata extensions"  // If you're in ECON-4570.
  use "L:\myfolder\myfile.dta"
  * I commented out the following three lines since I'm not using them now:
  /* regress income age, vce(robust)
  predict incomeHat
  scatter incomeHat income age */
  * Now do my polynomial age analyses:
  gen age2 = age^2
  gen age3 = age^3
  eststo p3: regress income age age2 age3 bachelor, vce(robust)
  eststo p2: regress income age age2 bachelor, vce(robust)
  esttab p3 p2, b(a3) se(a3) star(+ 0.10 * 0.05 ** 0.01 *** 0.001) r2(3) ar2(3) scalars(F) nogaps

You can write programs in the do-file editor, and sometimes these are useful for repetitive tasks. Here is a program to create some random data and compute the mean.

  capture program drop randomMean
      Drops the program if it exists already.
  program define randomMean, rclass
      Begins the program, which is "rclass".
  drop _all
      Drops all variables.
  quietly set obs 30
      Use 30 observations, and don't say so.
  gen r = uniform()
      Generate random numbers.
  summarize r
      Compute mean.
  return scalar average = r(mean)
      Return it in r(average).
  end

Note above that "rclass" means the program can return a result. After doing this code in the do-file, you can use the program in Stata. Be careful, as it will drop all of your data! It will then generate 30 uniformly-distributed random numbers, summarize them, and return the average. (By the way, you can make the program work faster by using the "meanonly" option after the summarize command above, although then the program will not display any output.)

N. Monte-Carlo Simulations

It would be nice to know how well our statistical methods work in practice. Often the only way to know is to simulate what happens when we get some random data and apply our statistical methods. We do this many times and see how close our estimator is to being unbiased, normally distributed, etc. (Our OLS estimators will do better with larger sample sizes, when the x-variables are independent and have larger variance, and when the random error terms are closer to normally distributed and have smaller variance.) Here is a Stata command to call the above (at the end of section M) program 100,000 times and record the result from each time.

  simulate "randomMean" avg=r(average), reps(100000)

The result will be a dataset containing one variable, named avg, with 100,000 observations. Then you can check the mean and distribution of the randomly generated sample averages, to see whether they seem to be nearly unbiased and nearly normally distributed.

  summarize avg
  kdensity avg, normal

"Unbiased" means right on average. Since the sample mean, of say 30 independent draws of a random variable, has been proven to give an unbiased estimate of the variable's true population mean, you had better find that the average (across all 100,000 experiments) result computed here is very close to the true population mean. And the central limit theorem tells you that as a sample size gets larger, in this case reaching the not-so-enormous size of 30 observations, the means you compute should have a probability distribution that is getting close to normally distributed. By plotting the results from the 100,000 experiments, you can see how close to normally-distributed the sample mean is. Of course, we would get slightly different results if we did another set of 100,000 random trials, and it is best to use as many trials as possible; to get exactly the right answer we would need to do an infinite number of such experiments.

Try similar simulations to check results of OLS regressions. You will need to change the program in section M and alter the "simulate" command above. One approach is to change the program in section M to return results named "b0", "b1", "b2", etc., by setting them equal to the coefficient estimates _b[varname], and then alter the "simulate" command above to use the regression coefficient estimates instead of the mean (you might say "b0=r(b0) b1=r(b1) b2=r(b2)" in place of "avg=r(average)"). An easier approach, though, is to get rid of the ", rclass" in the program at the end of section M, and just do the regression in the program; the regression command itself will return results that you can use. Your simulate command might then be something like "simulate "randomReg" b0=_b[_cons] b1=_b[x1] b2=_b[x2], reps(1000)".
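The "easier approach" for regressions can be sketched as follows. This is only an illustration under stated assumptions: the true coefficients (1, 2, and -1), the normal regressors and errors, the seed, and the program name randomReg are all made up for the example; rnormal() assumes a reasonably recent Stata (10.1 or later); and the simulate call uses the newer prefix form of the command (options, then a colon, then the command) and not the quoted form shown above.

```stata
* Sketch only: coefficients, distributions, seed, and names are illustrative.
capture program drop randomReg
program define randomReg
    drop _all                      // careful: this drops all data in memory
    quietly set obs 30
    generate x1 = rnormal()
    generate x2 = rnormal()
    generate y  = 1 + 2*x1 - 1*x2 + rnormal()
    regress y x1 x2                // leaves _b[] available to simulate
end

set seed 12345
* Run the program 1,000 times, keeping the coefficient estimates each time.
simulate b0=_b[_cons] b1=_b[x1] b2=_b[x2], reps(1000) nodots: randomReg

* The averages should be close to the true values 1, 2, and -1.
summarize b0 b1 b2
kdensity b1, normal
```

Changing the sample size in "set obs", the error distribution, or the relationship between x1 and x2 is a quick way to see how each affects the spread and shape of the estimates.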
O. Doing Things Once for Each Group

Stata's "by" command lets you do something once for each of a number of groups. Data must be sorted first by the groups. For example:

  sort year
      Sort the data by year.
  by year: regress income age, vce(robust)
      Regress separately for each year of data.
  sort year state
      Sort the data by year, and within that by state.
  by year state: regress income age, vce(robust)
      Regress separately for each state and year combination.

Sometimes, when there are a lot of groups, you don't want Stata to display the output. The "quietly" command has Stata take action without showing the output:

  quietly by year: generate xInFirstObservationOfYear = x[1]
      The "x[1]" means look at the first observation of x within each particular by-group.
  quietly by year (dayofyear): generate xInFirstObservationOfYear = x[1]
      In the above command, a problem is that you might accidentally have the data sorted the wrong way within each year. Listing more variables in parentheses after the year requires that within each year, the data must be sorted correctly by the other variables. This doesn't do the sorting for you, but it ensures the sort order is correct. That way you know what you'll get when you refer to the first observation of the year.
  quietly bysort year (dayofyear): generate xInFirstObservationOfYear = x[1]
      This is the same as above, but the "bysort" command sorts as requested before doing the command for each by-group.
  qby year (dayofyear): generate xInFirstObservationOfYear = x[1]
      "qby" is shorthand for "quietly by".
  qbys year (dayofyear): generate xInFirstObservationOfYear = x[1]
      "qbys" is shorthand for "quietly bysort".

See also section P4 for more ways to generate results, e.g., means or standard deviations, separately for each by-group.

Power User Tip: Master these commands for by-groups to help make yourself a data preparation whiz. Also master the "egen" command (see section P4).

P. Generating Variables for Time-Series and Panel Data

With panel and time series data, you may need to (1) create a time variable; (2) tell Stata what variable measures time (and for panel data what variable distinguishes individuals in the sample); (3) use lags, leads, and differences; and (4) generate values separately for each individual in the sample. Here are some commands to help you.

P1. Creating a Time Variable

You need a time variable that tells the year, quarter, month, day, second, or whatever unit of time corresponds to each observation. A common problem is to convert data from some other format, like a month-day-year string, or numeric values for quarter and year, into a single time variable. Stata has lots of tools to help, as documented in Stata's help for "datetime". Some common methods are listed below.

Your time variable should be an integer, and should not usually have gaps between numbers. For example, it is okay to have years in the data be 1970, 1971, …, 2006, but if your time variable is every other year, e.g., 1970, 1972, 1974, …, then you should create a new variable like time =
(year-1970)/2. Stata has lots of options and commands to help with setting up quarterly data, etc. The following is (as always in this document) just a start.

P1a. Time Variable that Starts from a First Time and Increases by 1 at Each Observation

If you have not yet created a time variable, and your data are in order and do not have gaps, you might create a year, quarter, or day variable as follows:

  generate year = 1900 + _n - 1
      Create a new variable that specifies the year, beginning with 1900 in the first observation and increasing by 1 thereafter. Be sure your data are sorted in the right order first.
  generate quarter = q(1970q1) + _n - 1
      Create a new variable that specifies the time, beginning with 1970 quarter 1 in the first observation, and increasing by 1 quarter in each observation. Be sure your data are sorted in the right order first. The result is an integer number increasing by 1 for each quarter (1960 quarter 2 is specified as 1, 1960 quarter 3 is specified as 2, etc.).
  format quarter %tq
      Tell Stata to display values of quarter as quarters.
  generate day = d(01jan1960) + _n - 1
      Create a new variable that specifies the time, beginning with 1 Jan. 1960 in the first observation, and increasing by 1 day in each observation. Be sure your data are sorted in the right order first. The result is an integer number increasing by 1 for each day (01jan1960 is specified as 0, 02jan1960 is specified as 1, etc.).
  format day %td
      Tell Stata to display values of day as dates.

Like the d(…) and q(…) functions used above, you may also use w(…) for week, m(…) for month, h(…) for half-year, or y(…) for year. Inside the parentheses, you type a year followed (except for y(…)) by a separator (a comma, colon, dash, or word) followed by a second number. The second number specifies the day, week, month, quarter, or half-year (get help on "functions" and look under "time-series functions" for more information).

P1b. Time Variable from a Date String

If you have a string variable that describes the date for each observation, and you want to convert it to a numeric date, you can probably use Stata's very flexible date conversion functions. You will also want to format the new variable appropriately. Here are some examples:

  gen t = daily(dstr, "mdy")
      Generate a variable t, starting from a variable "dstr" that contains dates like "Dec-1-2003", "12-1-2003", "12/1/2003", "January 1, 2003", "jan1-2003", etc. Note the "mdy", which tells Stata the ordering of the month, day, and year in the variable. If the order were year, month, day, you would use "ymd".
  format t %td
      This tells Stata the variable is a date number that specifies a day.

Like the daily(…) function used above, the similar functions monthly(strvar, "ym") or monthly(strvar, "my"), and quarterly(strvar, "yq") or quarterly(strvar, "qy"), allow monthly or quarterly date formats. Use %tm or %tq, respectively, with the format command.

These date functions require a way to separate the parts. Dates like "20050421" are not allowed. If d1 is a string variable with such dates, you could create dates with separators in a new variable d2 suitable for daily(…), like this:

  gen str10 d2 = substr(d1, 1, 4) + "-" + substr(d1, 5, 2) + "-" + substr(d1, 7, 2)
      This uses the substr(…) function, which returns a substring: the part of a string
beginning at the first number's character for a length given by the second number.

P1c. Time Variable from Multiple (e.g., Year and Month) Variables

What if you have a year variable and a month variable and need to create a single time variable? Or what if you have some other set of time-period numbers and need to create a single time variable? Stata has functions to build the time variable from its components:

  gen t = ym(year, month)
      Create a single time variable t from separate year (the full 4-digit year) and month (1 through 12) variables.
  format t %tm
      This tells Stata to display the variable's values in a human-readable format like "2012m5" (meaning May 2012).

Other functions are available for other periods:

  If your data are    Instead of ym() use                                Instead of %tm use
  Yearly              y(year)                                            %ty
  Half-yearly         yh(year, halfyear)                                 %th
  Quarterly           yq(year, quarter)                                  %tq
  Monthly             ym(year, month)                                    %tm
  Weekly              yw(year, week)                                     %tw
  Daily               mdy(month, day, year)                              %td
  In Milliseconds     mdyhms(month, day, year, hour, minute, second)     %tc
* For data in milliseconds, data must be stored in double-precision number format (see section K above), using "gen double t = mdyhms(month, day, year, hour, minute, second)". For any of the other periodicities above, you can use long or double data types to store a broader range of numbers than is possible using the default float data type. For data in milliseconds, a version accounting for leap seconds uses "Cmdyhms(month, day, year, hour, minute, second)" and "%tC".

If your data do not match one of these standard periodicities, you can create your own time variable as in section P1a, but without using the "format" command to specify a human-readable format (the time numbers will just display as the numbers they are).

P2. Telling Stata You Have Time Series or Panel Data

You must declare your data as time series or panel data in order to use time-related commands:

  tsset timevar
      Tell Stata you have time series data, with the time listed in variable timevar.
  tsset idvar timevar
      Tell Stata you have panel data, with the idvar being a unique ID for each individual in the sample, and timevar being the measure of time.

P3. Lags, Forward Leads, and Differences

After using the tsset command (see above), it is easy to refer to past and future data. The value of var one unit of time ago is L.var, the value two units of time ago is L2.var, etc. (the Ls stand for "lag"). Future values, although you are unlikely to need them, are F.var, F2.var, etc. Below are some examples using them. Data must be sorted first, in order by time for time-series data, or in order by individual and within that by time for panel data.

  sort timevar
      Sort time-series data.
  sort idvar timevar
      Sort panel data.
  gen changeInX = x - L.x
      The variable changeInX created here equals x minus its value one time period (e.g., one year) ago.
  gen changeInX = D.x
      The same changeInX can be created via Stata's difference operator, D.var.
  gen income2YearsAgo = L2.income

You can use these L. and F. notations in the list of variables for regression too:

  regress gdp L.gdp L2.gdp L.unemployment L2.unemployment, vce(robust)

P4. Generating Means and Other Statistics by Individual, Year, or Group

The egen (extensions to generate) command can generate means, sums, counts, standard deviations, medians, and much more for each individual, year, or group:

  qbys state year: egen meanIncome = mean(income)
      Mean of income, in each state and year.
  qbys state year: egen totalChildren = total(children)
      Total number of children of people in the sample, separately in each state and year.
  qbys state year: egen nPeople = count(personID)
      Number of nonmissing values of personID, separately in each state and year.
  qbys state year: egen sdIncome = sd(income)
      Standard deviation, in each state and year.
  qbys year: egen medianIncomeByYear = median(income)
      Median of income, in each year.
  qbys year: egen p10IncomeByYear = pctile(income), p(10)
      10th percentile of income, by year.
  egen useIt = tag(state year)
      A variable equal to 1 in a single observation for each state-year combination, and 0 in all other observations.

For many more uses of Stata's egen command, get help on "egen". One caution: When using egen, do not use "_n" or "_N", as these will cause egen to return meaningless results without any warning (Stata should really detect these and give an error message instead…).

The above methods generate values for every observation within each by-group (i.e., they create a variable with sensible values in every observation). If you just want to create a dataset of summary statistics, with one observation per by-group, try Stata's collapse command.

Q. Panel Data Statistical Methods

Q1. Fixed Effects: Using Dummy Variables

You can create dummy variables and include them as regressors. With n individuals, you should add (n-1) dummy variables. There is an easy way to do this, starting with a variable that has a unique number for each individual. In your list of variables, just put "i." in front of that variable's name, and the dummies will be made automatically during the regression (see section H1 earlier in this document). For example:

  regress y x1 x2 i.personid, vce(robust)
      Regress the dependent variable y on the independent variables x1 and x2 and on dummy variables that distinguish each separate person, as indicated by the person-identifier codes in the variable "personid".

This method can likewise be used to generate sets of dummy variables for any variable with identifier codes. For example:

  regress y x1 x2 i.sex i.age i.city i.year, vce(robust)
      Regress the dependent variable y on the independent variables x1 and x2 and on dummy variables for sex, age, city, and year.

To create fixed effects and time effects, then:
re(ress yvar xvars i. entity i. time) vce(robust) #e(ress the de%endent variable yvar on the inde%endent variables listed in xvars and on dummy variables for the entity and for the time. There must be uni4ue codes for each entity in the variable entity) and for each time in the variable time. 0y the &ay) you can instead create dummy variables in the ordinary &ay and then list them as variables for your re(ression. ,f you need to ma'e dummy variables for a lot of different values) a little te-t in the do*file editor &ill do the Iob 4uic'ly. .ere is an e-am%le to emulate2 forvalues t P 1FGG32G1G a (enerate year_t` P yearPP_t` b This is a loo% li'e in %ro(rammin() &here t (oes from 1FGG to 2G1G. !ach time) the line bet&een the curly brac'ets (ets run. +herever the _t` a%%ears) Stata %lu(s in the value of t before runnin( the line. (,n %ro(rammin( lin(o) t is a ?local variable@) called in Stata a ?local macro@ to avoid confusion &ith data variables.) To have values of t %lu((ed in) the t needs to be encased bet&een a left 4uote _ and a ri(ht 4uote `. H2. i-ed !ffects K <e*$eanin( Stata/s ?are(@ command %rovides a sim%le &ay to include fi-ed effects in ;LS re(ressions. $ore e-tensive commands are mentioned belo&) but the follo&in( &ill do for student course&or' in !C;=*6ADG3BABG !conometrics. Stata/s are( command only lets you de*mean &ith res%ect to one identifier) e.(.) %erson or year but not both K if you &ant fi-ed effects and time effects) you need to enter one of them usin( dummy variables (e.(.) by includin( ?i.year@ in your xvarlist ). are( yvar xvarlist ) absorb( byvar ) vce(robust) #e(ress the de%endent variable yvar on the inde%endent variables xvarlist and on the dummy variables needed to distin(uish each se%arate by*(rou% indicated by the byvar variable in the absorb() o%tion. or e-am%le the byvar mi(ht be the state) to include fi-ed effects for states. Coefficient estimates &ill not be re%orted for these fi-ed effect dummy variables. H5. 
;ther 1anel <ata !stimators Students in !C;=*BADG "dvanced !conometrics &ill need to use other %anel data estimators. Jou &ill need to have declared your %anel data first) as in section 12. Then2 -tre( yvar xvarlist ) fe i-ed effects re(ression. The ?fe@ re4uests fi-ed effects estimates. This uses conventional (non*robust) standard errors. -tre( yvar xvarlist ) fe vce(cluster clustervar ) i-ed effects re(ression a(ain) but no& &ith cluster* robust standard errors clustered by the s%ecified variable. Ty%ically the clustervar is the same as the !anelvar used &hen tsset*in( your data (see section 12)) in order to allo& for arbitrary serial correlation of the error terms &ithin each observation. Actually ne&er versions of Stata automatically com%ute cluster*robust standard errors (clustered by !anelvar )) for many %anel data commands) if you merely s%ecify vce(robust) K the re(ression out%ut &ill indicate this clusterin( K and in the conte-t of this s%ecific command Iust s%ecifyin( ?robust@ standard errors (ives you standard errors that are the same as cluster*robust standard errors. estimates store fi-ed Store estimates after runnin( fi-ed effects model.


xtreg yvar xvarlist, re vce(robust)
    Random effects regression. The "re" requests random effects estimates. Here the "robust" option for variance-covariance estimation requests (Eicker-Huber-White) robust standard errors, but in the context of this specific command the resulting standard errors are in fact cluster-robust.
estimates store random
    Store estimates after running the random effects model.
hausman fixed random
    Hausman test for whether the random effects model is appropriate instead of the fixed effects model. If the null hypothesis is rejected, this suggests that the coefficient estimates are inconsistent when fixed effects are not used.
xtreg yvar xvarlist, mle vce(robust)
    Random effects again, but now using the maximum-likelihood random-effects model.

A between-effects model ("be") is also available to estimate differences between the averages-over-time for each individual, and a population-averaged ("pa") model is also available. See also the "newey" command in section U6, to account for serial correlation in error terms.

Stata has many other estimation commands for panel data, including dynamic panel data models such as Arellano-Bond estimation. A. Colin Cameron and Pravin K. Trivedi's book Microeconometrics Using Stata and Christopher Baum's book An Introduction to Modern Econometrics Using Stata show some of these commands. There are also panel equivalents of many other models, for example fixed and random effects versions of the logit model.

Q4. Time-Series Plots for Multiple Individuals
When making plots in Stata, the by(varlist) option lets you make a separate plot for each individual in the sample. For example, you could do:

sort companyid year
scatter employment year, by(companyid) connect(l)

This would make plots of each company's employment in each year, with a separate plot for each company, arranged in a grid. However, you might prefer to overlay these plots in a single graph. You could do this as follows:

tsset
xtline employment, overlay

The "xtline" command with the "overlay" option puts all companies' plots in a single graph, instead of having a separate plot for each company. See also section E4, which talks briefly about graphing.

R. Probit and Logit Models

probit yvar xvarlist, vce(robust)
    Probit regression.
logit yvar xvarlist, vce(robust)
    Logit regression.

For probit and logit models, you have to be careful how you interpret the estimated coefficients:

R1. Interpreting Coefficients in Probit and Logit Models
When you use probit or logit models, or any other nonlinear models, you have to be careful about interpreting the estimated coefficients. Here let's consider how to carry out correct interpretation for probit and logit models.

First, do not just look at a coefficient estimate and say, "When X gets larger by 1, the probability that Y equals 1 gets larger by (some amount)." To make statements like this you have to compute the fitted probabilities, for specific values of the regressors. Second, it can even be wrong to say that the probability increases or decreases with X according to the sign of X's coefficient estimate being positive or negative. This kind of statement may be wrong if the variable X has interaction terms in the model.

Therefore it is important to have ways to compute the predicted probabilities for different possible types of individuals in the sample, and to compare how those predicted probabilities change when the value of a regressor changes. A first command that helps with this gives you the predicted probabilities for each separate individual in the sample, given that individual's regressors:

predict probOfOutcome, pr
    Compute the predicted probability that the dependent variable is 1, separately for each observation.

However, you might want to compute the probability that Y equals 1 for a hypothetical individual, for which the values of the regressors are not in the data. How can you do this? One way is to write out the calculation of the probability using the display command. For example after the following probit and logit commands, the display command lets you enter a formula for the whole fitted regression equation (including the cumulative normal or logistic functions) to calculate the fitted probability, in this case when the variable pi_rat equals 0.3 and the variable black equals 0. Note the use of _b[varname] to mean the estimated coefficient of the variable varname in the most recent estimation command.

Here is an example for the probit model:

probit denyMortgage pi_rat black, vce(robust)
    Estimate a probit model.
scalar z = _b[_cons]+_b[pi_rat]*0.3+_b[black]*0
    Calculate $X_i\hat{\beta}$ when the x-variables equal 0.3 and 0 respectively.
display normprob(scalar(z))
    Calculate the fitted probit probability that the y-variable equals 1 when the x-variables equal 0.3 and 0 respectively.

Here is an example for the logit model:

logit denyMortgage pi_rat black, vce(robust)
    Estimate a logit model.
scalar z = _b[_cons]+_b[pi_rat]*0.3+_b[black]*0
    Calculate $X_i\hat{\beta}$ when the x-variables equal 0.3 and 0 respectively.
display 1/(1+exp(-scalar(z)))
    Calculate the fitted logit probability that the y-variable equals 1 when the x-variables equal 0.3 and 0 respectively.

In particular, one usually wants to estimate how much difference it makes if an x-variable is higher or lower by some amount. The difference made is a function of the values of any other regressors. For example, if a patient receives an anticancer drug versus does not receive it, the predicted increase in the probability of patient survival depends on any other regressors like health and age of the patient. Therefore you have to compute two probabilities that y=1, in which you plug in (a) the value(s) of the variable(s) of interest in alternative cases (such as receiveDrug = 1 versus receiveDrug = 0), and (b) values of all other variables.

For (b), there are two common approaches. The first approach is to use the mean values of all other regressors in the sample, the idea being that then your computed probabilities pertain to a hypothetical average individual who is, it is implicitly assumed, typical. This approach is not desirable because individuals in the sample could all be far from average and experience very different effects from what you compute. The second and better approach is to use the actual values of all other regressors and compute a separate estimated increase in the probability for each separate individual in the sample. Then you can give the ranges of the predicted increases in probability from lowest to highest among all individuals in the sample, you can graph how the increases differ depending on values of other variables, and you can report the average increase in probability across all individuals in the sample.

Stata makes it easy to (1) use the means of other variables, or (2) compute the average increase across all individuals in the sample:

probit y x1 x2 x3 ..., vce(robust)
    Estimate a probit model, or you could use a logit instead.
margins, at(x1=1)
    Compute the average probability (approach 2) that y=1 for all individuals in the sample, for the hypothetical possibility in which they all had x1=1.
margins, at(x1=(0 1))
    Compute the average probability (approach 2) that y=1 for all individuals in the sample, for two different hypothetical cases: if they all had x1=0, and separately if they all had x1=1.
margins, at(x1=(0 1)) post
    Same as above, but allow the results to be used with post-estimation commands like "test" and "lincom".
margins, coeflegend
    After using the "post" option, check the variable names to use with commands like "test" and "lincom". The variable names can be hard to guess, e.g., "2._at#1.x1".
test _b[2._at#1.x1] = _b[1._at#1.x1]
    An example hypothesis test for the average (approach 2) increases in probabilities, in this case comparing the increases in probabilities if x1=1 versus if x1=0. Here the null hypothesis is that there is no difference.
margins if ..., at(x1=(0 1))
    When you compute the average probabilities (approach 2), average across only a given type of individual in the sample, as specified by an if-statement such as "if x3==20" or "if female & old".
margins, at(x1=(0 1)) atmeans
    Compute probabilities using approach 1 instead of approach 2; that is, use the mean values of the other regressors instead of the actual values.

If you have interaction effects in your model, you will need to specify your regressors using a special notation so that Stata knows how to compute marginal effects. See section H1 of this document to see how to use notations like "i." to create dummy variables, "c." to specify continuous variables, and "#" and "##" to specify interaction effects. Then you can check the marginal effects of variables that are interacted or are involved in polynomials. For example:

probit y i.race i.female i.race#i.female c.age c.age#c.age i.female#c.age, vce(robust)
    Estimate a probit model with dummies for race and female, dummies for race-female interactions, age, age squared, and female times age.
margins race
    Compute the average probability (approach 2) that y=1 for all individuals in the sample, for the hypothetical possibility in which each person were of the same race, doing so for every race that occurs in the data.
margins female
    Do the same for the hypothetical possibility that each person were of the same sex.
margins race female
    Do both of the above.
margins race#female
    Do the same for the hypothetical possibility that each person were of the same race and the same sex.
margins, at(age=(20 30 40 50 60 70))
    Compute the average probability (approach 2) that y=1 for all individuals in the sample, for the hypothetical possibility in which each person were of the same age, doing so for each age 20, 30, 40, 50, 60, and 70.
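The margins workflow described above can be tried out on a small example. The following is only a sketch under stated assumptions: it uses the auto dataset that ships with Stata (not data from this course), the regressors mpg and weight are arbitrary illustrative choices, and the posted names 1._at and 2._at are what one would expect here, so check them with the matrix list line before relying on them.

```stata
* Illustrative sketch only: dataset and regressors are assumptions
sysuse auto, clear
probit foreign mpg weight, vce(robust)

* Approach 2: average predicted probability over all cars in the sample,
* setting weight to 2,000 and then to 3,000 pounds for every car
margins, at(weight=(2000 3000))

* Approach 1: the same comparison, holding mpg at its sample mean
margins, at(weight=(2000 3000)) atmeans

* Post the approach-2 results so that lincom and test can use them
margins, at(weight=(2000 3000)) post
matrix list e(b)                  // shows the names of the posted results
lincom _b[2._at] - _b[1._at]      // change in the average probability
```

The lincom line reports the estimated change in the average probability, with a standard error and confidence interval, when weight is raised from 2,000 to 3,000 pounds for every car.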


S. Other Models for Limited Dependent Variables
In Stata's help, you can easily find commands for models such as: Tobit and other censored regression models, truncated regression models, count data models (such as Poisson and negative binomial), ordered response models (such as ordered probit and ordered logit), multinomial response models (such as multinomial probit and multinomial logit), survival analysis models, and many other statistical models. Listed below are commands for a few of the most commonly used models.

With these models, familiarize yourself with Stata's margins command, and use margins after estimation to interpret the estimation results (section R1 shows how after probit and logit commands). This is important because otherwise it is all too common that analysts make incorrect interpretations.

S1. Censored and Truncated Regressions with Normally Distributed Errors
If the error terms are normally distributed, then the censored regression model (Tobit model) and truncated regression model can be estimated as follows.

tobit yvar xvarlist, vce(robust) ll(#)
    Estimate a censored regression (Tobit) model in which there is a lower limit to the values of the dependent variable and it is specified by #. You can instead, or in addition, specify an upper limit using ul(#). If the censoring limits are different for different observations then use the "cnreg" command instead, or more generally if you also have data that are known only to fall in certain ranges then use the "intreg" command instead.
truncreg yvar xvarlist, vce(robust) ll(#)
    Estimate a truncated regression model in which there is a lower limit to the values of the dependent variable and it is specified by #. You can instead, or in addition, specify an upper limit using ul(#).

Be careful that you really do think the error terms are close to normally distributed, as the results can be sensitive to the assumed distribution of the errors. There are also common models for truncated or censored data fitting particular distributions, such as zero-truncated count data for which no data are observed when the count is zero or right-censored survival times; you can find many such models in Stata.

S2. Count Data Models
The Poisson and negative binomial models are two of the most common count data models.

poisson yvar xvarlist, vce(robust)
    Estimate a model in which a count dependent variable yvar results from a Poisson arrival process, in which during a period of time the Poisson rate of "arrivals" (that each add 1 to the count in the y-variable) is proportional to $\exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist.
nbreg yvar xvarlist, vce(robust)
    Estimate a negative binomial count data model. (This allows the variance of y to exceed the mean, whereas the Poisson model assumes the two are equal.)

As always, see the Stata documentation and on-line help for lots more count data models and options to commands, and look for a book on the subject if you need to work with count data seriously.

S3. Survival Models (a.k.a. Hazard Models, Duration Models, Failure Time Models)
To fit survival models, or make plots or tables of survival or of the hazard of failure, you must first tell Stata about your data. There are a lot of options and variants to this, so look for a book on the subject if you really need to do this. A simple case is:

stset survivalTime, failure(dummyEqualToOneIfFailedElseZero)
    Tell Stata that you have survival data, with each individual having one observation. The variable survivalTime tells the elapsed time at which each individual either failed or ceased to be studied. It is the norm in survival data that some individuals are still surviving at the end of the study, and hence that the survival times are censored from above, i.e., "right-censored." The variable dummyEqualToOneIfFailedElseZero provides the relevant information on whether each individual failed during the study (1) or was right-censored (0).
sts graph, survival yscale(log)
    Plot a graph showing the fraction of individuals surviving as a function of elapsed time. The optional use of "yscale(log)" causes the vertical axis to be logarithmic, in which case a line of constant (negative) slope on the graph corresponds to a hazard rate that remains constant over time. Another option is by(groupvar), in which case separate survival curves are drawn for the different groups each of which has a different value of groupvar. A hazard curve can be fitted by specifying "hazard" instead of "survival".
streg xvarlist, distribution(exponential) nohr vce(robust)
    After using stset, estimate an exponential hazard model in which the hazard (Poisson arrival rate of the first failure) is proportional to $\exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist. Other common models make the hazard dependent on the elapsed time; such models can be specified instead by setting the distribution() option to weibull, gamma, gompertz, lognormal, loglogistic, or one of several other choices, and a strata(groupvar) option can be used to assume that the function of elapsed time differs between different groups.
stcox xvarlist, nohr vce(robust)
    After using stset, estimate a Cox hazard model in which the hazard (Poisson arrival rate of the first failure) is proportional to $f(\text{elapsed time}) \times \exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist. The function of elapsed time is implicitly estimated in a way that best fits the data, and a strata(groupvar) option can be used to assume that the function of elapsed time differs between different groups.

As always, see the Stata documentation and on-line help for lots more about survival analysis.

T. Instrumental Variables Regression
Note for Econometrics students using Stock and Watson's textbook: the term "instruments" in Stata output, and in the econometrics profession generally, means both excluded instruments and exogenous regressors. Thus, when Stata lists the instruments in 2SLS regression output, it will include both the Z's and the W's as listed in Stock and Watson's textbook.

Here is how to estimate two stage least squares (2SLS) regression models. Read the notes carefully for the first command below:

ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust)
    Two-stage least squares regression of the dependent variable yvar on the independent variables exogXVarlist and endogXVarlist. The variables in endogXVarlist are assumed to be endogenous. The exogenous RHS variables are exogXVarlist, and the other exogenous instruments (not included in the RHS of the regression equation) are the variables listed in otherInstruments. For Econometrics students using Stock and Watson's textbook, exogXVarlist consists of the W's in the regression equation, endogXVarlist consists of the X's in the regression equation, and otherInstruments consists of the Z's. For Advanced Econometrics students using Hayashi's textbook, exogXVarlist consists of the exogenous variables in $z_i$ (i.e. variables in $z_i$ that are also in $x_i$), endogXVarlist consists of the endogenous variables in $z_i$ (i.e. variables in $z_i$ that are not in $x_i$), and otherInstruments consists of the excluded instruments (i.e. variables in $x_i$ but not in $z_i$).
ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
    Same, but also report the first-stage regression results.
ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first level(99)
    Same, but use 99% confidence intervals.
predict yhatvar
    After an ivregress, create a new variable, having the name you enter here, that contains for each observation its value of $\hat{y}_i$.
predict rvar, residuals
    After an ivregress, create a new variable, having the name you enter here, that contains for each observation its residual $\hat{u}_i$. You can use this for residual plots as in OLS regression.

T1. GMM Instrumental Variables Regression
Students in ECON-6570 Advanced Econometrics learn about GMM instrumental variables regression. For single-equation (linear) GMM instrumental variables regression, type "gmm" instead of "2sls" in the above regression commands:

ivregress gmm yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
    GMM instrumental variables regression, showing first-stage results.
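To see how the ivregress syntax above maps onto exogenous regressors, endogenous regressors, and excluded instruments, it can help to run it on simulated data where the true coefficient is known. This is a minimal sketch, not an example from the course: the variable names (w, x, z, u), the seed, and all coefficient values are made up for illustration.

```stata
* Simulated data: every name and number here is a hypothetical illustration
clear
set seed 12345
set obs 500
generate z = rnormal()                    // excluded instrument
generate w = rnormal()                    // exogenous regressor
generate u = rnormal()                    // structural error
generate x = 0.8*z + 0.5*w + 0.6*u + rnormal()   // endogenous: depends on u
generate y = 1 + 2*x + 1.5*w + u          // true coefficient on x is 2

* 2SLS with first-stage results, then single-equation GMM
ivregress 2sls y w (x = z), vce(robust) first
ivregress gmm  y w (x = z), vce(robust)
```

With one endogenous regressor and one excluded instrument the model is exactly identified, so the 2SLS and GMM coefficient estimates should coincide. Running "regress y x w" on the same data gives a useful contrast, since x is correlated with the error term by construction.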
For single-equation LIML instrumental variables regression (Hayashi's section 8.6), type "liml" instead of "2sls" in the above regression commands:

ivregress liml yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
    LIML instrumental variables regression, showing first-stage results.

For more options to these commands, use the third-party "ivreg2" command described in section 8.7 of Baum's An Introduction to Modern Econometrics Using Stata (use "ssc install ivreg2, replace" or "adopath + ..." as in section J2a of this document). Multi-equation GMM instrumental variables regression is supported in Stata using the "gmm" command; see section V1 below (and see the manual entry [R] gmm, and read the Remarks section there, for an example of how to carry out multi-equation GMM IV regression).

After estimating a regression with instrumental variables, a J-test of overidentifying restrictions can be carried out as follows (for an example see section 8.6 of Baum's text). This requires installing the third-party "overid" command (use "ssc install ivreg2, replace" or "adopath + ..." as in section J2a of this document):

overid
    Carry out an overidentifying restrictions test after ivregress or ivreg2.

Also, a J-test is automatically carried out when using ivreg2. To test a subset of the overidentifying restrictions, via a C-test (Hayashi p. 220), use the ivreg2 command with the list of variables to be tested in the orthog() option.

ivreg2 yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) gmm orthog(vars)
    After this GMM instrumental variables regression, an orthogonality C-test is carried out only for the variables vars (if vars involves multiple variables then separate their names with spaces).

For a heteroskedasticity test after ivregress or ivreg2 (or also after regress), use the third-party ivhettest command (use "ssc install ivreg2, replace" or "adopath + ..." as in section J2a of this document). The Pagan-Hall statistic reported is most robust to assumptions; see section 8.9 of Baum's text.

ivhettest
    Carry out a heteroskedasticity test. Get help on ivhettest for options if you want to restrict the set of variables used in the auxiliary regression (see Hayashi's section 2.7).

T2. Other Instrumental Variables Models
Some other models have been developed that accommodate instrumental variables methods. For probit models (0-1 dependent variables), see Stata's "ivprobit" command. For tobit models (with values above or below a threshold reported as the threshold value), see Stata's "ivtobit" command. Panel data estimators such as the Arellano-Bond model are available. Also, nonlinear GMM models in general can be estimated using Stata's "gmm" command. For panel data instrumental variables methods, see A. Colin Cameron and Pravin K. Trivedi's book Microeconometrics Using Stata (chapter 9), or Christopher Baum's book An Introduction to Modern Econometrics Using Stata.

U. Time Series Models
First tsset your data as in section P above, and note how to use the lag (and lead) operators as described in section P.

U1. Autocorrelations

corrgram varname
    Create a table showing autocorrelations (among other statistics) for lagged values of the variable varname.
corrgram varname, lags(#) noplot
    You can specify the number of lags, and suppress the plot.
correlate x L.x L2.x L3.x L4.x L5.x L6.x L7.x L8.x
    Another way to compute autocorrelations, for x with its first eight lags.
correlate L(0/8).x
    This more compact notation also uses the 0th through 8th lags of x and computes the correlation.
correlate L(0/8).x, covariance
    This gives autocovariances instead of autocorrelations.

U2. Autoregressions (AR) and Autoregressive Distributed Lag (ADL) Models

regress y L.y, vce(robust)
    Regress y on its 1-period lag, with robust standard errors.
regress y L.y L2.y, vce(robust)
    Regress y on its first 2 lags, with robust standard errors.
regress y L(1/4).y, vce(robust)
    Regress y on its first 4 lags, with robust standard errors.
regress y L.y L.x1 L.x2, vce(robust)
    Regress y on the 1-period lags of y, x1, and x2, with robust standard errors.
regress y L(1/5).y L(1/4).x L.w, vce(robust)
    Regress y on its first 5 lags, plus the first 4 lags of x and the first lag of w, with robust standard errors.
test L2.x L3.x L4.x
    Hypothesis tests work as usual.


regress y L.y if tin(1962q1,1999q4), vce(robust)
    The "if tin(...)" used here restricts the sample to times in the specified range of dates, in this case from 1962 first quarter through 1999 fourth quarter.

U3. Information Criteria for Lag Length Selection
To get BIC (Bayes-Schwarz information criterion) and AIC (Akaike information criterion) values after doing a regression, use the "estat ic" command:

estat ic
    Display the information criteria AIC and BIC after a regression.

To include BIC and AIC values in tables of regression results, you could use the "eststo" and "esttab" commands described in section J2 (if you have trouble with the eststo command below read section J above):

eststo m1: regress y L.y, vce(robust)
eststo m2: regress y L(1/2).y, vce(robust)
esttab m1 m2, scalars(bic aic)
    After storing regression results, you can make a table of regression results reporting the BIC and AIC.
esttab m1 m2, b(a3) se(a3) star(+ 0.10 * 0.05 ** 0.01 *** 0.001) r2(3) ar2(3) scalars(bic aic) nogaps
    Here the BIC and AIC are displayed as part of a near-publication quality table as described in section J2c.

To speed up the process of comparing alternative numbers of lags, you could use a "forvalues" loop in your do-file editor. For example:

forvalues lags = 1/6 {
    eststo m`lags': regress y L(1/`lags').y, vce(robust)
}
esttab m1 m2 m3 m4 m5 m6, stats(bic aic)

U4. Augmented Dickey-Fuller Tests for Unit Roots

dfuller y
    Carry out a Dickey-Fuller test for nonstationarity, checking the null hypothesis (in a one-sided test) that y has a unit root.
dfuller y, regress
    Show the associated regression when doing the Dickey-Fuller test.
dfuller y, lag(2) regress
    Carry out an augmented Dickey-Fuller test for nonstationarity using two lags of y, checking the null hypothesis that y has a unit root, and show the associated regression.
dfuller y, lag(2) trend regress
    As above, but now include a time trend term in the associated regression.

For panel data unit root tests, see Stata's "xtunitroot" command.

U5. Forecasting

regress y L.y L.x
    After a regression...
tsappend, add(1)
    Add an observation for one more time after the end of the sample. (Use add(#) to add # observations.) Use browse after this to check what happened. ...
predict yhat, xb
    Then compute the predicted or forecasted value for each observation.
predict rmsfe, stdf
    And compute the standard error of the out-of-sample forecast.

If you want to compute multiple pseudo-out-of-sample forecasts, you could do something like this:

gen actual = y
gen forecast = .
gen rmsfe = .
forvalues p = 30/50 {
    regress y L.y if t<`p'
    predict yhatTemp, xb
    predict rmsfeTemp, stdf
    replace forecast = yhatTemp if t==`p'+1
    replace rmsfe = rmsfeTemp if t==`p'+1
    drop yhatTemp rmsfeTemp
}
gen fcastErr = actual - forecast
tsline actual forecast
    Plot a graph of actual y versus forecasts made using prior data.
summarize fcastErr
    Check the mean and standard deviation of the forecast errors.
gen fcastLow = forecast - 1.96*rmsfe
    Low end of 95% forecast interval assuming there are normally distributed and homoskedastic errors (otherwise the 1.96 would not be valid).
gen fcastHigh = forecast + 1.96*rmsfe
    High end of 95% forecast interval assuming there are normally distributed and homoskedastic errors (otherwise the 1.96 would not be valid).
tsline actual fcastLow forecast fcastHigh if forecast<.
    Add forecast intervals to the graph of actual versus forecast values of the y-variable.

U6. Newey-West Heteroskedastic-and-Autocorrelation-Consistent Standard Errors

newey y x1 x2, lag(#)
    Regress y on x1 and x2, using heteroskedastic-and-autocorrelation-consistent (Newey-West) standard errors assuming that the error term times each right-hand-side variable is autocorrelated for up to # periods of time. (If # is 0, this is the same as regression with robust standard errors.) A rule of thumb is to choose # = 0.75 * T^(1/3), rounded to an integer, where T is the number of observations used in the regression (see the text by Stock and Watson, page 607). If there is strong serial correlation, # might be made more than this rule of thumb suggests, while if there is little serial correlation, # might be made less than this rule of thumb suggests.

U7. Dynamic Multipliers and Cumulative Dynamic Multipliers
If you estimate the effect of multiple lags of X on Y, then the estimated effects on Y are effects that occur after different amounts of time.
For example:

newey ipGrowth L(1/18).oilshock, lag(7)

Here, the growth rate of industrial production (ipGrowth) is related to the percentage oil price increase or 0 if there was no oil price increase (oilshock) in 18 previous months. This provides estimates of the effects of oil price shocks after 1 month, after 2 months, etc. The cumulative effect after 6 months then could be found by:

lincom L1.oilshock + L2.oilshock + L3.oilshock + L4.oilshock + L5.oilshock + L6.oilshock

Confidence intervals and p-values are reported along with these results. You could draw by hand a graph of the estimated effects versus the time lag, along with 95% confidence intervals. You could also draw by hand a graph of the estimated cumulative effects versus the time lag, along with 95% confidence intervals. Making the same graphs in an automated fashion in Stata is a little more painstaking, but see my Stata do-file for Stock and Watson's exercise E15.1 for an example.

V. System Estimation Commands
Advanced Econometrics students work with estimators for systems of equations. Here is a brief introduction to some pertinent system estimation commands.

Note that the 3SLS, SUR, and multivariate regression commands all assume conditionally homoskedastic errors, and have no "vce(robust)" option. To allow for heteroskedasticity in system estimation, you need to use Stata's "gmm" command, which is flexible and allows you to choose your methods.

To refer to a coefficient in an equation after estimation, for the lincom, test, and testnl commands, see the example test command in section V2 below.

Remember, the advantage of this sort of system estimation is efficiency, but the disadvantage is risk of inconsistency. Inconsistency in (potentially) all equations occurs if assumptions are violated for any one of the equations (also if cross-equation orthogonality is wrong in SUR models). Contrary to a common misconception, single-equation estimates are consistent, as long as the requisite assumptions are satisfied. The efficiency advantage may be worth the risk in many cases, but do beware of the risk.

V1. GMM System Estimators
For the generalized method of moments system estimator, Stata's gmm command allows very flexible specification of models, instrumentation, and estimation methods. In fact, the command allows estimation of nonlinear as well as linear models (see section W). For details, see the Stata manuals for [R] gmm.

V2. Three-Stage Least Squares
Read the Stata manual's entry for the reg3 command to get a good sense of how it works. Here are some examples drawn from the Stata manual:

reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)
    Estimate a two-equation 3SLS model in which the two dependent variables, consump and wagepriv, are assumed to be endogenous. (Dependent variables are assumed to be endogenous unless you list them in the exog() option.) The instruments consist of all other variables: wagegovt, govt, capital1, and the constant term. Note that the consumption equation estimates will be the same as in 2SLS, since that equation is just identified.
test [consump]wagegovt [wagepriv]capital1
    The test, lincom, and testnl commands work fine after multi-equation estimations, but you have to specify each coefficient you are talking about by naming an equation as well as a variable in an equation. Thus, "[consump]wagegovt" refers to the coefficient of the variable wagegovt in the equation named consump. Stata by default names equations after their dependent variables. For nonlinear hypothesis tests, refer to coefficients for example using "_b[consump:wagegovt]".
reg3 (qDemand: quantity price pcompete income) (qSupply: quantity price praw), endog(price)
    Estimate a two-equation model, naming the equations qDemand and qSupply since they have the same dependent variable, and treat price as endogenous. Treat the other three regressors and the constant as exogenous.
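The equation-qualified coefficient names described above are easiest to understand by running them. The sketch below assumes the Klein dataset used in the Stata manual's reg3 entry can be loaded with webuse (which needs internet access); the particular lincom and testnl expressions are illustrative choices, not hypotheses from the course.

```stata
* Assumes the manual's example data can be downloaded with webuse
webuse klein, clear
reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)

* Joint test of one coefficient from each equation
test [consump]wagegovt [wagepriv]capital1

* The same coefficients written in _b[equation:variable] form
display _b[consump:wagegovt]
lincom [consump]wagegovt - [wagepriv]capital1

* A nonlinear hypothesis uses the _b[] form (illustrative hypothesis only)
testnl _b[consump:wagegovt]*_b[wagepriv:capital1] = 0
```

Because the equations are named after their dependent variables by default, the first equation is referred to as consump and the second as wagepriv.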
V3. Seemingly Unrelated Regression

Read the Stata manual's entry for the sureg command to get a good sense of how it works. Here is an example drawn from the Stata manual:

    sureg (price foreign mpg displ) (weight foreign length), corr
        Estimate a two-equation SUR model. The "corr" option causes the cross-equation correlation matrix of the residuals to be displayed, along with a test of the null hypothesis that the error terms have zero covariance between equations.

V4. Multivariate Regression

    mvreg headroom trunk turn = price mpg displ gear_ratio length weight, corr
        Estimate three regression equations, the first with headroom as the dependent variable, the second with trunk (space) as the dependent variable, the third with turn (turning circle) as the dependent variable. In each case, the six variables listed on the right-hand side of the equals sign are used as regressors. The "corr" option causes the cross-equation correlation matrix of the residuals to be displayed, along with a test of the null hypothesis that the error terms have zero covariance between equations. The same estimates could be obtained by running three separate regressions, but this also analyzes correlations of the error terms and makes it possible to carry out cross-equation tests afterward.

W. Flexible Nonlinear Estimation Methods

Advanced Econometrics students work with various other estimation methods discussed here.

W1. Nonlinear Least Squares

Read the Stata manual entry [R] nl to get a good sense of how the nl command works. There are several ways in which to use nonlinear regression commands. Here is an example showing how to estimate a nonlinear least squares model for the equation $y_i = \beta_1 + \beta_2 e^{\beta_3 x_i} + \varepsilon_i$:
    nl ( y = {b1}+{b2=1}*exp({b3}*x) )
        Estimate this simple nonlinear regression.

Look at the above line to understand its parts. The "nl" is the name of the nonlinear regression command. After that is an equation in parentheses. The left side of the equation is the dependent variable. The right side is the conditional expectation function, in this case $\beta_1 + \beta_2 e^{\beta_3 x_i}$. The terms in curly brackets are the parameters to be estimated, which we have called b1, b2, and b3. Stata will try to minimize the sum of squared errors by searching through the space of all possible values for the parameters. However, if we started by estimating $\beta_2$ as zero, we might not be able to search well: at that point, the estimate of $\beta_3$ would have no effect on the sum of squared errors. Instead, we start by estimating $\beta_2$ as one, using the "{b2=1}". The "=1" part tells Stata to start at 1 for this parameter.

Often you may have a linear combination of variables, as in the formula $y_i = \alpha_1 + \alpha_2 e^{\beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i}} + \varepsilon_i$. Stata has a shorthand notation, using "{xb: varlist}", to enter the linear combination:

    nl ( y = {a1}+{a2=1}*exp({xb: x1 x2 x3 x4}) )
        Estimate this nonlinear regression.
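The role of starting values can be tried out on simulated data. The sketch below is only an illustration under stated assumptions: the data are generated with known parameters (1, 2, and 0.5), the sample size, noise level, and seed are arbitrary, and the extra starting value for b3 is a choice made for this sketch rather than something the nl command requires.

```stata
* Sketch only: simulated data with known parameters b1 = 1, b2 = 2, b3 = 0.5.
clear
set seed 12345
set obs 500
generate x = 5*runiform()
generate y = 1 + 2*exp(0.5*x) + rnormal(0, 0.5)

* Start b2 at 1 (not 0), and b3 at a small nonzero value chosen for this sketch.
nl (y = {b1} + {b2=1}*exp({b3=0.1}*x))

* Show how Stata names the estimated parameters in the results.
matrix list e(b)
```

The column names shown by matrix list are the names to use when referring to the parameters in later commands; how they are written differs between Stata versions, so it is safer to check them than to guess.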

After a nonlinear regression, you might want to use the "nlcom" command to estimate a nonlinear combination of the parameters.

W2. Generalized Method of Moments Estimation for Custom Models

Stata has powerful general-purpose GMM commands (including for nonlinear models). See the Stata manual entry [R] gmm.

W3. Maximum Likelihood Estimation for Custom Models

Stata has powerful general-purpose maximum likelihood estimation commands. See the Stata manual entry [R] ml.

X. Data Manipulation Tricks

In real statistical work, often the vast majority of time is spent preparing the data for analysis. Many of the commands given above are very useful for data preparation; see particularly sections F, M, O, and P above. This section describes several more Stata commands that are extremely useful for getting your data ready to use.

Make sure you organize all your work in a do-file (or multiple do-files). If you are using Stata 11 or earlier, your do-file should start with clear and set memory if needed. In any case, the do-file should next read in the data, then do anything else like generate variables, merge datasets, reshape the data, use tsset, et cetera. Finally, if desired the do-file should save the prepared data in a separate file. If running this do-file does not take long, just run it each time you want to do statistical analyses, so that it reads in the data and does all your analyses in one click. If the do-file takes a long time to prepare the data, save a prepared data file at the end so you can just read in the data file when needed.

X1. Combining Datasets: Adding Rows

Suppose you have two datasets, typically with (at least some of) the same variables, and you want to combine them into a single dataset. To do so, use the append command:

    append using filename
        Appends another dataset to the end of the data now in memory. You must have the other dataset saved as a Stata file.
        Variables with the same name will be placed in the same column; for example, if you have variables named "cusip" and "year" and the other dataset has variables with the same names, then all the "cusip" values in the appended data will be in the new rows of the cusip variable, while all of the "year" values in the appended data will be in the new rows of the year variable.

X2. Combining Datasets: Adding Columns

Suppose you have two datasets. The "master" dataset is the one now in use ("in memory"), and you want to add variables from a "using" dataset in another file. The goal is to use an identification code (one or more variables) to determine which rows match up across the two files, and bring in the extra columns of data. Stata's "merge" command does this. To ensure a good match and add the right information, you have to get several issues right:

Identification code. The identification code variable(s) specify what should (and should not) match. For example, you could have a variable named personID with a unique number for each person, and matching values of personID tell which row(s) in the using dataset correspond to each row in the master dataset. For another example, you could have two variables named country and year, which must both match for rows in the using dataset to correspond to a row in the master dataset. Any number of variables may be used in combination to create the identification code, and they must all match for row(s) in the using dataset to correspond to a row in the master dataset. N.B.: Missing values are values of the variables too, so beware: if they occur for multiple cases, they mess up your ability to get a proper match.

Uniqueness (or not) of identification code among master and using observations. In the master dataset, and in the using dataset, there may be a different identification code in each row, or there may be multiple rows with the same identification code. For example, a dataset might contain observations for one million people, each in a separate row, each with a unique value of personID. In the same dataset, another variable named countryID may specify each person's country, but many people would share the same countryID. Therefore each value of personID would be unique among the observations, but values of countryID would not be unique. You could match using personID as the identification code to bring more person-specific information into the dataset, or you could match using countryID as the identification code to bring more information about each person's country into the dataset. If the identification code is unique in each dataset, then there is a one-to-one match, written "1:1". If the identification code is not unique in the master dataset, but is unique in the using dataset, then there is a many-to-one match, written "m:1". These two cases pertain to the personID and countryID examples above, if the using datasets have a unique identification code in each row. If a using dataset does not have a unique identification code, then there would be a one-to-many or many-to-many match, "1:m" or "m:m".

A "1:m" match could arise if you reverse the order of the master and using datasets in the countryID match above: you start with a dataset of country information as your master dataset, and you match to the dataset of person information as your using dataset. Because the using dataset contains multiple matching rows for each row in the master dataset, there will be more rows after the merge than there were in the master dataset. The result will be the same regardless of which dataset is the using dataset and which is the master dataset, subject to some caveats discussed below (if variables other than the identification code have the same name in the two datasets, or if not all observations are retained, there can be differences).

Mistakes in matching could arise if you meant to perform a 1:1 match but actually carried out a 1:m or m:1 match. Then the resulting number of observations might be something other than what you intended, sometimes because of missing values in the identification code but commonly also because for some reason you have more than one entry for a code that you meant to be unique. Usually you know what kind of match should arise, and Stata requires you to specify this information. Indeed, this is really important to avoid horrible mistakes. For m:m matches Stata's merge command does not do what you would expect: it does not create all the relevant pairs; instead you must use Stata's joinby command.

Keeping matching, master-only, and using-only observations. It may be that not all observations have matches across the master and using files. When no matches exist, certain observations are orphans for which it is not possible to add columns of data from the other dataset, so the added variables can only contain missing values. You may not want these orphans kept in the resulting data, particularly for orphans from the using dataset. Therefore Stata differentiates between matching (3), master-only (1), and using-only (2) observations. It creates a variable named _merge that contains the numbers 3, 1, and 2 respectively for these three cases. You can then drop observations that you do not want after the match. Even better, you can specify up-front to keep only observations of specific type(s). To do so, use the keep(...) option. Inside the option, specify one or more of "match", "master", and "using", separated by spaces, to cause matching, master-only, and using-only observations to be kept in the results. All other observations will be dropped in the results (though not in any data files on your hard disk).

To ensure against mistakes, you can also use an "assert(...)" option to state that everything should match, or that there should never be orphan observations from one of the datasets. If your assertion that this is true turns out to be violated, then Stata will stop with an error message so you can check what happened and fix possible problems. Inside the assert(...) option, again specify one or more of "match", "master", and "using", separated by spaces, to assert that the results will contain only matching, master-only, and using-only observations.

Variables other than the identification code that are in both master and using datasets. If a variable in the using dataset, other than the identification code, already exists in the master dataset, then it is not possible to bring that column of data into the results as an independent column (at least, not with the same variable name, and Stata just doesn't bring it in). The "update" option to the merge command causes missing values in the master dataset to be filled in using values in the using dataset, and the "update" and "replace" options together cause the reverse, in which the using dataset's values are preferred except where they have missing values.
For matching observations with variables with the same name in both datasets, if you use the update option, then _merge can take values not just of 1, 2, or 3, but also of 4 when a missing value was updated from the other dataset, or of 5 when there were conflicting non-missing values of a variable in the two datasets.

Reading in only selected variables. If you only want to read in some of the variables from the using dataset, use the keepusing(varlist) option.

Before using the merge command, therefore, you need to go through each of the above issues and figure out what you want to do. Then you can use commands such as the following:

    merge 1:1 personID using filename
        Match in observations from the using dataset, with personID as the identification code variable.
    merge 1:1 country year month using filename
        Match in observations from the using dataset, with country, year, and month jointly as the identification code.
    merge 1:1 personID using filename, keep(match master)
        Match in observations from the using dataset, with personID as the identification code variable, and only keep observations that match or are in the master dataset; ignore observations that are in the using dataset only.
    merge 1:1 personID using filename, assert(match)
        Match in observations from the using dataset, with personID as the identification code variable, and assert that all observations in each dataset match; if they do not, stop with an error message.
    merge 1:1 _n using filename
        This one-to-one merge assumes that observation i in the master dataset matches observation i in the using dataset. This is dangerous because it's easy to mistakenly have a wrong sort order, so this is not recommended!
    merge m:1 countryID using filename
        Match in observations from the using dataset, with countryID as the identification code variable. This is specified as a many-to-one match, so the master dataset may contain multiple observations with the same countryID.
    merge 1:m countryID using filename
        Match in observations from the using dataset, with countryID as the identification code variable. This is specified as a one-to-many match, so the using dataset may contain multiple observations with the same countryID.
    joinby familyID using filename
        Carry out a m:m match in which you want all pairs of matching observations. This is not what you get using "merge m:m", and you should not use "merge m:m" unless you really know what you are doing. See Stata's help for this command for further information about options.

The merge command will display the number of resulting observations with _merge equal to 1, 2, and 3. Always check the values of _merge after merging two datasets, to avoid errors.

X3. Reshaping Data

Often, particularly with panel data, it is necessary to convert between "wide" and "long" forms of a dataset. Here is a trivially simple example:

Wide Form:
    personid  income2005  income2006  income2007  birthyear
    1         32437       33822       41079       1967
    2         50061       23974       28553       1952

Long Form:
    personid  year  income  birthyear
    1         2005  32437   1967
    1         2006  33822   1967
    1         2007  41079   1967
    2         2005  50061   1952
    2         2006  23974   1952
    2         2007  28553   1952

This is a trivially simple example because usually you would have many variables, not just income, that transpose between wide and long form, plus you would have many variables, not just birthyear, that are specific to the personid and don't vary with the year. Trivial or complex, all such cases can be converted from wide to long form or vice versa using Stata's reshape command:

    reshape long income, i(personid) j(year) // Starting from wide form, convert to long form.
    reshape wide income, i(personid) j(year) // Starting from long form, convert to wide form.

If you have more variables that, like income, need to transpose between wide and long form, and regardless of how many variables there are that don't vary with the year, just name the relevant variables after "reshape long" or "reshape wide", e.g.:

    reshape long income married yrseduc, i(personid) j(year)
        Starting from wide form, convert to long form.
    reshape wide income married yrseduc, i(personid) j(year)
        Starting from long form, convert to wide form.

X4. Converting Between Strings and Numbers

Use the describe command to see which variables are strings versus numbers:

    describe

If you have string variables that contain numbers, an easy way to convert them to numbers is to use the destring command. The tostring command works in the reverse direction. For example, if you have string variables named year, month, and day, and the strings really contain numbers, you could convert them to numbers as follows:

    destring year month day, replace
        Convert string variables named year, month, and day, to numeric variables, assuming the strings really do contain numbers.

You could convert back again using tostring:

    tostring year month day, replace
        Convert numeric variables named year, month, and day, to string variables.

When you convert from a string variable to a numeric variable, you are likely to get an error message because not all of the strings are numbers. For example, if a string is "2,345,678" then Stata will not recognize it to be a number because of the commas. Similarly, values like "see note" or "$1000" cannot be converted to numbers. If this occurs, Stata will by default refuse to convert a string value into a number. This is good, because it points out that you need to look more closely to decide how to treat the data. If you want such non-numeric strings to be converted to missing values, instead of Stata stopping with an error message, then use the force option to the destring command:

    destring year month day, replace force
        Convert string variables named year, month, and day, to numeric variables. If any string values do not seem to be numbers, convert them to missing values.

Like most Stata commands, these commands have a lot of options.
Get help on the Stata command destring, or consult the Stata manuals, for more information.

X5. Labels

What if you have string variables that contain something other than numbers, like "male" versus "female", or people's names? It is sometimes useful to convert these values to categorical variables, with values 1, 2, 3, ..., instead of strings. At the same time, you would like to record which numbers correspond to which strings. The association between numbers and strings is achieved using what are called "value labels". Stata's encode command creates a labeled numeric variable from a string variable. Stata's decode command does the reverse. For example:

    encode personName, generate(personNameN)
    decode personNameN, generate(personNameS)

This example started with a string variable named personName, generated a new numeric variable named personNameN with corresponding labels, and then generated a new string variable personNameS that was once again a string variable just like the original. If you browse the data, personNameN will seem to be just like the string variable personName because Stata will automatically show the labels that correspond to each name. However, the numeric version may take up a lot less memory.
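The round trip described above can be checked on a tiny made-up dataset. This is a minimal sketch: the variable names sex, sexN, and sexS and the four observations are hypothetical, chosen only for illustration.

```stata
* Sketch only: a small hypothetical dataset with one string variable.
clear
input str6 sex
"male"
"female"
"female"
"male"
end

* Create a labeled numeric variable; the value label is also named sexN.
encode sex, generate(sexN)

* Show which numbers correspond to which strings.
label list sexN

* Show the underlying numbers instead of the labels.
tabulate sexN, nolabel

* Convert back to a string and confirm it matches the original.
decode sexN, generate(sexS)
assert sexS == sex

* describe shows sex and sexS as strings and sexN as numeric with a value label.
describe
```

The encode command assigns the numbers in alphabetical order of the strings, so the numbers carry no meaning beyond identifying the categories.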

If you want to create your own value labels for a variable, that's easy to do. For example, suppose a variable named female equals 1 for females or 0 for males. Then you might label it as follows:

    label define femaleLab 0 "male" 1 "female"
        This defines a label named "femaleLab".
    label values female femaleLab
        This tells Stata that the values of the variable named female should be labeled using the label named "femaleLab".

Once you have created a (labeled) numeric variable, it would be incorrect to compare the contents of a variable to a string:

    summarize if country=="Canada"
        This causes an error if country is numeric!

However, Stata lets you look up the value corresponding to the label:

    summarize if country=="Canada":countryLabel
        You can look up the values from a label this way. In this case, countryLabel is the name of a label, and "Canada":countryLabel is the number for which the label is "Canada" according to the label definition named countryLabel.

If you do not know the name of the label for a variable, use the describe command, and it will tell you the name of each variable's label (if it has a label). You can list all the values of a label with the command:

    label list labelname
        This lists all values and their labels for the label named labelname.

Stata also lets you label a whole dataset, so that when you get information about the data, the label appears. It also lets you label a variable, so that when you would display the name of the variable, instead the label appears. For example:

    label data "physical characteristics of butterfly species"
        This labels the data.
    label variable income "real income in 1996 Australian dollars"
        This labels a variable.

X6. Notes

You may find it useful to add notes to your data. You record a note like this:

    note: This dataset is proprietary; theft will be prosecuted to the full extent of the law.

However, notes are not seen by users of the data unless the users make a point to read them.
To see what notes there are, type:

    notes

Notes are a way to keep track of information about the dataset or work you still need to do. You can also add notes about specific variables:

    note income: Inflation-adjusted using Australian census data.

X7. More Useful Commands

For more useful commands, go to Stata's Help menu, choose Contents, and click on Data management.
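As a closing sketch for this section, the short do-file below strings together several of the data preparation commands described in section X: checking that an identification code is unique, a many-to-one merge, destring with the force option, and reshape. All variable names and values are hypothetical and typed in with the input command, and the using dataset is kept in a temporary file so that nothing permanent is written to disk. The isid command, which stops with an error if a variable does not uniquely identify the observations, is not covered above and is included here as one possible safeguard.

```stata
* Sketch only: hypothetical data, to be run as a do-file.
clear all

* Using dataset: one row per country.
input countryID str10 countryName
1 "Canada"
2 "Australia"
end
* Stop with an error unless countryID is unique in this dataset.
isid countryID
tempfile countries
save `countries'

* Master dataset: one row per person, in wide form, incomes stored as strings.
clear
input personid countryID str6 income2005 str6 income2006
1 1 "32437" "33822"
2 2 "50061" "23974"
3 1 "41079" "n/a"
end

* Many-to-one match: several people may share the same countryID.
* assert(match) stops with an error if any observation fails to match.
merge m:1 countryID using `countries', assert(match)
tabulate _merge
drop _merge

* With force, strings that are not numbers ("n/a") become missing values.
destring income2005 income2006, replace force

* Convert from wide to long form: one row per person and year.
reshape long income, i(personid) j(year)
list, sepby(personid)

* Three people times two years, with one missing income.
assert _N == 6
assert missing(income) if personid == 3 & year == 2006
```

With real data the input blocks would be replaced by commands that read in the data files, and the assert lines would be adapted to whatever should be true of the prepared dataset.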