
Teaching with Stata

Peter A. Lachenbruch
&
Alan C. Acock
Oregon State University
peter.lachenbruch@oregonstate.edu
alan.acock@oregonstate.edu

First Course Requirements
Data Entry
I want a first course to be able to do the things I
want students to do:
Enter and edit data: the topic must be one students want to know about
Students can do a small survey to get data on topics
of interest to them.
Voter poll
Attitudes toward diversity issues on campus
Beliefs about regulating the internet

Learn how to create a codebook, and how to use codebook and codebook, compact

Where possible use real data


WCSUG Presentation

First Course Requirements
Data Management
Balance statistical content with proper data-management content: a hard decision
Storing the original dataset and creating a working dataset
Keeping a record of every data modification they make, using a do-file
The menu system is an aid
Do-files are the requirement

Missing values: distinguish types (system missing . versus extended missing .a to .z)


Variable names, labels, and value labels
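A minimal do-file sketch of this workflow, assuming a hypothetical survey in which -9 means "refused" and -8 means "don't know". The file names survey_raw and survey_work and the variables q1 and q2 are invented for illustration. The original file is saved once and never touched again; every change happens in the working copy, and the do-file itself is the record of those changes.

```stata
// Hypothetical survey data: -9 = refused, -8 = don't know
clear
input id q1 q2
1  3  .
2 -9  4
3  5 -8
4  2  1
end
save survey_raw, replace      // original data: never modified after this

// Every edit below is recorded in this do-file
use survey_raw, clear
// Recode survey codes into distinct extended missing values
replace q1 = .a if q1 == -9   // .a = refused
replace q2 = .a if q2 == -9
replace q1 = .b if q1 == -8   // .b = don't know
replace q2 = .b if q2 == -8
// Variable labels and value labels (including labels for .a and .b)
label variable q1 "Attitude toward diversity issues (1-5)"
label variable q2 "Support for regulating the internet (1-5)"
label define misstype .a "Refused" .b "Don't know"
label values q1 misstype
label values q2 misstype
tabulate q1, missing          // missing types appear as their own rows
save survey_work, replace     // working dataset used for analysis
```

Because the raw file is only ever read, rerunning the do-file from the top always rebuilds the same working dataset.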

First Course Requirements


Data Management
Transformations: log, sqrt, exp
Logical editing: beware of logical transformations when missing values are present (gen y = x<10 turns a missing x into 0, because . is larger than any number)
Appending
Append student-generated datasets

Merging
Merging two waves of data

First Course Requirements


Data Management
Constructing Measures
When to use egen newvar = rowtotal(var1 var2 var3) (treats missing items as 0)
When to use egen newvar = rowmean(var1 var2 var3) (averages the nonmissing items)
When to use the user-written misschk command, and what it does
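The difference matters as soon as items are missing. In this small sketch with hypothetical items item1 to item3, rowtotal() treats a missing item as 0, so a respondent who skipped everything gets a total of 0, whereas rowmean() averages whatever items are present. Counting missing items with rowmiss() lets you decide explicitly how much missingness to tolerate; the "at most one missing item" rule below is only an illustrative choice.

```stata
clear
input id item1 item2 item3
1 4 5 3
2 4 . 3
3 . . .
end
egen sumscore = rowtotal(item1 item2 item3)  // missing items count as 0
egen avgscore = rowmean(item1 item2 item3)   // mean of nonmissing items
egen nmiss    = rowmiss(item1 item2 item3)   // number of missing items
// Illustrative rule: prorate the total only if at most one item is missing
gen scale = 3*avgscore if nmiss <= 1
list
```

Respondent 3 shows the trap: a sumscore of 0 looks like a real score, while avgscore and scale correctly stay missing.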

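Suppose the variable category is 0 or 1

If there are missing values in category, there is a difference between

gen y = 1 if category
gen y = 1 if (category==1)
gen y = 1 if (category>0)

The first and third will give scores of 1 for missing values of category, because Stata treats a missing value as true (nonzero) and as larger than any number. The second will leave y missing for them. None of the three ever gives 0: observations that fail the if condition get a missing value - BEWARE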
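A small demonstration, using made-up values for category and x, of how Stata treats missing values in these expressions: in an if condition a missing value counts as true because it is nonzero, and in comparisons . is larger than every number. Adding `if !missing()` is one way to keep missing values missing.

```stata
clear
input category x
0  5
1 12
.  .
end
gen y1 = 1 if category          // missing counts as true: y1 = 1
gen y2 = 1 if (category == 1)   // missing fails the test: y2 = .
gen y3 = 1 if (category > 0)    // . exceeds every number: y3 = 1
gen z  = x < 10                 // missing x silently becomes 0
// Safer versions: state explicitly what happens to missing values
gen y_safe = (category == 1) if !missing(category)
gen z_safe = (x < 10) if !missing(x)
list
```

Note that y1 to y3 are missing (not 0) when category is 0, while y_safe codes 0 as 0 and missing as missing.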


First Course Requirements


Data Management
edit command; insheet (csv files), input, infile
gen newvar = ln(oldvar)
Rarely use replace oldvar = sqrt(oldvar): use replace only when correcting an error; don't overwrite data
merge ptid assessment using file, update (both datasets must be sorted by ptid and assessment)
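A self-contained sketch of appending and merging, with two made-up class sections and two made-up survey waves held in tempfiles. It assumes Stata 11 or later for the `merge 1:1` syntax, which does not require sorting the data first; the `merge ... using file, update` form is the older syntax.

```stata
// Appending: stack two sections' datasets that share the same variables
clear
input id score
101 7
102 9
end
tempfile secA
save `secA'
clear
input id score
201 8
end
append using `secA'
list

// Merging: match two waves of data on id (Stata 11+ syntax)
clear
input id score2
1 11
3 14
4  8
end
tempfile wave2
save `wave2'
clear
input id score1
1 10
2 12
3  9
end
merge 1:1 id using `wave2'
tabulate _merge   // 1 = wave 1 only, 2 = wave 2 only, 3 = in both waves
```

Tabulating _merge right after every merge is a cheap habit that catches mismatched ids before they reach an analysis.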


First Course Requirements (2)


Data presentation, numerical summary measures:
summarize, detail; list; browse; edit;
describe; codebook; codebook, compact
Graphic presentation: bar chart, histogram, and box plot seem the minimum
Probability computations: binomial(), binomialtail(), chi2(), chi2tail(), F(), Ftail(), normal(), and the inverse functions for these
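Each probability function has an inverse that turns a probability back into a quantile, which is how students can get critical values without printed tables. A few examples, ending with round-trip checks that an inverse undoes its partner function:

```stata
display invnormal(0.975)        // z for a two-sided 95% interval
display invttail(20, 0.025)     // t with 20 df, upper 2.5% point
display invchi2tail(1, 0.05)    // chi-square(1), upper 5% point
display invFtail(2, 30, 0.05)   // F(2, 30), upper 5% point
// Round trips: each inverse undoes its partner function
display normal(invnormal(0.975))
display Ftail(2, 30, invFtail(2, 30, 0.05))
```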


Examples
summarize sp, detail; list sp;
describe s*; codebook s*
display binomial(10,3,0.1) for the cumulative probability P(X ≤ 3), or
display Binomial(10,3,.1) for the reverse cumulative P(X ≥ 3); note that
display 1-binomial(10,2,.1) gives the same
result (as does binomialtail(10,3,.1))
display normal(1.2)
gen y = invnormal(uniform())*5 + 20 (normal with mean 20 and SD 5)
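A quick check of the binomial identity above, plus a simulation of the normal variable from the last line (mean 20, SD 5). This sketch assumes Stata 10.1 or later, where runiform() is the newer name for uniform() and rnormal() draws normal values directly; the seed 2009 is arbitrary.

```stata
// X ~ Binomial(n = 10, p = 0.1)
display binomial(10, 3, 0.1)       // P(X <= 3)
display 1 - binomial(10, 2, 0.1)   // P(X >= 3)
display binomialtail(10, 3, 0.1)   // P(X >= 3) again
display normal(1.2)                // P(Z <= 1.2)

// Simulate 1,000 draws from N(20, 5^2) two ways
clear
set obs 1000
set seed 2009
gen y1 = invnormal(runiform())*5 + 20   // inverse-CDF method
gen y2 = rnormal(20, 5)                 // direct generator
summarize y1 y2
```

Both columns should have a mean near 20 and an SD near 5; the inverse-CDF line is worth keeping in class because it shows why invnormal() matters.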

First Course Requirements (3)


Confidence intervals
Binomial: ci variable, binomial
Normal: ci variable
Poisson: ci variable, poisson

Percentiles
summarize, d
centile price, c(10(10)90)


Examples
cii 20 4
cii 20 4, agresti
Sometimes we want to use the Agresti formulation; the exact interval is usually preferable

ci varname, level(99)
summarize weakness, detail
Can use su weakn, d (i.e., abbreviate commands, options, and variables)

centile weakness, c(20,40,60,80)
Or centile weakness, c(20(20)80)
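To show students where the agresti option comes from, the Agresti-Coull interval can be computed by hand: add z² pseudo-trials, half of them successes, then apply the usual Wald formula to the adjusted proportion. A sketch for 4 successes in 20 trials at the 95% level; we expect it to agree with cii's agresti option up to rounding, though the cii syntax differs across Stata versions and this sketch does not call it.

```stata
clear
scalar ac_n  = 20                          // trials
scalar ac_x  = 4                           // successes
scalar ac_z  = invnormal(0.975)            // 95% interval
scalar ac_nt = ac_n + ac_z^2               // adjusted number of trials
scalar ac_pt = (ac_x + ac_z^2/2) / ac_nt   // adjusted proportion
scalar ac_half = ac_z * sqrt(ac_pt*(1 - ac_pt)/ac_nt)
scalar ac_lb = ac_pt - ac_half
scalar ac_ub = ac_pt + ac_half
display "Agresti-Coull 95% CI: " %6.4f ac_lb " to " %6.4f ac_ub
```

The interval is centered at the adjusted proportion (about 0.248), not at the observed 0.20, which is the point students usually find surprising.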

First Course Requirements (4)


Hypothesis Testing:
Normal r.v.s
One sample (including paired data): ttest
Two sample: ttest
K samples: anova
Binomial variables
One sample proportion: bitest
Two samples: tabulate, chi2

Examples
ttest sp = 120 [one-sample]
ttest spmen = spfem [paired]
ttest spmen = spfem, unpaired unequal welch
ttest sp, by(sex) [add unequal, welch, etc. as needed]
Also an immediate form, ttesti (see help ttest)
anova sp agegrp
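The same commands run on Stata's shipped auto dataset, so students can reproduce them without the course data. The ttesti line types in approximate group sizes, means, and SDs of mpg for domestic and foreign cars (rounded values, assumed for illustration) instead of reading the data.

```stata
sysuse auto, clear
ttest mpg == 20                        // one-sample t test
ttest mpg, by(foreign)                 // two-sample, equal variances
ttest mpg, by(foreign) unequal welch   // unequal variances, Welch's df
// Immediate form: n, mean, SD for each group (approximate auto values)
ttesti 52 19.83 4.74 22 24.77 6.61
anova mpg rep78                        // one-way ANOVA across repair records
```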


Examples
bitest success = 0.8 [one-sample binomial]
tabulate success group, chi2 row col
prtest success, by(group) [two-sample binomial]
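A sketch on the auto data, with a made-up indicator highmpg (mpg above 25) standing in for success and foreign standing in for group. For a 2 × 2 table, the square of the two-sample z statistic (pooled under the null) equals the Pearson chi-square, a connection worth showing students numerically.

```stata
sysuse auto, clear
gen byte highmpg = mpg > 25 if !missing(mpg)  // illustrative "success"
bitest foreign == 0.5                         // one-sample exact binomial test
tabulate foreign highmpg, chi2 row col        // two samples as a 2 x 2 table
scalar chi2_tab = r(chi2)
prtest highmpg, by(foreign)                   // two-sample z test of proportions
display "z^2 = " r(z)^2 "   Pearson chi2 = " chi2_tab
```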


First Course Requirements (5)


Hypothesis Testing (cont.)
Power considerations: sampsi (or a spreadsheet: a nice exercise for some of the good students)
Nonparametric methods: signtest, signrank, ranksum

Contingency tables: tabulate, epitab
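A sketch of the three nonparametric commands on the auto data. The hypothesized median of 20 mpg is an arbitrary choice for illustration; both signrank and signtest accept a `varname = exp` form for one-sample tests as well as the paired `before = after` form.

```stata
sysuse auto, clear
ranksum mpg, by(foreign)   // Wilcoxon rank-sum (Mann-Whitney) test
signrank mpg = 20          // Wilcoxon signed-rank test of median = 20
signtest mpg = 20          // sign test of the same hypothesis
```

Comparing the signrank and signtest p-values for the same hypothesis is a quick way to discuss what each test throws away.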


Examples
sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)
ranksum sp, by(survive)
signrank before = after
When should we supplement Stata with other software, such as G*Power 3, which is free and more flexible than sampsi, or commercial programs such as PASS or nQuery Advisor?
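The sampsi example can also be worked by hand, which is the spreadsheet-style exercise mentioned earlier. The textbook normal-approximation formula for comparing two means, with n2 = k·n1, is n1 = (σ1² + σ2²/k)(z(1−α/2) + z(power))² / (μ1 − μ2)². A sketch with the slide's inputs (two-sided α = 0.05, power 0.8, ratio 2); we expect sampsi's answer to be close, but this does not reproduce sampsi's internals. In Stata 13 and later, sampsi is superseded by the power command.

```stata
clear
scalar m1 = 132.86
scalar m2 = 127.44
scalar s1 = 15.34
scalar s2 = 18.23
scalar k  = 2                       // n2 = k * n1
scalar za = invnormal(1 - 0.05/2)   // two-sided alpha = 0.05
scalar zb = invnormal(0.80)         // power = 0.80
scalar n1 = ceil((s1^2 + s2^2/k) * (za + zb)^2 / (m1 - m2)^2)
scalar n2 = k * n1
display "n1 = " n1 "   n2 = " n2
```

Students can then vary the power or the ratio and watch how quickly the required sample size grows as the difference in means shrinks.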


First Course Requirements (6)


Simple linear regression: regress, rvfplot, other diagnostics
Correlation: corr, spearman, ktau. I tend not to use corr because of its sensitivity to the normality assumption for tests and confidence intervals
Only pwcorr, not corr, provides tests of significance


Examples
regress mpg weight
rvfplot
Stata's "type a little, get a little" approach is very different from other packages
correlate mpg weight or pwcorr mpg weight (especially when you have more than 2 variables; you can specify the sig and obs options, which only work with pwcorr)
spearman mpg weight; it would be nice to have Stata produce a Spearman correlation matrix
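The mpg and weight variables come from Stata's shipped auto dataset, so these examples run as a do-file once the data are loaded; price is added only to give correlate and pwcorr a third variable.

```stata
sysuse auto, clear
regress mpg weight
rvfplot, yline(0)                  // residuals versus fitted values
correlate mpg weight price         // correlation matrix, no p-values
pwcorr mpg weight price, sig obs   // adds p-values and pairwise Ns
spearman mpg weight                // Spearman rank correlation
ktau mpg weight                    // Kendall's tau
```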

Examples
It's easy to use permutation tests
. permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)

Monte Carlo permutation results               Number of obs  =  97

      command:  ttest ald7, by(anyhcq)
            t:  r(t)
  permute var:  anyhcq

T       |    T(obs)     c      n    p=c/n   SE(p)   [95% Conf. Interval]
--------+---------------------------------------------------------------
t       |  1.648305    13    100   0.1300  0.0336    .071073   .2120407

Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}

One can do similar things with the bootstrap

These are easy to use and intuitive for students
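Because ald7, anyhcq, adult, and assnum come from the authors' own data, here is a sketch of the same kind of permutation test, and of a bootstrap, on the auto dataset. The seed and the numbers of replications are arbitrary choices.

```stata
sysuse auto, clear
set seed 12345
// Permutation test: shuffle foreign, recompute the t statistic each time
permute foreign t = r(t), reps(1000): ttest mpg, by(foreign)
// Bootstrap: standard error and percentile CI for the median of mpg
bootstrap median = r(p50), reps(500): summarize mpg, detail
estat bootstrap, percentile
```

The bootstrap of a median is a good classroom case because there is no simple textbook standard error to compare against.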

Use of Stata in the Classroom


Use Stata sparingly
It's not easy to follow commands typed or used from menus; students will get confused
Have handouts of what you do; make the spacing large enough that students can annotate them, even if only to write nasty things about the instructor
Balancing coverage of Stata (e.g., data management) with coverage of statistics is a constant issue
Remember it's a course in statistics, not in Stata


Data Sets
Place data sets on a LAN or common drive, or make them available for copying to a flash drive or CD
Use real data
Not too many variables
May have missing values, but these should not affect the main analyses unless you want to demonstrate the problems with missing values

In the Classroom
Using a CD rather than a flash drive is better(?)
Many desktops have the USB port located inconveniently (darn you Dell!)
Newer PCs sometimes have a USB port on the monitor, and laptops usually have an easy slot for the flash drive
The light level in the room should allow students to read easily
The days of dim projectors are over

In the Classroom (2)


Enlarge the Stata font by using the right mouse button
I have found that 14 point is pretty good
Be careful about wraparound of output; if needed, reduce the point size temporarily
Don't ever use red on blue font
See what I mean? It's more difficult to read

Show how to move and fix windows



In the Classroom (3)


Optimizing visibility with a projector
Use a rich color background
Edit > Preferences > General Preferences
The blue background option is good, but it relies on red for errors and green for standard text, and doesn't bold fonts
Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind

Virtual Lab
A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive
A department can require students to have laptops or provide a cart with enough laptops
Because the laptops serve only as dumb terminals for the server, they can be cheap and need not be updated very often
Any room becomes a lab
Students should have 24/7 access to the server

Handouts and Data Sets


Have handouts of your lecture notes
Have handouts of your data analysis
demonstrations
Include commands as well as output!

Data sets
Online, on a LAN, or on CD or floppy disk; lots of laptops don't have floppy drives any more, but flash drives are inexpensive

Include
Student-generated datasets
Datasets with large Ns and relatively few variables

Emphasis in Course
Lectures devoted to statistics
Labs devoted to learning Stata, working on homework, and discussion
Proper printing of output
Don't split output between two pages if possible (at least, find a good break point)
Always use a monospaced font (such as Courier New)

Some Final Issues


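Multiple testing can distort inference (e.g., doing 100 tests at the 5% level makes some significant results very likely even when every null hypothesis is true, and such results may be meaningless). Worry about this
Controlling the digits in the output: use outreg, estout, or esttab (user-written commands)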
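The arithmetic behind the multiple-testing warning, assuming 100 independent tests with every null hypothesis true, followed by two built-in Bonferroni adjustments on the auto data. The display formats (%6.4f, %8.6f) also show the simplest built-in way to control how many digits are printed.

```stata
// 100 independent tests, every null hypothesis true, alpha = 0.05
display %6.4f 1 - 0.95^100    // P(at least one "significant" result)
display 100 * 0.05            // expected number of false positives
display %8.6f 0.05/100        // Bonferroni per-test alpha
// Built-in Bonferroni adjustments, shown on the auto data
sysuse auto, clear
pwcorr mpg weight price length, sig bonferroni
oneway mpg rep78, bonferroni
```

The first line prints about 0.9941: with 100 true nulls, at least one "significant" result is nearly certain, and about 5 are expected.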


The End
