Introduction to Stata
Kim L. Cochon
Department of Epidemiology and Biostatistics
College of Public Health
University of the Philippines Manila
Outline
General description of the Stata software
Starting Stata
Exiting Stata
Stata, from StataCorp
- Licensed (commercial) software
- Available for a variety of platforms: Windows, Macintosh, Linux, etc.
Starting Stata
To start Stata
- Click the Start Menu
- Then go to Programs
- Select Stata 12
Stata windows
Results Window
All commands and their results are displayed here (except for graphs)
Review Window
Commands that have been entered are listed here; click on any command in the window and it will be pasted into the Command window
Variables Window
Lists all the variables in the dataset together with their properties (e.g. label, type, value labels, etc.)
Help Menu
Viewer Window
Inputting Data in the Data Editor
Data Editor
A spreadsheet-style window for entering and editing data (type edit in the Command window to open it)
Missing values
A missing numeric value is entered and displayed as a period (.)
Example
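A minimal sketch of working with missing values from the Command window or a do-file, assuming the Exercise 1 variables are in memory. The code 999 for "no answer" is a hypothetical convention used only for illustration.

```stata
* Count observations whose age is missing (shown as . in the Data Editor)
count if missing(age)

* Convert a hypothetical "no answer" code of 999 into Stata's missing value
replace age = . if age == 999
```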
Variable name
1 to 32 characters long
Variable label
Up to 80 characters
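A short do-file sketch of naming and labelling variables, assuming the Exercise 1 data are in memory. The new name age_sq and the label wording are illustrative, not taken from the coding manual.

```stata
* Valid names: 1 to 32 letters, digits or underscores, not starting with a digit
generate age_sq = age^2

* Variable labels hold up to 80 characters of descriptive text
* (wording below is illustrative only)
label variable age "Age in years"
label variable bwt "Birth weight in grams"
label variable age_sq "Age squared"

* Show names, storage types and labels
describe age bwt age_sq
```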
Value label
To specify labels of the values of a numeric variable
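Value labels are defined once and then attached to variables. A minimal sketch, assuming smoke and ui are coded 0 = No and 1 = Yes; that coding is an assumption, so follow the coding manual for the actual codes.

```stata
* Define a value label that maps numeric codes to text (assumed 0/1 coding)
label define yesno 0 "No" 1 "Yes"

* Attach the same value label to each 0/1 variable
label values smoke yesno
label values ui yesno

* The frequency table now shows "No"/"Yes" instead of 0/1
tabulate smoke
```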
Exiting Stata
To exit
- Click on the close box; or
- Click File menu and select Exit; or
- Type ‘exit’ in the Command window (‘exit, clear’ exits even if the data in memory have unsaved changes)
Exercise 1
Coding Manual
Data to be Entered
place age lwt bwt race smoke ui flv
A1 19 182 2523 2 0 1 0
A2 33 155 2551 3 0 0 1
B1 20 105 2557 1 1 0 1
A1 21 108 2594 1 1 1 1
A1 18 107 2600 1 1 1 0
B2 21 124 2622 3 0 0 0
B2 22 118 2637 1 0 0 1
A2 17 103 2637 3 0 0 1
A2 29 123 2663 1 1 0 1
A1 26 113 2665 1 1 0 0
B2 19 95 2722 3 0 0 0
B1 19 150 2733 3 0 0 1
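As an alternative to typing the table into the Data Editor, the same 12 observations can be entered with the input command in a do-file. This is a sketch; storing place as a 2-character string (str2) is an assumption based on the values shown.

```stata
clear
* place is text (A1, B2, ...); the other variables are numeric
input str2 place age lwt bwt race smoke ui flv
A1 19 182 2523 2 0 1 0
A2 33 155 2551 3 0 0 1
B1 20 105 2557 1 1 0 1
A1 21 108 2594 1 1 1 1
A1 18 107 2600 1 1 1 0
B2 21 124 2622 3 0 0 0
B2 22 118 2637 1 0 0 1
A2 17 103 2637 3 0 0 1
A2 29 123 2663 1 1 0 1
A1 26 113 2665 1 1 0 0
B2 19 95 2722 3 0 0 0
B1 19 150 2733 3 0 0 1
end

* Check the entered data
list
```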
Language Syntax of Stata
Language Syntax
[prefix :] command [varlist] [=exp] [if] [in] [weight] [using filename] [, options]
Where:
Square brackets (e.g. [if]): optional qualifiers/arguments
Underline (e.g. summarize, with su underlined): shortest possible abbreviation of a command
Italicized arguments (e.g. prefix): to be replaced by variable name(s), observation(s), number(s), etc.
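A small sketch of the syntax pieces in use, assuming the Exercise 1 data are in memory: a command in full and abbreviated form, then one command combining a varlist, an if qualifier and an option.

```stata
* The full name and its shortest abbreviation run the same command
summarize bwt
su bwt

* command + varlist + if qualifier + option
summarize bwt lwt if smoke == 1, detail
```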
Language Syntax
Example:
Language syntax
Example:
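by varlist: or bysort varlist: repeats the command for each group of observations for which the values of the variables in varlist are the same; by requires the data to be sorted by varlist already, while bysort sorts them first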
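A minimal sketch of the two prefixes, assuming the Exercise 1 data are in memory: both produce summary statistics of bwt separately for each value of race.

```stata
* by needs the data already sorted by the grouping variable
sort race
by race: summarize bwt

* bysort sorts first, then repeats the command for each group
bysort race: summarize bwt
```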
Language syntax
In commands that alter or destroy data, Stata requires that the varlist be
specified explicitly
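A short sketch contrasting a reporting command with a data-altering one, assuming the Exercise 1 data are in memory. Run it on a copy of the data, since the last line removes every variable.

```stata
* A reporting command uses all variables when the varlist is omitted
summarize

* drop alters the data, so the variables must be named explicitly
drop flv

* Removing every variable must also be requested explicitly
drop _all
```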
Language syntax
Example:
. su /*all variables*/
Language syntax
Example
. generate age2=age^2
. *generates a new variable named age2, which is the square of the respondent's age
Language syntax
if exp: restricts the scope of a command to those observations for which the value of the expression is true
Example:
. su age if lwt<100
. su age if lwt==100 | lwt>100
. su age if lwt>100 & lwt<150
. su age if lwt<100 | lwt>150
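One caution when using if with relational operators: Stata treats a missing value (.) as larger than any number. A minimal sketch, assuming the Exercise 1 data are in memory:

```stata
* Observations with missing lwt also satisfy lwt > 100
summarize age if lwt > 100

* Exclude missing values explicitly
summarize age if lwt > 100 & !missing(lwt)

* Group conditions with parentheses when mixing & (and) and | (or)
summarize age if (lwt < 100 | lwt > 150) & smoke == 1
```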
Language syntax
in range: restricts the command to the specified observation numbers
Example:
. su age in 5
. *summary stats for age computed using data from obs 5 only
. su age in 3/5
. *summary stats for age computed using data from obs 3 to 5
. su age in 5/l
. *summary stats for age computed using data from obs 5 to last (l = last observation)
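Because in refers to observation numbers, its result depends on the current sort order. A short sketch, assuming the Exercise 1 data are in memory; negative numbers count back from the last observation.

```stata
* Sort so that observation numbers follow birth weight
sort bwt
list place age bwt in 1/3     // three lowest bwt values
list place age bwt in -3/l    // three highest bwt values (-3 = third from last)
```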
Language syntax
Options: indicated by typing a comma after the main command (and any qualifiers), followed by the option(s)
Example
. su age, detail
Stata Commands
Getting help
search
• Searches Stata’s keyword database
• Example:
. search regression
help
• use help when you know the name of the Stata command on
which you want information
• Example:
. help summarize
+ 51
cd
• shows the current working directory (in Stata for Windows; pwd does this on all platforms). Also used to change the working directory
• Example
. cd /*shows working directory*/
. cd "C:/Users/yourname/Desktop" /*changes working directory to the desktop; replace yourname with your user name*/
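A few related navigation commands, shown as a do-file fragment; this is a sketch and the folder layout is whatever exists on your computer.

```stata
pwd      // display the current working directory
cd ..    // move up one folder
dir      // list the files in the working directory
```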
Storing results
log using
• opens a log file and echoes a copy of the session (commands and results) to it
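A minimal logging sketch; the file name session1.log is hypothetical. The text option writes plain text instead of Stata's SMCL format, and replace overwrites an earlier file of the same name.

```stata
* Start recording the session to a plain-text log file
log using session1.log, text replace

summarize age bwt

* Stop recording and close the file
log close
```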
describe
• displays a summary of the contents of the data in memory or of a Stata-format dataset stored on disk
codebook
• displays a codebook for the variables specified or, if no variables are specified, for all the variables in the data
• based on the variables’ names, labels and values
list
• displays values of variables
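A quick sketch of the three inspection commands, assuming the Exercise 1 data are in memory.

```stata
describe                     // variables, storage types and labels
codebook race smoke          // range, unique values and frequencies
list place age bwt in 1/5    // values of three variables, first five obs
```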
Data manipulation
generate
• creates a new variable
replace
• changes the contents of an existing variable
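A minimal generate/replace sketch, assuming the Exercise 1 data are in memory. The variable name lowbwt is hypothetical; the 2500 g cutoff is the usual definition of low birth weight.

```stata
* generate creates a new variable (it stops with an error if the name exists)
generate lowbwt = 0 if !missing(bwt)

* replace changes values of an existing variable
replace lowbwt = 1 if bwt < 2500
```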
Data manipulation
recode
• useful in transforming quantitative variables into categorical
variables
• Example:
. recode age (min/29=1) (30/49=2) (50/max=3), generate(agegrp)
. *groups age into a new variable agegrp; age itself is left unchanged
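After recoding, it helps to label the new codes and check the grouping. A sketch, assuming agegrp holds the codes 1 to 3; the label name agegrp_lbl and its text are illustrative.

```stata
* Attach readable labels to the new group codes
label define agegrp_lbl 1 "Up to 29" 2 "30-49" 3 "50 and over"
label values agegrp agegrp_lbl

* Cross-check that every age landed in the intended group
tabulate age agegrp
```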
Data manipulation
rename
• changes the name of existing variable
• contents remain unchanged
drop
• eliminates variables or observations from data in memory
Data manipulation
keep
• works like drop
• except that the variables or observations to be kept are specified
sort
• arranges obs of current data in ascending order of the values
of the variables in varlist
Data manipulation
order
• changes the order of the variables in the current dataset
• variables specified are moved, in order, to the front of the dataset
by
• repeats the command for each group of observations for
which the values of the vars in the varlist are the same
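A do-file sketch chaining the data-manipulation commands, assuming the Exercise 1 data are in memory. The new name site is hypothetical, and the drop/keep lines permanently change the data in memory.

```stata
* Give an existing variable a new name; its values are unchanged
rename place site

* Remove one variable, then keep only observations with age 20 or older
drop flv
keep if age >= 20 & !missing(age)

* Arrange observations by birth weight, then move key variables to the front
sort bwt
order site age bwt
```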
Summary statistics
summarize
• Calculates and displays a variety of summary statistics
Summary statistics
tabulate
• produces one- and two-way frequency tables
• One-way table
tab var1
tab1 var1 var2 var3 var4 var5
• Two-way table
tab var1 var2
tab2 var1 var2 var3 var4 var5
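A few tabulate variants, assuming the Exercise 1 data are in memory: missing adds a row for missing values, and row adds row percentages to a two-way table.

```stata
* One-way table, counting missing values as a category
tabulate race, missing

* Two-way table with row percentages
tabulate race smoke, row

* Separate one-way tables for several variables at once
tab1 race smoke ui
```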
Exercise 2
• Create the variable “hpnstat_ba” based on the patients’ SBP and DBP at baseline
(NOTE: A person is classified as hypertensive if SBP>=130 mm Hg OR DBP>=90 mm Hg)
• Exit Stata