
+

Introduction to Stata

Kim L. Cochon
Department of Epidemiology and Biostatistics
College of Public Health
University of the Philippines Manila
+ 2

Outline
• General description of the Stata software
• Starting Stata
• Stata’s default windows
• The help menu
• Inputting data using the Data Editor
• Saving and opening datasets
• Exiting Stata
• Stata’s command diagram
• Commonly used commands in Stata


+ 3

Stata: General Description

• Stata is a statistical package for
- managing, analyzing and graphing data

• Developed by StataCorp
- Licensed software

• Available for a variety of platforms
- Windows, Macintosh, Linux, etc.
+ 4

Visit http://www.stata.com/stata12/ to see details of this version of Stata
+ 5

Starting Stata

• To start Stata
- Click the Start Menu
- Then go to Programs
- Select Stata 12
+ 6
+ 7

Stata windows

• Four default windows
- Command
- Results
- Review
- Variables

• Other windows
- Viewer
- Data editor/browser
- Do-file
- Graph
+ 8

Command Window

Where commands are typed when working in interactive mode; everything typed
is echoed in the Results window as well as in the Review window
+ 9

Results Window

All commands and their results are displayed here (except for graphs)
+ 10

Review Window

Commands that have been entered are listed here; click on any
command in the window and it will be pasted into the Command window
+ 11

Variables Window

Lists all the variables in a dataset together with their
properties (e.g. label, type, value labels, etc.)
+ 12

Help Menu

• The help menu gives a list of references

- PDF documentation of the Stata manuals
- Advice → guide on how to use help
- Contents → lists the contents of the Stata manuals
- Search → where you type in a topic you want to know about
- Stata command → where you type in a specific command
- What’s new? / News → gives information about updates made to Stata
+ 13

Help Menu
+ 14

Viewer Window
+ 15

Inputting Data in the Data Editor
+ 16

Data editor

Variable name bar

Spreadsheet
+ 17

Two General Field Types in Stata

• Numeric: entries that contain only numeric characters

• Numeric entries may contain
- A sign (e.g. -.20)
- A decimal point (e.g. 0.45)
- An e and a signed integer exponent (e.g. 2e+5)
- BUT they must not contain commas (e.g. 2,100 is treated as a
string by Stata)

• String: entries that contain letters or special characters

• Stata’s statistical routines treat string variables as if every
observation records a numeric missing value
+ 18

Missing values

• Denoted by a single period (i.e. .) if the variable is numeric

• Denoted by a blank entry if the variable is string

• In an expression, a missing value of a string variable is written as
a pair of double quotes with nothing between them (i.e. "")

• Any arithmetic operation on a missing value yields a missing value

• In producing statistical output, Stata ignores observations with
missing values
+ 19

Example

gender  name              age
1       juan dela cruz    35
2       annie batumbakal  28
.       angel             18
2       jane dela cruz    45
1       maria clara       50
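The example data can also be typed in from a do-file instead of the Data Editor. The following is a minimal sketch, not part of the original slides: the storage types (byte, str20) and the new variable name gender_plus1 are assumptions, and the third respondent's gender is taken to be missing.

```stata
* Enter the five example records; a period marks the missing numeric value
clear
input byte gender str20 name byte age
1 "juan dela cruz"   35
2 "annie batumbakal" 28
. "angel"            18
2 "jane dela cruz"   45
1 "maria clara"      50
end

* Show the data as stored
list

* Count the observations with a missing gender (one, given the data above)
count if missing(gender)

* Arithmetic on a missing value yields a missing value (observation 3)
generate gender_plus1 = gender + 1

* Summary statistics use only the four non-missing observations
summarize gender
```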
+ 20

Example
+ 21

Variable name

• 1 to 32 characters long

• Starts with a letter or an underscore (_)

• No special characters or spaces (only letters, digits and underscores)


+ 22

Variable label versus value label

Variable label
• Attaches a descriptive label to a variable

• Up to 80 characters

Value label
• Specifies labels for the values of a numeric variable

• Value labels may be up to 32,000 characters long
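Both kinds of label can also be attached with commands rather than through the Data Editor dialogs. This is a minimal sketch: the label text, the label name genderlbl, and the coding 1 = male, 2 = female are assumptions made for illustration only.

```stata
* Two-observation toy dataset so the sketch runs on its own
clear
input byte gender byte age
1 35
2 28
end

* Variable label: describes the variable itself
label variable gender "Gender of respondent"

* Value label: define the mapping once, then attach it to the variable
label define genderlbl 1 "male" 2 "female"
label values gender genderlbl

* Tables now show the labels instead of the numeric codes
tabulate gender
```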


+ 23

Value label
+ 24

Value label
+ 25
+ 26

Saving the dataset

• First close the Data Editor window

• Click File then select Save As; or

• Click on the save icon

• Stata datasets have the file extension .dta

• To close the dataset, type ‘clear’ in the Command window


+ 27
+ 28

How to access a Stata data file

• Stata data files have the extension .dta

• Click the File menu
- Then ‘Open’, browse for the data file; or
• Click the ‘Open (use)’ toolbar icon
- Then browse for the data file; or
• Press ‘Ctrl-O’
- Then browse for the data file; or
• Type ‘use filename, clear’ in the Command window
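Saving, closing and reopening a dataset can all be done from the Command window or a do-file. The sketch below uses Stata's built-in auto example dataset, and the file name mydata is hypothetical.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Save it as mydata.dta in the working directory (overwrite if present)
save mydata, replace

* Close the dataset in memory
clear

* Reopen the saved file and inspect its contents
use mydata, clear
describe
```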
+ 29
+ 30

Exiting Stata

• To exit
- Click on the close box; or
- Click File menu and select Exit; or
- Type ‘exit’ in the Command window
+ 31

Exercise 1
+ 32

Exercise 1

• Encode the information given in the succeeding slides using
Stata with the aid of the coding manual provided

• Save the data file using your surname as the file name


+ 33

Coding Manual

Variable Name  Variable Description                                    Coding Instruction
place          Place of origin                                         A1, A2, B1, B2
age            Age of mother in years                                  Actual age in years
lwt            Weight in pounds at the last menstrual period           Actual weight
bwt            Birth weight in grams                                   Actual birth weight
race           Race                                                    1-white, 2-black, 3-other
smoke          Smoking status during pregnancy                         0-no, 1-yes
ui             Presence of uterine irritability                        0-no, 1-yes
ftv            Number of physician visits during the first trimester   0-no, 1-yes
+ 34

Data to be Inputted
place age lwt bwt race smoke ui ftv
A1 19 182 2523 2 0 1 0
A2 33 155 2551 3 0 0 1
B1 20 105 2557 1 1 0 1
A1 21 108 2594 1 1 1 1
A1 18 107 2600 1 1 1 0
B2 21 124 2622 3 0 0 0
B2 22 118 2637 1 0 0 1
A2 17 103 2637 3 0 0 1
A2 29 123 2663 1 1 0 1
A1 26 113 2665 1 1 0 0
B2 19 95 2722 3 0 0 0
B1 19 150 2733 3 0 0 1
+ 35

Language Syntax of Stata
+ 36

Language Syntax

• Stata commands follow a common syntax

• Basic language syntax

[prefix:] command [varlist] [=exp] [if] [in] [weight] [, options]

• Where:
- Square brackets (e.g. [if]) → optional elements/qualifiers
- Underline (e.g. the su in summarize) → shortest possible abbreviation of a command
- Italicized arguments (e.g. prefix) → should be substituted by variable name(s),
observation(s), number(s), etc.
+ 37

Language Syntax

• Each command has a specific syntax diagram
- Shows how to type the command
- Indicates possible options
- Gives the minimum allowed abbreviations for items in the command

• Example:
+ 38

Language syntax

• prefix → prefix command
• command → main command to be executed
• varlist → list of variable names
- If varname is indicated, only one variable name is needed
• exp → algebraic expression
• if → restricts the command to observations that satisfy a condition
• in → restricts the command to a range of observation numbers
• weight → weight to be attached to each observation
• options → options that modify how the command behaves
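To see how these elements combine in practice, here is a short sketch for a do-file. It uses Stata's built-in auto dataset rather than the course data, and the variable name price_k is made up for the example.

```stata
sysuse auto, clear

* prefix: command varlist if, options
bysort foreign: summarize price weight if mpg > 20, detail

* command varlist in
list make price in 1/5

* command newvar =exp
generate price_k = price/1000
```

Each line uses only the elements it needs; everything shown in square brackets in the syntax diagram can be left out.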
+ 39

Language syntax

• Prefix commands are not stand-alone commands

• They operate on other Stata commands by modifying the input or
output of a command

• Example:
- by or bysort - repeats the command for each group of observations for which
the values of the variables in varlist are the same (by requires the data to be
sorted by varlist; bysort sorts the data first)

. bysort race: su age /*outputs summary stats for age by race*/


+ 40

Language syntax

• varlist or varname specified in this manner means that the user is
required to specify variable name(s) in the command

• [varlist] or [varname] specified in this manner means that the user is
NOT required to specify variable name(s) in the command

• In commands that alter or destroy data, Stata requires that the varlist be
specified explicitly
+ 41

Language syntax

• Example:

. su age /*varname is age*/

. su age lwt /*varlist includes age and lwt*/

. su age-bwt /*varlist includes age through bwt in dataset order: age, lwt and bwt*/

. su /*all variables*/
+ 42

Language syntax

• =exp specifies the value to be assigned to a variable

• Most often used with the commands generate and replace

• Example
. generate age2=age^2
. *generates a new variable named age2, which is the square of the respondent's age
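A common pattern pairs generate with replace: create the variable first, then overwrite it for a subset of observations. The sketch below uses the built-in auto dataset; the variable names weight_kg and heavy and the 1400 kg cut-off are arbitrary choices for illustration.

```stata
sysuse auto, clear

* New variable from an expression (pounds to kilograms)
generate weight_kg = weight*0.4536

* Start an indicator at 0, then set it to 1 where the condition holds
generate byte heavy = 0
replace heavy = 1 if weight_kg > 1400

* Check the first few observations
list make weight weight_kg heavy in 1/5
```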
+ 43

Language syntax

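• The if qualifier restricts the scope of a command to those observations for which the value
of the expression is true

• At most, the user can specify one if qualifier per command; conditions are combined with & (and) or | (or)

• Example:
. su age if lwt<100
. su age if lwt==100 if lwt>100 /*invalid: two if qualifiers*/
. su age if lwt>100 & lwt<150 /*both conditions must hold*/
. su age if lwt>100 | lwt>150 /*either condition may hold; here equivalent to lwt>100*/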
+ 44

Language syntax

• The in qualifier restricts the scope of the command to a specific observation range

• Conventional formats for specifying the in qualifier

in #
in #1/#2

• Example:
. su age in 5

. *summary stats for age computed using data from obs 5 only

. su age in 3/5

. *summary stats for age computed using data from obs 3 to 5

. su age in 5/l

. *summary stats for age computed using data from obs 5 to the last (l is the letter l, for last)
+ 45

Language syntax

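• weight indicates the weight to be attached to each observation

• Usually used when the data are summarized (e.g. one row per group with a count) rather than one row per individual

• The square brackets are actually typed

[weightword=exp]

• Weightword for frequency weights can be either of the following:

fweight (abbreviated fw), frequency (abbreviated freq)

• The generic weightword weight requests the default weight type of the command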
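Frequency weights are easiest to see with a small summarized dataset. The numbers below are invented for illustration: each row records an age and how many respondents have that age.

```stata
* Three rows standing in for ten respondents
clear
input age freq
20 3
25 5
30 2
end

* Each row counts freq times: reports 10 observations and a mean of 24.5
summarize age [fweight=freq]
```

Without the weight, summarize would treat the data as three observations with a mean of 25.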


+ 46

Language syntax
+ 47

Language syntax
+ 48

Language syntax

• Many commands have command-specific options

• Indicated by typing a comma at the end of the main command, followed by the option(s)

• Example

. su age, detail
+ 49

Stata Commands
+ 50

Getting help

search
• Searches Stata’s keyword database
• Example:
. search regression

help
• use help when you know the name of the Stata command on
which you want information
• Example:
. help summarize
+ 51

Operating system interface

cd
• shows the current working directory. Also used for changing the
location of the working directory
• Example
. cd /*shows working directory*/
. cd "c://Desktop" /*Changes working directory to desktop*/
+ 52

Storing results

log using
• opens a log file with the given filename and echoes a copy of the session to it

• To close the log file, issue the command log close
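A typical logged session looks like the sketch below. The log name session1 is hypothetical, and the text and replace options are one reasonable choice (a plain-text log that overwrites any earlier file of the same name).

```stata
* Confirm where the log file will be written
pwd

* Start logging to session1.log as plain text
log using session1, text replace

* Anything run here is copied to the log
sysuse auto, clear
describe

* Stop logging and close the file
log close
```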


+ 53

Basic data reporting

describe
• displays a summary of the contents of the data in memory or of
data stored in a Stata-format dataset
+ 54

Basic data reporting

codebook
• displays a codebook for the variables specified or, if no variables
are specified, for all the variables in the data
• the codebook is based on the variables’ names, labels & values

list
• displays values of variables
+ 55

Data manipulation

generate
• creates a new variable

replace
• changes the contents of an existing variable
+ 56

Data manipulation

recode
• useful in transforming quantitative variables into categorical
variables

recode varname #/#=# … #/#=#


recode varname min/#=# … #/max=#

• Example:
. recode agegrp min/29=1 30/49=2 50/max=3
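Because recode overwrites the variable it is given, it is often safer to write the result to a new variable. The sketch below does this with the generate() option on the built-in auto dataset; the cut-offs and the name mpggrp are arbitrary.

```stata
sysuse auto, clear

* Group mileage into three categories, leaving mpg itself unchanged
recode mpg (min/19=1) (20/29=2) (30/max=3), generate(mpggrp)

* Check the resulting categories
tabulate mpggrp
```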
+ 57

Data manipulation

rename
• changes the name of existing variable
• contents remain unchanged

drop
• eliminates variables or observations from data in memory
+ 58

Data manipulation
keep
• works the same as drop
• except that the vars or obs to be kept are specified

sort
• arranges obs of current data in ascending order of the values
of the variables in varlist
+ 59

Data manipulation

order
• changes the order of the variables in the current dataset
• variables specified are moved, in the order given, to the front of the dataset

by
• repeats the command for each group of observations for
which the values of the vars in the varlist are the same
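The data manipulation commands above are often used in sequence. Here is a minimal sketch on the built-in auto dataset; the new name mileage and the price cut-off are illustrative only.

```stata
sysuse auto, clear

* rename: new name, same contents
rename mpg mileage

* drop variables; keep only observations meeting a condition
drop trunk turn
keep if price < 10000

* sort ascending by foreign, then by price within foreign
sort foreign price

* order: move these variables to the front of the dataset
order make foreign price

* by: works here because the data are already sorted by foreign
by foreign: summarize price
```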
+ 60

Summary statistics

summarize
• Calculates and displays a variety of summary statistics
+ 61

Summary statistics

tabulate
• produces one- and two-way tables of frequency
• One-way table
tab var1
tab1 var1 var2 var3 var4 var5 (a separate one-way table for each variable)

• Two-way table
tab var1 var2
tab2 var1 var2 var3 var4 var5 (a two-way table for every pair of variables)
+ 62

Example: tabulate command

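. use bioman.dta, clear

. tab barangay /*one-way table of barangay*/
. tab barangay sex cs /*error: tab accepts at most two variables*/
. tab1 barangay sex cs /*one-way table for each variable*/
. tab2 barangay sex cs /*two-way table for every pair of variables*/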
+ 63

Exercise
+ 64

Exercise 2

• Change the current directory to the location of the folder “Data”

• Log your current Stata session

• Open epoetin.dta (NOTE: the file is located in the Data folder)

• Determine the number of observations in the dataset

• Determine the different etiologies of the patients’ medical
condition

• Determine the mean age of the patients


+ 65

Exercise 2

• Determine the mean hct level of the males at baseline

• Create the variable “hpnstat_ba” based on the SBP and DBP
of the patients at baseline
(NOTE: A person is classified as hypertensive if SBP>=130 mm Hg OR DBP>=90 mm Hg)

• Cross-tabulate hypertension status (hpnstat_ba) and gender

• Close your log file

• Exit Stata
