Statistics with SAS
Technology Short Courses: Fall 2007

(Oct 29, 2007)
Kentaka Aruga
Objectives of the course

Performing simple descriptive statistics (proc means, proc freq, and proc corr)
Performing basic test statistics (Chi-square test, T-test, F-test)
Basic commands for regression analysis and how to export the result into a table (proc reg)
Section 1 Preparation
Getting data and importing data
Getting data
Download the SAS commands that will be used in this practice from
http://www.uri.edu/its/research/sasstat.txt
Download the data files that will be used in this course from
http://www.uri.edu/its/research/auto.xls
http://www.uri.edu/its/research/vote.txt
Save the files to the C:/ drive of your Windows computer.
Importing Excel file to SAS

Open the SAS program and copy and paste the following commands from the file you have just downloaded, sasstat.txt:
libname car 'c:/';
proc import out=car.auto
datafile="c:/auto.xls"
dbms=excel2000 replace;
sheet="auto";
getnames=yes;
run;
Then highlight the command lines and execute them.
Proc import
Look at the trunk column
Do you see an empty column?
SAS determines the data type of a column from the most common data type in its first 8 rows. The trunk column has mixed data.
(Since the first eight rows are all zero, the remaining rows become all zero.)
Proc import
Add the following statement
mixed = yes;
Now the command line should look like
proc import out=car.auto
datafile="c:/auto.xls"
dbms=excel2000 replace;
sheet="auto";
getnames=yes;
mixed=yes;
run;
Execute this command
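One way to check the result of the import is sketched below. It assumes the import above has created car.auto; the choice of 15 rows is arbitrary and only meant to show rows beyond the first eight.

```sas
/* List the variables of the imported table together with their types */
proc contents data=car.auto;
run;

/* Print the first 15 rows of the trunk column to inspect its values */
proc print data=car.auto (obs=15);
  var trunk;
run;
```

The proc contents listing shows whether trunk was read as a numeric or a character variable.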
Section 2
Performing simple descriptive statistics (proc means, proc freq, and proc corr)
How to perform simple descriptive statistics (Review from SAS basics course)
How would you see the number of observations, mean, std, min, and max of all numeric variables in SAS?
Ans.
proc means data=car.auto;

run;
How do you analyze frequency of the variables?

Ans.
proc freq data=car.auto;

run;
Proc means
By default proc means provides the number of observations, mean, std, min, and max of all numeric variables
proc means data=car.auto;
run;
Specifying a certain variable

var variable name ;
Q. How would you execute the mean procedure for
the variables price, mpg, and weight ?
Creating an output table

output out= file name
Q. How would you get the output for the mean
procedure for the variables price, mpg,
and weight?
Proc means (Answers)

proc means data=car.auto;
var price mpg weight;
output out=car.means;
run;
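As a variation on the answer above, the sketch below requests the statistics explicitly and writes the mean and standard deviation of each variable to an output table. The data set name car.means2 is hypothetical; the autoname option lets SAS generate the output variable names.

```sas
/* Request specific statistics for three variables */
proc means data=car.auto n mean std min max;
  var price mpg weight;
  /* car.means2 is a hypothetical name; autoname builds names such as price_Mean */
  output out=car.means2 mean= std= / autoname;
run;

/* Look at the table that was created */
proc print data=car.means2;
run;
```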
Proc freq
By default this procedure creates frequency tables for all variables
proc freq data=car.auto;
run;
Specifying a certain variable

tables variable name
Q. How would you execute the FREQ procedure for the
variable foreign?
Creating an output table

/out = file name
Q. How would you get the output for the FREQ procedure for
the variable foreign?
Proc freq (Answers)

proc freq data=car.auto;
tables foreign / out=car.frn;
run;
Proc freq: Creating a two-way table

How would you create a two-way
table using the FREQ procedure for the
variables rep78 and foreign?
Ans.
proc freq data=car.auto;
tables rep78*foreign;
run;
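Proc freq: two-way table
Each cell of the two-way table lists the frequency followed by three percentages. For the cell with frequency 8 in the example output:
Total % (= 8/13)
Row % (= 8/9)
Column % (= 8/10)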
Proc corr
The CORR procedure generates simple statistics based on non-missing values, and the Pearson correlation coefficient, an index that quantifies the linear relationship between a pair of variables.
A non-significant p-value means there is no evidence of a linear relationship between the two variables.
Proc corr
Finding correlations between a pair of
variables
1) All variables
proc corr data=car.auto;
run;
2) Three specific variables

proc corr data=car.auto;
var price mpg weight;
run;
Section 3
Performing basic test statistics
(Chi-square test, T-test, F-test)
Chi-square test of independence

What is the Chi-square test of independence?
Ans. It tests whether the variable in the row and column
are independent or related
What is the null hypothesis?
Ans. The variables in the row and column are
independent: there is no relationship between row and
column frequencies
In SAS this test is requested with the chisq option of the tables statement in proc freq.
To display the expected frequency for each cell, use the option expected.
Chi-square test of independence: exercise
There are 34 students in the classroom and there was a
vote on whether they wanted to have a turtle in their
classroom as a pet. The data file vote.txt contains the
result of the vote (Yes=y, No=n), and gender of the
students (male=m, female=f).
Q1 Import the file vote.txt into SAS and name the
variables answers and gender.
Q2 Using the option chisq, test whether or not the
answers to the vote and gender are associated with
each other.
Answers
Q1
data vote;
infile 'c:/vote.txt';
input answers $ gender $;
run;
Q2
proc freq data=vote;

tables answers*gender /expected chisq;
run;
Results
Expected frequency of a cell = row total (15) × column total (16) / table total (34)
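The expected frequency printed by the expected option can be checked by hand. The worked equation below uses only the totals quoted on this slide; the symbols $E$, $R$, $C$ and $N$ are notation introduced here for illustration.

```latex
% E = expected cell frequency under independence
% R = row total, C = column total, N = table total
E = \frac{R \times C}{N} = \frac{15 \times 16}{34} = \frac{240}{34} \approx 7.06
```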
What does the result tell you?

The p-value is lower than 0.01, so the null hypothesis that the two variables are independent is rejected even at the 1% significance level.
The two variables answers and gender are associated with each other (they are dependent).
Proc ttest
This procedure is used to test the hypothesis of equality of means for two normal populations from which independent samples have been obtained.
Three cases in SAS

One-sample t-test
Computes the sample mean of the variable and
compares it with a given number.
Two-sample t-test
Compares the mean of the first sample minus the
mean of the second sample to a given number.
Paired-observations t-test
Compares the mean of the differences in the
observations to a given number.
Assumptions of proc ttest

The observations are random samples drawn from normally distributed populations. This can be tested using the UNIVARIATE procedure.
If the normality assumptions are not satisfied, use the NPAR1WAY procedure.
The two populations of a group comparison must be independent.
If they are not independent, you should question the validity of a paired comparison.
The default null hypothesis value is zero. To change it, use h0=number, e.g. h0=10.
The default significance level is 5% (alpha=0.05, which gives 95% confidence limits). To change it, use alpha=value, e.g. alpha=0.01.
Source: http://www.okstate.edu/sas/v8/saspdf/stat/chap67.pdf
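The points above could be tried out roughly as follows. This is a sketch that assumes the car.auto data set from Section 1; the null value 20 is an arbitrary number chosen for illustration, not a value from the course.

```sas
/* Normality check: the normal option adds tests for normality of mpg */
proc univariate data=car.auto normal;
  var mpg;
run;

/* One-sample t-test of H0: mean mpg = 20 (20 is an arbitrary illustration),
   with 99% confidence limits instead of the default 95% */
proc ttest data=car.auto h0=20 alpha=0.01;
  var mpg;
run;

/* Nonparametric alternative if normality looks doubtful:
   Wilcoxon rank-sum test of mpg across the groups of foreign */
proc npar1way data=car.auto wilcoxon;
  class foreign;
  var mpg;
run;
```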
Proc ttest: exercise

How would you perform a t-test on mpg
variable classified by foreign variable?
Hint: use class and var statement
What will the null hypothesis be in this
case?
Proc ttest (Contd)

The command
proc ttest data=car.auto;
class foreign;
var mpg;
run;
CLASS statement: contains a variable that distinguishes the
groups being compared.
VAR statement: specifies the response variable to be used in
calculations.
The null hypothesis
$H_0: \mu_{\text{domestic}} - \mu_{\text{foreign}} = 0$
The alternative hypothesis
$H_1: \mu_{\text{domestic}} - \mu_{\text{foreign}} \neq 0$
(Output of proc ttest; the annotations on the output point to the high p-values.)
The first table shows the basic statistics

The second table is the t-test for equal means. Before using this table you need to look at the third table to determine whether the assumption of equal variances is reasonable.
The third table is a test of equal variances.
In this example the null hypothesis of equal variances is not rejected. Thus you need to read the row for equal variances in the second table. The second table suggests that there is no significant difference in mean mpg between domestic and foreign cars.
Section 4
Basic commands for regression analysis and
how to export the result into a table
(proc reg)
Regression analysis
Regression analysis: finding a reasonable mathematical model of the relationship between a response variable ($y$) and a set of explanatory variables ($x_1, x_2, \ldots, x_p$)
General model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
Proc reg
General command
proc reg data = file name;
model DV = IV ;
run;
DV: dependent variable IV: independent variable
This procedure also does the following testing:
F-test:
Tests the null hypothesis that none of the independent
variables has any effect
T-test:
Tests, for each IV, the null hypothesis that the independent variable has no effect on the dependent variable.
Proc reg: exercise

Let price be a response variable (dependent
variable (DV)), and mpg and length be explanatory
variables (independent variables (IV))
Q1 What will be the commands?
Q2 What null hypotheses will be tested?
Q3 Will the model be significant?
Proc reg: answers

Q1 proc reg data = car.auto;
model price = mpg length;
run;
Q2 F-test
$H_0: \text{price} = \beta_0$
$H_1: \text{price} = \beta_0 + \beta_1\,\text{mpg} + \beta_2\,\text{length}$
T-test
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
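Proc reg
Q3 Look at the p-value (Pr > F) of the F-test in the Analysis of Variance table of the output: the model is significant if this p-value is below the chosen significance level.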
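Proc reg: Confidence and prediction interval
Construct 95% confidence and prediction intervals by adding two options to the model statement, clm and cli: clm gives confidence limits for the mean predicted value, and cli gives prediction limits for an individual predicted value.
How would you add these options to the previous model?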
proc reg data=car.auto;
model price = mpg length / clm cli;
run;
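The same limits can also be saved in a data set rather than only printed. The sketch below uses the output statement of proc reg; the data set name car.pred and the variable names pred_price, mean_lo, mean_hi, ind_lo and ind_hi are hypothetical names chosen for illustration.

```sas
proc reg data=car.auto;
  model price = mpg length;
  /* car.pred and all variable names on the right-hand sides are hypothetical */
  output out=car.pred
         p=pred_price                /* predicted price */
         lclm=mean_lo uclm=mean_hi   /* 95% limits for the mean, as printed by clm */
         lcl=ind_lo ucl=ind_hi;      /* 95% limits for an individual prediction, as printed by cli */
run;
quit;
```

Each row of car.pred then holds the original variables plus the predicted value and both pairs of limits.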
Proc reg: creating an output table

Add outest = file name after the proc reg
command
proc reg data=car.auto
outest=car.est1;
model price = mpg length /clm cli;
run;
quit;
To see the output data file car.est1 you need to add the statement quit at the end.
You can remove the variables you do not want to see by using the keep= or drop= data set option
e.g.
data car.est2 (keep=intercept mpg length);
set car.est1;
run;
data car.est3 (drop=price _model_ _depvar_ _type_ _RMSE_);
set car.est1;
run;
Proc reg: creating an output table

To see the other outputs that are available, go to Help, type in REG, open The REG Procedure, and click Syntax.
Exporting the output data to Excel

General commands
proc export data = name of the SAS data file you are exporting
outfile = "path and name of the output file on your computer"
dbms = excel2000 replace;
run;
How would you export the file car.est2 into an Excel file?
Ans. proc export data = car.est2
outfile = "c:/est.xls"
dbms = excel2000 replace;
run;
Useful supports: other useful sites

Online SAS manuals
http://www.uri.edu/sasdoc
This will automatically link you to
http://support.sas.com/documentation/onlinedoc/sas9doc.html
Statbookstore: useful site for finding program
examples
http://www.geocities.com/statbookstore/
For further Questions:

kentaka@mail.uri.edu
