Section 1 Preparation
Getting data and importing data
Getting data
Download the data files that will be used in this course from
http://www.uri.edu/its/research/auto.xls
http://www.uri.edu/its/research/vote.txt
Proc import
Look at the trunk column
Do you see an empty column?
SAS determines the data type of a column based on
the most common data type in the first 8
rows. The trunk column has mixed data.
(Since the first eight rows are all zero,
the remaining rows become all zero.)
Proc import
Add the following statement
mixed = yes;
Now the code should look like this:
proc import out=car.auto
datafile="c:/auto.xls"
dbms=excel2000 replace;
sheet="auto";
getnames=yes;
mixed=yes;
run;
Execute this command
Section 2
Performing simple descriptive
statistics (proc means, proc freq,
and proc corr)
Proc means
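As a minimal sketch of a PROC MEANS call on the course data: it assumes the car libref is already assigned and that car.auto contains numeric variables named price, mpg and length (names taken from later slides, so treat them as assumptions about the data set).

```sas
/* Assumption: car.auto exists and has numeric variables price, mpg, length */
proc means data=car.auto n mean std min max;
  var price mpg length;   /* omit the VAR statement to summarize all numeric variables */
run;
```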
Proc freq
By default this procedure creates frequency tables for all
variables
proc freq data=car.auto;
run;
In a two-way table, each cell shows the frequency (here 8) followed by three percentages:
Total % (= 8/13)
Row % (= 8/9)
Column % (= 8/10)
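The three percentages can be reproduced with a small made-up table. The sketch below is not the course data: the data set cells and its variables row, col and count are hypothetical, chosen only so that one cell has frequency 8, a row total of 9, a column total of 10 and a grand total of 13.

```sas
/* Hypothetical 2x2 table: cell (A, X) holds 8 of 13 observations */
data cells;
  input row $ col $ count;
  datalines;
A X 8
A Y 1
B X 2
B Y 2
;
run;

proc freq data=cells;
  weight count;        /* each input line stands for 'count' observations */
  tables row*col;      /* two-way table: frequency, percent, row pct, col pct */
run;
```

For cell (A, X) the output should list a total percent of 61.54 (8/13), a row percent of 88.89 (8/9) and a column percent of 80.00 (8/10).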
Proc corr
The CORR procedure generates simple statistics
based on nonmissing values, and the Pearson
correlation coefficient, an index that quantifies the
linear relationship between a pair of variables.
A non-significant p-value means the data show no evidence of a linear
relationship between the two variables.
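For reference, the Pearson correlation coefficient for n paired observations (x_i, y_i) is usually written as follows; the symbols here are introduced for illustration and do not appear on the slides.

```latex
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```

The value of r lies between -1 and 1, and values near 0 indicate a weak linear relationship.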
Proc corr
Finding correlations between a pair of
variables
1) All variables
proc corr data=car.auto;
run;
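To restrict the output to particular variables, a VAR statement can be added. The sketch below assumes car.auto has numeric variables named price and mpg; those names come from later slides and are an assumption here.

```sas
/* Assumption: price and mpg are numeric variables in car.auto */
proc corr data=car.auto;
  var price mpg;   /* correlations only among the listed variables */
run;
```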
Section 3
Performing basic test statistics
(Chi-square test, T-test, F-test)
Answers
Q1
data vote;
infile 'c:/vote.txt';
input answers $ gender $;
run;
Q2
Results
Expected frequency
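The answer to Q2 is not reproduced in this text. As a hedged sketch of one way to run a chi-square test on the vote data set created in Q1, PROC FREQ can be asked for the test and for the expected cell frequencies:

```sas
/* Uses the vote data set from Q1 (variables answers and gender) */
proc freq data=vote;
  tables answers*gender / chisq expected;  /* chi-square test plus expected frequencies */
run;
```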
Proc ttest
Two-sample t-test
Compares the mean of the first sample minus the
mean of the second sample to a given number.
Paired-observations t-test
Compares the mean of the differences in the
observations to a given number.
$H_0: \mu_{domestic} - \mu_{foreign} = 0$
$H_1: \mu_{domestic} - \mu_{foreign} \neq 0$
See the p-value in the output: it is high.
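A two-sample comparison of domestic and foreign cars could be sketched as below. The variable names are assumptions: foreign is taken to be a two-level grouping variable in car.auto, and mpg a numeric variable to compare.

```sas
/* Assumptions: foreign has exactly two levels; mpg is numeric */
proc ttest data=car.auto;
  class foreign;   /* defines the two groups */
  var mpg;         /* variable whose group means are compared */
run;
```

By default the procedure tests whether the difference in means equals 0; a PAIRED statement would be used instead of CLASS and VAR for paired observations.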
Section 4
Basic commands for regression analysis and
how to export the result into a table
(proc reg)
Regression analysis
Regression analysis: finding a reasonable
mathematical model of the relationship
between a response variable (y) and a set
of explanatory variables (x1, x2, ..., xp)
General model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
Proc reg
General command
proc reg data = filename;
model DV = IV;
run;
DV: dependent variable
IV: independent variable
This procedure also does the following testing:
F-test:
Tests the null hypothesis that none of the independent
variables has any effect
T-test
Tests, for each IV, the null hypothesis that the independent
variable has no effect on the dependent variable.
T-test
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
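As a concrete instance of the general command, the sketch below regresses price on mpg and length. The choice of variables is an assumption based on names that appear elsewhere in these slides, not a model stated by the source.

```sas
/* Assumption: price, mpg and length are numeric variables in car.auto */
proc reg data=car.auto;
  model price = mpg length;   /* DV = IV list */
run;
quit;                         /* PROC REG is interactive; QUIT ends it */
```

The output includes the overall F-test and a t-test for each coefficient.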
Proc reg
Q3
You can drop the columns you do not want to see by using
the keep= or drop= data set option
e.g.
data car.est2 (keep=intercept mpg length);
set car.est1;
run;
data car.est3 (drop=price _model_ _depvar_ _type_ _RMSE_);
set car.est1;
run;
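The slides do not show how car.est1 was created. One plausible way, sketched here under the assumption that the model is price regressed on mpg and length, is the OUTEST= option of PROC REG, which writes the parameter estimates to a data set:

```sas
/* Assumption: car.est1 holds the estimates from this model */
proc reg data=car.auto outest=car.est1;
  model price = mpg length;
run;
quit;

/* Inspect the exported estimates */
proc print data=car.est1;
run;
```

An OUTEST= data set contains columns such as _MODEL_, _TYPE_, _DEPVAR_, _RMSE_ and Intercept, which is consistent with the names used in the keep= and drop= examples.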