Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

SAS Statistics by Example
SAS Statistics by Example
SAS Statistics by Example
Ebook373 pages2 hours

SAS Statistics by Example

Rating: 5 out of 5 stars

5/5

()

Read preview

About this ebook

In SAS Statistics by Example, Ron Cody offers up a cookbook approach for doing statistics with SAS. Structured specifically around the most commonly used statistical tasks or techniques--for example, comparing two means, ANOVA, and regression--this book provides an easy-to-follow, how-to approach to statistical analysis not found in other books.



For each statistical task, Cody includes heavily annotated examples using ODS Statistical Graphics procedures such as SGPLOT, SGSCATTER, and SGPANEL that show how SAS can produce the required statistics. Also, you will learn how to test the assumptions for all relevant statistical tests. Major topics featured include descriptive statistics, one- and two-sample tests, ANOVA, correlation, linear and multiple regression, analysis of categorical data, logistic regression, nonparametric techniques, and power and sample size.



This is not a book that teaches statistics. Rather, SAS Statistics by Example is perfect for intermediate to advanced statistical programmers who know their statistics and want to use SAS to do their analyses.



This book is part of the SAS Press program.
LanguageEnglish
PublisherSAS Institute
Release dateAug 22, 2011
ISBN9781612900124
SAS Statistics by Example
Author

Ron Cody

Ron Cody, EdD, is a retired professor from the Rutgers Robert Wood Johnson Medical School who now works as a private consultant and national instructor for SAS. A SAS user since 1977, Ron's extensive knowledge and innovative style have made him a popular presenter at local, regional, and national SAS conferences. He has authored or co-authored numerous books, such as Learning SAS by Example: A Programmer's Guide, Second Edition; A Gentle Introduction to Statistics Using SAS Studio; Cody's Data Cleaning Techniques Using SAS, Third Edition; SAS Functions by Example, Second Edition; and several other books on SAS programming and statistical analysis. During his career at Rutgers Medical School, he authored or co-authored over 100 articles in scientific journals.

Read more from Ron Cody

Related to SAS Statistics by Example

Related ebooks

Enterprise Applications For You

View More

Related articles

Reviews for SAS Statistics by Example

Rating: 5 out of 5 stars
5/5

1 rating0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    SAS Statistics by Example - Ron Cody

    9781607648000.jpgTitlePageImage.png

    Copyright

    The correct bibliographic citation for this manual is as follows: Cody, Ron. 2011. SAS® Statistics by Example. Cary, NC: SAS Institute Inc.

    SAS® Statistics by Example

    Copyright © 2011, SAS Institute Inc., Cary, NC, USA

    ISBN 978-1-61290-012-4 (electronic book)

    ISBN 978-1-60764-800-0

    All rights reserved. Produced in the United States of America.

    For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

    For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

    The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

    U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

    SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414

    1st printing, August 2011

    SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

    SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

    Other brand and product names are registered trademarks or trademarks of their respective companies.

    Contents

    List of Programs

    Acknowledgments

    Chapter 1  An Introduction to SAS

    Introduction

    What is SAS

    Statistical Tasks Performed by SAS

    The Structure of SAS Programs

    SAS Data Sets

    SAS Display Manager

    Excel Workbooks

    Variable Types in SAS Data Sets

    Temporary versus Permanent SAS Data Sets

    Creating a SAS Data Set from Raw Data

    Data Values Separated by Delimiters

    Reading CSV Files

    Data Values in Fixed Columns

    Excel Files with Invalid SAS Variable Names

    Other Sources of Data

    Conclusions

    Chapter 2  Descriptive Statistics – Continuous Variables

    Introduction

    Computing Descriptive Statistics Using PROC MEANS

    Descriptive Statistics Broken Down by a Classification Variable

    Computing a 95% Confidence Interval and the Standard Error

    Producing Descriptive Statistics, Histograms, and Probability Plots

    Changing the Midpoint Values on the Histogram

    Generating a Variety of Graphical Displays of Your Data

    Displaying Multiple Box Plots for Each Value of a Categorical Variable

    Conclusions

    Chapter 3  Descriptive Statistics – Categorical Variables

    Introduction

    Computing Frequency Counts and Percentages

    Computing Frequencies on a Continuous Variable

    Using Formats to Group Observations

    Histograms and Bar Charts

    Creating a Bar Chart Using PROC SGPLOT

    Using ODS to Send Output to Alternate Destinations

    Creating a Cross-Tabulation Table

    Changing the Order of Values in a Frequency Table

    Conclusions

    Chapter 4  Descriptive Statistics – Bivariate Associations

    Introduction

    Producing a Simple Scatter Plot Using PROG GPLOT

    Producing a Scatter Plot Using PROC SGPLOT

    Creating Multiple Scatter Plots on a Single Page Using PROC SGSCATTER

    Conclusions

    Chapter 5  Inferential Statistics – One-Sample Tests

    Introduction

    Conducting a One-Sample t-test Using PROC TTEST

    Running PROC TTEST with ODS Graphics Turned On

    Conducting a One-Sample t-test Using PROC UNIVARIATE

    Testing Whether a Distribution is Normally Distributed

    Tests for Other Distributions

    Conclusions

    Chapter 6  Inferential Statistics – Two-Sample Tests

    Introduction

    Conducting a Two-Sample t-test

    Testing the Assumptions for a t-test

    Customizing the Output from ODS Statistical Graphics

    Conducting a Paired t-test

    Assumption Violations

    Conclusions

    Chapter 7  Inferential Statistics – Comparing More than Two Means

    Introduction

    A Simple One-way Design

    Conducting Multiple Comparison Tests

    Using ODS Graphics to Produce a Diffogram

    Two-way Factorial Designs

    Analyzing Factorial Models with Significant Interactions

    Analyzing a Randomized Block Design

    Conclusions

    Chapter 8  Correlation and Regression

    Introduction

    Producing Pearson Correlations

    Generating a Correlation Matrix

    Creating HTML Output with Data Tips

    Generating Spearman Nonparametric Correlations

    Running a Simple Linear Regression Model

    Using ODS Statistical Graphics to Investigate Influential Observations

    Using the Regression Equation to Do Prediction

    A More Efficient Way to Compute Predicted Values

    Conclusions

    Chapter 9  Multiple Regression

    Introduction

    Fitting Multiple Regression Models

    Running All Possible Regressions with n Variables

    Producing Separate Plots Instead of a Panel

    Choosing the Best Model (Cp and Hocking’s Criteria)

    Forward, Backward, and Stepwise Selection Methods

    Forcing Selected Variables into a Model

    Creating Dummy (Design) Variables for Regression

    Detecting Collinearity

    Influential Observations in Multiple Regression Models

    Conclusions

    Chapter 10  Categorical Data

    Introduction

    Comparing Proportions

    Rearranging Rows and Columns in a Table

    Tables with Expected Values Less Than 5 (Fisher’s Exact Test)

    Computing Chi-Square from Frequency Data

    Using a Chi-Square Macro

    A Short-Cut Method for Requesting Multiple Tables

    Computing Coefficient Kappa—A Test of Agreement

    Computing Tests for Trends

    Computing Chi-Square for One-Way Tables

    Conclusions

    Chapter 11  Binary Logistic Regression

    Introduction

    Running a Logistic Regression Model with One Categorical Predictor Variable

    Running a Logistic Regression Model with One Continuous Predictor Variable

    Using a Format to Create a Categorical Variable from a Continuous Variable

    Using a Combination of Categorical and Continuous Variables in a Logistic Regression Model

    Running a Logistic Regression with Interactions

    Conclusions

    Chapter 12  Nonparametric Tests

    Introduction

    Performing a Wilcoxon Rank-Sum Test

    Performing a Wilcoxon Signed-Rank Test (for Paired Data)

    Performing a Kruskal-Wallis One-Way ANOVA

    Comparing Spread: The Ansari-Bradley Test

    Converting Data Values into Ranks

    Using PROC RANK to Group Your Data Values

    Conclusions

    Chapter 13  Power and Sample Size

    Introduction

    Computing the Sample Size for an Unpaired t-Test

    Computing the Power of an Unpaired t-Test

    Computing Sample Size for an ANOVA Design

    Computing Sample Sizes (or Power) for a Difference in Two Proportions

    Using the SAS Power and Sample Size Interactive Application

    Conclusions

    Chapter 14  Selecting Random Samples

    Introduction

    Taking a Simple Random Sample

    Taking a Random Sample with Replacement

    Creating Replicate Samples using PROC SURVEYSELECT

    Conclusions

    References

    Index

    List of Programs

    Chapter 1

    Program 1.1:    Using PROC PRINT to List the Observations in a SAS Data Set

    Program 1.2:    Using PROC CONTENTS to Display the Data Descriptor Portion of a SAS Data Set

    Program 1.3:    Reading Data from a Text File That Uses Spaces as Delimiters

    Program 1.4:    Using PROC PRINT to List the Observations in Data Set Sample2

    Program 1.5:    Reading a CSV File

    Chapter 2

    Program 2.1:    Generating Descriptive Statistics with PROC MEANS

    Program 2.2:    Statistics Broken Down by a Classification Variable

    Program 2.3:    Demonstrating the PRINTALLTYPES Option with PROC MEANS

    Program 2.4:    Computing a 95% Confidence Interval

    Program 2.5:    Producing Histograms and Probability Plots Using PROC UNIVARIATE

    Program 2.6:    Using PROC SGPLOT to Produce a Histogram

    Program 2.7:    Using SGPLOT to Produce a Horizontal Box Plot

    Program 2.8:    Displaying Outliers in a Box Plot

    Program 2.9:    Labeling Outliers on a Box Plot

    Program 2.10:  Displaying Multiple Box Plots for Each Value of a Categorical Variable

    Chapter 3

    Program 3.1:    Computing Frequencies and Percentages Using PROC FREQ

    Program 3.2:    Demonstrating the NOCUM Tables Option

    Program 3.3:    Demonstrating the Effect of the MISSING Option with PROC FREQ

    Program 3.4:    Computing Frequencies on a Continuous Variable

    Program 3.5:    Writing a Format for Gender, SBP, and DBP

    Program 3.6:    Generating a Bar Chart Using PROC GCHART

    Program 3.7:    Generating a Bar Chart Using PROC SGPLOT

    Program 3.8:    Using ODS to Create PDF Output

    Program 3.9:    Creating a Cross-Tabulation Table Using PROC FREQ

    Program 3.10:  Changing the Order of Values in a PROC FREQ Table By Using Formats

    Chapter 4

    Program 4.1:    Creating a Scatter Plot Using PROC GPLOT

    Program 4.2:    Adding Gender Information to the Plot

    Program 4.3:    Using PROC SGPLOT to Produce a Scatter Plot

    Program 4.4:    Adding Gender to the Scatter Plot Using PROC SGPLOT

    Program 4.5:    Demonstrating the PLOT Statement of PROC SGSCATTER

    Program 4.6:    Demonstrating the COMPARE Statement of PROC SGSCATTER

    Program 4.7:    Switching the x- and y-Axes and Adding a GROUP= Option

    Program 4.8:    Producing a Scatter Plot Matrix

    Chapter 5

    Program 5.1:    Conducting a One-Sample t-test Using PROC TTEST

    Program 5.2:    Demonstrating ODS Graphics with PROC TTEST

    Program 5.3:    Conducting a One-Sample t-test

    Program 5.4:    Conducting a One-Sample Test with a Nonzero Null Hypothesis

    Program 5.5:    Testing Whether a Variable is Normally Distributed

    Chapter 6

    Program 6.1:    Conducting a Two-Sample t-test

    Program 6.2:    Demonstrating ODS Graphics

    Program 6.3:    Selecting Plots Using the PLOT Option for PROC TTEST

    Program 6.4:    Conducting a Paired t-test

    Program 6.5:    Using ODS Plots to Test t-Test Assumptions

    Chapter 7

    Program 7.1:    Running a One-Way ANOVA

    Program 7.2:    Requesting Multiple Comparison Tests

    Program 7.3:    Using ODS Graphics to Produce a Diffogram1

    Program 7.4:    Performing a Two-Way Factorial Design

    Program 7.5:    Analyzing a Factorial Design with Significant Interactions

    Program 7.6:    Analyzing a Randomized Block Design

    Chapter 8

    Program 8.1:    Producing Correlations between Two Sets of Variables

    Program 8.2:    Producing a Correlation Matrix

    Program 8.3:    Creating HTML Output That Contains Data Tips

    Program 8.4:    Generating Spearman Rank Correlations

    Program 8.5:    Running a Simple Linear Regression Model

    Program 8.6:    Displaying Influential Observations

    Program 8.7:    Predicting Values Using the Regression Equation

    Program 8.8:    Using PROC REG to Compute Predicted Values

    Program 8.9:    Describing a More Efficient Way to Compute Predicted Values

    Program 8.10:  Using PROC SCORE to Compute Predicted Values from a Regression Model

    Chapter 9

    Program 9.1:    Running a Multiple Regression Model

    Program 9.2:    Using the RSQUARE Selection Method to Execute All Possible Models

    Program 9.3:    Generating Plots of R-Square, Adjusted R-Square, and Cp

    Program 9.4:    Demonstrating Forward, Backward, and Stepwise Selection Methods

    Program 9.5:    Setting the SLENTRY Value to .15 Using the Forward Selection Method

    Program 9.6:    Forcing Variables into a Stepwise Model

    Program 9.7:    Creating Dummy Variables for Regression

    Program 9.8:    Running PROG REG with Dummy Variables for Gender and Region

    Program 9.9:    Using the VIF to Detect Collinearity

    Program 9.10:  Detecting Influential Observations in Multiple Regression

    Chapter 10

    Program 10.1:  Comparing Proportions Using Chi-Square

    Program 10.2:  Rearranging Rows and Columns in a Table and Computing Relative Risk

    Program 10.3:  Tables with Small Expected Frequencies–Fisher’s Exact Test

    Program 10.4:  Computing Chi-Square from Frequency Data

    Program 10.5:  A SAS Macro for Computing Chi-Square from Cell Frequencies

    Program 10.6:  Computing Kappa Coefficient of Agreement

    Program 10.7:  Demonstrating Two Tests for Trend

    Program 10.8:  Computing Chi-Square for a One-Way Table

    Chapter 11

    Program 11.1:  Logistic Regression with One Categorical Predictor Variable

    Program 11.2:  Logistic Regression with One Continuous Predictor Variable

    Program 11.3:  Using a Format to Create a Categorical Variable

    Program 11.4:  Using a Combination of Categorical and Continuous Variables

    Program 11.5:  Running a Logistic Model with Interactions

    Chapter 12

    Program 12.1:  Plotting the Distribution of Income

    Program 12.2:  Performing a Wilcoxon Rank-Sum Test (aka Mann-Whitney U Test)

    Program 12.3:  Requesting an Exact p-Value for a Wilcoxon Rank-Sum Test

    Program 12.4:  Performing a Wilcoxon Signed-Rank Test

    Program 12.5:  Performing a Kruskal-Wallis ANOVA

    Program 12.6:  Performing the Ansari-Bradley Test for Spread

    Program 12.7:  Demonstrating PROC RANK

    Program 12.8:  Replacing Values with Ranks and Running a t-Test

    Program 12.9:  Using PROC RANK to Create Groups

    Chapter 13

    Program 13.1:  Computing Sample Size for an Unpaired t-Test

    Program 13.2:  Computing the Power of a t-Test

    Program 13.3:  Computing the Power for an ANOVA Design

    Program 13.4:  Computing Sample Size for a Difference in Two Proportions

    Chapter 14

    Program 14.1:  Taking a Simple Random Sample

    Program 14.2:  Taking a Random Sample with Replacement

    Program 14.3:  Rerunning the Program without the OUTHITS Option

    Program 14.4:  Requesting Replicate Samples

    Acknowledgments

    A tremendous amount of work went into bringing this book to the bookshelf, and all that work wasn’t done by me alone. Several factors combined to make the review process and the final production of this book a challenge.

    First and foremost, I would like to thank John West, my editor and friend, who was amazingly patient and calm, even when there were technical challenges to overcome.

    Next, we enlisted the help of more reviewers than usual. Four of these reviewers read the book from cover to cover and made excellent suggestions for improvements and found some subtle and obscure errors. So, kudos to Gerry Hobbs, Catherine Truxillo, Jeff Smith, and Marc Huber.

    Other reviewers read chapters, particularly those where they had a particular expertise. Sincere thanks to Rob Agnelli, Paul Grant, Sanjay Matange, David Schlotzhauer, Jim Seabolt, and Sue Walsh.

    Since the decision was made to use HTML output instead of simple list output, considerable extra effort was required. The production team needed to touch approximately 161 image files so that they would look good both in print as well as on the various eBook devices. The people involved in this process were: Jennifer Dilley, designer; Candy Farrell, technical publishing specialist; Joan Celmer, copyeditor; and Mary Beth Steinbach, managing editor.

    No book would be successful without having people to market it. Thanks to Aimee Rodriguez and Stacey Hamilton for this essential task.

    Finally, I salute the artists who created the front and back covers of the book. Nice job Jennifer Dilley and Marchellina Waugh.

    Ron Cody, Summer 2011

    Chapter 1  An Introduction to SAS

    Introduction

    What is SAS

    Statistical Tasks Performed by SAS

    The Structure of SAS Programs

    SAS Data Sets

    SAS Display Manager

    Excel Workbooks

    Variable Types in SAS Data Sets

    Temporary versus Permanent SAS Data Sets

    Creating a SAS Data Set from Raw Data

    Data Values Separated by Delimiters

    Reading CSV Files

    Data Values in Fixed Columns

    Excel Files with Invalid SAS Variable Names

    Other Sources of Data

    Conclusions

    Introduction

    If you are reading this book, you are probably familiar with various statistical techniques but might not have used SAS to analyze data. The primary purpose of this book is to show you how to use SAS to perform a variety of statistical tasks. To that end, this book provides examples of many of the commonly used statistical techniques. Following each example is a discussion of the output. Although this is not a book about SAS programming, many of the examples require some data manipulation tasks, which will be described. If you need to gain more SAS programming skills, see Learning SAS by Example: A Programmer’s Guide, also by this author and published by SAS Press.

    This book is divided into five sections: An Introduction to SAS, Descriptive Statistics, Inferential Statistics, Power/Sample size calculations, and Selecting Random Samples.

    All of the programs and data files in this book are available from SAS Press. To download these programs and files, go to http://support.sas.com/authors.

    If you already have some familiarity with SAS data sets and how to run SAS programs, you can skip this chapter and start right in with Chapter 2.

    The remainder

    Enjoying the preview?
    Page 1 of 1