Logistic Regression Using SAS: Theory and Application, Second Edition
Ebook, 541 pages, 6 hours


About this ebook

If you are a researcher or student with experience in multiple linear regression and want to learn about logistic regression, Paul Allison's Logistic Regression Using SAS: Theory and Application, Second Edition, is for you! Informal and nontechnical, this book both explains the theory behind logistic regression and looks at all the practical details involved in its implementation using SAS. Several real-world examples are included in full detail. This book also explains the differences and similarities among the many generalizations of the logistic regression model. The following topics are covered: binary logistic regression, logit analysis of contingency tables, multinomial logit analysis, ordered logit analysis, discrete-choice analysis, and Poisson regression. Other highlights include discussions on how to use the GENMOD procedure to do loglinear analysis and GEE estimation for longitudinal binary data. Only basic knowledge of the SAS DATA step is assumed.

The second edition describes many new features of PROC LOGISTIC, including conditional logistic regression, exact logistic regression, generalized logit models, ROC curves, the ODDSRATIO statement (for analyzing interactions), and the EFFECTPLOT statement (for graphing nonlinear effects). Also new is coverage of PROC SURVEYLOGISTIC (for complex samples), PROC GLIMMIX (for generalized linear mixed models), PROC QLIM (for selection models and heterogeneous logit models), and PROC MDC (for advanced discrete choice models).

This book is part of the SAS Press program.
Language: English
Publisher: SAS Institute
Release date: March 30, 2012
ISBN: 9781607649953
Author

Paul D. Allison

Paul D. Allison is Professor of Sociology at the University of Pennsylvania and President of Statistical Horizons LLC. He is the author of Logistic Regression Using SAS: Theory and Application, Survival Analysis Using SAS: A Practical Guide, and Fixed Effects Regression Methods for Longitudinal Data Using SAS. Paul has also written numerous statistical papers and published extensively on the subject of scientists’ careers. He frequently teaches public short courses on the methods described in his books.


    Book preview

    Logistic Regression Using SAS - Paul D. Allison

    Preface

    It's about time! The first edition of Logistic Regression Using SAS had become so outdated that I was embarrassed to see people still using it. In the 13 years since the initial publication, there have been an enormous number of changes and enhancements to the SAS procedures for doing logistic regression and related methods. In fact, I think that the rate of change has accelerated in recent years.

    So, as of April 2012, this book is up to date with the latest syntax, features, and options in SAS 9.3. All the output displays use the HTML format, which is now the default. Perhaps the biggest change is that PROC LOGISTIC plays an even larger role than before. That's because it now has a CLASS statement, and it can now estimate the multinomial logit model. Here are some chapter-by-chapter details.

    Chapter 2, Binary Logistic Regression with PROC LOGISTIC: Basics. In the first edition, PROC GENMOD got major attention in this chapter. Now it's barely mentioned. The reason for this change is that PROC LOGISTIC has a CLASS statement, so GENMOD isn't needed until Chapter 8. The CLASS statement in PROC LOGISTIC works a little differently than in other PROCs, so I spend some time explaining the differences. In addition, I now show how to get robust standard errors when estimating the linear probability model with PROC REG. And I demonstrate how to estimate marginal effects for the logistic model with PROC QLIM.
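    To anticipate with a skeletal example (the data set and variable names here are made up purely for illustration): the CLASS statement is written just as in other procedures, but by default PROC LOGISTIC uses effect coding for CLASS variables rather than conventional dummy coding, so the PARAM=REF option is often worth adding.

       PROC LOGISTIC DATA=mydata;          /* illustrative names only */
         CLASS region / PARAM=REF;         /* PARAM=REF requests dummy (reference) coding */
         MODEL outcome = age region;
       RUN;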

    Chapter 3, Binary Logistic Regression: Details and Options. There are lots of changes and additions in this chapter. As in Chapter 2, GENMOD is out, LOGISTIC is in. For dealing with small samples and separation, I explain two major new features of LOGISTIC: exact logistic regression and penalized likelihood. I show how to use the new ODDSRATIO statement to get interpretable odds ratios when a model contains interactions. And I demonstrate the new EFFECTPLOT statement for getting useful graphs of predicted values, especially when there are non-linearities and interactions. Other additions include: graphs of odds ratios, ROC curves with tests for model differences, a new R-squared measure, new diagnostic graphs, and the use of PROC QLIM to model group comparisons with heteroscedastic errors.

    Chapter 4, Logit Analysis of Contingency Tables. The major change in this chapter is that PROC LOGISTIC is now used to estimate all the contingency table models rather than PROC GENMOD.

    Chapter 5, Multinomial Logit Analysis. The big change in this chapter is that PROC LOGISTIC is used to estimate the multinomial logit model rather than PROC CATMOD. I also show how to produce predicted values in a new format, and I demonstrate the use of the EFFECTPLOT statement to graph predicted values from the multinomial logit model.

    Chapter 6, Logistic Regression for Ordered Categories. In this chapter, I show how to use PROC QLIM to model heteroscedasticity in the cumulative logit model.

    Chapter 7, Discrete Choice Analysis. The first edition used PROC PHREG for all the models. Now PROC LOGISTIC (with the STRATA statement) is used for all conditional logit models except those for ranked data. I also provide an introduction to PROC MDC for more advanced choice models, including the heterogeneous extreme value model, the multinomial probit model, and the nested logit model.

    Chapter 8, Logit Analysis of Longitudinal and Other Clustered Data. The major addition to this chapter is the use of PROC GLIMMIX to estimate mixed logistic models. Important features of this procedure include the use of Gaussian quadrature to get maximum likelihood estimates and the COVTEST statement, which produces correct p-values for testing hypotheses about variance parameters. For PROC GENMOD, I now discuss the use of alternating logistic regression to get an odds ratio parameterization of the correlation structure. PROC LOGISTIC is used instead of PROC PHREG for the fixed effects models. And, finally, I show how to get robust standard errors with PROC SURVEYLOGISTIC.

    Chapter 9, Regression for Count Data. The negative binomial model is now a standard option in PROC GENMOD rather than a macro. I also discuss zero-inflated models at some length.

    Chapter 10, Loglinear Analysis of Contingency Tables. Except for new versions of the SAS output, this chapter is virtually unchanged.

    Once again, many thanks to my editor, George McDaniel, and to all the other folks at SAS who have been so helpful and patient in getting this new edition into print. In particular, the comments and suggestions of the reviewers—Bob Derr, Mike Patetta, David Schlotzhauer—were crucial in getting things right. Of course, I take full responsibility for any errors that remain.

    Chapter 1

    Introduction

    1.1   What This Book Is About

    1.2   What This Book Is Not About

    1.3   What You Need to Know

    1.4   Computing

    1.5   References

    1.1 What This Book Is About

    When I began graduate study at the University of Wisconsin in 1970, categorical data analysis consisted of chi-square tests for cross-tabulated data, a technique introduced around 1900 by the great Karl Pearson. This methodology was viewed with scorn by most of my quantitatively oriented cohorts. It was the province of old fogies who hadn't bothered to learn about REGRESSION ANALYSIS, the new universal tool for social science data analysis. Little did we realize that another revolution was taking place under our noses. By the time I left Wisconsin in 1975, the insanely great new thing was LOGLINEAR ANALYSIS, which made it possible to analyze complicated contingency tables in ways that Karl Pearson never dreamed of. But loglinear analysis was a rather different animal from linear regression and I, for one, never felt entirely comfortable working in the loglinear mode.

    In the years after I left Wisconsin, these dissimilar approaches to data analysis came together in the form of LOGIT ANALYSIS, also known as logistic regression analysis. The logit model is essentially a regression model that is tailored to fit a categorical dependent variable. In its most widely used form, the dependent variable is a simple dichotomy, and the independent variables can be either quantitative or categorical. As we shall see, the logit model can be generalized to dependent variables that have more than two categories, both ordered and unordered. Using the method of conditional logit analysis, it can also be extended to handle specialized kinds of data such as discrete-choice applications, matched-pair analysis, and longitudinal data. Logit models for longitudinal data can also be estimated with such methods as generalized estimating equations and logistic regression with random effects.
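    To fix ideas, in its simplest binary form, with pi denoting the probability that the dependent variable equals 1 for observation i, the model can be written as

    log[pi / (1 − pi)] = α + βxi,

    a specification taken up in detail in Chapter 2.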

    This book is an introduction to the logit model and its various extensions. Unlike most introductory texts, however, this one is heavily focused on the use of SAS to estimate logit and related models. In my judgment, you can't fully understand and appreciate a new statistical technique without carefully considering the practical details of estimation. To accomplish that, it's necessary to choose a particular software system to carry out the computations. Although there are many good statistical packages for doing logistic regression, SAS is certainly among the best in terms of the range of estimation methods, available features and options, efficiency and stability of the algorithms, and quality of the documentation. I find I can do almost anything I want to do in SAS and, in the process, I encounter few of the annoying software problems that seem to crop up frequently in other packages.

    In addition to the logit model, I also write briefly about two alternatives for binary data, the probit model and the complementary log-log model. The last chapter is about loglinear analysis, which is a close cousin to logit analysis. Because some kinds of contingency table analysis are awkward to handle with the logit model, the loglinear model can be a useful alternative. I don't pretend to give a comprehensive treatment of loglinear analysis, however. The emphasis is on how to do it with the GENMOD procedure and on showing examples of the types of applications where a loglinear analysis might be particularly useful. The penultimate chapter is about Poisson regression models for count data. I've included this topic partly for its intrinsic interest, but also because it's a useful preparation for the loglinear chapter. In PROC GENMOD, loglinear analysis is accomplished by way of Poisson regression.

    Besides GENMOD, this book includes discussions of the following SAS procedures: LOGISTIC, SURVEYLOGISTIC, GLIMMIX, NLMIXED, QLIM, and MDC. However, this book is not intended to be a comprehensive guide to these SAS procedures. I discuss only those features that are most widely used, most potentially useful, and most likely to cause problems or confusion. You should always consult the official documentation in the SAS/STAT User's Guide.

    1.2 What This Book Is Not About

    This book does not cover a variety of categorical data analysis known as Cochran-Mantel-Haenszel (CMH) statistics, for two reasons. First, I have little expertise on the subject and it would be presumptuous for me to try to teach others. Second, while CMH is widely used in the biomedical sciences, there have been few applications in the social sciences. This disuse is not necessarily a good thing. CMH is a flexible and well-founded approach to categorical data analysis—one that I believe has many potential applications. While it accomplishes many of the same objectives as logit analysis, it is best suited to those situations where the focus is on the relationship between one independent variable and one dependent variable, controlling for a limited set of additional variables. Statistical control is accomplished in a nonparametric fashion by stratification on all possible combinations of the control variables. Consequently, CMH is less vulnerable than logistic regression to certain kinds of specification error but at the expense of reduced statistical power. Stokes et al. (2000) give an excellent and thorough introduction to CMH methods as implemented with the FREQ procedure in SAS.
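    In skeletal form, such an analysis looks something like this (the data set and variable names are illustrative only): the CMH option on the TABLES statement requests the Cochran-Mantel-Haenszel statistics, with the last two variables defining the table that is analyzed within each stratum formed by the preceding variable.

       PROC FREQ DATA=mydata;                        /* illustrative names only */
         TABLES stratum*treatment*response / CMH;    /* strata * row variable * column variable */
       RUN;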

    1.3 What You Need to Know

    To understand this book, you need to be familiar with multiple linear regression. That means that you should know something about the assumptions of the linear regression model and about estimation of the model via ordinary least squares. Ideally, you should have a substantial amount of practical experience using multiple regression on real data and should feel comfortable interpreting the output from a regression analysis. You should be acquainted with such topics as multicollinearity, residual analysis, variable selection, nonlinearity, interactions, and dummy (indicator) variables. As part of this knowledge, you must certainly know the basic principles of statistical inference: standard errors, confidence intervals, hypothesis tests, p-values, bias, efficiency, and so on. In short, a two-semester sequence in statistics ought to provide the necessary statistical foundation for most people.

    I have tried to keep the mathematics at a minimal level throughout the book. Except for one section on maximum likelihood estimation (which can be skipped without loss of continuity), there is no calculus and little use of matrix notation. Nevertheless, to simplify the presentation of regression models, I occasionally use the vector notation βx = β0 + β1x1 + … + βkxk. While it would be helpful to have some knowledge of maximum likelihood estimation, it's hardly essential. However, you should know the basic properties of logarithms and exponential functions.

    With regard to SAS, the more experience you have with SAS/STAT and the SAS DATA step, the easier it will be to follow the presentation of SAS programs. On the other hand, the programs presented in this book are fairly simple and short, so don't be intimidated if you're just beginning to learn SAS.

    1.4 Computing

    All the computer input and output displayed in this book was produced by and for SAS 9.3. Occasionally, I point out differences between the syntax of SAS 9.3 and earlier releases. I use the following convention for presenting SAS programs: all SAS keywords are in uppercase. All user-specified variable names and data set names are in lowercase. In the main text, both SAS keywords and user-specified variables are in uppercase. In the output displays, nonessential output lines are often edited out to conserve space. If you would like to get a copy of the SAS code and the data sets used in this book, you can download them from my SAS Press author page at http://support.sas.com/publishing/authors/allison.html.

    1.5 References

    Most of the topics in this book can be found in any one of several textbooks on categorical data analysis. In preparing this book, I have particularly benefited from consulting Agresti (2002), Hosmer and Lemeshow (2000), Long (1997), Fienberg (2007), and Everitt (1992). I have generally refrained from giving references for material that is well established in the textbook literature, but I do provide references for any unusual, nonstandard, or controversial claims. I also give references whenever I think the reader might want to pursue additional information or discussion. This book should not be regarded as a substitute for official SAS documentation. The most complete and up-to-date documentation for the procedures discussed in this book can be found online at http://support.sas.com/documentation.

    Chapter 2

    Binary Logistic Regression with PROC LOGISTIC: Basics

    2.1   Introduction

    2.2   Dichotomous Dependent Variables: Example

    2.3   Problems with Ordinary Linear Regression

    2.4   Odds and Odds Ratios

    2.5   The Logistic Regression Model

    2.6   Estimation of the Logistic Model: General Principles

    2.7   Maximum Likelihood Estimation with PROC LOGISTIC

    2.8   Interpreting Coefficients

    2.9   CLASS Variables

    2.10 Multiplicative Terms in the MODEL Statement

    2.1 Introduction

    A great many variables in the social sciences are dichotomous—employed vs. unemployed, married vs. unmarried, guilty vs. not guilty, voted vs. didn't vote. It's hardly surprising, then, that social scientists frequently want to estimate regression models in which the dependent variable is a dichotomy. Nowadays, most researchers are aware that there's something wrong with using ordinary linear regression for a dichotomous dependent variable, and that it's better to use logistic or probit regression. But many of them don't know what it is about linear regression that makes dichotomous variables problematic, and they may have only a vague notion of why other methods are superior.

    In this chapter, we focus on logistic regression (a.k.a. logit analysis) as an optimal method for the regression analysis of dichotomous (binary) dependent variables. Along the way, we'll see that logistic regression has many similarities to ordinary linear regression analysis. To understand and appreciate the logistic model, we first need to see why ordinary linear regression runs into problems when the dependent variable is dichotomous.

    2.2 Dichotomous Dependent Variables: Example

    To make things tangible, let's start with an example. Throughout this chapter, we'll be examining a data set consisting of 147 death penalty cases in the state of New Jersey. In all of these cases, the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed. Then a penalty trial was conducted to determine whether the defendant would receive a sentence of death or life imprisonment. Our dependent variable DEATH is coded 1 for a death sentence, and 0 for a life sentence. The aim is to determine how this outcome was influenced by various characteristics of the defendant and the crime.

    Many potential independent variables are available in the data set, but let's consider three of special interest: BLACKD, coded 1 if the defendant was black and 0 otherwise; WHITVIC, coded 1 if the victim was white and 0 otherwise; and SERIOUS, a measure of the seriousness of the crime.

    The variable SERIOUS was developed in an auxiliary study in which panels of trial judges were given written descriptions of each of the crimes in the original data set. These descriptions did not mention the race of the defendant or the victim. Each judge evaluated 14 or 15 cases and ranked them from least serious to most serious. Each case was ranked by four to six judges. As used in this chapter, the SERIOUS score is the average ranking given to each case, ranging from 1 (least serious) to 15 (most serious).
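    As a quick preliminary check (a minimal sketch, assuming the penalty data set has already been read in with the variable names used in the regression below), one might confirm the coding of DEATH and the range of SERIOUS:

       PROC FREQ DATA=penalty;                /* distribution of the 1/0 outcome */
         TABLES death;
       RUN;

       PROC MEANS DATA=penalty MIN MAX MEAN;  /* SERIOUS should range from 1 to 15 */
         VAR serious;
       RUN;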

    Using the REG procedure in SAS, I estimated a linear regression model that uses DEATH as the dependent variable and the other three as independent variables. The SAS code is

       PROC REG DATA=penalty;

         MODEL death=blackd whitvic serious;

       RUN;

    Results are shown in Output 2.1. Neither of the two racial variables has a coefficient that is significantly different from 0. Not surprisingly, the coefficient for SERIOUS is highly significant—more serious crimes are more likely to get the death penalty.

    Should we trust these results, or should we ignore them because the statistical technique is incorrect? To answer that question we need to see why linear regression is regarded as inappropriate when the dependent variable is a dichotomy. That's the task of the next section.

    Output 2.1 Linear Regression of Death Penalty on Selected Independent Variables

    2.3 Problems with Ordinary Linear Regression

    Not so long ago, it was common to see published research that used ordinary least squares (OLS) linear regression to analyze dichotomous dependent variables. Some people didn't know any better. Others knew better, but didn't have access to good software for alternative methods. Now, every major statistical package includes a procedure for logistic regression, so there's no excuse for applying inferior methods. No reputable social science journal would publish an article that used OLS regression with a dichotomous dependent variable.

    Should all the earlier literature that violated this prohibition be dismissed? Actually, most applications of OLS regression to dichotomous variables give results that are qualitatively quite similar to results obtained using logistic regression. There are exceptions, of course, so I certainly wouldn't claim that there's no need for logistic regression. But as an approximate method, OLS linear regression does a surprisingly good job with dichotomous variables, despite clear-cut violations of assumptions.

    What are the assumptions that underlie OLS regression? While there's no single set of assumptions that justifies linear regression, the list in the box below is fairly standard. To keep things simple, I've included only a single independent variable x, and I've presumed that x is fixed across repeated samples (which means that every sample has the same set of x values). The i subscript distinguishes different members of the sample.

    Assumptions of the Linear Regression Model

    1. yi = α + βxi + εi

    2. E(εi) = 0

    3. var(εi) = σ²

    4. cov(εi, εj) = 0

    5. εi ~ Normal

    Assumption 1 says that y is a linear function of x plus a random disturbance term ε, for all members of the sample. The remaining assumptions all say something about the distribution of ε. What's important about assumption 2 is that E(ε) (the expected value of ε) does not vary with x, implying that x and ε are uncorrelated. Assumption 3, often called the homoscedasticity assumption, says that the variance of ε is the same for all observations. Assumption 4 says that the random disturbance for one observation is uncorrelated with the random disturbance for any other observation. Finally, assumption 5 says that the random disturbance is normally distributed. If all five assumptions are satisfied, ordinary least squares estimates of α and β are unbiased and have minimum sampling variance (minimum variability across repeated samples).

    Now suppose that y is a dichotomy with possible values of 1 or 0. It's still reasonable to claim that assumptions 1, 2, and 4 are true. But if 1 and 2 are true for a dichotomy, then 3 and 5 are necessarily false. First, let's consider assumption 5. Suppose that yi = 1. Then assumption 1 implies that εi = 1 − α − βxi. On the other hand, if yi = 0, we have εi = −α − βxi. Because εi can only take on two values, it's impossible for it to have a normal distribution (which has a continuum of values and no upper or lower bound). So assumption 5 must be rejected.

    To evaluate assumption 3, it's helpful to do a little preliminary algebra. The expected value of yi is, by definition,

    E(yi) = 1 × Pr(yi = 1) + 0 × Pr(yi = 0).

    If we define pi = Pr(yi = 1), this reduces to

    E(yi) = pi.

    In general, for any dummy variable, its expected value is just the probability that it is equal to 1. But assumptions 1 and 2 also imply another expression for this expectation. Taking the expected values of both sides of the equation in assumption 1, we get

    E(yi) = α + βxi + E(εi) = α + βxi.

    Putting these two results together, we get

    pi = α + βxi,

    which is sometimes called the linear probability model. As the name suggests, this model says that the probability that y = 1 is a linear function of x. Regression coefficients have a straightforward interpretation under this model. A 1-unit change in x produces a change of β in the probability that y = 1. In Output 2.1, the coefficient for SERIOUS was .038. So we can say that each 1-point increase in the SERIOUS scale (which ranges from 1 to 15) is associated with an increase of .038 in the probability of a death sentence, controlling for the other variables in the model. The BLACKD coefficient of .12 tells us that the estimated probability of a death sentence for black defendants is .12 higher than for nonblack defendants, controlling for other variables.
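    To put the SERIOUS coefficient in perspective, moving across the full range of the scale, from SERIOUS = 1 to SERIOUS = 15, corresponds to a change of 14 × .038 ≈ .53 in the predicted probability of a death sentence, holding the other variables constant.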

    Now let's consider the variance of εi. Because x is treated as fixed, the variance of εi is the same as the variance of yi. In general, the variance of a dummy variable is pi(1 − pi). Therefore, we have

    var(εi) = pi(1 − pi) = (α + βxi)(1 − α − βxi).

    We see, then, that the variance of εi must be different for different observations and, in particular, it varies as a function of x. The disturbance variance is at a maximum when pi = .5 and gets small when pi is near 1 or 0.
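    For example, at pi = .5 the disturbance variance is .5 × .5 = .25, whereas at pi = .9 (or pi = .1) it is only .9 × .1 = .09.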

    We've just shown that a dichotomous dependent variable in a linear regression model necessarily violates assumptions of homoscedasticity (assumption 3) and normality (assumption 5) of the error term. What are the consequences? Not as serious as you might think. First of all, we don't need these assumptions to get unbiased estimates. If just assumptions 1 and 2 hold, ordinary least squares will produce unbiased estimates of α and β. Second, the normality assumption is not needed if the sample is reasonably large. The central limit theorem assures us that coefficient estimates will have a distribution that is approximately normal even when ε is not normally distributed. That means that we can still use a normal table to calculate p-values and confidence intervals. If the sample is small, however, these approximations could be poor.

    Violation of the homoscedasticity assumption has two undesirable consequences. First, the coefficient estimates are no longer efficient. In statistical terminology, this means that there are alternative methods of estimation with smaller standard errors. Second, and more serious, the standard error estimates are no longer consistent estimates of the true standard errors. That means that the estimated standard errors could be biased (either upward or downward) to unknown degrees. And because the standard errors are used in calculating test statistics, the test statistics could also be problematic.

    Fortunately, the potential problems with standard errors and test statistics are easily fixed. Beginning with SAS 9.2, PROC REG offers a heteroscedasticity-consistent covariance estimator that uses the method of Huber (1967) and White (1980), sometimes known as the sandwich method because of the structure of the matrix formula. This method produces consistent estimates of the standard errors even when the homoscedasticity assumption is violated. To implement the method in PROC REG, simply put the option HCC on the MODEL statement:

       PROC REG DATA=penalty;

         MODEL death=blackd whitvic serious / HCC;

       RUN;

    Now, in addition to the output in Output 2.1, we get the corrected standard errors, t-statistics, and p-values shown in Output 2.2. In this case, the correction for heteroscedasticity makes almost no difference in the results.

    Output 2.2 Linear Regression of Death Penalty with Correction for Heteroscedasticity

    Although the HCC standard errors are an easy fix, be aware that they have inherently more sampling variability than conventional standard errors (Kauermann and Carroll 2001), and may be especially unreliable in small samples. For large samples, however, they should be quite satisfactory.

    In addition to these technical difficulties, there is a more fundamental problem with the assumptions of the linear model. We've seen that when the dependent variable is a dichotomy, assumptions 1 and 2 imply the linear probability model

    pi = α + βxi

    While there's nothing intrinsically wrong with this model, it's a bit implausible, especially if x is measured on a continuum. If x has no upper or lower bound, then for any value of β there are values of x for which pi is either greater than 1 or less than 0. In fact, when estimating a linear probability model by OLS, it's quite common for predicted values generated by the model to be outside the (0, 1) interval. (That wasn't a problem with the regression in Output 2.1, which implied predicted probabilities ranging from .03 to .65.) Of course, it's impossible for the true values (which are probabilities) to be greater than 1 or less than 0. So the only way the model could be true is if a ceiling and floor are somehow imposed on pi, leading to considerable
