Ebook591 pages5 hours

Regression Analysis with Python

Name: Regression Analysis with Python
Author: Boschetti Alberto
ISBN: 9781783980741

By Boschetti Alberto and Luca Massaron

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Learn the art of regression analysis with Python

About This Book

- Become competent at implementing regression analysis in Python
- Solve some of the complex data science problems related to predicting outcomes
- Get to grips with various types of regression for effective data analysis

Who This Book Is For

The book targets Python developers, with a basic understanding of data science, statistics, and math, who want to learn how to do regression analysis on a dataset. It is beneficial if you have some knowledge of statistics and data science.

What You Will Learn

- Format a dataset for regression and evaluate its performance
- Apply multiple linear regression to real-world problems
- Learn to classify training points
- Create an observation matrix, using different techniques of data analysis and cleaning
- Apply several techniques to decrease (and eventually fix) any overfitting problem
- Learn to scale linear models to a big dataset and deal with incremental data

In Detail

Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which is the right one to use for each set of problems and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and take critical business decisions. Through the book, you will gain knowledge to use Python for building fast better linear models and to apply the results in Python or in any computer language you prefer.

Style and approach

This is a practical tutorial-based book. You will be given an example problem and then supplied with the relevant code and how to walk through it. The details are provided in a step by step manner, followed by a thorough explanation of the math underlying the solution. This approach will help you leverage your own data using the same techniques.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateFeb 29, 2016

ISBN9781783980741

Author

Boschetti Alberto

Related to Regression Analysis with Python

Related ebooks

Skip carousel

Bayesian Analysis with Python
Ebook
Bayesian Analysis with Python
byOsvaldo Martin
Rating: 5 out of 5 stars
5/5
Learning Predictive Analytics with Python
Ebook
Learning Predictive Analytics with Python
byKumar Ashish
Rating: 0 out of 5 stars
0 ratings
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
Ebook
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
byStefanie Molin
Rating: 0 out of 5 stars
0 ratings
Mastering Python for Data Science
Ebook
Mastering Python for Data Science
bySamir Madhavan
Rating: 3 out of 5 stars
3/5
Learning pandas - Second Edition
Ebook
Learning pandas - Second Edition
byHeydt Michael
Rating: 4 out of 5 stars
4/5
Building Machine Learning Systems with Python
Ebook
Building Machine Learning Systems with Python
byWilli Richert
Rating: 4 out of 5 stars
4/5
Python Data Analysis
Ebook
Python Data Analysis
byIvan Idris
Rating: 4 out of 5 stars
4/5
Mastering Python Data Analysis
Ebook
Mastering Python Data Analysis
byMagnus Vilhelm Persson
Rating: 0 out of 5 stars
0 ratings
Python Data Analysis - Second Edition
Ebook
Python Data Analysis - Second Edition
byArmando Fandango
Rating: 0 out of 5 stars
0 ratings
Learning Data Mining with Python
Ebook
Learning Data Mining with Python
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
Practical Data Science Cookbook - Second Edition
Ebook
Practical Data Science Cookbook - Second Edition
byTony Ojeda
Rating: 0 out of 5 stars
0 ratings
Pandas 1.x Cookbook - Second Edition: Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python, 2nd Edition
Ebook
Pandas 1.x Cookbook - Second Edition: Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python, 2nd Edition
byMatt Harrison
Rating: 5 out of 5 stars
5/5
R Machine Learning By Example
Ebook
R Machine Learning By Example
byDipanjan Sarkar
Rating: 0 out of 5 stars
0 ratings
R for Data Science
Ebook
R for Data Science
byDan Toomey
Rating: 5 out of 5 stars
5/5
R Data Science Essentials
Ebook
R Data Science Essentials
byKoushik Raja B.
Rating: 2 out of 5 stars
2/5
Building a Recommendation System with R
Ebook
Building a Recommendation System with R
byGorakala Suresh K.
Rating: 0 out of 5 stars
0 ratings
Learning Bayesian Models with R
Ebook
Learning Bayesian Models with R
byM.Koduvely Dr. Hari
Rating: 5 out of 5 stars
5/5
Mastering Predictive Analytics with R
Ebook
Mastering Predictive Analytics with R
byRui Miguel Forte
Rating: 4 out of 5 stars
4/5
Learning Data Mining with Python - Second Edition
Ebook
Learning Data Mining with Python - Second Edition
byRobert Layton
Rating: 0 out of 5 stars
0 ratings
R High Performance Programming
Ebook
R High Performance Programming
byAloysius Lim
Rating: 4 out of 5 stars
4/5
Mastering Social Media Mining with Python
Ebook
Mastering Social Media Mining with Python
byMarco Bonzanini
Rating: 5 out of 5 stars
5/5
Designing Machine Learning Systems with Python
Ebook
Designing Machine Learning Systems with Python
byDavid Julian
Rating: 0 out of 5 stars
0 ratings
Getting Started with Python Data Analysis
Ebook
Getting Started with Python Data Analysis
byVo.T.H Phuong
Rating: 0 out of 5 stars
0 ratings
Principles of Data Science
Ebook
Principles of Data Science
bySinan Ozdemir
Rating: 4 out of 5 stars
4/5
Python for Finance Cookbook: Over 50 recipes for applying modern Python libraries to financial data analysis
Ebook
Python for Finance Cookbook: Over 50 recipes for applying modern Python libraries to financial data analysis
byEryk Lewinson
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
NumPy Essentials
Ebook
NumPy Essentials
byLeo (Liang-Huan) Chin
Rating: 0 out of 5 stars
0 ratings
Mastering Data Mining with Python – Find patterns hidden in your data
Ebook
Mastering Data Mining with Python – Find patterns hidden in your data
byMegan Squire
Rating: 0 out of 5 stars
0 ratings
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
Hands-On Genetic Algorithms with Python: Applying genetic algorithms to solve real-world deep learning and artificial intelligence problems
Ebook
Hands-On Genetic Algorithms with Python: Applying genetic algorithms to solve real-world deep learning and artificial intelligence problems
byEyal Wirsansky
Rating: 0 out of 5 stars
0 ratings

Data Visualization For You

Skip carousel

Data Analytics for Beginners: Introduction to Data Analytics
Ebook
Data Analytics for Beginners: Introduction to Data Analytics
byAnthony S. Williams
Rating: 4 out of 5 stars
4/5
How to Lie with Maps
Ebook
How to Lie with Maps
byMark Monmonier
Rating: 4 out of 5 stars
4/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
Data Science: What the Best Data Scientists Know About Data Analytics, Data Mining, Statistics, Machine Learning, and Big Data – That You Don't
Ebook
Data Science: What the Best Data Scientists Know About Data Analytics, Data Mining, Statistics, Machine Learning, and Big Data – That You Don't
byHerbert Jones
Rating: 5 out of 5 stars
5/5
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
Ebook
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
byMatt Goldwasser
Rating: 0 out of 5 stars
0 ratings
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
Ebook
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
Ebook
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
byBrent Dykes
Rating: 4 out of 5 stars
4/5
Data Visualization: A Practical Introduction
Ebook
Data Visualization: A Practical Introduction
byKieran Healy
Rating: 5 out of 5 stars
5/5
How to Become a Data Analyst: My Low-Cost, No Code Roadmap for Breaking into Tech
Ebook
How to Become a Data Analyst: My Low-Cost, No Code Roadmap for Breaking into Tech
byAnnie Nelson
Rating: 0 out of 5 stars
0 ratings
Visual Analytics with Tableau
Ebook
Visual Analytics with Tableau
byAlexander Loth
Rating: 0 out of 5 stars
0 ratings
DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
Teach Yourself VISUALLY Power BI
Ebook
Teach Yourself VISUALLY Power BI
byAlexander Loth
Rating: 0 out of 5 stars
0 ratings
R for Data Science
Ebook
R for Data Science
byDan Toomey
Rating: 5 out of 5 stars
5/5
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
Ebook
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
bySteve Wexler
Rating: 4 out of 5 stars
4/5
How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
Ebook
How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
byJohn J Burrett
Rating: 0 out of 5 stars
0 ratings
Learn D3.js: Create interactive data-driven visualizations for the web with the D3.js library
Ebook
Learn D3.js: Create interactive data-driven visualizations for the web with the D3.js library
byHelder da Rocha
Rating: 0 out of 5 stars
0 ratings
D3.js in Action: Data visualization with JavaScript
Ebook
D3.js in Action: Data visualization with JavaScript
byElijah Meeks
Rating: 0 out of 5 stars
0 ratings
Data Pipelines with Apache Airflow
Ebook
Data Pipelines with Apache Airflow
byJulian de Ruiter
Rating: 0 out of 5 stars
0 ratings
Learning Tableau 10 - Second Edition
Ebook
Learning Tableau 10 - Second Edition
byJoshua N. Milligan
Rating: 4 out of 5 stars
4/5
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
Ebook
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
byGerald Stroud
Rating: 0 out of 5 stars
0 ratings
Data Analysis with Stata
Ebook
Data Analysis with Stata
byKothari Prasad
Rating: 5 out of 5 stars
5/5
Mastering Excel: Excel Apps
Ebook
Mastering Excel: Excel Apps
byMark Moore
Rating: 3 out of 5 stars
3/5
Mastering Text Mining with R
Ebook
Mastering Text Mining with R
byAvinash Paul
Rating: 0 out of 5 stars
0 ratings
SPSS Statistics for Data Analysis and Visualization
Ebook
SPSS Statistics for Data Analysis and Visualization
byKeith McCormick
Rating: 0 out of 5 stars
0 ratings
Mastering Data Analysis with Python: A Comprehensive Guide to NumPy, Pandas, and Matplotlib
Ebook
Mastering Data Analysis with Python: A Comprehensive Guide to NumPy, Pandas, and Matplotlib
byRajender Kumar
Rating: 0 out of 5 stars
0 ratings
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
Ebook
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
byEva Murray
Rating: 0 out of 5 stars
0 ratings
Clojure Data Analysis Cookbook - Second Edition
Ebook
Clojure Data Analysis Cookbook - Second Edition
byEric Rochester
Rating: 0 out of 5 stars
0 ratings
Visualizing Graph Data
Ebook
Visualizing Graph Data
byCorey Lanum
Rating: 0 out of 5 stars
0 ratings
Fieldwork Handbook: A Practical Guide on the Go
Ebook
Fieldwork Handbook: A Practical Guide on the Go
byMarika Vertzonis
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

#77 Acing the Data Science Interview
Podcast episode
#77 Acing the Data Science Interview
byDataFramed
0 ratings
0% found this document useful
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
Podcast episode
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
byDataFramed
100%
100% found this document useful
#10 Data Science, the Environment and MOOCs: Air pollution, the environment and data science: where do these intersect? Find out in this episode of DataFramed, in which Hugo speaks with Roger Peng, Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health...
Podcast episode
#10 Data Science, the Environment and MOOCs: Air pollution, the environment and data science: where do these intersect? Find out in this episode of DataFramed, in which Hugo speaks with Roger Peng, Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health...
byDataFramed
0 ratings
0% found this document useful
Advantages of Completing Small Python Projects
Podcast episode
Advantages of Completing Small Python Projects
byThe Real Python Podcast
0 ratings
0% found this document useful
Learning Python Through Errors
Podcast episode
Learning Python Through Errors
byThe Real Python Podcast
0 ratings
0% found this document useful
Going Beyond the Basic Stuff With Python and Al Sweigart
Podcast episode
Going Beyond the Basic Stuff With Python and Al Sweigart
byThe Real Python Podcast
0 ratings
0% found this document useful
Amplifying Agility with AI and Insightful Business Analysis
Podcast episode
Amplifying Agility with AI and Insightful Business Analysis
byBusiness Analysis Live!
0 ratings
0% found this document useful
Four Most Commonly Asked Questions About AI with Dr. Jerry Smith: Dr. Jerry Smith welcomes you to another episode of AI Live and Unbiased to explore the breadth and depth of Artificial Intelligence and to encourage you to change the world, not just observe it! Dr. Jerry is talking today about questions and...
Podcast episode
Four Most Commonly Asked Questions About AI with Dr. Jerry Smith: Dr. Jerry Smith welcomes you to another episode of AI Live and Unbiased to explore the breadth and depth of Artificial Intelligence and to encourage you to change the world, not just observe it! Dr. Jerry is talking today about questions and...
byAI Live & Unbiased
0 ratings
0% found this document useful
How ChatGPT Can Supercharge Your L&D With Ross Stevenson
Podcast episode
How ChatGPT Can Supercharge Your L&D With Ross Stevenson
byThe Learning & Development Podcast
0 ratings
0% found this document useful
Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI: Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increases the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. In this episode she shares the strategic and tactical elements of how to make more effective use of the technical and organizational resources that are available to you for getting work done with data.
Podcast episode
Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI: Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increases the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. In this episode she shares the strategic and tactical elements of how to make more effective use of the technical and organizational resources that are available to you for getting work done with data.
byData Engineering Podcast
0 ratings
0% found this document useful
Quantifying The Return On Investment For Your Data Team: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
Podcast episode
Quantifying The Return On Investment For Your Data Team: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
byData Engineering Podcast
0 ratings
0% found this document useful
Composable Data Analytics
Podcast episode
Composable Data Analytics
byThe Cloudcast
0 ratings
0% found this document useful
Episode 170: The Power of Workshops with Jonathan Courtney: How can you use workshops to level up your design practice? Why are they exponentially better than regular meetings? Our guest today is Jonathan Courtney, CEO of AJ&Smart and the author of The Workshopper Playbook. You’ll learn how they transformed their business using workshops, why this collaboration method is so powerful, and how to get started with your first workshop.
Podcast episode
Episode 170: The Power of Workshops with Jonathan Courtney: How can you use workshops to level up your design practice? Why are they exponentially better than regular meetings? Our guest today is Jonathan Courtney, CEO of AJ&Smart and the author of The Workshopper Playbook. You’ll learn how they transformed their business using workshops, why this collaboration method is so powerful, and how to get started with your first workshop.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Build Your Second Brain One Piece At A Time: Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.
Podcast episode
Build Your Second Brain One Piece At A Time: Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.
byData Engineering Podcast
0 ratings
0% found this document useful
Data jobs: Interview with data & machine learning expert Catherine Lopes PhD (Ep 42): Who would have thought that 2020 would be the year of data charts? That we would be glued to the daily news like never before, anxiously waiting to see more and more charts, expecting data analysts to tell us which way curves, bars, and pie charts ar...
Podcast episode
Data jobs: Interview with data & machine learning expert Catherine Lopes PhD (Ep 42): Who would have thought that 2020 would be the year of data charts? That we would be glued to the daily news like never before, anxiously waiting to see more and more charts, expecting data analysts to tell us which way curves, bars, and pie charts ar...
byThe Job Hunting Podcast
0 ratings
0% found this document useful
EP 121: Faster and More Accurate Results From ChatGPT with ScholarAI
Podcast episode
EP 121: Faster and More Accurate Results From ChatGPT with ScholarAI
byEveryday AI Podcast – An AI and ChatGPT Podcast
0 ratings
0% found this document useful
#140 Isabelle Guyon: The Future of AI and Support Vector Machines: This episode is sponsored by MindStudio by YouAi. MindStudio is the best way to build an AI business. Start driving some serious revenue before everyone else. Mind Studio allows you to use conversational language to program incredibly powerful AI...
Podcast episode
#140 Isabelle Guyon: The Future of AI and Support Vector Machines: This episode is sponsored by MindStudio by YouAi. MindStudio is the best way to build an AI business. Start driving some serious revenue before everyone else. Mind Studio allows you to use conversational language to program incredibly powerful AI...
byEye On A.I.
0 ratings
0% found this document useful
Data Journeys with Bruno Aziza: On the show this week, and share two popular episodes of ’s YouTube series Data Journeys. First up, Bruno talks with Aaron Biller of Postmates about their triangle of complex data that includes customer, courier, and merchant. He details their...
Podcast episode
Data Journeys with Bruno Aziza: On the show this week, and share two popular episodes of ’s YouTube series Data Journeys. First up, Bruno talks with Aaron Biller of Postmates about their triangle of complex data that includes customer, courier, and merchant. He details their...
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
Synthetic data and the next generation of AI creativity
Podcast episode
Synthetic data and the next generation of AI creativity
byTechnology Now
0 ratings
0% found this document useful
#71 Scaling Machine Learning Adoption: A Pragmatic Approach
Podcast episode
#71 Scaling Machine Learning Adoption: A Pragmatic Approach
byDataFramed
0 ratings
0% found this document useful
Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.
Podcast episode
Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.
byData Engineering Podcast
0 ratings
0% found this document useful
2719: Redefining AI's Learning Curve: The Art of Unlearning: Today, I welcome Zuzanna Stamirowska, CEO and co-founder of Pathway, a pioneering French tech startup. Zuzanna, celebrated for her achievements including the Female Founders Challenge award at VivaTech, joins us to explore the innovative concept of...
Podcast episode
2719: Redefining AI's Learning Curve: The Art of Unlearning: Today, I welcome Zuzanna Stamirowska, CEO and co-founder of Pathway, a pioneering French tech startup. Zuzanna, celebrated for her achievements including the Female Founders Challenge award at VivaTech, joins us to explore the innovative concept of...
byThe Tech Talks Daily Podcast
0 ratings
0% found this document useful
Looking Back at AI in 2021 with Jeremie from Towards Data Science: For our first episode in 2022, we are joined with our friends from the Towards Data Science podcast to discuss our thoughts about the AI-related trends and events that happened in 2021. Some things we discuss are: Foundation models continue to grow, ...
Podcast episode
Looking Back at AI in 2021 with Jeremie from Towards Data Science: For our first episode in 2022, we are joined with our friends from the Towards Data Science podcast to discuss our thoughts about the AI-related trends and events that happened in 2021. Some things we discuss are: Foundation models continue to grow, ...
byLast Week in AI
0 ratings
0% found this document useful
Is data science something for you?: Interview with Cytel statisticians Yannis Jemiai and Rajat Mukherjee
Podcast episode
Is data science something for you?: Interview with Cytel statisticians Yannis Jemiai and Rajat Mukherjee
byThe Effective Statistician - in association with PSI
0 ratings
0% found this document useful
BDTP. Data Activation for SaaS with David Sepulveda: What processes are involved in activating your data? In this episode, we talk to David Sepulveda, Head of Data at Kumospace. You'll learn why data analysts need to work closely with research and product teams, how to distinguish correlation from causation, how much data you need to track, and more.
Podcast episode
BDTP. Data Activation for SaaS with David Sepulveda: What processes are involved in activating your data? In this episode, we talk to David Sepulveda, Head of Data at Kumospace. You'll learn why data analysts need to work closely with research and product teams, how to distinguish correlation from causation, how much data you need to track, and more.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Explainable/Interpretable AI: In this episode, we will continue with part two of a four-part series looking at Responsible AI (Listen to part one: ). One of the major challenges with effectively developing, deploying, and managing AI systems are often related to the “black...
Podcast episode
Explainable/Interpretable AI: In this episode, we will continue with part two of a four-part series looking at Responsible AI (Listen to part one: ). One of the major challenges with effectively developing, deploying, and managing AI systems are often related to the “black...
byGARP Risk Podcast
0 ratings
0% found this document useful
Big Data, Data Lakes, and Blockchain with Rahul Pathak, Executive at Amazon Web Services: Everyone knows that data is exploding. What most people don’t realize is the pace and ways in which data is changing our everyday lives. According to , we’re seeing a “roughly 10x increase in data every 5 years, and the types of data that’s...
Podcast episode
Big Data, Data Lakes, and Blockchain with Rahul Pathak, Executive at Amazon Web Services: Everyone knows that data is exploding. What most people don’t realize is the pace and ways in which data is changing our everyday lives. According to , we’re seeing a “roughly 10x increase in data every 5 years, and the types of data that’s...
byMission Daily
0 ratings
0% found this document useful
Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic: Business intellingence has been chasing the promise of self-serve data for decades. As the capabilities of these systems has improved and become more accessible, the target of what self-serve means changes. With the availability of AI powered by large language models combined with the evolution of semantic layers, the team at Zenlytic have taken aim at this problem again. In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work.
Podcast episode
Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic: Business intellingence has been chasing the promise of self-serve data for decades. As the capabilities of these systems has improved and become more accessible, the target of what self-serve means changes. With the availability of AI powered by large language models combined with the evolution of semantic layers, the team at Zenlytic have taken aim at this problem again. In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work.
byData Engineering Podcast
0 ratings
0% found this document useful
Accelerating the Shift from Enablers to Adopters of AI
Podcast episode
Accelerating the Shift from Enablers to Adopters of AI
byThoughts on the Market
0 ratings
0% found this document useful

Skip carousel

Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Jul 10, 2018
3 min read
Scikit-Learn: The Ultimate Python Library
APC
Article
Scikit-Learn: The Ultimate Python Library
Jul 15, 2019
4 min read
01 Giving Data Collectors—and Donors—a Real-Time Rush
Fast Company
Article
01 Giving Data Collectors—and Donors—a Real-Time Rush
Mar 20, 2017
7 min read
How Image Recognition Works
APC
Article
How Image Recognition Works
Nov 4, 2019
4 min read
» Stochastic Algorithms
Linux Format
Article
» Stochastic Algorithms
Dec 14, 2021
If you’re up for some relatively maths-heavy computer-science reading (and who isn’t?), then consider looking into stochastic algorithms. Sometimes lumped together with machine-learning, stochastic algorithms is a loosely defined category that you co
1 min read
Why We Need To Fear The Risk Of AI Model Collapse
Evening Standard
Article
Why We Need To Fear The Risk Of AI Model Collapse
Dec 17, 2023
4 min read
Adoption of Cognitive Computing Across Various Industries
Techfastly
Article
Adoption of Cognitive Computing Across Various Industries
Dec 1, 2021
5 min read
The Deep Learning Revolution For Artificial Intelligence
Facility Management
Article
The Deep Learning Revolution For Artificial Intelligence
Mar 28, 2019
3 min read
Quantum Leap
Marketing
Article
Quantum Leap
Jul 11, 2019
6 min read
“Reputations Are Going To Be Staked On How ‘The Computer’ Goes About Making Decisions”
PC Pro Magazine
Article
“Reputations Are Going To Be Staked On How ‘The Computer’ Goes About Making Decisions”
Jun 10, 2021
We live lonely lives here sometimes. The type of critic who sees patterns in everything loves to tell me that I’m in the pockets of PC Pro advertisers, and that we all toe the party line – most recently over systems such as the Raspberry Pi 400 or an
6 min read
Things Get Strange When AI Starts Training Itself
The Atlantic
Article
Things Get Strange When AI Starts Training Itself
Feb 16, 2024
7 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
The Risks Of The Generative AI Gold Rush
APC
Article
The Risks Of The Generative AI Gold Rush
May 22, 2023
8 min read
How Can AI Help Your Business?
PC Pro Magazine
Article
How Can AI Help Your Business?
Jun 8, 2023
7 min read
Questions for Angela Zutavern, Machine Intelligence Expert, Booz Allen Hamilton
Rotman Management
Article
Questions for Angela Zutavern, Machine Intelligence Expert, Booz Allen Hamilton
Jan 1, 2018
You believe that the world of leadership has hit an inflection point. How so? As useful as popular mental models and heuristics are, machine models now outstrip human performance in about half of the portfolio of cognitive tasks. Going forward, we wi
6 min read
The Risks Of The Generative AI Gold Rush
PC Pro Magazine
Article
The Risks Of The Generative AI Gold Rush
Apr 6, 2023
8 min read
Decoding The Impact Of AI
Her World Singapore
Article
Decoding The Impact Of AI
May 5, 2023
6 min read
What Have Humans Just Unleashed?
The Atlantic
Article
What Have Humans Just Unleashed?
Mar 16, 2023
9 min read
Family History In The AI Era
Family Tree UK
Article
Family History In The AI Era
Apr 12, 2024
7 min read
AI – Turn Buzz Into Biz
Facility Management
Article
AI – Turn Buzz Into Biz
Dec 23, 2018
4 min read
The Machine Learning Revolution
APC
Article
The Machine Learning Revolution
Sep 6, 2021
8 min read
Pandora’s Box? Unleashing The Power Of AI
NZ Marketing
Article
Pandora’s Box? Unleashing The Power Of AI
Jun 22, 2023
8 min read
01 Ready Or Not, AI Is Here To Assist You
HWM Singapore
Article
01 Ready Or Not, AI Is Here To Assist You
Jul 11, 2023
4 min read
Leadership Forum: Investing in Disruption
Rotman Management
Article
Leadership Forum: Investing in Disruption
Jan 1, 2019
10 min read
Smart Answers: GenAI Tool Makes It Easier To Find The Info You Need On PCWorld
PCWorld
Article
Smart Answers: GenAI Tool Makes It Easier To Find The Info You Need On PCWorld
Sep 5, 2023
4 min read
ARTIFICIAL INTELLIGENCE (AI) IN SUPPLY CHAIN PLANNING THE Future is Here & Now
The European Business Review
Article
ARTIFICIAL INTELLIGENCE (AI) IN SUPPLY CHAIN PLANNING THE Future is Here & Now
Dec 3, 2019
7 min read
This PC Does Not Exist
Maximum PC
Article
This PC Does Not Exist
May 23, 2023
7 min read
Embracing AI in Financial Services
Rotman Management
Article
Embracing AI in Financial Services
Jan 1, 2020
You are the Chief Science Officer at RBC and you also oversee its AI research institute. Describe the bank’s interest in this arena. There are many aspects to our interest in AI. First of all, financial services is a very data-driven business. From t
6 min read
Why a Hedge Fund Started a Video Game Competition
Nautilus
Article
Why a Hedge Fund Started a Video Game Competition
Nov 30, 2017
There’s a weird way in which a hedge fund is a confluence of everything. There’s the money of course—Two Sigma, located in lower Manhattan, manages over $50 billion, an amount that has grown 600 percent in 6 years and is roughly the size of the econo
9 min read

Related categories

Skip carousel

Reviews for Regression Analysis with Python

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Regression Analysis with Python - Boschetti Alberto

Regression Analysis with Python

Credits

About the Authors

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Regression – The Workhorse of Data Science

Regression analysis and data science

Exploring the promise of data science

The challenge

The linear models

What you are going to find in the book

Python for data science

Installing Python

Choosing between Python 2 and Python 3

Step-by-step installation

Installing packages

Package upgrades

Scientific distributions

Introducing Jupyter or IPython

Python packages and functions for linear models

NumPy

SciPy

Statsmodels

Scikit-learn

Summary

2. Approaching Simple Linear Regression

Defining a regression problem

Linear models and supervised learning

Reflecting on predictive variables

Reflecting on response variables

The family of linear models

Preparing to discover simple linear regression

Starting from the basics

A measure of linear relationship

Extending to linear regression

Regressing with Statsmodels

The coefficient of determination

Meaning and significance of coefficients

Evaluating the fitted values

Correlation is not causation

Predicting with a regression model

Regressing with Scikit-learn

Minimizing the cost function

Explaining the reason for using squared errors

Pseudoinverse and other optimization methods

Gradient descent at work

Summary

3. Multiple Regression in Action

Using multiple features

Model building with Statsmodels

Using formulas as an alternative

The correlation matrix

Revisiting gradient descent

Feature scaling

Unstandardizing coefficients

Estimating feature importance

Inspecting standardized coefficients

Comparing models by R-squared

Interaction models

Discovering interactions

Polynomial regression

Testing linear versus cubic transformation

Going for higher-degree solutions

Introducing underfitting and overfitting

Summary

4. Logistic Regression

Defining a classification problem

Formalization of the problem: binary classification

Assessing the classifier's performance

Defining a probability-based approach

More on the logistic and logit functions

Let's see some code

Pros and cons of logistic regression

Revisiting gradient descent

Multiclass Logistic Regression

An example

Summary

5. Data Preparation

Numeric feature scaling

Mean centering

Standardization

Normalization

The logistic regression case

Qualitative feature encoding

Dummy coding with Pandas

DictVectorizer and one-hot encoding

Feature hasher

Numeric feature transformation

Observing residuals

Summarizations by binning

Missing data

Missing data imputation

Keeping track of missing values

Outliers

Outliers on the response

Outliers among the predictors

Removing or replacing outliers

Summary

6. Achieving Generalization

Checking on out-of-sample data

Testing by sample split

Cross-validation

Bootstrapping

Greedy selection of features

The Madelon dataset

Univariate selection of features

Recursive feature selection

Regularization optimized by grid-search

Ridge (L2 regularization)

Grid search for optimal parameters

Random grid search

Lasso (L1 regularization)

Elastic net

Stability selection

Experimenting with the Madelon

Summary

7. Online and Batch Learning

Batch learning

Online mini-batch learning

A real example

Streaming scenario without a test set

Summary

8. Advanced Regression Methods

Least Angle Regression

Visual showcase of LARS

A code example

LARS wrap up

Bayesian regression

Bayesian regression wrap up

SGD classification with hinge loss

Comparison with logistic regression

SVR

SVM wrap up

Regression trees (CART)

Regression tree wrap up

Bagging and boosting

Bagging

Boosting

Ensemble wrap up

Gradient Boosting Regressor with LAD

GBM with LAD wrap up

Summary

9. Real-world Applications for Regression Models

Downloading the datasets

Time series problem dataset

Regression problem dataset

Multiclass classification problem dataset

Ranking problem dataset

A regression problem

Testing a classifier instead of a regressor

An imbalanced and multiclass classification problem

A ranking problem

A time series problem

Open questions

Summary

Index

Regression Analysis with Python

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2016

Production reference: 1250216

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78528-631-5

www.packtpub.com

Credits

Authors

Luca Massaron

Alberto Boschetti

Reviewers

Giuliano Janson

Zacharias Voulgaris

Commissioning Editor

Kunal Parikh

Acquisition Editor

Sonali Vernekar

Content Development Editor

Siddhesh Salvi

Technical Editor

Shivani Kiran Mistry

Copy Editor

Stephen Copestake

Project Coordinator

Nidhi Joshi

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Disha Haria

Production Coordinator

Nilesh Mohite

Cover Work

Nilesh Mohite

About the Authors

Luca Massaron is a data scientist and a marketing research director who is specialized in multivariate statistical analysis, machine learning, and customer insight with over a decade of experience in solving real-world problems and in generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of Web audience analysis in Italy to achieving the rank of a top ten Kaggler, he has always been very passionate about everything regarding data and its analysis and also about demonstrating the potential of data-driven knowledge discovery to both experts and non-experts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science just by doing the essentials.

I would like to thank Yukiko and Amelia for their support, help, and loving patience.

Alberto Boschetti is a data scientist, with an expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces daily challenges that span from natural language processing (NLP) and machine learning to distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

I would like to thank my family, my friends and my colleagues. Also, big thanks to the Open Source community.

About the Reviewers

Giuliano Janson's professional experiences have centered on advanced analytics and applied machine learning in healthcare. His work is primarily focused on extracting information from large, dirty, and noisy data utilizing machine learning, stats, Monte Carlo simulation, and data visualization to identify business opportunities and help leadership make data-driven decisions through actionable analytics.

I'd like to thank my wife, Magda, and my two beautiful children, Alex and Emily, for all the love they share.

Zacharias Voulgaris is a data scientist and technical author specializing in data science books. He has an engineering and management background, with post-graduate studies in information systems and machine learning. Zacharias has worked as a research fellow in Georgia Tech, investigating and applying machine learning technologies to real-world problems, as an SEO manager in an e-marketing company in Europe, as a program manager in Microsoft, and as a data scientist in US Bank and G2 Web Services.

Dr. Voulgaris has also authored technical books, the most notable of which is Data Scientist: The Definitive Guide to Becoming a Data Scientist, Technics Publications, and is currently working on Julia for Data Science, Manning Publications. He has also written a number of data science-related articles on blogs and participates in various data science/machine learning meet-up groups. Finally, he has provided technical editorial aid in the book Python Data Science Essentials, Packt Publishing, by the same authors as this book.

I would like to express my gratitude to the authors of the book for giving me the opportunity to contribute to this project. Also, I'd like to thank Bastiaan Sjardin for introducing me to them and to the world of technical editing.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Preface

Linear models have been known to scholars and practitioners and studied by them for a long time now. Before they were adopted into data science and placed into the syllabi of numerous boot camps and in the early chapters of many practical how-to-do books, they have been a prominent and relevant element of the body of knowledge of statistics, economics, and of many other respectable quantitative fields of study.

Consequently, there is a vast availability of monographs, book chapters, and papers about linear regression, logistic regression (its classification variant), and the different types of generalized linear models; models where the original linear regression paradigm is adapted in its formulation in order to solve more complex problems.

Yet, in spite of such an embarrassment of riches, we have never encountered any book that really explains the speed and ease of implementation of such linear models when, as a developer or a data scientist, you have to quickly create an application or API whose response cannot be defined programmatically but it does have to learn from data.

Of course we are very well aware of the limitations of linear models (being simple unfortunately has some drawbacks) and we also know how there is no fixed solution for any data science problem; however, our experience in the field has told us that the following advantages of a linear model cannot be easily ignored:

It's easy to explain how it works to yourself, to the management, or to anyone

It's flexible in respect of your data problem, since it can handle numeric and probability estimates, ranking, and classification up to a large number of classes

It's fast to train, no matter what the amount of data you have to process

It's fast and easy to implement in any production environment

It's scalable to real-time response toward users

If for you, as it is daily for us, it is paramount to deliver value from data in a fast and tangible way, just follow us and discover how far linear model can help you get to.

What this book covers

Chapter 1, Regression – The Workhorse of Data Science, introduces why regression is indeed useful for data science, how to quickly set up Python for data science and provides an overview of the packages used throughout the book with the help of examples. At the end of this chapter, we will be able to run all the examples contained in the following chapters. You will have clear ideas and motivations why regression analysis is not just an underrated technique taken from statistics but a powerful and effective data science algorithm.

Chapter 2, Approaching Simple Linear Regression, presents the simple linear regression by first describing a regression problem, where to fit a regressor, and then giving some intuitions underneath the math formulation of its algorithm. Then, you will learn how to tune the model for higher performances and understand every parameter of it, deeply. Finally, the engine under the hood the gradient descent will be described.

Chapter 3, Multiple Regression in Action, extends the simple linear regression to extract predictive information from more than a feature and create models that can solve real-life prediction tasks. The stochastic gradient descent technique, explained in the previous chapter, will be powered up to cope with a matrix of features and to complete the overview, you will be shown multi-collinearity, interactions, and polynomial regression topics.

Chapter 4, Logistic Regression, continues laying down the foundations of your knowledge of linear model. Starting from the necessary mathematical definitions, it demonstrates how to furthermore extend the linear regression to classification problems, both binary and multiclass.

Chapter 5, Data Preparation, discusses about the data feeding the model, describing what can be done to prepare the data in the best way and how to deal with unusual situations, especially when data is missing and outliers are present.

Chapter 6, Achieving Generalization, will introduce you to the key data science recipes for testing your model thoroughly, tune it at its best, make it parsimonious, and to put it against real fresh data, before proceeding to more complex techniques.

Chapter 7, Online and Batch Learning, illustrates the best practices to train classifiers on Big Data; it first focuses on batch learning and its limitations and then introduces online learning. Finally, you will be showed an example of Big Data, combining the benefits of online learning and the power of the hashing trick.

Chapter 8, Advanced Regression Methods, introduces some advanced methods for regression. Without getting too deep into their mathematical formulation, but always keeping an eye on practical applications, we will discuss the ideas underneath Least Angle Regression, Bayesian Regression, and stochastic gradient descent with hinge loss, and also touch upon bagging and boosting techniques.

Chapter 9, Real-world Applications for Regression Models, comprises of four practical examples of real-world data science problems solved by linear models. The ultimate goal is to demonstrate how to approach such problems and how develop the reasoning around their resolution, so that they can be used as blueprints for similar challenges you'll encounter.

What you need for this book

The execution of the code examples provided in this book requires an installation of Python 3.4.3 or newer on Mac OS X, Linux, or Microsoft Windows.

The code presented throughout the book will also make frequent use of Python's essential libraries for scientific and statistical computing such as SciPy, NumPy, Scikit-learn, Statsmodels, to a minor extent, matplotlib, and pandas.

Apart from taking advantage of Scientific distributions (such as Anaconda from Continuum Analytics) that can save you a lot of time from the time-consuming operations in order to create a working environment, we also suggest you to adopt Jupyter and its IPython Notebooks, as a more productive and scientific-friendly way to code your linear models in Python.

The first chapter will provide you with the step-by-step instructions and some useful tips to set up your Python environment, these core libraries and all the necessary tools.

Who this book is for

First, this book is addressed to Python developers with at least a basic understanding of data science, statistics, and math. Though the book doesn't require a previous background in data science or statistics, it can well address data scientists of any seniority who intend to learn how to best do regression analysis on a dataset. We imagine that you want to put intelligence in your data products but you don't want to depend on any black box. Therefore, you prefer a technique which is simple, understandable, and yet effective to be production grade. Through the book, we will provide you with the knowledge to use Python for building fast and better linear models and to deploy the resulting models in Python or in any computer language you prefer.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: When inspecting the linear model, first check the coef_ attribute.

A block of code is set as follows:

from sklearn import datasets

iris = datasets.load_iris()

Since we will be using IPython Notebooks along most of the examples, expect to have always an input (marked as In:) and often an output (marked Out:) from the cell containing the block of code. On your computer you have just to input the code after the In: and check if results correspond to the Out: content:

In: clf.fit(X, y)

Out: SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0, kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)

When a command should be given in the terminal command line, you'll find the command with the prefix $>, otherwise, if it's for the Python REPL it will be preceded by >>>:

$>python

>>> import sys

>>> print sys.version_info

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Go to the Databases section and create a new database using the UTF collation.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/RegressionAnalysisWithPython_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Regression – The Workhorse of Data Science

Welcome to this presentation on the workhorse of data science, linear regression, and its related family of linear models.

Nowadays, interconnectivity and data explosion are realities that open a world of new opportunities for every business that can read and interpret data in real time. Everything is facilitating the production and diffusion of data: the omnipresent Internet diffused both at home and at work, an army of electronic devices in the pockets of large portions of the population, and the pervasive presence of software producing data about every process and event. So much data is generated daily that humans cannot deal with it because of its volume, velocity, and variety. Thus, machine learning and AI are on the rise.

Coming from a long and glorious past in the field of statistics and econometrics, linear regression, and its derived methods, can provide you with a simple, reliable, and effective tool to learn from data and act on it. If carefully trained with the right data, linear methods can compete well against the most complex and fresh AI technologies, offering you unbeatable ease of implementation and scalability for increasingly large problems.

In this chapter, we will explain:

Why linear models can be helpful as models to be evaluated in a data science pipeline or as a shortcut for the immediate development of a scalable minimum viable product

Some quick indications for installing Python and setting it up for data science tasks

The necessary modules for implementing linear models in Python

Regression analysis and data science

Imagine you are a developer hastily working on a very cool application that is going to serve thousands of customers using your company's website everyday. Using the available information about customers in your data warehouse, your application is expected to promptly provide a pretty smart and not-so-obvious answer. The answer unfortunately cannot easily be programmatically predefined, and thus will require you to adopt a learning-from-data approach, typical of data science or predictive analytics.

In this day and age, such applications are quite frequently found assisting numerous successful ventures on the Web, for instance:

In the advertising business, an application delivering targeted advertisements

In e-commerce, a batch application filtering customers to make more relevant commercial offers or an online app recommending products to buy on the basis of ephemeral data such as navigation records

In the credit or insurance business, an application selecting whether to proceed with online inquiries from users, basing its judgment on their credit rating and past relationship with the company

There are numerous other possible examples, given the constantly growing number of use cases about machine learning applied to business problems. The core idea of all these applications is that you don't need to program how your application should behave, but you just set some desired behaviors by providing useful examples. The application will learn by itself what to do in any circumstance.

After you are clear about the purpose of your application and decide to use the learning-from-data approach, you are confident that you don't have to reinvent the wheel. Therefore, you jump into reading tutorials and documentation about data science and machine learning solutions applied to problems similar to yours (they could be papers, online blogs, or books talking about data science, machine learning, statistical learning, and predictive analytics).

After reading a few pages, you will surely be exposed to the wonders of many complex machine

Enjoying the preview?

Page 1 of 1

Regression Analysis with Python

About this ebook

Boschetti Alberto

Read more from Boschetti Alberto

Related authors

Related to Regression Analysis with Python

Related ebooks

Data Visualization For You

Related podcast episodes

Related articles

Related categories

Reviews for Regression Analysis with Python

What did you think?

Book preview

Regression Analysis with Python - Boschetti Alberto

Table of Contents

Regression Analysis with Python

Regression Analysis with Python

Credits

About the Authors

About the Reviewers

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Note

Tip

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Chapter 1. Regression – The Workhorse of Data Science

Regression analysis and data science