Ebook · 884 pages · 5 hours

Practical Data Analysis Cookbook

Name: Practical Data Analysis Cookbook
Author: Tomasz Drabas
ISBN: 9781783558513

About this ebook

About This Book

Clean dirty data, extract accurate information, and explore the relationships between variables
Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn
Find and extract the most important features from your dataset using the most efficient Python libraries
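As a rough illustration of the kinds of tasks listed above, here is a minimal sketch on a tiny made-up table. It is not code from the book. It cleans the data by dropping duplicate rows and filling gaps with the column median, explores relationships with Pearson correlation, ranks features by absolute correlation with the target, and fits a one-variable least-squares forecast. The column names (`temperature`, `humidity`, `output_mw`) and all values are hypothetical. Although the book covers pandas, NumPy, Statsmodels, and scikit-learn, this sketch deliberately uses only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy readings; column names and values are illustrative only.
raw = pd.DataFrame({
    "temperature": [14.9, 25.2, None, 5.1, 25.2, 30.0],
    "humidity":    [73.0, 59.1, 92.1, 92.1, 59.1, 40.0],
    "output_mw":   [463.3, 444.4, 488.6, 488.6, 444.4, 431.0],
})

# 1. Clean: drop exact duplicate rows, then fill remaining gaps with each column's median.
clean = raw.drop_duplicates().reset_index(drop=True)
clean = clean.fillna(clean.median(numeric_only=True))

# 2. Explore: pairwise Pearson correlations between all variables.
corr = clean.corr()

# 3. Rank features by absolute correlation with the target (a crude filter method).
ranking = corr["output_mw"].drop("output_mw").abs().sort_values(ascending=False)

# 4. Forecast: least-squares line on the top-ranked feature,
#    evaluated at a hypothetical new reading (here, the feature's mean).
top = ranking.index[0]
slope, intercept = np.polyfit(clean[top], clean["output_mw"], deg=1)
new_reading = clean[top].mean()
forecast = slope * new_reading + intercept

print(ranking)
print(f"forecast output at {top}={new_reading:.2f}: {forecast:.1f} MW")
```

Ranking features by correlation is only a crude filter. It ignores interactions and non-linear effects, which model-based feature selection and a proper time-series or regression model would handle better.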

Who This Book Is For

This book is for anyone who wants to enter the data science field and build their skills through examples that tackle problems faced in the corporate world. More experienced practitioners may also find some of the examples refreshing and the advanced topics interesting.

Language: English
Publisher: Packt Publishing
Release date: Apr 29, 2016

Related to Practical Data Analysis Cookbook

Related ebooks

Python Data Analysis Cookbook
by Ivan Idris · Rating: 5/5
Python Data Visualization Cookbook - Second Edition
by Igor Milovanović · No ratings yet
R: Data Analysis and Visualization
by Brett Lantz · Rating: 5/5
Python Business Intelligence Cookbook
by Robert Dempsey · No ratings yet
R Data Visualization Cookbook
by Atmajitsinh Gohil · No ratings yet
Microsoft Tabular Modeling Cookbook
by Paul te Braak · No ratings yet
matplotlib Plotting Cookbook
by Alexandre Devert · Rating: 5/5
Python Data Visualization Cookbook
by Igor Milovanović · Rating: 4/5
R: Recipes for Analysis, Visualization and Machine Learning
by Atmajitsinh Gohil · No ratings yet
Tableau Cookbook – Recipes for Data Visualization
by Shweta Sankhe-Savale · No ratings yet
Tableau 10 Business Intelligence Cookbook
by Donabel Santos · No ratings yet
MDX with Microsoft SQL Server 2016 Analysis Services Cookbook - Third Edition
by Sherry Li · No ratings yet
Python Machine Learning Cookbook
by Prateek Joshi · No ratings yet
Learning Predictive Analytics with Python
by Ashish Kumar · No ratings yet
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
by Matt Goldwasser · No ratings yet
Mastering Python Data Analysis
by Magnus Vilhelm Persson · No ratings yet
Learning pandas
by Michael Heydt · Rating: 4/5
Python Data Science Essentials - Second Edition
by Alberto Boschetti · Rating: 4/5
Python Data Science Essentials
by Alberto Boschetti · No ratings yet
Mastering Python for Data Science
by Samir Madhavan · Rating: 3/5
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python
by Stefanie Molin · No ratings yet
Data Analysis with R
by Tony Fischetti · Rating: 5/5
Big Data Analytics with R
by Simon Walkowiak · No ratings yet
Python Data Analysis - Second Edition
by Armando Fandango · No ratings yet
Real-Time Big Data Analytics
by Shilpi · Rating: 5/5
Data Analysis Using SQL and Excel
by Gordon S. Linoff · Rating: 3/5
Mastering Social Media Mining with Python
by Marco Bonzanini · Rating: 5/5
RStudio for R Statistical Computing Cookbook
by Andrea Cirillo · No ratings yet
Python: Real-World Data Science
by Robert Layton · No ratings yet
Practical Data Analysis
by Hector Cuesta · Rating: 4/5

Computers For You

Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
by Steven Cooper · Rating: 4/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
by Nigel Tillery · No ratings yet
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
by TJ Books · No ratings yet
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
by Cea West · Rating: 5/5
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
by Alex Parkinson · Rating: 4/5
Network+ Study Guide & Practice Exams
by Robert Shimonski · Rating: 4/5
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
by Hadelin de Ponteves · No ratings yet
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
by Aaron Smith · No ratings yet
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
by Cea West · Rating: 4/5
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
by Triumph Books · Rating: 4/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
by Walter Shields · Rating: 4/5
Deep Search: How to Explore the Internet More Effectively
by Alan Pearce · Rating: 5/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
by Steven Cooper · Rating: 4/5
Ultimate Guide to Mastering Command Blocks!: Minecraft Keys to Unlocking Secret Commands
by Triumph Books · Rating: 5/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
by Aditya Bhargava · Rating: 4/5
CompTIA Security+ Practice Questions
by IP Specialist · Rating: 2/5
AP Computer Science Principles Premium, 2024: 6 Practice Tests + Comprehensive Review + Online Practice
by Seth Reichelson · No ratings yet
Remote/WebCam Notarization : Basic Understanding
by Jeannie Eunice Franks · Rating: 3/5
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
by Quentin Docter · No ratings yet
Childhood Unplugged: Practical Advice to Get Kids Off Screens and Find Balance
by Katherine Johnson Martinko · No ratings yet
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
by Rizwan Virk · Rating: 5/5
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
by Maximus Wilson · No ratings yet
Elon Musk
by Walter Isaacson · Rating: 4/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
by Arthur T. Brooks · No ratings yet
Practical Lock Picking: A Physical Penetration Tester's Training Guide
by Deviant Ollam · Rating: 5/5
Dark Aeon: Transhumanism and the War Against Humanity
by Joe Allen · Rating: 5/5
The Professional Voiceover Handbook: Voiceover training, #1
by Peter Baker · Rating: 5/5
Master Builder Roblox: The Essential Guide
by Triumph Books · Rating: 4/5
Hacking: Ultimate Beginner's Guide for Computer Hacking in 2018 and Beyond: Hacking in 2018, #1
by Dexter Jackson · Rating: 4/5
How to Write a Book: An 11-Step Process to Build Habits, Stop Procrastinating, Fuel Self-Motivation, Quiet Your Inner Critic, Bust Through Writer's Block, & Let Your Creative Juices Flow (Short Read)
by David Kadavy · Rating: 5/5

Related podcast episodes

[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
by DataFramed
#1 Data Science, Past, Present and Future: Hilary Mason talks about the past, present, and future of data science with Hugo. Hilary is the VP of Research at Cloudera Fast Forward, a machine intelligence research company, and the data scientist in residence at Accel. If you want to hear about wh...
by DataFramed
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
by Analytics on Fire
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
by Invest Like the Best with Patrick O'Shaughnessy
040: Graph Databases: Traditional relational databases like MySQL or Postgres are really good at providing many solutions to the problem of persisting state. But these types of database are really horrible at querying highly connected models in an efficient way. Graph datab...
by PHPRoundtable Podcast
#63 The Past and Present of Data Science
by DataFramed
It’s Not a Data Science Problem, It’s a Data Engineering Problem with Laurie Voss: Laurie Voss is a senior data analyst at Netlify, makers of a serverless platform designed to help teams build, deploy, and collaborate on web apps more effectively. Previously, Laurie worked as Chief Data Officer at npm, Inc., co-founded Snowball Factory,
by Screaming in the Cloud
#75 The Data Storytelling Skills Data Teams Need with Andy Cotgreave, Technical Evangelist at Tableau
by DataFramed
007: Data Cleansing & Analysis with Oz du Soleil: Oz du Soleil is an Excel MVP since 2015 and is an expert in data cleansing & analysis. He has an Excel blog over at www.datascopic.net which is his commitment to data literacy. He’s the leading author on the revised version of Guerrilla Data...
by Learn Microsoft Excel with MyExcelOnline
#69 Effective Data Storytelling: How to Turn Insights into Action
by DataFramed
[DataFramed Careers Series #1] Launching a Data Career in 2022
by DataFramed
Gitting After It with Katie Sylor-Miller: Katie Sylor-Miller is a frontend architect at Etsy, a company she joined in November 2015. Prior to this position, Katie worked as a senior front end developer at Constant Contact, a technical lead at EF Education, a front end web developer at Miller Syst
by Screaming in the Cloud
#300: Bali Special | Sim Khela - Future of Blockchain and New Earth: Born in India, raised in California and educated in Calgary, Sim Khela is a jack of all trades and master of some. After studying Communications and Culture at University of Calgary he moved on to study Electronics Engineering at South Alberta...
by 10 Million Journey
The Value of Analysts and Observability with Nick Heudecker: Nick Heudecker, who leads Market Strategy and Competitive Intelligence at Cirbl, joins Corey who, as it turns out, has some similarities with Corey. Nick also spent some time in Maine, as a cryptologist for the Navy, and also spent the months of deep wint
by Screaming in the Cloud
Ep. 39 - Lean and Process Mining: Sebastian Kotulla
by What's Your Baseline? Enterprise Architecture & Business Process Management Demystified
Predict Your Future (and Make Your CFO Happy): Join Pete and Jesse as they talk about the important role tagging plays in influencing DevOps, why tagging strategies need to change over time, why improving your organization's tagging strategy isn't an overnight fix, how tagging is all about cost attrib
by AWS Morning Brief
The Cloudcast #346 - What is Observability?
by The Cloudcast
Chief Information Officer: Enterprise AI and the CIO: How should Chief Information Officers manage AI in the enterprise, and what challenges may arise? Every CIO must consider these questions from both enterprise technology and business leadership perspectives. Author and AI investor Ash Fontana...
by CXOTalk
438 Statistics & Data Analysis: Does It Have A Future? - Simple Programmer Podcast: CHECK OUT HIRED.COM: https://www.simpleprogrammer.com/hiredsp CHECK OUT KOBITON: htttps://www.kobiton.com/simpleprogrammer The process of evaluating data using analytical and logical reasoning to examine each component of the data provided is called...
by Simple Programmer Podcast
362: Prioritizing Learning: This week, Steph and Joël discuss investment time and keeping track of things they want to learn. How do you, dear listener, keep track of things you want to learn? When investment time rolls around, what do you reach for, or how do you prioritize that list? Are there things you actively decide not to focus on when choosing where to develop deep expertise? Are there things you wish you could spend time on if you could?
by The Bike Shed
66: A guide to data models and dynamic dashboards for marketers
by Humans of Martech
Platform Engineering at a FAANG Company
by The Cloudcast
65: It takes a village to build a dashboard: When designing a dashboard, it's important to focus on the decisions you want to make, rather than just the metrics you want to track. Before building your dashboard, consider your audience and bring together the right people to answer key questions. This
by Humans of Martech
Defining Success: Metrics and KPIs - Adam Sroka
by DataTalks.Club
832: SaaS: Machine Learning and AI for Re-Engaging Customers, $250k ACV and $1.5m Raised: Victor Szczerba. He’s the co-founder and CEO of Yeti Data, solving big data problems for customers. Prior experience includes running product strategy at the data division at SAP. He was a McKinsey consultant and sales VP for Tadpole Computer and Utopy.
by SaaS Interviews with CEOs, Startups, Founders
BAM 069: My 4 Step Process for Getting Remote Access From IT and An Overview of Analytics: One of the greatest challenges faced by BAS professionals is getting remote access to their BAS. In this episode, I teach you my four-step process to get IT to give you remote access to your BAS. I also explore analytics. I unpack what analytics...
by The Smart Buildings Academy Podcast | Teaching You Building Automation, Systems Integration, and Information Technology
Using Data as a Springboard for Improvement with Peter Kazanjy: This episode of the Live Better Seller Better Podcast features Peter Kazanjy, Cofounder of Atrium. There's always some hesitancy when describing SaaS as a numbers game even when, boiling it all down, it actually is. However, we should never forget that behind every number is a person, process, and skill! Peter talks about the challenges most organizations face when it comes to using data to improve individual performance. He talks about how leaders can identify the essential numbers and change these to get the best results in the short and long term. HIGHLIGHTS What's holding companies back when it comes to leveraging data The key metrics a revenue org should be tracking How to work with data to make improvements Creating a data-driven org QUOTES Peter on the demographics of data-use: "Oftentimes the younger managers who came up with Fitbit or Peloton or what have you are way more open to this and way more data
by Live Better. Sell Better.
First Impressions of Fresh Books
by Web Tools Radio
Storytime for DataOps - Christopher Bergh
by DataTalks.Club
Potluck — Copilot × Glasses × Databases × Dealing with Stress × Employment vs Self-Employment × Auth in GraphQL × Headless CMS × More!: It’s another Potluck! In this episode, Scott and Wes answer your questions about GitHub Copilot, glasses, databases, dealing with stress, self-employment vs employment, design, CORS, and much more! Linode - Sponsor Whether you’re working on a...
by Syntax - Tasty Web Development Treats

Scikit-Learn: The Ultimate Python Library
Article · APC · Jul 15, 2019 · 4 min read
Understanding ELT & ETL
Article · Techfastly · Apr 1, 2021 · 8 min read
Manipulate Data Like A Pro With Pandas
Article · Linux Format · Jul 27, 2021 · 7 min read
01 Giving Data Collectors—and Donors—a Real-Time Rush
Article · Fast Company · Mar 20, 2017 · 7 min read
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Article · Chicago Tribune · Jul 10, 2018 · 3 min read
Top Five AI-ML Books For Business Leaders
Article · Techfastly · Aug 2, 2021 · 5 min read
Help Yourself To Avoid These Pitfalls
Article · MacLife · Dec 11, 2018 · 2 min read
GETTING UP TO full speed with the Shortcuts app takes time, and you’ll inevitably make a few mistakes along the way. Having to troubleshoot your efforts doesn’t mean you’ve failed — with years of experience, even professional programmers do this. Tak
A Place For Everything
Article · Outdoor Photographer · Aug 10, 2019 · 9 min read
Google Answer Box Strategy
Article · Techfastly · Sep 21, 2020 · 7 min read
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
Mistakes To Avoid
Article · MacFormat · Jan 12, 2021 · 2 min read
Remote Learning
Article · Writing Magazine · Mar 4, 2021 · 6 min read
A Short Guide to Chatbot Training Dataset
Article · Home Business Magazine · Jun 29, 2023 · 3 min read
Problems Solved
Article · Computeractive · Mar 16, 2022 · 11 min read
In Conversation With Rob Smedley
Article · GP Racing UK · Jul 8, 2021 · 3 min read
ChatGPT Masterclass Make AI Work For You
Article · APC · Mar 4, 2024 · 14 min read
ChatGPT is a splendid time sink, letting you plot out horror movies starring Michael Stipe and Kylie Minogue, or practically anything else you can dream up. But it’s a lot more than a plaything to pass a tea break with. OpenAI’s chatbot has a series
Mailserver
Article · Linux Format · Jun 27, 2023 · 4 min read
Family History In The AI Era
Article · Family Tree UK · Apr 12, 2024 · 7 min read
How And Where You Use Machine-learning
Article · APC · Oct 7, 2019 · 4 min read
Q&A
Article · Rotman Management · May 1, 2023 · 7 min read
Describe the capability that companies like Netflix, UPS, Amazon and Caesars Entertainment have in common. These are all leading firms in their industries with respect to leveraging analytics as a source of competitive advantage. We now have so much
GENEALOGY GADGETS & APPS FOR ALL OCCASIONS!
Article · Family Tree UK · Aug 12, 2022 · 2 min read
“There’s Something About Online Meetings That Makes People More Willing To Engage With Each Other”
Article · PC Pro Magazine · Oct 8, 2020 · 9 min read
Real Intelligence v Artificial Intelligence
Article · The Oldie · May 31, 2023 · 2 min read
Hold onto your hats: the digital world is about to take a giant leap, courtesy of the much-discussed advent of artificial intelligence (AI), We will see as much change in the next two years as we have seen in the last ten. I hope that I can keep up.
The Algorithmic Leader
Article · Rotman Management · Jan 1, 2020 · 9 min read
Machine Learning How Effective Is It in Cryptocurrency Trading?
Article · Techfastly · Nov 1, 2021 · 5 min read
Q & A
Article · Rotman Management · Sep 1, 2021
You and your co-author spent a combined 27 years working at Amazon, sharing founder Jeff Bezos’ conviction that the long-term interests of shareholders are perfectly aligned with the interests of customers. Please unpack that belief for us. It’s more
7 min read
Magnus’ Marketing Minute
Shop Talk
Article
Magnus’ Marketing Minute
Aug 1, 2022
Michael Magnus is an advertising professional who supports the growth of the leather industry through his marketing agency, Magnus Opus. Among his client partnerships are Silver Creek Leather Co., manufacturers of Realeather® Crafts and Lace, and Jim
5 min read
Observability Of The Kernel And Containers
Linux Format
Article
Observability Of The Kernel And Containers
Apr 4, 2023
Mihalis Tsoukalos is currently working on Time Series. You can reach him at: @mactsouk. For our final delve into eBPF, we’re tackling applications, the kernel and Docker containers. At the end of the day, all Linux machines execute code for applicat
10 min read
What It Takes To Be A Smart Business
Rotman Management
Article
What It Takes To Be A Smart Business
Jan 1, 2019
Why is it important for every Western businessperson to be familiar with Alibaba's business model? Alibaba’s business model provides key insights into the future of strategy. The sources of competitive advantage have shifted dramatically, and compani
6 min read
There’s A New Career In Town
True Love
Article
There’s A New Career In Town
Oct 21, 2019
2 min read
Mastering Chatgpt
PC Pro Magazine
Article
Mastering Chatgpt
Jan 4, 2024
5 min read

Related categories

Skip carousel

Reviews for Practical Data Analysis Cookbook

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Practical Data Analysis Cookbook - Tomasz Drabas

Practical Data Analysis Cookbook

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Preparing the Data

Introduction

Reading and writing CSV/TSV files with Python

Getting ready

How to do it…

How it works…

There's more…

See also

Reading and writing JSON files with Python

Getting ready

How to do it…

How it works…

There's more…

See also

Reading and writing Excel files with Python

Getting ready

How to do it…

How it works…

There's more…

See also

Reading and writing XML files with Python

Getting ready

How to do it…

How it works…

Retrieving HTML pages with pandas

Getting ready

How to do it…

How it works…

Storing and retrieving from a relational database

Getting ready

How to do it…

How it works…

There's more…

See also

Storing and retrieving from MongoDB

Getting ready

How to do it…

How it works…

See also

Opening and transforming data with OpenRefine

Getting ready

How to do it…

See also

Exploring the data with OpenRefine

Getting ready

How to do it…

Removing duplicates

Getting ready

How to do it…

Using regular expressions and GREL to clean up data

Getting ready

How to do it…

See also

Imputing missing observations

Getting ready

How to do it…

How it works…

There's more…

Normalizing and standardizing the features

Getting ready

How to do it…

How it works…

Binning the observations

Getting ready

How to do it…

How it works…

There's more…

Encoding categorical variables

Getting ready

How to do it…

How it works…

2. Exploring the Data

Introduction

Producing descriptive statistics

Getting ready

How to do it…

How it works…

There's more…

See also…

Exploring correlations between features

Getting ready

How to do it…

How it works…

See also…

Visualizing the interactions between features

Getting ready

How to do it…

How it works…

See also…

Producing histograms

Getting ready

How to do it…

How it works…

There's more…

See also…

Creating multivariate charts

Getting ready

How to do it…

How it works…

See also…

Sampling the data

Getting ready

How to do it…

How it works…

There's more…

Splitting the dataset into training, cross-validation, and testing

Getting ready

How to do it…

How it works…

There's more…

3. Classification Techniques

Introduction

Testing and comparing the models

Getting ready

How to do it…

How it works…

There's more…

See also

Classifying with Naïve Bayes

Getting ready

How to do it…

How it works…

See also

Using logistic regression as a universal classifier

Getting ready

How to do it…

How it works…

There's more…

See also

Utilizing Support Vector Machines as a classification engine

Getting ready

How to do it…

How it works…

There's more…

Classifying calls with decision trees

Getting ready

How to do it…

How it works…

There's more…

Predicting subscribers with random tree forests

Getting ready

How to do it…

How it works…

There's more…

Employing neural networks to classify calls

Getting ready

How to do it…

How it works…

There's more…

See also

4. Clustering Techniques

Introduction

Assessing the performance of a clustering method

Getting ready

How to do it…

How it works…

See also…

Clustering data with k-means algorithm

Getting ready

How to do it…

How it works…

There's more…

See also…

Finding an optimal number of clusters for k-means

Getting ready

How to do it…

How it works…

There's more…

Discovering clusters with mean shift clustering model

Getting ready

How to do it…

How it works…

See also…

Building fuzzy clustering model with c-means

Getting ready

How to do it…

How it works…

Using hierarchical model to cluster your data

Getting ready

How to do it…

How it works…

There's more…

See also…

Finding groups of potential subscribers with DBSCAN and BIRCH algorithms

Getting ready

How to do it…

How it works…

See also…

5. Reducing Dimensions

Introduction

Creating three-dimensional scatter plots to present principal components

Getting ready

How to do it…

How it works…

Reducing the dimensions using the kernel version of PCA

Getting ready

How to do it…

How it works…

There's more…

See also

Using Principal Component Analysis to find things that matter

Getting ready

How to do it…

How it works…

There's more…

See also

Finding the principal components in your data using randomized PCA

Getting ready

How to do it…

How it works…

There's more…

Extracting the useful dimensions using Linear Discriminant Analysis

Getting ready

How to do it…

How it works…

Using various dimension reduction techniques to classify calls using the k-Nearest Neighbors classification model

Getting ready

How to do it…

How it works…

6. Regression Methods

Introduction

Identifying and tackling multicollinearity

Getting ready

How to do it…

How it works…

There's more…

Building Linear Regression model

Getting ready

How to do it…

How it works…

There's more…

Using OLS to forecast how much electricity can be produced

Getting ready

How to do it…

How it works…

There's more…

See also

Estimating the output of an electric plant using CART

Getting ready

How to do it…

How it works…

There's more…

See also

Employing the kNN model in a regression problem

Getting ready

How to do it…

How it works…

Applying the Random Forest model to a regression analysis

Getting ready

How to do it…

How it works…

Gauging the amount of electricity a plant can produce using SVMs

Getting ready

How to do it…

How it works…

There's more…

See also

Training a Neural Network to predict the output of a power plant

Getting ready

How to do it…

How it works…

See also

7. Time Series Techniques

Introduction

Handling date objects in Python

Getting ready

How to do it…

How it works…

There's more…

Understanding time series data

Getting ready

How to do it…

How it works…

There's more…

Smoothing and transforming the observations

Getting ready

How to do it…

How it works…

There's more…

Filtering the time series data

Getting ready

How to do it…

How it works…

There's more…

Removing trend and seasonality

Getting ready

How to do it…

How it works…

There's more…

Forecasting the future with ARMA and ARIMA models

Getting ready

How to do it…

How it works…

See also

8. Graphs

Introduction

Handling graph objects in Python with NetworkX

Getting ready

How to do it…

How it works…

There's more…

See also

Using Gephi to visualize graphs

Getting ready

How to do it…

There's more…

See also

Identifying people whose credit card details were stolen

Getting ready

How to do it…

How it works…

There's more…

Identifying those responsible for stealing the credit cards

Getting ready

How to do it…

How it works…

See also

9. Natural Language Processing

Introduction

Reading raw text from the Web

Getting ready

How to do it…

How it works…

Tokenizing and normalizing text

Getting ready

How to do it…

How it works…

See also

Identifying parts of speech, handling n-grams, and recognizing named entities

Getting ready

How to do it…

How it works…

There's more…

Identifying the topic of an article

Getting ready

How to do it…

How it works…

Identifying the sentence structure

Getting ready

How to do it…

How it works…

See also

Classifying movies based on their reviews

Getting ready

How to do it…

How it works…

10. Discrete Choice Models

Introduction

Preparing a dataset to estimate discrete choice models

Getting ready

How to do it…

How it works…

There's more…

Estimating the well-known Multinomial Logit model

Getting ready

How to do it…

How it works…

See also

Testing for violations of the Independence from Irrelevant Alternatives

Getting ready

How to do it…

How it works…

There's more…

Handling IIA violations with the Nested Logit model

Getting ready

How to do it…

How it works…

Managing sophisticated substitution patterns with the Mixed Logit model

Getting ready

How to do it…

How it works…

11. Simulations

Introduction

Using SimPy to simulate the refueling process of a gas station

Getting ready

How to do it…

How it works…

There's more…

Simulating out-of-energy occurrences for an electric car

Getting ready

How to do it…

How it works…

Determining if a population of sheep is in danger of extinction due to a wolf pack

Getting ready

How to do it…

How it works…

Index

Practical Data Analysis Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2016

Production reference: 1250416

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78355-166-8

www.packtpub.com

Credits

Author

Tomasz Drabas

Reviewers

Brett Bloomquist

Khaled Tannir

Commissioning Editor

Dipika Gaonkar

Acquisition Editor

Prachi Bisht

Content Development Editor

Pooja Mhapsekar

Technical Editor

Bharat Patil

Copy Editor

Tasneem Fatehi

Project Coordinator

Francina Pinto

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Production Coordinator

Nilesh R. Mohite

Cover Work

Nilesh R. Mohite

About the Author

Tomasz Drabas is a data scientist working for Microsoft and currently residing in the Seattle area. He has over 12 years of international experience in data analytics and data science in numerous fields, such as advanced technology, airlines, telecommunications, finance, and consulting.

Tomasz started his career in 2003 with LOT Polish Airlines in Warsaw, Poland, while finishing his master's degree in strategy management. In 2007, he moved to Sydney to pursue a doctoral degree in operations research at the University of New South Wales, School of Aviation; his research crossed boundaries between discrete choice modeling and airline operations research. During his time in Sydney, he worked as a data analyst for Beyond Analysis Australia and as a senior data analyst/data scientist for Vodafone Hutchison Australia, among others. He has also published scientific papers, attended international conferences, and served as a reviewer for scientific journals.

In 2015, he relocated to Seattle to begin his work for Microsoft. There, he works on numerous projects that involve solving problems in high-dimensional feature spaces.

Acknowledgments

First and foremost, I would like to thank my wife, Rachel, and daughter, Skye, for encouraging me to undertake this challenge and tolerating long days of developing code and late nights of writing up. You are the best and I love you beyond bounds! Also, thanks to my family for putting up with me (in general).

Tomasz Bednarz has not only been a great friend but also a great mentor when I was learning programming—thank you! I also want to thank my current and former managers, Mike Stephenson and Rory Carter, as well as numerous colleagues and friends who also encouraged me to finish this book.

Special thanks go to my two former supervisors, Dr Richard Cheng-Lung Wu and Dr Tomasz Jablonski. The master's project with Tomasz sparked my interest in neural networks, and its lessons I will never forget. Without Richard's help, I would not have been able to finish my PhD, and I will always be grateful for his help, guidance, and friendship.

About the Reviewers

Brett Bloomquist holds a BS in mathematics and an MS in computer science, specializing in computer-aided geometric design. He has 26 years of work experience in the software industry with a focus on geometric modeling algorithms and computer graphics. More recently, Brett has been applying his mathematics and visualization background as a principal data scientist.

Khaled Tannir is a visionary solution architect with more than 20 years of technical experience; since 2010, he has focused on big data technologies, data science, machine learning, and data mining.

He is widely recognized as an expert in these fields and has a bachelor's degree in electronics and a master's degree in system information architectures. He is working on completing his PhD.

Khaled has more than 15 certifications (R programming, big data, and many more) and is a Microsoft Certified Solution Developer (MCSD) and an avid technologist.

He has worked for many companies in France (and recently in Canada), leading the development and implementation of software solutions and giving technical presentations.

He is the author of the books RavenDB 2.x Beginner's Guide and Optimizing Hadoop MapReduce, both published by Packt Publishing and both translated into Simplified Chinese. He was also a technical reviewer for the books Pentaho Analytics for MongoDB, MongoDB High Availability, and Learning Predictive Analytics with R, all by Packt Publishing.

He enjoys taking landscape and night photos, traveling, playing video games, creating fun electronic gadgets using Arduino, Raspberry Pi, and .NET Gadgeteer, and, of course, spending time with his wife and family.

You can connect with him on LinkedIn or reach him at <contact@khaledtannir.net>.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Data analytics and data science have garnered a lot of attention from businesses around the world. The amount of data generated these days is mind-boggling, and it keeps growing every day; with the proliferation of mobile devices, access to Facebook, YouTube, Netflix, or other 4K video content providers, and increasing reliance on cloud computing, we can only expect this growth to continue.

The task of a data scientist is to clean, transform, and analyze the data in order to provide the business with insights about its customers and/or competitors, monitor the health of the services provided by the company, or automatically present recommendations to drive more opportunities for cross-selling (among many others).

In this book, you will learn how to read, write, clean, and transform the data—the tasks that are the most time-consuming but also the most critical. We will then present you with a broad array of tools and techniques that any data scientist should master, ranging from classification, clustering, or regression, through graph theory and time-series analysis, to discrete choice modeling and simulations. In each chapter, we will present an array of detailed examples written in Python that will help you tackle virtually any problem that you might encounter in your career as a data scientist.

What this book covers

Chapter 1, Preparing the Data, covers the process of reading and writing from and to various data formats and databases, as well as cleaning the data using OpenRefine and Python.

Chapter 2, Exploring the Data, describes various techniques that aid in understanding the data. We will see how to calculate distributions of variables and correlations between them and produce some informative charts.

Chapter 3, Classification Techniques, introduces several classification techniques, from simple Naïve Bayes classifiers to more sophisticated Neural Networks and Random Tree Forests.

Chapter 4, Clustering Techniques, explains numerous clustering models; we start with the most common k-means method and finish with more advanced BIRCH and DBSCAN models.

Chapter 5, Reducing Dimensions, presents multiple dimensionality reduction techniques, starting with the most renowned PCA, through its kernel and randomized versions, to LDA.

Chapter 6, Regression Methods, covers many regression models, both linear and nonlinear. We also bring back random forests and SVMs (among others) as these can be used to solve either classification or regression problems.

Chapter 7, Time Series Techniques, explores the methods of handling and understanding time series data as well as building ARMA and ARIMA models.

Chapter 8, Graphs, introduces NetworkX and Gephi to handle, understand, visualize, and analyze data in the form of graphs.

Chapter 9, Natural Language Processing, describes various techniques related to the analytics of free-flow text: part-of-speech tagging, topic extraction, and classification of data in textual form.

Chapter 10, Discrete Choice Models, explains the choice modeling theory and some of the most popular models: the Multinomial, Nested, and Mixed Logit models.

Chapter 11, Simulations, covers the concepts of agent-based simulations; we simulate the functioning of a gas station, out-of-power occurrences for electric vehicles, and sheep-wolf predation scenarios.

What you need for this book

For this book, you need a personal computer (it can be a Windows machine, Mac, or Linux) with an installed and configured Python 3.5 environment; we use the Anaconda distribution of Python that can be downloaded at https://www.continuum.io/downloads.

Throughout this book, we use various Python modules: pandas, NumPy/SciPy, scikit-learn, MLPY, Statsmodels, PyBrain, NLTK, BeautifulSoup, Optunity, Matplotlib, Seaborn, Bokeh, PyLab, OpenPyXL, PyMongo, SQLAlchemy, NetworkX, and SimPy. Most of these modules come preinstalled with Anaconda, but some need to be installed either via the conda installer or by downloading the module and running the python setup.py install command. It is fine if some of those modules are not currently installed on your machine; we will guide you through the installation process.
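Before starting, it can help to check which of these modules are already importable. The sketch below is not from the book: it covers a subset of the list above and assumes the import names shown (for example, `sklearn` for scikit-learn and `bs4` for BeautifulSoup), which can differ from the names you install them under.

```python
import importlib.util

# Assumed mapping from package names (as listed in the text) to import names.
MODULES = {
    "pandas": "pandas",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "MLPY": "mlpy",
    "Statsmodels": "statsmodels",
    "PyBrain": "pybrain",
    "NLTK": "nltk",
    "BeautifulSoup": "bs4",
    "Optunity": "optunity",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
    "OpenPyXL": "openpyxl",
    "PyMongo": "pymongo",
    "SQLAlchemy": "sqlalchemy",
    "NetworkX": "networkx",
    "SimPy": "simpy",
}


def check_modules(modules):
    """Return a dict mapping each package name to True if it can be imported."""
    # find_spec returns None when a top-level module cannot be located;
    # it does not actually import the module.
    return {name: importlib.util.find_spec(import_name) is not None
            for name, import_name in modules.items()}


if __name__ == "__main__":
    for name, available in check_modules(MODULES).items():
        print("{:15} {}".format(name, "installed" if available else "MISSING"))
```

Anything reported as MISSING can then be installed with conda or from source, as described above.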

We also use several non-Python tools: OpenRefine to aid in data cleansing and analysis, D3.js to visualize data, Postgres and MongoDB databases to store data, Gephi to visualize graphs, and PythonBiogeme to estimate discrete choice models. We will provide detailed installation instructions where needed.

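Who this book is for

This book is for everyone who wants to get into the data science field and needs to build up their skills on a set of examples that aim to tackle the problems faced in the corporate world. More advanced practitioners might also find some of the examples refreshing and the more advanced topics covered interesting.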

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section provides additional information about the recipe to deepen the reader's understanding of it.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.

A block of code is set as follows:

for p in all_disputed_transactions:
    try:
        transactions[p[0]].append(p[2]['amount'])
    except KeyError:
        transactions[p[0]] = [p[2]['amount']]
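The try/except pattern above groups the disputed amounts by the first element of each tuple. A more compact alternative, sketched here under the assumption that each item is a NetworkX-style (node, node, attribute_dict) edge tuple, uses `collections.defaultdict`; the sample data is hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: (cardholder, merchant, attributes), mirroring p[0] and p[2] above.
all_disputed_transactions = [
    ("alice", "shop_1", {"amount": 120.0}),
    ("bob", "shop_2", {"amount": 35.5}),
    ("alice", "shop_3", {"amount": 80.0}),
]

# defaultdict(list) creates an empty list the first time a key is seen,
# so no KeyError handling is needed.
transactions = defaultdict(list)
for p in all_disputed_transactions:
    transactions[p[0]].append(p[2]["amount"])

print(dict(transactions))  # {'alice': [120.0, 80.0], 'bob': [35.5]}
```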

Any command-line input or output is written as follows:

cd networkx
python setup.py install

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: We start with using Range on the age filter.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

From the drop-down menu, choose where you purchased this book.

Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for this book is also available on GitHub at https://github.com/drabastomek/practicalDataAnalysisCookbook/tree/master/Data.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/practicaldataanalysiscookbook_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Preparing the Data

In this chapter, we will cover the basic tasks of reading, storing, and cleaning data using Python and OpenRefine. You will learn the following recipes:

Reading and writing CSV/TSV files with Python

Reading and writing JSON files with Python

Reading and writing Excel files with Python

Reading and writing XML files with Python

Retrieving HTML pages with pandas

Storing and retrieving from a relational database

Storing and retrieving from MongoDB

Opening and transforming data with OpenRefine

Exploring the data with OpenRefine

Removing duplicates

Using regular expressions and GREL to clean up the data

Imputing missing observations

Normalizing and standardizing features

Binning the observations

Encoding categorical variables

Introduction

For the following set of recipes, we will use Python to read data in various formats and store it in RDBMS and NoSQL databases.
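As a preview of the storing step, here is a minimal sketch that writes a pandas DataFrame to a relational database and reads it back. The book itself uses PostgreSQL and SQLAlchemy (see the preface); this sketch swaps in an in-memory SQLite database from Python's built-in sqlite3 module so it runs without a database server. The table name, columns, and rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample of real estate sales; column names are illustrative only.
sales = pd.DataFrame({
    "city": ["SACRAMENTO", "RANCHO CORDOVA", "SACRAMENTO"],
    "beds": [2, 3, 3],
    "price": [59000, 68000, 89000],
})

# An in-memory SQLite database stands in for a full RDBMS such as PostgreSQL.
with sqlite3.connect(":memory:") as conn:
    # Write the DataFrame to a table, replacing the table if it already exists.
    sales.to_sql("real_estate", conn, if_exists="replace", index=False)

    # Read back only the Sacramento sales with an ordinary SQL query.
    sacramento = pd.read_sql_query(
        "SELECT city, beds, price FROM real_estate "
        "WHERE city = 'SACRAMENTO' ORDER BY price",
        conn,
    )

print(sacramento)
```

pandas accepts a plain sqlite3 connection here; for other databases, it expects an SQLAlchemy engine or connection instead.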

All the source code and datasets that we will use in this book are available in the GitHub repository for this book. To clone the repository, open your terminal of choice (on Windows, you can use the command line, Cygwin, or Git Bash; on Linux or Mac, you can use Terminal) and issue the following command (on one line):

git clone https://github.com/drabastomek/practicalDataAnalysisCookbook.git

Tip

Note that you need Git installed on your machine. Refer to https://git-scm.com/book/en/v2/Getting-Started-Installing-Git for installation instructions.

In the following four sections, we will use a dataset that consists of 985 real estate transactions that took place in the Sacramento area over a period of five consecutive days. We downloaded the data from https://support.spatialkey.com/spatialkey-sample-csv-data/; specifically, from http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv. The data was then transformed into various formats that are stored in the Data/Chapter01 folder in the GitHub repository.
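To illustrate how one table can be stored in several formats, the sketch below converts a small CSV sample to JSON with pandas and reads it back. The rows and column names are hypothetical and do not reproduce the actual files in the Data/Chapter01 folder.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV excerpt with illustrative column names.
csv_text = """street,city,zip,beds,price
100 MAIN ST,SACRAMENTO,95838,2,59000
200 OAK AVE,SACRAMENTO,95823,3,68000
"""

sales = pd.read_csv(StringIO(csv_text))

# orient="records" produces a JSON list with one {column: value} object per row.
json_text = sales.to_json(orient="records")
print(json_text)

# Wrapping the string in StringIO avoids passing literal JSON to read_json,
# which newer pandas versions deprecate.
restored = pd.read_json(StringIO(json_text), orient="records")
```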

In addition, you will learn how to retrieve information from HTML files. For this purpose, we will use the Wikipedia list of airports starting with the letter A, https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A.

To clean our dataset, we will use OpenRefine; it is a powerful tool to read, clean, and transform data.

Reading and writing CSV/TSV files with Python

CSV and TSV formats are essentially text files formatted in a specific way: the former separates values with commas and the latter with tab (\t) characters. As a result, they are highly portable and make it easy to share data between various platforms.
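Because the two formats differ only in their delimiter, pandas handles both with the same functions, and the `sep` parameter selects the separator. A minimal sketch with hypothetical data, using in-memory buffers instead of files:

```python
from io import StringIO

import pandas as pd

# Hypothetical data; in the recipe you would pass a file path instead of a buffer.
df = pd.DataFrame({"city": ["SACRAMENTO", "ELK GROVE"], "price": [59000, 120000]})

# Write the same DataFrame as CSV (comma-separated) and TSV (tab-separated).
csv_buffer, tsv_buffer = StringIO(), StringIO()
df.to_csv(csv_buffer, index=False)
df.to_csv(tsv_buffer, sep="\t", index=False)

print(tsv_buffer.getvalue())

# Read both back; without sep="\t", each TSV row would land in a single column.
csv_back = pd.read_csv(StringIO(csv_buffer.getvalue()))
tsv_back = pd.read_csv(StringIO(tsv_buffer.getvalue()), sep="\t")
```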

Getting ready

To execute this recipe, you will need the pandas module installed. It is available in the Anaconda distribution of Python, and no further work is required if you already use this distribution. Otherwise, you will need to install pandas and make sure that it loads properly.

Note

You can download Anaconda from http://docs.continuum.io/anaconda/install. If you already have Python installed but do not have pandas, you can download the package from https://github.com/pydata/pandas/releases/tag/v0.17.1 and follow the instructions to install it appropriately for your operating system (http://pandas.pydata.org/pandas-docs/stable/install.html).

No other prerequisites are required.

How to do it…

The pandas module is a library that provides high-performance, high-level data structures (such as DataFrame) and some basic analytics tools for Python.
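A DataFrame is a two-dimensional labeled table whose columns can hold different types. The short sketch below, using hypothetical values, shows a few of the basic operations mentioned here: building a DataFrame, selecting a column, filtering rows, and computing a simple aggregate.

```python
import pandas as pd

# Hypothetical real estate records; each dict key becomes a column.
df = pd.DataFrame({
    "city": ["SACRAMENTO", "ELK GROVE", "SACRAMENTO", "ROSEVILLE"],
    "beds": [2, 4, 3, 3],
    "price": [59000, 250000, 89000, 310000],
})

# Selecting a single column returns a Series.
prices = df["price"]

# Boolean indexing keeps only the rows that satisfy the condition.
sacramento = df[df["city"] == "SACRAMENTO"]

# Basic analytics: mean price per city, sorted from the most expensive.
mean_price = df.groupby("city")["price"].mean().sort_values(ascending=False)

print(sacramento)
print(mean_price)
```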

Note

The DataFrame
