Ebook, 731 pages, 5 hours

Learning pandas

Name: Learning pandas
Brand: Packt Publishing
Rating: 4.0 (1 review)

By Michael Heydt

About this ebook

About This Book

Use pandas for data analysis so that you can focus more on the analysis and less on programming
Get comfortable, as a programmer, with data exploration and analysis in Python using pandas
Follow step-by-step demonstrations of Python and pandas, with interactive and incremental examples to facilitate learning

Who This Book Is For

If you are a Python programmer who wants to get started with performing data analysis using pandas and Python, this is the book for you. Some experience with statistical analysis would be helpful but is not mandatory.

Language: English

Publisher: Packt Publishing

Release date: Apr 16, 2015

ISBN: 9781783985135

Author

Michael Heydt

2 min read
Pandas And Data
Linux Format
Article
Pandas And Data
Jul 27, 2021
Pandas can perform a variety of tasks including data loading, preparation and manipulation as well as data modelling and analysis. You can join, merge and reshape data with the help of Pandas, using data from different sources. As mentioned elsewhere
1 min read
Inside APC
APC
Article
Inside APC
Apr 20, 2023
APC is Australia’s oldest consumer technology magazine – having been consistently in print for over forty years, since our first issue way back in May 1980 – and we take that heritage and responsibility very seriously. While our focus is obviously on
2 min read
Inside APC
APC
Article
Inside APC
May 22, 2023
2 min read
Thriving As An Ecosystem Partner
The European Business Review
Article
Thriving As An Ecosystem Partner
Sep 30, 2022
Researching ecosystems that span industries from e-commerce and publishing to semiconductors and healthcare over the past decade, we found companies that have been successful for years by contributing to an ecosystem. Sometimes, by contributing as pa
10 min read
Accurate, Open Source IP-based Localisation
Linux Format
Article
Accurate, Open Source IP-based Localisation
Dec 14, 2021
8 min read
Inside APC
APC
Article
Inside APC
Feb 20, 2023
APC is Australia’s oldest consumer technology magazine – having been consistently in print for over forty years, since our first issue way back in May 1980 – and we take that heritage and responsibility very seriously. While our focus is obviously on
2 min read
Inside APC
APC
Article
Inside APC
Feb 20, 2023
APC is Australia’s oldest consumer technology magazine – having been consistently in print for over forty years, since our first issue way back in May 1980 – and we take that heritage and responsibility very seriously. While our focus is obviously on
2 min read
The Big Tech Boost
Business Today
Article
The Big Tech Boost
Jan 5, 2024
5 min read
Is There A Way To Avoid Using Control Offset Groups In My Rig?
3D World
Article
Is There A Way To Avoid Using Control Offset Groups In My Rig?
Feb 23, 2021
2 min read

Related categories

Skip carousel

Reviews for Learning pandas

Rating: 4 out of 5 stars

4/5

1 rating0 reviews

Book preview

Learning pandas - Heydt Michael

Learning pandas

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. A Tour of pandas

pandas and why it is important

pandas and IPython Notebooks

Referencing pandas in the application

Primary pandas objects

The pandas Series object

The pandas DataFrame object

Loading data from files and the Web

Loading CSV data from files

Loading data from the Web

Simplicity of visualization of pandas data

Summary

2. Installing pandas

Getting Anaconda

Installing Anaconda

Installing Anaconda on Linux

Installing Anaconda on Mac OS X

Installing Anaconda on Windows

Ensuring pandas is up to date

Running a small pandas sample in IPython

Starting the IPython Notebook server

Installing and running IPython Notebooks

Using Wakari for pandas

Summary

3. NumPy for pandas

Installing and importing NumPy

Benefits and characteristics of NumPy arrays

Creating NumPy arrays and performing basic array operations

Selecting array elements

Logical operations on arrays

Slicing arrays

Reshaping arrays

Combining arrays

Splitting arrays

Useful numerical methods of NumPy arrays

Summary

4. The pandas Series Object

The Series object

Importing pandas

Creating Series

Size, shape, uniqueness, and counts of values

Peeking at data with heads, tails, and take

Looking up values in Series

Alignment via index labels

Arithmetic operations

The special case of Not-A-Number (NaN)

Boolean selection

Reindexing a Series

Modifying a Series in-place

Slicing a Series

Summary

5. The pandas DataFrame Object

Creating DataFrame from scratch

Example data

S&P 500

Monthly stock historical prices

Selecting columns of a DataFrame

Selecting rows and values of a DataFrame using the index

Slicing using the [] operator

Selecting rows by index label and location: .loc[] and .iloc[]

Selecting rows by index label and/or location: .ix[]

Scalar lookup by label or location using .at[] and .iat[]

Selecting rows of a DataFrame by Boolean selection

Modifying the structure and content of DataFrame

Renaming columns

Adding and inserting columns

Replacing the contents of a column

Deleting columns in a DataFrame

Adding rows to a DataFrame

Appending rows with .append()

Concatenating DataFrame objects with pd.concat()

Adding rows (and columns) via setting with enlargement

Removing rows from a DataFrame

Removing rows using .drop()

Removing rows using Boolean selection

Removing rows using a slice

Changing scalar values in a DataFrame

Arithmetic on a DataFrame

Resetting and reindexing

Hierarchical indexing

Summarized data and descriptive statistics

Summary

6. Accessing Data

Setting up the IPython notebook

CSV and Text/Tabular format

The sample CSV data set

Reading a CSV file into a DataFrame

Specifying the index column when reading a CSV file

Data type inference and specification

Specifying column names

Specifying specific columns to load

Saving DataFrame to a CSV file

General field-delimited data

Handling noise rows in field-delimited data

Reading and writing data in an Excel format

Reading and writing JSON files

Reading HTML data from the Web

Reading and writing HDF5 format files

Accessing data on the web and in the cloud

Reading and writing from/to SQL databases

Reading data from remote data services

Reading stock data from Yahoo! and Google Finance

Retrieving data from Yahoo! Finance Options

Reading economic data from the Federal Reserve Bank of St. Louis

Accessing Kenneth French's data

Reading from the World Bank

Summary

7. Tidying Up Your Data

What is tidying your data?

Setting up the IPython notebook

Working with missing data

Determining NaN values in Series and DataFrame objects

Selecting out or dropping missing data

How pandas handles NaN values in mathematical operations

Filling in missing data

Forward and backward filling of missing values

Filling using index labels

Interpolation of missing values

Handling duplicate data

Transforming Data

Mapping

Replacing values

Applying functions to transform data

Summary

8. Combining and Reshaping Data

Setting up the IPython notebook

Concatenating data

Merging and joining data

An overview of merges

Specifying the join semantics of a merge operation

Pivoting

Stacking and unstacking

Stacking using nonhierarchical indexes

Unstacking using hierarchical indexes

Melting

Performance benefits of stacked data

Summary

9. Grouping and Aggregating Data

Setting up the IPython notebook

The split, apply, and combine (SAC) pattern

Split

Data for the examples

Grouping by a single column's values

Accessing the results of grouping

Grouping using index levels

Apply

Applying aggregation functions to groups

The transformation of group data

An overview of transformation

Practical examples of transformation

Filtering groups

Discretization and Binning

Summary

10. Time-series Data

Setting up the IPython notebook

Representation of dates, time, and intervals

The datetime, day, and time objects

Timestamp objects

Timedelta

Introducing time-series data

DatetimeIndex

Creating time-series data with specific frequencies

Calculating new dates using offsets

Date offsets

Anchored offsets

Representing durations of time using Period objects

The Period object

PeriodIndex

Handling holidays using calendars

Normalizing timestamps using time zones

Manipulating time-series data

Shifting and lagging

Frequency conversion

Up and down resampling

Time-series moving-window operations

Summary

11. Visualization

Setting up the IPython notebook

Plotting basics with pandas

Creating time-series charts with .plot()

Adorning and styling your time-series plot

Adding a title and changing axes labels

Specifying the legend content and position

Specifying line colors, styles, thickness, and markers

Specifying tick mark locations and tick labels

Formatting axes tick date labels using formatters

Common plots used in statistical analyses

Bar plots

Histograms

Box and whisker charts

Area plots

Scatter plots

Density plot

The scatter plot matrix

Heatmaps

Multiple plots in a single chart

Summary

12. Applications to Finance

Setting up the IPython notebook

Obtaining and organizing stock data from Yahoo!

Plotting time-series prices

Plotting volume-series data

Calculating the simple daily percentage change

Calculating simple daily cumulative returns

Resampling data from daily to monthly returns

Analyzing distribution of returns

Performing a moving-average calculation

The comparison of average daily returns across stocks

The correlation of stocks based on the daily percentage change of the closing price

Volatility calculation

Determining risk relative to expected returns

Summary

Index

Learning pandas

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2015

Production reference: 1090415

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-512-8

www.packtpub.com

Credits

Author

Michael Heydt

Reviewers

Bill Chambers

S. Shelly Jang

Arun Karunagath Rajeevan

Daniel Velkov

Adrian Wan

Commissioning Editor

Kartikey Pandey

Acquisition Editor

Neha Nagwekar

Content Development Editor

Akshay Nair

Technical Editors

Shashank Desai

Chinmay Puranik

Copy Editors

Roshni Banerjee

Pranjali Chury

Stuti Srivastava

Project Coordinator

Mary Alex

Proofreaders

Simran Bhogal

Paul Hindle

Linda Morris

Christopher Smith

Indexer

Monica Ajmera Mehta

Graphics

Sheetal Aute

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

About the Author

Michael Heydt is an independent consultant, educator, and trainer with nearly 30 years of professional software development experience, during which he focused on agile software design and implementation using advanced technologies in multiple verticals, including media, finance, energy, and healthcare. He holds an MS degree in mathematics and computer science from Drexel University and an executive master's of technology management degree from the University of Pennsylvania's School of Engineering and Wharton Business School. His studies and research have focused on technology management, software engineering, entrepreneurship, information retrieval, data sciences, and computational finance. Since 2005, he has been specializing in building energy and financial trading systems for major investment banks on Wall Street and for several global energy trading companies, utilizing .NET, C#, WPF, TPL, DataFlow, Python, R, Mono, iOS, and Android. His current interests include creating seamless applications using desktop, mobile, and wearable technologies, which utilize high concurrency, high availability, real-time data analytics, augmented and virtual reality, cloud services, messaging, computer vision, natural user interfaces, and software-defined networks. He is the author of numerous technology articles, papers, and books (Instant Lucene.NET, Learning pandas). He is a frequent speaker at .NET users' groups and various mobile and cloud conferences, and he regularly delivers webinars on advanced technologies.

About the Reviewers

Bill Chambers is a Python developer and data scientist currently pursuing a master of information management and systems degree at the UC Berkeley School of Information. Previously, he focused on data architecture and systems using marketing, sales, and customer analytics data. Bill is passionate about delivering actionable insights and innovative solutions using data.

You can find more information about him at http://www.billchambers.me.

S. Shelly Jang received her PhD degree in electrical engineering from the University of Washington and a master's degree in chemical and biological engineering from the University of British Columbia in 2014 and 2009, respectively. She was an Insight Data Science fellow in 2014. During her tenure, she built a web app that recommends crowd-verified treatment options for various medical conditions. She is currently a senior data scientist at AT&T Big Data. Exploring complex, large-scale data sets to build models and derive insights is just a part of her job.

In her free time, she participates in the Quantified Self community, sharing her insights on personal analytics and self-hacking.

Arun Karunagath Rajeevan is a senior consultant (products) in an exciting start-up, working as an architect and coder, and is a polyglot. He is currently involved in developing the best quality management suite in the supply chain management category.

Apart from this, he has experience in healthcare and multimedia (embedded) domains. When he is not working, he loves to travel and listen to music.

Daniel Velkov is a software engineer based in San Francisco, who has more than 10 years of programming experience. His biggest professional accomplishment was designing and implementing the search stack for MyLife.com—one of the major social websites in the US. Nowadays, he works on making Google search better. Besides Python and search, he has worked on several machine learning and data analysis-oriented projects. When he is not coding, he enjoys skiing, riding motorcycles, and exploring the Californian outdoors.

Adrian Wan is a physics and computer science major at Swarthmore College. After he graduates, he will be working at Nest, a Google company, as a software engineer and data scientist. His passion lies at the intersection of his two disciplines, where elegant mathematical models and explanations of real-life phenomena are brought to life and probed deeply with efficient, clean, and powerful code. He greatly enjoyed contributing to this book and hopes that you will be able to appreciate the power that pandas brings to Python.

You can find out more about him at http://awan1.github.io.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

This book is about learning to use pandas, an open source library for Python, which was created to enable Python to easily manipulate and perform powerful statistical and mathematical analyses on tabular and multidimensional datasets. The design of pandas and its power combined with the familiarity of Python have created explosive growth in its usage over the last several years, particularly among financial firms as well as those simply looking for practical tools for statistical and data analysis.

While there exist many excellent examples of using pandas to solve many domain-specific problems, it can be difficult to find a cohesive set of examples in a form that allows one to effectively learn and apply the features of pandas. The information required to learn practical skills in using pandas is distributed across many websites, slide shares, and videos, and is generally not in a form that gives an integrated guide to all of the features with practical examples in an easy-to-understand and applicable fashion.

This book is therefore intended to be a go-to reference for learning pandas. It will take you all the way from installation, through creating one- and two-dimensional indexed data structures, to grouping data and slicing and dicing it, with common analyses used to demonstrate how useful results are derived. This includes loading and saving data from local and Internet-based resources, and creating effective data visualizations that quickly reveal insights previously hidden within complex data.

What this book covers

Chapter 1, A Tour of pandas, is a hands-on introduction to the key features of pandas. It will give you a broad overview of the types of data tasks that can be performed with pandas. This chapter will set the groundwork for learning as all concepts introduced in this chapter will be expanded upon in subsequent chapters.

Chapter 2, Installing pandas, will show you how to install Anaconda Python and pandas on Windows, OS X, and Linux. This chapter also covers using the conda package manager to upgrade pandas and its dependent libraries to the most recent version.

Chapter 3, NumPy for pandas, will introduce you to concepts in NumPy, particularly NumPy arrays, which are core for understanding the pandas Series and DataFrame objects.

Chapter 4, The pandas Series Object, covers the pandas Series object and how it expands upon the functionality of the NumPy array to provide richer representation and manipulation of sequences of data through the use of high-performance indexes.

Chapter 5, The pandas DataFrame Object, introduces the primary data structure of pandas, the DataFrame object, and how it forms a two-dimensional representation of tabular data by aligning multiple Series objects along a common index to provide seamless access and manipulation across elements in multiple series that are related by a common index label.

Chapter 6, Accessing Data, shows how data can be loaded and saved from external sources into both Series and DataFrame objects. You will learn how to access data from multiple sources such as files, HTTP servers, database systems, and web services, as well as how to process data in CSV, HTML, and JSON formats.

Chapter 7, Tidying Up Your Data, instructs you on how to use the various tools provided by pandas for managing dirty and missing data.

Chapter 8, Combining and Reshaping Data, covers various techniques for combining, splitting, joining, and merging data located in multiple pandas objects, and then demonstrates how to reshape data using concepts such as pivots, stacking, and melting.

Chapter 9, Grouping and Aggregating Data, focuses on how to use pandas to group data to enable you to perform aggregate operations on grouped data to assist in deriving analytic results.

Chapter 10, Time-series Data, will instruct you on how to use pandas to represent sequences of information that are indexed by the progression of time. This chapter will first cover how pandas represents dates and time, as well as concepts such as periods, frequencies, time zones, and calendars. The focus then shifts to time-series data and various operations such as shifting, lagging, resampling, and moving window operations.

Chapter 11, Visualization, dives into the integration of pandas with matplotlib to visualize pandas data. This chapter will demonstrate how to represent and present many common statistical and financial data visualizations, including bar charts, histograms, scatter plots, area plots, density plots, and heat maps.

Chapter 12, Applications to Finance, brings together everything learned through the previous chapters with practical examples of using pandas to obtain, manipulate, analyze, and visualize stock data.

What you need for this book

This book assumes some familiarity with programming concepts, but those without programming experience, or specifically Python programming experience, should still be comfortable with the examples, as they focus on pandas constructs more than on Python or programming. The examples are based on Anaconda Python 2.7 and pandas 0.15.1. If you do not have either installed, Chapter 2, Installing pandas, gives guidance on installing both on Windows, OS X, and Ubuntu systems. For those not interested in installing any software, instructions are also given on using the Wakari.io online Python data analysis service.

Who this book is for

If you are looking to get into data science and want to learn how to use the Python programming language for data analysis instead of other domain-specific data science tools such as R, then this book is for you. If you have used other data science packages and want to learn how to apply that knowledge to Python, then this book is also for you. Alternately, if you want to learn an additional tool or start with data science to enhance your career, then this book is for you.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: This information can be easily imported into DataFrame using the pd.read_csv() function as follows.

Any command-line / IPython input or output is written as follows:

In [1]: # import numpy and pandas, and DataFrame / Series
        import numpy as np
        import pandas as pd
        from pandas import DataFrame, Series

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Clicking on the New Notebook button will present you with a notebook where you can start entering your pandas code.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code examples in the book are also publicly available on Wakari.io at https://wakari.io/sharing/bundle/LearningPandas/LearningPandas_Index.

Tip

Although great efforts are taken to use data that will reproduce the same output when you execute the samples, there is a small set of code that uses current data and hence the result of running those samples may vary from what is published in this book. These include In [39]: and In [40]: in Chapter 1, A Tour of pandas, which uses the data of the last three months of Google stock, as well as a small number of samples used in the later chapters that demonstrate the usage of date offsets centered on the current date.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/5128OS_ColoredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books — maybe a mistake in the text or the code — we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. A Tour of pandas

In this chapter, we will take a look at pandas, which is an open source Python-based data analysis library. It provides high-performance and easy-to-use data structures and data analysis tools built with the Python programming language. The pandas library brings many of the good things from R, specifically the DataFrame objects and R packages such as plyr and reshape2, and places them in a single library that you can use in your Python applications.

Wes McKinney began developing pandas in 2008 while working at AQR Capital Management. It was open sourced in 2009 and is currently supported and actively developed by various organizations and contributors. It was initially designed with finance in mind, particularly for time series data manipulation, but it emphasizes the data manipulation part of the equation, leaving statistical, financial, and other types of analyses to other Python libraries.

In this chapter, we will take a brief tour of pandas and some of the associated tools such as IPython notebooks. You will be introduced to a variety of concepts in pandas for data organization and manipulation in an effort to form both a base understanding and a frame of reference for deeper coverage in later sections of this book. By the end of this chapter, you will have a good understanding of the fundamentals of pandas and even be able to perform basic data manipulations. Also, you will be ready to continue with later portions of this book for more detailed understanding.

This chapter will introduce you to:

pandas and why it is important

IPython and IPython Notebooks

Referencing pandas in your application

The Series and DataFrame objects of pandas

How to load data from files and the Web

The simplicity of visualizing pandas data

Note

pandas is always lowercase by convention in pandas documentation, and this will be a convention followed by this book.

pandas and why it is important

pandas is a library containing high-level data structures and tools that have been created to assist a Python programmer to perform powerful data manipulations, and discover information in that data in a simple and fast way.

Simple and effective data analysis requires the ability to index, retrieve, tidy, reshape, combine, slice, and perform various analyses on both single and multidimensional data, including heterogeneously typed data that is automatically aligned along index labels. To enable these capabilities, pandas provides the following features (and many more not explicitly mentioned here):

High-performance array and table structures for representing homogeneous and heterogeneous data sets: the Series and DataFrame objects

Flexible reshaping of data structure, allowing the ability to insert and delete both rows and columns of tabular data

Hierarchical indexing of data along multiple axes (both rows and columns), allowing multiple labels per data item

Labeling of series and tabular data to facilitate indexing and automatic alignment of data

Ability to easily identify and fix missing data, in both floating point and non-floating point formats

Powerful grouping capabilities and a functionality to perform split-apply-combine operations on series and tabular data

Simple conversion from ragged and differently indexed data of both NumPy and Python data structures to pandas objects

Smart label-based slicing and subsetting of data sets, including intuitive and flexible merging and joining of data with SQL-like constructs

Extensive I/O facilities to load and save data from multiple formats including CSV, Excel, relational and non-relational databases, HDF5 format, and JSON

Explicit support for time series-specific functionality, providing functionality for date range generation, moving window statistics, time shifting, lagging, and so on

Built-in support to retrieve and automatically parse data from various web-based data sources such as Yahoo!, Google Finance, the World Bank, and several others
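As a rough illustration of two of the features listed above, automatic alignment along index labels and handling of missing data, the following sketch adds two small Series whose labels only partly overlap. The data and the variable names a, b, and total are made up for this example.

```python
import pandas as pd

# Two Series with partly overlapping, differently ordered labels (hypothetical data)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

# Addition matches values by index label, not by position
total = a + b
print(total)

# Labels found in only one operand yield NaN, which can be detected and filled
missing = total.isnull()
filled = total.fillna(0)
print(filled)
```

Only the labels y and z exist in both inputs, so only those two positions in the result hold real sums; the other labels are marked as missing until they are filled.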

For those desiring to get into data analysis and the emerging field of data science, pandas offers an excellent means for a Python programmer (or just an enthusiast) to learn data manipulation. For those just learning or coming from a statistical language like R, pandas can offer an excellent introduction to Python as a programming language.

pandas itself is not a data science toolkit. It does provide some statistical methods as a matter of convenience, but to draw conclusions from data, it leans upon other packages in the Python ecosystem, such as SciPy, NumPy, and scikit-learn, and upon graphics libraries such as matplotlib and ggvis for data visualization. This is actually a strength of pandas relative to languages such as R, as pandas applications are able to leverage an extensive network of robust Python frameworks already built and tested elsewhere.

In this book, we will look at how to use pandas for data manipulation, with a specific focus on gathering, cleaning, and manipulation of various forms of data using pandas. Detailed specifics of data science, finance, econometrics, social network analysis, Python, and IPython are left as reference. You can refer to some other excellent books on these topics already available at https://www.packtpub.com/.

pandas and IPython Notebooks

A popular way to use pandas is through IPython Notebooks. IPython Notebooks provide a web-based interactive computational environment, allowing the combination of code, text, mathematics, plots, and rich media into a web-based document. IPython Notebooks run in a browser and contain Python code that is run in a local or server-side Python session that the notebooks communicate with using WebSockets. Notebooks can also contain markup code and rich media content, and can be converted to other formats such as PDF, HTML, and slide shows.

The following is an example of an IPython Notebook from the IPython website (http://ipython.org/notebook.html) that demonstrates the rich capabilities of notebooks:

IPython Notebooks are not strictly required for using pandas and can be installed into your development environment independently or alongside pandas. During the course of this book, we will install pandas and an IPython Notebook server. You will be able to perform code examples in the text directly in an IPython console interpreter, and the examples will be packaged as notebooks that can be run with a local notebook server. Additionally, the workbooks will be available online for easy and immediate access at https://wakari.io/sharing/bundle/LearningPandas/LearningPandas_Index.

Note

To learn more about IPython Notebooks, visit the notebooks site at http://ipython.org/ipython-doc/dev/notebook/, and for more in-depth coverage, refer to another book, Learning IPython for Interactive Computing and Data Visualization, Cyrille Rossant, Packt Publishing.

Referencing pandas in the application

All pandas programs and examples in this book will always start by importing pandas (and NumPy) into the Python environment. There is a common convention used in many publications (web and print) of importing pandas and NumPy, which will also be used throughout this book. All workbooks and examples for chapters will start with code similar to the following to initialize the pandas library within Python.

In [1]: # import numpy and pandas, and DataFrame / Series
        import numpy as np
        import pandas as pd
        from pandas import DataFrame, Series

        # Set some pandas options
        pd.set_option('display.notebook_repr_html', False)
        pd.set_option('display.max_columns', 10)
        pd.set_option('display.max_rows', 10)

        # And some items for matplotlib
        %matplotlib inline
        import matplotlib.pyplot as plt
        # Note: display.mpl_style was deprecated in pandas 0.18 and removed in
        # later releases; matplotlib style sheets now cover this
        pd.options.display.mpl_style = 'default'

NumPy and pandas go hand-in-hand, as much of pandas is built on NumPy. It is, therefore, very convenient to import NumPy and put it in a np. namespace. Likewise, pandas is imported and referenced with a pd. prefix. Since DataFrame and Series objects of pandas are used so frequently, the third line then imports the Series and DataFrame objects into the global namespace so that we can use them without a pd. prefix.

The three pd.set_option() calls set up some defaults for IPython Notebooks and console output from pandas. These specify how wide and high any output will be, and how many columns it will contain. They can be used to modify the output of IPython and pandas to fit your personal needs to display results. The options set here are convenient for formatting the output of the examples to the constraints of the text.
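To see how such options behave, here is a minimal sketch that reads, changes, temporarily overrides, and resets one display option. It uses pd.get_option, pd.option_context, and pd.reset_option, which this chapter does not cover; the variable names are chosen for illustration only.

```python
import pandas as pd

# Read the value of a display option before changing anything
default_rows = pd.get_option("display.max_rows")

# Change it for the rest of the session
pd.set_option("display.max_rows", 10)
current_rows = pd.get_option("display.max_rows")

# Or override it only inside a with-block
with pd.option_context("display.max_rows", 5):
    inside_rows = pd.get_option("display.max_rows")
after_rows = pd.get_option("display.max_rows")

# Put the option back to the library default
pd.reset_option("display.max_rows")
restored_rows = pd.get_option("display.max_rows")
```

The temporary override is handy when a single large result needs different limits without disturbing the settings used everywhere else.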

Primary pandas objects

A pandas programmer will spend most of their time using two primary objects provided by the pandas framework: Series and DataFrame. The DataFrame object is the overall workhorse of pandas and the most frequently used, as it provides the means to manipulate tabular and heterogeneous data.

The pandas Series object

The base data structure of pandas is the Series object, which is designed to operate similarly to a NumPy array but also adds index capabilities. A simple way to create a Series object is by initializing a Series object with a Python array or Python list.

In [2]: # create a four item Series
        s = Series([1, 2, 3, 4])
        s

Out [2]: 0    1
         1    2
         2    3
         3    4
         dtype: int64

This has created a pandas Series from the list. Notice that printing the series resulted in what appears to be two columns of data. The first column in the output is not a column of the Series object, but the index labels. The second column is the values of the Series object. Each row represents the index label and the value for that label. This Series was created without specifying an index, so pandas automatically creates indexes starting at zero and increasing by one.
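The separation between index labels and values can be checked directly. The sketch below rebuilds the same four-item Series and pulls the two parts out as plain Python lists; the names labels and values are introduced only for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# The index labels and the values are held separately
labels = s.index.tolist()   # automatically generated labels
values = s.tolist()         # the data passed to the constructor

print(labels)
print(values)
```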

Elements of a Series object can be accessed through the index using []. This informs the Series which value to return given one or more index values (referred to in pandas as labels). The following code retrieves the items in the series with labels 1 and 3.

In [3]: # return a Series with the rows with labels 1 and 3
        s[[1, 3]]

Out [3]: 1    2
         3    4
         dtype: int64

Note

It is important to note that the lookup here is not by zero-based positions 1 and 3 like an array, but by the values in the index.
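The difference between labels and positions is easier to see when the two do not coincide. The following sketch uses a hypothetical Series whose integer labels run in reverse order, and contrasts label-based lookup with position-based lookup through the .loc and .iloc accessors, which are not introduced in this chapter.

```python
import pandas as pd

# Labels deliberately differ from positions (hypothetical data)
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

# Rows whose index labels are 1 and 3
by_label = s.loc[[1, 3]]

# Rows at zero-based positions 1 and 3
by_position = s.iloc[[1, 3]]

print(by_label)
print(by_position)
```

The same pair of numbers selects different rows in each case, which is why it matters to know whether a lookup is by label or by position.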

A Series object can be created with a user-defined index by specifying the labels for the index using the index parameter.

In [4]: # create a series using an explicit
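As a minimal sketch of the index parameter described above, the following builds a Series with four string labels and then looks up two of them. The labels 'a' through 'd' and the name picked are hypothetical and are not taken from the book's own listing.

```python
import pandas as pd

# Hypothetical labels supplied through the index parameter
s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Lookup by label returns the matching rows as a new Series
picked = s[["a", "d"]]
print(picked)
```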

