Python 3 Text Processing with NLTK 3 Cookbook
Ebook · 866 pages · 6 hours


About this ebook

This book is intended for Python programmers interested in learning how to do natural language processing. Maybe you've learned the limits of regular expressions the hard way, or you've realized that human language cannot be deterministically parsed like a computer language. Perhaps you have more text than you know what to do with, and need automated ways to analyze and structure that text. This Cookbook will show you how to train and use statistical language models to process text in ways that are practically impossible with standard programming tools. Basic knowledge of Python and of fundamental text processing concepts is expected. Some experience with regular expressions will also be helpful.
Language: English
Release date: Aug 26, 2014
ISBN: 9781782167860


    Python 3 Text Processing with NLTK 3 Cookbook - Jacob Perkins

    Table of Contents

    Python 3 Text Processing with NLTK 3 Cookbook

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why Subscribe?

    Free Access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Tokenizing Text and WordNet Basics

    Introduction

    Tokenizing text into sentences

    Getting ready

    How to do it...

    How it works...

    There's more...

    Tokenizing sentences in other languages

    See also

    Tokenizing sentences into words

    How to do it...

    How it works...

    There's more...

    Separating contractions

    PunktWordTokenizer

    WordPunctTokenizer

    See also

    Tokenizing sentences using regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    Simple whitespace tokenizer

    See also

    Training a sentence tokenizer

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Filtering stopwords in a tokenized sentence

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Looking up Synsets for a word in WordNet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Working with hypernyms

    Part of speech (POS)

    See also

    Looking up lemmas and synonyms in WordNet

    How to do it...

    How it works...

    There's more...

    All possible synonyms

    Antonyms

    See also

    Calculating WordNet Synset similarity

    How to do it...

    How it works...

    There's more...

    Comparing verbs

    Path and Leacock Chodorow (LCH) similarity

    See also

    Discovering word collocations

    Getting ready

    How to do it...

    How it works...

    There's more...

    Scoring functions

    Scoring ngrams

    See also

    2. Replacing and Correcting Words

    Introduction

    Stemming words

    How to do it...

    How it works...

    There's more...

    The LancasterStemmer class

    The RegexpStemmer class

    The SnowballStemmer class

    See also

    Lemmatizing words with WordNet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Combining stemming with lemmatization

    See also

    Replacing words matching regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    Replacement before tokenization

    See also

    Removing repeating characters

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Spelling correction with Enchant

    Getting ready

    How to do it...

    How it works...

    There's more...

    The en_GB dictionary

    Personal word lists

    See also

    Replacing synonyms

    Getting ready

    How to do it...

    How it works...

    There's more...

    CSV synonym replacement

    YAML synonym replacement

    See also

    Replacing negations with antonyms

    How to do it...

    How it works...

    There's more...

    See also

    3. Creating Custom Corpora

    Introduction

    Setting up a custom corpus

    Getting ready

    How to do it...

    How it works...

    There's more...

    Loading a YAML file

    See also

    Creating a wordlist corpus

    Getting ready

    How to do it...

    How it works...

    There's more...

    Names wordlist corpus

    English words corpus

    See also

    Creating a part-of-speech tagged word corpus

    Getting ready

    How to do it...

    How it works...

    There's more...

    Customizing the word tokenizer

    Customizing the sentence tokenizer

    Customizing the paragraph block reader

    Customizing the tag separator

    Converting tags to a universal tagset

    See also

    Creating a chunked phrase corpus

    Getting ready

    How to do it...

    How it works...

    There's more...

    Tree leaves

    Treebank chunk corpus

    CoNLL2000 corpus

    See also

    Creating a categorized text corpus

    Getting ready

    How to do it...

    How it works...

    There's more...

    Category file

    Categorized tagged corpus reader

    Categorized corpora

    See also

    Creating a categorized chunk corpus reader

    Getting ready

    How to do it...

    How it works...

    There's more...

    Categorized CoNLL chunk corpus reader

    See also

    Lazy corpus loading

    How to do it...

    How it works...

    There's more...

    Creating a custom corpus view

    How to do it...

    How it works...

    There's more...

    Block reader functions

    Pickle corpus view

    Concatenated corpus view

    See also

    Creating a MongoDB-backed corpus reader

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Corpus editing with file locking

    Getting ready

    How to do it...

    How it works...

    4. Part-of-speech Tagging

    Introduction

    Default tagging

    Getting ready

    How to do it...

    How it works...

    There's more...

    Evaluating accuracy

    Tagging sentences

    Untagging a tagged sentence

    See also

    Training a unigram part-of-speech tagger

    How to do it...

    How it works...

    There's more...

    Overriding the context model

    Minimum frequency cutoff

    See also

    Combining taggers with backoff tagging

    How to do it...

    How it works...

    There's more...

    Saving and loading a trained tagger with pickle

    See also

    Training and combining ngram taggers

    Getting ready

    How to do it...

    How it works...

    There's more...

    Quadgram tagger

    See also

    Creating a model of likely word tags

    How to do it...

    How it works...

    There's more...

    See also

    Tagging with regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Affix tagging

    How to do it...

    How it works...

    There's more...

    Working with min_stem_length

    See also

    Training a Brill tagger

    How to do it...

    How it works...

    There's more...

    Tracing

    See also

    Training the TnT tagger

    How to do it...

    How it works...

    There's more...

    Controlling the beam search

    Significance of capitalization

    See also

    Using WordNet for tagging

    Getting ready

    How to do it...

    How it works...

    See also

    Tagging proper names

    How to do it...

    How it works...

    See also

    Classifier-based tagging

    How to do it...

    How it works...

    There's more...

    Detecting features with a custom feature detector

    Setting a cutoff probability

    Using a pre-trained classifier

    See also

    Training a tagger with NLTK-Trainer

    How to do it...

    How it works...

    There's more...

    Saving a pickled tagger

    Training on a custom corpus

    Training with universal tags

    Analyzing a tagger against a tagged corpus

    Analyzing a tagged corpus

    See also

    5. Extracting Chunks

    Introduction

    Chunking and chinking with regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    Parsing different chunk types

    Parsing alternative patterns

    Chunk rule with context

    See also

    Merging and splitting chunks with regular expressions

    How to do it...

    How it works...

    There's more...

    Specifying rule descriptions

    See also

    Expanding and removing chunks with regular expressions

    How to do it...

    How it works...

    There's more...

    See also

    Partial parsing with regular expressions

    How to do it...

    How it works...

    There's more...

    The ChunkScore metrics

    Looping and tracing chunk rules

    See also

    Training a tagger-based chunker

    How to do it...

    How it works...

    There's more...

    Using different taggers

    See also

    Classification-based chunking

    How to do it...

    How it works...

    There's more...

    Using a different classifier builder

    See also

    Extracting named entities

    How to do it...

    How it works...

    There's more...

    Binary named entity extraction

    See also

    Extracting proper noun chunks

    How to do it...

    How it works...

    There's more...

    See also

    Extracting location chunks

    How to do it...

    How it works...

    There's more...

    See also

    Training a named entity chunker

    How to do it...

    How it works...

    There's more...

    See also

    Training a chunker with NLTK-Trainer

    How to do it...

    How it works...

    There's more...

    Saving a pickled chunker

    Training a named entity chunker

    Training on a custom corpus

    Training on parse trees

    Analyzing a chunker against a chunked corpus

    Analyzing a chunked corpus

    See also

    6. Transforming Chunks and Trees

    Introduction

    Filtering insignificant words from a sentence

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Correcting verb forms

    Getting ready

    How to do it...

    How it works...

    See also

    Swapping verb phrases

    How to do it...

    How it works...

    There's more...

    See also

    Swapping noun cardinals

    How to do it...

    How it works...

    See also

    Swapping infinitive phrases

    How to do it...

    How it works...

    There's more...

    See also

    Singularizing plural nouns

    How to do it...

    How it works...

    See also

    Chaining chunk transformations

    How to do it...

    How it works...

    There's more...

    See also

    Converting a chunk tree to text

    How to do it...

    How it works...

    There's more...

    See also

    Flattening a deep tree

    Getting ready

    How to do it...

    How it works...

    There's more...

    The cess_esp and cess_cat treebank

    See also

    Creating a shallow tree

    How to do it...

    How it works...

    See also

    Converting tree labels

    Getting ready

    How to do it...

    How it works...

    See also

    7. Text Classification

    Introduction

    Bag of words feature extraction

    How to do it...

    How it works...

    There's more...

    Filtering stopwords

    Including significant bigrams

    See also

    Training a Naive Bayes classifier

    Getting ready

    How to do it...

    How it works...

    There's more...

    Classification probability

    Most informative features

    Training estimator

    Manual training

    See also

    Training a decision tree classifier

    How to do it...

    How it works...

    There's more...

    Controlling uncertainty with entropy_cutoff

    Controlling tree depth with depth_cutoff

    Controlling decisions with support_cutoff

    See also

    Training a maximum entropy classifier

    Getting ready

    How to do it...

    How it works...

    There's more...

    Megam algorithm

    See also

    Training scikit-learn classifiers

    Getting ready

    How to do it...

    How it works...

    There's more...

    Comparing Naive Bayes algorithms

    Training with logistic regression

    Training with LinearSVC

    See also

    Measuring precision and recall of a classifier

    How to do it...

    How it works...

    There's more...

    F-measure

    See also

    Calculating high information words

    How to do it...

    How it works...

    There's more...

    The MaxentClassifier class with high information words

    The DecisionTreeClassifier class with high information words

    The SklearnClassifier class with high information words

    See also

    Combining classifiers with voting

    Getting ready

    How to do it...

    How it works...

    See also

    Classifying with multiple binary classifiers

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Training a classifier with NLTK-Trainer

    How to do it...

    How it works...

    There's more...

    Saving a pickled classifier

    Using different training instances

    The most informative features

    The Maxent and LogisticRegression classifiers

    SVMs

    Combining classifiers

    High information words and bigrams

    Cross-fold validation

    Analyzing a classifier

    See also

    8. Distributed Processing and Handling Large Datasets

    Introduction

    Distributed tagging with execnet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Creating multiple channels

    Local versus remote gateways

    See also

    Distributed chunking with execnet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Python subprocesses

    See also

    Parallel list processing with execnet

    How to do it...

    How it works...

    There's more...

    See also

    Storing a frequency distribution in Redis

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Storing a conditional frequency distribution in Redis

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Storing an ordered dictionary in Redis

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Distributed word scoring with Redis and execnet

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    9. Parsing Specific Data Types

    Introduction

    Parsing dates and times with dateutil

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Timezone lookup and conversion

    Getting ready

    How to do it...

    How it works...

    There's more...

    Local timezone

    Custom offsets

    See also

    Extracting URLs from HTML with lxml

    Getting ready

    How to do it...

    How it works...

    There's more...

    Extracting links directly

    Parsing HTML from URLs or files

    Extracting links with XPaths

    See also

    Cleaning and stripping HTML

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Converting HTML entities with BeautifulSoup

    Getting ready

    How to do it...

    How it works...

    There's more...

    Extracting URLs with BeautifulSoup

    See also

    Detecting and converting character encodings

    Getting ready

    How to do it...

    How it works...

    There's more...

    Converting to ASCII

    UnicodeDammit conversion

    See also

    A. Penn Treebank Part-of-speech Tags

    Index

    Python 3 Text Processing with NLTK 3 Cookbook

    Copyright © 2014 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: November 2010

    Second edition: August 2014

    Production reference: 1200814

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78216-785-3

    www.packtpub.com

    Cover image by Faiz Fattohi (<faizfattohi@gmail.com>)

    Credits

    Author

    Jacob Perkins

    Reviewers

    Patrick Chan

    Mohit Goenka

    Lihang Li

    Maurice HT Ling

    Jing (Dave) Tian

    Commissioning Editor

    Kevin Colaco

    Acquisition Editor

    Kevin Colaco

    Content Development Editor

    Amey Varangaonkar

    Technical Editor

    Humera Shaikh

    Copy Editors

    Deepa Nambiar

    Laxmi Subramanian

    Project Coordinator

    Leena Purkait

    Proofreaders

    Simran Bhogal

    Paul Hindle

    Indexers

    Hemangini Bari

    Mariammal Chettiyar

    Tejal Soni

    Priya Subramani

    Graphics

    Ronak Dhruv

    Disha Haria

    Yuvraj Mannari

    Abhinash Sahu

    Production Coordinators

    Pooja Chiplunkar

    Conidon Miranda

    Nilesh R. Mohite

    Cover Work

    Pooja Chiplunkar

    About the Author

    Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go.

    He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com.

    To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.

    I would like to thank my friends and family for their part in making this book possible. And thanks to the editors and reviewers at Packt Publishing for their helpful feedback and suggestions. Finally, this book wouldn't be possible without the fantastic NLTK project and team: http://www.nltk.org/.

    About the Reviewers

    Patrick Chan is an avid Python programmer and uses Python extensively for data processing.

    I would like to thank my beautiful wife, Thanh Tuyen, for her endless patience and understanding in putting up with my various late-night hacking sessions.

    Mohit Goenka is a software developer on the Yahoo Mail team. He graduated from the University of Southern California (USC) with a Master's degree in Computer Science; his thesis focused on game theory and human behavior concepts as applied to real-world security games. He also received an award for academic excellence from the Office of International Services at USC. He has worked in many areas of computing, including artificial intelligence, machine learning, path planning, multiagent systems, neural networks, computer vision, computer networks, and operating systems.

    During his time as a student, he won multiple code-cracking competitions and presented his work on Detection of Untouched UFOs to a wide audience. Coding is not only his profession but also his hobby; he spends most of his free time learning about new technology and developing his skills.

    He is also a poet: some of his works are part of the University of Southern California Libraries archive, under the cover of The Lewis Carroll collection. In addition, he has made significant contributions by volunteering his time to serve the community.

    Lihang Li received his BE degree in Mechanical Engineering from Huazhong University of Science and Technology (HUST), China, in 2012, and is now pursuing his MS degree in Computer Vision at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (IACAS).

    As a graduate student, he focuses on computer vision, especially on vision-based SLAM algorithms. In his free time, he takes part in open source activities and is now the President of the Open Source Club, Chinese Academy of Sciences. Building multicopters is his hobby, and he is part of a team called OpenDrone from BLUG (Beijing Linux User Group).

    His interests include Linux, open source, cloud computing, virtualization, computer vision, operating systems, machine learning, data mining, and a variety of programming languages.

    You can find him by visiting his personal website http://hustcalm.me.

    Many thanks to my girlfriend Jingjing Shao, who is always with me. I must also thank the entire team at Packt Publishing; in particular, Kartik, who was a very good Project Coordinator. I would also like to thank the other reviewers; though we haven't met, I'm really happy to have worked with you.

    Maurice HT Ling completed his PhD in Bioinformatics and his BSc (Hons) in Molecular and Cell Biology at The University of Melbourne. He is currently a Research Fellow at Nanyang Technological University, Singapore, and an Honorary Fellow at The University of Melbourne, Australia. He co-edits The Python Papers and co-founded the Python User Group (Singapore), where he has served as an executive committee member since 2010. His research interests lie in life (biological life, artificial life, and artificial intelligence) and in using computer science and statistics as tools to understand life and its numerous aspects. His personal website is http://maurice.vodien.com.

    Jing (Dave) Tian is a graduate research fellow and a PhD student in the Computer and Information Science and Engineering (CISE) department at the University of Florida. His research involves system security, embedded system security, trusted computing, and static analysis for security and virtualization. He is interested in Linux kernel hacking and compilers. He also spent a year working on AI and machine learning, and taught classes on Intro to Problem Solving Using Python and Operating Systems in the Computer Science department at the University of Oregon. Before that, he worked as a software developer in the Linux Control Platform (LCP) group at Alcatel-Lucent (formerly Lucent Technologies) R&D for around 4 years. He holds BS and ME degrees in Electrical Engineering from China. His website is http://davejingtian.org.

    I would like to thank the author of the book, who has done a great job with both Python and NLTK. I would also like to thank the editors of the book, who polished this book and offered me the opportunity to review such a nice book.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    You might want to visit www.PacktPub.com for support files and downloads related to your book.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    http://PacktLib.PacktPub.com

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

    Why Subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print and bookmark content

    On demand and accessible via web browser

    Free Access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

    Preface

    Natural language processing is used everywhere, from search engines such as Google or Weotta, to voice interfaces such as Siri or Dragon NaturallySpeaking. Python's Natural Language Toolkit (NLTK) is a suite of libraries that has become one of the best tools for prototyping and building natural language processing systems.

    Python 3 Text Processing with NLTK 3 Cookbook is your handy and illustrative guide, which will walk you through many natural language processing techniques in a step-by-step manner. It will demystify the dark arts of text mining and language processing using the comprehensive Natural Language Toolkit.

    This book cuts the preamble short, skips lengthy pedagogy, and lets you dive right into the techniques of text processing with a practical, hands-on approach.

    Get started by learning how to tokenize text into words and sentences, then explore the WordNet lexical dictionary. Learn the basics of stemming and lemmatization. Discover various ways to replace words and perform spelling corrections. Create your own corpora and custom corpus readers, including a MongoDB-based corpus reader. Use part-of-speech taggers to annotate words. Create and transform chunked phrase trees and named entities using partial parsing and chunk transformations. Dig into feature extraction and text classification for sentiment analysis. Learn how to process large amounts of text with distributed processing and NoSQL databases.

    This book will teach you all that and more, in a hands-on learn-by-doing manner. Become an expert in using NLTK for Natural Language Processing with this useful companion.

    What this book covers

    Chapter 1, Tokenizing Text and WordNet Basics, covers how to tokenize text into sentences and words, then look up those words in the WordNet lexical dictionary.
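    As a small taste of the chapter's topic, here is a minimal sketch of word tokenization with NLTK (an assumption for illustration: it uses `wordpunct_tokenize`, a regular-expression tokenizer that needs no downloaded model, unlike the Punkt sentence tokenizer or WordNet lookups covered in the chapter):

```python
from nltk.tokenize import wordpunct_tokenize

# wordpunct_tokenize splits on whitespace and punctuation,
# so contractions are broken apart at the apostrophe.
tokens = wordpunct_tokenize("Can't is a contraction.")
print(tokens)  # ['Can', "'", 't', 'is', 'a', 'contraction', '.']
```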

    Chapter 2, Replacing and Correcting Words, demonstrates various word replacement and correction techniques, including stemming, lemmatization, and using the Enchant spelling dictionary.
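    For example, stemming strips affixes by rule, so NLTK's stemmers work without any corpus downloads (WordNet lemmatization, also covered in the chapter, requires the wordnet data). A minimal sketch:

```python
from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Both stemmers reduce 'cooking' to its stem.
print(porter.stem('cooking'))     # 'cook'
print(lancaster.stem('cooking'))  # 'cook'
# Stems are not always dictionary words.
print(porter.stem('cookery'))     # 'cookeri'
```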

    Chapter 3, Creating Custom Corpora, explains how to use corpus readers and create custom corpora. It also covers how to use some of the corpora that come with NLTK.
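    As an illustration of the idea, a corpus reader simply wraps files in a directory. This hypothetical sketch creates a throwaway wordlist file in a temporary directory and points a `WordListCorpusReader` at it (the file name and contents are invented for the example):

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader

# Create a one-file "corpus" in a temporary directory.
root = tempfile.mkdtemp()
with open(os.path.join(root, 'wordlist.txt'), 'w') as f:
    f.write('nltk\ncorpus\ncookbook\n')

# The reader treats each line of the file as one word.
reader = WordListCorpusReader(root, ['wordlist.txt'])
print(reader.words())  # ['nltk', 'corpus', 'cookbook']
```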

    Chapter 4, Part-of-speech Tagging, shows how to annotate a sentence of words with part-of-speech tags, and how to train your own custom part-of-speech tagger.
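    As a minimal sketch of the training idea, here a `UnigramTagger` is trained on a tiny hand-tagged corpus invented for the example (real training would use a tagged corpus such as treebank, which requires a data download), with a `DefaultTagger` as backoff for unseen words:

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Tiny hand-tagged training corpus (invented for illustration).
train_sents = [
    [('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')],
    [('the', 'DT'), ('dog', 'NN'), ('ran', 'VBD')],
]

# Words not seen in training fall back to the default 'NN' tag.
tagger = UnigramTagger(train_sents, backoff=DefaultTagger('NN'))
print(tagger.tag(['the', 'dog', 'sat']))
# [('the', 'DT'), ('dog', 'NN'), ('sat', 'VBD')]
```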

    Chapter 5, Extracting Chunks, covers the chunking process, also known as partial parsing, which can identify phrases and named entities in a sentence. It also explains how to train your own custom chunker and create specific named entity recognizers.
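    As a small sketch of chunking with a regular-expression grammar (the grammar and the tagged sentence are invented for illustration), `RegexpParser` groups part-of-speech tagged words into phrases:

```python
from nltk.chunk import RegexpParser

# NP chunk rule: an optional determiner, any number of
# adjectives, then a noun.
chunker = RegexpParser('NP: {<DT>?<JJ>*<NN>}')

tree = chunker.parse([('the', 'DT'), ('quick', 'JJ'),
                      ('fox', 'NN'), ('ran', 'VBD')])
print(tree)
```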

    Chapter 6, Transforming Chunks and Trees, demonstrates how to transform chunk phrases and parse trees in various ways.
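    One simple transformation of this kind, sketched here on a hand-built chunk tree (invented for the example), is flattening a tree back to plain text by joining its leaf words:

```python
from nltk.tree import Tree

# A small chunk tree whose leaves are (word, tag) pairs.
tree = Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]),
                  ('is', 'VBZ'), ('great', 'JJ')])

# Convert the tree back to text by joining the leaf words.
text = ' '.join(word for word, tag in tree.leaves())
print(text)  # 'the book is great'
```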

    Chapter 7, Text Classification, shows how to transform text into feature dictionaries, and how to train a text classifier for sentiment analysis. It also covers multi-label classification and classifier evaluation metrics.
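    A minimal sketch of the classification workflow, using a toy training set of bag-of-words feature dictionaries invented for the example (real sentiment training would use a labeled corpus such as movie_reviews):

```python
from nltk.classify import NaiveBayesClassifier

# Toy labeled instances: feature dicts plus sentiment labels.
train = [
    ({'great': True, 'awful': False}, 'pos'),
    ({'great': True, 'awful': False}, 'pos'),
    ({'great': False, 'awful': True}, 'neg'),
    ({'great': False, 'awful': True}, 'neg'),
]

classifier = NaiveBayesClassifier.train(train)
print(classifier.classify({'great': True, 'awful': False}))  # 'pos'
```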

    Chapter 8, Distributed Processing and Handling Large Datasets, discusses how to use execnet for distributed natural language processing and how to use Redis for storing large datasets.

    Chapter 9, Parsing Specific Data Types, covers various Python modules that are useful for parsing specific kinds of data, such as datetimes and HTML.
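    For instance, a minimal sketch of date parsing (assuming the python-dateutil package is installed); `dateutil.parser` handles many freeform date formats:

```python
from dateutil import parser

# Parse a freeform date string into a datetime object.
dt = parser.parse('Aug 26, 2014')
print(dt.year, dt.month, dt.day)  # 2014 8 26
```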

    Appendix A, Penn Treebank Part-of-speech Tags, shows a table of Treebank part-of-speech tags, which is a useful reference for Chapter 3, Creating Custom Corpora, and Chapter 4, Part-of-speech Tagging.

    What you need for this book

    You will need Python 3 and the listed Python packages. For this book, I used Python 3.3.5. To install the packages, you can use pip (https://pypi.python.org/pypi/pip/). The following is the list of the packages in requirements format with the version number used while writing this
