Python 3 Text Processing with NLTK 3 Cookbook
Python 3 Text Processing with NLTK 3 Cookbook - Jacob Perkins
Table of Contents
Python 3 Text Processing with NLTK 3 Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Tokenizing sentences in other languages
See also
Tokenizing sentences into words
How to do it...
How it works...
There's more...
Separating contractions
PunktWordTokenizer
WordPunctTokenizer
See also
Tokenizing sentences using regular expressions
Getting ready
How to do it...
How it works...
There's more...
Simple whitespace tokenizer
See also
Training a sentence tokenizer
Getting ready
How to do it...
How it works...
There's more...
See also
Filtering stopwords in a tokenized sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Looking up Synsets for a word in WordNet
Getting ready
How to do it...
How it works...
There's more...
Working with hypernyms
Part of speech (POS)
See also
Looking up lemmas and synonyms in WordNet
How to do it...
How it works...
There's more...
All possible synonyms
Antonyms
See also
Calculating WordNet Synset similarity
How to do it...
How it works...
There's more...
Comparing verbs
Path and Leacock Chodorow (LCH) similarity
See also
Discovering word collocations
Getting ready
How to do it...
How it works...
There's more...
Scoring functions
Scoring ngrams
See also
2. Replacing and Correcting Words
Introduction
Stemming words
How to do it...
How it works...
There's more...
The LancasterStemmer class
The RegexpStemmer class
The SnowballStemmer class
See also
Lemmatizing words with WordNet
Getting ready
How to do it...
How it works...
There's more...
Combining stemming with lemmatization
See also
Replacing words matching regular expressions
Getting ready
How to do it...
How it works...
There's more...
Replacement before tokenization
See also
Removing repeating characters
Getting ready
How to do it...
How it works...
There's more...
See also
Spelling correction with Enchant
Getting ready
How to do it...
How it works...
There's more...
The en_GB dictionary
Personal word lists
See also
Replacing synonyms
Getting ready
How to do it...
How it works...
There's more...
CSV synonym replacement
YAML synonym replacement
See also
Replacing negations with antonyms
How to do it...
How it works...
There's more...
See also
3. Creating Custom Corpora
Introduction
Setting up a custom corpus
Getting ready
How to do it...
How it works...
There's more...
Loading a YAML file
See also
Creating a wordlist corpus
Getting ready
How to do it...
How it works...
There's more...
Names wordlist corpus
English words corpus
See also
Creating a part-of-speech tagged word corpus
Getting ready
How to do it...
How it works...
There's more...
Customizing the word tokenizer
Customizing the sentence tokenizer
Customizing the paragraph block reader
Customizing the tag separator
Converting tags to a universal tagset
See also
Creating a chunked phrase corpus
Getting ready
How to do it...
How it works...
There's more...
Tree leaves
Treebank chunk corpus
CoNLL2000 corpus
See also
Creating a categorized text corpus
Getting ready
How to do it...
How it works...
There's more...
Category file
Categorized tagged corpus reader
Categorized corpora
See also
Creating a categorized chunk corpus reader
Getting ready
How to do it...
How it works...
There's more...
Categorized CoNLL chunk corpus reader
See also
Lazy corpus loading
How to do it...
How it works...
There's more...
Creating a custom corpus view
How to do it...
How it works...
There's more...
Block reader functions
Pickle corpus view
Concatenated corpus view
See also
Creating a MongoDB-backed corpus reader
Getting ready
How to do it...
How it works...
There's more...
See also
Corpus editing with file locking
Getting ready
How to do it...
How it works...
4. Part-of-speech Tagging
Introduction
Default tagging
Getting ready
How to do it...
How it works...
There's more...
Evaluating accuracy
Tagging sentences
Untagging a tagged sentence
See also
Training a unigram part-of-speech tagger
How to do it...
How it works...
There's more...
Overriding the context model
Minimum frequency cutoff
See also
Combining taggers with backoff tagging
How to do it...
How it works...
There's more...
Saving and loading a trained tagger with pickle
See also
Training and combining ngram taggers
Getting ready
How to do it...
How it works...
There's more...
Quadgram tagger
See also
Creating a model of likely word tags
How to do it...
How it works...
There's more...
See also
Tagging with regular expressions
Getting ready
How to do it...
How it works...
There's more...
See also
Affix tagging
How to do it...
How it works...
There's more...
Working with min_stem_length
See also
Training a Brill tagger
How to do it...
How it works...
There's more...
Tracing
See also
Training the TnT tagger
How to do it...
How it works...
There's more...
Controlling the beam search
Significance of capitalization
See also
Using WordNet for tagging
Getting ready
How to do it...
How it works...
See also
Tagging proper names
How to do it...
How it works...
See also
Classifier-based tagging
How to do it...
How it works...
There's more...
Detecting features with a custom feature detector
Setting a cutoff probability
Using a pre-trained classifier
See also
Training a tagger with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled tagger
Training on a custom corpus
Training with universal tags
Analyzing a tagger against a tagged corpus
Analyzing a tagged corpus
See also
5. Extracting Chunks
Introduction
Chunking and chinking with regular expressions
Getting ready
How to do it...
How it works...
There's more...
Parsing different chunk types
Parsing alternative patterns
Chunk rule with context
See also
Merging and splitting chunks with regular expressions
How to do it...
How it works...
There's more...
Specifying rule descriptions
See also
Expanding and removing chunks with regular expressions
How to do it...
How it works...
There's more...
See also
Partial parsing with regular expressions
How to do it...
How it works...
There's more...
The ChunkScore metrics
Looping and tracing chunk rules
See also
Training a tagger-based chunker
How to do it...
How it works...
There's more...
Using different taggers
See also
Classification-based chunking
How to do it...
How it works...
There's more...
Using a different classifier builder
See also
Extracting named entities
How to do it...
How it works...
There's more...
Binary named entity extraction
See also
Extracting proper noun chunks
How to do it...
How it works...
There's more...
See also
Extracting location chunks
How to do it...
How it works...
There's more...
See also
Training a named entity chunker
How to do it...
How it works...
There's more...
See also
Training a chunker with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled chunker
Training a named entity chunker
Training on a custom corpus
Training on parse trees
Analyzing a chunker against a chunked corpus
Analyzing a chunked corpus
See also
6. Transforming Chunks and Trees
Introduction
Filtering insignificant words from a sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Correcting verb forms
Getting ready
How to do it...
How it works...
See also
Swapping verb phrases
How to do it...
How it works...
There's more...
See also
Swapping noun cardinals
How to do it...
How it works...
See also
Swapping infinitive phrases
How to do it...
How it works...
There's more...
See also
Singularizing plural nouns
How to do it...
How it works...
See also
Chaining chunk transformations
How to do it...
How it works...
There's more...
See also
Converting a chunk tree to text
How to do it...
How it works...
There's more...
See also
Flattening a deep tree
Getting ready
How to do it...
How it works...
There's more...
The cess_esp and cess_cat treebank
See also
Creating a shallow tree
How to do it...
How it works...
See also
Converting tree labels
Getting ready
How to do it...
How it works...
See also
7. Text Classification
Introduction
Bag of words feature extraction
How to do it...
How it works...
There's more...
Filtering stopwords
Including significant bigrams
See also
Training a Naive Bayes classifier
Getting ready
How to do it...
How it works...
There's more...
Classification probability
Most informative features
Training estimator
Manual training
See also
Training a decision tree classifier
How to do it...
How it works...
There's more...
Controlling uncertainty with entropy_cutoff
Controlling tree depth with depth_cutoff
Controlling decisions with support_cutoff
See also
Training a maximum entropy classifier
Getting ready
How to do it...
How it works...
There's more...
Megam algorithm
See also
Training scikit-learn classifiers
Getting ready
How to do it...
How it works...
There's more...
Comparing Naive Bayes algorithms
Training with logistic regression
Training with LinearSVC
See also
Measuring precision and recall of a classifier
How to do it...
How it works...
There's more...
F-measure
See also
Calculating high information words
How to do it...
How it works...
There's more...
The MaxentClassifier class with high information words
The DecisionTreeClassifier class with high information words
The SklearnClassifier class with high information words
See also
Combining classifiers with voting
Getting ready
How to do it...
How it works...
See also
Classifying with multiple binary classifiers
Getting ready
How to do it...
How it works...
There's more...
See also
Training a classifier with NLTK-Trainer
How to do it...
How it works...
There's more...
Saving a pickled classifier
Using different training instances
The most informative features
The Maxent and LogisticRegression classifiers
SVMs
Combining classifiers
High information words and bigrams
Cross-fold validation
Analyzing a classifier
See also
8. Distributed Processing and Handling Large Datasets
Introduction
Distributed tagging with execnet
Getting ready
How to do it...
How it works...
There's more...
Creating multiple channels
Local versus remote gateways
See also
Distributed chunking with execnet
Getting ready
How to do it...
How it works...
There's more...
Python subprocesses
See also
Parallel list processing with execnet
How to do it...
How it works...
There's more...
See also
Storing a frequency distribution in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Storing a conditional frequency distribution in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Storing an ordered dictionary in Redis
Getting ready
How to do it...
How it works...
There's more...
See also
Distributed word scoring with Redis and execnet
Getting ready
How to do it...
How it works...
There's more...
See also
9. Parsing Specific Data Types
Introduction
Parsing dates and times with dateutil
Getting ready
How to do it...
How it works...
There's more...
See also
Timezone lookup and conversion
Getting ready
How to do it...
How it works...
There's more...
Local timezone
Custom offsets
See also
Extracting URLs from HTML with lxml
Getting ready
How to do it...
How it works...
There's more...
Extracting links directly
Parsing HTML from URLs or files
Extracting links with XPaths
See also
Cleaning and stripping HTML
Getting ready
How to do it...
How it works...
There's more...
See also
Converting HTML entities with BeautifulSoup
Getting ready
How to do it...
How it works...
There's more...
Extracting URLs with BeautifulSoup
See also
Detecting and converting character encodings
Getting ready
How to do it...
How it works...
There's more...
Converting to ASCII
UnicodeDammit conversion
See also
A. Penn Treebank Part-of-speech Tags
Index
Python 3 Text Processing with NLTK 3 Cookbook
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2010
Second edition: August 2014
Production reference: 1200814
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78216-785-3
www.packtpub.com
Cover image by Faiz Fattohi (<faizfattohi@gmail.com>)
Credits
Author
Jacob Perkins
Reviewers
Patrick Chan
Mohit Goenka
Lihang Li
Maurice HT Ling
Jing (Dave) Tian
Commissioning Editor
Kevin Colaco
Acquisition Editor
Kevin Colaco
Content Development Editor
Amey Varangaonkar
Technical Editor
Humera Shaikh
Copy Editors
Deepa Nambiar
Laxmi Subramanian
Project Coordinator
Leena Purkait
Proofreaders
Simran Bhogal
Paul Hindle
Indexers
Hemangini Bari
Mariammal Chettiyar
Tejal Soni
Priya Subramani
Graphics
Ronak Dhruv
Disha Haria
Yuvraj Mannari
Abhinash Sahu
Production Coordinators
Pooja Chiplunkar
Conidon Miranda
Nilesh R. Mohite
Cover Work
Pooja Chiplunkar
About the Author
Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go.
He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com.
To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.
I would like to thank my friends and family for their part in making this book possible. And thanks to the editors and reviewers at Packt Publishing for their helpful feedback and suggestions. Finally, this book wouldn't be possible without the fantastic NLTK project and team: http://www.nltk.org/.
About the Reviewers
Patrick Chan is an avid Python programmer and uses Python extensively for data processing.
I would like to thank my beautiful wife, Thanh Tuyen, for her endless patience and understanding in putting up with my various late night hacking sessions.
Mohit Goenka is a software developer on the Yahoo Mail team. He graduated from the University of Southern California (USC) with a Master's degree in Computer Science; his thesis focused on game theory and human behavior concepts as applied to real-world security games. He also received an award for academic excellence from the Office of International Services at the University of Southern California. He has worked in various areas of computing, including artificial intelligence, machine learning, path planning, multiagent systems, neural networks, computer vision, computer networks, and operating systems.
As a student, he won multiple code-breaking competitions and presented his work on Detection of Untouched UFOs to a wide range of audiences. Coding is not only his profession but also his hobby, and he spends most of his free time learning about new technologies and developing his skills.
He is also a skilled poet: some of his works are part of the University of Southern California Libraries archive under the cover of The Lewis Carroll collection. In addition, he has made significant contributions by volunteering his time to serve the community.
Lihang Li received his BE degree in Mechanical Engineering from Huazhong University of Science and Technology (HUST), China, in 2012, and is now pursuing his MS degree in Computer Vision at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (IACAS).
As a graduate student, he is focusing on computer vision, especially vision-based SLAM algorithms. In his free time, he likes to take part in open source activities and is now the President of the Open Source Club, Chinese Academy of Sciences. He is also building a multicopter as a hobby with a team called OpenDrone from BLUG (the Beijing Linux User Group).
His interests include Linux, open source, cloud computing, virtualization, computer vision, operating systems, machine learning, data mining, and a variety of programming languages.
You can find him by visiting his personal website http://hustcalm.me.
Many thanks to my girlfriend, Jingjing Shao, who is always with me. I must also thank the entire team at Packt Publishing, in particular Kartik, a very good Project Coordinator. I would also like to thank the other reviewers; though we haven't met, I'm really happy to have worked with you.
Maurice HT Ling completed his PhD in Bioinformatics and BSc (Hons) in Molecular and Cell Biology at The University of Melbourne. He is currently a Research Fellow at Nanyang Technological University, Singapore, and an Honorary Fellow at The University of Melbourne, Australia. He co-edits The Python Papers and co-founded the Python User Group (Singapore), where he has served as an executive committee member since 2010. His research interests lie in life—biological life, artificial life, and artificial intelligence—and in using computer science and statistics as tools to understand life and its numerous aspects. His personal website is http://maurice.vodien.com.
Jing (Dave) Tian is a graduate research fellow and PhD student in the Computer and Information Science and Engineering (CISE) department at the University of Florida. His research involves system security, embedded system security, trusted computing, and static analysis for security and virtualization. He is interested in Linux kernel hacking and compilers. He has also spent a year working on AI and machine learning, and taught Intro to Problem Solving using Python and Operating Systems classes in the Computer Science department at the University of Oregon. Before that, he worked as a software developer in the Linux Control Platform (LCP) group at Alcatel-Lucent (formerly Lucent Technologies) R&D for around 4 years. He holds BS and ME degrees in Electrical Engineering from China. His website is http://davejingtian.org.
I would like to thank the author of the book, who has done a great job for both Python and NLTK. I would also like to thank the editors of the book, who made this book perfect and offered me the opportunity to review such a nice book.
www.PacktPub.com
Support files, eBooks, discount offers, and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Preface
Natural language processing is used everywhere, from search engines such as Google or Weotta, to voice interfaces such as Siri or Dragon NaturallySpeaking. Python's Natural Language Toolkit (NLTK) is a suite of libraries that has become one of the best tools for prototyping and building natural language processing systems.
Python 3 Text Processing with NLTK 3 Cookbook is your handy and illustrative guide, which will walk you through many natural language processing techniques in a step-by-step manner. It will demystify the dark arts of text mining and language processing using the comprehensive Natural Language Toolkit.
This book cuts short the preamble, ignores pedagogy, and lets you dive right into the techniques of text processing with a practical hands-on approach.
Get started by learning how to tokenize text into words and sentences, then explore the WordNet lexical dictionary. Learn the basics of stemming and lemmatization. Discover various ways to replace words and perform spelling corrections. Create your own corpora and custom corpus readers, including a MongoDB-based corpus reader. Use part-of-speech taggers to annotate words. Create and transform chunked phrase trees and named entities using partial parsing and chunk transformations. Dig into feature extraction and text classification for sentiment analysis. Learn how to process large amounts of text with distributed processing and NoSQL databases.
This book will teach you all that and more, in a hands-on learn-by-doing manner. Become an expert in using NLTK for Natural Language Processing with this useful companion.
What this book covers
Chapter 1, Tokenizing Text and WordNet Basics, covers how to tokenize text into sentences and words, then look up those words in the WordNet lexical dictionary.
Chapter 2, Replacing and Correcting Words, demonstrates various word replacement and correction techniques, including stemming, lemmatization, and using the Enchant spelling dictionary.
Chapter 3, Creating Custom Corpora, explains how to use corpus readers and create custom corpora. It also covers how to use some of the corpora that come with NLTK.
Chapter 4, Part-of-speech Tagging, shows how to annotate a sentence of words with part-of-speech tags, and how to train your own custom part-of-speech tagger.
Chapter 5, Extracting Chunks, covers the chunking process, also known as partial parsing, which can identify phrases and named entities in a sentence. It also explains how to train your own custom chunker and create specific named entity recognizers.
Chapter 6, Transforming Chunks and Trees, demonstrates how to transform chunk phrases and parse trees in various ways.
Chapter 7, Text Classification, shows how to transform text into feature dictionaries, and how to train a text classifier for sentiment analysis. It also covers multi-label classification and classifier evaluation metrics.
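The chapter's starting point is bag of words feature extraction, which reduces a list of words to the feature dictionary format that NLTK's classifiers expect:

```python
def bag_of_words(words):
    # Map every word to True, ignoring order and frequency
    return dict((word, True) for word in words)

features = bag_of_words(['the', 'quick', 'brown', 'fox'])
print(features)  # {'the': True, 'quick': True, 'brown': True, 'fox': True}
```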
Chapter 8, Distributed Processing and Handling Large Datasets, discusses how to use execnet for distributed natural language processing and how to use Redis for storing large datasets.
Chapter 9, Parsing Specific Data Types, covers various Python modules that are useful for parsing specific kinds of data, such as datetimes and HTML.
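For instance, the dateutil package (installable with pip install python-dateutil) can parse free-form date strings into datetime objects:

```python
from dateutil import parser

# Parse a human-readable date string into a datetime object
dt = parser.parse('August 19th, 2014 8:30 PM')
print(dt)  # 2014-08-19 20:30:00
```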
Appendix A, Penn Treebank Part-of-speech Tags, shows a table of Treebank part-of-speech tags, which is a useful reference for Chapter 3, Creating Custom Corpora, and Chapter 4, Part-of-speech Tagging.
What you need for this book
You will need Python 3 and the listed Python packages. For this book, I used Python 3.3.5. To install the packages, you can use pip (https://pypi.python.org/pypi/pip/). The following is the list of the packages in requirements format with the version number used while writing this