Python Text Processing with NLTK 2.0 Cookbook: LITE
Python Text Processing with NLTK 2.0 Cookbook - Jacob Perkins
Table of Contents
Python Text Processing with NLTK 2.0 Cookbook: LITE
Credits
About the Author
About the Reviewers
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Other languages
See also
Tokenizing sentences into words
How to do it...
How it works...
There's more...
Contractions
PunktWordTokenizer
WordPunctTokenizer
See also
Tokenizing sentences using regular expressions
Getting ready
How to do it...
How it works...
There's more...
Simple whitespace tokenizer
See also
Filtering stopwords in a tokenized sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Looking up synsets for a word in WordNet
Getting ready
How to do it...
How it works...
There's more...
Hypernyms
Part-of-speech (POS)
See also
Looking up lemmas and synonyms in WordNet
How to do it...
How it works...
There's more...
All possible synonyms
Antonyms
See also
Calculating WordNet synset similarity
How to do it...
How it works...
There's more...
Comparing verbs
Path and LCH similarity
See also
Discovering word collocations
Getting ready
How to do it...
How it works...
There's more...
Scoring functions
Scoring ngrams
2. Replacing and Correcting Words
Introduction
Stemming words
How to do it...
How it works...
There's more...
LancasterStemmer
RegexpStemmer
SnowballStemmer
See also
Lemmatizing words with WordNet
Getting ready
How to do it...
How it works...
There's more...
Combining stemming with lemmatization
See also
Translating text with Babelfish
Getting ready
How to do it...
How it works...
There's more...
Available languages
Replacing words matching regular expressions
Getting ready
How to do it...
How it works...
There's more...
Replacement before tokenization
See also
Removing repeating characters
Getting ready
How to do it...
How it works...
There's more...
See also
Spelling correction with Enchant
Getting ready
How to do it...
How it works...
There's more...
en_GB dictionary
Personal word lists
See also
Replacing synonyms
Getting ready
How to do it...
How it works...
There's more...
CSV synonym replacement
YAML synonym replacement
See also
Replacing negations with antonyms
How to do it...
How it works...
There's more...
See also
3. Text Classification
Introduction
Bag of Words feature extraction
How to do it...
How it works...
There's more...
Filtering stopwords
Including significant bigrams
See also
Training a naive Bayes classifier
Getting ready
How to do it...
How it works...
There's more...
Classification probability
Most informative features
Training estimator
Manual training
See also
Training a decision tree classifier
Getting ready
How to do it...
How it works...
There's more...
Entropy cutoff
Depth cutoff
Support cutoff
See also
Training a maximum entropy classifier
Getting ready
How to do it...
How it works...
There's more...
Scipy algorithms
Megam algorithm
See also
Measuring precision and recall of a classifier
How to do it...
How it works...
There's more...
F-measure
See also
Calculating high information words
How to do it...
How it works...
There's more...
MaxentClassifier with high information words
DecisionTreeClassifier with high information words
See also
Combining classifiers with voting
Getting ready
How to do it...
How it works...
See also
Classifying with multiple binary classifiers
Getting ready
How to do it...
How it works...
There's more...
See also
Index
Python Text Processing with NLTK 2.0 Cookbook: LITE
Copyright © 2011 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2011
Production Reference: 1130411
Published by Packt Publishing Ltd., 32 Lincoln Road, Olton, Birmingham, B27 6PA, UK.
ISBN 978-1-849516-38-9
www.packtpub.com
Cover Image by Sujay Gawand K (<sujay0000@gmail.com>)
Credits
Author
Jacob Perkins
Reviewers
Patrick Chan
Herjend Teny
Acquisition Editor
Steven Wilding
Technical Editors
Hithesh Uchil
Indexer
Hemangini Bari
Production Coordinator
Melwyn D'sa
Cover Work
Melwyn D'sa
About the Author
Jacob Perkins has been an avid user of open source software since high school, when he first built his own computer and didn't want to pay for Windows. At one point he had five operating systems installed, including Red Hat Linux, OpenBSD, and BeOS.
While at Washington University in St. Louis, Jacob took classes in Spanish and poetry writing, and worked on an independent study project that eventually became his Master's project: WUGLE, a GUI for manipulating logical expressions. In his free time, he wrote the Gnome2 version of Seahorse (a GUI for encryption and key management), which has since been translated into over a dozen languages and is included in the default Gnome distribution.
After receiving his MS in Computer Science, Jacob tried to start a web development studio with some friends, but since no one knew anything about web development, it didn't work out as planned. Once he'd actually learned about web development, he went off and co-founded another company called Weotta, which sparked his interest in Machine Learning and Natural Language Processing.
Jacob is currently the CTO/Chief Hacker for Weotta and blogs about what he's learned along the way at http://streamhacker.com/. He is also applying this knowledge to produce text processing APIs and demos at http://text-processing.com/. This book is a synthesis of his knowledge on processing text using Python, NLTK, and more.
Thanks to my parents for all their support, even when they don't understand what I'm doing; Grant for sparking my interest in Natural Language Processing; Les