Python Text Processing with NLTK 2.0 Cookbook: LITE
Ebook, 252 pages, 1 hour

About this ebook

The learn-by-doing approach of this book will enable you to dive right into the heart of text processing from the very first page. Each recipe is carefully designed to fulfill your appetite for Natural Language Processing. Packed with numerous illustrative examples and code samples, it will make the task of using the NLTK for Natural Language Processing easy and straightforward. This book is for Python programmers who want to quickly get to grips with using the NLTK for Natural Language Processing. Familiarity with basic text processing concepts is required. Programmers experienced in the NLTK will also find it useful. Students of linguistics will find it invaluable.
Language: English
Release date: May 19, 2011
ISBN: 9781849516396

    Book preview

    Python Text Processing with NLTK 2.0 Cookbook - Jacob Perkins

    Table of Contents

    Python Text Processing with NLTK 2.0 Cookbook: LITE

    Credits

    About the Author

    About the Reviewers

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Errata

    Piracy

    Questions

    1. Tokenizing Text and WordNet Basics

    Introduction

    Tokenizing text into sentences

    Getting ready

    How to do it...

    How it works...

    There's more...

    Other languages

    See also

    Tokenizing sentences into words

    How to do it...

    How it works...

    There's more...

    Contractions

    PunktWordTokenizer

    WordPunctTokenizer

    See also

    Tokenizing sentences using regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    Simple whitespace tokenizer

    See also

    Filtering stopwords in a tokenized sentence

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Looking up synsets for a word in WordNet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Hypernyms

    Part-of-speech (POS)

    See also

    Looking up lemmas and synonyms in WordNet

    How to do it...

    How it works...

    There's more...

    All possible synonyms

    Antonyms

    See also

    Calculating WordNet synset similarity

    How to do it...

    How it works...

    There's more...

    Comparing verbs

    Path and LCH similarity

    See also

    Discovering word collocations

    Getting ready

    How to do it...

    How it works...

    There's more...

    Scoring functions

    Scoring ngrams

    2. Replacing and Correcting Words

    Introduction

    Stemming words

    How to do it...

    How it works...

    There's more...

    LancasterStemmer

    RegexpStemmer

    SnowballStemmer

    See also

    Lemmatizing words with WordNet

    Getting ready

    How to do it...

    How it works...

    There's more...

    Combining stemming with lemmatization

    See also

    Translating text with Babelfish

    Getting ready

    How to do it...

    How it works...

    There's more...

    Available languages

    Replacing words matching regular expressions

    Getting ready

    How to do it...

    How it works...

    There's more...

    Replacement before tokenization

    See also

    Removing repeating characters

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Spelling correction with Enchant

    Getting ready

    How to do it...

    How it works...

    There's more...

    en_GB dictionary

    Personal word lists

    See also

    Replacing synonyms

    Getting ready

    How to do it...

    How it works...

    There's more...

    CSV synonym replacement

    YAML synonym replacement

    See also

    Replacing negations with antonyms

    How to do it...

    How it works...

    There's more...

    See also

    3. Text Classification

    Introduction

    Bag of Words feature extraction

    How to do it...

    How it works...

    There's more...

    Filtering stopwords

    Including significant bigrams

    See also

    Training a naive Bayes classifier

    Getting ready

    How to do it...

    How it works...

    There's more...

    Classification probability

    Most informative features

    Training estimator

    Manual training

    See also

    Training a decision tree classifier

    Getting ready

    How to do it...

    How it works...

    There's more...

    Entropy cutoff

    Depth cutoff

    Support cutoff

    See also

    Training a maximum entropy classifier

    Getting ready

    How to do it...

    How it works...

    There's more...

    Scipy algorithms

    Megam algorithm

    See also

    Measuring precision and recall of a classifier

    How to do it...

    How it works...

    There's more...

    F-measure

    See also

    Calculating high information words

    How to do it...

    How it works...

    There's more...

    MaxentClassifier with high information words

    DecisionTreeClassifier with high information words

    See also

    Combining classifiers with voting

    Getting ready

    How to do it...

    How it works...

    See also

    Classifying with multiple binary classifiers

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Index

    Python Text Processing with NLTK 2.0 Cookbook: LITE


    Copyright © 2011 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: April 2011

    Production Reference: 1130411

    Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK.

    ISBN 978-1-849516-38-9

    www.packtpub.com

    Cover Image by Sujay Gawand K (<sujay0000@gmail.com>)

    Credits

    Author

    Jacob Perkins

    Reviewers

    Patrick Chan

    Herjend Teny

    Acquisition Editor

    Steven Wilding

    Technical Editors

    Hithesh Uchil

    Indexer

    Hemangini Bari

    Production Coordinator

    Melwyn D'sa

    Cover Work

    Melwyn D'sa

    About the Author

    Jacob Perkins has been an avid user of open source software since high school, when he first built his own computer and didn't want to pay for Windows. At one point he had five operating systems installed, including Red Hat Linux, OpenBSD, and BeOS.

    While at Washington University in St. Louis, Jacob took classes in Spanish and poetry writing, and worked on an independent study project that eventually became his Master's project: WUGLE—a GUI for manipulating logical expressions. In his free time, he wrote the Gnome2 version of Seahorse (a GUI for encryption and key management), which has since been translated into over a dozen languages and is included in the default Gnome distribution.

    After receiving his MS in Computer Science, Jacob tried to start a web development studio with some friends, but since no one knew anything about web development, it didn't work out as planned. Once he'd actually learned about web development, he went off and co-founded another company called Weotta, which sparked his interest in Machine Learning and Natural Language Processing.

    Jacob is currently the CTO/Chief Hacker for Weotta and blogs about what he's learned along the way at http://streamhacker.com/. He is also applying this knowledge to produce text processing APIs and demos at http://text-processing.com/. This book is a synthesis of his knowledge on processing text using Python, NLTK, and more.

    Thanks to my parents for all their support, even when they don't understand what I'm doing; Grant for sparking my interest in Natural Language Processing; Les
