
563.10.3 CAPTCHA

Presented by: Sari Louis

SPAM Group: Marc Gagnon, Sari Louis, Steve White


University of Illinois
Spring 2006
Agenda
• Definition
• Background
• Applications
• Types of CAPTCHAs
• Breaking CAPTCHAs
• Proposed Approach
• Conclusion

Definition
• CAPTCHA stands for Completely Automated
Public Turing test to tell Computers and Humans
Apart
• A.K.A. Reverse Turing Test, Human Interaction
Proof
• The challenge: develop a software program that
can create and grade challenges most humans
can pass but computers cannot

Background
• First used by AltaVista in 1997
– Reduced spam add-URL submissions by over 95%
• CMU/Yahoo!
– Automated the creating and grading of
challenges
• PARC
– Relies on document image degradation to
prevent successful OCR
– Conducted user-focused studies to assess
the effectiveness of CAPTCHAs
Background
• CAPTCHAs are based on open AI
problems
• Breaking CAPTCHAs helps advance AI by
solving these open problems
• Improving CAPTCHAs helps tell
computers and humans apart
• Win-win situation

Background - Papers
• Pessimal Print: A Reverse Turing Test
Allison L. Coates, Henry S. Baird, Richard J. Fateman
• Telling Humans and Computers Apart
Automatically
Luis von Ahn, Manuel Blum, and John Langford
• CAPTCHA: Using Hard AI Problems for
Security
Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford
• Using Machine Learning to Break Visual
Human Interaction Proofs (HIPs)
Kumar Chellapilla, Patrice Y. Simard

Applications
• Free email services
• Online polls
• Preventing dictionary attacks
• Newsgroups, Blogs, etc…
• SPAM

Types of CAPTCHAs
• Text based
– Gimpy, EZ-Gimpy
– Gimpy-r, Google CAPTCHA
– Simard’s HIP (MSN)
• Graphic based
– Bongo
– PIX
• Audio based

Text Based CAPTCHAs
• Gimpy, EZ-Gimpy
– Pick a word or words from a small dictionary
– Distort them and add noise and background
• Gimpy-r, Google’s CAPTCHA
– Pick random letters
– Distort them, add noise and background
• Simard’s HIP
– Pick random letters and numbers
– Distort them and add arcs
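
The random-character variant above can be sketched in Python as follows. This is a minimal illustration, not the method used by any of the named CAPTCHAs: it covers only picking the characters and grading the response. The distortion is represented as a plan of random parameters that an image library would have to apply. The names `make_challenge` and `grade`, the alphabet, and all parameter ranges are assumptions.

```python
import random
import string

# Assumption: look-alike glyphs (0/O, 1/I) are excluded; the slides do not say this.
ALPHABET = [c for c in string.ascii_uppercase + string.digits if c not in "0O1I"]

def make_challenge(length=6, rng=None):
    """Pick random letters and digits and plan a random distortion for each."""
    if rng is None:
        rng = random.SystemRandom()  # unpredictable source for real use
    text = "".join(rng.choice(ALPHABET) for _ in range(length))
    # Hypothetical distortion plan: one rotation and vertical shift per character.
    distortion = [
        {"rotate_deg": rng.uniform(-25, 25), "dy_px": rng.randint(-4, 4)}
        for _ in text
    ]
    arcs = rng.randint(2, 5)  # number of random arcs to draw over the text
    return {"answer": text, "distortion": distortion, "arcs": arcs}

def grade(challenge, response):
    """Accept the response if it matches the answer, ignoring case and padding."""
    return response.strip().upper() == challenge["answer"]
```

The answer stays on the server; only the rendered, distorted image would be sent to the user.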

Text Based CAPTCHAs

Graphic Based CAPTCHAs
• Bongo
– Display two series of blocks
– User must find the characteristic that sets the
two series apart
– User is asked to determine which series each
of four single blocks belongs to

Difference? thick vs. thin lines

Graphic Based CAPTCHAs
• PIX
– Create a large database of labeled images
– Pick a concrete object
– Pick four images of the object from the
images database
– Distort the images
– Ask the user to pick the object from a list of
words
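
The PIX steps above can be sketched as follows. The database contents, file names, and the function names `make_pix_challenge` and `grade_pix` are hypothetical, and the image distortion step is left out because it needs an image library.

```python
import random

# Hypothetical labeled database: concrete object -> image file names.
IMAGE_DB = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg", "dog5.jpg"],
    "pool": ["pool1.jpg", "pool2.jpg", "pool3.jpg", "pool4.jpg"],
    "tree": ["tree1.jpg", "tree2.jpg", "tree3.jpg", "tree4.jpg", "tree5.jpg"],
}

def make_pix_challenge(db, n_images=4, rng=None):
    """Pick an object, then pick n_images distinct images labeled with it."""
    if rng is None:
        rng = random.SystemRandom()
    answer = rng.choice(sorted(db))
    images = rng.sample(db[answer], n_images)
    choices = sorted(db)  # word list shown to the user
    return {"answer": answer, "images": images, "choices": choices}

def grade_pix(challenge, picked_word):
    """The user passes only by picking the object the images share."""
    return picked_word == challenge["answer"]
```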

Graphic Based CAPTCHAs

[Figure: example image sets labeled "Pool" and "Dog"]

Audio Based CAPTCHAs
• Pick a word or a sequence of numbers at
random
• Render them into an audio clip using
text-to-speech (TTS) software
• Distort the audio clip
• Ask the user to identify and type the word
or numbers

Breaking CAPTCHAs
• Most text based CAPTCHAs have been
broken by software
– OCR
– Segmentation

• Other CAPTCHAs were broken by
streaming the tests to unsuspecting users
to solve

Proposed Approach
• Very similar to PIX
• Pick a concrete object
• Get 6 images at random from
images.google.com that match the object
• Distort the images
• Build a list of 100 words: 90 from a full
dictionary, 10 from the objects dictionary
• Prompt the user to pick the object from the
list of words
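
The word-list step above (90 words from a full dictionary, 10 from the objects dictionary) can be sketched as follows. The function name `build_word_list` and its parameters are hypothetical. The sketch also assumes that the correct object counts as one of the 10 object words, which the slides do not state.

```python
import random

def build_word_list(answer, full_dict, objects_dict,
                    n_full=90, n_objects=10, rng=None):
    """Return a shuffled list of n_full + n_objects distinct words.

    Assumption: the correct answer is one of the n_objects object words.
    """
    if rng is None:
        rng = random.SystemRandom()
    # Decoy objects: plausible wrong answers drawn from the objects dictionary.
    other_objects = [w for w in objects_dict if w != answer]
    decoys = rng.sample(other_objects, n_objects - 1)
    taken = set(decoys) | {answer}
    # Filler words from the full dictionary, skipping anything already chosen.
    fillers = rng.sample([w for w in full_dict if w not in taken], n_full)
    words = [answer] + decoys + fillers
    rng.shuffle(words)  # hide the answer's position
    return words
```

With 100 choices, a program guessing at random would pass about 1% of the time.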
Proposed Approach - Technical
• Make an HTTP call to images.google.com
and search for the object
• Screen-scrape 2-3 pages of results to
get the list of images
• Pick 6 images at random
• Randomly distort both the images and
their URLs before displaying them
• Expire the CAPTCHA in 30-45 seconds
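
The expiration step and the hiding of image URLs can be sketched as follows. This is one possible reading, not the presenters' implementation: "distorting" a URL is interpreted here as replacing it with an opaque random alias that the server resolves itself. The class `ChallengeStore` and its methods are hypothetical, and the search and scraping steps are omitted.

```python
import random
import secrets
import time

class ChallengeStore:
    """Server-side record of issued challenges, each valid for 30-45 seconds."""

    def __init__(self, min_ttl=30.0, max_ttl=45.0, clock=time.monotonic):
        self._min_ttl = min_ttl
        self._max_ttl = max_ttl
        self._clock = clock      # injectable so expiry can be tested
        self._pending = {}       # challenge id -> (answer, expiry, alias -> url)

    def issue(self, answer, image_urls):
        """Register a challenge; return its id and opaque aliases for the images."""
        cid = secrets.token_urlsafe(16)
        ttl = random.uniform(self._min_ttl, self._max_ttl)
        # Assumption: each real URL is hidden behind a fresh random alias.
        aliases = {secrets.token_urlsafe(16): url for url in image_urls}
        self._pending[cid] = (answer.lower(), self._clock() + ttl, aliases)
        return cid, list(aliases)

    def check(self, cid, response):
        """Grade once: the challenge is discarded whether or not it is passed."""
        entry = self._pending.pop(cid, None)
        if entry is None:
            return False
        answer, expires_at, _ = entry
        if self._clock() > expires_at:
            return False         # too late: limits relaying to a human solver
        return response.strip().lower() == answer
```

Because the aliases are regenerated for every challenge, an attacker cannot build a cache keyed on image URLs.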

Proposed Approach - Benefits
• The database already exists and is public
• The database is constantly being updated
and maintained
• Adding “concrete objects” to the dictionary
is virtually instantaneous
• Distortion prevents caching hacks
• Quick expiration limits streaming hacks

Proposed Approach - Drawbacks
• Not accessible to people with disabilities
(as is the case with most CAPTCHAs)
• Relies on Google’s infrastructure
• Unlike CAPTCHAs using random letters
and numbers, the number of challenge
words is limited
