563.10.3 CAPTCHA
Presented by: Sari Louis
SPAM Group: Marc Gagnon, Sari Louis, Steve White
University of Illinois
Spring 2006

Agenda
• Definition
• Background
• Applications
• Types of CAPTCHAs
• Breaking CAPTCHAs
• Proposed Approach
• Conclusion

Definition
• CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart
• A.K.A. Reverse Turing Test, Human Interaction Proof
• The challenge: develop a software program that can create and grade challenges that most humans can pass but computers cannot

Background
• First used by AltaVista in 1997
  – Reduced SPAM add-url submissions by over 95%
• CMU/Yahoo!
  – Automated the creation and grading of challenges
• PARC
  – Relies on document image degradation to prevent successful OCR
  – Conducted user-focused studies to assess the effectiveness of CAPTCHAs

Background
• CAPTCHAs are based on open AI problems
• Breaking CAPTCHAs helps advance AI by solving these open problems
• Improving CAPTCHAs helps tell computers and humans apart
• Win-win situation

Background - Papers
• Pessimal Print: A Reverse Turing Test
  Allison L. Coates, Henry S. Baird, Richard J. Fateman
• Telling Humans and Computers Apart Automatically
  Luis von Ahn, Manuel Blum, and John Langford
• CAPTCHA: Using Hard AI Problems for Security
  Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford
• Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
  Kumar Chellapilla, Patrice Y. Simard

Applications
• Free email services
• Online polls
• Dictionary attacks
• Newsgroups, blogs, etc.
• SPAM

Types of CAPTCHAs
• Text based
  – Gimpy, ez-gimpy
  – Gimpy-r, Google CAPTCHA
  – Simard’s HIP (MSN)
• Graphic based
  – Bongo
  – Pix
• Audio based

Text Based CAPTCHAs
• Gimpy, ez-gimpy
  – Pick a word or words from a small dictionary
  – Distort them and add noise and background
• Gimpy-r, Google’s CAPTCHA
  – Pick random letters
  – Distort them, add noise and background
• Simard’s HIP
  – Pick random letters and numbers
  – Distort them and add arcs

Text Based CAPTCHAs (example images)

Graphic Based CAPTCHAs
• Bongo
  – Display two series of blocks
  – User must find the characteristic that sets the two series apart
  – User is asked to determine which series each of four single blocks belongs to
Difference? Thick vs. thin lines

Graphic Based CAPTCHAs
• PIX
  – Create a large database of labeled images
  – Pick a concrete object
  – Pick four images of the object from the image database
  – Distort the images
  – Ask the user to pick the object from a list of words

Graphic Based CAPTCHAs (example images: Pool, Dog)

Audio Based CAPTCHAs
• Pick a word or a sequence of numbers at random
• Render them into an audio clip using TTS software
• Distort the audio clip
• Ask the user to identify and type the word or numbers

Breaking CAPTCHAs
• Most text based CAPTCHAs have been broken by software
  – OCR
  – Segmentation
• Other CAPTCHAs were broken by streaming the tests to unsuspecting users to solve
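
The create-and-grade loop behind the Gimpy-r style text CAPTCHAs described earlier (pick random letters, then check the typed answer) can be sketched as follows. This is a minimal illustration, not the code of any deployed CAPTCHA: the alphabet, the default length of 6, and the function names `make_challenge` and `grade` are assumptions, and the distortion, noise, and background steps are left out.

```python
import hmac
import secrets
import string

# Assumption: uppercase letters only; a real system might also drop look-alike characters.
ALPHABET = string.ascii_uppercase


def make_challenge(length: int = 6) -> str:
    """Pick `length` random letters; this is the text that would then be distorted."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected: str, answer: str) -> bool:
    """Grade the typed answer, ignoring case and surrounding whitespace."""
    normalized = answer.strip().upper().encode("utf-8")
    return hmac.compare_digest(expected.upper().encode("utf-8"), normalized)


if __name__ == "__main__":
    text = make_challenge()
    print("challenge text (to be rendered and distorted):", text)
    print("correct answer accepted:", grade(text, text.lower()))
```

The server keeps `expected` and sends only the distorted image to the client, so grading never depends on anything the client can read directly.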

Proposed Approach
• Very similar to PIX
• Pick a concrete object
• Get 6 images at random from images.google.com that match the object
• Distort the images
• Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary
• Prompt the user to pick the object from the list of words

Proposed Approach - Technical
• Make an HTTP call to images.google.com and search for the object
• Screen scrape the result of 2-3 pages to get the list of images
• Pick 6 images at random
• Randomly distort both the images and their URLs before displaying them
• Expire the CAPTCHA in 30-45 seconds

Proposed Approach - Benefits
• The database already exists and is public
• The database is constantly being updated and maintained
• Adding “concrete objects” to the dictionary is virtually instantaneous
• Distortion prevents caching hacks
• Quick expiration limits streaming hacks

Proposed Approach - Drawbacks
• Not accessible to people with disabilities (which is the case of most CAPTCHAs)
• Relies on Google’s infrastructure
• Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited
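
The selection and expiry steps of the proposed approach can be sketched as below. This is a rough sketch under stated assumptions, not the group's implementation: the image search and screen scraping are replaced by a `candidate_urls` list passed in by the caller, image and URL distortion is omitted, the correct object is assumed to count as one of the 10 words drawn from the objects dictionary, and all names (`Challenge`, `build_challenge`, `is_correct`) are hypothetical.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Challenge:
    answer: str          # the concrete object the images depict
    image_urls: list     # 6 images picked at random
    choices: list        # 100 words shown to the user
    expires_at: float    # absolute expiry time, in seconds since the epoch


def build_challenge(obj, candidate_urls, full_dictionary, object_dictionary,
                    rng=None, now=None):
    """Assemble one challenge from already-scraped image URLs (assumed input)."""
    rng = rng or random.SystemRandom()
    now = time.time() if now is None else now

    images = rng.sample(candidate_urls, 6)

    # 10 words from the objects dictionary: the answer plus 9 other objects.
    other_objects = [w for w in object_dictionary if w != obj]
    object_words = [obj] + rng.sample(other_objects, 9)

    # 90 filler words from the full dictionary, avoiding duplicates.
    taken = set(object_words)
    fillers = rng.sample([w for w in full_dictionary if w not in taken], 90)

    choices = object_words + fillers
    rng.shuffle(choices)

    ttl = rng.uniform(30, 45)  # expire the CAPTCHA in 30-45 seconds
    return Challenge(obj, images, choices, now + ttl)


def is_correct(challenge, picked, now=None):
    """Accept the answer only if it matches and the challenge has not expired."""
    now = time.time() if now is None else now
    return now <= challenge.expires_at and picked == challenge.answer
```

Passing `rng` and `now` explicitly keeps the sketch testable; in a live deployment the defaults (a system random source and the current clock) would be used.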