CAPTCHA

Early CAPTCHAs such as these, generated by the EZ-Gimpy program, were used on Yahoo!
A modern CAPTCHA, rather than attempting to create a distorted background and high levels of warping on the text, might focus on making segmentation difficult by adding an angled line.
Another way to make segmentation difficult is to crowd symbols together, as in Yahoo!'s current CAPTCHA format.
Some CAPTCHAs try to exploit people's ability to perceive three-dimensional objects. In this image, symbols are drawn with lines of different thickness to create an extrusion effect.

A CAPTCHA (pron.: /ˈkæp.tʃə/) is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a human being. The process usually involves a computer asking a user to complete a simple test which the computer is able to grade. These tests are designed to be easy for a computer to generate and easy for a human to solve, but difficult for a computer to solve. If a correct solution is received, it can be presumed to have been entered by a human. A common type of CAPTCHA requires the user to type letters and/or digits from a distorted image that appears on the screen. Such tests are commonly used to prevent unwanted internet bots from accessing websites: a human can usually read the distorted characters, while a bot that cannot recognize them cannot answer correctly, or at all.

Although most CAPTCHAs are randomly generated images of letters, many of them have become difficult even for a human to read. Picture CAPTCHAs were therefore created, in which the user is asked to pick out a picture of a certain animal from a small set of animal pictures. This is simple for a human, whereas a bot, although it can analyze the picture, cannot easily identify the animal.

The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all of Carnegie Mellon University). It is an acronym based on the word "capture" and standing for "Completely Automated Public Turing test to tell Computers and Humans Apart". Carnegie Mellon University attempted to trademark the term on 15 October 2004,[1] but the trademark application was abandoned on 21 April 2008.[2]

A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted at a human, in contrast to the standard Turing test that is typically administered by a human and targeted at a machine.

Applications

CAPTCHAs are used in attempts to prevent automated software from performing actions which degrade the quality of service of a given system, whether due to abuse or resource expenditure. CAPTCHAs can be deployed to protect systems vulnerable to e-mail spam, such as the webmail services of Gmail, Hotmail, and Yahoo! Mail.

Most interactive sites today are backed by databases, and they quickly become sluggish when a database table grows beyond what the server can handle.[3] A website's Google PageRank can also be reduced by excessive commercial links created by automated posting.

CAPTCHAs are also used to minimize automated posting to blogs, forums and wikis, whether as a result of commercial promotion, or harassment and vandalism. CAPTCHAs also serve an important function in rate limiting. Automated usage of a service may be acceptable until it becomes excessive and works to the detriment of human users. In such cases, administrators can use CAPTCHAs to enforce automated usage policies based on given thresholds. The article rating systems used by many news web sites are another example of an online facility vulnerable to manipulation by automated software.[4]
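
The threshold idea can be sketched as a sliding-window counter that only asks for a CAPTCHA once a client exceeds an allowed request rate. The class name `CaptchaGate`, its parameters and the limits used below are illustrative assumptions, not part of any particular product.

```python
import time
from collections import defaultdict, deque


class CaptchaGate:
    """Decide when a client must solve a CAPTCHA, based on its request rate."""

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client id -> recent request times

    def needs_captcha(self, client_id, now=None):
        """Record one request; return True once the client is over the threshold."""
        now = time.monotonic() if now is None else now
        stamps = self._history[client_id]
        # Forget requests that have fallen out of the sliding window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > self.max_requests


# Hypothetical policy: at most 3 requests per 10 seconds without a CAPTCHA.
gate = CaptchaGate(max_requests=3, window_seconds=10.0)
decisions = [gate.needs_captcha("client-1", now=t) for t in (0, 1, 2, 3, 20)]
print(decisions)
```

With this policy the fourth request in quick succession triggers a challenge, while a request made after the window has passed does not.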

Accessibility

Because CAPTCHAs rely on visual perception, users unable to view a CAPTCHA due to a disability will be unable to perform the task protected by a CAPTCHA. Groups who commonly struggle with visual CAPTCHAs include:

Sites implementing CAPTCHAs may provide an audio version of the CAPTCHA in addition to the visual method. The official CAPTCHA site recommends providing an audio CAPTCHA for accessibility reasons, but it is still not usable for deafblind people or for users of some text-based web browsers.

Due to the sound distortion present in audio CAPTCHAs and the visual distortion present in visual CAPTCHAs, offering one as an alternative to the other does not help people with impairments in both areas. While deafblind people are a small group, some degree of impairment in both areas is common, and very common among older people.

Attempts at more accessible CAPTCHAs

Even audio and visual CAPTCHAs will require manual intervention for some users, such as those who have disabilities. There have been various attempts at creating more accessible CAPTCHAs, including the use of JavaScript, mathematical questions ("how much is 1+1") and common knowledge questions ("what color is the sky on a clear day"). However, these approaches may worsen accessibility for people with intellectual and developmental disabilities, for instance dyscalculia. Some CAPTCHAs of this kind do not meet the criteria for a successful CAPTCHA because they are not automatically generated or do not present a new problem or test for each attack.

One approach to text-based CAPTCHAs is to create a central "anti-bot server", used by many websites, which for each call randomly selects one puzzle from a very large set of automatically generated puzzles of many different kinds. Such a solution can be made usable for blind and visually impaired people who otherwise find prevalent image-based CAPTCHAs to be insurmountable obstacles to completing web forms.
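
A minimal sketch of such a puzzle service is shown below, assuming three made-up puzzle kinds. The generator functions, the word list and the `new_puzzle` and `check` helpers are hypothetical; a real service would need a far larger and more varied set of puzzles to resist automated solving.

```python
import random


def arithmetic_puzzle(rng):
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"How much is {a} + {b}?", str(a + b)


def letter_puzzle(rng):
    word = rng.choice(["orange", "window", "planet", "garden"])
    pos = rng.randrange(len(word))
    return f"What is letter number {pos + 1} of the word '{word}'?", word[pos]


def largest_puzzle(rng):
    nums = rng.sample(range(10, 100), 4)
    listing = ", ".join(str(n) for n in nums)
    return f"Which of these numbers is the largest: {listing}?", str(max(nums))


# Each generator produces a fresh (question, expected answer) pair per call.
GENERATORS = [arithmetic_puzzle, letter_puzzle, largest_puzzle]


def new_puzzle(rng=None):
    """Pick a puzzle kind at random, then generate one puzzle of that kind."""
    rng = rng or random.SystemRandom()
    return rng.choice(GENERATORS)(rng)


def check(answer, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return answer.strip().lower() == expected.lower()


question, expected = new_puzzle()
print(question)
```

Because both the question and the answer are plain text, the challenge can be read aloud by a screen reader, which is the accessibility benefit described above.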

For a more complete solution to the CAPTCHA accessibility problem, all four types of impairment that affect web use (motor, visual, cognitive and hearing) would need to be catered for. Combining the different approaches, i.e., image, audio and puzzle, would open up access to many more people; however, there has not yet been an attempt to achieve this.

Advertising

Since 2009, CAPTCHA advertising has become much more prevalent. Publishers like AOL, Meredith Corporation,[5] and Internet Brands[6] have adopted the option as an additional revenue stream. Users typically type in brand messages instead of distorted text.[7]

Circumvention

There are several approaches to defeating CAPTCHAs:

  • exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA
  • improving character recognition software
  • using cheap human labor to process the tests (see below)

Insecure implementation

As with any security system, design flaws in an implementation can prevent the theoretical security from being realized. Many CAPTCHA implementations, especially those which have not been designed and reviewed by security experts, are prone to common attacks.

Some CAPTCHA protection systems can be bypassed without using OCR simply by re-using the session ID of a known test image. A correctly designed CAPTCHA does not allow multiple solution attempts at the same test, which would allow unlimited reuse of a correct solution, or a second guess after an incorrect OCR attempt.[8] Other CAPTCHA implementations pass a hash (such as an MD5 hash) of the solution to the client as a key for validating the answer. Because the set of possible solutions is usually small, such a hash can be brute-forced offline, and it can also be used to confirm the result of an OCR-based attempt. A more secure scheme would use an HMAC (hash-based message authentication code), which cannot be recomputed without a server-side secret key. Another flaw is to expose the candidate answers directly in the page, for example by showing four pictures and asking the user to pick the correct one; a spam bot that always picks the first picture then succeeds 25% of the time. Finally, some implementations use only a small fixed pool of CAPTCHA images. Eventually, when enough image solutions have been collected by an attacker over a period of time, the test can be broken by simply looking up solutions in a table, based on a hash of the challenge image.
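
The following sketch illustrates two of the points above: validating answers with an HMAC rather than a bare hash, and refusing a second attempt at the same test. It uses only Python's standard `hmac`, `hashlib` and `secrets` modules; the function names `issue_token` and `verify` and the in-memory set of used nonces are assumptions made for illustration, not a description of any existing CAPTCHA library.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # stays on the server, never sent to the client
_used_nonces = set()                  # assumption: a real site would use a shared store with expiry


def issue_token(solution):
    """Return (nonce, tag) to embed in the form next to the challenge image."""
    nonce = secrets.token_hex(16)
    message = f"{nonce}:{solution.lower()}".encode()
    tag = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return nonce, tag


def verify(nonce, tag, answer):
    """Accept each challenge at most once, whether the answer is right or wrong."""
    if nonce in _used_nonces:
        return False
    _used_nonces.add(nonce)
    message = f"{nonce}:{answer.strip().lower()}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


nonce, tag = issue_token("xK7pQ")
print(verify(nonce, tag, "xk7pq"))  # first attempt with the right answer
print(verify(nonce, tag, "xk7pq"))  # replay of the same challenge
```

Without the secret key an attacker cannot test candidate solutions against the tag offline, and marking the nonce as used on the first attempt blocks both replay of a known solution and a second guess.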

Computer character recognition

A number of research projects have attempted (often successfully[citation needed]) to beat visual CAPTCHAs by creating programs that contain the following functionality:

  1. Pre-processing: Removal of background clutter and noise
  2. Segmentation: Splitting the image into regions which each contain a single character
  3. Classification: Identifying the character in each region

Steps 1 and 3 are easy tasks for computers.[9] The only step where humans still outperform computers is segmentation. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, the segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on making segmentation difficult.
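
A toy illustration of the first two steps, on a tiny black-and-white image written as rows of `#` and `.`, shows why connecting clutter is so effective. The speck filter and the column-projection splitter below are deliberately naive stand-ins chosen for brevity, not the methods used in the cited research, and the classification step is omitted.

```python
def parse(rows):
    """Turn rows of '#' and '.' into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def remove_specks(img):
    """Pre-processing: clear 'on' pixels that have no 'on' 4-neighbour."""
    h, w = len(img), len(img[0])

    def on(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1

    return [
        [
            1 if img[r][c] and (on(r - 1, c) or on(r + 1, c) or on(r, c - 1) or on(r, c + 1)) else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


def segment_columns(img):
    """Segmentation: split wherever a column is completely empty."""
    width = len(img[0])
    filled = [any(row[c] for row in img) for c in range(width)]
    segments, start = [], None
    for c, is_filled in enumerate(filled + [False]):
        if is_filled and start is None:
            start = c
        elif not is_filled and start is not None:
            segments.append((start, c))  # half-open column range
            start = None
    return segments


# Three glyphs plus one isolated noise pixel in the top row.
noisy = parse(["##..#.#.##",
               "#...#....#",
               "##..#...##"])
# The same image with a horizontal line drawn through every glyph.
struck = parse(["##..#.#.##",
                "##########",
                "##..#...##"])

print(segment_columns(noisy))
print(segment_columns(remove_specks(noisy)))
print(segment_columns(remove_specks(struck)))
```

On the noisy image the speck produces a spurious fourth segment, which the pre-processing step removes. Once a line connects the glyphs, the same splitter returns a single segment spanning the whole image, and the speck filter cannot remove the line.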

Several research projects have broken real world CAPTCHAs, including one of Yahoo!'s early CAPTCHAs called "EZ-Gimpy",[10] the CAPTCHAs used by popular sites such as PayPal,[11] LiveJournal, phpBB, the e-banking CAPTCHAs used by many financial institutions,[12] and CAPTCHAs used by other services.[13][14][15] In January 2008, Network Security Research released their program for automated Yahoo! CAPTCHA recognition.[16] Windows Live Hotmail and Gmail, the other two major free email providers, were cracked shortly after.[17][18]

In February 2008, it was reported that spammers had achieved a success rate of 30% to 35%, using a bot to respond to CAPTCHAs for Microsoft's Live Mail service[19] and a success rate of 20% against Google's Gmail CAPTCHA.[20] A Newcastle University research team has defeated the segmentation part of Microsoft's CAPTCHA with a 90% success rate, and reported that this could lead to a complete crack with a greater than 60% rate.[21]

Human solvers

CAPTCHAs are vulnerable to a relay attack that uses humans to solve the puzzles. One approach involves relaying the puzzles to a group of human operators who can solve CAPTCHAs. In this scheme, a computer fills out a form and, when it reaches a CAPTCHA, passes the CAPTCHA to a human operator to solve.

Spammers pay about $0.80 to $1.20 for each 1,000 solved CAPTCHAs to companies employing human solvers in Bangladesh, China, India, and many other developing nations.[22] Other sources cite a cost as low as $0.50 for each 1,000 solved.[23]
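
To put these rates in perspective, the short calculation below works out the cost of a large batch at each quoted price. The batch size of one million is an arbitrary figure chosen for illustration, and the helper name `solving_cost_usd` is hypothetical.

```python
def solving_cost_usd(captchas, cents_per_thousand):
    """Cost in US dollars of having `captchas` challenges solved by paid workers."""
    return captchas * cents_per_thousand / 1000 / 100


# Rates quoted above, expressed in cents per 1,000 solved CAPTCHAs.
for cents in (50, 80, 120):
    total = solving_cost_usd(1_000_000, cents)
    print(f"${cents / 100:.2f} per 1,000 -> ${total:,.2f} per million")
```

At these prices a million solved CAPTCHAs cost between $500 and $1,200.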

Another approach involves copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker. With enough traffic, the attacker can get a solution to the CAPTCHA puzzle in time to relay it back to the target site.[24] In October 2007, a piece of malware appeared in the wild which enticed users to solve CAPTCHAs in order to see progressively further into a series of striptease images.[25][26] A more recent view is that this is unlikely to work due to unavailability of high-traffic sites and competition by similar sites.[27]

These methods have been used by spammers to set up thousands of accounts on free email services such as Gmail and Yahoo![28] Since Gmail and Yahoo! are unlikely to be blacklisted by anti-spam systems, spam sent through these compromised accounts is less likely to be blocked.

Legal concerns

The circumvention of CAPTCHAs may violate the anti-circumvention clause of the Digital Millennium Copyright Act (DMCA) in the United States. In 2007, Ticketmaster sued software maker RMG Technologies[29] for its product which circumvented the ticket seller's CAPTCHAs on the basis that it violated the anti-circumvention clause of the DMCA. In October 2007, an injunction was issued stating that Ticketmaster would likely succeed in making its case.[30] In June 2008, Ticketmaster filed for default judgment against RMG. The Court granted Ticketmaster the default and entered an $18.2M judgment in favor of Ticketmaster.

In 2010, encouraged by Ticketmaster, the U.S. Attorney in Newark, New Jersey won a grand jury indictment against Wiseguy Tickets, Inc. for purchasing tickets in bulk by circumventing CAPTCHA mechanisms.[31] Among its 43 findings, the grand jury found that Wiseguy Tickets, Inc. had defeated the CAPTCHA security mechanisms of online ticket vendors.[32]

Interaction with images as an alternative to text typing

Some researchers promote interaction with images as a possible alternative to text-typing CAPTCHAs, given the common feeling that the latter are "one of the most hated pieces of user interaction on the web".[33]

Computer-based recognition algorithms require the extraction of color, texture, shape, or special point features, which cannot be correctly extracted after the designed distortions. However, humans can still recognize the original concept depicted in the images even with these distortions.

A recent example of an image-interaction CAPTCHA is to present the website visitor with a grid of random pictures and instruct the visitor to click on specific pictures to verify that they are not a bot (such as “Click on the pictures of the airplane, the boat and the clock”).

Image interaction CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images to which an attacker does not have access, and without a means of automatically acquiring new labelled images, an image-based challenge does not usually meet the definition of a CAPTCHA. KittenAuth, by default, had only 42 images in its database.[34] Microsoft's "Asirra", offered as a free web service, attempts to address this through Microsoft Research's partnership with Petfinder.com, which has provided it with more than three million images of cats and dogs, classified by people at thousands of US animal shelters.[35] Researchers claim to have written a program that can break the Microsoft Asirra CAPTCHA.[36] Extending the number of categories (more than just cats and dogs) and randomizing the number of correct images in a grid increases the security of the system. The IMAGINATION CAPTCHA takes a different approach: it applies a sequence of randomized distortions to the original images to create the CAPTCHA images, so the original images can be made public without risk of image-retrieval or image-annotation based attacks.
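
A grid challenge with several categories and a varying number of correct pictures might be generated along the following lines. The image pool, file names and function names are invented for the example; a real deployment would need a very large labelled pool, as discussed above.

```python
import random

# Hypothetical labelled pool: image file name -> category.
POOL = {
    "img01.jpg": "airplane", "img02.jpg": "airplane",
    "img03.jpg": "boat", "img04.jpg": "boat",
    "img05.jpg": "clock", "img06.jpg": "clock",
    "img07.jpg": "cat", "img08.jpg": "cat",
    "img09.jpg": "dog", "img10.jpg": "dog",
    "img11.jpg": "tree", "img12.jpg": "tree",
}


def make_challenge(rng, grid_size=9):
    """Build a grid, then pick a target category that appears in it.

    Because the grid is sampled at random, the number of correct cells
    differs from one challenge to the next.
    """
    grid = rng.sample(sorted(POOL), grid_size)
    target = rng.choice(sorted({POOL[name] for name in grid}))
    correct = frozenset(i for i, name in enumerate(grid) if POOL[name] == target)
    challenge = {"prompt": f"Click on every picture showing: {target}", "grid": grid}
    return challenge, correct  # `correct` stays on the server


def check_clicks(clicked, correct):
    """The visitor passes only by selecting exactly the correct cells."""
    return set(clicked) == set(correct)


challenge, correct = make_challenge(random.SystemRandom())
print(challenge["prompt"])
```

Requiring an exact match on a set of unknown size means a bot cannot succeed by always picking a fixed cell, unlike the four-picture case.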

Human solvers are a potential weakness for strategies such as Asirra. If the database of cat and dog photos can be downloaded, paying workers $0.01 to classify each photo as either a dog or a cat means that almost the entire database of roughly three million photos can be labelled for about $30,000. Photos that are subsequently added to the Asirra database are then a relatively small data set that can be classified as they first appear. Making minor changes to images each time they appear will not prevent a computer from recognizing a repeated image, as there are robust image comparator functions (e.g., image hashes, color histograms) that are insensitive to many simple image distortions. Warping an image sufficiently to fool a computer will likely also be troublesome to a human.[37]
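
As a rough illustration of such a comparator, the sketch below computes a simple "average hash" over small grayscale grids and compares hashes by Hamming distance. The pixel values are made up, and a practical implementation would first scale every image down to a fixed small size; that step is skipped here by starting from 4×4 grids.

```python
def average_hash(pixels):
    """One bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions at which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 220, 30, 225],
    [18, 230, 35, 240],
]

# The same picture served again: slightly brightened, one pixel perturbed.
altered = [[min(255, p + 12) for p in row] for row in original]
altered[0][0] += 9

# A different picture with the bright and dark regions swapped.
other = [
    [200, 210, 10, 20],
    [205, 215, 15, 25],
    [220, 12, 225, 30],
    [230, 18, 240, 35],
]

print(hamming(average_hash(original), average_hash(altered)))
print(hamming(average_hash(original), average_hash(other)))
```

The altered copy hashes identically to the original, while the unrelated picture differs in every bit, so a lookup table keyed on the hash still recognizes the repeated image.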

Researchers at Google used image orientation and collaborative filtering as a CAPTCHA.[38] Generally speaking, people know which way is "up" in a picture, but computers have a difficult time determining this for a broad range of images. Images were pre-screened to keep only those whose upright orientation is difficult for a computer to detect (e.g., no skies, no faces, no text). Images were also collaboratively filtered by showing a "candidate" image along with good images for the person to rotate. If there was a large variance in answers for the candidate image, it was deemed too hard for people as well and discarded.
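
The filtering step can be sketched as follows. Since rotation answers are angles, the example measures their spread with the circular variance; this measure and the cut-off of 0.1 are assumptions made for illustration, as the text above does not say how the researchers computed variance.

```python
import math


def circular_variance(angles_deg):
    """1 minus the mean resultant length: 0 when all answers agree, near 1 when they scatter."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


def keep_candidate(angles_deg, max_variance=0.1):
    """Keep a candidate image only if people largely agree on which way is up."""
    return circular_variance(angles_deg) <= max_variance


# Hypothetical answers (in degrees) collected for two candidate images.
agreeing = [358, 2, 5, 355, 0, 3]
scattered = [0, 90, 180, 270, 45, 200]

print(keep_candidate(agreeing))
print(keep_candidate(scattered))
```

Using a circular measure treats answers of 358° and 2° as close together, which a plain variance over the raw numbers would not.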

Many users of the phpBB forum software (which has suffered greatly from spam) have implemented an open source image recognition CAPTCHA system in the form of an add-on called KittenAuth,[39] which in its default form presents a question requiring the user to select a stated type of animal from an array of thumbnail images of assorted animals. The images (and the challenge questions) can be customized, for example to present questions and images which would be easily answered by the forum's target userbase. Furthermore, for a time, RapidShare free users had to get past a CAPTCHA in which they had to enter only the letters attached to a cat, while other letters were attached to dogs.[40] This was later removed because (legitimate) users had trouble entering the correct letters.

References

  1. ^ Grossman, Lev (2008-06-05). "Computer Literacy Tests: Are You Human?". Time (magazine). Retrieved 2008-06-12. "The Carnegie Mellon team came back with the CAPTCHA. (It stands for "completely automated public Turing test to tell computers and humans apart"; no, the acronym doesn't really fit.) The point of the CAPTCHA is that reading those swirly letters is something that computers aren't very good at." 
  2. ^ "Latest Status of CAPTCHA Trademark Application". USPTO. 2008-04-21. Retrieved 2008-12-21. 
  3. ^ MySQL Server. "MySQL Database". MySQL. 
  4. ^ Amrinder Arora (2007). "Statistics Hacking — Exploiting Vulnerabilities in News Websites" (PDF). International Journal of Computer Science and Network Security 7: 342–347. 
  5. ^ Stone, Todd. "Say Hello to CAPTCHAs as Advertising". AdAge.com. Retrieved July 24, 2012. 
  6. ^ Peterson, Tim. "Solve Media Brings Brand Washing to Video Ads". AdWeek.com. Retrieved July 24, 2012. 
  7. ^ "CAPTCHA: Telling Humans and Computers Apart Automatically". Carnegie Mellon University. Retrieved July 24, 2012. 
  8. ^ "Breaking CAPTCHAs Without Using OCR". Howard Yeend (pureMango.co.uk). 2005. Retrieved 2006-08-22. 
  9. ^ Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (2005). Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs) (PDF). Microsoft Research. Archived from the original on 2006-06-13. Retrieved 2006-08-02. 
  10. ^ Mori, Greg; Malik, Jitendra. "Breaking a Visual CAPTCHA". Simon Fraser University. Retrieved 2008-12-21. 
  11. ^ Kluever, Kurt (May 12, 2008). "Breaking the PayPal CAPTCHA". Kloover.com. Retrieved 2008-12-21. 
  12. ^ Li, Shujun; Syed Amier Haider Shah, Muhammad Asad Usman Khan, Syed Ali Khayam, Ahmad-Reza Sadeghi and Roland Schmitz (2010). "Breaking e-Banking CAPTCHAs". Proceedings of 26th Annual Computer Security Applications Conference (ACSAC 2010). New York, NY, USA: ACM. pp. 171–180. doi:10.1145/1920261.1920288. http://www.acsac.org/2010/openconf/modules/request.php?module=oc_program&action=summary.php&id=53.
  13. ^ Kluever, Kurt (February 28, 2008). "Breaking ASP Security Image Generator". Kloover.com. Retrieved 2008-12-21. 
  14. ^ Hocevar, Sam. "PWNtcha - captcha decoder". Sam.zoy.org. Retrieved 2008-12-21. 
  15. ^ Sergei, Kruglov. "Defeating of some weak CAPTCHAs". Captcha.ru. Retrieved 2008-12-21. 
  16. ^ "Network Security Research and AI". Retrieved 2008-12-21. 
  17. ^ Dawson (2008-04-15). "Windows Live Hotmail CAPTCHA Cracked, Exploited". Slashdot (SourceForge). Retrieved 2008-04-16. 
  18. ^ Dawson (2008-02-26). "Gmail CAPTCHA Cracked". Slashdot (SourceForge). Retrieved 2008-04-16. 
  19. ^ Gregg Keizer, "Spammers' bot cracks Microsoft's CAPTCHA: Bot beats Windows Live Mail's registration test 30% to 35% of the time, says Websense", Computerworld, February 7, 2008.
  20. ^ Prasad, Sumeet (2008-02-22). "Google’s CAPTCHA busted in recent spammer tactics". Websense. Archived from the original on 2008-08-22. Retrieved 2008-12-21. 
  21. ^ Jeff Yan; Ahmad Salah El Ahmad (April 13, 2008). A Low-cost Attack on a Microsoft CAPTCHA (PDF). School of Computing Science, Newcastle University, UK. Retrieved 2008-12-21. 
  22. ^ Bajaj, Vikas (April 25, 2010). "Spammers Pay Others to Answer Security Tests". The New York Times. Retrieved 2010-04-28. 
  23. ^ M. Motoyama, K. Levchenko, C. Kanich, D. McCoy, G. M. Voelker, and S. Savage. "Re: CAPTCHAs: understanding CAPTCHA-solving services in an economic context". University of California, San Diego. Retrieved 17 March 2011. 
  24. ^ Doctorow, Cory (2004-01-27). "Solving and creating CAPTCHAs with free porn". Boing Boing. Retrieved 2006-08-22. 
  25. ^ Robertson, Jordan (2007-11-01). "Scams Use Striptease to Break Web Traps". San Jose, California. Associated Press. Archived from the original on 2007-11-06. 
  26. ^ Vaas, Lisa (2007-11-01). "Striptease Used to Recruit Help in Cracking Sites". PC Magazine. Retrieved 2008-12-21. 
  27. ^ "Captcha.net". Captcha.net. Retrieved 2011-03-22. 
  28. ^ "Spam filtering services throttle Gmail to fight spammers". 2008-04-10. Retrieved 2008-04-10. 
  29. ^ Ulanoff, Lance (October 31, 2012). "Deep-Sixing CAPTCHA". PC Magazine. Ziff Davis Media. Retrieved 2007-12-12. 
  30. ^ "TicketMaster v. RMG". 
  31. ^ Zetter, Kim (March 1, 2010). "Wiseguys Indicted in $25 Million Online Ticket Ring". Wired.com. Retrieved 2012-01-02. 
  32. ^ "UNITED STATES of AMERICA vs KENNETH LOWSON, KRISTOFER KIRSCH, LOEL STEVENSON". Federal Indictment. February 23, 2010. Retrieved 2012-01-02. 
  33. ^ Young, Aaron. "Ticketmaster dumps 'hated' Captcha verification system". BBC. Retrieved 31 January 2013. 
  34. ^ "The Cutest Human-Test: KittenAuth". Thepcspy.com. Retrieved 22 January 2012. 
  35. ^ Asirra from Microsoft Research (PDF)
  36. ^ Golle, Philippe. Machine Learning Attacks Against the Asirra CAPTCHA. Stanford Crypto. Retrieved 2008-12-21. 
  37. ^ Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization from Microsoft Research (PDF)
  38. ^ What’s Up CAPTCHA? A CAPTCHA Based On Image Orientation from WWW'09 by Rich Gossweiler, Maryam Kamvar, and Shumeet Baluja
  39. ^ The Cutest Human-Test: KittenAuth from ThePCSpy.com
  40. ^ David (June 4, 2008). "Attached to a Captcha". randomwire.com. Retrieved 2008-12-21[dead link] see: Archive. 
