The Internet is the world's largest library. It's just that all the books are on the floor.
One of the myths surrounding CAPTCHAs is that they offer 100% protection against the abuse of web site facilities by automated systems.
That’s a reassuring idea but is it really true in practice?
One of the first visual CAPTCHAs was developed in 2002 by Yahoo in conjunction with Manuel Blum and his graduate students at the School of Computer Science at Carnegie Mellon University. Yahoo were particularly interested in automated challenges following several incidents in their chat rooms in 2000, in which bots (computer software programs) posed as teenagers. Once inside a chat room, a bot would either try to collect personal information about the teens who visited or direct chat participants to advertisements. The bots waited until a visitor typed a question mark, then automatically generated a response about where the person could find an answer, usually sending their “correspondent” to an advertising site.
Yahoo quickly realised that they needed a software gatekeeper that would allow human users in but keep the automated systems out. Blum and his team came up with EZ-Gimpy. The solution works by selecting a random word from a pre-existing dictionary, distorting the letters and then placing them on a busy background. Initially EZ-Gimpy worked well, but by the end of 2003 second-generation bots had begun to appear that could still read the distorted words.
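The three steps described above (pick a word, distort it, add a busy background) can be illustrated with a toy sketch. This is not EZ-Gimpy's actual code: it draws on a grid of text characters rather than an image, the word list `WORDS` is a made-up stand-in for the real dictionary, and the names `make_challenge` and `check` are hypothetical.

```python
import random

# Hypothetical stand-in for the pre-existing dictionary mentioned in the text.
WORDS = ["house", "river", "green", "stone", "light"]


def make_challenge(rng, width=40, height=5, clutter=0.15):
    """Return (answer, rows): the secret word and a cluttered text grid."""
    # Step 1: select a random word from the dictionary.
    answer = rng.choice(WORDS)
    # Step 3 (done first here): a busy background, where each cell
    # becomes a clutter dot with probability `clutter`.
    grid = [["." if rng.random() < clutter else " " for _ in range(width)]
            for _ in range(height)]
    # Step 2: "distort" the word by giving every letter a random row
    # and a random gap to the next letter.
    col = rng.randint(0, 3)
    for ch in answer:
        grid[rng.randrange(height)][col] = ch
        col += rng.randint(2, 5)
    return answer, ["".join(row) for row in grid]


def check(answer, response):
    """Compare the visitor's typed response with the stored answer."""
    return response.strip().lower() == answer
```

Calling `make_challenge(random.Random())` returns the answer, which stays on the server, and the rows, which would be shown to the visitor. The weakness discussed in the text is visible even in the toy: the answer is always one of a small, fixed set of words.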
Another solution was Gimpy – a more difficult variant of EZ-Gimpy that uses 10 words at a time. The words are presented with distortion and clutter similar to EZ-Gimpy's, and they also overlap, which makes the test challenging even for humans in some cases. The user is required to name 3 of the 10 words in the image in order to pass the test.
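The pass rule (name 3 of the 10 words) is simple to express in code. The sketch below is an assumption about how such a check might look, not Gimpy's real implementation; `passes_gimpy` and its parameters are hypothetical names.

```python
def passes_gimpy(shown_words, typed_words, required=3):
    """Return True if at least `required` distinct shown words were typed."""
    shown = {w.lower() for w in shown_words}
    # Using a set means typing the same word three times counts only once.
    typed = {w.strip().lower() for w in typed_words}
    return len(shown & typed) >= required
```

Matching on sets ignores the order of the answers and is forgiving about case and stray spaces, which seems reasonable when the words overlap and are hard to read.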
Enter BaffleText, developed by Henry Baird and Monica Chew at the University of California at Berkeley. BaffleText creates nonsense words, which both overcomes the problem of relying on a small inbuilt dictionary and reduces the likelihood of successful dictionary-based attacks.
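To see why nonsense words blunt a dictionary attack, here is a minimal sketch of a nonsense-word generator. It does not reproduce BaffleText's own method: it swaps in a much simpler technique, gluing together random consonant-vowel syllables, and the alphabets and the function name `nonsense_word` are illustrative assumptions.

```python
import random

# Illustrative alphabets, chosen so the results stay roughly pronounceable.
CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"


def nonsense_word(rng, syllables=3, dictionary=frozenset()):
    """Build a consonant-vowel nonsense word that is not a dictionary word."""
    while True:
        word = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                       for _ in range(syllables))
        # Reject the rare result that happens to be a real word.
        if word not in dictionary:
            return word
```

With these alphabets there are 17 × 5 = 85 possible syllables, so three syllables give 85³ = 614,125 candidate words, none of which an attacker can simply look up in a word list.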
Greg Mori and Jitendra Malik have since developed algorithms that can break 92% of EZ-Gimpy and 33% of Gimpy CAPTCHAs. Breaking BaffleText, or its variants, is proving more difficult, but it’s only a matter of time before it too is broken. Projects such as Pretend We’re Not a Turing Computer but a Human Antagonist (PWNtcha) demonstrate that, where there’s a CAPTCHA, there is, ultimately, an automated solution.