
Breaking CAPTCHAs


One of the myths surrounding CAPTCHAs is that they offer 100% protection against the abuse of web site facilities by automated systems.

That’s a reassuring idea but is it really true in practice?

One of the first visual CAPTCHAs was developed in 2002 by Yahoo in conjunction with Manuel Blum and his graduate students at the School of Computer Science at Carnegie Mellon University. Yahoo were particularly interested in automated challenges following a number of incidents in Yahoo’s chat rooms in 2000. The incidents involved a number of bots (computer software programs) posing as teenagers. Once inside the chat room, a bot would then either try to collect personal information about the teens who visited or would direct chat participants to advertisements. The bots operated by waiting until a visitor typed a question mark. They would then automatically create a response about where a person could find an answer – usually sending their “correspondent” to an advertising site.

Yahoo quickly realised that they needed a software gatekeeper that would allow human users in but keep the automated systems out. Blum and his team came up with EZ-Gimpy. The solution works by selecting a random word from a pre-existing dictionary, distorting the letters and then placing them on a busy background. Initially, EZ-Gimpy worked well, but by the end of 2003 second-generation bots began to appear that could still read the distorted words.

Another solution was Gimpy – a more difficult variant of EZ-Gimpy that uses 10 words at a time. The words are presented in distortion and clutter similar to EZ-Gimpy. The words are also overlapped, providing a test that can be challenging for humans in some cases. The user is required to name 3 of the 10 words in the image in order to pass the test.
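The pass rule described above (name 3 of the 10 words) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not Gimpy's actual code: the function name `passes_gimpy`, the case-insensitive matching and the sample words are all assumptions.

```python
def passes_gimpy(answers, shown_words, required=3):
    """Return True if at least `required` distinct shown words were named.

    Assumption: matching ignores case and surrounding whitespace.
    """
    shown = {word.lower() for word in shown_words}
    named = {answer.strip().lower() for answer in answers}
    # Only answers that really appear in the image count towards the total.
    return len(named & shown) >= required


# Hypothetical challenge: the ten words hidden in one image.
SHOWN = ["spade", "canvas", "weight", "flag", "round",
         "sponge", "river", "mask", "bridge", "lemon"]

print(passes_gimpy(["Flag", "river", "lemon"], SHOWN))   # three correct words
print(passes_gimpy(["flag", "river", "tiger"], SHOWN))   # only two correct words
```

Because only 3 of the 10 words are needed, a human can skip the words that are hardest to read. The same allowance helps a bot, which only has to recognise its three easiest words.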

Enter BaffleText, developed by Henry Baird and Monica Chew at the University of California at Berkeley. BaffleText could create nonsense words, which both overcame the problem of relying on a small built-in dictionary and reduced the likelihood of successful dictionary-based attacks.
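To see why nonsense words blunt a dictionary-based attack, consider the following rough Python sketch. It is not BaffleText's actual generator, nor any real attack tool: the tiny `DICTIONARY`, the consonant-vowel word builder and both function names are assumptions made purely for illustration. The only library calls used are `random.choice` and `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import random

# Hypothetical stand-in for the small built-in dictionary of a word-based CAPTCHA.
DICTIONARY = ["spade", "canvas", "weight", "flag", "round", "sponge"]

CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"


def nonsense_word(syllables=3, rng=random):
    """Build a non-dictionary word from consonant-vowel syllables (an assumed scheme)."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))


def dictionary_guess(noisy_reading, dictionary=DICTIONARY):
    """Snap an imperfect character-level reading to the closest dictionary word."""
    matches = difflib.get_close_matches(noisy_reading, dictionary, n=1, cutoff=0.6)
    return matches[0] if matches else None


# A bot misreads one letter of "spade", but the dictionary repairs the mistake.
print(dictionary_guess("spude"))

# With a nonsense word there is no correct entry to fall back on.
challenge = nonsense_word()
print(challenge, dictionary_guess(challenge))
```

With a dictionary word, the bot can get a letter or two wrong and still recover the answer. With a nonsense word, it has to read every character correctly. A simple generator like this one can occasionally emit a real word, so a practical system would presumably filter its output against a word list.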

Greg Mori and Jitendra Malik have since developed algorithms that can break 92% of EZ-Gimpy and 33% of Gimpy CAPTCHAs. Breaking BaffleText, or its variants, is proving more difficult but it’s only a matter of time before it too is broken. Projects such as Pretend We’re Not a Turing Computer but a Human Antagonist (PWNtcha) demonstrate that, where there’s a CAPTCHA, there is, ultimately, an automated solution.

Published: October 6th 2007


  1. Smiffy

    The ability of software to pass this type of “Turing Test” is only half the problem; of more concern is the fact that human visitors can fail the test. So, with false positives and false negatives, just how much use is the technique?

  2. Richard Morton

    What is also interesting is that it is cheap enough to employ real humans just to enter the captchas, thus negating any Turing-test-like solution.

    QM Consulting Ltd

  3. Black Widow

    I have heard various stories about people solving captchas for rewards of one type or another but I’ve never come across a concrete example myself. Whilst it might be feasible, I do have to wonder whether it is an urban myth rather than a reality. I’d be very interested to hear of any documented cases.

  4. Steve

    When recently confronted with an application utilizing CAPTCHA that stated “Visually impaired users will require sighted assistance to …”, I asked a few technology security specialists about the efficacy of this technique. The comments were quite similar: “If the target is attractive enough, it is easy to outsource the breaking of it.”
    Conclusion: don’t rely on CAPTCHA to protect attractive targets.
    P.S. Don’t invite discrimination complaints by including stupid text such as “Visually impaired users will require sighted assistance to …” without providing accessible alternatives.