reCAPTCHA: Fight Spam, Read a Book

reCAPTCHA: the official website
reCAPTCHA: A new way to fight spam

You might notice that reCAPTCHA has two words. Why? reCAPTCHA is more than a CAPTCHA, it also helps to digitize old books. One of the words in reCAPTCHA is a word that the computer knows what it is, much like a normal CAPTCHA. However, the other word is a word that the computer can’t read. When you solve a reCAPTCHA, we not only check that you are a human, but use the result on the other word to help read the book!

A very interesting idea, and I think it might just work. I see two problems here.

1. Someone could just as easily mistype the second word (the one you’re helping the computer “read”), which would leave the OCR result for that word completely wrong. Correct me if I am wrong, but I’d hope each word we are helping computers “read” gets compared against the responses from other so-called humans. In theory, if 95% of people say a particular word is “the” and 5% say it is “then”, the choice of “the” should win out, right?

2. CAPTCHA, the original CAPTCHA, still hasn’t really taken off all that well. I don’t have any exact numbers, but I’d guess CAPTCHA appears on somewhere around 25-35% of web forms. That really isn’t a huge number. Based on that, I can’t imagine web developers promptly leaping over to a new (or evolved) technology in any kind of mass migration.

This sounds like a great idea, and I may just get around to adding it to my blog. That being said, I definitely will not be in any kind of hurry to implement it.
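
Going back to my first concern, here is a minimal sketch in Python of the kind of voting I have in mind. To be clear, this is my own guess at how such a check could work, not how reCAPTCHA actually does it. The function name `consensus_word` and the `min_votes` and `min_share` thresholds are made up for illustration.

```python
from collections import Counter


def consensus_word(responses, min_votes=3, min_share=0.75):
    """Return the transcription most people agree on, or None if unclear.

    min_votes and min_share are illustrative thresholds (assumptions),
    not values taken from reCAPTCHA.
    """
    # Normalize so that "The" and " the " count as the same answer.
    cleaned = [r.strip().lower() for r in responses if r.strip()]

    # Too few answers to trust any of them yet.
    if len(cleaned) < min_votes:
        return None

    # Pick the most common answer and check how dominant it is.
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_share:
        return word
    return None


# The 95% / 5% example from above.
votes = ["the"] * 95 + ["then"] * 5
print(consensus_word(votes))  # prints "the"
```

With thresholds like these, a single typo gets outvoted, and a word that people genuinely disagree on comes back as `None` so it can be shown to more people instead of being accepted.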