Nov 30

Okay, let’s be frank. CAPTCHAs, “Completely Automated Public Turing tests to tell Computers and Humans Apart,” suck. Their very existence reflects the sad state of our society: there are enough creeps on the internet to necessitate these crazy eyesore tests, all built on the idea that an automatically generated image can be messy enough that computers can’t read it but humans can. The phpBB folks had a good blog article a few months ago discussing problems with CAPTCHAs, and I completely concur with it.

The CAPTCHA implementation I’ve historically complained about the most is phpBB3’s, but there are good reasons why it irks me more than most. phpBB is an open-source program whose source code is freely available for anyone to view, including people trying to beat its CAPTCHA. That means phpBB has to hold its CAPTCHA implementation to a much higher security standard than closed-source websites do. And in the world of CAPTCHAs, more security equals less readability.

However, this post isn’t about phpBB’s CAPTCHA; it’s about a different kind of CAPTCHA option. You may already be familiar with it, but up until about a week ago, I wasn’t. I was looking for an effective way to protect the custom-made web forms on one of my websites from automated garbage when I stumbled upon a service called reCAPTCHA.

reCAPTCHA is essentially a JavaScript-based CAPTCHA service that you can add to any web form, as long as the form is handled by a supported server-side language (PHP, among others). Because it runs from a separate website, all of the image processing for the CAPTCHA happens off of your site, and so does the work of verifying whether the submitted CAPTCHA was correct. It was surprisingly easy to integrate with my web forms, and it has been quite effective.
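To make the “verification happens off of your site” part concrete, here is a minimal sketch of what the form handler's side of that exchange can look like. It is not the code from my forms (those use PHP). The endpoint and the `secret`/`response` field names are an assumption based on the present-day reCAPTCHA `siteverify` API, and `captcha_passed` and `http_post` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and field names follow the present-day reCAPTCHA
# "siteverify" API, not necessarily what the service exposed originally.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def http_post(url, fields):
    """POST form-encoded fields and return the decoded JSON reply."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))


def captcha_passed(secret_key, user_response, post=http_post):
    """Ask the remote service whether the visitor solved the CAPTCHA.

    `post` is injectable so the check can be exercised without a network.
    """
    if not user_response:
        return False  # nothing was submitted, so skip the remote call
    result = post(VERIFY_URL, {"secret": secret_key, "response": user_response})
    return bool(result.get("success"))
```

The form handler only forwards what the visitor submitted and reads back a yes or no; it never generates or inspects the CAPTCHA image itself.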

But what’s even more interesting is that reCAPTCHA gives the act of completing a CAPTCHA a secondary purpose. The group at Carnegie Mellon University that put reCAPTCHA together runs it alongside multiple projects to scan and digitize old books and documents. Because of the documents’ age (and their archaic typefaces), OCR (Optical Character Recognition) often has difficulty figuring out what a word actually says, and has a tendency to mess it up. So reCAPTCHA takes one of these words that the computer couldn’t read and shows it to the user filling out the CAPTCHA. After collecting numerous responses for the same word, it can compare them and figure out what the word actually says, thereby helping to digitize documents one word at a time. Each CAPTCHA presents two words: the unknown word, plus a word with a known correct answer that verifies an actual human is filling out the CAPTCHA. It’s really kind of nice to see that someone has taken something as annoying as a CAPTCHA and turned it into something useful. (More on how it works.)
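The two-word idea is easy to sketch in code. This is only my illustration of the scheme as described above, not reCAPTCHA's actual implementation: the function names, the in-memory `votes` store, and the `VOTES_NEEDED` threshold of three matching answers are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers settle a word.
VOTES_NEEDED = 3

# Guesses collected so far, keyed by an id for each unreadable scanned word.
votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_submission(control_answer, control_guess, unknown_id, unknown_guess):
    """Return True if the visitor typed the known (control) word correctly.

    Only then is their reading of the unknown word counted as a vote.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def resolved_word(unknown_id):
    """Return the agreed reading once enough answers match, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

The control word decides whether the visitor gets through; the unknown word never blocks anyone, it just accumulates votes until enough people agree on a reading.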

So overall, here’s a high-five to reCAPTCHA for providing a great service (and a CAPTCHA that I can actually read) with a great purpose. If you’ve got a site of your own that you want to protect from spam, you should definitely give reCAPTCHA a try.
