I have a question for the CS majors ;)
Captchas http://www.captcha.net/ http://en.wikipedia.org/wiki/Captcha
Has anyone ever seen any writing on the idea of caching the different captcha images, manually decoding them, and storing the "meaning" of each captcha in a database, then cross-referencing that "meaning" with the image filename? That way a bot could load a page or service protected by a captcha, read the image filename, and look up the answer.
Is this a valid idea?
On Fri, 15 Apr 2005 17:03:46 -0500 Tim reid bewkard@gmail.com wrote:
> I have a question for the CS majors ;)
> Captchas http://www.captcha.net/ http://en.wikipedia.org/wiki/Captcha
> Has anyone ever seen any writing on the idea of caching the different captcha images, manually decoding them, and storing the "meaning" of each captcha in a database, then cross-referencing that "meaning" with the image filename? That way a bot could load a page or service protected by a captcha, read the image filename, and look up the answer.
> Is this a valid idea?
It would be valid, but any captcha implementation worth its salt doesn't use flat-file images. It generates a random name for the image and serves it up to the client.
Here is how it works:
1) Choose a random captcha that happens to say "FooBar", which is stored in foobar.jpg.
2) Tell the browser to load /images/AlkjsdfH293sdfhjh2234kjh.jpg
3) Have a system in place that, in the background, serves up foobar.jpg when asked for /images/AlkjsdfH293sdfhjh2234kjh.jpg
This keeps a bot like the one you were thinking of from working, because the filename is different each time.
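The three steps above can be sketched roughly as follows. This is only an illustration, not any particular package's code: the `CaptchaAliasStore` class and its method names are made up for the example, and retiring each alias after one request is an extra assumption that the steps above don't require.

```python
import secrets


class CaptchaAliasStore:
    """Maps one-time random image names to real captcha files (hypothetical helper)."""

    PREFIX = "/images/"
    SUFFIX = ".jpg"

    def __init__(self):
        self._aliases = {}  # random public token -> real file on disk

    def issue(self, real_file):
        """Return a fresh random URL path that stands in for real_file."""
        token = secrets.token_urlsafe(18)
        self._aliases[token] = real_file
        return f"{self.PREFIX}{token}{self.SUFFIX}"

    def resolve(self, url_path):
        """Return the real file behind url_path, or None if the alias is unknown.

        Assumption for this sketch: an alias is forgotten after its first use.
        """
        if not (url_path.startswith(self.PREFIX) and url_path.endswith(self.SUFFIX)):
            return None
        token = url_path[len(self.PREFIX):-len(self.SUFFIX)]
        return self._aliases.pop(token, None)


store = CaptchaAliasStore()
first = store.issue("foobar.jpg")
second = store.issue("foobar.jpg")

# A bot that recorded the answer under the first name has no entry for the second.
bot_table = {first: "FooBar"}
print(second in bot_table)    # False, barring an astronomically unlikely collision
print(store.resolve(second))  # foobar.jpg
print(store.resolve(second))  # None: this alias has already been used
```

Note that this only defeats a lookup keyed on the filename; the bytes served are still those of foobar.jpg.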
P.S. FYI I'm not a CS major, not that it would have helped anyway as this isn't something they are going to teach you in school. :)
--------------------------------- Frank Wiles frank@wiles.org http://www.wiles.org ---------------------------------
On 4/17/05, Frank Wiles frank@wiles.org wrote:
> It would be valid, but any captcha implementation worth its salt doesn't use flat-file images. It generates a random name for the image and serves it up to the client.
> Here is how it works:
> 1) Choose a random captcha that happens to say "FooBar", which is stored in foobar.jpg.
> 2) Tell the browser to load /images/AlkjsdfH293sdfhjh2234kjh.jpg
> 3) Have a system in place that, in the background, serves up foobar.jpg when asked for /images/AlkjsdfH293sdfhjh2234kjh.jpg
> This keeps a bot like the one you were thinking of from working, because the filename is different each time.
I had thought he was planning on using the whole file as the key to his cache, not merely the filename. But articles I have read on generating captchas recommend generating one-off captchas. You have a graphics library that takes one of the words on your wordlist, frobs it randomly, and produces a one-off image. You can even call the image generator captcha.png every time, and let your session layer keep track of who got which word.
Authen::Captcha (http://search.cpan.org/~unrtst/Authen-Captcha-1.023/Captcha.pm), for instance, uses the GD library to generate images as needed.
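Here is a rough sketch of the session-layer half of that idea, in Python rather than Perl. It is not how Authen::Captcha works internally: the word list, `new_challenge`, `check_answer`, and the distortion settings are all invented for illustration, and the actual image drawing is left out.

```python
import random
import secrets

WORDLIST = ["foobar", "kansas", "penguin"]  # hypothetical word list

_pending = {}  # session id -> word that session was shown


def new_challenge(session_id, rng=random):
    """Pick a word for this session and return settings for a one-off rendering."""
    word = rng.choice(WORDLIST)
    _pending[session_id] = word
    # Fresh random distortion on every request, so repeated renderings of the
    # same word should not come out as the same image.
    return {
        "word": word,
        "rotations": [rng.uniform(-30, 30) for _ in word],
        "noise_seed": rng.getrandbits(32),
    }


def check_answer(session_id, answer):
    """Compare the answer with the stored word; the challenge is consumed either way."""
    expected = _pending.pop(session_id, None)
    return expected is not None and secrets.compare_digest(
        answer.strip().lower().encode(), expected.encode()
    )


sid = secrets.token_hex(8)
params = new_challenge(sid)
# params would go to a renderer that always answers at the same URL, e.g. captcha.png
print(check_answer(sid, params["word"]))  # True
print(check_answer(sid, params["word"]))  # False: the challenge was already consumed
```

Since the word is tied to the session and not to the image name or its bytes, a cache keyed on either one has nothing stable to look up.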
Actually, a hack that is more complex to manage but could be more effective may be possible: a "human decoding farm". The basic concept is that the "signature" of a captcha-type challenge is routed to a human, who reads it, decodes it, types in the answer, and is instantly rewarded by... the next captcha to decode. Rinse, lather, repeat for 8 hours or more a day. But such labor gets cheap. After all, when Wal-Mart self-checkouts handle merchandise made in Bangladesh and stocked by robots, we ALL may wind up being captcha decoders for a spam haus.
Don't laugh: some high-yield cybercrashings for financially rewarding data can pay $.25 per hit. At one "hit" per 10-minute decoding session, your resident of Calcutta earns his $1.50 hourly the hard way.
Oren
ObSciFi: Wolfbane by Frederik Pohl and C.M. Kornbluth. It carries the concept of humans as components a bit farther. Deja vu of the Matrix movies, but much better executed.
Maybe you were already aware of this, but human farming has already been documented. Apparently they get people to provide captcha answers in exchange for pornography. I doubt you'd see a captcha sweatshop, though I wouldn't be surprised to see an Asian cybercafe offer discounts for captcha replies.
On 4/17/05, Oren Beck oren_beck@hotmail.com wrote:
> Actually, a hack that is more complex to manage but could be more effective may be possible: a "human decoding farm". The basic concept is that the "signature" of a captcha-type challenge is routed to a human, who reads it, decodes it, types in the answer, and is instantly rewarded by... the next captcha to decode. Rinse, lather, repeat for 8 hours or more a day. But such labor gets cheap. After all, when Wal-Mart self-checkouts handle merchandise made in Bangladesh and stocked by robots, we ALL may wind up being captcha decoders for a spam haus.