I work at the school for the blind, so OCR is a regularly used technology on our campus. It is fairly good for allowing a visually impaired person to have reasonably accurate access to printed material, but even the most expensive setups aren't 100% accurate.
It's kind of like voice recognition - in ideal circumstances you can get about 98-99% accuracy. However, you will never have ideal circumstances in a real-life setting, so actual accuracy can vary from about 75% to 99%. I have seen OCR capture an entire page of 12pt Times text at 100% accuracy, and I've also seen voice recognition systems capture a sizable dictation at 100% accuracy, but it is uncommon.
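To put rough numbers on that range, here is a minimal sketch of how many misreads each accuracy level implies per page. The 3,000-characters-per-page figure is a hypothetical round number, and it assumes accuracy is counted per character; neither comes from a measurement.

```python
# Hypothetical figure: roughly 3,000 characters on a full page of 12pt text.
CHARS_PER_PAGE = 3000

def expected_errors(accuracy, chars=CHARS_PER_PAGE):
    """Expected number of misread characters at a given per-character accuracy."""
    return round(chars * (1.0 - accuracy))

# The low, typical-ideal, and high ends of the range mentioned above.
for accuracy in (0.75, 0.98, 0.99):
    print(f"{accuracy:.0%} accurate: about {expected_errors(accuracy)} errors per page")
```

Even at the high end, under these assumptions a page still carries a few dozen errors for someone to catch.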
One thing to know about OCR is that higher resolution and color depth do not make for higher accuracy. We get our best accuracy at around 150-300 DPI and in black-and-white (aka line-art) modes. Higher resolution won't necessarily make the accuracy worse, but it doesn't help, and it makes scan times longer and files much larger. Using grayscale or color usually does decrease the accuracy in my experience.
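As a rough illustration of the file-size point, the sketch below estimates the uncompressed size of one scanned page at different settings. It assumes a US letter page (8.5 x 11 inches) and raw pixel data with no compression or row padding, so real files will differ; the function name is just for illustration.

```python
def raw_scan_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Approximate uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Bits per pixel for the common scan modes.
modes = [("line-art", 1), ("grayscale", 8), ("color", 24)]

for dpi in (150, 300, 600):
    for name, bits in modes:
        size_mb = raw_scan_bytes(dpi, bits) / 1_000_000
        print(f"{dpi} DPI {name}: {size_mb:.1f} MB")
```

Under these assumptions, a 600 DPI color scan holds 96 times as much raw data as a 300 DPI line-art scan of the same page: four times the pixels, times 24 bits per pixel instead of 1.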
--
~Bradley Hook
Education Systems Administrator
Kansas State School for the Blind
1100 State Avenue
Kansas City, KS 66102
Voice: (913) 281-3308 ext. 363
Mobile: (913) 645-9958
Facsimile: (913) 281-3104
http://www.kssb.net
Jon Pruente wrote:
| Anyone ever wonder why banks still use magnetic ink to print the
| characters on your checks? Because they print in a very specific font
| and don't rely on a computer analyzing the picture of a character to
| figure out what it is - magnetic ink is proven and reliable. OCR is a
| long running problem. I used to play with it way back in the day
| (like, '94 or '95-ish) on my old Packard Bell laptop. It was slow as
| sin, but it sort of worked. AFAIK, things have generally only gotten
| faster due to CPU speed, not really much better at actually
| deciphering text. If all your papers printed in a very OCR friendly
| font with strong contrast of ink to paper your accuracy rates would be
| good, but they will never be 100%, of course. So, OCR is truly a
| lossy format. Every OCR setup I've ever bothered to read about still
| needs a proof reader. If a person still has to read, understand and
| verify every page that is scanned in you still have a load of man
| hours to deal with just getting the stuff in the system. A good data
| entry clerk would be a fair match for a proof reader, I'd wager. ;)
|
| Jon.