OCR

Bradley Hook bhook at kssb.net
Mon Apr 7 23:02:15 CDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I work at the school for the blind, so OCR is a regularly used
technology on our campus. It is fairly good for allowing a visually
impaired person to have reasonably accurate access to printed material,
but even the most expensive setups aren't 100% accurate.

It's kind of like voice recognition: in ideal circumstances you can get
about 98-99% accuracy. However, you will never have ideal circumstances
in a real-life setting, so actual accuracy can vary from about 75% to
99%. I have seen OCR capture an entire page of 12pt Times text at 100%
accuracy, and I've also seen voice recognition systems capture a sizable
dictation at 100% accuracy, but it is uncommon.
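
To put those percentages in concrete terms, here is a rough sketch that turns an accuracy rate into an expected number of errors per page. The figure of 3,000 characters per page is an assumption chosen for illustration (roughly a full page of 12pt text), not a measured value. Counting errors per character is also an assumption, and in practice OCR errors tend to cluster on bad regions of a page rather than spread evenly.

```python
# Rough estimate of OCR errors per page from an accuracy rate.
# ASSUMPTION: ~3,000 characters per page, errors counted per character.
CHARS_PER_PAGE = 3000  # hypothetical page size, for illustration only


def expected_errors(accuracy: float, chars: int = CHARS_PER_PAGE) -> int:
    """Return the expected number of misrecognized characters on one page."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(chars * (1.0 - accuracy))


for acc in (0.99, 0.98, 0.75):
    print(f"{acc:.0%} accurate -> about {expected_errors(acc)} errors per page")
```

Under these assumptions, even 99% accuracy leaves about 30 characters per page to find and fix, and 75% leaves about 750.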

One thing to know about OCR is that higher resolution and color depth do
not make for higher accuracy. We get our best accuracy at around 150-300
DPI and in black-and-white (aka line-art) modes. Higher resolution won't
necessarily make the accuracy worse, but it doesn't help and it makes
scan times longer and files much larger. Using grayscale or color
usually does decrease the accuracy in my experience.
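
A quick back-of-the-envelope calculation shows why higher resolution and color depth make files so much larger. This sketch computes the raw, uncompressed size of one scanned page. The 8.5 x 11 inch letter page and the bit depths (1-bit line art, 8-bit grayscale, 24-bit color) are assumptions for illustration; real files will usually be smaller once the scanning software compresses them.

```python
# Raw (uncompressed) size of one scanned page at a given DPI and bit depth.
# ASSUMPTION: US letter page, 8.5 x 11 inches; no compression applied.
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0

MODES = {
    "line art (1-bit)": 1,
    "grayscale (8-bit)": 8,
    "color (24-bit)": 24,
}


def raw_size_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Pixel count times bits per pixel, rounded up to whole bytes."""
    width_px = round(PAGE_W_IN * dpi)
    height_px = round(PAGE_H_IN * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # ceiling division from bits to bytes


for dpi in (150, 300, 600):
    for mode, bpp in MODES.items():
        mib = raw_size_bytes(dpi, bpp) / (1024 * 1024)
        print(f"{dpi:>3} DPI, {mode:<18}: {mib:6.1f} MiB")
```

Doubling the DPI quadruples the pixel count, and going from line art to 24-bit color multiplies the raw size by 24, so a 600 DPI color scan is about 96 times larger than a 300 DPI line-art scan before compression.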

--
~Bradley Hook
Education Systems Administrator
Kansas State School for the Blind
1100 State Avenue
Kansas City, KS 66102
Voice: (913) 281-3308 ext. 363
Mobile: (913) 645-9958
Facsimile: (913) 281-3104
http://www.kssb.net

Jon Pruente wrote:
| Anyone ever wonder why banks still use magnetic ink to print the
| characters on your checks?  Because they print in a very specific font
| and don't rely on a computer analyzing the picture of a character to
| figure out what it is - magnetic ink is proven and reliable.  OCR is a
| long-running problem.  I used to play with it way back in the day
| (like, '94 or '95-ish) on my old Packard Bell laptop.  It was slow as
| sin, but it sort of worked.  AFAIK, things have generally only gotten
| faster due to CPU speed, not really much better at actually
| deciphering text.  If all your papers were printed in a very OCR-friendly
| font with strong contrast of ink to paper, your accuracy rates would be
| good, but they will never be 100%, of course.  So, OCR is truly a
| lossy format.  Every OCR setup I've ever bothered to read about still
| needs a proofreader.  If a person still has to read, understand and
| verify every page that is scanned in, you still have a load of
| man-hours to deal with just getting the stuff into the system.  A good
| data entry clerk would be a fair match for a proofreader, I'd wager. ;)
|
| Jon.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH+u5HdLuK9oP1lmYRAvToAJ9mqSsZAH9ua4jt4IYHR1M7lF14iwCeLSCp
BxYYcdwLxo0uQUbxGoc5hNE=
=CAQh
-----END PGP SIGNATURE-----

