--- bewkard bewkard@gmail.com wrote:
I have finally had it with paperwork. This last tax season did me in.
I've talked to a couple of people about using OCR to store documents digitally, and I know that a few people on the list do this as well. I was wondering if anyone could give me some tips about what works and what doesn't. Is it better to OCR things, or is it better to scan and save a PDF or some other portable document?
Unless the document is printed in an OCR-friendly font, you aren't going to have a great deal of success, even with modern OCR software. If all you need is to replace a print copy with a visual image, it is far better to scan the document as a graphic image and store it in a graphics format.
If the documents need to be viewed by other people, then PDF is a good choice. If you are going to be the primary viewer, though, all you really need to do is scan into a graphics format (like EPS, TIFF, or an application-specific but open format like the GIMP's XCF format) and save those files as-is.
Hard copy tends to outlast digital storage methods, so some companies are using OCR in reverse (printing documents now so they can be read back in with OCR much later) to store some of their longer-term information. They make things very easy for the computer, though, by using print fonts that OCR applications can read easily.
It costs a lot of money to get a computer to accurately extract information from a printed surface, as the scientists who extracted the earliest recording of the human voice (traced on soot-blackened paper) discovered themselves:
French folk song is 'world's earliest recording', beating Edison by 11 years March 27, 2008
Their experience is not entirely unlike that of trying to use modern OCR software.