On Fri, 2008-04-04 at 14:23 -0500, Billy Crook wrote:
Meant to send this out last night, but apparently it got stuck in drafts...
OCR will never be perfect, and because of that, you will *never know* for sure where it failed. Once something becomes paper, all it is is an image. I have never heard of OCR being a format of its own. It's usually used either to 'convert' an image into text that is stored as text, or to convert an image into text that is put into tags and stored with the image.
I have been storing all my tax and other documents electronically since 2004. I currently store scanned documents in PDF format. I would prefer a multipage image format like TIFF, but I haven't found a good program to do that, and PDF is massively more popular.
If I can get an electronic copy from the sender, I keep that and ditch the paper. Most banks and financial institutions now offer some form of electronic document delivery because it saves them money. This is usually PDF, sometimes HTML. I believe the fewer format transformations I do on a document, the better, so I save it in whatever format I can get it in. If for ANY reason you think you need to print something out just to scan it in, don't. Use CupsPDF or PDF-Print, or something like it. It shows up as a printer in CUPS, and when you print to it, it saves a PDF of what you "printed".
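For anyone who wants to script the print-to-PDF step, here is a minimal Python sketch that hands a file to a CUPS queue with the standard `lp` command. The queue name "PDF" and the file name are assumptions for illustration only; run `lpstat -p` to see what your PDF printer is actually called, and check your cups-pdf configuration for where the output file ends up.

```python
import shutil
import subprocess

# Assumption: the PDF printer queue is named "PDF". Check `lpstat -p`.
PDF_QUEUE = "PDF"


def build_print_command(path, queue=PDF_QUEUE):
    """Return the lp command line that sends `path` to the given CUPS queue."""
    return ["lp", "-d", queue, path]


def print_to_pdf(path, queue=PDF_QUEUE):
    """Submit `path` to the PDF queue; raises if lp is missing or fails."""
    if shutil.which("lp") is None:
        raise RuntimeError("lp not found; is CUPS installed?")
    subprocess.run(build_print_command(path, queue), check=True)


if __name__ == "__main__":
    # Hypothetical file name, used only to show the resulting command.
    print(" ".join(build_print_command("statement.txt")))
```

Calling `print_to_pdf("statement.txt")` would submit the job; CUPS has to know how to render the file type you give it, so plain text and PostScript are the safest inputs.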
If I have to scan paper, I currently use a program called gscan2pdf. It runs the scanner and can save a multipage PDF file. Before you save, you have the chance to rearrange the page order, which is handy if your ADF (automatic document feeder) skips a page or jams. You can also rotate pages. My scanner is attached to the network, so if you remind me the day before, I can load it up and demo the program at the LUG meeting.
On Wed, Apr 2, 2008 at 9:22 PM, bewkard bewkard@gmail.com wrote:
I have finally had it with paperwork. This last tax season did me in.
I've talked to a couple of people about using OCR to store documents digitally. I know that a few people on the list do this as well. I was wondering if anyone could give me some tips about what works and what doesn't work. Is it better to OCR things? Is it better to scan and save a PDF or some other portable document?
Again, TIA
Tim
Kclug mailing list Kclug@kclug.org http://kclug.org/mailman/listinfo/kclug
Actually, I do recall reading of someone who created a program that would back up files to paper (and later retrieve them). Of course, you couldn't store anything very large with it, but text isn't very large. Might that be a good solution here, or does it have to be human readable?
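To make the paper-backup idea concrete, here is a rough Python sketch. It is not the program mentioned above, just one way such a scheme could work: each line of the printout carries a base32 chunk plus a CRC32 checksum, so when a line is read back wrong (by OCR or by hand), you at least know which line failed. The function names and the 20-byte line size are my own choices for illustration.

```python
import base64
import zlib

BYTES_PER_LINE = 20  # 20 bytes encode to exactly 32 base32 characters


def encode_for_paper(data):
    """Turn bytes into printable lines of '<base32 chunk> <CRC32 in hex>'."""
    lines = []
    for offset in range(0, len(data), BYTES_PER_LINE):
        chunk = data[offset:offset + BYTES_PER_LINE]
        text = base64.b32encode(chunk).decode("ascii")
        checksum = zlib.crc32(chunk) & 0xFFFFFFFF
        lines.append(f"{text} {checksum:08X}")
    return lines


def decode_from_paper(lines):
    """Rebuild the bytes; return them with the 1-based numbers of bad lines."""
    data = bytearray()
    bad_lines = []
    for number, line in enumerate(lines, start=1):
        try:
            text, checksum = line.split()
            chunk = base64.b32decode(text)
            ok = (zlib.crc32(chunk) & 0xFFFFFFFF) == int(checksum, 16)
        except ValueError:
            # Malformed line, invalid base32, or unreadable checksum.
            ok = False
        if ok:
            data.extend(chunk)
        else:
            bad_lines.append(number)
    return bytes(data), bad_lines


if __name__ == "__main__":
    for line in encode_for_paper(b"Keep this somewhere safe.\n"):
        print(line)
```

The output is not human readable, which is the trade-off raised in the question. Base32 uses only A-Z and 2-7, which avoids a few commonly confused characters, but the checksum is what actually tells you where a read went wrong.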