-----Original Message----- From: kclug-bounces@kclug.org [mailto:kclug-bounces@kclug.org] On Behalf Of Jeremy Turner
On Thu, 2005-04-28 at 17:44 -0500, Gerald Combs wrote:
Before I start rolling my own solution, does anyone know of a utility or collection of utilities that will
- Extract all of the MIME attachments from a mailing list archive,
You might check out uudeview or mpack: http://www.fpx.de/fp/Software/UUDeview/ (no URL for mpack). In the Debian package, the author of mpack says to use uudeview.
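For anyone who does end up rolling their own instead of using uudeview, here is a minimal sketch of the attachment-extraction step using Python's standard `mailbox` and `email` modules. It assumes the list archive is in mbox format; the function name `extract_attachments` and the numbered output naming scheme are made up for illustration and are not part of any tool mentioned in this thread.

```python
import mailbox
from pathlib import Path


def extract_attachments(mbox_path, out_dir):
    """Write every MIME part that carries a filename into out_dir.

    Returns the list of paths written. Assumes mbox_path is an mbox file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # create=False: do not silently create an empty mbox if the path is wrong.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg_index, message in enumerate(box):
            for part_index, part in enumerate(message.walk()):
                filename = part.get_filename()
                if not filename:
                    continue  # body text or multipart container, not an attachment
                payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
                if payload is None:
                    continue
                # Keep only the last path component of the sender-supplied name,
                # and prefix message/part numbers so names cannot collide here.
                safe_name = Path(filename).name
                target = out_dir / f"{msg_index:05d}-{part_index:02d}-{safe_name}"
                target.write_bytes(payload)
                saved.append(target)
    finally:
        box.close()
    return saved
```

The numeric prefix is only a stopgap so that nothing is overwritten during extraction; proper collision and duplicate handling can happen in a later pass.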
- Extract the files from any archived (tar|zip|rar|...) attachments,
I assume once you extract the files from your mail archives and remove any duplicates, it would be trivial to run a loop on all files you extracted:
<pseudo-code>
if $extension eq ".tar.gz" or $extension eq ".tgz" then
    tar xvfz $filename
else if $extension eq ".tar.bz2" or $extension eq ".tbz2" then
    tar xvfj $filename
else if $extension eq ".zip" then
    unzip $filename
end if
</pseudo-code>
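The extension dispatch above could be sketched in Python roughly as follows, using the standard `tarfile` and `zipfile` modules in place of the external tar and unzip commands. The function name `extract_by_extension` is hypothetical, and rar is left out because the standard library has no reader for it.

```python
import tarfile
import zipfile
from pathlib import Path

# Same extensions as the pseudo-code: gzip- and bzip2-compressed tarballs.
TAR_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tbz2")


def extract_by_extension(archive, dest_dir):
    """Unpack archive into dest_dir based on its file extension.

    Returns True if the extension was recognized, False otherwise.
    """
    archive = Path(archive)
    dest_dir = Path(dest_dir)
    name = archive.name.lower()

    if name.endswith(TAR_SUFFIXES):
        dest_dir.mkdir(parents=True, exist_ok=True)
        # "r:*" lets tarfile work out the compression itself.
        with tarfile.open(archive, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons can refuse absolute paths, "..", and similar.
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
        return True

    if name.endswith(".zip"):
        dest_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
        return True

    return False
```

Since these archives come from arbitrary mail attachments, extracting each one into its own scratch directory is probably safer than unpacking into the current directory as the bare tar and unzip commands would.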
- Move each extracted file to a specific directory, renaming it if there's a naming collision, and
uudeview does this.
- Remove any duplicate files.
Maybe an MD5sum hash table to check for duplicates?
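A minimal sketch of that hash-table idea in Python, assuming all the extracted files already sit under one directory. The names `file_md5` and `remove_duplicates` are invented for this example; MD5 is used only because it was suggested here, and any other `hashlib` digest would work the same way.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_duplicates(directory):
    """Delete files under directory whose content was already seen.

    The first file (in sorted path order) with a given digest is kept.
    Returns the list of paths that were removed.
    """
    seen = {}      # digest -> first path found with that content
    removed = []
    # sorted() materializes the listing, so deleting while looping is safe.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        digest = file_md5(path)
        if digest in seen:
            path.unlink()
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

Which copy survives is decided purely by sort order in this sketch; keeping the newest or oldest copy instead would mean comparing modification times before unlinking.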
Jeremy
Wouldn't you want to check for dups. before you extract and mv to a dir? Or extract to dir., compare to final destination dir. And then mv file. You may want to compare filenames and dates to keep the newest or oldest file based on preference.
Kelsay, Brian - Kansas City, MO wrote:
Wouldn't you want to check for dups. before you extract and mv to a dir? Or extract to dir., compare to final destination dir. And then mv file. You may want to compare filenames and dates to keep the newest or oldest file based on preference.
Yup. The current plan is to loop thusly:
for each list_archive_file:
    run "formail -s mime_extraction_tool < list_archive_file"
    for each extracted_file:
        file_type = `file -z extracted_file`
        if file_type == a_tar_file:
            tar -xf extracted_file
        else if file_type == a_gzipped_tar_file:
            tar -xzf extracted_file
        else if file_type == a_zip_file:
            unzip extracted_file
for each raw_file `find extraction_dir -type f`:
    if raw_file is a valid Ethereal capture file:
        check for file name collision in target directory
        check for a duplicate file in target directory:
            move the file to the target directory
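The last loop of that plan (collision check, duplicate check, move) might look something like the Python sketch below. It is only an illustration: `move_to_target` is a made-up name, the "valid Ethereal capture file" test is left out entirely because it needs an external tool, and the `name.1.ext` renaming scheme is an arbitrary choice.

```python
import filecmp
import shutil
from pathlib import Path


def move_to_target(src, target_dir):
    """Move src into target_dir unless an identical file is already there.

    Returns the final path, or None if src was a duplicate and was deleted.
    """
    src = Path(src)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Duplicate check: byte-for-byte comparison with every file already moved.
    # shallow=False forces a content comparison instead of trusting stat() data.
    for existing in target_dir.iterdir():
        if existing.is_file() and filecmp.cmp(src, existing, shallow=False):
            src.unlink()
            return None

    # Name collision check: try name.ext, then name.1.ext, name.2.ext, ...
    candidate = target_dir / src.name
    counter = 1
    while candidate.exists():
        candidate = target_dir / f"{src.stem}.{counter}{src.suffix}"
        counter += 1

    shutil.move(str(src), str(candidate))
    return candidate
```

Comparing against every existing file is quadratic in the number of captures; for a large archive, a digest lookup table would be the cheaper way to do the duplicate check.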
None of these steps should take too long to implement individually. I was hoping someone would come back with "you can do that with AMaViS in about 60 seconds" so I could be lazy and not have to cobble it together myself.
On 4/29/05, Gerald Combs <gerald@ethereal.com> wrote:
None of these steps should take too long to implement individually. I was hoping someone would come back with "you can do that with AMaViS in about 60 seconds" so I could be lazy and not have to cobble it together myself.
There are bushels and bushels of CPAN packages to help you, but again, reading how to interface to someone else's system is a dubious gain over rolling your own. Thus the profusion. http://search.cpan.org/~dskoll/MIME-tools-5.417/ gets used in whole or in part by other module authors, and http://search.cpan.org/~markov/Mail-Box-2.060/lib/Mail/Box.pod is also a popular infrastructure for dealing with e-mail storage formats.