What filesystem are you using? Is it ReiserFS? It helps to know the filesystem type when trying to determine the problem. I have had significant issues with ReiserFS in the past. It is a fast filesystem, but it tends to lose links to inodes on occasion. Ext3 seems to be pretty stable. Have you considered ext3 with a separate journal volume, or JFS? JFS is much more reliable, especially if you move to separate journal files instead of inlining.

Also, what distro are you using? Hopefully you are not on SUSE 9.1 (essentially a workstation edition). SUSE has been known to introduce some nasty bugs in its releases on occasion. Red Hat seems to be a bit cleaner on enterprise hardware. You may also want to consider alternatives like Veritas FS, IBM's JFS package, or at least the Linux LVM, since it sounds like you are doing enterprise Linux stuff.

If you are at the LUG meeting tonight, we can discuss this further.
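
If it is not obvious which filesystem each volume uses, the kernel lists it in /proc/mounts, where each line has the form "device mount-point fstype options dump pass". Below is a rough Python sketch that pulls the type out for every mount point. The function name and the sample text are made up for illustration; they are not taken from the affected system.

```python
def filesystem_types(mounts_text):
    """Map each mount point to its filesystem type.

    Expects text in /proc/mounts format:
    device mount-point fstype options dump pass
    """
    types = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type = fields[:3]
        types[mount_point] = fs_type
    return types


# Hypothetical sample, only to show the expected input format.
SAMPLE_MOUNTS = (
    "/dev/sda1 / reiserfs rw 0 0\n"
    "/dev/sdb1 /data ext3 rw,data=ordered 0 0\n"
)

for mount_point, fs_type in sorted(filesystem_types(SAMPLE_MOUNTS).items()):
    print(mount_point, fs_type)
```

On a live system, the same function could be fed the result of open("/proc/mounts").read() instead of the sample string.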

-----Original Message-----
From: Phil Thayer
Sent: Sep 6, 2006 2:05 PM
To: Kclug@kclug.org
Subject: File System Corruption

I ran into a problem where a file system was corrupted beyond repair and wondered if anyone has seen anything like this before or has a reasonable explanation.  Here is the scenario:
Linux was running on an Intel X64 system with two local drives mirrored in Linux containing the swap file system.
The system was booting off a SAN drive where the rest of Linux was loaded.
There were three other SAN LUNs in use:
    1. 500 GB
    2. 150 GB
    3. 35 GB (Linux LUN.)
We swapped the system hardware to a different box and moved the HBA to the new box, so the HBA BIOS still allowed booting from the SAN.
The new box had two local drives, but they were mirrored at the hardware level with RAID 1 (so Linux would have seen only one drive).
The system was rebooted and crashed with numerous file system corruption errors.
When Linux was rebooted from the SAN on the new hardware, the 500 GB LUN on the SAN got corrupted so severely that it could not be repaired.
What I am wondering is, what caused the system to get corrupted?
Is it possible that the lack of a mounted swap file system caused this to happen?  Or is it because the /dev devices were not the same as they were in the first configuration?
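
One way to look into the second possibility: the kernel hands out names like /dev/sda and /dev/sdb in discovery order, so going from two visible local drives to one could shift the names of the SAN LUNs. The Python sketch below flags /etc/fstab entries that refer to such kernel-assigned names rather than to LABEL= or another stable identifier. It is only an illustration; the function name, the pattern, and the sample fstab are assumptions, not the contents of the actual system.

```python
import re

# Kernel-assigned SCSI/IDE disk names such as /dev/sda1 or /dev/hdb.
# Assumption: only these two naming families matter for this check.
UNSTABLE_DEVICE = re.compile(r"^/dev/(sd|hd)[a-z]+\d*$")


def unstable_fstab_entries(fstab_text):
    """Return (device, mount_point) pairs that use order-dependent names."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line
        device, mount_point = fields[0], fields[1]
        if UNSTABLE_DEVICE.match(device):
            entries.append((device, mount_point))
    return entries


# Hypothetical fstab, only to show the kind of entries that get flagged.
SAMPLE_FSTAB = """\
# device     mount    type  options   dump pass
LABEL=/      /        ext3  defaults  1 1
/dev/sdb1    /data    ext3  defaults  1 2
/dev/sda2    swap     swap  defaults  0 0
/dev/md0     /mirror  ext3  defaults  1 2
"""

for device, mount_point in unstable_fstab_entries(SAMPLE_FSTAB):
    print(device, mount_point)
```

Any entry this flags would be worth comparing against the device names the new hardware actually presented at boot.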
Any ideas?
Phillip Thayer