/ full?
Charles Steinkuehler
charles at steinkuehler.net
Sat Nov 30 18:48:05 CST 2002
Carl Sappenfield wrote:
> If my suspicion is correct, and you don't stop the process which holds an
> open descriptor to the deleted file, you will never get your space back.
> Rebooting just seems to me the easiest way (on a home computer) to
> accomplish this if you have no idea what process is causing the problem.
> Let me describe in more detail what I think is going on:
> du is telling you how much space you've actually used, and df is telling you
> how much space the OS thinks you've used. These numbers don't look like
> they match to me.
> (du looks at each file, and tells you how many bytes it takes up, and adds
> these numbers up for each directory.
> df asks the OS how much free space it thinks you have, and consequently how
> much you've used.)
> What happens (like I said, at least in AIX) is that if a file is deleted
> while an open descriptor is held, the file name is removed from its
> directory. That means du won't find it when it accumulates disk usage
> totals. Because
> the descriptor is held, though, the OS doesn't recognize that space as being
> released, and df still reports the space as being used.
> That descriptor will not be closed until the process holding it open is
> stopped, at which point you will see your free space magically increase.
I've definitely seen a similar problem in Linux. When deleting large
files to clear space on nearly full partitions, there is a definite lag
between deleting the file and df showing the free space on the drive. I
haven't measured this lag accurately, but it is longer than a minute or
two and shorter than a day or so: running df right after rm doesn't show
the new free space, but it's available when I check back the next day.
My guess is that a low-priority background task frees unused disk blocks
every so often.
While this doesn't sound like exactly the problem being experienced, it
does create a disparity between the du and df outputs, for the same
reason you mention above: du reports the space taken by files, and the
file is gone instantly, while df reports the space the OS considers
allocated, which takes a while to be marked as free. IMHO that lends
weight to your suspicions.
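To make the du/df mismatch concrete, here is a minimal Python sketch of
the situation described above: a file is unlinked while a descriptor to
it is still open. The function name unlink_while_open is made up for
illustration, and the behavior shown assumes a POSIX-style filesystem
(Linux, AIX and similar).

```python
import os
import tempfile


def unlink_while_open(size=1 << 20):
    """Delete a file while it is still open and report what remains.

    Hypothetical helper for illustration only; assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.unlink(path)                # the name is gone, so du cannot see it
        st = os.fstat(f.fileno())      # the inode is still reachable via the fd
        result = {
            "name_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "size": st.st_size,
        }
    # Leaving the with-block closes the descriptor; only then can the
    # filesystem release the blocks that df was still counting as used.
    return result


if __name__ == "__main__":
    print(unlink_while_open())
```

The name disappears immediately, but the data stays allocated for as
long as the descriptor is open, which is why du and df can disagree.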
I would also suspect something in /var, probably a log file or similar.
Regardless, I don't think several gigabytes have gone missing due to
thousands of small files; it's probably just a handful of large files
causing the trouble. I'd start by sifting through the du list for
largish files that get rotated (log files, queues, or similar), and make
sure the rotation rules are properly implemented for each program.
Especially watch for programs that write directly to their own log file,
and will therefore keep a file descriptor open for that file even after
it is moved (and possibly after it is removed) by the log rotation
scripts. It is important to restart these programs (or send them a HUP,
or otherwise signal them, as appropriate) when you move their log file.
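If you have no idea which process is holding a removed log file open,
something along these lines may help track it down. This is only a
sketch, and it is Linux-specific: it relies on /proc/<pid>/fd symlinks
and on the kernel appending " (deleted)" to the link target of an
unlinked file. The function name find_deleted_open_files is invented
for this example, and you need to run it as root to see other users'
processes.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """List (pid, fd, target, size) for open descriptors to deleted files.

    Hypothetical helper; assumes a Linux-style /proc filesystem.
    """
    hits = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue                   # skip non-process entries
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue                   # process exited, or permission denied
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                size = os.stat(link).st_size   # follows the link to the inode
            except OSError:
                continue               # descriptor closed while we were looking
            hits.append((int(pid), int(fd), target, size))
    # Largest first, since a handful of big files is the likely culprit.
    return sorted(hits, key=lambda h: h[3], reverse=True)


if __name__ == "__main__":
    for pid, fd, target, size in find_deleted_open_files():
        print(pid, fd, size, target)
```

Any process that shows up with a large deleted file is a candidate for
a restart or a HUP, whichever that program expects.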
--
Charles Steinkuehler
charles at steinkuehler.net