Why du and df show different results, and why df -h still shows the same usage after a file is removed (Linux/Unix)

SHORT ANSWER

There are four reasons why du and df can report different usage:

1. Inconsistent filesystem requiring fsck(1m).
2. A process holds open a file that no longer exists in the filesystem's directory structure.
3. Mount point directory contains data.
4. du is run as non-root and some directories restrict read permission.

LONG ANSWER


Before going into detail for the 4 possibilities, it is important to
recognize how du and df obtain their answers:

. du “walks” the filesystem (as the “find” command would),
checking the size of each file in turn and keeping a running total.

. df makes a system call to the filesystem itself (statvfs(2) or a
similar call) and requests a number of details, one of which is the
current disk space used. The filesystem answers from its own
accounting (the superblock), not from a walk of the directories.
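
As a rough illustration of the two approaches, the sketch below asks
both tools about the same directory. It assumes a du that accepts -x
(stay on one filesystem, as GNU du does) and a df that accepts -P
(POSIX single-line output). The function name compare_du_df and the
default of /tmp are made up for this example.

```shell
# Sketch only: compare what du and df report for one directory.
# compare_du_df is a hypothetical helper name, not a standard tool.
compare_du_df() {
    dir=$1
    # du walks the tree below $dir and adds up what it can see (in KB).
    du_kb=$(du -skx "$dir" 2>/dev/null | awk '{print $1}')
    # df asks the filesystem holding $dir for its own "used" figure (in KB).
    df_kb=$(df -Pk "$dir" | awk 'NR==2 {print $3}')
    printf 'du total below %s: %s KB\n' "$dir" "$du_kb"
    printf 'df used on filesystem holding %s: %s KB\n' "$dir" "$df_kb"
}

compare_du_df "${1:-/tmp}"
```

The two figures are only comparable when the directory is the mount
point of its filesystem. Otherwise df reports on the whole filesystem
and du on just one subtree.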

1. Inconsistent filesystem requiring fsck(1m).

If the filesystem becomes corrupt/inconsistent for some reason, it
is quite likely that du and df will differ. What can be seen by a
process looking at the filesystem (i.e. du), does not match up with
the view the filesystem itself has (i.e. what will be returned to
the querying df process).

Corrupt/inconsistent filesystems should be repaired using fsck(1m).

2. Process with open file which does not exist in the filesystem directory
structure.

This scenario commonly occurs when some process keeps writing to a file
(usually a logfile) and a sysadmin deletes the file in panic to prevent
the filesystem from filling up. But the offending process keeps running
and the space is not freed (the process keeps the file open).

The disk blocks associated with a file are freed and made available
for reuse only when the last “reference” to the file is removed.
Directory entries count as references, and so do open files: when a
Unix process opens a file, the reference count of that file is
incremented. If the file's name is then removed from the filesystem,
the data blocks remain in use until the process closes the file,
either explicitly with close(2), or implicitly when the process dies.

Under these conditions, du will be unable to “see” the file in the
filesystem (it was rm’d from the dir. structure), and therefore will
not count its size, but df (in getting the answer from the filesystem
itself) “knows” the file still exists.

When the process closes the file (explicitly, or implicitly when the
process either quits or is killed, or the machine is rebooted), the
disk blocks return to the free list and du and df agree again.
Unmounting and remounting the filesystem is not needed for this, and
is not possible anyway while some process still has a file open on
the filesystem (device busy).

3. Directory mount point containing data.

As filesystems are mounted on top of directories, if a directory
mount point contains data, the du process will be unable to see this
data (seeing only the mounted filesystem), but the underlying
filesystem will still keep track of this data, consequently df will
report the extra disk space in use.

Unmounting the filesystem will reveal the data. However, if the mounted
filesystem is being used by running processes it will not be possible
to unmount it. Either identify and kill the processes (fuser(1m), etc.),
or reboot (possibly in single user mode) to check the mount point directory.

4. du is run as non-root and some directories restrict read permission.

du cannot report on files in a directory that the user has no
permission to read, so its total comes out lower than the figure df
gives. Running du as root eliminates the problem. On Solaris you can
also test for it by running du with the -r option, which generates
messages about directories that cannot be read, files that cannot be
opened and so forth, rather than being silent (the default).

Explanation and demonstration of the missing space plus procedures to track it down and resolve it

Although the file was technically removed from the filesystem, the
disk space will not be freed if other process(es) have the same file
open. As soon as there are no active processes which have this file
open, the disk space will be freed.

To illustrate:

1. Find a filesystem with a good amount of free space.

Run “df -k” and record the “used”, “avail” and “capacity”.

2. Use mkfile(1m) to create a large file in that filesystem.

3. Run another “df -k” and record “used”, “avail” and “capacity”.

The “used” should have increased, and the “avail” decreased.

4. From a second window, open the file you created. An easy way
to do this is with mdb(1).

5. From the original window, use “rm” to delete the file. Use
“ls” to confirm the file is gone.

6. Run another “df -k” and you’ll see the disk space is not freed.

7. From the second window, issue ^D to exit from mdb (thus
closing the file).

8. Run another “df -k” and you should see the disk space was freed.

NOTE: Before removing the file, you can use the fuser(1m) command
to identify processes that have a specified file open.
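
The steps above can be approximated in a single shell session. This is
a sketch under a few assumptions: dd stands in for mkfile(1m), a shell
file descriptor (fd 3) stands in for the second window running mdb,
and mktemp -d is available to make a scratch directory. The file is
only 1 MiB, so a larger count makes the change in “used” easier to
spot, and some filesystems need a moment before df reflects it.

```shell
# Sketch of the demonstration using a shell file descriptor as the "other process".
demo_dir=$(mktemp -d)
demo_file="$demo_dir/big.dat"

df -k "$demo_dir"                                  # step 1: baseline
dd if=/dev/zero of="$demo_file" bs=1024 count=1024 2>/dev/null  # step 2: 1 MiB file
df -k "$demo_dir"                                  # step 3: "used" has grown

exec 3<"$demo_file"                                # step 4: hold the file open on fd 3
rm "$demo_file"                                    # step 5: remove the name
if [ -e "$demo_file" ]; then name_state=present; else name_state=gone; fi

df -k "$demo_dir"                                  # step 6: space is not freed yet
held_bytes=$(wc -c <&3 | tr -d ' ')                # the data is still readable via fd 3
printf 'name is %s, fd 3 still holds %s bytes\n' "$name_state" "$held_bytes"

exec 3<&-                                          # step 7: close the file
df -k "$demo_dir"                                  # step 8: space is released
rmdir "$demo_dir"
```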

A common scenario is one in which a process is writing to a log file
and it keeps filling up the filesystem. Somebody tries to stop this
and just deletes the file, which is still held open by the running
process. At this point the space is freed only when the process
closes the file. That can be forced by killing the process (if you
know which process it is) or, as a last resort, by rebooting.

Alternately, instead of deleting the log file, you can “cat /dev/null
> /path/to/logfile”, which empties the file without deleting it. But
the best way is generally to stop the offending process, delete the
file, then restart the process.
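
A small sketch of the truncate-in-place approach follows. The
temporary file stands in for /path/to/logfile, and fd 4 plays the part
of the logging process. Both are stand-ins chosen for this example. It
assumes the writer opened the log in append mode, which is what lets
it carry on cleanly from the new end of the file.

```shell
# Sketch: empty a log file that a "writer" still has open, without deleting it.
log=$(mktemp)                    # stand-in for /path/to/logfile
exec 4>>"$log"                   # the writer keeps the log open in append mode
printf 'old entry\n' >&4

: > "$log"                       # truncate in place (same effect as cat /dev/null > file)
size_after_truncate=$(wc -c < "$log" | tr -d ' ')

printf 'new entry\n' >&4         # the writer continues on the same descriptor
size_after_write=$(wc -c < "$log" | tr -d ' ')

printf 'after truncate: %s bytes, after next write: %s bytes\n' \
    "$size_after_truncate" "$size_after_write"

exec 4>&-                        # clean up
rm -f "$log"
```

If the writer did not open the file in append mode, it keeps writing
at its old offset after the truncation, and the file becomes sparse
with a large apparent size.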

If you have already deleted the file and then discover the disk space
has not been released, you can use one of the following procedures to
attempt to locate the process that is holding on to the file:

Using /proc
-=-=-=-=-=-

The /proc filesystem gives access to all open files of all
processes. A user with sufficient privileges can find all open files
with a link count of zero, e.g.:

# find /proc/*/fd -type f -links 0 \! -size 0 -ls

You can view such a file (a log file, for example) with a command like:

# tail -f /proc//fd/

If you find that there is a process that you can’t kill that has a
huge file, you can truncate it with:

# cp /dev/null /proc//fd/

Be sure you have the right one.
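
On Linux the entries under /proc/PID/fd are symbolic links, and the
link target of a removed file ends in “ (deleted)”. The sketch below
relies on that Linux behaviour and on readlink being installed. The
function name list_deleted_open_files is made up for this example. Run
it as root to see the descriptors of other users' processes.

```shell
# Sketch (Linux /proc layout assumed): list descriptors that point at removed files.
# list_deleted_open_files is a hypothetical helper name.
list_deleted_open_files() {
    for link in /proc/[0-9]*/fd/*; do
        # Skip descriptors we may not read and processes that have exited.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}

list_deleted_open_files
```

Each output line names the /proc path of the descriptor, so the
process ID and descriptor number can be read straight from it.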

Using other tools
-=-=-=-=-=-=-=-=-

1. Identify the file system and mount point that contains the file

# df /dirname

where /dirname is the name of the directory that contained the
original file.

The first field of the output will be the name of the mount point
(like / or /var).

2. Identify processes that have files open in this file system

# fuser -c /mountpoint

Refer to fuser(1M) for a full description of the output from this
command. In general, you will see a list of process IDs followed by
letter codes which indicate how the process is using files and
directories within the file system.

3. Narrow down the process list

This step is not an exact science. The list of processes obtained
from fuser may be quite long, so you need to try to focus on the
most likely suspects. For each process ID (PID) in your list, use
ptree(1) to determine what the process is doing.

# /usr/proc/bin/ptree PID

Another option is to use pcred(1) to identify the user and group
credentials of each process, to look for users that would be likely
to use the affected file.

# /usr/proc/bin/pcred PID

The output from these two commands will often help you to narrow
down the list of suspects to one or a few processes.

4. Identify the correct process

For each process in your short list, use pfiles(1) to identify all
open files.

# /usr/proc/bin/pfiles PID

The output from pfiles(1) will include a number of details about
each open file. Fields that can be very helpful in identifying the
file include: dev (device major and minor numbers), ino (inode
number), uid (user ID), gid (group ID), and size (file size in
bytes).

Note that the dev field for a regular file refers to the major and
minor device numbers of the file system in which the file
resides. You can generally identify the major and minor numbers of
your target file system by running:

# ls -lL /dev/block_device

where /dev/block_device is obtained from the second field in the
output from the “df /dirname” command executed in step one.

The major and minor device numbers will be listed in the fifth and
sixth fields of the output of the ls command (the location occupied
by the file size for a regular file).

5. Terminate the process

If you are satisfied that you have identified the process that is
using the file you tried to delete, you may decide to terminate the
process to attempt to release the file and its associated
space. Keep in mind that there might be more than one process that
has the same file open. You should use caution when evaluating the
best method and overall feasibility of terminating a single process
or service on your system.

If you absolutely cannot identify the process that is holding the file
open, or cannot safely terminate the process or service with the
system running, you will need to schedule a reboot of the system to
terminate all active processes.
