File Descriptor : a simple explanation

What is a File Descriptor? Really, how does it work?

Without a clear, overall understanding of the Linux system, this question is hard to answer. Worse, when you search for “File Descriptor” on the net, almost everything you find is either a page about redirections (as if the idea of a File Descriptor could be reduced to standard input, output and error) or complicated stuff in C (you can feel my frustration here).

File Descriptor : what it is
A File Descriptor (FD) is a simple way for a process to handle a file. It’s a small non-negative number given by the kernel. Instead of referring to the file directly (by its absolute path, for example), the process refers to it by its FD number.

For example, in my /var/log directory, I’ve got a few files :

bart@Host:~$ sudo ls /var/log

Now I want to open the file messages with tail :

bart@Host:~$ sudo tail -f /var/log/messages
(various things showing up)

Note : I’m using the -f option, so that tail keeps on accessing the file.

tail doesn’t open the file itself by going directly to the hard drive, looking up the inode number and so on. That’s not its job. It’s the job of the kernel. So tail just asks the kernel to open the file. The kernel opens the file and tells tail : “from now on, to access that file, use the FD number 3”.

You can see that information using the lsof command :

bart@Host:~$ sudo lsof | grep 'tail '

tail      16724     root    3r      REG      251,0   310834     152212 /var/log/messages

As you can see in the 4th column, tail handles the file with the File Descriptor number 3.

Note : the kernel gives out an FD only when a process opens a file. A file that no process has open has no FD number.
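
To make this concrete, here is a minimal Python sketch of the same conversation between a process and the kernel. It is only an illustration under my own assumptions : the temporary file stands in for /var/log/messages, and the variable names are mine.

```python
import os
import tempfile

# Create a throwaway file to play the role of /var/log/messages.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"first line\n")
    path = tmp.name

# Ask the kernel to open the file; the answer is just a small integer.
fd = os.open(path, os.O_RDONLY)
print("the kernel gave us FD", fd)

# From now on the process uses the number, not the path.
data = os.read(fd, 100)
print(data)

# Closing the FD hands the number back to the kernel.
os.close(fd)
os.remove(path)
```

Run from a plain terminal, the number printed is usually 3, but that depends on what the process already has open.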

What is the letter r following the FD number?
According to the OUTPUT section of the lsof manpage, it is the “mode under which the file is open:

  • r for read access;
  • w for write access;
  • u for read and write access;
  • space if mode unknown and no lock character follows;
  • `-' if mode unknown and lock character follows.

See the LOCKS section for more information on  the  lock information character.”

So it’s basically the way the file is accessed (even though, as the manpage states, there are some more complex ways to access a file).

In our example above, tail doesn’t need write access to the file. Some programs need read/write access (u), like vi, while others need only write access (w), like syslog-ng :

syslog-ng   814     root   58w      REG      251,0   310834     152212 /var/log/messages
syslog-ng   814     root   67w      REG      251,0        0     149778 /var/log/debug
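
The three letters correspond to the three access modes a process can ask for when it opens a file. The sketch below opens a temporary file in each mode and asks the kernel which mode the FD has. The helper name lsof_mode and the letter table are my own, and the code assumes a UNIX-like system where Python’s fcntl module is available.

```python
import fcntl
import os
import tempfile

# Map the kernel's access mode to the letter lsof would print.
LSOF_LETTER = {os.O_RDONLY: "r", os.O_WRONLY: "w", os.O_RDWR: "u"}

def lsof_mode(fd):
    """Return the lsof-style mode letter for an open FD (hypothetical helper)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return LSOF_LETTER[flags & os.O_ACCMODE]

# A throwaway file to open in each mode.
tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

modes = {}
for name, flag in [("read", os.O_RDONLY), ("write", os.O_WRONLY), ("both", os.O_RDWR)]:
    fd = os.open(path, flag)
    modes[name] = lsof_mode(fd)
    os.close(fd)

os.remove(path)
print(modes)
```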

Why does the FD number start at 3?
As you can see in the previous example with tail, the FD number given to the file is 3, even though this file is the first (and only) file accessed by tail. How come?

Simply because the first three numbers are given to the famous standard streams :

  • standard input : FD 0
  • standard output : FD 1
  • standard error : FD 2

The next FD number is then 3 because every application starts with FD 0 to 2 already open, and the kernel always hands out the lowest number that is free. You can see that in the output of a simple lsof. This means that any application is able to read from the standard input and write to the standard output and error.
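
Here is a small sketch of how the numbers are handed out, assuming a UNIX-like system. It opens /dev/null a few times (any file would do) and shows that a number that has been closed is given out again on the next open.

```python
import os

# Open the same file twice: each open gets its own FD number.
first = os.open(os.devnull, os.O_RDONLY)
second = os.open(os.devnull, os.O_RDONLY)
print("first:", first, "second:", second)

# Free the first number, then open again: the kernel hands out
# the lowest number that is not in use, so the freed one comes back.
os.close(first)
reused = os.open(os.devnull, os.O_RDONLY)
print("reused:", reused)

os.close(second)
os.close(reused)
```

In a script started from a terminal, first is typically 3, because 0, 1 and 2 are already taken by the standard streams.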

You can find the FDs created for an application under the /proc directory, in the fd subdirectory of the application’s PID (/proc/PID/fd). Note that each FD appears there as a symbolic link pointing to the actual file (or /dev node).

For the tail example seen above, with PID 16724 :

bart@Host:~$ sudo ls -l /proc/16724/fd
total 0
lrwx------ 1 root root 64 2011-11-03 16:44 0 -> /dev/pts/15
lrwx------ 1 root root 64 2011-11-03 16:44 1 -> /dev/pts/15
lrwx------ 1 root root 64 2011-11-03 16:44 2 -> /dev/pts/15
lr-x------ 1 root root 64 2011-11-03 16:44 3 -> /var/log/messages
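
A process can look at its own table the same way, through /proc/self/fd. The sketch below is Linux-only (it relies on /proc), and the helper name list_fds is my own.

```python
import os

def list_fds(pid="self"):
    """Return {fd_number: target} read from /proc/<pid>/fd (hypothetical helper)."""
    fd_dir = f"/proc/{pid}/fd"
    table = {}
    for name in os.listdir(fd_dir):
        try:
            table[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The FD was closed between listdir() and readlink(),
            # for example the one listdir() itself used.
            continue
    return table

opened = os.open(os.devnull, os.O_RDONLY)
table = list_fds()
for number in sorted(table):
    print(number, "->", table[number])
os.close(opened)
```

The entry for opened points to /dev/null, just like FD 3 points to /var/log/messages in the tail example.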

Is the FD number local or global?
The FD number is local to the process handling the file. Take the examples above, with the file /var/log/messages accessed at the same time by tail and syslog-ng. The same file is FD 3 for tail and FD 58 for syslog-ng.
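
The following sketch tries to reproduce that situation with two Python processes opening the same temporary file. It is an illustration only : the padding opens are just there to push the parent’s number up, the way syslog-ng’s many log files do, and the exact numbers printed depend on the environment.

```python
import os
import subprocess
import sys
import tempfile

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

# Parent: open a few other files first, so its number for `path` moves up.
padding = [os.open(os.devnull, os.O_RDONLY) for _ in range(5)]
parent_fd = os.open(path, os.O_RDONLY)

# Child: a separate process opens the very same file and reports its number.
child_code = "import os, sys; print(os.open(sys.argv[1], os.O_RDONLY))"
result = subprocess.run(
    [sys.executable, "-c", child_code, path],
    capture_output=True, text=True, check=True,
)
child_fd = int(result.stdout)

print("parent uses FD", parent_fd, "and child uses FD", child_fd)

# Clean up everything the parent opened.
for fd in padding + [parent_fd]:
    os.close(fd)
os.remove(path)
```

Each process has its own table, so the two numbers have nothing to do with each other, even though they refer to the same file.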

Is it possible to handle many files at once?
Of course, File Descriptors allow applications to open many files at once. While it’s hard to see that with an application like vi, it’s rather obvious with syslog-ng :

syslog-ng   814     root   58w      REG      251,0   315269     152212 /var/log/messages
syslog-ng   814     root   59w      REG      251,0   305043     136237 /var/log/syslog
syslog-ng   814     root   60w      REG      251,0  8526940     131097 /var/log/pureftp.log
syslog-ng   814     root   61w      REG      251,0  3151296     145663 /var/log/daemon.log

In general, an application can open up to 1024 files at once. This is the usual default per-process limit on Linux, and it can be tuned (with ulimit -n, for example).
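
A process can ask the kernel what its own limit is. This short sketch uses Python’s resource module, which is available on UNIX-like systems. The values printed depend on how the system is configured.

```python
import resource

# RLIMIT_NOFILE is the limit on the number of open FDs for this process.
# The soft limit is the one enforced; it can be raised up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

print("soft limit :", soft)
print("hard limit :", hard)
```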

What are other types of FD?
We know that under Linux/UNIX, “everything is a file”. This means that not only actual files are given an FD number : other objects seen as files by the system, such as directories, devices, pipes and sockets, get one too when a process opens them.
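
As a quick illustration, the sketch below creates a pipe and a pair of connected sockets, and shows that each of them is handled through ordinary FD numbers. It assumes a UNIX-like system, and the variable names are mine.

```python
import os
import socket

# A pipe is two FDs: one end to read from, one end to write to.
read_end, write_end = os.pipe()

# A socket is also backed by an FD; fileno() returns its number.
sock_a, sock_b = socket.socketpair()
socket_fds = (sock_a.fileno(), sock_b.fileno())

print("pipe FDs :", read_end, write_end)
print("socket FDs :", socket_fds)

# The pipe is used with the same calls as a regular file.
os.write(write_end, b"hello")
message = os.read(read_end, 5)
print(message)

os.close(read_end)
os.close(write_end)
sock_a.close()
sock_b.close()
```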

In the same FD column, lsof also shows a few entries that are not numbers. They are files that the kernel keeps track of for each process without giving them an FD number. They are described in the manpage of lsof. Among the most interesting :

cwd  current working directory;
mem  memory-mapped file;
pd   parent directory;
rtd  root directory;
txt  program text (code and data);

For example, when an application uses a relative path, the kernel resolves it from the directory listed as cwd.

This is particularly interesting in the case of chrooted applications. The chroot environment is set up by the kernel, which knows the actual path in which the application is jailed. When the application goes to the root directory (normally /), the kernel takes it to the directory listed as rtd (root directory).

In the following example, if mysqld wants to go to the root directory, it ends up in the directory listed as rtd, which is actually /server/mysql :

mysqld 18012 mysql rtd DIR 3,10 4096 868353 /server/mysql
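
On Linux, the kernel exposes the same two pieces of information under /proc, as the symbolic links /proc/PID/cwd and /proc/PID/root. The sketch below reads them for the current process. It is Linux-only, and the variable names are mine.

```python
import os

# /proc/self is a shortcut for /proc/<PID of the current process>.
cwd_link = os.readlink("/proc/self/cwd")    # what lsof lists as cwd
root_link = os.readlink("/proc/self/root")  # what lsof lists as rtd

print("cwd ->", cwd_link)
print("rtd ->", root_link)
```

For a process that is not chrooted, the second link simply points to /.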
