I followed the implementation of the du command (first half)

Introduction

du is a command to display the directory capacity. If you use it without options, it will display all the directory capacities under the current directory. This time, I will follow the implementation to see how the function is realized.

Expected

As a behavior, it would be nice if there was a function to trace the directory and add up the file size in it. For that, the following functions are likely to be required.

  1. Follow directories and files in order
  2. Get the file size
  3. Sum the file size in the directory

Let's look at the implementation, focusing on how these features are realized.

Follow the implementation

This time we'll look at the du.c source code included in GNU coreutils. The main function calls du_files ().

python


/* Recursively print the sizes of the directories (and, if selected, files)
   named in FILES, the last entry of which is NULL.
   BIT_FLAGS controls how fts works.
   Return true if successful.  */

static bool
du_files (char **files, int bit_flags)
{
  bool ok = true;

  if (*files)
    {
      FTS *fts = xfts_open (files, bit_flags, NULL); //(1)

      while (1)
        {
          FTSENT *ent;

          ent = fts_read (fts); //(2)
          if (ent == NULL)
            {
              if (errno != 0)
                {
                  error (0, errno, _("fts_read failed"));
                  ok = false;
                }
              break;
            }
          FTS_CROSS_CHECK (fts);
          ok &= process_file (fts, ent); //(3)
        }

      if (fts_close (fts) != 0)
        {
          error (0, errno, _("fts_close failed"));
          ok = false;
        }
    }

  return ok;
}

The processing performed here is the following three.

(1). Get the FTS structure from the file name with xfts_open (). (2). Get the FTSENT structure from the FTS structure with fts_read (). (3). Call process_file () and return to (2).

xfts_open() Xfts_open () in (1) is a wrapper for fts_open () and internally fts_open () is called. The argument passed to xfts_open () is the filename. This file name is specified by the command line argument, and if there is no argument, the current directory is passed. The FTS structure obtained by xfts_open () is a handle of the file hierarchy. The field has an FTSENT structure that represents information about the file, with the current position and child FTSENT structures. Each time you call fts_read () in (2), they change and follow the file hierarchy (see below).

fts_read() In (2) fts_read (), the FTSENT structure at the current position is obtained from the FTS structure. The FTSENT structure has the following fields (fts).

python


typedef struct _ftsent {
    unsigned short fts_info;     /*Flags for FTSENT structures*/
    char          *fts_accpath;  /*Access path*/
    char          *fts_path;     /*Root path*/
    short          fts_pathlen;  /* fts_path length*/
    char          *fts_name;     /*file name*/
    short          fts_namelen;  /* fts_name length*/
    short          fts_level;    /*depth(-1 〜 N) */
    int            fts_errno;    /*File error number*/
    long           fts_number;   /*Local number*/
    void          *fts_pointer;  /*Local address number*/
    struct ftsent *fts_parent;   /*Parent directory*/
    struct ftsent *fts_link;     /*The following file structure*/
    struct ftsent *fts_cycle;    /*Circulating structure*/
    struct stat   *fts_statp;    /* stat(2)Information*/
} FTSENT;

The file size is on fts_stap (stat).

python


struct stat {
    dev_t     st_dev;     /*ID of the device where the file is located*/
    ino_t     st_ino;     /*inode number*/
    mode_t    st_mode;    /*Access protection*/
    nlink_t   st_nlink;   /*Number of hard links*/
    uid_t     st_uid;     /*Owner's user ID*/
    gid_t     st_gid;     /*Owner's group ID*/
    dev_t     st_rdev;    /*Device ID(For special files) */
    off_t     st_size;    /*Overall size(Byte unit) */
    blksize_t st_blksize; /*File system I/In O
Block size*/
    blkcnt_t  st_blocks;  /*Number of 512B blocks allocated*/
};

Therefore, you can get the file size by following the stat structure from the FTSENT structure.

Now, the FTSENT structure was obtained by fts_read () in (2). Since fts_read () automatically follows the file hierarchy, it seems that the du command can be implemented by repeatedly callingfts_read ()and totaling the file size. And the process of totaling the file size is done by process_file () in (3).

Follow the file

Now, before going into the explanation of (3), let's see the operation of tracing the file hierarchy by fts_read (). Check the operation with the following code (Read FreeBSD's ls Let's make ls with fts (3) ~).

python


#include <stdio.h>
#include <stdlib.h>
#include <fts.h>

int main (int argc, char *argv[])
{
        FTS *ftsp;
        FTSENT *p;
        static char dot[] = ".";
        static char *dotav[] = {dot, NULL};

        if (argc == 1)
                argv = dotav;
        else
                argv++;
        ftsp = fts_open(argv, 0, NULL);
        while((p = fts_read(ftsp)) != NULL) {
                printf("%s\n", p->fts_path);
        }
        fts_close(ftsp);

        exit(EXIT_SUCCESS);
}

This code is an implementation of the ls command using the fts function. All files and directories under the directory are displayed. The fts_read () in the while statement follows the file hierarchy, and the p-> fts_path displays the file name. The execution result is as follows.

$ ./ls 
.
./dir1
./dir1/file3
./dir1/file4
./dir1/dir2
./dir1/dir2/file5
./dir1/dir2
./dir1
./ls.c
./ls
./file2
./file1
.

The display order is as follows. dir.jpeg

You can see that it traverses depth-first, files are traversed once and directories are traversed twice.

Continued

Next time will see the process of process_file () in (3).

Recommended Posts

I followed the implementation of the du command (first half)
I followed the implementation of the du command (second half)
I read the implementation of golang channel
I read the implementation of range (Objects / rangeobject.c)
[Linux] I tried to summarize the command of resource confirmation system
First Python 3 ~ The beginning of repetition ~
I investigated the mechanism of flask-login!
[Python] I thoroughly explained the theory and implementation of logistic regression
Can I pass the first grade of math test by programming?
[Python] I thoroughly explained the theory and implementation of decision trees
Get the first element of queryset
I made AI think about the lyrics of Kenshi Yonezu (implementation)
I tried to summarize the frequently used implementation method of pytest-mock
I used the worldcup command to check the outcome of the World Cup.
I want to leave an arbitrary command in the command history of Shell
I checked the contents of docker volume
I tried the asynchronous server of Django 3.0
I tried to summarize the umask command
I checked the options of copyMakeBorder of OpenCV
Othello-From the tic-tac-toe of "Implementation Deep Learning" (3)
I summarized the folder structure of Flask
I didn't know the basics of Python
Migemo version of the: find command,: mfind
The Python project template I think of.
Read the implementation of ARM global timer
Notice the completion of a time-consuming command
AtCoder Review of past questions (first half of 12 / 8,9)
Othello-From the tic-tac-toe of "Implementation Deep Learning" (2)
I tried to understand the learning function of neural networks carefully without using a machine learning library (first half).
I made an appdo command to execute a command in the context of the app
[Python] I thoroughly explained the theory and implementation of support vector machine (SVM)
I tried using scrapy for the first time
I tried the pivot table function of pandas
I tried cluster analysis of the weather map
I solved the deepest problem of Hiroshi Yuki.
Why the Python implementation of ISUCON 5 used Bottle
Before the coronavirus, I first tried SARS analysis
I checked the list of shortcut keys of Jupyter
I tried to touch the API of ebay
I tried python programming for the first time.
I tried to correct the keystone of the image
Try the free version of Progate [Python I]
I checked the session retention period of django
The story of misreading the swap line of the top command
I checked the processing speed of numpy one-dimensionalization
I tried Mind Meld for the first time
I touched some of the new features of Python 3.8 ①
I read and implemented the Variants of UKR
I want to customize the appearance of zabbix
I tried using the image filter of OpenCV
I tried to predict the price of ETF
I tried to vectorize the lyrics of Hinatazaka46!
The story of Linux that I want to teach myself half a year ago
Second half of the first day of studying Python Try hitting the Twitter API with Bottle
[First data science ⑥] I tried to visualize the market price of restaurants in Tokyo
I took a look at the contents of sklearn (scikit-learn) (1) ~ What about the implementation of CountVectorizer? ~
I ran the TensorFlow tutorial with comments (first neural network: the beginning of the classification problem)
I want to use Linux commands at the command prompt! Use Linux commands at the command prompt instead of Git Bash