A quick overview of the Linux kernel

I didn't know anything about the Linux kernel, so I'd like to summarize what I learned, using the book Complete Understanding Linux Kernel Super Introduction as a reference.

What is the Linux kernel?

The Linux kernel is the core software of the operating system (OS).

- It connects applications to the computer hardware.
- A computer has many hardware resources that applications need in order to run, such as the HDD, CPU, and memory. If these resources were not managed centrally, applications would conflict over them, which is why a kernel is required.

Environment

For hands-on work, I recommend AWS Cloud9 [^1], which lets you easily create a server and use its terminal.

The kernel binary

- uname -r displays the version of the running kernel.
- vmlinuz-x.xx.x-xxxx-aws in the /boot directory is the kernel binary itself (a compressed kernel image).
- The file is only about 8.5 MB.

$ uname -r
5.3.0-1033-aws
$ ls -lh /boot | grep vml
-rw------- 1 root root 7.6M Aug 27  2018 vmlinuz-4.15.0-1021-aws
-rw------- 1 root root 8.5M Aug  5 14:10 vmlinuz-5.3.0-1033-aws
-rw------- 1 root root 8.5M Sep  5 16:49 vmlinuz-5.3.0-1035-aws
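The same check can be scripted. This is a minimal Python sketch (not from the book), assuming the images follow the vmlinuz-<release> naming shown above; the function names are hypothetical. os.uname().release returns the same string as uname -r.

```python
import os
from pathlib import Path


def running_kernel_release():
    """Return the running kernel's release string (the same value `uname -r` prints)."""
    return os.uname().release


def kernel_images(boot_dir="/boot"):
    """Map each vmlinuz-* file in boot_dir to its size in bytes."""
    images = {}
    for path in sorted(Path(boot_dir).glob("vmlinuz-*")):
        # stat() needs only search permission on the directory, so the
        # root-only -rw------- mode of the images does not get in the way
        images[path.name] = path.stat().st_size
    return images


def image_for_release(release, boot_dir="/boot"):
    """Return the vmlinuz-<release> path if it exists, else None (assumes that naming scheme)."""
    candidate = Path(boot_dir) / f"vmlinuz-{release}"
    return candidate if candidate.exists() else None
```

On the instance above, image_for_release(running_kernel_release()) would be expected to point at /boot/vmlinuz-5.3.0-1033-aws, while kernel_images() lists all three images, including the ones for kernels that are not running.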

The Linux kernel keeps non-essential functions in separate object files so that they can be loaded only when needed.

If every function were built into the kernel itself, all of them would have to be compiled and statically linked up front. Many of them would then stay resident in memory even when unused, and adding a new feature would require rebuilding the entire kernel and rebooting.

An object file containing code that extends the running kernel in this way is called a loadable kernel module (LKM).


Loadable kernel modules are located under /lib/modules/<kernel version>/kernel/.

$ ls -F /lib/modules/5.3.0-1033-aws/kernel/
arch/  crypto/  drivers/  fs/  lib/  net/  virt/  wireguard/  zfs/
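To see which module files exist for the running kernel, the directory tree can be walked. A minimal sketch, assuming the /lib/modules/<release>/kernel layout above; depending on the distribution, module files may be plain .ko files or compressed (.ko.gz, .ko.xz, .ko.zst), so all of these suffixes are matched. The function name is hypothetical.

```python
import os
from pathlib import Path

# Suffixes module files commonly use; the compressed variants depend on the distribution.
MODULE_SUFFIXES = (".ko", ".ko.gz", ".ko.xz", ".ko.zst")


def module_files(release=None, root="/lib/modules"):
    """Return {module name: path relative to <root>/<release>/kernel} for every module file."""
    release = release or os.uname().release
    base = Path(root) / release / "kernel"
    found = {}
    for path in base.rglob("*"):
        if not path.is_file():
            continue
        for suffix in MODULE_SUFFIXES:
            if path.name.endswith(suffix):
                # lsmod reports names with '-' replaced by '_'
                name = path.name[: -len(suffix)].replace("-", "_")
                found[name] = str(path.relative_to(base))
                break
    return found
```

Comparing this dictionary with the loaded modules shows that only a small fraction of the available modules is resident at any time.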

Modules already loaded in the kernel can be listed using the lsmod command.

$ lsmod
Module                  Size  Used by
ufs                    81920  0
msdos                  20480  0
xfs                  1273856  0
xt_conntrack           16384  1
xt_MASQUERADE          20480  1
nf_conntrack_netlink    45056  0
nfnetlink              16384  2 nf_conntrack_netlink
xfrm_user              36864  1
xfrm_algo              16384  1 xfrm_user
<The following is omitted>
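lsmod formats the contents of /proc/modules. The sketch below parses lines in that file's layout (name, size, reference count, modules using it, state, address); it is an illustration rather than how lsmod itself is implemented, and the function name is hypothetical.

```python
def parse_proc_modules(text):
    """Parse /proc/modules-style text into a list of dicts."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, refcount, users = fields[:4]
        modules.append({
            "name": name,
            "size": int(size),
            # refcount can be "-" when the kernel does not track it
            "used": int(refcount) if refcount.isdigit() else None,
            # "-" means no other module uses this one; otherwise a comma-terminated list
            "used_by": [u for u in users.split(",") if u and u != "-"],
            "state": fields[4] if len(fields) > 4 else None,
        })
    return modules
```

On a live system, pass it the text of /proc/modules, e.g. parse_proc_modules(open("/proc/modules").read()); the name, size, used and used_by fields correspond to lsmod's Module, Size and Used by columns.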

Process management

View running processes

The ps command displays the running processes.

$ ps aux | head -n 10
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  0.7 225392  7804 ?        Ss   09:55   0:04 /sbin/init
root         2  0.0  0.0      0     0 ?        S    09:55   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   09:55   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   09:55   0:00 [rcu_par_gp]
root         6  0.0  0.0      0     0 ?        I<   09:55   0:00 [kworker/0:0H-kb]
root         9  0.0  0.0      0     0 ?        I<   09:55   0:00 [mm_percpu_wq]
root        10  0.0  0.0      0     0 ?        S    09:55   0:00 [ksoftirqd/0]
root        11  0.0  0.0      0     0 ?        I    09:55   0:01 [rcu_sched]
root        12  0.0  0.0      0     0 ?        S    09:55   0:00 [migration/0]
<The following is omitted>

Each line corresponds to one process, and you can see that many processes are running.
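ps gathers this information from the /proc file system, where each process has a directory named after its PID. A minimal sketch, assuming only the cmdline and comm files in those directories; the function name and the proc_root parameter are hypothetical. Kernel threads such as kthreadd have an empty cmdline, which is why ps shows them in square brackets.

```python
import os


def list_processes(proc_root="/proc"):
    """Return {pid: command} built from <proc_root>/<pid>/cmdline and comm."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # non-numeric entries (meminfo, sys, ...) are not processes
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                cmdline = f.read()
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue                     # the process exited or is not readable
        if cmdline:
            # arguments are NUL-separated, usually with a trailing NUL
            args = [a.decode(errors="replace") for a in cmdline.split(b"\0") if a]
            command = " ".join(args)
        else:
            command = f"[{comm}]"        # kernel thread: no command line, bracketed like ps
        procs[int(entry)] = command
    return dict(sorted(procs.items()))
```

Calling list_processes() with no arguments reads the real /proc; the proc_root parameter exists only so the function can be tried against a fake directory tree.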

Memory management

Linux gives each process its own virtual address space, separate from physical addresses. This isolates processes from one another and effectively increases the amount of memory available. The address space is divided into fixed-size units called pages, and physical memory is allocated page by page, only for the parts of the virtual address space that the process actually uses. This scheme is called paging (https://ja.wikipedia.org/wiki/%E3%83%9A%E3%83%BC%E3%82%B8%E3%83%B3%E3%82%B0%E6%96%B9%E5%BC%8F).
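The address arithmetic behind paging can be illustrated in a few lines. This is a toy model, not the kernel's page-table code: it assumes a single-level page table held in a Python dict (real x86-64 page tables have four or five levels), and the function names are hypothetical. The page size comes from os.sysconf and is typically 4096 bytes.

```python
import os
from typing import Dict, Tuple

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # typically 4096 bytes on x86-64


def split_address(vaddr: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Split a virtual address into (virtual page number, offset within the page)."""
    return vaddr // page_size, vaddr % page_size


def translate(vaddr: int, page_table: Dict[int, int], page_size: int = PAGE_SIZE) -> int:
    """Translate a virtual address with a toy one-level table {page number: frame number}."""
    vpn, offset = split_address(vaddr, page_size)
    if vpn not in page_table:
        # In a real system the CPU raises a page fault here, and the kernel
        # can allocate a physical page at this point, on first use.
        raise LookupError(f"page fault: virtual page {vpn:#x} is not mapped")
    return page_table[vpn] * page_size + offset


def pages_needed(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages needed to hold nbytes, since memory is handed out in page units."""
    return -(-nbytes // page_size)
```

With 4096-byte pages, translate(0x12345, {0x12: 0x7}, 4096) keeps the offset 0x345 and replaces page 0x12 with frame 0x7, giving 0x7345. Two processes can use the same virtual address while their page tables point to different frames, which is how the per-process separation works.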

Page reclaim

When a process reads or writes a file on disk, the kernel temporarily keeps the file's contents in a memory area called the page cache. Later accesses to the same file can then be served from memory instead of the disk, which speeds up processing. When free memory runs low, the kernel releases page cache pages (writing back any modified ones first); freeing memory by releasing the cache in this way is called page reclaim.
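The idea of a page cache and of reclaim can be modeled in a few lines. This is a deliberately simplified sketch with hypothetical names: it assumes a fixed capacity counted in pages and evicts the least recently used page, whereas the real kernel manages its own LRU lists and writes dirty pages back before dropping them.

```python
from collections import OrderedDict


class ToyPageCache:
    """A toy page cache keyed by (file name, page index), with LRU eviction as 'reclaim'."""

    def __init__(self, capacity_pages, read_from_disk):
        self.capacity = capacity_pages
        self.read_from_disk = read_from_disk      # callback: (file, index) -> bytes
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_page(self, file, index):
        key = (file, index)
        if key in self.pages:
            self.hits += 1
            self.pages.move_to_end(key)           # mark as most recently used
            return self.pages[key]
        self.misses += 1
        data = self.read_from_disk(file, index)   # slow path: go to the disk
        self.pages[key] = data
        self.reclaim()
        return data

    def reclaim(self, keep=None):
        """Drop least recently used pages until at most `keep` (default: capacity) remain."""
        limit = self.capacity if keep is None else keep
        while len(self.pages) > limit:
            self.pages.popitem(last=False)
```

Reading the same page twice hits the disk callback only once; when the cache is full, the page that was used longest ago is the one dropped.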

The buff/cache column of the free command shows the page cache usage (554 MB in the example below).

$ free -m
              total        used        free      shared  buff/cache   available
Mem:            978         321         102           2         554         490
Swap:           488          63         425

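To free the page cache, write 3 to /proc/sys/vm/drop_caches (1 drops only the page cache, 2 drops reclaimable slab objects such as dentries and inodes, and 3 drops both). drop_caches does not free modified (dirty) data, so run the sync command first to write it to disk.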

$ sync
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ free -m
              total        used        free      shared  buff/cache   available
Mem:            978         325         526           2         126         520
Swap:           488          63         425

You can see that buff/cache dropped from 554 MB to 126 MB, free in the Mem row rose from 102 MB to 526 MB, and the memory was released.
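free reads these figures from /proc/meminfo. As far as I know, recent procps-ng versions of free compute buff/cache as Buffers + Cached + SReclaimable; the sketch below assumes that formula and takes the meminfo text as a string so it can be tried on any sample. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])   # drop the trailing "kB" unit if present
    return info


def buff_cache_mib(info):
    """buff/cache as free -m is assumed to compute it: Buffers + Cached + SReclaimable, in MiB."""
    kb = info.get("Buffers", 0) + info.get("Cached", 0) + info.get("SReclaimable", 0)
    return kb // 1024
```

Running buff_cache_mib(parse_meminfo(open("/proc/meminfo").read())) before and after dropping the caches should show the same kind of decrease as the free output above.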

Device management

Programs exchange data with peripheral devices through device files. On Linux, devices are abstracted and handled as files [^2].

Device files are located under the /dev directory.

$ ls -F /dev
autofs           log@                port      tty13  tty31  tty5   ttyS1      vcsa6
block/           loop-control        ppp       tty14  tty32  tty50  ttyS2      vcsu
btrfs-control    loop0               psaux     tty15  tty33  tty51  ttyS3      vcsu1
char/            loop1               ptmx      tty16  tty34  tty52  ttyprintk  vcsu2
console          loop2               pts/      tty17  tty35  tty53  udmabuf    vcsu3
core@            loop3               random    tty18  tty36  tty54  uinput     vcsu4
cpu_dma_latency  loop4               rfkill    tty19  tty37  tty55  urandom    vcsu5
cuse             loop5               rtc@      tty2   tty38  tty56  vcs        vcsu6
disk/            loop6               rtc0      tty20  tty39  tty57  vcs1       vfio/
ecryptfs         loop7               shm/      tty21  tty4   tty58  vcs2       vga_arbiter
fd@              mapper/             snapshot  tty22  tty40  tty59  vcs3       vhost-net
full             mcelog              stderr@   tty23  tty41  tty6   vcs4       vhost-vsock
fuse             mem                 stdin@    tty24  tty42  tty60  vcs5       xen/
hpet             memory_bandwidth    stdout@   tty25  tty43  tty61  vcs6       xvda
hugepages/       mqueue/             tty       tty26  tty44  tty62  vcsa       xvda1
hwrng            net/                tty0      tty27  tty45  tty63  vcsa1      zero
initctl@         network_latency     tty1      tty28  tty46  tty7   vcsa2      zfs
input/           network_throughput  tty10     tty29  tty47  tty8   vcsa3
kmsg             null                tty11     tty3   tty48  tty9   vcsa4
lightnvm/        nvram               tty12     tty30  tty49  ttyS0  vcsa5

Block device

In the ls -l output for the device file xvda, the first character is b. This marks a block device: a device that exchanges data in fixed-size blocks and can be accessed at any address, such as a hard disk drive or a memory area.

$ ls -l /dev/xvda
brw-rw---- 1 root disk 202, 0 Sep 22 00:02 /dev/xvda

Block devices can also be listed with the lsblk command.

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0     7:0    0 12.7M  1 loop /snap/amazon-ssm-agent/495
loop1     7:1    0 96.6M  1 loop /snap/core/9804
loop2     7:2    0 28.1M  1 loop /snap/amazon-ssm-agent/2012
loop3     7:3    0 97.1M  1 loop /snap/core/9993
xvda    202:0    0   10G  0 disk 
└─xvda1 202:1    0   10G  0 part /

Character device

The device file tty, on the other hand, starts with c. This marks a character device: a device, such as a keyboard, with which the system exchanges data as a stream of bytes, one byte at a time.

$ ls -l /dev/tty
crw-rw-rw- 1 root tty 5, 0 Sep 22 00:02 /dev/tty
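The b and c in the ls -l output come from the file type stored in the device file's inode, and the two numbers (202, 0 and 5, 0) are the major and minor device numbers, which lsblk shows as MAJ:MIN. A minimal sketch using Python's standard os and stat modules; describe_device is a hypothetical helper name.

```python
import os
import stat


def describe_device(path):
    """Return (kind, major, minor) for a path; major and minor are None for non-device files."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"       # shown as 'b' by ls -l
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"        # shown as 'c' by ls -l
    elif stat.S_ISREG(st.st_mode):
        return "regular", None, None
    else:
        return "other", None, None
    # st_rdev holds the device number that the device file refers to
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)
```

On the Cloud9 instance above, describe_device("/dev/xvda") would be expected to return ("block", 202, 0) and describe_device("/dev/tty") ("char", 5, 0), matching the ls -l output.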

udev

Linux uses udev to manage devices. udev runs as a daemon, and the kernel notifies it whenever a device is connected to or disconnected from the system. On receiving a notification, udev creates the device file according to its rules. You can watch udev events with udevadm monitor.

$ udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
<The following is omitted>
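The KERNEL events that udevadm monitor prints arrive as a header of the form action@devpath followed by NUL-separated KEY=VALUE pairs. A minimal sketch that decodes one such message; the function names are hypothetical. Receiving live events uses a netlink socket with protocol NETLINK_KOBJECT_UEVENT (15 in <linux/netlink.h>) bound to multicast group 1; the watcher function is only an outline and needs Linux.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # value from <linux/netlink.h>


def parse_uevent(message):
    """Split a kernel uevent into (action, devpath, {KEY: VALUE})."""
    parts = [p.decode(errors="replace") for p in message.split(b"\0") if p]
    action, _, devpath = parts[0].partition("@")
    env = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return action, devpath, env


def watch_kernel_uevents(count=1):
    """Print the next `count` kernel uevents (Linux only; plug or unplug a device to trigger one)."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_KOBJECT_UEVENT) as sock:
        sock.bind((0, 1))  # pid 0: let the kernel pick the port id; group 1: kernel uevents
        for _ in range(count):
            print(parse_uevent(sock.recv(65536)))
```

Fields such as SUBSYSTEM, MAJOR, MINOR and DEVNAME in these messages are what udev's rules match on when it decides which device file to create.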

File system

A file system is the Linux kernel function for handling data and other computer resources. Simply put, it is the framework and method for working with files [^4]. A file usually means data stored on secondary storage, but some file systems also present devices, processes, and kernel information as files.

File system types

- Disk
  - ext4
  - ext3
  - XFS
- Network file sharing
  - NFS
  - SMB/CIFS
- Special purpose
  - procfs, sysfs
  - tmpfs
  - FUSE

Use the sudo parted -l command to display partition information for all block devices.

$ sudo parted -l
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvda: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  10.7GB  10.7GB  primary  ext4         boot

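/proc/filesystems lists the file systems supported by the kernel. Entries marked nodev are not backed by a block device (for example, virtual file systems such as proc and sysfs).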

$ cat /proc/filesystems 
nodev   sysfs
nodev   tmpfs
nodev   bdev
nodev   proc
nodev   cgroup
nodev   cgroup2
nodev   cpuset
nodev   devtmpfs
nodev   configfs
nodev   debugfs
nodev   tracefs
nodev   securityfs
nodev   sockfs
nodev   bpf
nodev   pipefs
nodev   ramfs
nodev   hugetlbfs
nodev   devpts
        ext3
        ext2
        ext4
        squashfs
        vfat
nodev   ecryptfs
        fuseblk
nodev   fuse
nodev   fusectl
nodev   mqueue
nodev   pstore
        btrfs
nodev   autofs
nodev   overlay
nodev   aufs
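Each line of /proc/filesystems has an optional nodev flag followed by the file system name. A small sketch that splits the list into file systems that need a block device and those that do not; the function name is hypothetical.

```python
def parse_filesystems(text):
    """Return (block_backed, nodev) name lists from /proc/filesystems-style text."""
    block_backed, nodev = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nodev":
            nodev.append(fields[1])
        else:
            block_backed.append(fields[0])
    return block_backed, nodev
```

Applied to the listing above, the block-backed list would contain ext3, ext2, ext4, squashfs, vfat, fuseblk and btrfs, matching the entries without nodev.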

VFS

VFS (Virtual File System) is the kernel layer that lets applications access different file systems in the same way. Because VFS gives transparent access to both local and network storage through one interface, applications do not need to care about the differences between file systems.
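Because of VFS, the same system calls work no matter which file system a path lives on. The sketch below reads any file with identical os.open/os.read calls (a file on ext4 and a procfs file such as /proc/version are handled the same way) and looks up the file system type of a path from /proc/self/mounts-style text. The helper names are hypothetical, and the mount lookup is simplified: it ignores escaped characters (such as \040 for a space) in mount points.

```python
import os


def read_head(path, nbytes=64):
    """Read the first bytes of any file with the same open/read system calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, nbytes)
    finally:
        os.close(fd)


def fs_type_of(path, mounts_text):
    """Return the file system type of the longest mount point that contains `path`."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On a live system, fs_type_of("/proc/version", open("/proc/self/mounts").read()) should report proc, while read_head("/proc/version") returns the kernel version text through the same calls used for ordinary files.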

One of the structures VFS manages is the inode. An inode holds a file's metadata and ties the file to its data on disk. A directory manages files indirectly: each directory entry maps a file name to an inode number.

With ls -i, the number shown at the far left (280257 here) is the inode number.

$ ls -il
total 4
280257 -rw-r--r-- 1 ubuntu ubuntu 569 Aug 28 05:46 README.md

Use the stat command to check the inode information in detail.

$ stat README.md 
  File: README.md
  Size: 569             Blocks: 8          IO Block: 4096   regular file
Device: ca01h/51713d    Inode: 280257      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1000/  ubuntu)
Access: 2020-08-28 06:25:56.732380517 +0000
Modify: 2020-08-28 05:46:13.000000000 +0000
Change: 2020-08-28 06:27:17.072779096 +0000
 Birth: -
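The same inode fields are available through os.stat. The sketch below also shows that a directory entry only maps a name to an inode: after os.link creates a second name (a hard link), both names report the same inode number and the link count becomes 2. It runs in a temporary directory; the file names and function names are arbitrary.

```python
import os
import tempfile


def inode_summary(path):
    """Return some of the inode fields that `ls -i` and `stat` display."""
    st = os.stat(path)
    return {"inode": st.st_ino, "links": st.st_nlink, "size": st.st_size,
            "mode": oct(st.st_mode & 0o777), "uid": st.st_uid}


def demo_hard_link():
    """Create a file and a hard link to it; return the inode fields before and after."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "README.md")
        alias = os.path.join(d, "README-link.md")
        with open(original, "w") as f:
            f.write("hello\n")
        before = inode_summary(original)
        os.link(original, alias)          # a second directory entry pointing at the same inode
        after = (inode_summary(original), inode_summary(alias))
        return before, after
```

The file contents are stored once; deleting one of the two names only decrements the link count, and the data is freed when the count reaches zero and no process has the file open.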

IO

Signal

A signal is a mechanism by which the kernel notifies a program of a request such as termination, interruption, or suspension. When a process receives a signal, it runs the handler registered for that signal type, or the default action if no handler is registered.
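Registering a handler can be shown with Python's standard signal module, which wraps the underlying system calls. A minimal sketch, assuming a POSIX system and that it runs in the main thread (Python only allows handlers to be installed there); SIGUSR1 is chosen because its default action would terminate the process, so the handler visibly changes the behavior. The function names are hypothetical.

```python
import os
import signal
import time

received = []


def on_usr1(signum, frame):
    """Handler registered for SIGUSR1; records which signal arrived."""
    received.append(signal.Signals(signum).name)


def send_to_self(sig=signal.SIGUSR1, timeout=1.0):
    """Register on_usr1, send `sig` to this process, and wait briefly for the handler."""
    previous = signal.signal(sig, on_usr1)   # register the handler, remembering the old one
    try:
        os.kill(os.getpid(), sig)            # ask the kernel to deliver the signal to ourselves
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                 # Python runs the handler between bytecodes
    finally:
        if previous is not None:
            signal.signal(sig, previous)     # restore the previous disposition
    return list(received)
```

Pressing Ctrl+C in a terminal works the same way: the kernel delivers SIGINT, and Python's default handler for it raises KeyboardInterrupt.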

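Signals should not be confused with hardware interrupts, which devices use to notify the kernel. /proc/interrupts shows how many hardware interrupts each source has raised on each CPU.

$ cat /proc/interrupts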
           CPU0       
  0:         43   IO-APIC   2-edge      timer
  1:          9  xen-pirq   1-ioapic-edge  i8042
  4:       3576  xen-pirq   4-ioapic-edge  ttyS0
  8:          2  xen-pirq   8-ioapic-edge  rtc0
  9:          0  xen-pirq   9-ioapic-level  acpi
 12:          3  xen-pirq  12-ioapic-edge  i8042
 14:          0   IO-APIC  14-edge      ata_piix
 15:          0   IO-APIC  15-edge      ata_piix
 48:     117991  xen-percpu    -virq      timer0
 49:          0  xen-percpu    -ipi       resched0
 50:          0  xen-percpu    -ipi       callfunc0
 51:          0  xen-percpu    -virq      debug0
 52:          0  xen-percpu    -ipi       callfuncsingle0
 53:          0  xen-percpu    -ipi       spinlock0
 54:        258   xen-dyn    -event     xenbus
 55:      30247   xen-dyn    -event     blkif
 56:      11722   xen-dyn    -event     eth0
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
DFR:          0   Deferred Error APIC interrupts
MCE:          0   Machine check exceptions
MCP:         13   Machine check polls
HYP:     161747   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0   Posted-interrupt notification event
NPI:          0   Nested posted-interrupt event
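In /proc/interrupts, each CPU gets one column of counts (only CPU0 on this single-vCPU instance). A small sketch that sums those columns per interrupt source, assuming the layout above: a header row naming the CPUs, then rows that start with an IRQ number or a short name such as NMI followed by a colon. The function name is hypothetical.

```python
def interrupt_totals(text):
    """Return {irq: (total count across CPUs, description)} from /proc/interrupts-style text."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():          # some rows (e.g. ERR) have a single count
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[irq.strip()] = (sum(counts), description)
    return totals
```

Reading the file twice a few seconds apart and comparing the totals shows which devices are currently busy; on the instance above, the blkif (disk) and eth0 (network) rows would be the ones expected to grow.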

