I knew almost nothing about the Linux kernel, so in this post I summarize what I learned, with reference to the book Complete Understanding Linux Kernel Super Introduction.
The Linux kernel is the core software of the OS.
- Its job is to connect applications to the computer hardware.
- A computer has many kinds of hardware resources, such as the HDD, CPU, and memory, that applications need in order to run. If these resources are not managed centrally, applications conflict over them, so a kernel is required.
For hands-on practice I recommend AWS Cloud9 [^1], which lets you easily create a server and work in its terminal.
- uname -r displays the version of the running kernel.
- The file vmlinuz-x.xx.x-xxxx-aws in the /boot directory is the kernel itself.
- You can see that the file is only about 8.5MB.
$ uname -r
5.3.0-1033-aws
$ ls -lh /boot | grep vml
-rw------- 1 root root 7.6M Aug 27 2018 vmlinuz-4.15.0-1021-aws
-rw------- 1 root root 8.5M Aug 5 14:10 vmlinuz-5.3.0-1033-aws
-rw------- 1 root root 8.5M Sep 5 16:49 vmlinuz-5.3.0-1035-aws
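The release string printed by uname -r can also be taken apart in a script. The following is a minimal sketch, not part of the original post: `parse_kernel_release` is a hypothetical helper name, and it assumes the release starts with `major.minor` or `major.minor.patch`, followed by an optional distribution suffix.

```python
import platform
import re

# Leading "major.minor[.patch]" part of a kernel release string.
_RELEASE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_kernel_release(release):
    """Split a release string such as '5.3.0-1033-aws' into its parts."""
    match = _RELEASE.match(release)
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    # A missing patch level is treated as 0 (assumption for this sketch).
    major, minor, patch = (int(g) if g else 0 for g in match.groups())
    # Everything after the numeric part is the distribution-specific suffix.
    suffix = release[match.end():].lstrip("-")
    return major, minor, patch, suffix


# On Linux, platform.release() returns the same string as `uname -r`:
#   parse_kernel_release(platform.release())
```

Comparing the first three fields as a tuple, for example `parse_kernel_release(r)[:3] >= (5, 0, 0)`, gives a simple version check.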
The Linux kernel keeps non-essential functions in separate object files so that they can be loaded only when needed.
If every function were built into the kernel itself, all of them would have to be compiled and statically linked from the start, many would stay resident in memory even though they are never used, and the whole kernel would have to be rebuilt and rebooted whenever a new feature was needed.
An object file containing code that extends the running kernel in this way is called a loadable kernel module (LKM).
Loadable kernel modules are located under /lib/modules/<kernel version>/kernel/.
$ ls -F /lib/modules/5.3.0-1033-aws/kernel/
arch/ crypto/ drivers/ fs/ lib/ net/ virt/ wireguard/ zfs/
Modules already loaded in the kernel can be listed with the lsmod command.
$ lsmod
Module Size Used by
ufs 81920 0
msdos 20480 0
xfs 1273856 0
xt_conntrack 16384 1
xt_MASQUERADE 20480 1
nf_conntrack_netlink 45056 0
nfnetlink 16384 2 nf_conntrack_netlink
xfrm_user 36864 1
xfrm_algo 16384 1 xfrm_user
<The following is omitted>
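The three columns of the lsmod output (size in bytes, use count, and the modules that depend on the entry) are easy to read programmatically. The sketch below is an illustration only: `parse_lsmod` and `SAMPLE` are hypothetical names, and the sample rows are copied from the listing above.

```python
def parse_lsmod(text):
    """Parse `lsmod` output into {module: (size_bytes, use_count, users)}."""
    modules = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore anything that is not a module row
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        # The optional fourth column is a comma-separated list of users.
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, used, users)
    return modules


SAMPLE = """\
Module Size Used by
xfs 1273856 0
nfnetlink 16384 2 nf_conntrack_netlink
xfrm_algo 16384 1 xfrm_user
"""

if __name__ == "__main__":
    for name, (size, used, users) in parse_lsmod(SAMPLE).items():
        print(name, size, used, users)
```

A module with a non-zero use count, such as nfnetlink here, is still in use, which is why the list of users is worth keeping.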
Running processes are displayed with the ps command.
$ ps aux | head -n 10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.7 225392 7804 ? Ss 09:55 0:04 /sbin/init
root 2 0.0 0.0 0 0 ? S 09:55 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? I< 09:55 0:00 [rcu_gp]
root 4 0.0 0.0 0 0 ? I< 09:55 0:00 [rcu_par_gp]
root 6 0.0 0.0 0 0 ? I< 09:55 0:00 [kworker/0:0H-kb]
root 9 0.0 0.0 0 0 ? I< 09:55 0:00 [mm_percpu_wq]
root 10 0.0 0.0 0 0 ? S 09:55 0:00 [ksoftirqd/0]
root 11 0.0 0.0 0 0 ? I 09:55 0:01 [rcu_sched]
root 12 0.0 0.0 0 0 ? S 09:55 0:00 [migration/0]
<The following is omitted>
Each row corresponds to one process, so you can see that many processes are running at once.
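Most of the rows above have a command name in square brackets and a VSZ of 0. These are kernel threads, which have no user-space memory. As a rough sketch (the function name `kernel_threads` and the bracket/VSZ heuristic are my own assumptions, not something the book prescribes), they can be picked out of the ps aux output like this:

```python
def kernel_threads(ps_output):
    """Return names of rows that look like kernel threads in `ps aux` output."""
    names = []
    for row in ps_output.strip().splitlines()[1:]:  # skip the header row
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = row.split(None, 10)
        if len(fields) < 11:
            continue
        vsz, command = int(fields[4]), fields[10]
        # Heuristic: no virtual memory and a bracketed command name.
        if vsz == 0 and command.startswith("[") and command.endswith("]"):
            names.append(command[1:-1])
    return names


SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.7 225392 7804 ? Ss 09:55 0:04 /sbin/init
root 2 0.0 0.0 0 0 ? S 09:55 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? I< 09:55 0:00 [rcu_gp]
"""
```

With the sample rows taken from the listing above, only /sbin/init is left out as an ordinary user-space process.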
Linux gives each process its own virtual address space, separate from physical addresses. This isolates processes from one another and effectively increases the amount of usable memory. The address space is divided into pages (fixed-size units), and physical memory is allocated page by page, only for the part of the virtual address space that the process actually uses. This is called the Paging Method (https://ja.wikipedia.org/wiki/%E3%83%9A%E3%83%BC%E3%82%B8%E3%83%B3%E3%82%B0%E6%96%B9%E5%BC%8F).
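To make the idea concrete, here is a toy model of paging. It is only a sketch under stated assumptions: the page size is fixed at 4096 bytes (a common value, but not read from the system), and `ToyAddressSpace` is a hypothetical class that hands out physical frames in order the first time a page is touched. The real kernel uses multi-level page tables and hardware support.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def split_virtual_address(addr, page_size=PAGE_SIZE):
    """Return (virtual page number, offset inside the page)."""
    return divmod(addr, page_size)


def pages_needed(n_bytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold n_bytes (ceiling division)."""
    return -(-n_bytes // page_size)


class ToyAddressSpace:
    """Toy model: a frame is assigned only when a page is first touched."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical frame number
        self._next_frame = 0

    def touch(self, addr):
        """Translate a virtual address, allocating a frame on first use."""
        page, offset = split_virtual_address(addr)
        if page not in self.page_table:
            self.page_table[page] = self._next_frame
            self._next_frame += 1
        return self.page_table[page] * PAGE_SIZE + offset
```

Two addresses inside the same page share one frame, while an untouched page costs no physical memory at all, which is the property the paragraph above describes.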
When a process reads or writes a file on disk, the kernel temporarily keeps the contents in a memory area called the page cache, so that later accesses to the same file can be served from the cache. This speeds up processing. The page cache is discarded when free memory runs low; releasing the cache to free memory is called page reclaim.
Page cache usage is shown in the buff/cache column of the free command (554MB in the example below).
$ free -m
total used free shared buff/cache available
Mem: 978 321 102 2 554 490
Swap: 488 63 425
Write 3 to /proc/sys/vm/drop_caches to free the page cache (this value also drops cached dentries and inodes). Only clean pages can be dropped, so it is a good idea to run the sync command first to write unsaved data back to disk.
$ sync
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ free -m
total used free shared buff/cache available
Mem: 978 325 526 2 126 520
Swap: 488 63 425
You can see that buff/cache decreased (from 554MB to 126MB) and free on the Mem row increased (from 102MB to 526MB), which shows that the memory was released.
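The before/after comparison can also be done mechanically. The sketch below parses the Mem row of the two free -m outputs shown above and prints the change per column. `parse_free` and `diff_free` are hypothetical helper names, and the numbers are simply the ones from this post.

```python
def parse_free(text):
    """Return the Mem row of `free -m` output as {column: value_in_MB}."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        label, *values = line.split()
        if label == "Mem:":
            return dict(zip(header, map(int, values)))
    raise ValueError("no 'Mem:' row found")


def diff_free(before, after):
    """Per-column change (after - before) of the Mem row."""
    b, a = parse_free(before), parse_free(after)
    return {column: a[column] - b[column] for column in b}


BEFORE = """\
total used free shared buff/cache available
Mem: 978 321 102 2 554 490
Swap: 488 63 425
"""

AFTER = """\
total used free shared buff/cache available
Mem: 978 325 526 2 126 520
Swap: 488 63 425
"""

if __name__ == "__main__":
    print(diff_free(BEFORE, AFTER))
```

For these two samples, buff/cache drops by 428MB and free grows by 424MB.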
Peripheral devices exchange data with programs through device files. On Linux, devices are abstracted and treated as files [^2].
The device files are under the /dev directory.
$ ls -F /dev
autofs log@ port tty13 tty31 tty5 ttyS1 vcsa6
block/ loop-control ppp tty14 tty32 tty50 ttyS2 vcsu
btrfs-control loop0 psaux tty15 tty33 tty51 ttyS3 vcsu1
char/ loop1 ptmx tty16 tty34 tty52 ttyprintk vcsu2
console loop2 pts/ tty17 tty35 tty53 udmabuf vcsu3
core@ loop3 random tty18 tty36 tty54 uinput vcsu4
cpu_dma_latency loop4 rfkill tty19 tty37 tty55 urandom vcsu5
cuse loop5 rtc@ tty2 tty38 tty56 vcs vcsu6
disk/ loop6 rtc0 tty20 tty39 tty57 vcs1 vfio/
ecryptfs loop7 shm/ tty21 tty4 tty58 vcs2 vga_arbiter
fd@ mapper/ snapshot tty22 tty40 tty59 vcs3 vhost-net
full mcelog stderr@ tty23 tty41 tty6 vcs4 vhost-vsock
fuse mem stdin@ tty24 tty42 tty60 vcs5 xen/
hpet memory_bandwidth stdout@ tty25 tty43 tty61 vcs6 xvda
hugepages/ mqueue/ tty tty26 tty44 tty62 vcsa xvda1
hwrng net/ tty0 tty27 tty45 tty63 vcsa1 zero
initctl@ network_latency tty1 tty28 tty46 tty7 vcsa2 zfs
input/ network_throughput tty10 tty29 tty47 tty8 vcsa3
kmsg null tty11 tty3 tty48 tty9 vcsa4
lightnvm/ nvram tty12 tty30 tty49 tty50. vcsa5
In the ls -l output, the permissions of the device file xvda start with b. This indicates a block device: a device that exchanges data in blocks and can be addressed directly, such as a hard disk drive or a memory area.
$ ls -l /dev/xvda
brw-rw---- 1 root disk 202, 0 Sep 22 00:02 /dev/xvda
You can also list block devices with the lsblk command.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 12.7M 1 loop /snap/amazon-ssm-agent/495
loop1 7:1 0 96.6M 1 loop /snap/core/9804
loop2 7:2 0 28.1M 1 loop /snap/amazon-ssm-agent/2012
loop3 7:3 0 97.1M 1 loop /snap/core/9993
xvda 202:0 0 10G 0 disk
└─xvda1 202:1 0 10G 0 part /
On the other hand, the permissions of the device file tty start with c. This indicates a character device: a device (a keyboard, for example) that transfers data to and from the system one byte at a time.
$ ls -l /dev/tty
crw-rw-rw- 1 root tty 5, 0 Sep 22 00:02 /dev/tty
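The b and c shown by ls -l come from the file type bits in the inode's mode, and the two numbers (202, 0 and 5, 0 above) are the major and minor device numbers. A minimal sketch using Python's standard `os` and `stat` modules follows. `device_kind` and `describe_device` are hypothetical helper names.

```python
import os
import stat


def device_kind(mode):
    """Classify a st_mode value as a block device, character device, or neither."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "not a device"


def describe_device(path):
    """Return (kind, major, minor) for a device file such as /dev/tty."""
    st = os.stat(path)
    # st_rdev holds the device number that the special file refers to.
    return device_kind(st.st_mode), os.major(st.st_rdev), os.minor(st.st_rdev)
```

On the machine used in this post, `describe_device("/dev/xvda")` would be expected to report a block device with the same 202, 0 pair that ls -l prints, though the numbers differ between systems.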
udev
Linux uses udev as its device management tool. udev runs as a daemon, and the kernel notifies it when a device is connected to or disconnected from the system. When udev is notified, it creates the device file according to its rules. You can display udev events with udevadm monitor.
$ udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
<The following is omitted>
The file system is a function of the Linux kernel for manipulating computer resources. Simply put, it is a framework and a set of methods for working with files [^4]. A file mainly refers to data stored in auxiliary storage, but some file systems also expose other information, such as devices, processes, and kernel state, as files.
- Disk
  - ext4
  - ext3
  - XFS
- Network file sharing
  - NFS
  - SMB/CIFS
- Special purpose
  - procfs, sysfs
  - tmpfs
  - FUSE
Use the sudo parted -l command to display partition information for all block devices.
$ sudo parted -l
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvda: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 1049kB 10.7GB 10.7GB primary ext4 boot
/proc/filesystems lists the filesystems supported by the running kernel.
$ cat /proc/filesystems
nodev sysfs
nodev tmpfs
nodev bdev
nodev proc
nodev cgroup
nodev cgroup2
nodev cpuset
nodev devtmpfs
nodev configfs
nodev debugfs
nodev tracefs
nodev securityfs
nodev sockfs
nodev bpf
nodev pipefs
nodev ramfs
nodev hugetlbfs
nodev devpts
ext3
ext2
ext4
squashfs
vfat
nodev ecryptfs
fuseblk
nodev fuse
nodev fusectl
nodev mqueue
nodev pstore
btrfs
nodev autofs
nodev overlay
nodev aufs
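In this listing, entries marked nodev are filesystems that are not backed by a block device (procfs, tmpfs, and so on), while unmarked entries such as ext4 need one. The following sketch separates the two groups. `parse_filesystems` is a hypothetical helper name and the sample is a short excerpt of the output above.

```python
def parse_filesystems(text):
    """Split /proc/filesystems content into (nodev, block_backed) name lists."""
    nodev, block_backed = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "nodev" and len(fields) > 1:
            nodev.append(fields[1])
        else:
            block_backed.append(fields[-1])
    return nodev, block_backed


SAMPLE = """\
nodev sysfs
nodev tmpfs
ext4
fuseblk
nodev fuse
"""

if __name__ == "__main__":
    virtual, on_disk = parse_filesystems(SAMPLE)
    print("no block device:", virtual)
    print("block device   :", on_disk)
```

On a live system the same function can be fed the file contents read from /proc/filesystems.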
VFS
VFS (Virtual File System) is a kernel function that lets applications access different file systems in the same way. Because VFS gives transparent access to both local and network storage, applications do not have to care about the differences between them.
One piece of information managed by the VFS is the inode. An inode holds a file's metadata and ties the file to its data on disk. A directory manages files indirectly by mapping file names to inode numbers.
The number (280257) displayed on the far left by ls -i is the inode number.
$ ls -il
total 4
280257 -rw-r--r-- 1 ubuntu ubuntu 569 Aug 28 05:46 README.md
Use the stat command to check the inode information in detail.
$ stat README.md
File: README.md
Size: 569 Blocks: 8 IO Block: 4096 regular file
Device: ca01h/51713d Inode: 280257 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ ubuntu) Gid: ( 1000/ ubuntu)
Access: 2020-08-28 06:25:56.732380517 +0000
Modify: 2020-08-28 05:46:13.000000000 +0000
Change: 2020-08-28 06:27:17.072779096 +0000
Birth: -
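The Inode and Links fields shown by stat can be read from a program as well. The sketch below creates a file in a temporary directory and adds a hard link to it, to show that two directory entries can point at the same inode. `inode_summary` and `demo_hard_link` are hypothetical names, and the example assumes the temporary directory is on a filesystem that supports hard links.

```python
import os
import tempfile


def inode_summary(path):
    """Return the inode number, link count, and size of a file."""
    st = os.stat(path)
    return {"inode": st.st_ino, "links": st.st_nlink, "size": st.st_size}


def demo_hard_link():
    """Create a file and a hard link to it, then summarize both names."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "README.md")
        alias = os.path.join(tmp, "alias.md")
        with open(original, "w") as f:
            f.write("hello\n")
        # A hard link is a second directory entry for the same inode.
        os.link(original, alias)
        return inode_summary(original), inode_summary(alias)


if __name__ == "__main__":
    print(demo_hard_link())
```

Both names report the same inode number and a link count of 2, which matches the description of a directory as a mapping from names to inodes.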
IO
A signal is a mechanism by which the kernel notifies a program of a request such as termination, interruption, or suspension. When a process receives a signal, it runs the handler registered for that signal type.
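Registering a handler and receiving a signal can be tried from a single process. This is a minimal sketch using Python's standard `signal` module. The names `on_signal` and `send_to_self` are hypothetical, SIGUSR1 is chosen only because it has no predefined meaning, and the code assumes a Unix-like system and that it runs in the main thread (Python only allows handlers to be installed there).

```python
import os
import signal
import time

received = []  # names of the signals handled so far


def on_signal(signum, frame):
    """Handler: record which signal arrived."""
    received.append(signal.Signals(signum).name)


def send_to_self(sig=signal.SIGUSR1, timeout=1.0):
    """Install the handler, send `sig` to this process, and wait for it."""
    received.clear()
    previous = signal.signal(sig, on_signal)  # register the handler
    try:
        os.kill(os.getpid(), sig)  # ask the kernel to deliver the signal
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)  # give the handler a chance to run
    finally:
        signal.signal(sig, previous)  # restore the earlier handler
    return list(received)


if __name__ == "__main__":
    print(send_to_self())
```

The same signal could be sent from another terminal with the kill command and the process ID. Sending it from inside the process just keeps the example self-contained.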
$ cat /proc/interrupts
CPU0
0: 43 IO-APIC 2-edge timer
1: 9 xen-pirq 1-ioapic-edge i8042
4: 3576 xen-pirq 4-ioapic-edge ttyS0
8: 2 xen-pirq 8-ioapic-edge rtc0
9: 0 xen-pirq 9-ioapic-level acpi
12: 3 xen-pirq 12-ioapic-edge i8042
14: 0 IO-APIC 14-edge ata_piix
15: 0 IO-APIC 15-edge ata_piix
48: 117991 xen-percpu -virq timer0
49: 0 xen-percpu -ipi resched0
50: 0 xen-percpu -ipi callfunc0
51: 0 xen-percpu -virq debug0
52: 0 xen-percpu -ipi callfuncsingle0
53: 0 xen-percpu -ipi spinlock0
54: 258 xen-dyn -event xenbus
55: 30247 xen-dyn -event blkif
56: 11722 xen-dyn -event eth0
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
DFR: 0 Deferred Error APIC interrupts
MCE: 0 Machine check exceptions
MCP: 13 Machine check polls
HYP: 161747 Hypervisor callback interrupts
ERR: 0
MIS: 0
PIN: 0 Posted-interrupt notification event
NPI: 0 Nested posted-interrupt event
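/proc/interrupts lists, for each interrupt source, how many interrupts each CPU has handled since boot. The first row names the CPUs and every following row starts with an IRQ number or a short label. The sketch below totals the per-CPU counts for each row. `parse_interrupts` is a hypothetical helper name and the sample is an excerpt of the output above.

```python
def parse_interrupts(text):
    """Return {irq_or_label: total_count_over_all_cpus}."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        counts = []
        # At most one count per CPU; rows such as ERR carry a single value.
        for field in rest.split()[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


SAMPLE = """\
CPU0
 48: 117991 xen-percpu -virq timer0
 56: 11722 xen-dyn -event eth0
HYP: 161747 Hypervisor callback interrupts
ERR: 0
"""

if __name__ == "__main__":
    for label, total in parse_interrupts(SAMPLE).items():
        print(label, total)
```

Reading the file twice and subtracting the totals shows which devices are generating interrupts during that interval.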
- Memory Management - Wikipedia
- Kernel - Wikipedia
- Chapter 14 Task Scheduler Tuning
- Virtual File System - Wikipedia