Container-like #2 made with C

Introduction

This article is the day 22 entry of the FUN Advent Calendar 2020.

Yesterday's entry was Tomoka-san's "The story of running FunLocks, a beginner-friendly, campus-only hackathon". This year I'm writing two articles, part 1 and part 2, and both ended up right after Tomoka-san's. I wonder why. Strange.

Environment

debian buster
kernel 4.19.0

It is running on Sakura's cloud.

Scope of this article

We implement namespaces, chroot, and cgroups. This time, the cgroup only controls the CPU.

Isolating namespaces

To create a container, we need to isolate its namespaces. The system call for this is unshare. A system call with similar functionality is clone. Since we are here anyway, let's briefly summarize the differences between unshare and clone before deciding which one to use.

unshare: controls namespace sharing for the current process.

Arguments

int flags
**Stops sharing the resources corresponding to the given flags.** In other words, if you specify nothing, everything stays shared.

Return value: int

0 on success, -1 on failure

clone: controls namespace sharing at the same time as it creates a child process. Think of it as fork and unshare combined.

Arguments

unsigned long flags
**Shares the resources corresponding to the given flags** (this applies to flags such as CLONE_FILES or CLONE_VM; the CLONE_NEW* flags instead put the child in a new namespace).
void *child_stack
Omitted
void *ptid
Omitted
void *ctid
Omitted
struct pt_regs *regs
Omitted

There are a lot of arguments. Unlike unshare, clone also creates a child process, so it needs things such as the memory the child process will use. (I haven't checked the details because it's a hassle, so please check them yourself.)

Return value: long

The thread ID of the child process is returned. On failure, -1 is returned and no child process is created.
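To make the calling convention a little more concrete, here is a minimal sketch of creating a child with clone. It assumes the glibc wrapper form clone(fn, stack_top, flags, arg) rather than the raw system call arguments listed above, and the names run_with_clone, child_fn and STACK_SIZE are made up for this illustration. With no namespace flags it behaves roughly like fork, so it can be tried without root.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)   /* hypothetical size for the child's stack */

/* Entry point of the child; its return value becomes the exit status. */
static int child_fn(void *arg) {
    return *(int *)arg;
}

/* Create a child with clone(), wait for it, and return its exit status
 * (or -1 on error). ns_flags may hold CLONE_NEW* flags, or 0 for none. */
int run_with_clone(int ns_flags, int exit_code) {
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) return -1;

    /* The stack grows downward on most architectures, so pass its top.
     * SIGCHLD lets the parent wait for the child with waitpid(). */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, ns_flags | SIGCHLD, &exit_code);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

For example, run_with_clone(0, 7) should return 7. Passing CLONE_NEWUTS or similar as ns_flags would also put the child in new namespaces, which normally requires root. Having to prepare the child's stack by hand is the kind of extra work that unshare avoids.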

Which one do we use in the end?

That was a bit long, but this time I'd like to use **unshare**. The reasons are as follows.

- The arguments and return value are easy to understand.
- The description of clone says that it is often used for threads.

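Let's implement it

#define _GNU_SOURCE
#include<sched.h>
#include<unistd.h>
#include<stdio.h>
#include<errno.h>

int main(){
        // CLONE_FILES unshares the file descriptor table;
        // each CLONE_NEW* flag moves the process into a new namespace (normally needs root)
        const int UNSHARE_FLAGS = ( CLONE_FILES | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWPID);
        if (unshare(UNSHARE_FLAGS) < 0){
                perror("unshare");
                return 1;   // do not report "ok" when unshare failed
        }
        printf("ok\n");
        return 0;
}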

I think this separates the namespaces. It only runs unshare and checks that there was no error, so at this stage I can't tell whether it really worked.

I'll skip the explanation of the flags. They are described in the unshare man page, so please take a look.
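One way to check whether a namespace really changed is to compare the namespace identifiers that the kernel exposes as symbolic links under /proc/self/ns. The following is a minimal sketch under that assumption; read_ns_id is a hypothetical helper name, and the number in the comment is only an example of the format.

```c
#include <stdio.h>
#include <unistd.h>

/* Read the identifier of one of the calling process's namespaces.
 * For name = "uts" the result has the form "uts:[4026531838]".
 * Returns 0 on success, -1 on failure. */
int read_ns_id(const char *name, char *out, size_t out_size) {
    char path[64];
    int n = snprintf(path, sizeof(path), "/proc/self/ns/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path) || out_size == 0) return -1;

    /* Each entry is a symbolic link whose target names the namespace. */
    ssize_t len = readlink(path, out, out_size - 1);
    if (len < 0) return -1;
    out[len] = '\0';   /* readlink does not NUL-terminate */
    return 0;
}
```

Calling read_ns_id("uts", ...) once before and once after unshare(CLONE_NEWUTS) and comparing the two strings should show a different identifier if the call worked. Note that CLONE_NEWPID only affects children created afterwards, so for the PID namespace the entry to compare would be "pid_for_children" rather than "pid".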

Changing the root

After separating the namespaces, we change the root, using the chroot system call.

chroot: changes the root directory. From then on, the path given as the argument is treated as '/'.

Arguments

const char *path
The path to use as the new root

Return value

0 on success, -1 on error

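Let's implement it

#include<unistd.h>
#include<errno.h>
#include<stdio.h>

int main(){
        // argument vector for ls: the program name, then the NULL terminator
        char *argv[2];
        argv[0] = "ls";
        argv[1] = NULL;

        // chroot normally needs root privileges
        if( chroot("./test") < 0 ){
                perror("chroot");
                return 1;
        }
        // move to the new root so the working directory is not left outside it
        if ( chdir("/") < 0 ){
                perror("chdir");
                return 1;
        }
        // execve only returns when it fails
        execve("/bin/ls", argv, NULL);
        perror("execve");
        return 1;
}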

Create a directory called test and chroot into it, then execute /bin/ls. Since test is an empty directory, /bin/ls does not exist there, so of course the exec fails. That tells us chroot is working. (Not a smart way to check.) By the way, when I prepared a separate Debian root directory, not the host's own, and chrooted into it, ls ran fine.
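A slightly more direct check is to compare what "/" refers to before and after the call. The sketch below does this in a forked child so that the calling process keeps its own root. check_chroot is a hypothetical name, and the return codes are just a convention chosen for this example.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if "/" refers to a different directory after chroot(new_root),
 * 1 if it is unchanged, 2 if chroot/chdir/stat failed (for example when
 * not running as root), and -1 if the check itself could not be run. */
int check_chroot(const char *new_root) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Child: only this process changes its root. */
        struct stat before, after;
        if (stat("/", &before) < 0) _exit(2);
        if (chroot(new_root) < 0 || chdir("/") < 0) _exit(2);
        if (stat("/", &after) < 0) _exit(2);
        /* A directory is identified by its device and inode numbers. */
        int same = before.st_dev == after.st_dev &&
                   before.st_ino == after.st_ino;
        _exit(same ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Run as root, check_chroot("./test") should return 0 when the root was switched. Without the required privilege it returns 2 instead, because chroot itself fails.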

Resource limits

To be a container, it must also limit resources, for example memory usage and CPU usage. Resource limits are not done with a system call but with a Linux feature called cgroups. There are two versions of cgroups, v1 and v2, and this time we will use v2. A detailed explanation of cgroup v2 and its specification would be long, so I will not go into it here. If you are interested, please see here.

Limiting resources using cgroups

To use cgroups, you need to take the following steps:

1. Mount cgroupfs
2. Create a cgroup (create a directory)
3. Write the ID of the process to control to cgroup.procs
4. Enable the subsystem
5. Set the limits for the subsystem

Let's implement it

In my environment it was already mounted on /sys/fs/cgroup, so I skip the mount step this time.
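For environments where it is not mounted yet, the mount step could look roughly like the sketch below. It assumes the filesystem type name "cgroup2" and the CGROUP2_SUPER_MAGIC constant from linux/magic.h; is_cgroup2 and ensure_cgroup2 are hypothetical helper names.

```c
#include <stddef.h>
#include <sys/mount.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/* Returns 1 if path is on a cgroup v2 filesystem, 0 if not, -1 on error. */
int is_cgroup2(const char *path) {
    struct statfs fs;
    if (statfs(path, &fs) < 0) return -1;
    return fs.f_type == CGROUP2_SUPER_MAGIC ? 1 : 0;
}

/* Mount cgroup v2 at path unless it is already there (needs root).
 * Returns 0 on success, -1 on failure. */
int ensure_cgroup2(const char *path) {
    int r = is_cgroup2(path);
    if (r < 0) return -1;
    if (r == 1) return 0;   /* already mounted, nothing to do */
    return mount("none", path, "cgroup2", 0, NULL);
}
```

Checking first is useful because /sys/fs/cgroup may hold a cgroup v1 layout instead. On systems using systemd's hybrid layout, for example, the v2 hierarchy may be at a different path such as /sys/fs/cgroup/unified, and mounting over /sys/fs/cgroup would not be what you want there.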


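#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
#include<errno.h>
#include<stdio.h>

int main(){
        //make cgroup (a directory needs the execute bit, so 0755)
        if( access("/sys/fs/cgroup/container", F_OK) < 0){
                if( mkdir("/sys/fs/cgroup/container", 0755) < 0){
                        perror("mkdir");
                        return -1;
                }
        }
        //set pid
        int fd;
        fd = open("/sys/fs/cgroup/container/cgroup.procs", O_WRONLY);
        if( fd < 0 ){
                perror("cgroup open");
                return -1;
        }
        //a pid can have more than 5 digits, so use a larger buffer
        //and write only the digits, not the trailing NUL
        char buff[16];
        int len = snprintf(buff, sizeof(buff), "%d", (int)getpid());
        if( write(fd, buff, len) < 0 ){
                perror("cgroup write");
                return -1;
        }
        close(fd);

        //set subsystem
        //in cgroup v2 the controllers of a cgroup are enabled through
        //the cgroup.subtree_control of its parent
        fd = open("/sys/fs/cgroup/cgroup.subtree_control", O_WRONLY);
        if( fd < 0 ){
                perror("subsystem open");
                return -1;
        }
        if( write(fd, "+cpu", 4) < 0 ){
                perror("subsystem write");
                return -1;
        }
        close(fd);

        //set cpu max
        fd = open("/sys/fs/cgroup/container/cpu.max", O_WRONLY);
        if( fd < 0 ){
                perror("cpu open");
                return -1;
        }
        if( write(fd, "10000", 5) < 0 ){
                perror("cpu write");
                return -1;
        }
        close(fd);

        //keep the CPU busy by writing to /dev/null forever
        while(1){
                fd = open("/dev/null", O_WRONLY);
                write(fd,"hello world\n", 12);
                close(fd);
        }
}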

This limits the current process to 10% of the CPU and writes "hello world" to /dev/null forever. (Like the yes command?) Let's check the CPU usage of the process while it runs.

[Screen capture of the process's CPU usage]

You can see that it stays at about 10%. It seems to be working.
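Besides watching the CPU usage, the limit can also be checked from the cgroup itself. With the cpu controller enabled, cgroup v2 provides a cpu.stat file of "key value" lines, including nr_throttled and throttled_usec. The following is a minimal sketch for reading one field; read_stat_field is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Look up one "key value" line in a flat-keyed file such as cpu.stat.
 * Returns 0 and stores the number in *value, or -1 if the file cannot
 * be opened or the key is not found. */
int read_stat_field(const char *path, const char *key, long long *value) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;

    char name[64];
    long long v;
    int found = -1;
    /* Each line is "<name> <number>"; stop at the first match. */
    while (fscanf(fp, "%63s %lld", name, &v) == 2) {
        if (strcmp(name, key) == 0) {
            *value = v;
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Reading "nr_throttled" from /sys/fs/cgroup/container/cpu.stat twice, a few seconds apart, should give a growing number while the busy loop is being held back by the limit.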

Supplement

        //set cpu max
        fd = open("/sys/fs/cgroup/container/cpu.max", O_WRONLY);
        if( fd < 0 ){
                perror("cpu open");
                return -1;
        }
        if( write(fd, "10000", 5) < 0 ){
                perror("cpu write");
                return -1;
        }
        close(fd);

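The "10000" passed to write is CPU time in microseconds. cpu.max takes the form "$MAX $PERIOD", and when only one number is written, only $MAX is updated. The period keeps its default of 100000, so "10000" is 1/10 of it, that is, 10%. This is how the CPU usage limit is set.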
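The arithmetic can be written down as a small helper that builds the full "$MAX $PERIOD" string from a percentage. This is only a sketch; format_cpu_max is a hypothetical name, and the period is passed in explicitly rather than relying on the default.

```c
#include <stdio.h>

/* Build a cpu.max value of the form "$MAX $PERIOD" (both in microseconds)
 * that allows `percent` percent of one CPU per period.
 * Returns the length of the string, or -1 on invalid input or overflow. */
int format_cpu_max(char *out, size_t out_size, int percent, long period_us) {
    if (percent <= 0 || period_us <= 0) return -1;

    /* 10% of a 100000us period is 10000us. */
    long long max_us = (long long)period_us * percent / 100;

    int n = snprintf(out, out_size, "%lld %ld", max_us, period_us);
    if (n < 0 || (size_t)n >= out_size) return -1;
    return n;
}
```

With percent = 10 and period_us = 100000, this produces "10000 100000", which has the same effect as the "10000" written above when the period is at its default.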

Conclusion

This time I implemented namespaces, chroot, and cgroups. If you have any questions about the code, or if you find differences in behavior or mistakes, please contact me on Twitter. I don't know when the next part will be, but I plan to implement capabilities.

References

Container-like #1 made with C
Introduction to Containers Learned with LXC: Technology for Realizing a Lightweight Virtualization Environment
Man page of UNSHARE
Man page of CLONE
Man page of CHROOT