This article is a brief introduction to virtiofs as of December 2019.
Official site: https://virtio-fs.gitlab.io/
virtiofs is a new file system for sharing directories between a host and guest VMs, developed by Red Hat engineers.
The main use case is serving the root file system of lightweight VMs (such as kata-containers). This shortens boot time by avoiding unnecessary file copies into the guest. Another use case is hiding file system details from guests: since the guest cannot see how the shared directory is actually stored, it does not have to deal with, for example, the IP and security settings that a network file system would require.
Network file systems such as NFS and 9pfs already exist as a way to share directories between hosts and guests. However, they go through the network stack and protocol, and are not optimized for virtual environments, where host and guests communicate on the same machine. Also, the semantics of a network file system often differ from those of a local file system, which can affect the behavior of guest applications.
To address these problems, virtiofs aims to be a file system that (1) delivers high I/O performance and (2) provides guests with the same semantics as a local file system. To achieve this, it is being developed on top of the (partially extended) FUSE protocol, which is independent of the network stack and close to the Linux VFS interface [^fuse].
[^fuse]: FUSE also has the advantage of many years of production experience.
In a normal FUSE file system, a file system daemon running in user space receives FUSE requests from the kernel and processes them. In virtiofs, the daemon resides on the host (in user space), receives FUSE requests from the guest, and interacts with the host's file system as needed. The interaction between the guest and the daemon uses virtio (vhost-user), the same mechanism used by DPDK and SPDK [^vhost].
[^vhost]: Roughly speaking, with vhost-user, control operations such as initialization go through qemu, but data exchange bypasses qemu by using virtqueues created in shared memory. It is a mechanism that lets the guest and a process in host user space (here, the virtiofs daemon) talk to each other directly. For the background of vhost-user, [this article series](https://www.redhat.com/ja/blog/virtio-networking-first-series-finale-and-plans-2020?source=bloglisting&f%5B0%5D=post_tags%3ANetworking) is helpful.
For more on virtiofs, see the official design document and the main developer's KVM Forum 2019 slides ("virtio-fs: A shared file system for virtual machines", Stefan Hajnoczi, Red Hat).
## Status

To use virtiofs, support in both Linux and qemu is required. As of December 2019, development of the basic functionality is mostly complete, and the kernel part has been merged in Linux 5.4 (although the DAX feature described later is not yet included). On the qemu side, the vhost-user-fs-pci device code was merged in 4.2, but the daemon code is still under review. If you want to actually run it on Linux + qemu, refer to the explanation on the official website.
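As a rough sketch, the manual setup described in the official howto boils down to the following steps (the socket path, shared directory, and the tag `myfs` are placeholders, and options may differ between versions, so treat this as an outline rather than an exact recipe):

```
# Start the daemon on the host, exporting a directory over a vhost-user socket.
./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/path/to/shared -o cache=always &

# Start qemu with a vhost-user-fs-pci device pointing at that socket.
# vhost-user requires guest RAM to be in shared memory, hence
# memory-backend-file with share=on (size must match the -m value).
qemu-system-x86_64 \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem \
    ...  # plus the usual -m, disk, and kernel options

# Inside the guest, mount by the tag given to the device:
mount -t virtiofs myfs /mnt
```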
Meanwhile, kata-container's virtiofs support is already active: although experimental, virtiofs can be used from v1.7. It is easy to get working, so I'll explain how next.
For now, the easiest way to try virtiofs is via kata-container. Let's confirm that the shared directory is actually used as the rootfs of the container.
First, install the latest version of kata-container using the kata-deploy command (files are placed under /opt/kata):
```
# docker run --runtime=runc -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker -it katadocker/kata-deploy kata-deploy-docker install
```
Then all you have to do is specify kata-qemu-virtiofs as the docker runtime, and virtiofs will be used:

```
# docker run --runtime=kata-qemu-virtiofs -it busybox
```
If you check the mounts inside the container, you can see that virtiofs is used for the root file system [^1]:
[^1]: If you use kata-qemu as the runtime, you can see that traditional 9pfs is used instead.
(In container)
```
/ # mount -t virtio_fs
kataShared on / type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/resolv.conf type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/hostname type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
kataShared on /etc/hosts type virtio_fs (rw,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,dax)
```
(Note: in the upstream version the file system type is `virtiofs`, not `virtio_fs`.)
Some config files also appear in the list because they are bind-mounted.
If you check the process on the host, you can see that the virtiofs daemon (virtiofsd) is running:
(In host)
```
$ pgrep -a virtiofsd
13154 /opt/kata/bin/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/<container ID> -o cache=always --syslog -o no_posix_lock -f
```
The directory specified for source here will be the shared directory used by virtiofs.
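For reference, the options kata passes here can be read as follows (meanings as I understand them from virtiofsd's help text; an annotated sketch, not an authoritative description):

```
virtiofsd \
  --fd=3 \                  # vhost-user socket fd passed in by the kata runtime
  -o source=<shared dir> \  # host directory exported to the guest
  -o cache=always \         # cache data and metadata aggressively in the guest
  --syslog \                # log to syslog rather than stderr
  -o no_posix_lock \        # handle POSIX locks in the guest instead of forwarding them
  -f                        # run in the foreground
```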
Make sure that the host and guest actually share the directory.
First, let's take a look at the directory specified as source on the host:
(In host)
```
# ls /run/kata-containers/shared/sandboxes/<container ID>
<container ID>
<container ID>-resolv.conf
<container ID>-hosts
<container ID>-hostname
# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>
rootfs
# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs
bin dev etc home mnt proc root sys tmp usr var
```
You can see some config files directly under the shared directory, and the rootfs under a directory named after the container ID [^2].
[^2]: The reason the guest does not see the top of the shared directory directly is probably that pivot_root is performed.
Since the contents of this directory are shared with the guest, a file created in the guest can be read from the host:
(In container)
```
/ # echo abc > XXX
/ # ls
XXX bin dev etc home mnt proc root sys tmp usr var
```

(In host)
```
# ls /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs
XXX bin dev etc home mnt proc root sys tmp usr var
# cat /run/kata-containers/shared/sandboxes/<container ID>/<container ID>/rootfs/XXX
abc
```
Conversely, a file created on the host can be read from the guest.
One of the goals of virtiofs is high I/O performance, and one of the features serving that goal is DAX. DAX stands for Direct Access, a term often used in the context of non-volatile memory [^dax]. virtiofs's DAX, however, has nothing to do with actual non-volatile memory: it lets guests access host memory without going through the guest's page cache; in other words, (multiple) guests and the host share the host's page cache. Performance improves because no guest-host communication is needed when the data is already in memory. Sharing the page cache also means that data changes are immediately visible to other guests and the host (close to local-file semantics), and that memory usage is reduced.
[^dax]: Accessing the device directly without going through the page cache.
Let me briefly explain how DAX works. To use virtio, a virtio device is added when qemu starts, and the guest recognizes it as a PCI device [^vhost-user-fs]. A PCI device has control registers called BARs that describe the device's memory regions. DAX is realized by mmap'ing file contents into the memory region exposed through a BAR and letting the guest access them directly [^aaa]. Of course, the size of that region is limited, so virtiofs controls which data is mapped to, and unmapped from, which part of the BAR space (called the DAX window). FUSE protocol messages have also been added to request these map/unmap operations.
[^vhost-user-fs]: The guest learns that virtiofs is available because a device called vhost-user-fs-pci is added to qemu's boot options.

[^aaa]: Note that this region looks like a non-volatile device to the guest, so the kernel reuses its DAX code paths.
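Concretely, on a build with the DAX patches, the flow looks roughly like this (the `cache-size` device property and the `dax` mount option follow the development branch's documentation at the time of writing; names may change before everything is merged):

```
# Host: give the device a DAX window by specifying a cache size, e.g.
#   -device vhost-user-fs-pci,chardev=char0,tag=myfs,cache-size=2G

# Guest: mount with dax enabled; 'dax' then shows up in the mount flags
# (as it does in the kata mount output shown earlier).
mount -t virtiofs -o dax myfs /mnt
```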
Note that the virtiofs merged into kernel 5.4 does not yet include DAX support, so if you want to try this feature you can either build the development branch yourself or use kata-container (the virtiofs shipped with kata-container already includes it).
The virtiofs daemon (virtiofsd) being merged into qemu is implemented in C, but other implementations are of course possible; in fact, a daemon implementation in Rust also exists (from the crosvm project) [^crosvm]. Also, although I couldn't find much detailed information, some people already seem to be considering combining virtiofs with SPDK [^snia]. Since the code has been merged upstream, I expect to see more use cases of virtiofs next year.