I wanted to do something about persistent storage on my home Raspberry Pi Kubernetes cluster, so I had a go at Ceph. It turned into quite a lot of work, so I'm writing it down.
The setup is simple: a single Kubernetes cluster. One node joins over Wi-Fi with a poor-quality link and doubles as an IoT experiment.
* Raspberry Pi 4B (4GB RAM) x 5 (1 master node, 4 worker nodes)
* Boot from USB SSD (250GB)
* One unit is remote and is used as a surveillance camera (not used this time)
* OS is Raspberry Pi OS 64-bit (beta)
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
chino Ready master 8d v1.19.2 10.0.0.1 <none> Debian GNU/Linux 10 (buster) 5.4.51-v8+ docker://19.3.13
chiya Ready worker 8d v1.19.2 10.0.0.5 <none> Debian GNU/Linux 10 (buster) 5.4.51-v8+ docker://19.3.13
cocoa Ready worker 46h v1.19.2 10.0.0.2 <none> Debian GNU/Linux 10 (buster) 5.4.51-v8+ docker://19.3.13
rize Ready worker 2d2h v1.19.2 10.0.0.3 <none> Debian GNU/Linux 10 (buster) 5.4.51-v8+ docker://19.3.13
syaro Ready worker 42h v1.19.2 10.0.0.4 <none> Debian GNU/Linux 10 (buster) 5.4.51-v8+ docker://19.3.13
# uname -a
Linux chino 5.4.51-v8+ #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020 aarch64 GNU/Linux
Create a new partition on each worker node's SSD. In my setup, every node runs solely off a 250GB SSD attached via USB boot, so for this work I booted Raspberry Pi OS from a microSD card instead.
# e2fsck -f /dev/sda2
# resize2fs /dev/sda2 100G
# fdisk /dev/sda
Device Boot Start End Sectors Size Id Type
/dev/sda1 8192 532479 524288 256M c W95 FAT32 (LBA)
/dev/sda2 532480 488397167 487864688 233G 83 Linux
* Press "p" to display partition information. "/"partition(/dev/sda2)Check the start position of.
* Use "d" to delete the second partition.
* Create the second partition again from the same start position. The end position is "+Specified by "100G".
* Save the partition change with "w".
# partprobe
# e2fsck -f /dev/sda2
* If an error occurs here, exit with "Ctrl+C" without repairing; re-creating /dev/sda2 with parted instead may work.
# fdisk /dev/sda
* With "n", all the rest/dev/Create sda3. The partition type is 8e "Linux LVM".
# partprobe
# fdisk -l /dev/sda
Device Boot Start End Sectors Size Id Type
/dev/sda1 8192 532479 524288 256M c W95 FAT32 (LBA)
/dev/sda2 532480 210247679 209715200 100G 83 Linux
/dev/sda3 210247680 488397167 278149488 132.6G 8e Linux LVM
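For reference, the same layout could in principle be written non-interactively with sfdisk instead of stepping through fdisk. This is only a sketch using the sector numbers from my table above, and it rewrites the whole partition table, so don't copy it blindly onto a different disk:
# e2fsck -f /dev/sda2
# resize2fs /dev/sda2 100G
# sfdisk /dev/sda << 'EOF'
label: dos
/dev/sda1 : start=8192, size=524288, type=c
/dev/sda2 : start=532480, size=209715200, type=83
/dev/sda3 : start=210247680, type=8e
EOF
# partprobe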
I did this on three of the worker nodes. I didn't want to stop any services, so I drained each node from Kubernetes one at a time and rejoined it afterwards. The worker node in another room (chiya), which joins the cluster through a wireless Ethernet converter, is not used here.
Also, if your power supply can handle it, attaching a second USB SSD instead avoids the risk of breaking things with resize2fs, and the work goes faster. ... This time resize2fs failed on me and I ended up rebuilding two worker nodes.
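The drain/rejoin cycle itself is just the usual kubectl routine; per node it was roughly the following (cocoa shown as an example, flags as of v1.19):
# kubectl drain cocoa --ignore-daemonsets --delete-local-data
(repartition that node's SSD as described above, then boot it back into the cluster)
# kubectl uncordon cocoa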
I followed the article "I put Ceph into Kubernetes on Raspberry Pi". Ceph's documentation URLs have changed, so a lot of Ceph-related information out there is full of dead links, which makes having this kind of write-up in Japanese very helpful.
The following is the work on the master node. I just typed the password whenever a command asked for it.
# mkdir ceph-deploy
# cd ceph-deploy
# pip install ceph-deploy
# ceph-deploy --username pi new cocoa rize syaro
# ceph-deploy --username pi install cocoa rize syaro
# ceph-deploy install chino
# ceph-deploy --username pi mon create-initial
# ceph-deploy --username pi admin cocoa rize syaro
# ceph-deploy admin chino
# ceph-deploy --username pi osd create cocoa --data /dev/sda3
# ceph-deploy --username pi osd create rize --data /dev/sda3
# ceph-deploy --username pi osd create syaro --data /dev/sda3
# ceph-deploy --username pi mds create cocoa
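This isn't in my notes, but once the admin keyring is on the master it doesn't hurt to confirm the cluster is healthy before moving on, for example:
# ceph -s
# ceph osd tree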
Creating a pool.
# ceph osd pool create kubernetes 128
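The Ceph documentation for Kubernetes also has you initialize the new pool for RBD right after creating it; if you follow that, the step is:
# rbd pool init kubernetes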
Creating a ConfigMap.
# ceph mon dump
dumped monmap epoch 2
epoch 2
fsid 56b03534-e602-4856-8734-8bcdf5cc8670
last_changed 2020-09-20 01:24:55.648189
created 2020-09-20 01:24:08.811926
0: 10.0.0.2:6789/0 mon.cocoa
1: 10.0.0.3:6789/0 mon.rize
2: 10.0.0.4:6789/0 mon.syaro
* The created yaml is as follows.
# cat csi-config-map.yaml
apiVersion: v1
kind: ConfigMap
data:
config.json: |-
[
{
"clusterID": "56b03534-e602-4856-8734-8bcdf5cc8670",
"monitors": [
"10.0.0.2:6789",
"10.0.0.3:6789",
"10.0.0.4:6789"
]
}
]
metadata:
name: ceph-csi-config
# kubectl apply -f csi-config-map.yaml
Secret creation.
# ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
[client.kubernetes]
key = AQBrNmZfVCowLBAAeN3EYjhOPBG9442g4NF/bQ==
* The created yaml is as follows.
# cat csi-rbd-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: csi-rbd-secret
namespace: default
stringData:
userID: kubernetes
userKey: AQBrNmZfVCowLBAAeN3EYjhOPBG9442g4NF/bQ==
# kubectl apply -f csi-rbd-secret.yaml
Creating a ConfigMap (empty).
# cat kms-config.yaml
apiVersion: v1
kind: ConfigMap
data:
config.json: |-
{
}
metadata:
name: ceph-csi-encryption-kms-config
# kubectl apply -f kms-config.yaml
Perhaps "# kubectl create configmap ceph-csi-encryption-kms-config" is fine.
# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml
# kubectl apply -f csi-provisioner-rbac.yaml
# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-nodeplugin-rbac.yaml
# kubectl apply -f csi-nodeplugin-rbac.yaml
From here on I deviate slightly from the procedure on the reference site. It would be better not to do this work on the master node, but I did it there anyway. First, get the csi yaml files and change the cephcsi image to the arm64 one.
# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
# sed -i -e 's/quay.io\/cephcsi\/cephcsi:canary/quay.io\/cephcsi\/cephcsi:canary-arm64/g' csi-rbdplugin-provisioner.yaml
# wget https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin.yaml
# sed -i -e 's/quay.io\/cephcsi\/cephcsi:canary/quay.io\/cephcsi\/cephcsi:canary-arm64/g' csi-rbdplugin.yaml
Check the versions of the container images referenced by the image: lines.
# grep image: csi-rbdplugin-provisioner.yaml csi-rbdplugin.yaml
csi-rbdplugin-provisioner.yaml: image: quay.io/k8scsi/csi-provisioner:v1.6.0
csi-rbdplugin-provisioner.yaml: image: quay.io/k8scsi/csi-snapshotter:v2.1.0
csi-rbdplugin-provisioner.yaml: image: quay.io/k8scsi/csi-attacher:v2.1.1
csi-rbdplugin-provisioner.yaml: image: quay.io/k8scsi/csi-resizer:v0.5.0
csi-rbdplugin-provisioner.yaml: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin-provisioner.yaml: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin.yaml: image: quay.io/k8scsi/csi-node-driver-registrar:v1.3.0
csi-rbdplugin.yaml: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin.yaml: image: quay.io/cephcsi/cephcsi:canary-arm64
# git clone --single-branch --branch release-1.6 https://github.com/kubernetes-csi/external-provisioner
# git clone --single-branch --branch release-2.1 https://github.com/kubernetes-csi/external-snapshotter
# git clone --single-branch --branch release-2.1 https://github.com/kubernetes-csi/external-attacher
# git clone --single-branch --branch release-0.5 https://github.com/kubernetes-csi/external-resizer
# git clone --single-branch --branch release-1.3 https://github.com/kubernetes-csi/node-driver-registrar
Download Go ( https://golang.org/dl/ ). As mentioned earlier, the Makefiles of the versions checked out above seem to assume go 1.13, but I went ahead with a newer release.
# wget https://dl.google.com/go/go1.15.1.linux-arm64.tar.gz
# tar xzvf go1.15.1.linux-arm64.tar.gz
# rm go1.15.1.linux-arm64.tar.gz
# echo 'export GOPATH=$HOME/go' >> ~/.bashrc
# echo 'export PATH=$GOPATH/bin:$PATH' >> ~/.bashrc
# source ~/.bashrc
# go version
go version go1.15.1 linux/arm64
I think it depends on the generation, but in my heart, go is read as "Ngo".
# cd external-provisioner
# make
# docker build -t csi-provisioner:v1.6.0 .
# cd ../external-snapshotter
# make
# cp cmd/csi-snapshotter/Dockerfile .
# docker build -t csi-snapshotter:v2.1.0 .
# cd ../external-attacher
# make
# docker build -t csi-attacher:v2.1.0 .
# cd ../external-resizer
# make
# docker build -t csi-resizer:v0.5.0 .
# cd ../node-driver-registrar
# make
# docker build -t csi-node-driver-registrar:v1.3.0 .
Only the snapshotter gave me trouble with its Dockerfile ... the intended build procedure is probably different, but I forced an image out anyway. The images end up in the local Docker daemon, so save them and load them into Docker on the worker nodes.
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
csi-node-driver-registrar v1.3.0 a6e114649cf9 33 hours ago 14.2MB
csi-resizer v0.5.0 d6b561c2aa0a 34 hours ago 41.6MB
csi-attacher v2.1.0 807c3900bf76 34 hours ago 41.7MB
csi-snapshotter v2.1.0 653dbf034d1d 34 hours ago 41.8MB
csi-provisioner v1.6.0 4b18fda1685c 34 hours ago 43.4MB
(The following is omitted)
# docker save csi-node-driver-registrar:v1.3.0 -o csi-node-driver-registrar.tar
# docker save csi-resizer:v0.5.0 -o csi-resizer.tar
# docker save csi-attacher:v2.1.0 -o csi-attacher.tar
# docker save csi-snapshotter:v2.1.0 -o csi-snapshotter.tar
# docker save csi-provisioner:v1.6.0 -o csi-provisioner.tar
Copy the tar files to the worker nodes with scp or sftp; a one-liner sketch of the copy is below. Just in case, I also put the images on the worker node I'm not using. After copying, run the docker load commands that follow on each worker node.
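(A minimal sketch of the copy, assuming the pi user and the hostnames from earlier; adjust to taste.)
# for h in cocoa rize syaro chiya; do scp csi-*.tar pi@$h: ; done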
# docker load -i csi-node-driver-registrar.tar
# docker load -i csi-resizer.tar
# docker load -i csi-attacher.tar
# docker load -i csi-snapshotter.tar
# docker load -i csi-provisioner.tar
Now, back to the ceph-deploy working directory. Change the "image:" entries in "csi-rbdplugin-provisioner.yaml" and "csi-rbdplugin.yaml" to the repository:tag of the images built above, so that they end up as follows.
# grep -n image: csi-rbdplugin-provisioner.yaml csi-rbdplugin.yaml
csi-rbdplugin-provisioner.yaml:35: image: csi-provisioner:v1.6.0
csi-rbdplugin-provisioner.yaml:52: image: csi-snapshotter:v2.1.0
csi-rbdplugin-provisioner.yaml:68: image: csi-attacher:v2.1.0
csi-rbdplugin-provisioner.yaml:82: image: csi-resizer:v0.5.0
csi-rbdplugin-provisioner.yaml:102: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin-provisioner.yaml:142: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin.yaml:28: image: csi-node-driver-registrar:v1.3.0
csi-rbdplugin.yaml:50: image: quay.io/cephcsi/cephcsi:canary-arm64
csi-rbdplugin.yaml:102: image: quay.io/cephcsi/cephcsi:canary-arm64
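However you make those edits, the result should match the grep above; with sed it would be roughly the following (illustrative only, so match it to the tags you actually built):
# sed -i \
  -e 's|quay.io/k8scsi/csi-provisioner:v1.6.0|csi-provisioner:v1.6.0|' \
  -e 's|quay.io/k8scsi/csi-snapshotter:v2.1.0|csi-snapshotter:v2.1.0|' \
  -e 's|quay.io/k8scsi/csi-attacher:v2.1.1|csi-attacher:v2.1.0|' \
  -e 's|quay.io/k8scsi/csi-resizer:v0.5.0|csi-resizer:v0.5.0|' \
  csi-rbdplugin-provisioner.yaml
# sed -i -e 's|quay.io/k8scsi/csi-node-driver-registrar:v1.3.0|csi-node-driver-registrar:v1.3.0|' csi-rbdplugin.yaml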
Now let's deploy. (Only the successful runs are shown here, mind you ...)
# kubectl apply -f csi-rbdplugin-provisioner.yaml
# kubectl apply -f csi-rbdplugin.yaml
# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-rbdplugin-hm9bm 3/3 Running 3 24h 10.0.0.4 syaro <none> <none>
csi-rbdplugin-provisioner-54dd99dd97-f9x2s 6/6 Running 6 24h 172.16.4.40 syaro <none> <none>
csi-rbdplugin-provisioner-54dd99dd97-flbh9 6/6 Running 0 24h 172.16.2.28 cocoa <none> <none>
csi-rbdplugin-provisioner-54dd99dd97-s9qf4 6/6 Running 0 24h 172.16.3.54 rize <none> <none>
csi-rbdplugin-t7569 3/3 Running 0 24h 10.0.0.3 rize <none> <none>
csi-rbdplugin-x4fzk 3/3 Running 3 24h 10.0.0.5 chiya <none> <none>
csi-rbdplugin-xwrnx 3/3 Running 0 24h 10.0.0.2 cocoa <none> <none>
I recommend keeping "# kubectl get events -w" running while you deploy. Next, create a StorageClass. Leaving everything in the default namespace, as I do here, is probably not ideal.
# cat csi-rbd-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
clusterID: "56b03534-e602-4856-8734-8bcdf5cc8670"
pool: kubernetes
csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
csi.storage.k8s.io/provisioner-secret-namespace: default
csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
mountOptions:
- discard
# kubectl apply -f csi-rbd-sc.yaml
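For reference, a standalone PVC using this StorageClass would look something like the sketch below (the name is just an example; the StatefulSet later in this post uses a volumeClaimTemplate instead):
# cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF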
After that, the application side only needs to specify "csi-rbd-sc" as the storageClassName when creating a PVC, as in the sketch above. But dynamic provisioning did not work right away ... I couldn't verify things with the raw-block PVC definition from the reference site, and when I removed "volumeMode: Block", the following events were output.
default 0s Warning FailedMount pod/es-cluster-0 MountVolume.MountDevice failed for volume "pvc- ...
...... rbd error output: modinfo: ERROR: Module rbd not found.
modprobe: FATAL: Module rbd not found in directory /lib/modules/5.4.51-v8+
rbd: failed to load rbd kernel module (1)
rbd: sysfs write failed
rbd: map failed: (2) No such file or directory
default 0s Warning FailedMount pod/es-cluster-0 Unable to attach or mount volumes: unmounted.....
Going back to the root cause: we need a kernel module called "rbd", and in Raspberry Pi OS 64-bit (beta) the rbd module simply isn't built. In other words, you have to compile the kernel.
Why the kernel module is needed even though the rbd command works from each client is explained in "[Deep Dive Into Ceph's Kernel Client](https://engineering.salesforce.com/deep-dive-into-cephs-kernel-client-edea75787528)", which I (about half) understood: the command can go through a library, but in the container environment the kernel module appears to be required.
Compiling the Linux kernel used to be an everyday task. Back when kernel modules didn't exist yet, there was a limit on how big a kernel could be loaded, so everyone compiled a kernel trimmed to just barely fit their machine, and you recompiled every time you added a new device.
So I've done this before, but how does it go these days? That's the spirit in which I worked. The official manual is "Kernel building", but since this is still the 64-bit beta, I referred to [a forum post by someone who had actually done it](https://www.raspberrypi.org/forums/viewtopic.php?t=280341). The work took about an hour.
# git clone --depth=1 -b rpi-5.4.y https://github.com/raspberrypi/linux.git
# cd linux/
# make bcm2711_defconfig
# vi .config
(Add "CONFIG_BLK_DEV_RBD=m" in the appropriate place.)
# grep RBD .config
CONFIG_BLK_DEV_DRBD=m
# CONFIG_DRBD_FAULT_INJECTION is not set
CONFIG_BLK_DEV_RBD=m
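(Instead of hand-editing .config, the kernel tree's scripts/config helper should be able to make the same change; I edited by hand, so the following is untested. Note that "make olddefconfig" takes the defaults for any newly exposed options rather than asking, which is not quite what I did below.)
# ./scripts/config --module BLK_DEV_RBD
# make olddefconfig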
# make -j4
(Questions about CephFS and related options come up as a consequence of enabling RBD; I didn't really understand them, so I answered "N" to all of them.)
# make modules_install
# make dtbs_install
# cp /boot/kernel8.img /boot/kernel8_old.img
# cp arch/arm64/boot/Image /boot/kernel8.img
# vi /boot/config.txt
(In the forum post these lines have a "#" in front of them ... but if they were really meant as comments there would be no point writing them, so I removed the "#".)
# head -n 10 /boot/config.txt
# For more options and information see
# http://rpf.io/configtxt
# Some settings may impact device functionality. See link above for details
device_tree=dtbs/5.4.65-v8+/broadcom/bcm2711-rpi-4-b.dtb
overlay_prefix=dtbs/5.4.65-v8+/overlays/
kernel=kernel8.img
# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1
In the old days I would reboot while holding my breath, but this time I just go for it. I'm confident I can recover ... (though I did keep a ping running from another machine the whole time.)
# reboot
Let's check everything. Since this is now a custom kernel, you can no longer get paid support. (Not that there was any to begin with.)
# uname -a
Linux syaro 5.4.65-v8+ #1 SMP PREEMPT Mon Sep 21 01:04:01 JST 2020 aarch64 GNU/Linux
# modprobe rbd
# lsmod | grep rbd
rbd 98304 0
libceph 278528 1 rbd
# modinfo rbd
filename: /lib/modules/5.4.65-v8+/kernel/drivers/block/rbd.ko
license: GPL
description: RADOS Block Device (RBD) driver
author: Jeff Garzik <[email protected]>
author: Yehuda Sadeh <[email protected]>
author: Sage Weil <[email protected]>
author: Alex Elder <[email protected]>
srcversion: BC90D52477A5CE4593C5AC3
depends: libceph
intree: Y
name: rbd
vermagic: 5.4.65-v8+ SMP preempt mod_unload modversions aarch64
parm: single_major:Use a single major number for all rbd devices (default: true) (bool)
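Probably not strictly necessary, since the csi plugin runs modprobe itself (that is what the earlier error was about), but to have rbd loaded at boot anyway, something like this should work on a Debian-based system such as Raspberry Pi OS:
# echo rbd > /etc/modules-load.d/rbd.conf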
Now, let's retry the deployment that failed once because the rbd kernel module was missing.
# cat es_master_sts.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: es-cluster
spec:
selector:
matchLabels:
app: es
serviceName: "es-cluster"
replicas: 1
template:
metadata:
labels:
app: es
spec:
containers:
- name: es
image: elasticsearch:7.9.1
env:
- name: discovery.type
value: single-node
ports:
- containerPort: 9200
name: api
- containerPort: 9300
name: gossip
volumeMounts:
- name: data
mountPath: /usr/share/elasticsearch/data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: csi-rbd-sc
# kubectl apply -f es_master_sts.yaml
Check the status of the pod.
# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
csi-rbdplugin-hm9bm 3/3 Running 3 25h
csi-rbdplugin-provisioner-54dd99dd97-f9x2s 6/6 Running 6 25h
csi-rbdplugin-provisioner-54dd99dd97-flbh9 6/6 Running 0 25h
csi-rbdplugin-provisioner-54dd99dd97-s9qf4 6/6 Running 0 25h
csi-rbdplugin-t7569 3/3 Running 0 25h
csi-rbdplugin-x4fzk 3/3 Running 3 25h
csi-rbdplugin-xwrnx 3/3 Running 0 25h
es-cluster-0 0/1 Pending 0 0s
es-cluster-0 0/1 Pending 0 0s
es-cluster-0 0/1 Pending 0 2s
es-cluster-0 0/1 ContainerCreating 0 2s
es-cluster-0 1/1 Running 0 8s
Event monitoring.
# kubectl get events -w
LAST SEEN TYPE REASON OBJECT MESSAGE
0s Normal SuccessfulCreate statefulset/es-cluster create Claim data-es-cluster-0 Pod es-cluster-0 in StatefulSet es-cluster success
0s Normal ExternalProvisioning persistentvolumeclaim/data-es-cluster-0 waiting for a volume to be created, either by external provisioner "rbd.csi.ceph.com" or manually created by system administrator
0s Normal Provisioning persistentvolumeclaim/data-es-cluster-0 External provisioner is provisioning volume for claim "default/data-es-cluster-0"
0s Normal SuccessfulCreate statefulset/es-cluster create Pod es-cluster-0 in StatefulSet es-cluster successful
0s Warning FailedScheduling pod/es-cluster-0 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
0s Warning FailedScheduling pod/es-cluster-0 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
0s Normal ProvisioningSucceeded persistentvolumeclaim/data-es-cluster-0 Successfully provisioned volume pvc-1c1abfad-87fa-4882-a840-8449c6d50326
0s Normal Scheduled pod/es-cluster-0 Successfully assigned default/es-cluster-0 to syaro
0s Normal SuccessfulAttachVolume pod/es-cluster-0 AttachVolume.Attach succeeded for volume "pvc-1c1abfad-87fa-4882-a840-8449c6d50326"
0s Normal Pulled pod/es-cluster-0 Container image "elasticsearch:7.9.1" already present on machine
0s Normal Created pod/es-cluster-0 Created container es
0s Normal Started pod/es-cluster-0 Started container es
Storage related confirmation.
# kubectl get sc,pv,pvc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/csi-rbd-sc rbd.csi.ceph.com Delete Immediate false 25h
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-1c1abfad-87fa-4882-a840-8449c6d50326 1Gi RWO Delete Bound default/data-es-cluster-0 csi-rbd-sc 78s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-es-cluster-0 Bound pvc-1c1abfad-87fa-4882-a840-8449c6d50326 1Gi RWO csi-rbd-sc 78s
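As a quick persistence check, something along these lines does the job: create a file inside the volume, delete the pod, and look again once the StatefulSet has recreated it (the file name is just an example; the data path is the one mounted in the manifest above):
# kubectl exec es-cluster-0 -- touch /usr/share/elasticsearch/data/hello.txt
# kubectl delete pod es-cluster-0
(wait for es-cluster-0 to come back to Running)
# kubectl exec es-cluster-0 -- ls /usr/share/elasticsearch/data/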
Sure enough, the file created in the volume was still there after deleting and redeploying the pod. (Of course.) Not big enough to deserve its own section, but a word on performance: hardware was never my strong point and I don't have much know-how here, but these are the results of a simple write test with dd.
# dd if=/dev/zero of=aaa bs=4096 count=100000
| | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| dd to the worker's disk on the worker's OS | 186 MB/s | 133 MB/s | 141 MB/s | 140 MB/s | 133 MB/s |
| dd from a container to a hostPath directory | 120 MB/s | 121 MB/s | 134 MB/s | 131 MB/s | 131 MB/s |
| ceph-csi | 186 MB/s | 185 MB/s | 174 MB/s | 178 MB/s | 180 MB/s |
It doesn't show up in the ceph-csi numbers above, but it sometimes drops to around 50 MB/s and commands can be slow to return, so it feels peaky. I wonder what's affecting it ...
If anything, I suspect the Rook (operator) based setup could also be made to work by swapping in different container images. That would be good study material for OpenShift Container Storage ... (though it would be even heavier ...). I referred to materials from many people this time, and was surprised to come across slides by Mr. Yuryu from his Red Hat days: "Was he already doing Ceph back then?" He is someone I admire personally and have been indebted to since the "Linux Kernel Updates" days.
The final performance numbers leave a bit of a question mark ... Well, if performance were the point I wouldn't be using Raspberry Pis, and what I really wanted was CSI as a capability, so I'm satisfied with the result. At last, pods that use persistent storage can be made redundant without resorting to hostPath.