[DOCKER] Cannot communicate via Linux Bridge

Introduction

While troubleshooting the problem in the title, I found surprisingly few articles that pointed to this cause as the critical one, so I decided to write it up.

Event

I created a Linux Bridge on CentOS 7, attached docker containers to it, and checked communication, but the containers could not communicate.

qiita-bridge-1.png

Conclusion

Linux Bridge is implemented by a kernel module called bridge, but its traffic filtering is handled by a kernel module called br_netfilter (which depends on bridge), and br_netfilter controls communication by consulting the iptables settings. Therefore, communication becomes possible by doing either of the following.

① Disable Bridge Netfilter
② Add a permit rule to iptables

Verification environment

OS: CentOS 7.5
Kernel Ver: 3.10.0-862.14.4.el7.x86_64
docker Ver: 18.06.1-ce

Verification with the default docker0

First, confirm that containers can communicate with each other via docker0, the bridge they are normally attached to when deployed.

Deploy docker container

Run docker run and deploy two containers.

 docker run -d --name cent1 centos/tools:latest /sbin/init
 docker run -d --name cent2 centos/tools:latest /sbin/init

Confirm with the docker ps command that the containers started normally.

 docker ps
CONTAINER ID        IMAGE                 COMMAND             CREATED              STATUS              PORTS               NAMES
8126f9f72ee2        centos/tools:latest   "/sbin/init"        6 seconds ago        Up 3 seconds                            cent2
a957a097b6a5        centos/tools:latest   "/sbin/init"        About a minute ago   Up About a minute                       cent1

Check the assignment status to docker0

First, check how the NICs of the deployed containers map to the NICs of the docker host. The NICs of each container are shown below. From the @if suffix of eth0 you can see the following pairing.

eth0 of cent1 is tied to index 9 of the docker host
eth0 of cent2 is tied to index 11 of the docker host

 docker exec cent1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

 docker exec cent2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

The NICs of the docker host are as follows.

 ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:18 brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
4: vlan10@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:1c:c2:6d:d0 brd ff:ff:ff:ff:ff:ff
9: vethc59a2d1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether f6:1a:1b:00:b9:b5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: vethfee6857@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff link-netnsid 1
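
The pairing can also be looked up mechanically instead of by eye. The following is a minimal sketch, not part of the original procedure: peer_ifindex and ifname_by_index are hypothetical helper names, and they only parse the text format of ip a and ip l shown above.

```shell
# Hypothetical helpers (not from the original article): they only parse the
# text printed by `ip a` / `ip l`, in the format shown above.

# Print the host-side interface index from a line such as "8: eth0@if9: <...>".
peer_ifindex() {
    sed -n 's/^[0-9][0-9]*: eth0@if\([0-9][0-9]*\):.*/\1/p'
}

# Print the name of the interface whose index is $1, e.g. "vethc59a2d1".
ifname_by_index() {
    sed -n "s/^$1: \([^@:]*\).*/\1/p"
}

# Example on the docker host (requires docker and iproute2):
#   idx=$(docker exec cent1 ip a | peer_ifindex)
#   ip l | ifname_by_index "$idx"
```

With the output above, the example would be expected to resolve eth0 of cent1 to the host-side veth with index 9.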

Furthermore, the Linux Bridge information shows that the host-side veths of cent1 and cent2 are assigned to docker0, as shown below.

 brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02421cc26dd0	no		vethc59a2d1
							vethfee6857

The above can be summarized as shown in the picture below.

qiita-bridge-2.png

Communication confirmation via docker0

If you ping from cent1 to cent2 via docker0, communication succeeds as follows.

 docker exec cent1 ping -c 3 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=10.2 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.048 ms
64 bytes from 172.17.0.3: icmp_seq=3 ttl=64 time=0.045 ms

--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.045/3.448/10.252/4.811 ms

Verification with newly created Bridge

Now for the main topic. Create a new Linux Bridge, assign the docker containers to it, and check whether they can communicate as they did via docker0.

Create a new bridge

Create a new bridge named new-bridge1.

 brctl addbr new-bridge1
 brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02421cc26dd0	no		vethc59a2d1
							vethfee6857
new-bridge1		8000.000000000000	no		

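After creating it, bring the bridge up as follows. (The ip l output below already shows the veths with master new-bridge1, so it appears to have been captured after the reassignment steps that follow.)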

 ip l set dev new-bridge1 up
 ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:18 brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
4: vlan10@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:b3:b5:22 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:1c:c2:6d:d0 brd ff:ff:ff:ff:ff:ff
9: vethc59a2d1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master new-bridge1 state UP mode DEFAULT group default 
    link/ether f6:1a:1b:00:b9:b5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: vethfee6857@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master new-bridge1 state UP mode DEFAULT group default 
    link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff link-netnsid 1
12: new-bridge1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 86:45:ea:11:db:35 brd ff:ff:ff:ff:ff:ff

Exclude container NICs from docker0

The NICs of the containers deployed by docker (to be exact, the veths on the docker host side that correspond to the container NICs) are currently assigned to docker0. Remove them from docker0 for this verification.

 brctl delif docker0 vethc59a2d1
 brctl delif docker0 vethfee6857
 brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02421cc26dd0	no		
new-bridge1		8000.000000000000	no		

Assign the container NIC to the created Bridge

Assign the container NIC to the newly created new-bridge1.

 brctl addif new-bridge1 vethc59a2d1
 brctl addif new-bridge1 vethfee6857

 brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02421cc26dd0	no		
new-bridge1		8000.8645ea11db35	no		vethc59a2d1
							vethfee6857

By performing the operations up to this point, the state will be as shown in the picture below.

qiita-bridge-3.png
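
For reference, the brctl steps above can also be written with iproute2 (brctl comes from the separate bridge-utils package). This is a sketch under assumptions, not the procedure used in this article: run and move_to_bridge are hypothetical helper names, and DRY_RUN defaults to 1 so the commands are only printed.

```shell
# Sketch only: iproute2 equivalents of the brctl steps above.
# `run` and `move_to_bridge` are hypothetical helpers, not from the article.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: move_to_bridge BRIDGE VETH...
move_to_bridge() {
    bridge=$1
    shift
    run ip link add name "$bridge" type bridge        # like: brctl addbr
    run ip link set dev "$bridge" up
    for veth in "$@"; do
        run ip link set dev "$veth" nomaster           # like: brctl delif
        run ip link set dev "$veth" master "$bridge"   # like: brctl addif
    done
}

move_to_bridge new-bridge1 vethc59a2d1 vethfee6857
```

To actually apply the changes, the script would be run as root with DRY_RUN=0. The veth names are the ones from this environment and will differ on other hosts.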

Confirmation of communication via the newly created Bridge

Ping from cent1 to cent2 via the newly created new-bridge1, in the same way as with docker0 earlier, to check communication.

 docker exec cent1 ping -c 3 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.

--- 172.17.0.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

This time, unlike the case via docker0 earlier, cent1 and cent2 cannot communicate.

Event investigation

Packet capture

First, run tcpdump on each NIC. Two things can be seen from the results below.

① The ARP request from cent1 reached cent2 normally, and cent1 received the reply.
② The ping reached the Linux Bridge (new-bridge1) but did not reach cent2.

qiita-bridge-4.png

cent1 NIC

 tcpdump -i vethc59a2d1 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethc59a2d1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:39.379638 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 1, length 64
23:20:40.378780 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 2, length 64
23:20:41.378785 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 3, length 64
23:20:44.383711 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383744 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28

cent2 NIC

 tcpdump -i vethfee6857
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethfee6857, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:44.383726 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383741 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28

new-bridge1

 tcpdump -i new-bridge1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on new-bridge1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:20:39.379638 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 1, length 64
23:20:40.378780 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 2, length 64
23:20:41.378785 IP 172.17.0.2 > 172.17.0.3: ICMP echo request, id 45, seq 3, length 64
23:20:44.383711 ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
23:20:44.383741 ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03 (oui Unknown), length 28

Regarding ①, check the ARP cache in each container just in case. The MAC address of the communication partner is recorded correctly.

 docker exec cent1 arp -e
Address                  HWtype  HWaddress           Flags Mask            Iface
172.17.0.3               ether   02:42:ac:11:00:03   C                     eth0
gateway                          (incomplete)                              eth0

 docker exec cent2 arp -e
Address                  HWtype  HWaddress           Flags Mask            Iface
172.17.0.2               ether   02:42:ac:11:00:02   C                     eth0
gateway                          (incomplete)                              eth0
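
The observation that the ping reaches the bridge but not cent2 can also be checked by counting packets rather than reading the captures. The following is a minimal sketch: count_echo_requests is a hypothetical helper, and it assumes the tcpdump text format shown above.

```shell
# Sketch only: count ICMP echo requests in tcpdump text output (stdin).
# `count_echo_requests` is a hypothetical helper, not from the article.
count_echo_requests() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at success in that case.
    grep -c 'ICMP echo request' || true
}

# Example on the docker host, as root, while the ping is running
# (timeout stops each capture after 5 seconds):
#   timeout 5 tcpdump -l -n -i vethc59a2d1 icmp 2>/dev/null | count_echo_requests
#   timeout 5 tcpdump -l -n -i vethfee6857 icmp 2>/dev/null | count_echo_requests
```

A non-zero count on the veth of cent1 together with 0 on the veth of cent2 would match what the captures above show.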

Why this happens

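Linux Bridge is implemented by a kernel module called bridge, but its traffic filtering appears to be handled by a kernel module called br_netfilter (which depends on bridge), and br_netfilter controls communication by consulting the iptables settings. On this host the iptables FORWARD chain has policy DROP and none of its rules matches new-bridge1 (see the iptables listings later in this article), so communication via the new bridge is not allowed, and this problem occurs.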

$ lsmod | grep br_netfilter
br_netfilter           24576  0
bridge                155648  1 br_netfilter
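
In addition to lsmod, the switches that br_netfilter exposes can be read directly. This is a minimal sketch, assuming they appear under /proc/sys/net/bridge while the module is loaded; bridge_nf_state is a hypothetical helper, and its optional directory argument exists only so the function can be tried on test data.

```shell
# Sketch only: report the bridge-netfilter switches exposed by br_netfilter.
# `bridge_nf_state` is a hypothetical helper; its optional argument (the
# directory to read) exists only so the function can be tried on test data.
bridge_nf_state() {
    dir=${1:-/proc/sys/net/bridge}
    if [ ! -d "$dir" ]; then
        echo "no $dir: br_netfilter is probably not loaded"
        return 0
    fi
    for key in bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-call-arptables; do
        if [ -r "$dir/$key" ]; then
            echo "net.bridge.$key = $(cat "$dir/$key")"
        fi
    done
}

bridge_nf_state
```

A value of 1 for bridge-nf-call-iptables means bridged IPv4 traffic is passed to iptables, which is the situation described in this section.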

Solution

Communication between containers will be possible by either of the following measures.

Part 1 Disable Bridge Netfilter

Bridge Netfilter, which normally controls Linux Bridge communication, is enabled. Communication becomes possible by intentionally disabling it. Bridge Netfilter is enabled or disabled with the kernel parameter net.bridge.bridge-nf-call-iptables.

Check the status of Bridge Netfilter. Currently it is set to 1, which means it is enabled.

 sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1

Change the setting. Set net.bridge.bridge-nf-call-iptables = 0 in /etc/sysctl.conf and apply the setting.

 cat /etc/sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.bridge.bridge-nf-call-iptables = 0

 sysctl -p 
net.bridge.bridge-nf-call-iptables = 0

 sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 0
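
As the comments in /etc/sysctl.conf suggest, the setting can also live in a drop-in file under /etc/sysctl.d/. The following is a sketch under assumptions, not the method used above: the file name 99-bridge-nf.conf and the helper write_bridge_nf_dropin are made up for illustration.

```shell
# Sketch only: persist the setting in a drop-in file instead of editing
# /etc/sysctl.conf. The file name 99-bridge-nf.conf and the helper name
# write_bridge_nf_dropin are assumptions, not from the article.
write_bridge_nf_dropin() {
    dir=${1:-/etc/sysctl.d}
    value=${2:-0}
    printf 'net.bridge.bridge-nf-call-iptables = %s\n' "$value" \
        > "$dir/99-bridge-nf.conf"
    # Print the path that was written so the caller can inspect it.
    echo "$dir/99-bridge-nf.conf"
}

# On the docker host, as root:
#   write_bridge_nf_dropin        # writes /etc/sysctl.d/99-bridge-nf.conf
#   sysctl --system               # reload all sysctl configuration files
```

Note that the net.bridge.* keys exist only while br_netfilter is loaded, so a setting applied at boot may not take effect if the module is loaded later.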

Part 2 Set permissions on iptables

Bridge Netfilter consults iptables to control communication. Therefore, adding a permit rule to iptables as follows allows communication via the Linux Bridge. Here, the rule uses the packet matching module physdev (specified with -m), which matches on the input and output ports of a bridge, so that all communication via a bridge is allowed.

iptables - Description of system administration commands - List of Linux commands https://kazmax.zpp.jp/cmd/i/iptables.8.html

 iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT

 iptables -nvL --line-number
Chain INPUT (policy ACCEPT 52 packets, 3250 bytes)
num   pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-is-bridged
2     2006 2508K DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
3     2006 2508K DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
4     1126 2451K ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
5       46  5840 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
6      834 51247 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
7       46  5840 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           

 (Omitted)
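
If allowing every bridge is broader than needed, a rule limited to one bridge is another option. This is a sketch, not something verified in this article: it assumes that -i and -o match the bridge name for bridged traffic, as the docker0 to docker0 ACCEPT rule in the listing above suggests. allow_intra_bridge and run are hypothetical helpers, and DRY_RUN defaults to 1 so the command is only printed.

```shell
# Sketch only: allow forwarding just within one bridge, mirroring the
# "docker0 -> docker0 ACCEPT" rule that appears in the FORWARD chain above.
# `allow_intra_bridge` and `run` are hypothetical helpers; with DRY_RUN=1
# (the default here) the iptables command is printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

allow_intra_bridge() {
    br=$1
    # When really applying, -C checks whether the rule already exists,
    # so that repeated runs do not insert duplicates.
    if [ "${DRY_RUN:-1}" != "1" ] && iptables -C FORWARD -i "$br" -o "$br" -j ACCEPT 2>/dev/null; then
        echo "rule for $br already present"
        return 0
    fi
    run iptables -I FORWARD -i "$br" -o "$br" -j ACCEPT
}

allow_intra_bridge new-bridge1
```

To apply the rule for real, run it as root with DRY_RUN=0.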

Kubernetes requires net.bridge.bridge-nf-call-iptables = 1. I ran into this problem while doing the following verification on Kubernetes, so I handled it by adding the iptables rule.

Play with Multus https://rheb.hatenablog.com/entry/multus_introduction

Why is communication via docker0 possible?

At this point, one question arises: "Why can containers communicate via docker0 even though it is the same kind of Linux Bridge?" The answer lies in the iptables settings. Docker appears to add the rules it needs to iptables at installation and when a docker network is created. If you check the iptables settings, you can see that communication from docker0 to the outside and communication within docker0 are ACCEPTed by rules 5 and 6 of the FORWARD chain. In addition, docker also configures NAT in iptables.

 iptables -nvL --line-number
Chain INPUT (policy ACCEPT 228K packets, 579M bytes)
num   pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy DROP 12 packets, 1008 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1     9003   12M DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
2     9003   12M DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
3     5650   12M ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
4        0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
5     3341  191K ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
6        0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 130K packets, 7700K bytes)
num   pkts bytes target     prot opt in     out     source               destination         

Chain DOCKER (1 references)
num   pkts bytes target     prot opt in     out     source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1     3341  191K DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
2     9003   12M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0           
2     3341  191K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1     9003   12M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
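
The difference between the two bridges can be seen quickly by filtering the FORWARD rules by bridge name. The following is a minimal sketch: rules_for_bridge is a hypothetical helper that reads iptables -S FORWARD output on stdin.

```shell
# Sketch only: list the FORWARD rules that mention a given bridge, reading
# `iptables -S FORWARD` output on stdin. `rules_for_bridge` is a
# hypothetical helper, not from the article.
rules_for_bridge() {
    # -w matches whole words only, -F treats the name as a fixed string.
    grep -w -F -e "$1" || echo "no FORWARD rule mentions $1"
}

# Example on the docker host, as root:
#   iptables -S FORWARD | rules_for_bridge docker0
#   iptables -S FORWARD | rules_for_bridge new-bridge1
```

With the rule set shown above, docker0 would be expected to list its ACCEPT rules, while new-bridge1 would list none, which is consistent with the FORWARD policy DROP applying to it.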

Summary

I assumed that a Linux Bridge works only at L2, so I thought communication would just work without any special care, but that was a mistake. A Linux Bridge also has L3-like behavior, including the ability to have an IP address assigned. This time I experimented with containers, but I expect that if a similar problem occurs with KVM virtual machines attached to a bridge, the measures in this article can solve it. (Not verified)

Thank you

In carrying out this investigation, I received a great deal of cooperation and advice from the people around me. I would like to take this opportunity to thank them.

Reference

KVM bridge network settings https://qiita.com/TsutomuNakamura/items/e15d2c8c02586a7ae572#bridge-%E3%83%88%E3%83%A9%E3%83%95%E3%82%A3%E3%83%83%E3%82%AF%E3%81%AEnetfilter-%E3%82%92%E7%84%A1%E5%8A%B9%E5%8C%96%E3%81%99%E3%82%8B

11.2. Bridge network with libvirt https://docs.fedoraproject.org/ja-JP/Fedora/13/html/Virtualization_Guide/sect-Virtualization-Network_Configuration-Bridged_networking_with_libvirt.html
