Analysis of the Kubernetes mainstream network scheme Flannel

I Brief description of Flannel

Flannel is one of the CNI network plugins for a Kubernetes cluster. It is essentially an overlay network. Flannel supports several forwarding backends, such as vxlan and host-gw.

II Characteristics of the Flannel network

  • It gives containers created on different Node hosts in the cluster unique virtual IP addresses across the whole cluster.
  • It is an overlay network, through which data packets are delivered to the target container intact. An overlay network is a virtual network built on top of another network and carried by its infrastructure; it decouples the network service from the underlying infrastructure by encapsulating one packet inside another. After the encapsulated packet is forwarded to the endpoint, it is decapsulated.
  • It creates a new virtual network interface, flannel.1 (flannel0 in the older UDP backend), to receive traffic from the docker/cni bridge, and it encapsulates and forwards (vxlan) the received data by maintaining a routing table.
  • Etcd keeps the flanneld configuration on all nodes consistent. At the same time, flanneld on each node watches for data changes in etcd and senses changes of cluster nodes in real time.
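For reference, when flannel is configured with the etcd backend, the network configuration and the per-node subnet leases live under an etcd prefix (by default /coreos.com/network). The commands and output below are only a sketch; endpoints, prefix, and etcdctl version depend on the deployment:

#etcdctl get /coreos.com/network/config
{"Network": "172.20.0.0/16", "Backend": {"Type": "vxlan"}}
#etcdctl get --prefix --keys-only /coreos.com/network/subnets
/coreos.com/network/subnets/172.20.0.0-24
/coreos.com/network/subnets/172.20.1.0-24
/coreos.com/network/subnets/172.20.2.0-24

When flanneld uses the Kubernetes API as its subnet manager instead of etcd, the same information is kept on the Node objects (spec.podCIDR and the flannel.alpha.coreos.com annotations).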

III Explanation of each component

  • cni0
    Bridge device: each time a Pod is created, a veth pair is created; one end of the pair is eth0 inside the Pod, and the other end is a port (interface) on the cni0 bridge. Traffic sent by the Pod from its eth0 interface is delivered to the corresponding port (interface) on the cni0 bridge device.

  • The IP address of the cni0 device is the first address of the subnet assigned to the node

  • flannel.1
    The overlay network device, used to process vxlan packets (encapsulation and decapsulation). Pod traffic between different nodes is sent from this overlay device to the remote end through the tunnel.

  • flanneld
    flanneld runs as an agent on each host. It obtains a small subnet for the host from the cluster's network address space, and the IP addresses of all containers on that host are allocated from it. At the same time, flanneld watches the k8s cluster datastore and provides the flannel.1 device with the MAC, IP, and other network information it needs when encapsulating data.
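A quick way to see the subnet that flanneld has leased to the local host is the subnet.env file it writes for the CNI plugin. The path is the usual one for a vxlan deployment, and the values below are illustrative (they would correspond to the 10.19.114.101 node in this example):

#cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.20.0.0/16
FLANNEL_SUBNET=172.20.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

FLANNEL_SUBNET is the per-node /24 from which cni0 hands out Pod addresses, and FLANNEL_MTU is 50 bytes lower than the physical MTU to leave room for the vxlan headers.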

IV Communication flow between Pods on different nodes

1) The data is generated in the Pod and sent to cni0 according to the Pod's routing table.
2) cni0 sends the data to the tunnel device flannel.1 according to the node's routing table.
3) flannel.1 checks the destination IP of the packet, obtains the necessary information about the remote tunnel device from flanneld, and encapsulates the packet.
4) flannel.1 sends the packet to the remote device. The network card of the remote node receives the packet, recognizes it as an overlay packet, strips the outer layer, and passes the inner packet to the flannel.1 device.
5) The flannel.1 device checks the packet, matches it against the routing table, and sends the data to the cni0 device.
6) cni0 matches the routing table and sends the data to the corresponding port on the bridge.

The flannel network (Pod CIDR) defined for the test Kubernetes cluster is 172.20.0.0/16. The following example explains communication between Pods on different nodes in this network:

10.19.114.100 - pod1 route
#kubectl -n stack exec -it api-0 -- bash
#ip route show
default via 172.20.0.1 dev eth0 
172.20.0.0/24 dev eth0 proto kernel scope link src 172.20.0.73
172.20.0.0/16 via 172.20.0.1 dev eth0

10.19.114.101 - pod2 route
#kubectl -n stack exec -it redis-64c6c549ff-5plcq -- bash
#ip route show
default via 172.20.1.1 dev eth0 
172.20.0.0/16 via 172.20.1.1 dev eth0 
172.20.1.0/24 dev eth0 proto kernel scope link src 172.20.1.11

It can be seen that the default gateway of the Pod's network interface is the .1 address of its subnet, and that address is the IP of the cni0 bridge on the node; this can be verified on the host as shown below.
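A minimal check on the node confirms this (output trimmed and illustrative, as if run on the 10.19.114.100 host that owns 172.20.0.0/24):

#ip -4 addr show cni0
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    inet 172.20.0.1/24 brd 172.20.0.255 scope global cni0

Next, analyze the direction of the traffic after it reaches the host.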

10.19.114.100 Host routing
#ip route show
default via 10.19.114.1 dev eth0
10.19.114.0/24 dev eth0 proto kernel scope link src 10.19.114.100
10.250.250.0/24 dev eth1 proto kernel scope link src 10.250.250.100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.20.0.0/24 dev cni0 proto kernel scope link src 172.20.0.1
172.20.1.0/24 via 172.20.1.0 dev flannel.1 onlink 
172.20.2.0/24 via 172.20.2.0 dev flannel.1 onlink

10.19.114.101 Host routing
#ip route show
default via 10.19.114.1 dev eth0
10.19.114.0/24 dev eth0 proto kernel scope link src 10.19.114.101
10.250.250.0/24 dev eth1 proto kernel scope link src 10.250.250.101
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.20.0.0/24 via 172.20.0.0 dev flannel.1 onlink 
172.20.1.0/24 dev cni0 proto kernel scope link src 172.20.1.1
172.20.2.0/24 via 172.20.2.0 dev flannel.1 onlink

From the routing tables above, following the longest-prefix-match rule, the packet hits the entry 172.20.1.0/24 via 172.20.1.0 dev flannel.1: packets from 10.19.114.100 destined for the 172.20.1.0/24 segment are sent to the 172.20.1.0 gateway, and the outgoing device is flannel.1.
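The same conclusion can be read directly from the kernel's route selection. The command below is a sketch run on 10.19.114.100; the output is trimmed, and the source address shown is the local flannel.1 address:

#ip route get 172.20.1.11
172.20.1.11 via 172.20.1.0 dev flannel.1 src 172.20.0.0
    cache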

flannel.1 is a vxlan device. When the packet arrives at flannel.1, it needs to be encapsulated. At this point the dst IP is 172.20.1.11 and the src IP is 172.20.0.73, and the MAC address corresponding to 172.20.1.11 must be known to build the inner frame. Here flannel.1 does not send an ARP request to resolve the MAC address of 172.20.1.11; instead, the Linux kernel raises an "L3 Miss" event to the user-space flanneld program. After receiving the event from the kernel, flanneld looks up in etcd the flannel.1 device of the subnet that matches the address, i.e. the MAC address of the flannel.1 device on the host where the destination Pod runs. Flannel records every node's subnet and MAC information when it assigns IP subnets, so it can answer directly.

#ip neigh |grep 172
172.20.2.0 dev flannel.1 lladdr 82:c4:0e:f2:00:6f PERMANENT
172.20.1.0 dev flannel.1 lladdr 42:6e:8b:9b:e2:73 PERMANENT
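The vxlan parameters used for this encapsulation can be read from the device itself. The output below is trimmed and illustrative for the 10.19.114.100 node; flannel's vxlan backend uses VNI 1 and UDP destination port 8472 by default:

#ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT
    vxlan id 1 local 10.19.114.100 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300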

Here the inner packet of the vxlan frame is encapsulated: the inner IP header carries src 172.20.0.73 and dst 172.20.1.11, and the inner Ethernet header carries the MAC addresses of the local and remote flannel.1 devices.

The forwarding process of VXLAN mainly depends on the FDB (Forwarding Database): the VXLAN device looks up the corresponding VTEP IP address by the destination MAC address, then encapsulates the layer-2 frame and sends it to that VTEP.

#/sbin/bridge fdb show dev flannel.1
42:6e:8b:9b:e2:73 dst 10.19.114.101 self permanent
ba:8b:ce:f3:b8:51 dst 10.19.114.101 self permanent
42:6f:c7:06:3e:a0 dst 10.19.114.102 self permanent
82:c4:0e:f2:00:6f dst 10.19.114.102 self permanent

The kernel then checks the FDB (forwarding database) on the node to obtain the node address of the destination VTEP device for the inner packet. The MAC address of the destination device has already been resolved to 42:6e:8b:9b:e2:73 from the ARP table, and the node IP address corresponding to that MAC address exists in the FDB. If this information were not in the FDB, the kernel would raise an "L2 Miss" event to the user-space flanneld program; after receiving this event, flanneld queries etcd, obtains the "Public IP" of the node that hosts the VTEP device, and registers the information in the FDB.
Once the kernel has obtained the destination node's IP address from the FDB and the corresponding MAC address via ARP, the outer vxlan encapsulation can be completed.
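The entries that flanneld registers are ordinary vxlan FDB entries. Conceptually they correspond to a command like the one below (a sketch only: flanneld programs them via netlink, and the MAC/IP pair must match the remote node's flannel.1 device):

#bridge fdb append 42:6e:8b:9b:e2:73 dev flannel.1 dst 10.19.114.101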

A concrete analysis can be done by capturing packets on the node's physical interface and inspecting them with Wireshark.
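For example, the overlay traffic can be captured on the sending node and saved for Wireshark (flannel's vxlan backend uses UDP port 8472 by default; adjust the interface, port, and peer address to the environment):

#tcpdump -i eth0 -nn -w vxlan.pcap udp port 8472 and host 10.19.114.101

Opening the capture in Wireshark and decoding UDP port 8472 as VXLAN ("Decode As..." → VXLAN) shows the outer IP header between the two node addresses and, inside it, the inner frame between 172.20.0.73 and 172.20.1.11.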

When the eth0 interface of node 10.19.114.101 receives the vxlan packet, the kernel recognizes it as a vxlan packet, decapsulates it, and hands the inner packet to the flannel.1 device on that node. In this way the packet travels from the sending node to the destination node, where the flannel.1 device receives the decapsulated inner packet (src 172.20.0.73, dst 172.20.1.11).

The destination address is 172.20.1.11. After arriving at flannel.1 on 10.19.114.101, the packet is forwarded according to the node's routing table. As the host routing table of 10.19.114.101 above shows, traffic destined for 172.20.1.0/24 is forwarded from flannel.1 to cni0 (172.20.1.0/24 dev cni0).

Check the cni0 bridge information. cni0 connects the Pod's network interface and the host through veth pairs attached to the bridge:

#brctl show
bridge name     bridge id               STP enabled     interfaces
cni0            8000.a656432b14cf       no              veth1f7db117
                                                        veth3ee31d24
                                                        veth521bc030
                                                        veth5a59ced4
                                                        veth649412bd
                                                        veth65bbf59f
                                                        veth6ed62916
                                                        veth7e8e7733
                                                        veth9787b6ba
                                                        veth98c762b8
                                                        vethaf05d94b
                                                        vethc07c69cd
                                                        vethdf62bded
                                                        vethe2cf7392
                                                        vethf4995a29
docker0         8000.024216a031b6       no

From the interface list inside the Pod, the network card of the 172.20.1.11 Pod is eth0@if21 and corresponds to link-netnsid 0.

On the host, the veth end of the 172.20.1.11 Pod's network card is vethf4995a29 (vethf4995a29@if3).

Therefore, the veth of this Pod attached to the cni0 bridge is vethf4995a29; eth0@if21 and vethf4995a29@if3 form a veth pair, which is how traffic is injected into the Pod's eth0 network card.
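This pairing can also be confirmed from the interface indices alone. The commands below are a sketch; the index 21 comes from the eth0@if21 / vethf4995a29@if3 pair above, and /sys/class/net/eth0/iflink inside the Pod holds the ifindex of the peer interface on the host:

#kubectl -n stack exec -it redis-64c6c549ff-5plcq -- cat /sys/class/net/eth0/iflink
21
#ip -o link | grep '^21:'
21: vethf4995a29@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> ... master cni0 ...

The host interface with index 21 is vethf4995a29, so it is the other end of the Pod's eth0.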

This article is reproduced from: https://mp.weixin.qq.com/s/68QMlmGVJTZO5nkrpc-uMg
