Kubernetes Pod Deployment & Rolling Upgrade Tuning
The rate at which pods are deployed, and the rate at which they become available during a rolling upgrade, are the core indicators for measuring the scheduling capability of Kubernetes.
    rollingUpdate:
      maxSurge: 25%        # How many instances may be created above the desired count during the update
      maxUnavailable: 25%  # How many instances are allowed to be unavailable during the update
By default, a rolling upgrade replaces pods incrementally. When dozens or hundreds of pods need to be updated, the admission webhooks, the scheduler's filter, score, and bind phases, readiness probes, and the QPS and burst limits along the whole path all add up, and the entire process becomes slower.
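To make the two percentages concrete, here is a minimal sketch of how they translate into pod counts. It assumes the documented Deployment rounding rules (maxSurge rounds up, maxUnavailable rounds down); the function name `rolling_update_bounds` is hypothetical and not part of any Kubernetes API.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> dict:
    """Translate rolling update percentages into absolute pod counts.

    Assumption: maxSurge rounds up and maxUnavailable rounds down,
    as described in the Deployment documentation.
    """
    # Integer arithmetic avoids floating point rounding surprises.
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return {
        "surge": surge,
        "unavailable": unavailable,
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


small = rolling_update_bounds(10, 25, 25)
large = rolling_update_bounds(1000, 25, 25)
print(small)
print(large)
```

With 1000 replicas and 25% for both settings, up to 250 extra pods may exist and up to 250 may be unavailable at any point, which is why the speed of each batch depends so heavily on the control plane limits discussed here.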
Pod Deployment & Rolling Upgrade
The core deployment process involves several steps:
- kubectl sends a deployment request to the apiserver (for example using: kubectl create -f deployment.yml)
- The apiserver persists the Deployment to etcd; the apiserver and etcd communicate over HTTP/2.
- The controller manager monitors the apiserver through the watch API. When the deployment controller sees a newly created Deployment object, it pulls it from the queue, creates a ReplicaSet according to the Deployment's description, and returns the ReplicaSet object to the apiserver, which persists it to etcd.
- Similarly, when the replicaset controller sees the newly created ReplicaSet object, it pulls it from the queue and creates Pod objects according to the description.
- The scheduler then sees the unscheduled Pod object, selects a schedulable node according to the scheduling rules, writes it into the nodeName field of the pod description, and returns the Pod object to the apiserver, which writes it to etcd.
- The kubelet sees that the nodeName field in the Pod object matches its own node, pulls it from the queue, and creates the containers described in the pod through the container runtime.
Components communicate by watching apiserver resources over HTTP/2, which gives real-time notifications over long-lived connections. There is basically nothing to speed up in this part.
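The chain of watches described in the steps above can be pictured with a toy in-memory model. This is only an illustrative sketch: `ToyAPIServer`, the handler functions, and the node name `node-1` are all hypothetical and stand in for the real apiserver, etcd, controllers, scheduler, and kubelet.

```python
from collections import deque


class ToyAPIServer:
    """Hypothetical stand-in for apiserver + etcd: stores objects, notifies watchers."""

    def __init__(self):
        self.store = {}         # (kind, name) -> object
        self.handlers = []      # components watching for changes
        self.pending = deque()  # change notifications not yet delivered

    def watch(self, handler):
        self.handlers.append(handler)

    def write(self, kind, name, obj):
        self.store[(kind, name)] = obj
        self.pending.append((kind, name))

    def run(self):
        # Deliver notifications until no component has anything left to do.
        while self.pending:
            kind, name = self.pending.popleft()
            for handler in self.handlers:
                handler(self, kind, name, self.store[(kind, name)])


def deployment_controller(api, kind, name, obj):
    rs_name = name + "-rs"
    if kind == "Deployment" and ("ReplicaSet", rs_name) not in api.store:
        api.write("ReplicaSet", rs_name, {"replicas": obj["replicas"]})


def replicaset_controller(api, kind, name, obj):
    if kind != "ReplicaSet":
        return
    for i in range(obj["replicas"]):
        pod_name = f"{name}-{i}"
        if ("Pod", pod_name) not in api.store:
            api.write("Pod", pod_name, {"nodeName": None, "running": False})


def scheduler(api, kind, name, obj):
    # Bind every unscheduled pod to the only (hypothetical) node.
    if kind == "Pod" and obj["nodeName"] is None:
        api.write("Pod", name, {**obj, "nodeName": "node-1"})


def kubelet(api, kind, name, obj):
    # Start pods that were bound to this kubelet's node.
    if kind == "Pod" and obj["nodeName"] == "node-1" and not obj["running"]:
        api.write("Pod", name, {**obj, "running": True})


api = ToyAPIServer()
for component in (deployment_controller, replicaset_controller, scheduler, kubelet):
    api.watch(component)

api.write("Deployment", "web", {"replicas": 3})
api.run()

pods = {name: obj for (kind, name), obj in api.store.items() if kind == "Pod"}
print(pods)
```

Every step is a write to the store followed by a notification, so the latency of the chain is dominated by how quickly each component is allowed to write, not by the notification mechanism itself.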
1. Adjust the kubelet reporting frequency
Raise the kubelet reporting frequency so that the scheduler receives fresher node resource information and can calculate node weights more accurately.
By default the kubelet runs with:

    --node-status-update-frequency=10s  # Default reporting interval
    --kube-api-qps=5
    --kube-api-burst=10

Adjusted to:

    --node-status-update-frequency=3s
    --kube-api-qps=50     # Information reporting after pod deployment
    --kube-api-burst=100  # Information reporting after pod deployment
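The effect of the QPS and burst settings can be estimated with a simple token bucket model, which is how Kubernetes client-side rate limiting is usually described. The sketch below is an approximation under that assumption: the bucket starts full, requests are issued back to back, and the function name `seconds_until_admitted` and the figure of 100 requests are hypothetical.

```python
def seconds_until_admitted(requests: int, qps: float, burst: int) -> float:
    """Time until the last of `requests` back-to-back calls passes a token bucket.

    Assumptions: the bucket starts full with `burst` tokens and refills
    at `qps` tokens per second.
    """
    if requests <= burst:
        return 0.0  # everything fits inside the initial burst
    return (requests - burst) / qps


# 100 API calls from one kubelet, e.g. status updates after a wave of pod starts.
before = seconds_until_admitted(100, qps=5, burst=10)
after = seconds_until_admitted(100, qps=50, burst=100)
print(before, after)
```

Under this model the 100 calls take about 18 seconds to drain with the original values and are admitted immediately with the adjusted ones.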
2. Adjust the controller manager's node monitoring period
By default the controller manager checks kubelet status with:

    --node-monitor-period=5s         # Interval between node status checks
    --node-monitor-grace-period=40s  # How long a node may go without reporting before it is marked NotReady
    --pod-eviction-timeout=5m        # Grace period before pods on a failed node are evicted and rescheduled

Adjusted to:

    --node-monitor-period=2s
    --node-monitor-grace-period=20s
    --pod-eviction-timeout=30s
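A rough way to compare the two sets of values is to add up how long it takes before pods on a failed node are rescheduled. The sketch below assumes eviction is governed by the flags listed here; on newer Kubernetes versions eviction timing may instead be controlled by taint tolerations, so treat it as a model of these flags only. The helper names are hypothetical.

```python
def parse_seconds(value: str) -> int:
    """Parse a duration such as '40s' or '5m' (only s and m suffixes are handled)."""
    units = {"s": 1, "m": 60}
    return int(value[:-1]) * units[value[-1]]


def node_failure_timeline(status_update: str, grace_period: str, eviction_timeout: str) -> dict:
    grace = parse_seconds(grace_period)
    return {
        # Status reports the kubelet can miss before the node is marked NotReady.
        "reports_within_grace": grace // parse_seconds(status_update),
        # Time from the last report until pods are evicted.
        "seconds_to_eviction": grace + parse_seconds(eviction_timeout),
    }


default = node_failure_timeline("10s", "40s", "5m")
tuned = node_failure_timeline("3s", "20s", "30s")
print(default)
print(tuned)
```

The shorter grace period is only safe because the kubelet now reports every 3 seconds, which still leaves several reports inside the 20 second window.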
3. Turn off custom webhooks and schedulers
Disable any custom admission webhooks and custom schedulers in the path that are not needed.
4. Adjust the controller manager concurrency
kube-controller-manager values before tuning:

    --concurrent-deployment-syncs=5
    --concurrent-endpoint-syncs=5
    --concurrent-namespace-syncs=10
    --concurrent-replicaset-syncs=5
    --concurrent-service-syncs=10
    --kube-api-qps=20
    --kube-api-burst=30

Adjusted to:

    --concurrent-deployment-syncs=50
    --concurrent-endpoint-syncs=50
    --concurrent-namespace-syncs=100
    --concurrent-replicaset-syncs=50
    --concurrent-service-syncs=100
    --kube-api-qps=500
    --kube-api-burst=100
5. Set pod requests and limits, and add a PDB

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 1000
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.15.4
            resources:
              limits:
                cpu: "10m"
                memory: 10Mi
              requests:
                cpu: "10m"
                memory: 10Mi
    ---
    apiVersion: policy/v1beta1  # use policy/v1 on Kubernetes 1.21 and later
    kind: PodDisruptionBudget
    metadata:
      name: nginx-pdb
    spec:
      # maxUnavailable and minAvailable are mutually exclusive, so only one is set
      maxUnavailable: 25%
      selector:
        matchLabels:
          app: nginx
With the above configuration, the time for 1000 pods to be created and reach the Ready state drops from 78s to 24s, while the QPS on the control plane is 10 times higher.
Further exploration: Resource Topology-Aware Scheduling Optimization
Next stage: Resource topology-aware scheduling intervention