Kubernetes Pod Deployment & Rolling Upgrade Tuning

The rate at which the pods of a rolling-upgrade deployment become available is a core indicator of Kubernetes' scheduling capability.

For example:

  rollingUpdate:
    maxSurge: 25%        # how many extra pods may be created above the desired count during the update
    maxUnavailable: 25%  # how many pods may be unavailable during the update

By default, a rolling upgrade replaces pods in small batches. When dozens or hundreds of pods need to be updated, each pod also passes through admission webhooks, scheduler filtering, scoring and binding, and readiness probes, and the whole pipeline is throttled by client QPS and burst limits, so the overall rollout becomes slow.
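If there is availability headroom, the strategy itself is the most direct knob: larger maxSurge/maxUnavailable values replace more pods per round at the cost of capacity during the rollout. A minimal sketch of a widened strategy (the 50% values are illustrative assumptions, not figures from this post):

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%        # create up to 50% extra pods per round
      maxUnavailable: 50%  # allow up to 50% of the pods to be unavailable per round

The tuning below targets the control plane instead, which is usually the limiting factor at this scale.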

Pod Deployment & Rolling Upgrade

The core deployment flow involves the following steps; a sketch of commands for observing the flow follows the list:

  1. kubectl sends a deployment request to the apiserver (for example: kubectl create -f deployment.yml).
  2. The apiserver persists the Deployment object to etcd; etcd and the apiserver communicate over HTTP/2.
  3. The controller manager watches the apiserver through the watch API. When the deployment controller sees a newly created Deployment object, it pulls it from its work queue, creates a ReplicaSet according to the Deployment spec, returns the ReplicaSet object to the apiserver, and persists it to etcd.
  4. Similarly, when the replicaset controller sees the newly created ReplicaSet object, it pulls it from its work queue and creates Pod objects according to the spec.
  5. The scheduler then sees the unscheduled Pod objects, selects a schedulable node according to the scheduling rules, fills the node into the nodeName field of the pod spec, and returns the Pod object to the apiserver, which writes it to etcd.
  6. The kubelet whose node matches the nodeName field of a Pod object pulls it from its work queue and creates the containers described in the pod through the container runtime.
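To watch this chain play out, you can follow the objects and events as each controller reconciles them. A sketch, reusing the app=nginx label and deployment.yml file name from the examples in this post:

  kubectl create -f deployment.yml
  kubectl get deployment,replicaset,pods -l app=nginx -w       # objects appear as each controller acts
  kubectl get events --sort-by=.metadata.creationTimestamp     # ScalingReplicaSet, Scheduled, Pulled, Started, ...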

Optimization points

Components communicate by watching apiserver resources over HTTP/2; change notifications arrive in real time over long-lived connections, so there is little to accelerate in the transport itself. The tunable points are the following.

1. Adjust the kubelet reporting frequency

Raise the kubelet's reporting frequency so the scheduler gets more up-to-date node resource information and can score nodes more accurately.

The kubelet defaults are:

   --node-status-update-frequency=10s  # default node status reporting interval
   --kube-api-qps=5                    # default QPS limit for kubelet calls to the apiserver
   --kube-api-burst=10                 # default burst limit for kubelet calls to the apiserver

Change to:

   --node-status-update-frequency=3s   # report node status more often
   --kube-api-qps=50                   # raise QPS so pod status updates during deployment are not throttled
   --kube-api-burst=100                # raise burst for the same reason
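If the kubelet is configured through a file instead of command-line flags (the usual case with kubeadm, via --config), the equivalent fields live in KubeletConfiguration. A minimal sketch, assuming the kubeadm default config path:

   # /var/lib/kubelet/config.yaml (kubeadm default path; adjust for your setup)
   apiVersion: kubelet.config.k8s.io/v1beta1
   kind: KubeletConfiguration
   nodeStatusUpdateFrequency: 3s
   kubeAPIQPS: 50
   kubeAPIBurst: 100

Restart the kubelet after editing so the new values take effect.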

2. Adjust the controller manager's node status checking intervals

The kube-controller-manager defaults for node monitoring are:

   --node-monitor-period=5s        # how often the node controller checks node (kubelet) status
   --node-monitor-grace-period=40s # how long a node may be unresponsive before it is marked NotReady
   --pod-eviction-timeout=5m       # how long pods on a NotReady node wait before being evicted and rescheduled

Change to:

   --node-monitor-period=2s
   --node-monitor-grace-period=20s
   --pod-eviction-timeout=30s
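On a kubeadm control plane these flags go into the kube-controller-manager static pod manifest, which the kubelet picks up and restarts automatically. A sketch of the relevant fragment, assuming the stock kubeadm layout:

   # /etc/kubernetes/manifests/kube-controller-manager.yaml (fragment)
   spec:
     containers:
     - name: kube-controller-manager
       command:
       - kube-controller-manager
       - --node-monitor-period=2s
       - --node-monitor-grace-period=20s
       - --pod-eviction-timeout=30s
       # ...keep the existing flags unchanged...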

3. Turn off custom webhooks and scheduler extensions

Disable any custom admission webhooks and scheduler extensions in the deployment path that are not strictly needed, since each one adds a synchronous hop for every pod.

kube-scheduler:

  --feature-gates=CustomResourceValidationExpressions=false,...
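Admission webhooks sit synchronously in the write path of every object the rollout creates, so it is worth auditing which ones are registered and removing any that are not needed during large rollouts. A sketch (the webhook name is a placeholder):

  kubectl get mutatingwebhookconfigurations
  kubectl get validatingwebhookconfigurations
  kubectl delete validatingwebhookconfiguration <unneeded-webhook>   # only if it is safe to drop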

4. Adjust the controller manager concurrency

The kube-controller-manager defaults are:

  --concurrent-deployment-syncs=5
  --concurrent-endpoint-syncs=5
  --concurrent-namespace-syncs=10
  --concurrent-replicaset-syncs=5
  --concurrent-service-syncs=10
  --kube-api-qps=20
  --kube-api-burst=30

Change to:

  --concurrent-deployment-syncs=50
  --concurrent-endpoint-syncs=50
  --concurrent-namespace-syncs=100
  --concurrent-replicaset-syncs=50
  --concurrent-service-syncs=100
  --kube-api-qps=500
  --kube-api-burst=100
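A quick way to confirm the new values are active, assuming a kubeadm-managed control plane where the component label is set on the static pod:

  kubectl -n kube-system describe pod -l component=kube-controller-manager | grep -E 'concurrent|kube-api'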

5. Set pod requests and limits, and add a PodDisruptionBudget

Pod deployment and PDB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1000
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.4
        resources:
          limits:
            cpu: "10m"
            memory: 10Mi
          requests:
            cpu: "10m"
            memory: 10Mi
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  maxUnavailable: 25%   # a PDB accepts either maxUnavailable or minAvailable, not both
  selector:
    matchLabels:
      app: nginx
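To apply the manifests and measure time-to-ready the same way the summary below does, something like the following works (the file name is an assumption; nginx-deployment and nginx-pdb come from the example above):

  kubectl apply -f nginx-deployment.yaml
  kubectl get pdb nginx-pdb                                  # confirm the budget is registered
  time kubectl rollout status deployment/nginx-deployment    # blocks until the rollout completes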

Summary

Creating 1000 pods to the Ready state: with the above configuration, the total time drops from 78s to 24s, and the control-plane QPS is roughly 10x higher.

Further direction: Resource Topology-Aware Scheduling Optimization

Next stage: introduce resource topology-aware scheduling.
