[3-4 Golang] GC—Scheduling and Tuning

  The basic knowledge of garbage collection has been introduced almost, but it is necessary to know that the garbage collection process requires CPU time, which may affect the scheduling of user coroutines, so garbage collection related tuning is required in some scenarios. This article mainly introduces the trigger timing of garbage collection and several scheduling modes of the garbage collector. Only by understanding these can we know how to tune. Finally, combined with the commonly used caching framework bigcache, we analyze how to reduce the pressure of garbage collection.

Trigger timing

   When is garbage collection triggered? First of all, it may be triggered when the memory usage increases by a certain percentage (you can't let the memory grow), is there any other way? We can also trigger it manually through the runtime.GC function (which will block the user coroutine until the garbage collection process ends). In addition, the Go auxiliary thread will also detect that if garbage collection is not performed for more than 2 minutes, it will force garbage collection to start. The three trigger modes are defined as follows:

// gcTriggerHeap indicates that a cycle should be started when
// the heap size reaches the trigger heap size computed by the
// controller.
gcTriggerHeap gcTriggerKind = iota   //Memory grows to trigger threshold

// gcTriggerTime indicates that a cycle should be started when
// it's been more than forcegcperiod nanoseconds since the
// previous GC cycle.
gcTriggerTime                        //timing trigger

// gcTriggerCycle indicates that a cycle should be started if
// we have not yet started cycle number gcTrigger.n (relative
// to work.cycles).
gcTriggerCycle                      //Can be used to force trigger

//Contains trigger type, current time, number of cycles
type gcTrigger struct {
    kind gcTriggerKind
    now  int64  // gcTriggerTime: current time
    n    uint32 // gcTriggerCycle: cycle number to start

  Remember that the previous article introduced that the garbage collection entry function is gcstart. The input parameter of this function is gcTrigger. Before each garbage collection is started, it will detect whether garbage collection should be triggered. The detection method is as follows:

func (t gcTrigger) test() bool {
    switch t.kind {
    case gcTriggerHeap:
        //memory reaches threshold
        return gcController.heapLive >= gcController.trigger
    case gcTriggerTime:
        //GC can be turned off: Initialized from GOGC. GOGC=off means no GC.
        if gcController.gcPercent.Load() < 0 {
            return false

        //forcegcperiod is defined as 2 minutes
        lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
        return lastgc != 0 && t.now-lastgc > forcegcperiod
    case gcTriggerCycle:
        // Can be used to force trigger
        return int32(t.n-work.cycles) > 0
    return true

   is always given. There are three triggering methods for garbage collection: applying for memory, timing triggering, and active triggering. The logic of active triggering and timing triggering is relatively simple, so I will not introduce too much here. We focus on the garbage collection triggered by application memory.

How can    apply for memory to trigger garbage collection? Think about whether the entry function for applying for memory is mallocgc, so you only need to judge here:

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    //Only when the memory requested at one time is too large (more than 32768) will it detect whether to enable GC
    if size <= maxSmallSize {
    } else {
        shouldhelpgc = true

    if shouldhelpgc {
        if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {

   However, have you ever thought about how to count the number of bytes of memory currently allocated? And how to calculate the memory threshold triggered by the next garbage collection? In theory, the number of bytes of allocated memory should also be updated in the mallocgc function, but the Go language does not do this, but is updated when the mspan cache is obtained from mcentral to mcache (that is to say, the cache to mcache is "applied"), so This data is not actually the amount of memory allocated by the real user code (slightly larger).

func (c *mcache) refill(spc spanClass) {
    //Memory used by this mspan
    usedBytes := uintptr(s.allocCount) * s.elemsize
    //Update Global Statistics Variables
    gcController.update(int64(s.npages*pageSize)-int64(usedBytes), int64(c.scanAlloc))

func (c *gcControllerState) update(dHeapLive, dHeapScan int64) {
    if dHeapLive != 0 {
        atomic.Xadd64(&gcController.heapLive, dHeapLive)

   Another question, how to calculate the memory threshold triggered by the next garbage collection? Of course, after each garbage collection is over, the memory threshold triggered by the next garbage collection needs to be updated. The threshold is related to the number of marked memories of the previous garbage collection and the trigger ratio. What is the trigger ratio? It is related to the environment variable GOGC. The trigger threshold calculation process is as follows:

func (c *gcControllerState) commit(triggerRatio float64) {
    //The gcPercent data comes from the environment variable GOGC

    // goal next garbage collection trigger goal
    goal := ^uint64(0)
    if gcPercent := c.gcPercent.Load(); gcPercent >= 0 {
        goal = c.heapMarked + (c.heapMarked+atomic.Load64(&c.stackScan)+atomic.Load64(&c.globalsScan))*uint64(gcPercent)/100

    //trigger is calculated by goal
    //Since the user coroutine is still concurrently applying for memory during the garbage collection process, the final trigger threshold is not equal to the goal

    c.trigger = trigger

  Note that in the process of garbage collection, user coroutines are still applying for memory concurrently, so the final trigger threshold trigger is not equal to goal, but it is undeniable that trigger is calculated by goal, and also involves the pacing algorithm, which will not be expanded here. .

  On the one hand, garbage collection can reclaim useless memory and prevent Go programs from occupying too much memory. However, the garbage collection process also needs to occupy CPU time, which may have a certain impact on the scheduling of user coroutines. Therefore, in some scenarios, garbage collection may be required. Tuning generally means adjusting the environment variable GOGC to balance the CPU usage of Go program memory usage and garbage collection.

Garbage Collection Scheduling Mode

   The garbage collection process also takes up CPU time. In the last article, we saw that the number of garbage collection work coroutines gcBgMarkWorker is consistent with the number of logical processors P. Are these coroutines scheduled at the same time? After scheduling, does it run until the garbage collection process ends? How does the Go language ensure that the garbage collection process occupies a certain percentage of the CPU, so that it will not affect the user coroutine too much, and it will not be too slow? The Go language defines three garbage collection work coroutine scheduling modes:

// gcMarkWorkerDedicatedMode indicates that the P of a mark
// worker is dedicated to running that mark worker. The mark
// worker should run without preemption.
gcMarkWorkerDedicatedMode      //Currently P can only run garbage collection work coroutines and cannot be preempted

// gcMarkWorkerFractionalMode indicates that a P is currently
// running the "fractional" mark worker. The fractional worker
// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
// an integer and using only dedicated workers would result in
// utilization too far from the target of gcBackgroundUtilization.
// The fractional worker should run until it is preempted and
// will be scheduled to pick up the fractional part of
// GOMAXPROCS*gcBackgroundUtilization.
gcMarkWorkerFractionalMode     //Currently P runs a garbage collection work coroutine, which needs to guarantee the total CPU usage time

// gcMarkWorkerIdleMode indicates that a P is running the mark
// worker because it has nothing else to do. The idle worker
// should run until it is preempted and account its time
// against gcController.idleMarkTime.
gcMarkWorkerIdleMode          //If the current P has no user coroutines that can be scheduled, only the garbage collection work coroutine is scheduled.

   First of all, it must be clear that the Go language guarantees that the CPU utilization of the garbage collection work coroutine is about 25%. How to guarantee it? Assuming that the Go program creates 8 logical processors P, 25% is equivalent to scheduling garbage collection main coroutines on two logical processors P (currently P only runs garbage collection work coroutines), then if the number of logical processors P What if it is not divisible by 4? At this time, there will definitely be some P scheduling modes that use gcMarkWorkerFractionalMode, and the calculation method is as follows:

func (c *gcControllerState) startCycle(markStartTime int64, procs int) {

    // gcBackgroundUtilization is constant 0.25, procs is the number of logical processors P
    totalUtilizationGoal := float64(procs) * gcBackgroundUtilization
    //Rounded up, these P can only run garbage collection worker coroutines
    c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
    //When non-integer, calculation error
    utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
    //The error is relatively large and needs to be corrected
    if utilError < -maxUtilError || utilError > maxUtilError {

        //Too many P s are in gcMarkWorkerDedicatedMode, minus 1
        if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {

        //Some Ps need to be in gcMarkWorkerFractionalMode mode, and the proportion of CPU time occupied by these Ps is fractionalUtilizationGoal
        c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(procs)
    } else {
        //The error is small and no correction is required
        c.fractionalUtilizationGoal = 0

  When garbage collection starts, the number of logical processors P in each scheduling mode is calculated. When the Go language scheduler schedules garbage collection work coroutines, it sets the scheduling mode of each work coroutine. Refer to the implementation of the function findRunnableGCWorker:

func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
    //The function decIfPositive determines whether the input parameter is a positive number and subtracts 1

    // The scheduling mode is gcMarkWorkerDedicatedMode
    if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
        _p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
    }else if c.fractionalUtilizationGoal == 0 {
    } else {

        //delta garbage collection mark start to present time period
        delta := nanotime() - c.markStartTime

        // Calculate the proportion of CPU occupied by garbage collection in gcMarkWorkerFractionalMode scheduling mode, and return directly if it exceeds
        if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
        // Running in gcMarkWorkerFractionalMode scheduling mode
        _p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode


   Why is there no gcMarkWorkerIdleMode? Think about what this pattern means: if there are currently no user coroutines to schedule in P, the garbage collection worker coroutine is only scheduled. So it should be that when the Go language scheduler obtains the runnable user coroutine, it finds that there is no runnable coroutine and it is in the garbage collection process at this time, and then schedules the garbage collection work coroutine (refer to the scheduler to find the coroutine function findrunnable).

  The scheduling mode has been determined. When the garbage collection worker coroutine gcBgMarkWorker is running, it will decide how to execute the mark scanning process according to the scheduling mode:

func gcBgMarkWorker() {
    for {

        startTime := nanotime()

        systemstack(func() {
            switch pp.gcMarkWorkerMode {
                throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")

            //Only run the garbage collection coroutine, the first time the marker scan process is performed until it is preempted
            case gcMarkWorkerDedicatedMode:
                gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
                //If preempted, add the coroutine of the current P queue to the global coroutine queue
                if gp.preempt {
                    if drainQ, n := runqdrain(pp); n > 0 {
                        globrunqputbatch(&drainQ, int32(n))
                //Perform a marker scan until the end of the task
                gcDrain(&pp.gcw, gcDrainFlushBgCredit)
            //Perform a marker scan at a percentage of CPU time, or until preempted
            case gcMarkWorkerFractionalMode:
                gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            //Only perform marker scans when idle
            case gcMarkWorkerIdleMode:
                gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)

        duration := nanotime() - startTime

        //Count the running time of the mark scanning process in gcMarkWorkerFractionalMode mode
        if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode {
            atomic.Xaddint64(&pp.gcFractionalMarkTime, duration)


gcDrainUntilPreempt gcDrainFlags = 1 << iota    // run gcDrain until preempted
gcDrainFlushBgCredit                            // gcDrain updates the global cash pool (review of auxiliary markers)
gcDrainIdle                                     // gcDrain only runs when idle
gcDrainFractional                                // gcDrain can only occupy a certain percentage of CPU

The   gcDrain function is also a loop, which cyclically obtains gray nodes and executes the mark scanning process. How to implement these scheduling modes? That is, when does the loop end? You only need to judge at the beginning of each loop. For example, in the loop condition, judge whether it is preempted, whether the CPU time exceeds a certain percentage, whether there is a user coroutine waiting for scheduling in the current P, etc.

func gcDrain(gcw *gcWork, flags gcDrainFlags) {

    // Can it be preempted
    preemptible := flags&gcDrainUntilPreempt != 0

    //End-of-cycle detection method
    if flags&(gcDrainIdle|gcDrainFractional) != 0 {
        if idle {
            check = pollWork     //Check if there are user coroutines waiting to be scheduled
        } else if flags&gcDrainFractional != 0 {
            check = pollFractionalWorkerExit   //Detect CPU time usage ratio

    //If preemption is allowed and preempted, end
    for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
        //mark scan

        //Check if the loop is over
        if check != nil && check() {



Overview of bigcache

   How to tune garbage collection? On the one hand, the environment variable GOGC can be adjusted according to the business type to balance the memory usage of the Go program and the occupancy of the CPU by garbage collection; on the other hand, the amount of memory allocated by the user code can be minimized, such as the use of object pools (reuse).

   Are there any other options? Recall the whole process of mark scanning: 1) select an object from the gray object collection, mark it as black; 2) scan all the objects pointed to by this object and add it to the gray object collection; 3) keep repeating step 1/2. Essentially, as long as the object contains pointers, it needs to continue scanning, so the Go language divides each mspan into two specifications, with pointers and without pointers, and mspans without pointers do not need to continue scanning.

   Usually, in order to improve service performance, local cache is used, so user code will inevitably allocate a large amount of memory, and the pressure of garbage collection will be very large. How to solve this problem? bigcache is a commonly used local memory cache component, which reduces the pressure of garbage collection scanning by removing pointers.

   Can the cache component also remove pointers? Think about how the general cache data storage is designed: there is usually a map that stores cache key-value objects, and generally implements a cache elimination algorithm based on LRU. bigcache also uses map to store cache objects, so how do you remove pointers? Because the cached key and value are both integers! The key is stored according to the hash value, where is the string key stored? How is the value stored as an integer? In fact, the value stored is also the location index. The real data entry is stored in the byte array, unlike the traditional map, the entry is an independent object/node.

type cacheShard struct {
    // map definition, key-value are integers
    hashmap     map[uint64]uint32
    //The real storage data entry is a byte array
    entries     queue.BytesQueue
    //Read and write need to be locked
    lock        sync.RWMutex

   See, the definition of map does not contain pointers, and the data key-value is encoded as binary and stored in the entries byte array, and the whole does not contain pointers. Let's take a brief look at the data lookup logic of bigcache:

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {

    // or the byte encoding of the entry
    wrappedEntry, err := s.getWrappedEntry(hashedKey)

    // The readKeyFromEntry function decodes the byte array as
    if entryKey := readKeyFromEntry(wrappedEntry); key != entryKey {
        // Hash collision, return error

    // decoding
    entry := readEntry(wrappedEntry)

    return entry, nil

func (s *cacheShard) getWrappedEntry(hashedKey uint64) ([]byte, error) {
    itemIndex := s.hashmap[hashedKey]

    // itemIndex is the byte array index
    wrappedEntry, err := s.entries.Get(int(itemIndex))
    return wrappedEntry, err

The data storage of   bigcache does not contain pointers. In this way, the pressure of garbage collection scanning is reduced. However, since entries are ordinary byte arrays, flexible cache elimination strategies cannot be implemented.


  This article mainly introduces the trigger timing of garbage collection and several scheduling modes of the garbage collector. You need to understand that garbage collection occupies CPU resources, which may affect the scheduling and execution of user coroutines. Therefore, some business scenarios require garbage collection tuning. . Finally, combined with the commonly used caching framework bigcache, learn how to reduce the pressure of garbage collection (no pointers).

Tags: Go

Posted by rvdb86 on Thu, 13 Oct 2022 06:43:38 +0300