How go-zero withstands traffic spikes

Whether in a monolith or in microservices, the API endpoints that developers expose to the front end can only handle a limited amount of traffic. When the request frequency or concurrency exceeds what an endpoint can tolerate, we must apply rate limiting to keep the endpoint available, or degrade it gracefully. In other words, the endpoint needs a fuse so that a burst of unexpected requests does not overload and paralyze the system.

go-zero ships with rate limiters that work out of the box. There are two built-in limiters, which correspond to two kinds of usage scenarios:

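| type        | principle                                       | scenario                                                            |
|-------------|-------------------------------------------------|---------------------------------------------------------------------|
| periodlimit | Limits the number of accesses per unit of time  | The data transmission rate must be strictly capped                  |
| tokenlimit  | Token bucket rate limiting                      | Limits the average transmission rate while allowing some bursts     |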

This article will introduce periodlimit.

usage

const (
    seconds = 1 // window length, in seconds
    quota   = 5 // requests allowed per key within one window
)

// Create the limiter. s.Addr() is the address of the Redis server.
l := limit.NewPeriodLimit(seconds, quota, redis.NewRedis(s.Addr(), redis.NodeType), "periodlimit")

// Take one unit of quota for the given resource key.
key := "first"
code, err := l.Take(key)
if err != nil {
    // The limiter itself failed: log the error and let the request through.
    logx.Error(err)
    return true
}

// Decide how to process the request based on the returned code.
switch code {
case limit.OverQuota:
    logx.Errorf("OverQuota key: %v", key)
    return false
case limit.Allowed:
    logx.Infof("AllowedQuota key: %v", key)
    return true
case limit.HitQuota:
    logx.Errorf("HitQuota key: %v", key)
    // todo: maybe we need to let users know they hit the quota
    return false
default:
    logx.Errorf("DefaultQuota key: %v", key)
    // unknown response, so we just let the request through
    return true
}
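As a rough illustration of where such a check could sit, the sketch below wraps the same decision in a plain net/http middleware. It uses only the standard library. The taker interface, the fixedCode stub, the keyOf parameter, the serve helper and the local constants (with the values listed in the return-code table of this article) are assumptions made for the example, not go-zero's API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Local copies of the codes returned by Take (values assumed from the
// return-code table in this article).
const (
	Allowed   = 1
	HitQuota  = 2
	OverQuota = 3
)

// taker is a hypothetical interface: the single method this sketch needs
// from a limiter.
type taker interface {
	Take(key string) (int, error)
}

// fixedCode is a stub limiter that always answers with the same code.
type fixedCode int

func (f fixedCode) Take(string) (int, error) { return int(f), nil }

// limitMiddleware rejects a request with 429 when the limiter reports
// HitQuota or OverQuota. Limiter errors and unknown codes pass through,
// mirroring the usage snippet above.
func limitMiddleware(l taker, keyOf func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		code, err := l.Take(keyOf(r))
		if err == nil && (code == OverQuota || code == HitQuota) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// serve sends one request through the middleware and returns the status.
func serve(l taker) int {
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	keyOf := func(r *http.Request) string { return r.URL.Path }
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/sms", nil)
	limitMiddleware(l, keyOf, next).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(serve(fixedCode(Allowed)), serve(fixedCode(HitQuota)), serve(fixedCode(OverQuota)))
}
```

Keying on the URL path limits each endpoint as a whole; keying on a user ID or phone number instead would limit each caller separately.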

periodlimit

go-zero uses sliding-window counting to track the number of accesses to the same resource within a period of time; once the count exceeds the configured limit, further access is denied. The limit applies per resource: if the requests in a period are spread across different resources and no single resource exceeds its own limit, a large number of requests is still let through.

In a distributed system, several microservice instances serve requests at the same time. When a burst of traffic hits the same resource on all of them at once, how do we keep one correct count across the whole system? And since counting an access may involve several steps, how do we guarantee that those steps execute atomically?

  • go-zero counts resource accesses with the Redis INCRBY command
  • A Lua script performs the whole window calculation, which guarantees that the calculation is atomic
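To make the atomicity point concrete, here is a minimal in-memory sketch. It assumes a mutex-protected map as a stand-in for Redis, and the counter type, take and hammer are hypothetical names. The increment and the comparison happen inside one critical section, much as the Lua script runs as one unit inside Redis, so concurrent callers can never observe the same count.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is an in-memory stand-in for the Redis key space (an assumption
// for illustration; go-zero keeps the counter in Redis).
type counter struct {
	mu     sync.Mutex
	visits map[string]int
}

// take increments the visit count for key and classifies the result in a
// single critical section. It returns 1 (within the limit), 2 (exactly at
// the limit) or 0 (over the limit). Window expiry is left out here.
func (c *counter) take(key string, limit int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.visits[key]++
	switch cur := c.visits[key]; {
	case cur < limit:
		return 1
	case cur == limit:
		return 2
	default:
		return 0
	}
}

// hammer fires n concurrent requests at one key and tallies the codes.
func hammer(n, limit int) map[int]int {
	c := &counter{visits: map[string]int{}}
	results := make(chan int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- c.take("first", limit)
		}()
	}
	wg.Wait()
	close(results)
	tally := map[int]int{}
	for code := range results {
		tally[code]++
	}
	return tally
}

func main() {
	t := hammer(100, 5)
	fmt.Println(t[1], t[2], t[0]) // prints "4 1 95"
}
```

If the increment and the comparison were two separate round trips, two instances could both read the same count and both let their request through; doing both in one step rules that out.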

Here are the key parameters of the Lua script:

| argument | meaning                                                                                          |
|----------|--------------------------------------------------------------------------------------------------|
| KEYS[1]  | Identifier of the resource being accessed                                                        |
| ARGV[1]  | limit: the total number of requests allowed; requests beyond it are limited. Can be set to QPS   |
| ARGV[2]  | window: the size of the sliding window; the key's TTL is used to simulate the sliding effect     |

-- to be compatible with aliyun redis,
-- we cannot use `local key = KEYS[1]` to reuse the key
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
-- INCRBY key 1 => visits++
local current = redis.call("INCRBY", KEYS[1], 1)
-- If this is the first visit, set the expiration time => TTL = window size
-- Because it only limits the number of visits for a period of time
if current == 1 then
    redis.call("expire", KEYS[1], window)
    return 1
elseif current < limit then
    return 1
elseif current == limit then
    return 2
else
    return 0
end

The script's return code is passed back to the caller, and it is up to the caller to decide what to do with the request next:

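| script return code | tag       | code seen by caller | meaning        |
|--------------------|-----------|---------------------|----------------|
| 0                  | OverQuota | 3                   | over the limit |
| 1                  | Allowed   | 1                   | within limit   |
| 2                  | HitQuota  | 2                   | hit the limit  |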
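The table above amounts to a small translation step between the script's result and the code the caller sees. A minimal sketch follows, assuming local constants with the values from the table; toPublic and errUnknownCode are hypothetical names chosen for this example, not taken from the library.

```go
package main

import (
	"errors"
	"fmt"
)

// Codes seen by the caller (values from the table above).
const (
	Allowed   = 1
	HitQuota  = 2
	OverQuota = 3
)

// errUnknownCode is a hypothetical error for results outside the table.
var errUnknownCode = errors.New("unknown status code")

// toPublic translates the script's return code into the caller-facing code.
func toPublic(scriptCode int64) (int, error) {
	switch scriptCode {
	case 1:
		return Allowed, nil
	case 2:
		return HitQuota, nil
	case 0:
		return OverQuota, nil
	default:
		return 0, errUnknownCode
	}
}

func main() {
	for _, c := range []int64{0, 1, 2} {
		code, _ := toPublic(c)
		fmt.Print(code, " ")
	}
	fmt.Println() // prints "3 1 2 "
}
```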

The following figure shows how a request enters the limiter and what happens after it triggers the limit:

Follow-up handling

If a large number of requests reach the service at the same moment, periodlimit hits its threshold almost immediately, while most of the configured time window still remains. How to handle the requests that follow then becomes a problem.

periodlimit does not handle them itself; it only returns a code. What happens to the subsequent requests is up to the developer:

  1. If no special handling is needed, simply reject the request
  2. If the requests must be served, use a message queue (mq) to buffer them and relieve the pressure
  3. Use tokenlimit, which allows temporary traffic bursts
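The first two options can be sketched together. In the example below, a bounded Go channel stands in for the message queue; this is an assumption for illustration, since a real deployment would publish to an actual mq. The dispatcher type and its methods are hypothetical. Requests the limiter allows are handled at once, limited requests are parked while there is room, and the rest are rejected.

```go
package main

import "fmt"

// dispatcher parks limited requests in a bounded backlog. The channel is a
// stand-in for a message queue (an assumption for this sketch).
type dispatcher struct {
	backlog chan string
}

// submit reports what happened to the request: "handled", "queued" or
// "rejected". allowed is the limiter's verdict for this request.
func (d *dispatcher) submit(req string, allowed bool) string {
	if allowed {
		return "handled"
	}
	select {
	case d.backlog <- req:
		return "queued" // option 2: buffer the request for later
	default:
		return "rejected" // option 1: backlog full, drop the request
	}
}

// drain returns every request currently parked, in arrival order. A
// background worker could call it once the window has reset.
func (d *dispatcher) drain() []string {
	var out []string
	for {
		select {
		case req := <-d.backlog:
			out = append(out, req)
		default:
			return out
		}
	}
}

func main() {
	d := &dispatcher{backlog: make(chan string, 2)}
	fmt.Println(d.submit("a", true), d.submit("b", false), d.submit("c", false), d.submit("d", false))
	fmt.Println(d.drain())
}
```

Bounding the backlog matters: an unbounded buffer only moves the overload from the service to the queue.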

So next, we will look at tokenlimit.

summary

The periodlimit rate-limiting scheme in go-zero is built on a Redis counter. Running the count inside a Redis Lua script guarantees that it is atomic, so the count stays correct in a distributed deployment.

However, the scheme also has a drawback: it has to keep records of all accesses within the time window. If that volume is particularly large, memory consumption can become serious.

reference

You are welcome to use go-zero and join us: https://github.com/tal-tech/go-zero

Posted by 9three on Sat, 07 May 2022 08:13:09 +0300