Meituan's distributed ID generator (Leaf): not to be missed, and really easy to use

Leaf

Leaf is a distributed ID generation service open-sourced by Meituan. Its name comes from a saying of the German philosopher and mathematician Leibniz: "There are no two identical leaves in the world." Picking a name with that much meaning behind it — well played, Meituan engineers.

Leaf's advantages: high reliability, low latency, and global uniqueness.

At present, the mainstream approaches to distributed ID generation are the database number-segment mode and the snowflake algorithm. Leaf combines both, and you can switch between them flexibly according to the business scenario.

Next, let's look at Leaf's segment mode and snowflake mode in detail, with hands-on examples.

1. Leaf segment mode

The Leaf segment mode is an optimization over using the database auto-increment ID directly as the distributed ID, reducing how often the database is hit. In essence, auto-increment IDs are obtained from the database in batches: each time, one number segment is taken out of the database. For example, (1, 1000] represents 1000 IDs. The business service then generates the IDs 1 to 1000 locally from that segment and keeps them in memory.

The general process is shown in the figure below:

 

After a number segment is exhausted, a new segment is fetched from the database, which greatly reduces the pressure on the database. Each fetch only needs to update the max_id field once: update max_id = max_id + step. If the update succeeds, the new segment has been acquired, and its range is (max_id, max_id + step].

Due to the dependence on the database, we first design the following table structure:

CREATE TABLE `leaf_alloc` (
  `biz_tag` varchar(128) NOT NULL DEFAULT '' COMMENT 'business key',
  `max_id` bigint(20) NOT NULL DEFAULT '1' COMMENT 'Currently assigned maximum id',
  `step` int(11) NOT NULL COMMENT 'The initial step size is also the minimum step size for dynamic adjustment',
  `description` varchar(256) DEFAULT NULL COMMENT 'Description of the business key',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time of database maintenance',
  PRIMARY KEY (`biz_tag`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert a test business record in advance:

INSERT INTO `leaf_alloc` (`biz_tag`, `max_id`, `step`, `description`, `update_time`) VALUES ('leaf-segment-test', '0', '10', 'test', '2020-02-28 10:41:03');
  • biz_tag: different businesses are isolated by the biz_tag field. If you need to scale out in the future, you only need to shard databases and tables by biz_tag.
  • max_id: the maximum ID of the number segment currently allocated to this business, used to compute the next segment.
  • step: the step size, i.e. how many IDs are obtained on each fetch.
  • description: the business description; self-explanatory.
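To make the "update max_id, then read back the new segment" step described above more concrete, here is a minimal JDBC sketch against the leaf_alloc table. It is only an illustration of the idea under these assumptions — Leaf's real DAO adds retries, dynamic step adjustment and more, and the class and method names here are made up for this example.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Minimal sketch of the "update max_id, then read back the new segment" step.
 * Illustrative only -- not Leaf's actual DAO code.
 */
public class SegmentFetchSketch {

    /** Returns {minId, maxId}; the usable range is (maxId - step, maxId]. */
    public static long[] nextSegment(Connection conn, String bizTag) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                        // run both statements in one transaction
        try {
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE leaf_alloc SET max_id = max_id + step WHERE biz_tag = ?")) {
                update.setString(1, bizTag);
                update.executeUpdate();                   // advance the high-water mark by one step
            }
            try (PreparedStatement query = conn.prepareStatement(
                    "SELECT max_id, step FROM leaf_alloc WHERE biz_tag = ?")) {
                query.setString(1, bizTag);
                try (ResultSet rs = query.executeQuery()) {
                    rs.next();
                    long maxId = rs.getLong("max_id");
                    long step = rs.getLong("step");
                    conn.commit();
                    return new long[]{maxId - step + 1, maxId};   // usable IDs: maxId-step+1 .. maxId
                }
            }
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}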

Download the leaf project locally: https://github.com/Meituan-Dianping/Leaf

Modify the leaf.properties file in the project and add the database configuration:

leaf.name=com.sankuai.leaf.opensource.test
leaf.segment.enable=true
leaf.jdbc.url=jdbc:mysql://127.0.0.1:3306/xin-master?useUnicode=true&characterEncoding=utf8
leaf.jdbc.username=junkang
leaf.jdbc.password=junkang

leaf.snowflake.enable=false

Note: leaf.snowflake.enable and leaf.segment.enable cannot both be enabled at the same time, otherwise the project will not start.

The configuration is quite simple; just start LeafServerApplication directly and everything works. Now let's test it. Leaf is an ID-issuing service driven by HTTP requests. The LeafController has only two methods: one for the segment interface and one for the snowflake interface. The key path parameter is the biz_tag that was pre-inserted into the database.

@RestController
public class LeafController {
    private Logger logger = LoggerFactory.getLogger(LeafController.class);

    @Autowired
    private SegmentService segmentService;
    @Autowired
    private SnowflakeService snowflakeService;

    /**
     * Segment mode
     * @param key
     * @return
     */
    @RequestMapping(value = "/api/segment/get/{key}")
    public String getSegmentId(@PathVariable("key") String key) {
        return get(key, segmentService.getId(key));
    }

    /**
     * Snowflake algorithm mode
     * @param key
     * @return
     */
    @RequestMapping(value = "/api/snowflake/get/{key}")
    public String getSnowflakeId(@PathVariable("key") String key) {
        return get(key, snowflakeService.getId(key));
    }

    private String get(@PathVariable("key") String key, Result id) {
        Result result;
        if (key == null || key.isEmpty()) {
            throw new NoKeyException();
        }
        result = id;
        if (result.getStatus().equals(Status.EXCEPTION)) {
            throw new LeafServerException(result.toString());
        }
        return String.valueOf(result.getId());
    }
}
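Besides the browser, the interface can also be called from code. Below is a tiny illustrative client using the JDK 11+ HttpClient; the URL and biz_tag are the ones set up above, and the class name is just for this example.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Tiny client for the segment endpoint, assuming a Leaf server on localhost:8080
 * and the "leaf-segment-test" biz_tag inserted earlier.
 */
public class LeafSegmentClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:8080/api/segment/get/leaf-segment-test"))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("next id = " + response.body());   // the controller returns the ID as a plain string
    }
}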

Visit http://127.0.0.1:8080/api/segment/get/leaf-segment-test. The result came back normally and everything looked fine, but when I checked the data in the database table, I noticed a problem.

 

 

Normally, in segment mode, a new segment should be fetched only when the previous one has been consumed. Yet I had just taken a single ID, and max_id in the database had already been updated; in other words, Leaf had fetched one extra segment in advance. What kind of operation is this?

 

Why is Leaf designed like this?

Leaf wants fetching segments from the DB to never block the ID-issuing path.

If a new segment were only fetched from the DB once the current one is exhausted, then any network jitter or slow query in the DB would leave the business system waiting for IDs, and the response time of the whole system would slow down. That is intolerable for businesses with huge traffic.

Therefore, once the current segment has been consumed to a certain point, Leaf asynchronously loads the next segment into memory in advance, instead of waiting until the segment is exhausted to update it. This greatly reduces the risk to the system.

So when exactly is that "certain point"?

I ran a small experiment here, with the segment length set to step=10 and max_id=1.

 

When I took the first ID (1/10 of the segment consumed), I saw that the number segment in the database had already grown.

 

 

When I took the third ID (3/10 of the segment consumed), I saw that the number segment had grown again.

 

 

Leaf uses a double-buffer approach: the service holds two segment buffers. When the current segment is 10% consumed and the next segment has not yet been fetched, a separate update thread is started to fetch the next segment.

In short, Leaf makes sure there are always two cached number segments, so even if the database goes down at any moment, the ID-issuing service can keep working normally for a period of time.
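To make the double-buffer idea more tangible, here is a heavily simplified sketch (not Leaf's actual segment-buffer implementation): once 10% of the current segment is consumed, a background thread loads the next segment, and the buffers are swapped when the current one runs out. The class and method names are invented for this illustration, and loadSegmentFromDb stands in for the update/select step shown earlier.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified double-buffer segment allocator (illustration only). */
public class DoubleBufferAllocator {

    static class Segment {
        final long max;                 // inclusive upper bound of the segment
        final long step;
        final AtomicLong current;       // next id to hand out

        Segment(long min, long max) {
            this.max = max;
            this.step = max - min + 1;
            this.current = new AtomicLong(min);
        }

        long idle() { return max - current.get() + 1; }   // how many ids are left
    }

    private final ExecutorService loader = Executors.newSingleThreadExecutor();
    private volatile Segment current;
    private volatile Segment next;              // pre-loaded buffer
    private volatile boolean loading = false;

    public DoubleBufferAllocator() {
        this.current = loadSegmentFromDb();     // load the first segment at startup
    }

    public synchronized long nextId() {
        // Kick off an async load of the next segment once 10% of the current one is used.
        if (next == null && !loading && current.idle() <= current.step * 0.9) {
            loading = true;
            loader.submit(() -> {
                next = loadSegmentFromDb();
                loading = false;
            });
        }
        long id = current.current.getAndIncrement();
        if (id <= current.max) {
            return id;
        }
        // Current segment exhausted: switch to the pre-loaded one
        // (the real code waits/retries if it is not ready yet).
        if (next == null) {
            next = loadSegmentFromDb();
        }
        current = next;
        next = null;
        return current.current.getAndIncrement();
    }

    private Segment loadSegmentFromDb() {
        // Placeholder: in reality this runs "update max_id = max_id + step" and reads back the range.
        throw new UnsupportedOperationException("wire this to the leaf_alloc table");
    }
}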

 

It is generally recommended to set the segment length to 600 times the peak QPS of the service, i.e. about 10 minutes' worth of IDs (for example, with a peak of 1,000 IDs per second, step would be 600,000). That way, even if the DB goes down, Leaf can keep issuing IDs for another 10-20 minutes without being affected.

Advantages:

  • Leaf service can be easily linearly expanded, and its performance can fully support most business scenarios.
  • High disaster tolerance: the Leaf service has an internal number segment cache. Even if the DB goes down, the Leaf can still provide external services normally in a short time.

Disadvantages:

  • The IDs are trend-increasing rather than random, so they can leak information such as how many IDs have been issued, which is not ideal for security.
  • If the DB is down for long enough, the whole system becomes unavailable (a risk for anything that relies on the database).

2. Leaf snowflake mode

Leaf-snowflake basically follows the original snowflake design. The ID is a 64-bit Long composed of: sign bit (1 bit, always 0) + timestamp (41 bits) + machine ID (5 bits) + machine room ID (5 bits) + sequence number (12 bits).
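As a rough illustration of that bit layout, the sketch below assembles a 64-bit ID from the four parts with bit shifts. The custom epoch value and the exact field order are assumptions for the example, not Leaf's actual constants.

/**
 * Sketch of how a snowflake-style 64-bit ID is assembled from its parts.
 * The custom epoch and the exact field order here are assumptions for illustration.
 */
public class SnowflakeLayoutSketch {

    private static final long TWEPOCH = 1288834974657L;    // assumed custom epoch, in milliseconds

    private static final long MACHINE_ID_BITS = 5L;
    private static final long ROOM_ID_BITS = 5L;
    private static final long SEQUENCE_BITS = 12L;

    private static final long ROOM_SHIFT = SEQUENCE_BITS;                         // 12
    private static final long MACHINE_SHIFT = SEQUENCE_BITS + ROOM_ID_BITS;       // 17
    private static final long TIMESTAMP_SHIFT = MACHINE_SHIFT + MACHINE_ID_BITS;  // 22

    /** Combines the parts into one 64-bit long; the sign bit stays 0, so the result is positive. */
    public static long compose(long timestampMs, long machineId, long roomId, long sequence) {
        return ((timestampMs - TWEPOCH) << TIMESTAMP_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | (roomId << ROOM_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        // Example: machine 3 in machine room 1, first ID in the current millisecond.
        System.out.println(compose(System.currentTimeMillis(), 3, 1, 0));
    }
}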

Leaf-snowflake differs from the original snowflake algorithm mainly in how the workId is generated. Leaf-snowflake relies on Zookeeper to generate the workId, i.e. the machine ID (5 bits) + machine room ID (5 bits) above. In Leaf, the workId is derived from a Zookeeper sequential node: each application that uses Leaf-snowflake creates a sequential node in Zookeeper at startup, so one sequential node corresponds to one machine, i.e. one workId.

 

The startup process of the Leaf-snowflake service is roughly as follows:

  • Start the Leaf-snowflake service, connect to Zookeeper, and check under the leaf_forever parent node whether this machine has already registered (i.e. whether its sequential child node exists).
  • If it has registered, read back its workerID (the int sequence number generated by the ZK ordered node) and start the service.
  • If it has not registered, create a persistent sequential node under the parent node, take the returned sequence number as its own workerID, and start the service.

However, Leaf-snowflake only weakly depends on Zookeeper. Besides fetching the workerID from ZK, it also caches a workerID file on the local file system, so even if Zookeeper has a problem right when a machine needs to be restarted after a failure, the service can still start normally.
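The following is a rough sketch of that registration-plus-local-cache idea using Apache Curator. The node path, cache file location, and error handling are simplified assumptions for illustration, not Leaf's actual implementation.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

/**
 * Sketch of workerID assignment: take the sequence number of a persistent
 * sequential node in Zookeeper, and cache it in a local file as a fallback.
 * Node path and cache file location are illustrative assumptions.
 */
public class WorkerIdSketch {

    private static final Path CACHE_FILE = Paths.get("/tmp/leaf/workerID.properties");

    public static int resolveWorkerId(String zkAddress, String listenAddress) {
        try {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    zkAddress, new ExponentialBackoffRetry(1000, 3));
            client.start();
            // Create a persistent sequential node; ZK appends a 10-digit sequence number to the path.
            String path = client.create()
                    .creatingParentsIfNeeded()
                    .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
                    .forPath("/leaf_forever/" + listenAddress + "-", new byte[0]);
            int workerId = Integer.parseInt(path.substring(path.length() - 10));
            Files.createDirectories(CACHE_FILE.getParent());
            Files.write(CACHE_FILE, String.valueOf(workerId).getBytes(StandardCharsets.UTF_8));
            return workerId;
        } catch (Exception e) {
            // Weak dependency: if Zookeeper is unreachable, fall back to the locally cached value.
            try {
                return Integer.parseInt(new String(
                        Files.readAllBytes(CACHE_FILE), StandardCharsets.UTF_8).trim());
            } catch (Exception ignored) {
                throw new IllegalStateException("no workerID available from ZK or local cache", e);
            }
        }
    }
}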

Starting the Leaf-snowflake mode is also fairly simple: start a local ZooKeeper, then modify the leaf.properties file in the project to disable the Leaf segment mode and enable the Leaf snowflake mode.

leaf.segment.enable=false
#leaf.jdbc.url=jdbc:mysql://127.0.0.1:3306/xin-master?useUnicode=true&characterEncoding=utf8
#leaf.jdbc.username=junkang
#leaf.jdbc.password=junkang

leaf.snowflake.enable=true
leaf.snowflake.zk.address=127.0.0.1
leaf.snowflake.port=2181

Test it by visiting: http://127.0.0.1:8080/api/snowflake/get/leaf-segment-test

 

Advantages:

  • The IDs are 64-bit (8-byte) numbers with an increasing trend, which satisfies the primary-key requirements for database storage mentioned above.

Disadvantages:

  • It depends on ZooKeeper, so there is some risk of the service becoming unavailable (honestly, it is hard to think of other drawbacks).

3. Leaf monitoring

Request address: http://127.0.0.1:8080/cache

For monitoring the service itself, Leaf exposes a web endpoint that maps its in-memory data, so the allocation status of all number segments can be seen in real time. For example, the usage of each segment's double buffer and the position of the current ID can be viewed on the web page.

 


Summary

This article does not dig very deeply into the Leaf source code, because the codebase is small and easy to read.

Tags: Java Database Zookeeper Algorithm Distribution

Posted by Chris_Evans on Wed, 25 May 2022 07:06:18 +0300