Notes from optimizing a Python web API: a 25x performance improvement!

Source: Lin_R   
Link: https://segmentfault.com/a/1190000020956724

Background

The business platform my team maintains had a settings page that loaded painfully slowly, which was frankly outrageous.

No user is going to wait 36 seconds, so we set out on an optimization journey.

Throwing a stone to find the way

Since this is a website response-time problem, Chrome is a powerful tool for quickly finding the optimization direction.

Chrome's Network panel shows not only how long each request takes, but also how that time is distributed. Pick a project with little configuration and make a simple request:

Even though this project has only three records, loading its settings takes 17s. The Timing tab shows the request takes 17.67s in total, of which 17.57s is spent in the Waiting (TTFB) state.

TTFB stands for Time To First Byte: the time from when the browser sends the request until it receives the first byte of the server's response (back-end processing time + redirection time). It is a key indicator of how fast the server responds.
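If you want to reproduce this number outside of Chrome, TTFB can be approximated from Python with the requests library (a quick sketch; the URL is a placeholder):

import requests

# With stream=True the body is not downloaded yet, so `elapsed` measures
# the time from sending the request until the response headers arrive,
# which approximates Chrome's Waiting (TTFB).
resp = requests.get('http://example.com/api/project/settings', stream=True)
print('TTFB ~ %.2fs' % resp.elapsed.total_seconds())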

Profiling flame graphs + code tuning

So the general direction of the optimization is clear: the back-end handling of the API. The back end is implemented in Python + Flask. Rather than guessing blindly, go straight to the profiler:
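In case it helps to see how such per-request profiles can be captured (one common approach, not necessarily what the author used): Werkzeug, which Flask builds on, ships a profiler middleware that dumps a cProfile file for every request.

from flask import Flask
from werkzeug.middleware.profiler import ProfilerMiddleware

app = Flask(__name__)

# Write one .prof file per request into ./profiles; these files can then
# be rendered as call graphs or flame graphs (see the tooling mentioned
# at the end of this article).
app.wsgi_app = ProfilerMiddleware(app.wsgi_app, profile_dir='./profiles')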

The first wave of optimization: redesigning the feature's interaction

To be honest, the first look at this flame graph is disheartening: nothing stands out at all. All you can see is a lot of gevent and threading frames. Are there simply too many coroutines or threads?

At this point it has to be analyzed together with the code (for brevity, parameters are replaced with "..."):

from threading import Thread

def get_max_cpus(project_code, gids):
    """
    """
    ...
    max_cpus = {}   # results keyed by gid
    threads = []

    # Inner helper that fetches the max CPU value for a single gid
    def get_max_cpu(project_setting, gid, token, headers):
        group_with_machines = utils.get_groups(...)
        hostnames = get_info_from_machines_info(...)
        res = fetchers.MonitorAPIFetcher.get(...)
        vals = [
            round(100 - val, 4)
            for ts, val in res['series'][0]['data']
            if not utils.is_nan(val)
        ]
        max_val = max(vals) if vals else float('nan')
        max_cpus[gid] = max_val

    # Start one thread per gid and fire the requests as a batch
    for gid in gids:
        t = Thread(target=get_max_cpu, args=(...))
        threads.append(t)
        t.start()

    # Wait for all threads to finish
    for t in threads:
        t.join()

    return max_cpus

As the code shows, to fetch the cpu_max data for all gids faster, the function allocates one thread per gid to make the request, and finally returns the maximum values.

There are two problems here:

  1. Creating and destroying threads inside a web API is very costly, because the endpoint is triggered frequently, so thread churn happens constantly. A thread pool should be used wherever possible to reduce the system overhead (see the sketch after this list);
  2. The request loads the maximum CPU value of the machines under a gid (group) over the past 7 days. A moment's thought shows this is a maximum, not a real-time or an average value, and in many cases it is not as valuable as it looks;
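For reference, a minimal sketch of the thread-pool variant from problem 1, using the standard library's concurrent.futures (the pool size is an assumption, and get_max_cpu is the per-gid fetcher from the snippet above, rewritten to return its value instead of writing into a shared dict):

from concurrent.futures import ThreadPoolExecutor

# A module-level pool is reused across requests, so the web API does not
# pay thread creation/destruction costs on every call.
executor = ThreadPoolExecutor(max_workers=8)

def get_max_cpus(project_code, gids):
    # Submit one task per gid; future.result() waits, like t.join().
    futures = {gid: executor.submit(get_max_cpu, project_code, gid)
               for gid in gids}
    return {gid: f.result() for gid, f in futures.items()}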

Now that the problems are known, the remedies are targeted:

  1. Adjust the feature design: instead of loading the max CPU by default, let users click to load it. This both reduces the chance of a concurrency pile-up and stops the slow lookup from blocking the whole page (a sketch of this follows the list);
  2. Thanks to adjustment 1, the multithreading implementation can be removed entirely;
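A sketch of what the redesigned interaction could look like on the Flask side (the route names and the get_gids helper are hypothetical): the settings endpoint no longer embeds the max-CPU data, and the front end requests it per group only when the user clicks.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/projects/<project_code>/settings')
def project_settings(project_code):
    # Fast path: return the group settings without the expensive
    # 7-day max-CPU lookup, so the page renders immediately.
    gids = get_gids(project_code)  # hypothetical helper
    return jsonify(get_group_profile_settings(project_code, gids))

@app.route('/api/projects/<project_code>/groups/<gid>/max_cpu')
def group_max_cpu(project_code, gid):
    # Loaded on demand when the user clicks, so at most one gid is
    # queried at a time instead of 20~50+ all at once.
    return jsonify({'max_cpu': get_max_cpu(project_code, gid)})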

Take another look at the flame graph after the first wave of optimization:

There is still plenty of room for improvement in this flame graph, but at least it looks somewhat normal now.

The second wave of optimization: MySQL operations

Zoom in on that block of the flame graph:

A large slice of the time is spent in utils.py:get_group_profile_settings. This function is a database-operation hotspot.

Similarly, code analysis is also required:

def get_group_profile_settings(project_code, gids):

    # Get the MySQL ORM model and a session
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    profile_settings = {}
    for gid in gids:
        compound_name = project_code + ':' + gid
        result = session.query(ProfileSetting).filter(
            ProfileSetting.name == compound_name
        ).first()

        if result:
            result = result.as_dict()
            tag_indexes = result.get('tag_indexes')
            profile_settings[gid] = {
                'tag_indexes': tag_indexes,
                'interval': result['interval'],
                'status': result['status'],
                'profile_machines': result['profile_machines'],
                'thread_settings': result['thread_settings']
            }
            ...
    return profile_settings

Seeing MySQL, the first reflex is an index problem, so look at the database indexes first; if an index is in place, it should not be the bottleneck:

Strange: the index is there, so why is it still this slow?!

Just as I was at a loss, I suddenly remembered that during the first wave of optimization I had noticed the more gids (groups) a project had, the worse the slowdown. Looking back at the code above, one line jumped out:

for gid in gids: 
    ...

Now it started to make sense.

Every gid here queries the database once, and a project often has 20 to 50+ groups, so the query count explodes immediately.

In fact, the per-gid lookups can be merged: rather than hitting MySQL once per gid, all the records can be fetched in a single batch query.

Just as I was about to get on with it, I caught sight of another spot in the code that could be optimized:

Readers who have run into this before will probably recognize what is going on.

getattr is the mechanism Python uses to look up an object's methods and attributes. Using it cannot be avoided, but if it is invoked very frequently there is a measurable performance cost.
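A quick way to see this cost is a timeit micro-benchmark that hoists the attribute lookup out of the loop (illustrative only; absolute numbers depend on the machine):

import timeit

setup = "s = 'hello'"

# Attribute looked up on every iteration...
per_loop = timeit.timeit(
    "for _ in range(1000): s.upper()", setup=setup, number=1000)

# ...vs. looked up once and reused inside the loop.
hoisted = timeit.timeit(
    "u = s.upper\nfor _ in range(1000): u()", setup=setup, number=1000)

print(per_loop, hoisted)  # the hoisted version is measurably faster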

Looking at it in the code:

def get_group_profile_settings(project_code, gids):

    # Get the MySQL ORM model and a session
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    profile_settings = {}
    for gid in gids:
        compound_name = project_code + ':' + gid
        result = session.query(ProfileSetting).filter(
            ProfileSetting.name == compound_name
        ).first()
        ...

In this for loop, which iterates many times, session.query(ProfileSetting) is pointlessly rebuilt on every pass, and its filter attribute is looked up and executed again each time, so this can be optimized as well.

To summarize the problems:

1. Database queries are not batched;
2. ORM objects are regenerated over and over, causing performance loss;
3. Attribute lookups are not reused; in a loop with many iterations, the repeated getattr calls magnify the cost;

So the targeted remedy is:

def get_group_profile_settings(project_code, gids):

    # Get the MySQL ORM model and a session
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    # Build the query object once, hoisting it (and the filter lookup)
    # out of the loop
    query_instance = session.query(ProfileSetting)

    # Batch query: one IN lookup instead of one query per gid
    query_results = query_instance.filter(
        ProfileSetting.name.in_([project_code + ':' + gid for gid in gids])
    ).all()

    # Process all the query results in one pass
    profile_settings = {}
    for result in query_results:
        if not result:
            continue
        result = result.as_dict()
        gid = result['name'].split(':')[1]
        tag_indexes = result.get('tag_indexes')
        profile_settings[gid] = {
            'tag_indexes': tag_indexes,
            'interval': result['interval'],
            'status': result['status'],
            'profile_machines': result['profile_machines'],
            'thread_settings': result['thread_settings']
        }
        ...
    return profile_settings

The flame graph after this optimization:

Compare it with the same spot in the flame graph before the optimization:

The improvement is obvious: the utils.py:get_group_profile_settings block at the bottom and the database-related hotspots have shrunk dramatically.

Optimization effect

For the same project, the API response time was optimized from 37.6s down to 1.47s. Screenshots below:

Optimization summary

As a famous saying goes:

If the data structure is good enough, you don't need a good algorithm.

When optimizing a feature, the fastest optimization is to remove that feature!

The next fastest is to adjust the frequency or complexity with which the feature is triggered!

Approaching optimization top-down like this, from the user's actual usage scenario, often yields simpler and more efficient results. Heh heh!

Of course, much of the time we are not so lucky. If the feature really cannot be removed or adjusted, then it is time to show our value as programmers: profile it.

For Python, you can try cProfile + gprof2dot:
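For reference, a typical cProfile + gprof2dot pipeline looks something like this (the function name is a placeholder):

import cProfile

# Profile the suspect code path and dump pstats data to a file.
cProfile.run('slow_endpoint_logic()', 'profile.prof')

# Then, from the shell, render the stats file as a call-graph image:
#   gprof2dot -f pstats profile.prof | dot -Tpng -o profile.png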

For Go, you can use pprof + go-torch.

Very often, the code problems you can see are not necessarily the real performance bottleneck. Objective analysis with tools is what lets you hit the actual pain point!

In fact, this 1.47s was not the best possible result; there was still room to optimize, for example:

  1. Front-end rendering: the whole table is rendered only after a large amount of data has been assembled. Slow cells could show a default value first and update once the data returns;
  2. According to the flame graph, many details can still be optimized, such as replacing the external interface used to fetch data, or further tuning the getattr-related logic;
  3. The extreme option: rewrite the Python service in Go;

However, these optimizations were not urgent, because the 1.47s figure came from a relatively large project; most projects actually returned in under 1s.

Further optimization would likely cost more, and the result might only go from 500ms to 400ms, which is not a great return on the effort.

So always be clear about the optimization goal and keep the input-output ratio in mind, delivering relatively high value in the limited time available (though with spare time, by all means push it all the way).

(End)