A memory-debugging power tool for developers: how to use ASAN to detect memory leaks, stack overflows, and other problems

  • Original content from the GreatSQL community may not be used without authorization. To reprint it, please contact the editor and cite the source.


Introduction

First, a word about Sanitizers, an open source project from Google. It includes memory and thread error detection tools such as ASAN, LSAN, MSAN, and TSAN. Here is a brief introduction to what each tool does:

  • ASAN: a memory error detection tool. Add -fsanitize=address to the compilation command to enable it
  • LSAN: a memory leak detection tool, which is integrated into ASAN. You can set the environment variable ASAN_OPTIONS=detect_leaks=0 to turn off LSAN within ASAN. You can also use the -fsanitize=leak compilation option instead of -fsanitize=address to turn off ASAN's memory error detection and keep only the memory leak check.
  • MSAN: a tool for detecting reads of uninitialized memory. Add -fsanitize=memory -fPIE -pie to the compilation command to enable it; you can also add the -fsanitize-memory-track-origins option to trace back to where the memory was created
  • TSAN: a tool for detecting data races between threads. Add -fsanitize=thread to the compilation command to enable it

ASAN is the focus of this article.

ASAN, short for AddressSanitizer, detects memory problems such as buffer overflows and illegal accesses through dangling pointers.

According to Google engineers, ASAN has found more than 300 previously unknown bugs in the Chromium project, and the performance cost of using ASAN as a memory error detection tool is modest.

According to the test results, ASAN slows a program down by roughly 2x, which is an order of magnitude faster than Valgrind (whose official figures put the slowdown at about 10-50x).

Moreover, while Valgrind can only check out-of-bounds accesses to heap memory and accesses through dangling pointers, ASAN detects those and also out-of-bounds accesses to stack and global objects.

This is an important reason why ASAN stands out among memory detection tools. Most C/C++ projects today, especially large ones, use ASAN to safeguard product quality.

How to use ASAN

A weapon this powerful is not going to fall out of favor on the programmer's battlefield.

Since LLVM 3.1, GCC 4.8, Xcode 7.0, and MSVC 16.9, ASAN has been built into many mainstream compilers, so it is very convenient to use in a project.

You only need to add the -fsanitize=address option to the compilation command to put ASAN to work in your project. Next, let's go through a few examples to see what ASAN can do.

Note:

  1. The examples below enable the debug flag -g because, when a memory error is found, debug symbols let the error report give accurate stack information for the error location. If the stack information in a report looks wrong, try adding -fno-omit-frame-pointer to improve stack trace generation.
  2. If compiling and linking are separate steps, the -fsanitize=address option must be added to both the compile step and the link step.

Detect memory leaks

// leak.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = (char*)malloc(100);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    return 0;
}

In the above code, we allocate 100 bytes of memory but never release it before the main function returns. Let's see whether ASAN can detect this. Add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ gcc leak.c -o leak -fsanitize=address -g
~/Code/test$ ./leak 
string is: Hello world!

=================================================================
==1621572==ERROR: LeakSanitizer: detected memory leaks    // 1)

Direct leak of 100 byte(s) in 1 object(s) allocated from:   // 2)
    #0 0x7f5b986bc808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x562d866b5225 in main /home/chenbing/Code/test/leak.c:7
    #2 0x7f5b983e1082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: 100 byte(s) leaked in 1 allocation(s).

Here, 1) in the ASAN report states the cause of the error: detected memory leaks. 2) indicates that the application allocated 100 bytes that were never freed, shows the stack captured at the allocation site, and tells us the memory was allocated at leak.c:7.

With such a detailed and accurate error report, isn't the memory problem less troublesome?

Detect dangling pointer access

// uaf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = (char*)malloc(100);
    free(s);
    strcpy(s, "Hello world!");  // use-after-free
    printf("string is: %s\n", s);
    return 0;
}

In the above code, we allocate 100 bytes of memory and then release it. We then write to the previously allocated address, which is a typical illegal access through a dangling pointer. Again, let's see whether ASAN can detect it. Add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ gcc uaf.c -o uaf -fsanitize=address -g
~/Code/test$ ./uaf 
=================================================================
==1624341==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b0000000f0 at pc 0x7f9f776bb58d bp 0x7fffabad8280 sp 0x7fffabad7a28    // 1)
WRITE of size 13 at 0x60b0000000f0 thread T0  // 2)
    #0 0x7f9f776bb58c in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790
    #1 0x55b9cf56e26d in main /home/chenbing/Code/test/uaf.c:9
    #2 0x7f9f77452082 in __libc_start_main ../csu/libc-start.c:308
    #3 0x55b9cf56e16d in _start (/home/chenbing/Code/test/uaf+0x116d)

0x60b0000000f0 is located 0 bytes inside of 100-byte region [0x60b0000000f0,0x60b000000154) // 3)
freed by thread T0 here:
    #0 0x7f9f7772d40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x55b9cf56e255 in main /home/chenbing/Code/test/uaf.c:8
    #2 0x7f9f77452082 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here: // 4)
    #0 0x7f9f7772d808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x55b9cf56e245 in main /home/chenbing/Code/test/uaf.c:7
    #2 0x7f9f77452082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790 in __interceptor_memcpy
Shadow bytes around the buggy address:  // 5)
  0x0c167fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c167fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c167fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c167fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c167fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c167fff8010: fd fd fd fd fd fa fa fa fa fa fa fa fa fa[fd]fd
  0x0c167fff8020: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c167fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c167fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c167fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c167fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASAN internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1624341==ABORTING

The error report looks long, but it is actually not complicated:

  • 1) tells us the cause of the error: heap-use-after-free, that is, an access through a dangling pointer. The address of the memory is 0x60b0000000f0. It also gives the contents of the PC, BP, and SP registers at the time of the error. We can ignore these registers, because the rest of the report is enough to locate the problem.
  • 2), 3), and 4) report the stack and thread information for the dangling-pointer access, the point where the memory was released, and the point where it was allocated, respectively. From 2), we can see that the invalid write occurred at line 9 of uaf.c; 3) shows the memory was freed at line 8, and 4) shows it was allocated at line 7.
  • 5) gives the details of the shadow memory corresponding to the incorrectly accessed address, where fa marks a heap redzone and fd marks heap memory that has been freed.

Detect heap overflow

// overflow.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    char *s = (char*)malloc(12);
    strcpy(s, "Hello world!");
    printf("string is: %s\n", s);
    free(s);
    return 0;
}

In the above code, we allocate only 12 bytes, but then write 13 bytes of data (the string includes \0 as its terminator). The write clearly overflows the allocated memory block. As before, add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ gcc overflow.c -o overflow -fsanitize=address -g
~/Code/test$ ./overflow 
=================================================================
==2172878==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x7f1cd3d3d58d bp 0x7ffee78e6500 sp 0x7ffee78e5ca8     //1)
WRITE of size 13 at 0x60200000001c thread T0        // 2)
    #0 0x7f1cd3d3d58c in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790
    #1 0x555593131261 in main /home/chenbing/Code/test/overflow.c:7
    #2 0x7f1cd3ad4082 in __libc_start_main ../csu/libc-start.c:308
    #3 0x55559313116d in _start (/home/chenbing/Code/test/overflow+0x116d)

0x60200000001c is located 0 bytes to the right of 12-byte region [0x602000000010,0x60200000001c)    // 3)
allocated by thread T0 here:
    #0 0x7f1cd3daf808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555593131245 in main /home/chenbing/Code/test/overflow.c:6
    #2 0x7f1cd3ad4082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790 in __interceptor_memcpy
Shadow bytes around the buggy address:      // 4)
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa 00[04]fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASAN internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==2172878==ABORTING

The above report is similar to the one for the dangling-pointer access:

1) tells us the cause of the error: heap-buffer-overflow. A heap buffer has been overflowed, and the address of the memory is 0x60200000001c.

2) shows the stack at the location where the write caused the overflow.

3) is the stack at the corresponding memory allocation site, and 4) is again the shadow memory snapshot.

new/delete mismatch in C++

// bad_delete.cpp
#include <iostream>
#include <cstring>

int main(int argc, const char *argv[]) {
    char *cstr = new char[100];
    strcpy(cstr, "Hello World");
    std::cout << cstr << std::endl;

    delete cstr;
    return 0;
}

This code allocates a block of memory with new[], but releases it with delete rather than delete[] before the function returns. This is undefined behavior and may leave the allocated memory not fully released. Again, add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ g++ bad_delete.cpp -o bad_delete -fsanitize=address -g
~/Code/test$ ./bad_delete 
Hello World
=================================================================
==2180936==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60b0000000f0     // 1
    #0 0x7fa9f877cc65 in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:177
    #1 0x55d09d3fe33f in main /home/chenbing/Code/test/bad_delete.cpp:10
    #2 0x7fa9f8152082 in __libc_start_main ../csu/libc-start.c:308
    #3 0x55d09d3fe20d in _start (/home/chenbing/Code/test/bad_delete+0x120d)

0x60b0000000f0 is located 0 bytes inside of 100-byte region [0x60b0000000f0,0x60b000000154)       // 2
allocated by thread T0 here:
    #0 0x7fa9f877b787 in operator new[](unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:107
    #1 0x55d09d3fe2e5 in main /home/chenbing/Code/test/bad_delete.cpp:6
    #2 0x7fa9f8152082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: alloc-dealloc-mismatch ../../../../src/libsanitizer/asan/asan_new_delete.cc:177 in operator delete(void*, unsigned long)
==2180936==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==2180936==ABORTING

This error report is much more concise than the previous two, but it provides enough information to locate the problem:

1) reports the error type: alloc-dealloc-mismatch. The allocation and release operations do not match. The address of the memory is 0x60b0000000f0.

2) is the stack at the corresponding memory allocation site. The report does not explicitly say that the memory should have been released with delete[], because the allocation and release operators in C++ can be overridden, and in some specific scenarios mismatched operators can still free the memory completely.

ASAN therefore cannot guarantee that an alloc-dealloc-mismatch report matches the user's intent. For that reason, the report explains that if this is a false positive for the user, ASAN_OPTIONS=alloc_dealloc_mismatch=0 can be used to disable it.

For example:

~/Code/test$ ASAN_OPTIONS=alloc_dealloc_mismatch=0 ./bad_delete 
Hello World

Because the ASAN_OPTIONS=alloc_dealloc_mismatch=0 setting is supplied when running the program above, ASAN no longer treats the alloc-dealloc-mismatch as an error and issues no report.

Detect stack overflow

// sbo.c
#include <stdio.h>

int main(int argc, const char *argv[]) {
    int stack_array[100];
    stack_array[101] = 1;
    return 0;
}

In the above code, we create an array with a capacity of 100 on the stack, but then write to an address beyond the array's capacity, causing a stack buffer overflow. Add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ g++ sbo.c -o sbo -fsanitize=address -g
~/Code/test$ ./sbo 
=================================================================
==2196928==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc33777f24 at pc 0x562dccb592b6 bp 0x7ffc33777d40 sp 0x7ffc33777d30    1)
WRITE of size 4 at 0x7ffc33777f24 thread T0
    #0 0x562dccb592b5 in main /home/chenbing/Code/test/sbo.c:6
    #1 0x7f45bf52d082 in __libc_start_main ../csu/libc-start.c:308
    #2 0x562dccb5910d in _start (/home/chenbing/Code/test/sbo+0x110d)

Address 0x7ffc33777f24 is located in stack of thread T0 at offset 452 in frame    2)
    #0 0x562dccb591d8 in main /home/chenbing/Code/test/sbo.c:4

  This frame has 1 object(s):     3)
    [48, 448) 'stack_array' (line 5) <== Memory access at offset 452 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork  4)
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/chenbing/Code/test/sbo.c:6 in main
Shadow bytes around the buggy address:    5)
  0x1000066e6f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e6fa0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  0x1000066e6fb0: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e6fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e6fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000066e6fe0: 00 00 00 00[f3]f3 f3 f3 f3 f3 f3 f3 00 00 00 00
  0x1000066e6ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000066e7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASAN internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==2196928==ABORTING

This report is broadly similar to the previous ones, so we will not go through it again. Let's focus on the parts that differ.

3) shows that the stack object occupies the offset range [48, 448) in the function's stack frame (closed on the left, open on the right), while the code accesses offset 452 through that object, resulting in a stack buffer overflow.

One more point to note: the report mentions a scenario in which a stack buffer overflow may be a false positive. If the program uses a custom stack unwinding mechanism, swapcontext, or vfork, false positives are possible. For more details on these false positives, see the related issues in the ASAN project.

Detect global buffer overflow

// gbo.c
#include <stdio.h>

int global_array[100] = {-1};

int main(int argc, char **argv) {
  global_array[101] = 1;
  return 0;
}

The above code is similar to the stack buffer overflow example, except that the array with a capacity of 100 is created in the global data segment. Add the -fsanitize=address -g parameters, build the code, and run it:

~/Code/test$ g++ gbo.c -o gbo -fsanitize=address -g
~/Code/test$ ./gbo 
=================================================================
==2213117==ERROR: AddressSanitizer: global-buffer-overflow on address 0x558855e231b4 at pc 0x558855e20216 bp 0x7ffd9569d280 sp 0x7ffd9569d270
WRITE of size 4 at 0x558855e231b4 thread T0
    #0 0x558855e20215 in main /home/chenbing/Code/test/gbo.c:7
    #1 0x7efd3da4f082 in __libc_start_main ../csu/libc-start.c:308
    #2 0x558855e2010d in _start (/home/chenbing/Code/test/gbo+0x110d)

0x558855e231b4 is located 4 bytes to the right of global variable 'global_array' defined in 'gbo.c:4:5' (0x558855e23020) of size 400
SUMMARY: AddressSanitizer: global-buffer-overflow /home/chenbing/Code/test/gbo.c:7 in main
Shadow bytes around the buggy address:
  0x0ab18abbc5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ab18abbc630: 00 00 00 00 00 00[f9]f9 f9 f9 f9 f9 00 00 00 00
  0x0ab18abbc640: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
  0x0ab18abbc650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ab18abbc680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASAN internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==2213117==ABORTING

The above report is basically the same as the one for the stack buffer overflow, apart from the error type and the way the location of the global object is reported, so we will not go through it here.

That is all for the ASAN use cases. For more information, see the ASAN project.

Basic principles of ASAN

ASAN's memory detection method is very similar to Valgrind's AddrCheck tool. It uses shadow memory to record whether each byte of the application can be accessed safely. When accessing memory, it checks its mapped shadow memory.

However, ASAN uses a more efficient shadow memory mapping mechanism and more compact memory coding. In addition to heap memory, ASAN can also detect incorrect access in stack and global objects, and it is one order of magnitude faster than AddrCheck.

ASAN consists of two parts: code instrumentation module and runtime library.

  • The code instrumentation module modifies the code so that every memory access first checks the state of the memory being accessed, known as the shadow state, and creates redzone memory areas on both sides of stack and global objects.
  • The runtime library provides a set of interfaces to replace malloc, free and related functions, so as to create a redzone around the heap space when allocating it, and report errors when memory errors occur.

First, let's introduce what shadow memory and redzone are.

  • shadow memory

    In ASAN, the address returned by malloc is aligned to at least 8 bytes. For example, malloc(13) occupies two 8-byte words: the first word is fully accessible, while in the second word only the first 5 bytes are accessible and the remaining 3 bytes are not.

    Shadow memory is a region reserved in the application's virtual address space that records which bytes of application memory may be used. This information is the shadow state. Each byte of shadow memory maps to 8 bytes of application memory, so a shadow byte takes one of three kinds of value:

    1. 0: all 8 mapped bytes can be used
    2. k (1 <= k <= 7): only the first k of the 8 mapped bytes can be used
    3. Negative value: none of the 8 mapped bytes can be used, and different values indicate different kinds of memory (heap, stack, global object, or released memory)

    ASAN uses a direct mapping with a scale and an offset to convert an application address to its corresponding shadow memory address:

      shadow_address = (addr >> 3) + offset

    Assuming Max-1 is the highest valid address in the virtual address space, offset must be chosen so that the region from offset to offset+Max/8 is not occupied at startup.

    • On 32-bit Linux, where the virtual address space is 0x00000000-0xffffffff, offset = 0x20000000 (2^29).
    • On 64-bit systems, offset = 0x0000100000000000 (2^44).
    • In some cases (for example, with the -fPIE/-pie compiler flags on Linux), a zero offset can be used to simplify the instrumentation further.

    The following is the address space layout on a 32-bit Linux system:

    0x1 0000 0000 ---------------
                  |   HIGH      |
                  |   MEMORY    |
      0x4000 0000 ---------------
                  | HIGH SHADOW |
      0x2800 0000 ---------------
                  | BAD REGION  |
      0x2400 0000 ---------------
                  | LOW SHADOW  |
      0x2000 0000 ---------------
                  | LOW MEMORY  |
      0x0000 0000 ---------------
      

    The virtual address space is divided into a high part and a low part, and the addresses in each part map to the corresponding shadow memory. Note that applying the mapping to an address inside shadow memory yields an address in the Bad region, which is marked inaccessible through page protection.

    The shadow mapping generalizes to the form (addr >> scale) + offset, where scale ranges from 1 to 7. With scale = N, shadow memory occupies 1/2^N of the virtual address space, and the minimum redzone size (which also satisfies the alignment requirement of malloc()) is 2^N bytes. Each byte of shadow memory describes the state of 2^N bytes of memory and encodes 2^N + 1 distinct values.

  • redzone

    ASAN allocates additional memory around the heap, stack, and global objects used by the application. This additional memory is called the redzone, and shadow memory marks it as unusable. An access to redzone memory means the application has overflowed (or underflowed) an object, and ASAN reports the corresponding error after checking the redzone's shadow state. The larger the redzone, the larger the range of underflows and overflows that can be detected. The specific allocation strategy is covered below.

Code instrumentation

ASAN instruments every location where the application accesses memory. For a full 8-byte access, it inserts the following code to check the shadow memory corresponding to the accessed memory and decide whether the access is invalid:

ShadowAddr = (Addr >> 3) + Offset;

if (*ShadowAddr != 0)
  ReportAndCrash(Addr);

Since the application accesses all 8 bytes, the value stored in the mapped shadow byte must be 0, indicating that the 8 bytes are fully usable. Otherwise, an error is reported.

Accesses of 1, 2, or 4 bytes are more complex. If the shadow value is 0, the access is valid. If the shadow value is positive (only the first k bytes of the 8-byte word are usable), the access is invalid when its last byte falls beyond those first k bytes. A negative shadow value always means that unusable memory is being accessed:

ShadowAddr = (Addr >> 3) + Offset;
k = *ShadowAddr;
if (k != 0 && ((Addr & 7) + AccessSize > k))
  ReportAndCrash(Addr);

Note that ASAN instruments the code after LLVM has optimized it, which means ASAN can only check memory accesses that survive LLVM's optimizations. For example, accesses to stack objects that LLVM optimizes away are not seen by ASAN.

Likewise, ASAN does not instrument memory accesses generated by the LLVM code generator itself, such as register spilling.

In addition, even though the error reporting code ReportAndCrash(Addr) is executed at most once, it must be quite compact because it is inserted in many places in the code.

At present, ASAN uses a simple function call to handle error reports. Another option would be to insert an instruction that raises a hardware exception.

Runtime library

When the application starts, the entire shadow memory is mapped so that no other part of the application can use it. The Bad region is also protected and cannot be accessed by the application.

On Linux, the shadow memory region is not occupied at startup, so the mapping always succeeds. On macOS, however, it may be necessary to disable address space layout randomization (ASLR).

In addition, according to Google engineers, this shadow memory layout also works on Windows.

When ASAN is enabled, the malloc and free functions used by the program are replaced by the malloc and free functions in the runtime library.

The memory managed by malloc is organized as an array of free lists, one per object size class. When the free list for the requested size is empty, a group of memory blocks together with their redzones is allocated from the operating system (for example, using mmap). For n memory blocks, n+1 redzones are allocated:

| redzone-1 | memory-1 | redzone-2 | memory-2 | redzone-3 |

The free function marks the entire memory block as unusable and places it in quarantine, so that malloc will not immediately hand the block back to the application.

At present, the quarantine is implemented as a FIFO queue that holds a fixed amount of memory at any time.

By default, malloc and free record the current call stack to provide more informative error reports. The malloc call stack is stored in the left redzone (the larger the redzone, the more frames can be stored), while the free call stack is stored at the beginning of the memory area itself.

By now you should understand how ASAN detects errors in dynamically allocated memory, but you may wonder: dynamic allocations get their redzones from the malloc function, so how is detection implemented for stack objects and global objects, which are not allocated through malloc? The principle is also quite simple:

  • For global variables, redzone is created at compile time, and the address of redzone is passed to the runtime library when the application starts. The runtime library function makes redzone unusable and records the address for further error reporting.
  • For stack objects, redzones are created and marked unusable at run time. Currently, 32-byte redzones are used. For example, take the following code snippet:

    void foo() {
      char arr[10];
      <function body> 
    }

    After ASAN processes it, the code looks roughly as follows:

    void foo() {
      char rz1[32];        // left redzone
      char arr[10];
      char rz2[32-10+32];  // 22 bytes of padding plus the right redzone
    
      // One shadow byte covers 8 bytes, so the shift is 3
      unsigned * shadow = (unsigned*)(((long)rz1>>3)+Offset);
    
      // Mark the redzones as unusable
      shadow[0] = 0xffffffff; // rz1
      shadow[1] = 0xffff0200; // arr and rz2
      shadow[2] = 0xffffffff; // rz2
    
      <function body>
    
      // Mark all memory as usable again
      shadow[0] = shadow[1] = shadow[2] = 0; 
    }

Summary

ASAN uses shadow memory and redzone to provide accurate and immediate error detection.

The traditional view is that shadow memory and redzones either incur high overhead through a multi-level mapping scheme or occupy a lot of program memory. However, the shadow mapping mechanism and shadow state encoding used by ASAN reduce the memory footprint.

Finally, if you find that ASAN's instrumentation and checks are too slow for some of your code, you can use a compiler attribute to make ASAN skip instrumentation and detection for specific functions. The attribute is:

__attribute__((no_sanitize_address))

Enjoy GreatSQL :)



Tags: Database MySQL SQL

Posted by realnsleo on Wed, 11 May 2022 07:47:07 +0300