Illustrated: Boxing in Go

1. Problem introduction

We have already introduced the data structure of interfaces. Whether an interface is empty or non-empty, it is essentially two pointers: one related to the type metadata, and the other related to the data the interface holds.
One question deserves careful exploration: the data field is a pointer, so how can it accept the assignment of a value type? Consider the following two pieces of code:

(1) Assign an integer n to the empty interface type variable e1:

n := 10
var e1 interface{} = n

(2) Assign the address of an integer variable to the empty interface type variable e2:

n := 10
var e2 interface{} = &n

Now, think about it: what do e1 and e2 each look like?
e2 is relatively simple: _type points to the type metadata of *int, and data stores the address of variable n.
e1 is different. There is no problem with _type pointing to the int type metadata, but what does the data pointer store? It cannot store n itself, because n is an integer and data is a pointer.
Rather than guess, let's disassemble the code and see with our own eyes what data stores.

2. Problem exploration

func v2e(n int) (e interface{}) {
  e = n
  return
}

Disassembling the code above yields the following assembly:

$ go tool objdump -S -s '^gom.v2e$' eface.o
TEXT gom.v2e(SB) gofile..eface.go
func v2e(n int) (e interface{}) {
  0xb0a       65488b0c2528000000      MOVQ GS:0x28, CX
  0xb13       488b8900000000          MOVQ 0(CX), CX          [3:7]R_TLS_LE
  0xb1a       483b6110                CMPQ 0x10(CX), SP
  0xb1e       763c                    JBE 0xb5c
  0xb20       4883ec18                SUBQ $0x18, SP
  0xb24       48896c2410              MOVQ BP, 0x10(SP)
  0xb29       488d6c2410              LEAQ 0x10(SP), BP
        e = n
  0xb2e       488b442420              MOVQ 0x20(SP), AX
  0xb33       48890424                MOVQ AX, 0(SP)
  0xb37       e800000000              CALL 0xb3c              [1:5]R_CALL:runtime.convT64
  0xb3c       488b442408              MOVQ 0x8(SP), AX
        return
  0xb41       488d0d00000000          LEAQ 0(IP), CX          [3:7]R_PCREL:type.int
  0xb48       48894c2428              MOVQ CX, 0x28(SP)
  0xb4d       4889442430              MOVQ AX, 0x30(SP)
  0xb52       488b6c2410              MOVQ 0x10(SP), BP
  0xb57       4883c418                ADDQ $0x18, SP
  0xb5b       c3                      RET
func v2e(n int) (e interface{}) {
  0xb5c       e800000000              CALL 0xb61              [1:5]R_CALL:runtime.morestack_noctxt
  0xb61       eba7                    JMP gom.v2e(SB)

The code is not long, but it is easier to follow once converted into equivalent pseudocode:

func v2e(n int) (e eface) {
entry:
  gp := getg()
  if SP <= gp.stackguard0 {
    goto morestack
  }
  e.data = runtime.convT64(n) // Key 1
  e._type = &type.int         // Key 2
  return
morestack:
  runtime.morestack_noctxt()
  goto entry
}

Ignoring the stack-growth code, what we really care about are the two lines that assign the two fields of e:
1. First, the value of n is passed as the argument to runtime.convT64, and the return value is assigned to e.data, so data stores the return value of runtime.convT64.
2. The address of type.int is assigned to e._type, which is easy to understand.
Next, let's look at the logic of runtime.convT64:

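func convT64(val uint64) (x unsafe.Pointer) {
  if val < uint64(len(staticuint64s)) {
    // Small value: reuse the preallocated element whose value equals val.
    x = unsafe.Pointer(&staticuint64s[val])
  } else {
    // Otherwise allocate 8 bytes on the heap and store val there.
    x = mallocgc(8, uint64Type, false)
    *(*uint64)(x) = val
  }
  return
}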

The main logic: if val is less than the length of staticuint64s, the address of element val of staticuint64s is returned directly. Otherwise, a uint64 is allocated with mallocgc, val is stored in it, and its address is returned. staticuint64s is a uint64 array of length 256 in which each element equals its index, so it holds the 256 values 0 through 255. Its main purpose is to avoid frequent heap allocations for small, commonly used values.
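To make the staticuint64s trick concrete, here is a hypothetical user-space analogue of convT64. The names `smallValues` and `boxUint64` are illustrative, not runtime APIs, and `new(uint64)` stands in for the runtime-internal mallocgc call.

```go
package main

import "fmt"

// smallValues plays the role of runtime.staticuint64s in this sketch:
// element i holds the value i, for i in [0, 256).
var smallValues = func() (a [256]uint64) {
	for i := range a {
		a[i] = uint64(i)
	}
	return
}()

// boxUint64 is a hypothetical analogue of runtime.convT64. Small values
// share a slot in the static table; larger values get a fresh cell
// initialised with val.
func boxUint64(val uint64) *uint64 {
	if val < uint64(len(smallValues)) {
		return &smallValues[val]
	}
	p := new(uint64) // stands in for mallocgc(8, uint64Type, false)
	*p = val
	return p
}

func main() {
	a, b := boxUint64(7), boxUint64(7)
	c, d := boxUint64(1000), boxUint64(1000)
	fmt.Println(a == b, *a) // true 7: both point into the shared table
	fmt.Println(c == d, *c) // false 1000: two separate allocations
}
```

Because every box of 7 points at the same shared slot, writing through one of those pointers would change the value seen by all of them, which is why such boxed values have to be treated as read-only.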
Overall, what convT64 does is allocate a uint64 on the heap, initialize it with val, and return its address; small values simply reuse the preallocated array instead.
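Assuming the gc toolchain, the effect of staticuint64s can be observed by counting allocations when an int escapes into an interface{}. In this minimal sketch, `boxSink` and `boxAllocs` are hypothetical names, and `testing.AllocsPerRun` is called from an ordinary program. On a recent gc toolchain one would expect roughly 0 allocations for a value below 256 and 1 for a larger value, but the exact figures depend on the compiler version and flags.

```go
package main

import (
	"fmt"
	"testing"
)

// boxSink is a package-level variable; storing into it makes the boxed
// value escape, so the compiler goes through the runtime conversion path.
var boxSink interface{}

// boxAllocs reports the average number of heap allocations made when
// boxing v into boxSink. v is a variable, not a constant, so the value
// is only known at run time.
func boxAllocs(v int) float64 {
	return testing.AllocsPerRun(100, func() {
		boxSink = v
	})
}

func main() {
	fmt.Println("boxing 7:   ", boxAllocs(7))
	fmt.Println("boxing 1000:", boxAllocs(1000))
}
```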

3. Exploration conclusion

1. interface{} is designed as a container, but its data field is essentially a pointer, so it can hold an address directly. To hold a value, the actual memory must be allocated elsewhere and its address stored in data. (This is what convT64 does: it allocates the memory that stores the value. The runtime actually has a whole family of such functions, such as convT32, convTstring, convTslice, etc.)
2. From the staticuint64s optimization, we can infer in reverse that the uint64 allocated by convT64 is semantically immutable, a constant much like a const. This design exists mainly so that interface{} can simulate "holding a value".
3. As for why this value cannot be modified: interface{} is just a container. It supports putting data in and taking data out, but not modifying data in place inside the container. This is somewhat similar to autoboxing in Java and C#, except that interface{} is a universal wrapper class.
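The copy semantics behind these conclusions can be observed without looking at assembly. In this minimal sketch (the `snapshot` helper is an illustrative name, not from the original), changing n after boxing does not affect the interface that holds its value, while the interface that holds &n sees the change.

```go
package main

import "fmt"

// snapshot boxes n into one interface and &n into another, then changes n.
func snapshot() (e1, e2 interface{}) {
	n := 10
	e1 = n  // e1 receives its own copy of the value 10
	e2 = &n // e2 receives the address of n
	n = 20
	return
}

func main() {
	e1, e2 := snapshot()
	fmt.Println(e1.(int))   // 10: the boxed copy did not see the change
	fmt.Println(*e2.(*int)) // 20: read through the stored pointer

	// A type assertion returns yet another copy; modifying it does not
	// write back into the interface.
	v := e1.(int)
	v++
	fmt.Println(v, e1.(int)) // 11 10
}
```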

4. Does boxing a value type necessarily cause heap allocation?

This question also needs to be verified. Since we know that escaping causes heap allocation, we can construct a scenario in which a value type is boxed but does not escape, such as the fn function in the following code:

func fn(n int) bool {
  return notNil(n)
}

func notNil(a interface{}) bool {
  return a != nil
}

Inlining must be disabled at compile time; otherwise notNil would be inlined into fn and there would be no call to observe. Even so, the compiler can still determine from the implementation of notNil that its argument does not escape. Disassembling fn yields the following assembly:

$ go tool objdump -S -s '^gom.fn$' eface.o
TEXT gom.fn(SB) gofile..eface.go
func fn(n int) bool {
  0xfd6         65488b0c2528000000      MOVQ GS:0x28, CX
  0xfdf         488b8900000000          MOVQ 0(CX), CX          [3:7]R_TLS_LE
  0xfe6         483b6110                CMPQ 0x10(CX), SP
  0xfea         764a                    JBE 0x1036
  0xfec         4883ec28                SUBQ $0x28, SP
  0xff0         48896c2420              MOVQ BP, 0x20(SP)
  0xff5         488d6c2420              LEAQ 0x20(SP), BP
        return notNil(n)
  0xffa         488b442430              MOVQ 0x30(SP), AX
  0xfff         4889442418              MOVQ AX, 0x18(SP)
  0x1004        488d0500000000          LEAQ 0(IP), AX          [3:7]R_PCREL:type.int
  0x100b        48890424                MOVQ AX, 0(SP)
  0x100f        488d442418              LEAQ 0x18(SP), AX
  0x1014        4889442408              MOVQ AX, 0x8(SP)
  0x1019        e800000000              CALL 0x101e             [1:5]R_CALL:gom.notNil
  0x101e        0fb6442410              MOVZX 0x10(SP), AX
  0x1023        88442438                MOVB AL, 0x38(SP)
  0x1027        488b6c2420              MOVQ 0x20(SP), BP
  0x102c        4883c428                ADDQ $0x28, SP
  0x1030        c3                      RET
func fn(n int) bool {
  0x1031        0f1f440000              NOPL 0(AX)(AX*1)
  0x1036        e800000000              CALL 0x103b             [1:5]R_CALL:runtime.morestack_noctxt
  0x103b        eb99                    JMP gom.fn(SB)

Converting the code above into equivalent pseudocode:
Note the local variable v in the pseudocode: it is implicitly allocated by the compiler (on the stack, not on the heap) and serves as a copy of n's value.

func fn(n int) bool {
entry:
  gp := getg()
  if SP <= gp.stackguard0 {
    goto morestack
  }
  v := n
  return notNil(eface{_type: &type.int, data: &v})
morestack:
  runtime.morestack_noctxt()
  goto entry
}

Therefore, when an interface{} holds a value, a separate copy must always be made, rather than letting data store the address of the original variable. Whether that copy needs heap allocation, however, is decided by escape analysis: only when the boxed value escapes are the runtime's convT functions used.
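The same experiment can be reproduced without disassembly by counting allocations. The sketch below assumes the gc toolchain; it keeps fn and notNil from the example above, uses the `//go:noinline` directive instead of a package-wide flag, and adds a hypothetical escaping counterpart (`keep`, writing to `sink`) for contrast, with `measure` as another illustrative helper. On a recent gc toolchain one would expect the non-escaping case to report 0 allocations and the escaping case 1, but these numbers are not guaranteed across compiler versions or flags.

```go
package main

import (
	"fmt"
	"testing"
)

// notNil and fn follow the example above. The go:noinline directive keeps
// notNil from being inlined; building with -gcflags=-l instead disables
// inlining for the whole package.
//
//go:noinline
func notNil(a interface{}) bool {
	return a != nil // a is only compared, so it does not escape
}

func fn(n int) bool {
	return notNil(n)
}

// sink and keep are hypothetical additions for contrast: storing the
// argument in a package-level variable makes the boxed value escape.
var sink interface{}

//go:noinline
func keep(a interface{}) {
	sink = a
}

// measure reports the average heap allocations per boxing of x in the
// non-escaping case (fn) and the escaping case (keep).
func measure(x int) (noEscape, escape float64) {
	noEscape = testing.AllocsPerRun(100, func() { _ = fn(x) })
	escape = testing.AllocsPerRun(100, func() { keep(x) })
	return
}

func main() {
	// 1000 lies outside 0..255, so convT64 cannot fall back on staticuint64s.
	noEscape, escape := measure(1000)
	fmt.Println("non-escaping boxing:", noEscape)
	fmt.Println("escaping boxing:    ", escape)
}
```

Building with `go build -gcflags=-m` prints the compiler's optimization decisions, including escape analysis, which is a quicker way to check whether a boxed argument escapes; the exact wording of that output varies between Go versions.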

Posted by themaxx113 on Thu, 12 May 2022 22:04:12 +0300