Boxing in Go, Illustrated
1. Problem introduction
We have already introduced the data structure of an interface. Whether it is an empty interface or a non-empty interface, it is essentially two pointers: one points to type metadata, and the other points to the data held by the interface.
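To make the two-pointer layout concrete, here is a minimal sketch. The struct `eface` below is redeclared locally for illustration only (its field names follow the ones used in this article), and `twoWords` is a hypothetical helper. It relies on `unsafe` and on how the gc compiler currently lays out interfaces, which the language specification does not guarantee.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface is a local mirror of the two-word empty-interface layout.
// This is an assumption about the gc compiler's representation,
// not a public API.
type eface struct {
	_type unsafe.Pointer // points to the type metadata
	data  unsafe.Pointer // points to (or is) the stored data
}

// twoWords reports whether interface{} occupies exactly two
// pointer-sized words, the same size as the mirror struct.
func twoWords() bool {
	var e interface{}
	return unsafe.Sizeof(e) == unsafe.Sizeof(eface{}) &&
		unsafe.Sizeof(e) == 2*unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println(twoWords())
}
```

If the layout assumption holds, the program prints true on both 32-bit and 64-bit platforms, because both words scale with the pointer size.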
One question deserves careful exploration: the data field is a pointer, so how does it accept an assignment from a value type? Consider the following two snippets:
(1) Assign an integer n to the empty interface type variable e1:
    n := 10
    var e1 interface{} = n
(2) Assign the address of an integer variable to the empty interface type variable e2:
    n := 10
    var e2 interface{} = &n
Now, please think: what do e1 and e2 look like respectively?
e2 is relatively simple: _type points to the *int type metadata, and data stores the address of variable n.
e1 is different. There is no problem with _type pointing to the int type metadata, but what does the data pointer store? It cannot store n itself, because n is an integer and data is a pointer.
Since it is hard to guess, let's disassemble the code and see with our own eyes what data stores.
2. Problem exploration
    func v2e(n int) (e interface{}) {
        e = n
        return
    }
Disassembling the compiled code above yields the following assembly:
    $ go tool objdump -S -s '^gom.v2e$' eface.o
    TEXT gom.v2e(SB) gofile..eface.go
    func v2e(n int) (e interface{}) {
      0xb0a  65488b0c2528000000  MOVQ GS:0x28, CX
      0xb13  488b8900000000      MOVQ 0(CX), CX       [3:7]R_TLS_LE
      0xb1a  483b6110            CMPQ 0x10(CX), SP
      0xb1e  763c                JBE 0xb5c
      0xb20  4883ec18            SUBQ $0x18, SP
      0xb24  48896c2410          MOVQ BP, 0x10(SP)
      0xb29  488d6c2410          LEAQ 0x10(SP), BP
            e = n
      0xb2e  488b442420          MOVQ 0x20(SP), AX
      0xb33  48890424            MOVQ AX, 0(SP)
      0xb37  e800000000          CALL 0xb3c           [1:5]R_CALL:runtime.convT64
      0xb3c  488b442408          MOVQ 0x8(SP), AX
            return
      0xb41  488d0d00000000      LEAQ 0(IP), CX       [3:7]R_PCREL:type.int
      0xb48  48894c2428          MOVQ CX, 0x28(SP)
      0xb4d  4889442430          MOVQ AX, 0x30(SP)
      0xb52  488b6c2410          MOVQ 0x10(SP), BP
      0xb57  4883c418            ADDQ $0x18, SP
      0xb5b  c3                  RET
    func v2e(n int) (e interface{}) {
      0xb5c  e800000000          CALL 0xb61           [1:5]R_CALL:runtime.morestack_noctxt
      0xb61  eba7                JMP gom.v2e(SB)
The listing is not long, but it is easier to follow once converted into equivalent pseudocode:
    func v2e(n int) (e eface) {
    entry:
        gp := getg()
        if SP <= gp.stackguard0 {
            goto morestack
        }
        e.data = runtime.convT64(n) // Key 1
        e._type = &type.int         // Key 2
        return
    morestack:
        runtime.morestack_noctxt()
        goto entry
    }
Ignoring the code related to stack growth, what we are really interested in is the two lines that assign to the two fields of e:
1. First, the value of variable n is passed as an argument to runtime.convT64, and the return value is assigned to e.data. So data stores the return value of runtime.convT64.
2. The address of type.int is assigned to e._type. This part is easy to understand.
Next, let's look at the logic of runtime.convT64:
    func convT64(val uint64) (x unsafe.Pointer) {
        if val < uint64(len(staticuint64s)) {
            x = unsafe.Pointer(&staticuint64s[val])
        } else {
            x = mallocgc(8, uint64Type, false)
            *(*uint64)(x) = val
        }
        return
    }
Main logic: when val is less than the length of staticuint64s, the address of element val in staticuint64s is returned directly. Otherwise, a uint64 is allocated through mallocgc, val is stored into it, and its address is returned. staticuint64s is a uint64 array of length 256 in which each element's value equals its index, so it holds the 256 values from 0 to 255. Its main purpose is to avoid frequent heap allocations for commonly used small numbers.
Overall, convT64 effectively allocates a uint64 on the heap, initializes it with val, and returns its address (small values are served from staticuint64s instead).
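The effect of the staticuint64s fast path can be observed with a rough measurement. The sketch below is not part of the original experiment: `sink`, `box`, and `allocsFor` are hypothetical names, and it uses `testing.AllocsPerRun` from the standard library to count heap allocations. It assumes a toolchain whose convT64 matches the listing above; other versions may behave differently.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing into it forces the
// boxed value to escape, so the runtime conversion path is used.
var sink interface{}

// box converts v to interface{}. The noinline directive keeps the
// compiler from seeing a constant argument and precomputing the
// interface value at compile time.
//
//go:noinline
func box(v int) {
	sink = v
}

// allocsFor reports the average number of heap allocations made
// by one call to box(v).
func allocsFor(v int) float64 {
	return testing.AllocsPerRun(1000, func() { box(v) })
}

func main() {
	fmt.Println("allocations when boxing 10:  ", allocsFor(10))
	fmt.Println("allocations when boxing 1000:", allocsFor(1000))
}
```

Under the stated assumption, boxing 10 should report no allocation because the value falls inside the 0 to 255 table, while boxing 1000 should report one allocation per call.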
3. Exploration conclusion
1. interface{} is designed as a container, but it is essentially a pointer, so it can hold an address directly. If it is used to hold a value, the actual memory must be allocated elsewhere and the address of that memory stored in the interface. (The job of convT64 is to allocate the memory that stores the value. The runtime has a whole family of such functions, such as convT32, convTstring, and convTslice.)
2. The staticuint64s optimization implies that the uint64 produced by convT64 is immutable at the semantic level, much like a const: boxed copies of the same small value all share one array element. This design exists so that interface{} can simulate "holding a value".
3. As for why the value cannot be modified: interface{} is just a container. It supports putting data in and taking data out, but it does not support modifying the data in place. This is somewhat similar to automatic boxing in Java and C#, except that interface{} is a universal wrapper class.
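The conclusions above can be checked with a small sketch. The `eface` mirror struct, `dataWord`, and `demo` are hypothetical names introduced for illustration, and peeking at the data word through `unsafe` depends on the gc compiler's current interface layout, so treat it as an experiment rather than a guarantee.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the two-word layout of interface{} (an assumption
// about the gc compiler, not a public API).
type eface struct {
	_type unsafe.Pointer
	data  unsafe.Pointer
}

// dataWord returns the data word of the interface value that e points to.
func dataWord(e *interface{}) unsafe.Pointer {
	return (*eface)(unsafe.Pointer(e)).data
}

// demo boxes n once by value and once by address, then mutates n.
func demo() (boxed, viaPtr int, e2HoldsAddr, e1HoldsAddr bool) {
	n := 10
	var e1 interface{} = n  // boxes a copy of the value
	var e2 interface{} = &n // stores the address itself
	n = 20

	boxed = e1.(int)    // the boxed copy is unaffected by n = 20
	viaPtr = *e2.(*int) // the pointer sees the new value
	e2HoldsAddr = dataWord(&e2) == unsafe.Pointer(&n)
	e1HoldsAddr = dataWord(&e1) == unsafe.Pointer(&n)
	// Note: "e1.(int) = 30" would not compile, because the result of
	// a type assertion is not assignable.
	return
}

func main() {
	fmt.Println(demo()) // expected with the gc compiler: 10 20 true false
}
```

The only way to "change" what e1 holds is to assign a new value to e1, which boxes a fresh copy.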
4. Does boxing a value type necessarily lead to heap allocation?
This also needs to be verified. Since we know that escaping causes heap allocation, we can construct a scenario in which a value type is boxed but does not escape. That is the fn function in the following code:
    func fn(n int) bool {
        return notNil(n)
    }

    func notNil(a interface{}) bool {
        return a != nil
    }
Inlining must be disabled when compiling. Even so, the compiler can still determine from the implementation of notNil that its argument does not escape. Disassembling fn yields the following assembly:
    $ go tool objdump -S -s '^gom.fn$' eface.o
    TEXT gom.fn(SB) gofile..eface.go
    func fn(n int) bool {
      0xfd6   65488b0c2528000000  MOVQ GS:0x28, CX
      0xfdf   488b8900000000      MOVQ 0(CX), CX       [3:7]R_TLS_LE
      0xfe6   483b6110            CMPQ 0x10(CX), SP
      0xfea   764a                JBE 0x1036
      0xfec   4883ec28            SUBQ $0x28, SP
      0xff0   48896c2420          MOVQ BP, 0x20(SP)
      0xff5   488d6c2420          LEAQ 0x20(SP), BP
            return notNil(n)
      0xffa   488b442430          MOVQ 0x30(SP), AX
      0xfff   4889442418          MOVQ AX, 0x18(SP)
      0x1004  488d0500000000      LEAQ 0(IP), AX       [3:7]R_PCREL:type.int
      0x100b  48890424            MOVQ AX, 0(SP)
      0x100f  488d442418          LEAQ 0x18(SP), AX
      0x1014  4889442408          MOVQ AX, 0x8(SP)
      0x1019  e800000000          CALL 0x101e          [1:5]R_CALL:gom.notNil
      0x101e  0fb6442410          MOVZX 0x10(SP), AX
      0x1023  88442438            MOVB AL, 0x38(SP)
      0x1027  488b6c2420          MOVQ 0x20(SP), BP
      0x102c  4883c428            ADDQ $0x28, SP
      0x1030  c3                  RET
    func fn(n int) bool {
      0x1031  0f1f440000          NOPL 0(AX)(AX*1)
      0x1036  e800000000          CALL 0x103b          [1:5]R_CALL:runtime.morestack_noctxt
      0x103b  eb99                JMP gom.fn(SB)
Convert the above code into equivalent pseudocode:
Note the local variable v in the pseudocode. It is implicitly allocated by the compiler (on the stack, not on the heap) and serves as a copy of the value of n.
    func fn(n int) bool {
    entry:
        gp := getg()
        if SP <= gp.stackguard0 {
            goto morestack
        }
        v := n
        return notNil(eface{_type: &type.int, data: &v})
    morestack:
        runtime.morestack_noctxt()
        goto entry
    }
Therefore, when interface{} holds a value, it always makes a separate copy rather than letting data store the address of the original variable. Whether that copy is allocated on the heap is decided by escape analysis: only when the boxed value escapes is the family of convT functions in the runtime used.
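The non-escaping case can also be checked without reading assembly. The following is a minimal sketch rather than the original experiment: it marks notNil with the `//go:noinline` directive as one way to disable inlining for that function, and uses `testing.AllocsPerRun` to count heap allocations. `allocsOfFn` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"testing"
)

// notNil is kept out of line so that fn really has to build an
// interface value and pass it in a call.
//
//go:noinline
func notNil(a interface{}) bool {
	return a != nil
}

// fn boxes n, but the boxed value never escapes notNil.
func fn(n int) bool {
	return notNil(n)
}

// allocsOfFn reports the average number of heap allocations made
// by one call to fn(n).
func allocsOfFn(n int) float64 {
	return testing.AllocsPerRun(1000, func() { _ = fn(n) })
}

func main() {
	// A boxed int is a non-nil interface even when the int is 0,
	// because the type word is set.
	fmt.Println(fn(0))
	// 1000 is outside the 0 to 255 table, so any allocation here
	// would have to come from a heap-allocated copy.
	fmt.Println(allocsOfFn(1000))
}
```

If escape analysis behaves as described, the reported allocation count is 0 even for a value outside the small-integer table, since the copy lives in fn's stack frame.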