ARM assembly basics

ARM is a reduced instruction set (RISC) processor. Its instructions have a fixed length, that is, the machine code for each assembly instruction has a fixed size (2 bytes or 4 bytes). The advantage of fixed length is faster execution, because instruction decoding is simpler than on a variable-length x86 CPU. The disadvantage is just as obvious: a fixed length limits what a single instruction can express (there are fewer instructions), so it increases the complexity of software to some extent (one operation may take several instructions).

Originally, ARM instructions had a fixed length of 4 bytes. Over time it became clear that many instructions do not need all 4 bytes, so a 2-byte instruction set was added later to reduce code size. The 2-byte instruction set is called the Thumb instruction set, while the 4-byte instruction set is called the ARM instruction set. This gives us a choice: if we need small code size we can compile to Thumb, and if we need better performance we can compile to the ARM instruction set.

However, for some instructions, such as addition, there is little difference between the 2-byte and 4-byte encodings, so the Thumb-2 instruction set was introduced. Its instructions are variable length (either 2 bytes or 4 bytes), combining the advantages of the two instruction sets above. The current NDK compiler can only generate Thumb-2 and ARM instructions.
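To make the "either 2-byte or 4-byte" point concrete, here is a minimal C sketch of how a decoder can tell the two Thumb-2 lengths apart from the first 16-bit halfword. It assumes the ARMv7 encoding rule that a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 4-byte instruction. The function name thumb_insn_len is made up for this illustration.

```c
#include <stdint.h>

/* Length in bytes (2 or 4) of a Thumb-2 instruction, judged from its
 * first 16-bit halfword. Assumption (ARMv7 encoding rule): bits [15:11]
 * equal to 0b11101, 0b11110 or 0b11111 mark the first half of a 32-bit
 * instruction; every other value is a complete 16-bit instruction. */
int thumb_insn_len(uint16_t first_halfword)
{
    unsigned top5 = first_halfword >> 11;   /* bits [15:11] */
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}
```

For example, 0x4770 (the 16-bit encoding of bx lr) is reported as 2 bytes, while 0xF000 (the first halfword of a bl) is reported as 4 bytes.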

Next, the working state of the ARM processor, that is, how many bytes the CPU decodes as one instruction. After all, there is more than one instruction set:

1. ARM state
2. Thumb state

So how does the CPU know which instruction set state it is in? A bit in the flag register indicates the state. See the flag register description below for details.

Now we can talk about registers, as shown in the following figure

The figure above shows which registers are accessible in each mode. What is a CPU working mode? In short, it corresponds to the privilege levels on x86, that is, the rings. ARM has seven modes, but in essence there are still only two, corresponding to ring 3 and ring 0. Ring 0 is simply divided more finely, which is why there are so many modes.

For now we only need to care about the registers in user mode. We only need to remember the following two kinds, 17 registers in total:

1. General purpose registers r0-r15
2. Flag register cpsr

The other modes have their own copies of some registers, the darker ones in the figure above, but they are accessed in the same way. For example, in supervisor mode, using the SP register actually accesses SP_svc. In abort mode we still just write SP, but we are really accessing SP_abt. There is no need to care about the real register names or when they switch.

The figure above also shows that r0-r7 and r15 are shared by all modes.

Now a little more detail on the general purpose registers and the program counter. In the figure above, some register names are followed by an alias in brackets, and the alias can be used in assembly. The following two instructions are equivalent:

mov r15,#0
mov pc,#0

Generally speaking, r13, r14 and r15 are written using their aliases, because the aliases are more readable (these registers have special meanings). The rest are written as r0-r12.

PC -  r15  Program counter, similar to eip on x86
LR -  r14  Link register; a call instruction stores the address of the next instruction (the return address) in r14
SP -  r13  Stack pointer

r13 and r15 should be easy to understand. As for r14: on x86 the call instruction pushes the return address onto the stack, whereas here the return address is placed in the LR register. However, there is only one LR, so a function that calls other functions must first save LR to prevent its value from being overwritten.

Returning from a function is then simple. The following single instruction is enough:

mov pc,lr

Now let's talk about the flag register

N : 1 means the result is negative, 0 means it is positive or zero
Z : 1 means the result is zero (for a comparison, the two numbers are equal), 0 means it is nonzero (the two numbers are not equal)
C : for addition, 1 means a carry was generated, otherwise 0; for subtraction, 0 means a borrow occurred, otherwise 1
V : 1 means an addition or subtraction produced a signed overflow, otherwise 0
I : 1 means IRQ interrupts are disabled
F : 1 means FIQ interrupts are disabled
T : 1 means the processor is running in Thumb state, 0 means it is running in ARM state

The key point is that the C bit is produced differently by addition and by subtraction. The T bit indicates which working state the CPU is in.

The lowest five bits indicate the working mode of the CPU, that is, they encode the seven modes (permission levels) mentioned above. For example, user mode is 10000.

When writing code, we generally only need to care about the highest four bits and the T bit.
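The difference in the C bit is easy to get wrong, so here is a minimal C sketch that models how N, Z, C and V come out of a 32-bit addition and a 32-bit subtraction. It is only a model of the flag rules described above, not CPU code. The Flags struct and the two function names are made up for this illustration.

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;

/* Flags after a 32-bit addition: C is the carry out of bit 31. */
Flags flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    Flags f;
    f.n = (int)(r >> 31);                      /* sign bit of the result */
    f.z = (r == 0);
    f.c = (r < a);                             /* wrapped around: carry generated */
    f.v = (int)((~(a ^ b) & (a ^ r)) >> 31);   /* same-sign inputs, different-sign result */
    return f;
}

/* Flags after a 32-bit subtraction a - b: C is 1 when there is NO borrow. */
Flags flags_after_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.n = (int)(r >> 31);
    f.z = (r == 0);
    f.c = (a >= b);                            /* borrow happens only when a < b (unsigned) */
    f.v = (int)(((a ^ b) & (a ^ r)) >> 31);    /* different-sign inputs, result sign differs from a */
    return f;
}
```

With this model, 5 - 3 gives C = 1 (no borrow) while 3 - 5 gives C = 0 and N = 1, which is the opposite of what the x86 carry flag would show for a borrow.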
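As a rough illustration of the layout, the following C sketch splits a cpsr value into the fields discussed above. It assumes the usual 32-bit ARM layout: N, Z, C, V in bits 31 to 28, I, F, T in bits 7, 6 and 5, and the mode in bits 4 to 0. The Cpsr struct and decode_cpsr are names invented for this example.

```c
#include <stdint.h>

typedef struct {
    int n, z, c, v;   /* condition flags, bits 31..28 */
    int i, f, t;      /* IRQ disable (bit 7), FIQ disable (bit 6), Thumb state (bit 5) */
    unsigned mode;    /* working mode, bits 4..0 (user mode is 0x10, binary 10000) */
} Cpsr;

/* Split a raw cpsr value into its fields. */
Cpsr decode_cpsr(uint32_t cpsr)
{
    Cpsr s;
    s.n = (int)((cpsr >> 31) & 1);
    s.z = (int)((cpsr >> 30) & 1);
    s.c = (int)((cpsr >> 29) & 1);
    s.v = (int)((cpsr >> 28) & 1);
    s.i = (int)((cpsr >> 7) & 1);
    s.f = (int)((cpsr >> 6) & 1);
    s.t = (int)((cpsr >> 5) & 1);
    s.mode = cpsr & 0x1F;
    return s;
}
```

For instance, a debugger showing cpsr = 0x60000030 would decode to Z = 1, C = 1, Thumb state, user mode.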


Now we can start writing ARM assembly. First we write some C code, then compile it to assembly with the NDK to see how the output is structured:


#include <stdio.h>

int main(int argc, char* argv[])
{
    puts("hello arm");
    return 0;
}
The commands for compiling and generating assembly are as follows:

	armv7a-linux-androideabi22-clang -E Hello.c -o Hello.i
	armv7a-linux-androideabi22-clang -S Hello.i -o Hello.s

Note that the NDK toolchain directory needs to be added to the PATH environment variable. My directory is as follows; adjust it to match your own installation


The generated .s file is the corresponding assembly file. Since the generated assembly contains many directives (the lines beginning with a dot) that are not needed here, I will simplify it. The simplified assembly code is as follows:

	.text                      @ code section
	.fpu	neon               @ floating-point coprocessor ("soft" means software emulation)
	.globl	main               @ main is a global symbol
	.p2align	2              @ alignment value n, meaning 2^n bytes
	.type	main,%function     @ the type of main, here a function
	.code	32                 @ 32 means the ARM instruction set, 16 means the Thumb instruction set
main:
	push	{r11, lr}
	mov	r11, sp
	sub	sp, sp, #16
	ldr	r2, .LCPI0_0
.LPC0_0:
	add	r2, pc, r2
	mov	r0, r2
	bl	puts
	movw	r1, #0
	mov	r0, r1
	mov	sp, r11
	pop	{r11, pc}
	.p2align	2
.LCPI0_0:
	.long	.L.str-(.LPC0_0+8)  @ define data
.Lfunc_end0:
	.size	main, .Lfunc_end0-main  @ size of the function

	.type	.L.str,%object          @ @.str
	.section	.rodata.str1.1,"aMS",%progbits,1  @ define the rodata section
.L.str:
	.asciz	"hello arm"  @ string
	.size	.L.str, 10

	.section	".note.GNU-stack","",%progbits @ marks the stack as non-executable

In ARM assembly, @ starts a comment. The middle part is the code of the main function, which is not commented much yet. The remaining directives mostly have comments next to them. Before going further, compile and run it to check that the result is correct. The commands are as follows. Note that an Android emulator or a real device must be running first

	# Assemble and link to produce the executable
	armv7a-linux-androideabi22-clang -c Hello.s -o Hello.o
	armv7a-linux-androideabi22-clang Hello.o -o Hello

	# Push the executable to the device and run it
	adb push Hello /data/local/tmp
	adb shell chmod 777 /data/local/tmp/Hello
	adb shell /data/local/tmp/Hello

Let's go through the assembly code above. First, the .section directive defines a section. The following sections are predefined by the system

.text: the code section
.data: the initialized data section
.bss: the uninitialized data section
.rodata: the read-only data section

Next, the alignment directive, .p2align. ARM has no 1-byte instructions, so the smallest instruction is 2 bytes, and alignment is always given as 2 to the nth power. The .p2align 2 in the assembly above means 4-byte alignment.

Next, the .code directive. 32 selects the ARM instruction set, so each instruction is encoded in 4 bytes, while 16 selects the Thumb instruction set (Thumb-2 on this target). You can change it to 16 here and then disassemble the result with IDA to observe the machine code.

Next is the function body. Look at the beginning and the end first: they are very similar to a function prologue and epilogue on x86. They save the environment and set up the stack
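As a small illustration of what "2 to the nth power" alignment means, this C sketch rounds an address up to the next multiple of 2^n, which is what the assembler does to the current position when it meets .p2align n. The function name p2align_up and the sample addresses are made up for this example.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 2^n (n = 2 gives 4-byte alignment). */
uint32_t p2align_up(uint32_t addr, unsigned n)
{
    uint32_t mask = ((uint32_t)1 << n) - 1;   /* n = 2 -> mask = 0b11 */
    return (addr + mask) & ~mask;
}
```

So with n = 2, a position of 0x1001 moves to 0x1004, while 0x1004 is already aligned and stays where it is.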

push	{r11, lr}    @ save r11 and lr; there are function calls inside and lr holds the return address, so it must be saved
mov	r11, sp      @ save sp
sub	sp, sp, #16  @ allocate stack space; an immediate value must be prefixed with #


mov	sp, r11      @ restore the stack
pop	{r11, pc}    @ restore the environment; the saved lr value is popped into pc, which performs the return

Note that the previously saved lr value ends up in pc, which is equivalent to the return instruction mentioned above

mov pc,lr @ lr holds the return address of the function; moving it into pc returns

Now the code in the middle, the call to the puts function

	ldr	r2, .LCPI0_0
	add	r2, pc, r2
	mov	r0, r2
	bl	puts  @ the bl instruction performs a function call

The puts function takes an address, namely the address of the first character of the string. In ARM assembly, mov cannot access memory; only the ldr/str instructions load from and store to memory. Since an address takes 4 bytes, it cannot be encoded directly inside a 4-byte instruction, so we have to compute the address of the string ourselves

pc + offset

We take the address given by pc and add an offset to get the address of the string. So .LCPI0_0 actually stores the offset, the ldr instruction loads that offset, and the following add instruction adds pc to produce the result.

	.long	.L.str-(.LPC0_0+8)  @ define data; same as .L.str-.LPC0_0-8

The .long directive defines 4 bytes of data. Similarly, .byte and .short define 1-byte and 2-byte data respectively.

What looks strange is the offset: why subtract 8 at the end? This is because on an ARM CPU, reading pc does not give the address of the current instruction but an address two instructions ahead. This comes from the three-stage pipeline (fetch, decode, execute): while one instruction executes, the instruction two slots later is already being fetched.

100: mov r0,pc  @r0=108
104: mov r1,r2
108: mov r2,r3

Suppose the code above is located at address 100. After it executes, r0 is not 100 but 108. See the following figure for the reason

So, since pc reads two instructions ahead and each instruction in the ARM instruction set occupies 4 bytes, subtracting 8 makes offset + pc come out right. If the value of the .code directive is changed to 16, the 8 must be changed to 4 to keep the offset correct.

Finally, the calling convention. It is relatively simple: the first four arguments are passed in r0-r3, and the rest are passed on the stack. That is why the address of the string ends up in r0.

Let's add a function of our own and try it
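The arithmetic can be checked with a small C model. This is a sketch only: it assumes pc reads as the instruction address plus 8 in ARM state and plus 4 in Thumb state, and the addresses used in the usage note are invented, not taken from a real binary. The function names are also made up for this illustration.

```c
#include <stdint.h>

/* Value an instruction located at insn_addr sees when it reads pc. */
uint32_t pc_as_read(uint32_t insn_addr, int thumb)
{
    return insn_addr + (thumb ? 4u : 8u);
}

/* The constant the assembler stores for .L.str-(.LPC0_0+8):
 * the string address minus the pc value seen by the add instruction. */
uint32_t literal_offset(uint32_t str_addr, uint32_t add_insn_addr, int thumb)
{
    return str_addr - pc_as_read(add_insn_addr, thumb);
}

/* What "add r2, pc, r2" computes at run time: pc + offset. */
uint32_t resolve(uint32_t add_insn_addr, uint32_t offset, int thumb)
{
    return pc_as_read(add_insn_addr, thumb) + offset;
}
```

With the numbers from the example above, pc_as_read(100, 0) is 108. With a hypothetical add instruction at 0x1010 and a string at 0x2000, the stored offset is 0xFE8, and resolve(0x1010, 0xFE8, 0) gives back 0x2000.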
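The rule can be written down as a tiny C sketch. It assumes every argument is a 32-bit integer or pointer (64-bit and floating-point arguments follow additional rules that are not covered here), and that stack arguments are laid out upward from sp at the moment of the call. Both function names are made up for this example.

```c
/* Register number (0..3) that holds the index-th argument (0-based),
 * or -1 if that argument is passed on the stack. */
int arg_register(int index)
{
    return (index >= 0 && index < 4) ? index : -1;
}

/* Byte offset from sp (at the call) of the index-th argument,
 * or -1 if that argument is passed in a register. */
int arg_stack_offset(int index)
{
    return (index >= 4) ? (index - 4) * 4 : -1;
}
```

For a call with six integer arguments, the first four go in r0 to r3, the fifth sits at [sp] and the sixth at [sp, #4].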

	.fpu	neon
	.globl	main                    @ -- Begin function main
	.p2align	2
	.type	main,%function
	.code	32                      @ @main
main:
	push	{r11, lr}
	mov	r11, sp
	sub	sp, sp, #16
	mov	r0, #5                  @ pass argument 1
	mov	r1, #8                  @ pass argument 2
	bl	MyAdd                   @ call the function
	ldr	r2, .LCPI0_0
.LPC0_0:
	add	r2, pc, r2
	movw	r3, #0
	mov	r0, r2                  @ argument 1 is the format string; argument 2 is MyAdd's return value, already in r1
	bl	printf                  @ changed to call printf
	movw	r1, #0
	str	r0, [sp]                @ 4-byte Spill
	mov	r0, r1
	mov	sp, r11
	pop	{r11, pc}
	.p2align	2
.LCPI0_0:
	.long	.L.str-(.LPC0_0+8)
.Lfunc_end0:
	.size	main, .Lfunc_end0-main
	.type MyAdd,%function
	.code 32
MyAdd:                              @ the added function
	add	r1, r0, r1              @ here the return value is placed in r1
	mov	pc, lr                  @ return
.LMyAdd_end:
	.size MyAdd,.LMyAdd_end-MyAdd
	.type	.L.str,%object          @ @.str
	.section	.rodata.str1.1,"aMS",%progbits,1
.L.str:
	.asciz	"hello arm:%d"          @ changed string
	.size	.L.str, 13

	.section	".note.GNU-stack","",%progbits

In the code above I was lazy with the MyAdd function: I put the return value in r1 rather than in r0, where return values normally go. Since r1 is not modified after MyAdd returns, it is automatically used as the second argument when printf is called.
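For comparison, here is roughly what the same program looks like in C. This is a sketch of the equivalent logic, not the source the assembly was generated from. A compiler would return the sum in r0 and then move it into r1 for the printf call, which is the step the hand-written version skips. The helper format_hello is made up so the formatted text can be inspected without printing.

```c
#include <stdio.h>

/* Same job as the hand-written MyAdd: add the two arguments. */
int MyAdd(int a, int b)
{
    return a + b;
}

/* Builds the text that the assembly version prints with printf.
 * Returns the number of characters written, like snprintf. */
int format_hello(char *buf, size_t len)
{
    return snprintf(buf, len, "hello arm:%d", MyAdd(5, 8));
}
```

Calling format_hello on a sufficiently large buffer and printing it with puts gives the same text as the assembly program.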

The results are as follows:

adb push Hello /data/local/tmp
Hello: 1 file pushed. 0.7 MB/s (6560 bytes in 0.009s)
adb shell chmod 777 /data/local/tmp/Hello
adb shell /data/local/tmp/Hello
hello arm:13

The printed result, 13, is correct, which shows that both functions were called correctly.

Tags: ARM

Posted by medar on Tue, 24 May 2022 22:31:35 +0300