Lab1: exercise 3 - analyze how the bootloader enters protected mode

Exercise 3: analyze how the bootloader enters protected mode.

1. Subject requirements

The BIOS reads the master boot sector of the hard disk into memory and jumps to that location to execute the bootloader. Analyze how the bootloader completes the transition from real mode to protected mode.

Tip: read the section "protected mode and segmentation mechanism" and the source file lab1/boot/bootasm.S. To understand how to switch from real mode to protected mode, you need to know:

  • Why and how to turn on A20
  • How to initialize the GDT
  • How to enable and enter protected mode

2. Preparatory knowledge

The bootloader performs a few basic functions:

It turns on the protected mode of the 80386 so that software runs in the 32-bit address space, that is, the addressing mode changes. To do this it has to turn on A20, initialize the GDT (Global Descriptor Table), and enable and enter protected mode.

After the ljmp instruction executes, the CPU is running in 32-bit protected mode.

(1) Assembly syntax

ucore uses the AT&T assembly format. In AT&T format:

a register name takes '%' as a prefix;

the '$' prefix marks an immediate operand;

the destination operand is to the right of the source operand;

the operand size is given by the last letter of the mnemonic: the suffixes 'b', 'w' and 'l' mean byte (8 bits), word (16 bits) and long word (32 bits) respectively.

.set symbol, expression
 sets the value of symbol to expression

cli  Disable (maskable) interrupts
.code16  The code runs in real mode, so tell the assembler to assemble in 16-bit mode

Label: in x86 assembly code, a label is a unique name followed by a colon. It can appear anywhere in the program and has the same address as the instruction immediately following it.
When a program wants to jump to another position, something has to identify the new position; that is the label. By putting a label in front of the target instruction, you can use the label in the jump instead of a raw address.
seta20.1:
	inb $0x64,%al
	testb $0x2,%al
	jnz seta20.1

(2) What the bootloader does

1. Disable interrupts

2. Enable A20

3. Initialize the global descriptor table

4. Turn on protected mode

5. Set the segment registers (the long jump updates CS; the other segment registers are then loaded with the data segment selector)

6. Set up the stack: esp = 0x7c00, ebp = 0

7. Enter bootmain, which reads the kernel image into memory, checks that it is valid, starts the operating system, and hands control to it

(3) Real mode

When the CPU is reset or powered on, it starts in real mode. In real mode, memory is addressed the same way as on the 8086: the content of a 16-bit segment register is multiplied by 16 (10H) to give the segment base address, and the 16-bit offset is added to form a 20-bit physical address. The maximum address space is 1MB and the maximum segment size is 64KB. 32-bit instructions can be used, so the 32-bit x86 CPU acts as a high-speed 8086. In real mode, all segments can be read, written and executed.
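
The address formation described above can be sketched in C. This is only an illustration: the function name real_mode_phys is made up for this note and does not appear in ucore.

```c
#include <stdint.h>

/* Real-mode address formation: physical = segment * 16 + offset.
 * The sum can reach 0x10ffef (one bit more than 20 bits), so the
 * result is held in a 32-bit value here. real_mode_phys is a
 * hypothetical helper name, not part of ucore. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, both 0x0000:0x7c00 and 0x07c0:0x0000 name the same physical byte, the address where the BIOS places the boot sector.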

In real mode, the whole physical memory is treated as one segmented area. Program code and data sit in different regions, the operating system and user programs are not treated differently, and every pointer holds an actual physical address. If a pointer in a user program points into the operating system's area or another program's area and modifies it, the consequences are likely to be disastrous. Enabling the A20 address line is a prerequisite for leaving real mode; the switch to protected mode itself is made by setting the PE bit of CR0.

(4) Protected mode

In real mode, a program address is the real physical address and any address can be accessed. Different programs can therefore touch each other's memory, causing serious errors. In protected mode, the program address is a virtual address, and the OS manages memory access permissions so that each process can only access the physical memory allocated to it, which keeps programs safe. For example, Linux uses the paging mechanism: when loading a program, the OS allocates the physical pages the process may access and sets up the page directory entries and page table entries so that the program runs normally. The addresses a program runs at are thus managed indirectly by the OS, which prevents processes from interfering with each other and keeps the system stable.


CR0 is a control register that contains 6 predefined flags. Bit 0 is the protection enable bit PE (Protection Enable), which turns on protected mode: if PE is set to 1, protected mode is on; if PE=0, the CPU runs in real mode.

3. Experimental steps

(1) Code analysis

1. bootasm.S code

#include <asm.h>
The asm.h header file contains macro definitions used to define the GDT. The GDT is the global segment descriptor table used in protected mode; segment descriptors are stored in it.
# Start the CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.
This comment states the goal: turn on protected mode and jump into a C function.
It also describes the role of bootasm.S. After the computer is powered on, the BIOS copies the executable code generated from bootasm.S from the first sector of the hard disk to physical address 0x7c00 in memory and starts executing it.
At this point the system is in real mode and no more than 1MB of memory is addressable.
.set PROT_MODE_CSEG,        0x8                     # kernel code segment selector
.set PROT_MODE_DSEG,        0x10                    # kernel data segment selector
 These two segment selectors give the positions of the code and data segment descriptors in the GDT.
.set CR0_PE_ON,             0x1                     # protected mode enable flag
 This constant is the protected mode enable flag: ORing it into CR0 sets bit 0 (PE) to 1 and turns on protected mode.
# start address should be 0:7c00, in real mode, the beginning address of the running bootloader
.globl start
start:
These two lines are the equivalent of defining main in C: start plays the role of main, and execution begins here when the BIOS hands control to the program.
.code16                                             # Assemble for 16-bit mode
 Because the following code runs in real mode, tell the assembler to assemble in 16-bit mode.
    cli                                             # Disable interrupts
    cld                                             # String operations increment
 Disable interrupts and make string operations go in the increasing direction. cld clears the direction flag, so instructions that automatically update the source and destination indexes (such as MOVS) increment both.
    # Set up the important data segment registers (DS, ES, SS).
    xorw %ax, %ax                                   # Segment number zero
The ax register is the lower sixteen bits of eax. xorw clears ax; the effect is the same as movw $0, %ax. xorw is said to perform better, but I could not find a definitive answer after searching for a while; at the least, its encoding is one byte shorter.
    movw %ax, %ds                                   # -> Data Segment
    movw %ax, %es                                   # -> Extra Segment
    movw %ax, %ss                                   # -> Stack Segment
 Set the segment registers to zero.
    # Enable A20:
    #  For backwards compatibility with the earliest PCs, physical
    #  address line 20 is tied low, so that addresses higher than
    #  1MB wrap around to zero by default. This code undoes this.
The preparations are done; now for the real work: enabling address line A20. For compatibility with early PCs, bit 20 of the physical address is tied to 0, so addresses above 1MB wrap around to 0x00000.
Once A20 is enabled, all 4GB of memory can be addressed and protected mode can be used.
How is it enabled? For historical reasons, A20 is managed by the 8042 keyboard controller chip, so a command is sent to the 8042 to enable A20.
The 8042 has two I/O ports, 0x60 and 0x64. The sequence is: send command 0xd1 to port 0x64 --> send 0xdf to port 0x60, done!
seta20.1:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    testb $0x2, %al
    jnz seta20.1
#Before sending the command, wait until the keyboard controller's input buffer is empty. This is observed through bit 1 (the second bit) of the 8042 status register, whose value is obtained by reading port 0x64.
#The instructions above mean: if the second bit of the status register is 1, jump back to seta20.1 and check again, until the second bit is 0, which means the buffer is empty.
    movb $0xd1, %al                                 # 0xd1 -> port 0x64
    outb %al, $0x64                                 # 0xd1 means: write data to 8042's P2 port
 Send 0xd1 to port 0x64.
seta20.2:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    testb $0x2, %al
    jnz seta20.2
    movb $0xdf, %al                                 # 0xdf -> port 0x60
    outb %al, $0x60                                 # 0xdf = 11011111, means set P2's A20 bit(the 1 bit) to 1
At this point A20 is enabled.
    # Switch from real to protected mode, using a bootstrap GDT
    # and segment translation that makes virtual addresses
    # identical to physical addresses, so that the
    # effective memory map does not change during the switch.
To switch to protected mode, a temporary GDT must be supplied to translate logical addresses. The GDT used here is described by gdtdesc. The translated physical address is the same as the virtual address, so the memory map does not change during the switch.
    lgdt gdtdesc
 Load the GDT.
    movl %cr0, %eax
    orl $CR0_PE_ON, %eax
    movl %eax, %cr0
 Setting the protected mode flag is like flipping the protected mode switch. Bit 0 of the cr0 register is that switch; ORing CR0_PE_ON into cr0 sets bit 0 to 1.
    # Jump to next instruction, but in 32-bit code segment.
    # Switches processor into 32-bit mode.
    ljmp $PROT_MODE_CSEG, $protcseg
 Now that protected mode is on, logical addresses must be used instead of real-mode addresses.
PROT_MODE_CSEG is used here; its value is 0x8. According to the segment selector format, 0x8 breaks down as:
INDEX              TI    RPL
0000 0000 0000 1   0     00
INDEX is the index into the descriptor table, TI = 0 selects the GDT pointed to by GDTR, and RPL is the requested privilege level (once the selector is loaded into CS, these two bits give the current privilege level, CPL).
The PROT_MODE_CSEG selector selects entry 1 of the GDT, the first descriptor after the null descriptor. The GDT used here is the gdt table defined below. As shown there, the base address of this descriptor is 0x0000, so the address after translation is the same as before.
.code32                                             # Assemble for 32-bit mode
protcseg:
    # Set up the protected-mode data segment registers
    movw $PROT_MODE_DSEG, %ax                       # Our data segment selector
    movw %ax, %ds                                   # -> DS: Data Segment
    movw %ax, %es                                   # -> ES: Extra Segment
    movw %ax, %fs                                   # -> FS
    movw %ax, %gs                                   # -> GS
    movw %ax, %ss                                   # -> SS: Stack Segment
Reload every data segment register with the data segment selector.
    # Set up the stack pointer and call into C. The stack region is from 0--start(0x7c00)
    movl $0x0, %ebp
    movl $start, %esp
    call bootmain
 The top of the stack is set at start, i.e. address 0x7c00. call pushes the return address onto the stack and transfers control to bootmain.
    # If bootmain returns (it shouldn't), loop.
spin:
    jmp spin
# Bootstrap GDT
.p2align 2                                          # force 4 byte alignment
gdt:
    SEG_NULLASM                                     # null seg
    SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)           # code seg for bootloader and kernel
    SEG_ASM(STA_W, 0x0, 0xffffffff)                 # data seg for bootloader and kernel
gdtdesc:
    .word 0x17                                      # sizeof(gdt) - 1
    .long gdt                                       # address gdt

2. Contents of <asm.h>

// Macros used to define segment descriptors
#ifndef __BOOT_ASM_H__
#define __BOOT_ASM_H__
// assembler macros to create x86 segments
// Define an empty (null) segment descriptor
#define SEG_NULLASM                                            \
        .word 0, 0;                                             \
        .byte 0, 0, 0, 0
//  Define a segment descriptor from the parameters type, base and lim. The high nibble of 0xC0 is (1100)2:
//  the first 1 is the G (granularity) bit: the limit is counted in 4KB units, so a segment can span 4GB;
//  the second 1 is the D bit: setting it to 1 marks a 32-bit protected-mode segment.
//  The detailed segment descriptor format is defined in mmu.h.
// The 0xC0 means the limit is in 4096-byte units
// and (for executable segments) 32-bit mode.
#define SEG_ASM(type,base,lim)                                  \
        .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);      \
        .byte (((base) >> 16) & 0xff), (0x90 | (type)),         \
                (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)
#define STA_X     0x8       // Executable segment
#define STA_E     0x4       // Expand down (non-executable segments)
#define STA_C     0x4       // Conforming code segment (executable only)
#define STA_W     0x2       // Writeable (non-executable segments)
#define STA_R     0x2       // Readable (executable segments)
//  Records whether the segment has been accessed; the CPU sets this bit to 1 when the selector is loaded into a segment register
#define STA_A     0x1       // Accessed
#endif /* !__BOOT_ASM_H__ */
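
To see which bytes SEG_ASM produces, the same arithmetic can be written as a C function. This is a sketch for illustration: pack_seg_descriptor is a hypothetical name, and the byte order assumes the little-endian layout that .word produces on x86.

```c
#include <stdint.h>

#define STA_X 0x8  /* executable segment */
#define STA_W 0x2  /* writeable (non-executable segments) */
#define STA_R 0x2  /* readable (executable segments) */

/* Hypothetical helper: writes the same 8 bytes that
 * SEG_ASM(type, base, lim) emits, in memory order. */
void pack_seg_descriptor(uint8_t type, uint32_t base, uint32_t lim,
                         uint8_t out[8])
{
    uint16_t limit_low = (lim >> 12) & 0xffff;  /* limit in 4KB units, bits 0..15 */
    uint16_t base_low  = base & 0xffff;         /* base bits 0..15 */

    out[0] = limit_low & 0xff;                  /* .word is stored little-endian */
    out[1] = limit_low >> 8;
    out[2] = base_low & 0xff;
    out[3] = base_low >> 8;
    out[4] = (base >> 16) & 0xff;               /* base bits 16..23 */
    out[5] = 0x90 | type;                       /* present, code/data segment, type */
    out[6] = 0xC0 | ((lim >> 28) & 0xf);        /* G=1, D=1, limit bits 16..19 */
    out[7] = (base >> 24) & 0xff;               /* base bits 24..31 */
}
```

With the arguments used in bootasm.S, the code segment comes out as ff ff 00 00 00 9a cf 00 and the data segment as ff ff 00 00 00 92 cf 00: base 0 and a 4GB limit in both cases.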

(2) Problem solving

1.1 why and how to turn on A20

First of all, this is a historical legacy.

In August 1981, IBM launched its first personal computer. The CPU of the IBM PC was the Intel 8088, which has only 20 address lines. At that time RAM was only a few hundred KB, well under 1MB, so 20 address lines were enough to address it. The highest address that segment:offset addressing can express is 0xffff:0xffff, that is 0x10ffef. Addresses at or above 0x100000 (1MB) wrap around by default, so 0x10ffef wraps to 0xffef. When IBM introduced the AT in 1984, it used the Intel 80286 CPU, which has 24 address lines and can address up to 16MB. Unlike the 8088, however, it does not wrap addresses above 1MB around.

But some programs of that time relied on the wrap-around to work. To achieve full compatibility, IBM introduced a switch to enable or disable address bit 20 (0x100000). Since the 8042 keyboard controller happened to have a free port pin (output port P2, pin P21), this pin was used as the input of an AND gate that controls the address bit. This signal is called A20. If it is zero, address bit 20 is forced to 0, and compatibility is achieved.

When the A20 address line is disabled, a program in real mode behaves as if it were running on an 8086: addresses above 1MB wrap around. In protected mode with A20 disabled, only the 1MB blocks in which address bit 20 is 0 (0 to 1MB, 2MB to 3MB, and so on) can be accessed, so the accessible memory is not contiguous. To enable all address bits, a command must be sent to the 8042 keyboard controller. The 8042 then drives the A20 line high, making all 32 address lines usable so that 4GB of memory can be accessed.
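
The effect of the A20 gate on an address can be modelled in a few lines of C. This is a simplified sketch that assumes the gate only forces address bit 20 to 0; a20_filter is a hypothetical name used for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the A20 gate: when the gate is disabled,
 * address bit 20 is forced to 0; all other bits pass through. */
uint32_t a20_filter(uint32_t addr, bool a20_enabled)
{
    return a20_enabled ? addr : (addr & ~(UINT32_C(1) << 20));
}
```

With the gate disabled, 0x10ffef becomes 0xffef (the 8086-style wrap-around), and an address in the 3MB to 4MB block such as 0x300000 lands in the 2MB to 3MB block at 0x200000.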

1.2 specific steps for opening the A20 gate (refer to bootasm.S)

There are three methods to control the A20 gate:

  1. 804x keyboard controller method

  2. Fast A20 method

  3. BIOS interrupt method

The ucore lab uses the first, the 804x keyboard controller method, which is also the oldest and slowest one.

Since the A20 address line is disabled by default when the machine starts, the operating system must turn it on with a suitable method:

  1. Wait until the 8042 input buffer is empty;
  2. Send the Write 8042 Output Port (P2) command to the 8042 input buffer;
  3. Wait until the 8042 input buffer is empty;
  4. Set bit 1 (the second bit) of the 8042 Output Port (P2) byte to 1 and write the byte to the 8042 input buffer.

The code to open the A20 gate is:

seta20.1:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    #Read one byte from port 0x64 into al
    testb $0x2, %al
    #Test the second bit of al; if it is 0, the jump below is not taken
    jnz seta20.1
    #Otherwise loop and check again
    movb $0xd1, %al                                 # 0xd1 -> port 0x64
    outb %al, $0x64                                 # 0xd1 means: write data to 8042's P2 port
    #Write the byte in al to port 0x64
seta20.2:
    inb $0x64, %al                                  # Wait for not busy(8042 input buffer empty).
    testb $0x2, %al
    jnz seta20.2
    movb $0xdf, %al                                 # 0xdf -> port 0x60
    outb %al, $0x60                                 # 0xdf = 11011111, means set P2's A20 bit(the 1 bit) to 1

The first step is to send a command to port 0x64 of the 804x keyboard controller. The command is 0xd1, which means "write data to the controller's P2 port". This is the seta20.1 code snippet.

The second step is to write the data for P2. The data is written through port 0x60 of the keyboard controller. The value written is 0xdf, because the A20 gate is a bit of the controller's P2 port. Once 0xdf is written, the A20 gate is open.
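
The same four-step sequence can be written in C against a simulated controller. Everything here is an assumption made for illustration: the sim_* variables and the port_inb/port_outb helpers stand in for real port I/O and are not ucore functions, and the simulated input buffer is always empty.

```c
#include <stdint.h>

/* Simulated 8042 state (hypothetical, for illustration only). */
uint8_t sim_status = 0x00;        /* bit 1 set = input buffer full */
uint8_t sim_last_command = 0;     /* last byte written to port 0x64 */
uint8_t sim_last_data = 0;        /* last byte written to port 0x60 */

/* Stand-in for the inb instruction: only the status port is modelled. */
uint8_t port_inb(uint16_t port)
{
    return port == 0x64 ? sim_status : 0;
}

/* Stand-in for the outb instruction: records what was written. */
void port_outb(uint16_t port, uint8_t value)
{
    if (port == 0x64)
        sim_last_command = value;
    else if (port == 0x60)
        sim_last_data = value;
}

/* Spin until bit 1 of the status register is 0 (input buffer empty). */
static void wait_input_buffer_empty(void)
{
    while (port_inb(0x64) & 0x2) {
        /* busy-wait, like the jnz seta20.1 / seta20.2 loops */
    }
}

/* The seta20.1 and seta20.2 steps from bootasm.S, in order. */
void enable_a20(void)
{
    wait_input_buffer_empty();
    port_outb(0x64, 0xd1);        /* command: write to output port P2 */
    wait_input_buffer_empty();
    port_outb(0x60, 0xdf);        /* P2 value with the A20 bit set */
}
```

On real hardware the two helpers would be replaced by inb and outb instructions; the control flow stays the same.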

The next thing to do is to enter protected mode.

2.1 what is the GDT

The full name of GDT is Global Descriptor Table. To address memory in protected mode, a GDT is needed first. Each entry in the GDT is called a segment descriptor and records the attributes of one memory segment. Each segment descriptor occupies 8 bytes.

In protected mode, we divide the memory space into segments by setting up the GDT (segments may overlap), so that different programs access different memory spaces. This differs from real mode, where addressing can only use address = (segment << 4) + offset. (That is also segment plus offset, but real mode does not truly partition memory.) In that case any program can access the entire 1MB space. In protected mode, segmentation can keep a program from accessing the whole memory space.
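
A minimal sketch of segment translation in protected mode, under simplifying assumptions: the limit is taken as a byte count (inclusive), and expand-down segments and privilege checks are ignored. struct seg and seg_translate are hypothetical names for this note.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a segment descriptor. */
struct seg {
    uint32_t base;    /* linear address where the segment starts */
    uint32_t limit;   /* highest valid offset, in bytes */
};

/* Translate an offset inside a segment to a linear address.
 * Returns false when the offset lies beyond the segment limit. */
bool seg_translate(struct seg s, uint32_t offset, uint32_t *linear)
{
    if (offset > s.limit)
        return false;             /* the CPU would raise a fault here */
    *linear = s.base + offset;
    return true;
}
```

With a flat segment (base 0, limit 0xffffffff), as the bootloader's GDT uses, every offset translates to itself. A small segment, by contrast, rejects offsets past its limit, which is how segmentation keeps a program out of memory that does not belong to it.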

2.2 initialize the GDT

To make segmented memory management work, the segment descriptors and the segment descriptor table must be set up. The global descriptor table is an array of segment descriptors whose start address is kept in the global descriptor table register GDTR. GDTR is 48 bits long: the high 32 bits are the base address and the low 16 bits are the table limit. Here it is enough to load the GDT descriptor that is statically stored in the boot sector into GDTR:

lgdt gdtdesc
#The CPU provides a separate register, GDTR, to hold the location of the GDT in memory and its length.
#GDTR has 48 bits in total: the upper 32 bits hold the memory address of the GDT and the lower 16 bits hold its length.
#16 bits can express at most 65536. The unit is bytes and a segment descriptor is 8 bytes, so the GDT can hold up to 8192 segment descriptors.
#Besides GDTR, the CPU provides an instruction that passes the address and length of the GDT to GDTR: lgdt gdtdesc

gdtdesc and gdt are placed together at the bottom of bootasm.S:

gdtdesc:
    .word 0x17                                      # 16-bit gdt size: sizeof(gdt) - 1
    .long gdt                                       # 32-bit physical address of gdt

These 48 bits are loaded into GDTR, and the GDT is ready.
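
The numbers in gdtdesc can be checked with a short C sketch. The helper names gdt_limit and pack_gdt_pseudo_descriptor are made up for this note, and the layout assumes little-endian byte order as on x86.

```c
#include <stdint.h>

#define SEG_DESC_SIZE 8u   /* each segment descriptor occupies 8 bytes */

/* Limit field for a GDT with n descriptors: size in bytes minus 1.
 * Valid for 1..8192 descriptors. */
uint16_t gdt_limit(unsigned n_descriptors)
{
    return (uint16_t)(n_descriptors * SEG_DESC_SIZE - 1);
}

/* Lay out the 6 bytes that lgdt reads: a 16-bit limit followed by
 * a 32-bit base address (hypothetical helper for illustration). */
void pack_gdt_pseudo_descriptor(uint16_t limit, uint32_t base, uint8_t out[6])
{
    out[0] = limit & 0xff;
    out[1] = limit >> 8;
    out[2] = base & 0xff;
    out[3] = (base >> 8) & 0xff;
    out[4] = (base >> 16) & 0xff;
    out[5] = (base >> 24) & 0xff;
}
```

The bootloader's GDT has three descriptors (null, code, data), so the limit is 3 * 8 - 1 = 0x17, matching the .word 0x17 in gdtdesc. The largest table, 8192 descriptors, gives a limit of 0xffff.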

3.1 how to enable and enter protected mode

3.1.1 modify the PE bit of the CR0 register

Just as the A20 gate is the switch for addressing memory above 1MB, another switch has to be turned on to enter protected mode. This switch lives in a control register. The x86 has four control registers, CR0, CR1, CR2 and CR3 (all 32 bits wide), and the switch for protected mode is in CR0.

CR0 contains six predefined flags. Bit 0 is the protection enable bit PE (Protection Enable), which turns on protected mode: if PE is set to 1, protected mode is on; if PE=0, the CPU runs in real mode.

The CR0 bits related to protected mode are shown in the figure:

The code to turn on protected mode is:

movl %cr0, %eax
orl $CR0_PE_ON, %eax
movl %eax, %cr0

Because CR0 cannot be modified in place, its current value is first copied into a general-purpose register. The first line uses eax to hold the value of CR0;

The second line ORs in CR0_PE_ON, which is set to 0x1 at the top of bootasm.S (mmu.h defines the same bit as the macro CR0_PE, with the value 0x00000001). ORing this value into the copy of cr0 in eax guarantees that bit 0 is set, that is, PE = 1, and the protected mode switch is on.

PG = 0 in bit 31 of cr0 means that only segmentation is used, not paging. The third line writes the newly computed value in eax back to cr0, which completes the switch to protected mode.
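
The bit arithmetic of the read-modify-write above, sketched in C on a plain integer. The function names are hypothetical; real code has to go through movl to and from %cr0 as shown in bootasm.S.

```c
#include <stdint.h>
#include <stdbool.h>

#define CR0_PE 0x00000001u   /* bit 0: protection enable */
#define CR0_PG 0x80000000u   /* bit 31: paging */

/* Same effect as orl $CR0_PE_ON, %eax on a copy of CR0:
 * set bit 0 and leave every other bit unchanged. */
uint32_t cr0_enable_protected_mode(uint32_t cr0)
{
    return cr0 | CR0_PE;
}

/* True when bit 31 (PG) is set, i.e. paging is on. */
bool cr0_paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}
```

Using OR rather than a plain store keeps the other CR0 flags as they were, which is why PG stays 0 after the switch.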

3.1.2 update the CS register through a long jump
ljmp $PROT_MODE_CSEG, $protcseg

Here protcseg is a label (labels are explained in the label notes earlier in this article).

Since protected mode is now on, logical addresses must be used instead of real-mode addresses.
Note PROT_MODE_CSEG and PROT_MODE_DSEG: they are defined as 0x8 and 0x10 and are the selectors of the code segment and the data segment respectively.

According to the segment selector format, 0x8 breaks down as:

        INDEX              TI    RPL

        0000 0000 0000 1   0     00

INDEX is the index into the GDT, TI = 0 means the GDT pointed to by GDTR is used, and RPL is the requested privilege level (in CS, these two bits give the current privilege level, CPL).
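
Splitting a selector into its three fields takes only shifts and masks. A small C sketch, with struct selector and decode_selector as hypothetical names:

```c
#include <stdint.h>

/* The three fields of a 16-bit segment selector. */
struct selector {
    uint16_t index;   /* bits 3..15: index into the descriptor table */
    uint8_t  ti;      /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;     /* bits 0..1: requested privilege level */
};

/* Hypothetical helper: decode a raw selector value. */
struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 0x1;
    s.rpl   = sel & 0x3;
    return s;
}
```

PROT_MODE_CSEG (0x8) decodes to index 1 and PROT_MODE_DSEG (0x10) to index 2, both with TI = 0 and RPL = 0. Because each descriptor is 8 bytes, a selector with TI and RPL equal to 0 is also the byte offset of its descriptor inside the GDT.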

3.1.3 set the segment registers and build the stack

Note that ebp normally holds the base of the current stack frame, but it does not need to be set to 0x7c00 here: 0x7c00 is the highest address of the stack and nothing valid lies above it. After the call, the called function sets ebp to the base of its own stack frame, so ebp can simply be 0 here.

        movw $PROT_MODE_DSEG, %ax
        movw %ax, %ds
        movw %ax, %es
        movw %ax, %fs
        movw %ax, %gs
        movw %ax, %ss
        movl $0x0, %ebp
        movl $start, %esp

3.1.4 enter the bootmain function
        call bootmain

4 Summary

The bootloader startup process can be summarized as follows:

First, the BIOS reads the first sector (where the bootloader is stored) to physical address 0x7c00 in memory, with segment register CS = 0x0000 and IP = 0x7c00, and the bootloader starts executing.

cli disables interrupts. Servicing interrupts is normally the job of the operating system's device drivers, so the bootloader does not need to respond to any interrupt while it runs; cli masks them by clearing the interrupt flag IF in the CPU's flags register. cld clears DF, that is, DF=0. The direction flag DF determines whether memory addresses increase (DF=0, toward high addresses) or decrease (DF=1, toward low addresses) during string operations. The registers ax, ds, es and ss are then set to 0.

A20 is disabled at startup, and addresses above 1MB wrap around to 0 by default, so A20 must be enabled by sending commands to the 8042. The 8042 has two I/O ports, 0x60 and 0x64. The sequence is: send command 0xd1 to port 0x64 --> send 0xdf to port 0x60, which opens the A20 gate.

Then comes the transition from real mode to protected mode. (In real mode, the whole physical memory is treated as one area. Program code and data sit in different regions, the operating system and user programs are not treated differently, and every pointer holds an actual physical address. If a pointer in a user program points into the operating system's area or another program's area and modifies it, the consequences are likely to be disastrous.) The global descriptor table is therefore initialized so that virtual addresses can be translated to physical addresses. The lgdt instruction loads the start address and size of the descriptor table (built with the macros in asm.h) into the GDTR register. Bit 0 of CR0 is set to 1 to enter protected mode. The long jump ljmp transfers to protcseg in the 32-bit code segment. Finally, the data segment registers are set for protected mode, the stack registers are set, and bootmain is called.

Posted by killfall on Tue, 03 May 2022 08:43:07 +0300