by danang.wijanarko@gmail.com
This stage is often called the bootstrap. At power-on, the RESET pin of the CPU is asserted. After RESET, some processor registers (notably cs and ip) are set to fixed values, and the code at physical address 0xfffffff0, which is located in the ROM BIOS, is executed. The BIOS program then performs a basic scan of the hardware attached to the computer.
The BIOS uses Real Mode addresses, each composed of a segment and an offset; the corresponding physical address is given by (seg*16)+off. This is why the CPU addressing circuit needs no GDT, LDT, or page table to translate a logical address into a physical address. The BIOS procedure is as follows:
-
Performs the Power-On Self-Test (POST).
-
Initializes the hardware devices. For PCI devices in particular, this includes assigning IRQs so that the devices do not conflict.
-
Searches for an operating system to boot. The BIOS does this by trying the boot devices in the configured order. When a valid device is found, the BIOS copies the contents of its first sector (512 bytes) into RAM starting at physical address 0x00007c00, then jumps to that address and executes the code just loaded. This code is usually a boot sector or a boot loader.
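The Real Mode address calculation described above can be sketched in a few lines of Python. This is only an illustration; the function name real_mode_physical is made up for this example and does not come from the kernel sources.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Translate a Real Mode segment:offset pair into a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The segment is multiplied by 16 (shifted left by 4 bits), then the offset is added.
    return (segment << 4) + offset


# The boot sector lands at 0000:7C00; the same byte is also reachable as 07C0:0000.
print(hex(real_mode_physical(0x0000, 0x7C00)))  # 0x7c00
print(hex(real_mode_physical(0x07C0, 0x0000)))  # 0x7c00
```

Several segment:offset pairs can name the same physical byte, which is why boot code can refer to its load address as either 0000:7C00 or 07C0:0000.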
The Makefile flow is as follows (all paths are relative to the root of the Linux source tree). Entries that share a leading number sit at the same dependency level.
1. bzImage [arch/i386/tools/build.c]
[arch/i386/boot/Makefile] {
tools/build -b bbootsect bsetup compressed/bvmlinux.out CURRENT > bzImage
}
2. arch/i386/boot/bbootsect.o
2. arch/i386/boot/bsetup.o
2. arch/i386/boot/compressed/bvmlinux.out
[arch/i386/boot/Makefile] {
objcopy -O binary -R .note -R .comment -S \
arch/i386/boot/compressed/bvmlinux arch/i386/boot/compressed/bvmlinux.out
}
3. arch/i386/boot/compressed/bvmlinux
[arch/i386/boot/compressed/Makefile] {
ld -m elf_i386 -Ttext 0x100000 -e startup_32 -o bvmlinux head.o misc.o piggy.o
}
4. arch/i386/boot/compressed/head.o
4. arch/i386/boot/compressed/misc.o
4. arch/i386/boot/compressed/piggy.o
[arch/i386/boot/compressed/Makefile] {
objcopy -O binary -R .note -R .comment -S vmlinux $tmppiggy;
gzip -f -9 < $tmppiggy > $tmppiggy.gz;
echo "
SECTIONS {
.data : {
input_len = .;
LONG(input_data_end - input_data) input_data = .;
*(.data) input_data_end = .;
}
}" > $tmppiggy.lnk;
ld -m elf_i386 -r -o piggy.o -b binary $tmppiggy.gz -b elf32-i386 -T $tmppiggy.lnk;
}
5. vmlinux
[linux/Makefile] {
ld -m elf_i386 -T arch/i386/vmlinux.lds -e stext \
arch/i386/kernel/head.o \
arch/i386/kernel/init_task.o \
init/main.o \
init/version.o \
init/do_mounts.o \
--start-group \
arch/i386/kernel/kernel.o \
arch/i386/mm/mm.o \
kernel/kernel.o \
mm/mm.o \
fs/fs.o \
ipc/ipc.o \
drivers/char/char.o \
drivers/block/block.o \
drivers/misc/misc.o \
drivers/net/net.o \
drivers/media/media.o \
net/network.o \
arch/i386/lib/lib.a \
lib/lib.a \
arch/i386/lib/lib.a \
--end-group \
-o vmlinux
}
6. arch/i386/kernel/*
6. arch/i386/lib/*
6. arch/i386/math-emu/*
6. arch/i386/mm/*
6. drivers/*
6. fs/*
6. init/*
6. ipc/*
6. kernel/*
6. lib/*
6. mm/*
6. net/*
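The piggy.o step above (raw binary, gzip -9, then a linker script that stores a length word in front of the data) can be mimicked in Python to see what the resulting payload looks like. This is a rough sketch, not the kernel build itself: the function names are invented, and it assumes the LONG() in the linker script produces a 4-byte little-endian value, as it would on i386.

```python
import gzip
import struct


def build_piggy_payload(raw_kernel: bytes) -> bytes:
    """Mimic the .data layout of piggy.o: input_len followed by input_data."""
    compressed = gzip.compress(raw_kernel, compresslevel=9)  # like `gzip -9`
    input_len = struct.pack("<I", len(compressed))           # LONG(input_data_end - input_data)
    return input_len + compressed


def unpack_piggy_payload(payload: bytes) -> bytes:
    """Read input_len, slice out input_data, and decompress it."""
    (input_len,) = struct.unpack_from("<I", payload, 0)
    input_data = payload[4:4 + input_len]
    return gzip.decompress(input_data)


fake_vmlinux = b"\x90" * 4096  # stand-in bytes, not a real kernel
payload = build_piggy_payload(fake_vmlinux)
print(len(fake_vmlinux), "bytes before,", len(payload), "bytes after")
```

The length word is what lets the decompression code find the end of the compressed data without parsing any file format.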
This code is invoked by the BIOS. The boot code as a whole is larger than 512 bytes; the rest of it is loaded afterwards by this first sector.
The Linux kernel is compressed, so size matters when booting from a floppy. Compression is done at compile time, and decompression is done at boot time by code in the kernel image itself. The boot sector code is arch/i386/boot/bootsect.S. Once compiled, it is 512 bytes long. When the kernel image is built, the bootsect code is placed at the beginning of the image, so when the BIOS fetches the first sector of the floppy, it is actually fetching this bootsect code. This is what bootsect does:
-
Moves itself from 0x00007c00 to 0x00090000
-
Sets up the Real Mode stack at address 0x00003ff4. This stack grows toward lower addresses.
-
Sets up disk parameter table, used by the BIOS to handle the floppy device driver.
-
Invokes a BIOS procedure to display a "Loading" message.
-
Invokes a BIOS procedure to load setup() code of the kernel image from floppy and puts this setup() code in RAM starting from address 0x00090200.
-
Invokes a BIOS procedure to load the rest of the kernel image from floppy and puts the image in RAM starting from either low address 0x00010000 (zImage) or high address 0x00100000 (bzImage).
-
Jumps to setup() code.
Well-known boot loaders for Linux are LILO and GRUB. Basically all boot loaders perform the same operation: each takes care of its own business first and finally handles the kernel image. Typically they copy the integrated boot loader of the kernel image to address 0x00090000, the setup() code to address 0x00090200, and the rest of the kernel image to address 0x00010000 or 0x00100000. Then they jump to the setup() code.
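As a rough illustration of the copy plan a boot loader follows, the sketch below splits an image into its three parts and pairs each with its destination address. It relies on two fields from the parameter table later in this document (the setup size in sectors at offset 0x1f1 and bit 0 of loadflags at offset 0x211). The function name plan_load is hypothetical, and the sketch ignores legacy special cases that a real loader has to handle.

```python
SECTOR = 512
BOOTSECT_ADDR = 0x00090000
SETUP_ADDR = 0x00090200
LOW_IMAGE_ADDR = 0x00010000   # zImage
HIGH_IMAGE_ADDR = 0x00100000  # bzImage


def plan_load(image: bytes):
    """Return (name, file_offset, length, destination) for each part of the image."""
    setup_sects = image[0x1F1]               # size of setup, in sectors
    loaded_high = bool(image[0x211] & 0x01)  # loadflags bit 0
    setup_len = setup_sects * SECTOR
    rest_offset = SECTOR + setup_len
    rest_addr = HIGH_IMAGE_ADDR if loaded_high else LOW_IMAGE_ADDR
    return [
        ("bootsect", 0, SECTOR, BOOTSECT_ADDR),
        ("setup", SECTOR, setup_len, SETUP_ADDR),
        ("kernel", rest_offset, len(image) - rest_offset, rest_addr),
    ]


# A fabricated image: 1 boot sector, 4 setup sectors, 1000 bytes of "kernel".
fake = bytearray(SECTOR + 4 * SECTOR + 1000)
fake[0x1F1] = 4
fake[0x211] = 0x01
for name, offset, length, dest in plan_load(bytes(fake)):
    print(f"{name:8s} file+{offset:#06x} len={length:5d} -> {dest:#010x}")
```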
The setup() code is placed at offset 0x200 of the kernel image by the linker, immediately after the integrated boot loader; this is why the loader can easily locate the code and copy it into RAM starting at physical address 0x00090200. The setup() function initializes the hardware devices in the computer and sets up the environment for the execution of the kernel program. The kernel does not rely on the initialization done by the BIOS, which enhances portability and robustness. The setup() function performs the following:
-
Invokes a BIOS procedure to find out the amount of RAM on system.
-
Sets the keyboard repeat delay and rate.
-
Initializes the video adapter.
-
Reinitializes the disk controller and determines the hard disk parameters.
-
Checks for an IBM Micro Channel bus (MCA).
-
Checks for a PS/2 pointing device (bus mouse).
-
Checks for Advanced Power Management (APM) BIOS support.
-
If the kernel was loaded low in RAM (at physical address 0x00010000), moves it to physical address 0x00001000. If it was loaded high, it is left where it is. This step is necessary because the kernel image is compressed, and the decompression routine needs some free space following the kernel image in RAM to use as a temporary buffer.
Whether the kernel is loaded high or low is reflected in the code32_start variable in the header of setup.S. Its value is selected by the __BIG_KERNEL__ preprocessor macro, which is set during compilation:
gcc -E -D__KERNEL__ -I/usr/src/linux-2.4.20/include -D__BIG_KERNEL__ -D__ASSEMBLY__ -traditional -DSVGA_MODE=NORMAL_VGA setup.S -o bsetup.s; as -o bsetup.o bsetup.s;
-
Note the -D__BIG_KERNEL__ flag in the command above; it is what makes setup.S build for a kernel that is loaded high.
-
Sets up a provisional IDT and GDT.
-
Resets FPU, if any.
-
Reprograms the PIC and maps the 16 hardware interrupts (IRQ lines) to the range of vectors from 32 to 47. This is necessary because the BIOS erroneously maps the IRQs to the range from 0 to 15, which is already used for CPU exceptions.
-
Switches the CPU to Protected Mode by setting the PE bit in the cr0 control register. At this point the PG bit in cr0 is still cleared, so paging is still disabled.
-
Jumps to startup_32() function
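Three of the steps above are simple enough to express as arithmetic. The sketch below models where the image sits after setup(), the IRQ-to-vector mapping, and the cr0 change; it is an illustration only. The function names are invented for this example, and the bit positions (PE is bit 0, PG is bit 31) are the standard x86 ones rather than something stated in this document.

```python
CR0_PE = 1 << 0    # Protection Enable bit of cr0
CR0_PG = 1 << 31   # Paging bit of cr0
FIRST_IRQ_VECTOR = 32


def image_address_after_setup(loaded_high: bool) -> int:
    """A low-loaded image is moved down to 0x1000; a high-loaded one stays put."""
    return 0x00100000 if loaded_high else 0x00001000


def irq_to_vector(irq: int) -> int:
    """Map hardware IRQ lines 0..15 to interrupt vectors 32..47."""
    if not 0 <= irq <= 15:
        raise ValueError("the two cascaded PICs provide IRQ lines 0..15 only")
    return FIRST_IRQ_VECTOR + irq


def enter_protected_mode(cr0: int) -> int:
    """Set PE and leave every other bit, including PG, untouched."""
    return cr0 | CR0_PE


cr0 = enter_protected_mode(0)
print("protected mode:", bool(cr0 & CR0_PE), "paging:", bool(cr0 & CR0_PG))
print("timer IRQ 0 -> vector", irq_to_vector(0))
```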
The startup_32() functions
There are 2 startup_32() functions:
-
startup_32() function in arch/i386/boot/compressed/head.S. After setup() terminates, this function has been moved either to physical address 0x00100000 or to 0x00001000, depending on whether the kernel image was loaded high or low. This function does the following:
-
Initializes the segment registers and a provisional stack.
-
Fills the area of uninitialized data of the kernel identified by the _edata and _end symbols with zeros.
-
Invokes the decompress_kernel() function to decompress the kernel image. The "Uncompressing Linux..." message is displayed first. After decompression is done, the "OK, booting the kernel." message is shown. If the kernel image was loaded low, the decompressed kernel is placed at physical address 0x00100000. Otherwise, if the kernel image was loaded high, the decompressed kernel is placed in a temporary buffer located after the compressed image. The decompressed image is then moved into its final position, which starts at physical address 0x00100000.
-
Jumps to physical address 0x00100000.
-
startup_32() function in arch/i386/kernel/head.S. This function is the beginning of the decompressed kernel at 0x00100000 (the identical name does not cause any problems because both functions are executed by jumping to their initial physical addresses). This second startup_32() function sets up the execution environment for the first Linux process (process 0, the swapper). This is what it does:
-
Initializes the segment registers with their final values.
-
Sets up the kernel mode stack for process 0.
-
Initializes the provisional kernel page tables contained in swapper_pg_dir and pg0 to identically map linear addresses to the same physical addresses.
-
Stores the address of the Page Global Directory (PGD) in the cr3 register, and enables paging by setting the PG bit in the cr0 register.
-
Fills the bss segment of the kernel with zeros.
-
Invokes setup_idt() to fill the IDT with null interrupt handlers.
-
Puts the system parameters obtained from the BIOS and the parameters passed to the OS into the first page frame.
-
Identifies the model of the processor.
-
Loads the gdtr and idtr registers with the address of the GDT and IDT tables.
-
Jumps to the start_kernel() function.
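The identity mapping set up by the second startup_32() can be pictured with a toy two-level page table. This sketch assumes the classic i386 split of a linear address into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. Python lists stand in for the real tables (which hold physical addresses and flag bits), and only a single page table covering the first 4 MB is modelled; the names here are illustrative, not the kernel's data structures.

```python
PAGE_SIZE = 4096
ENTRIES = 1024
PRESENT = 0x1
FRAME_MASK = 0xFFFFF000


def build_identity_tables():
    """Build a page directory whose first table maps page i to frame i."""
    pg0 = [(i * PAGE_SIZE) | PRESENT for i in range(ENTRIES)]
    page_dir = [None] * ENTRIES
    page_dir[0] = pg0  # covers linear addresses 0x00000000..0x003fffff
    return page_dir


def translate(page_dir, linear: int) -> int:
    """Walk the two levels and return the physical address."""
    dir_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    table = page_dir[dir_index]
    if table is None:
        raise LookupError(f"no page table for linear address {linear:#x}")
    entry = table[table_index]
    if not entry & PRESENT:
        raise LookupError(f"page not present for linear address {linear:#x}")
    return (entry & FRAME_MASK) | offset


page_dir = build_identity_tables()
print(hex(translate(page_dir, 0x00100000)))  # 0x100000
```

With an identity mapping in place, the instruction that follows the write to cr0 is fetched from the same address before and after paging is switched on.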
The start_kernel() function
This function completes the initialization of the kernel. Nearly all remaining kernel components are initialized by this function. For example:
-
Page tables are initialized by calling paging_init() function.
-
Page descriptors are initialized by calling kmem_init(), free_area_init(), and mem_init() functions.
-
Final initialization of the IDT is performed by invoking trap_init() and init_IRQ().
-
Slab allocator is initialized by calling kmem_cache_init() and kmem_cache_sizes_init() functions.
-
System date and time are initialized by time_init() function.
-
Kernel thread for process 1 is created by invoking kernel_thread() function. In turn this kernel thread creates the other kernel threads and executes the /sbin/init program.
Besides the "Linux version 2.4..." message, which is displayed right after the beginning of start_kernel(), many other messages are displayed during this last phase, both by the init functions and by the kernel threads. At the end, the login prompt appears.
This section goes into more detail on the previous steps. I'll skip arch/i386/boot/bootsect.S, because that code is rather obsolete.
setup() is located at physical address 0x00090200. setup.S is responsible for getting the system data from the BIOS and putting them into appropriate places in system memory.
Other boot loaders, such as GNU GRUB and LILO, can load bzImage too. Such boot loaders should load bzImage into memory, set up the "real-mode kernel header" (especially type_of_loader), and then pass control to bsetup directly. setup.S assumes:
-
bsetup or setup may not be loaded at SETUPSEG:0, i.e. CS may not be equal to SETUPSEG when control is passed to setup.S.
-
The first 4 sectors of setup are loaded right after bootsect. The rest may be loaded at SYSSEG:0, preceding vmlinux. This assumption does not apply to bsetup.
The arch/i386/boot/setup.S contains:
-
The header and other important variables.
-
Checking code integrity. As setup.S code may not be contiguous, we should check code integrity by checking the signature.
-
Checking loader type. Check if the loader is compatible with the image.
-
Getting memory size. Try three different memory detection schemes to get the extended memory size (above 1 MB) in KB. First, try e820h, which lets us assemble a memory map. Then try e801h, which returns a 32-bit memory size. And finally 88h, which returns 0-64 MB.
-
Checking hardware support. Check hardware support, like keyboard, video adapter (calling arch/i386/boot/video.S), hard disk, MCA bus and pointing device.
-
Check BIOS APM support
-
Prepare for Protected Mode
-
Enable A20
-
Switch to Protected Mode
-
Jump to arch/i386/boot/compressed/head.S:startup_32
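To make the "memory map" returned by e820h more concrete, here is a small sketch that parses such a map and adds up the usable RAM. It assumes the conventional entry layout (64-bit base address, 64-bit length, 32-bit type, 20 bytes in total, with type 1 meaning usable RAM). That layout is not spelled out in this document, so treat it as an assumption; the function names are made up for the example.

```python
import struct

E820_ENTRY = struct.Struct("<QQI")  # assumed layout: base, length, type (20 bytes)
E820_RAM = 1                        # assumed type code for usable RAM


def parse_e820(buf: bytes, count: int):
    """Decode `count` consecutive entries into (base, length, type) tuples."""
    return [E820_ENTRY.unpack_from(buf, i * E820_ENTRY.size) for i in range(count)]


def usable_bytes(entries) -> int:
    """Sum the lengths of all regions reported as usable RAM."""
    return sum(length for _base, length, kind in entries if kind == E820_RAM)


# A fabricated three-entry map for illustration only.
sample = (
    E820_ENTRY.pack(0x00000000, 0x0009FC00, 1)    # low memory
    + E820_ENTRY.pack(0x000F0000, 0x00010000, 2)  # reserved
    + E820_ENTRY.pack(0x00100000, 0x07F00000, 1)  # extended memory
)
entries = parse_e820(sample, 3)
print(usable_bytes(entries) // 1024, "KB usable")
```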
arch/i386/boot/compressed/head.S
We are in bvmlinux now. With the help of misc.c:decompress_kernel(), we decompress piggy.o to get the resident kernel image linux/vmlinux. This file contains pure 32-bit startup code. After decompression succeeds, piggy.o has been unzipped into vmlinux and control is passed to __KERNEL_CS:100000, i.e. linux/arch/i386/kernel/head.S:startup_32().
The decompressed kernel is ordinary code that runs straight through the following call sequence:
|startup_32 --- [arch/i386/kernel/head.S]
  |start_kernel --- [init/main.c]
    |lock_kernel --- [include/asm/smplock.h]
    |setup_arch --- [arch/i386/kernel/setup.c]
    |trap_init --- [arch/i386/kernel/traps.c]
    |init_IRQ --- [arch/i386/kernel/i8259.c]
    |sched_init --- [kernel/sched.c]
    |softirq_init --- [kernel/softirq.c]
    |time_init --- [arch/i386/kernel/time.c]
    |console_init --- [drivers/char/tty_io.c]
    #ifdef CONFIG_MODULES
    |init_modules --- [kernel/module.c]
    #endif
    |kmem_cache_init --- [mm/slab.c]
    |sti --- [include/asm/system.h]
    |calibrate_delay --- [init/main.c]
    |mem_init --- [arch/i386/mm/init.c]
    |kmem_cache_sizes_init --- [mm/slab.c]
    |pgtable_cache_init --- [arch/i386/mm/init.c]
    |fork_init --- [kernel/fork.c]
    |proc_caches_init --- [kernel/fork.c]
    |vfs_caches_init --- [fs/dcache.c]
    |buffer_init --- [fs/buffer.c]
    |page_cache_init --- [mm/filemap.c]
    |signals_init --- [kernel/signal.c]
    #ifdef CONFIG_PROC_FS
    |proc_root_init --- [fs/proc/root.c]
    #endif
    #if defined(CONFIG_SYSVIPC)
    |ipc_init --- [ipc/util.c]
    #endif
    |check_bugs --- [include/asm/bugs.h]
    |smp_init --- [init/main.c]
    |rest_init --- [init/main.c]
      |kernel_thread --- [arch/i386/kernel/process.c]
      |unlock_kernel --- [include/asm/smplock.h]
      |cpu_idle --- [arch/i386/kernel/process.c]
The rest_init() does the following:
-
Launches the kernel thread "init".
-
Calls unlock_kernel().
-
Makes the kernel run the cpu_idle() routine, which is the idle loop that executes when nothing is scheduled.
In fact, the start_kernel() procedure never returns; it executes the cpu_idle() routine endlessly. The first kernel thread, "init", does the following:
|init --- [init/main.c]
  |lock_kernel --- [include/asm/smplock.h]
  |do_basic_setup --- [init/main.c]
    |mtrr_init --- [arch/i386/kernel/mtrr.c]
    |sysctl_init --- [kernel/sysctl.c]
    |pci_init --- [drivers/pci/pci.c]
    |sock_init --- [net/socket.c]
    |start_context_thread --- [kernel/context.c]
    |do_initcalls --- [init/main.c]
      |(*call())-> kswapd_init --- [mm/vmscan.c]
  |prepare_namespace --- [init/do_mounts.c]
  |free_initmem --- [arch/i386/mm/init.c]
  |unlock_kernel --- [include/asm/smplock.h]
  |execve --- [include/asm/unistd.h]
The contents of empty_zero_page are used to pass parameters from the 16-bit Real Mode code of the kernel to the 32-bit part. This should not be confused with the boot sector itself: from this point on the boot sector is no longer used, so its space can be overwritten. The parameters are placed there, and the area is called empty_zero_page. It is referenced or set mainly in:
-
arch/i386/boot/setup.S
-
arch/i386/boot/video.S
-
arch/i386/kernel/head.S
-
arch/i386/kernel/setup.c
| Offset | Type | Description |
| 0 | 32 bytes | struct screen_info, SCREEN_INFO. ATTENTION, overlaps the following !!! |
| 2 | unsigned short | EXT_MEM_K, extended memory size in Kb (from int 0x15) |
| 0x20 | unsigned short | CL_MAGIC, commandline magic number (=0xA33F) |
| 0x22 | unsigned short | CL_OFFSET, commandline offset. Address of commandline is calculated: 0x90000 + contents of CL_OFFSET (only taken when CL_MAGIC = 0xA33F) |
| 0x40 | 20 bytes | struct apm_bios_info, APM_BIOS_INFO |
| 0x80 | 16 bytes | hd0-disk-parameter from intvector 0x41 |
| 0x90 | 16 bytes | hd1-disk-parameter from intvector 0x46 |
| 0xa0 | 16 bytes | System description table truncated to 16 bytes (struct sys_desc_table_struct). Just look at MCA (Micro Channel) bus detection in setup.S. |
| 0xb0 - 0x1df | | Free. Add more parameters here if you really need them. |
| 0x1e0 | unsigned long | LT_MEM_K, alternative mem check, in Kb |
| 0x1e8 | char | number of entries in E820MAP (below) |
| 0x1f1 | char | size of setup.S, number of sectors |
| 0x1f2 | unsigned short | MOUNT_ROOT_RDONLY (if !=0) |
| 0x1f4 | unsigned short | size of compressed kernel-part in the (b)zImage-file (in 16 byte units, rounded up) |
| 0x1f6 | unsigned short | swap_dev (unused AFAIK) |
| 0x1f8 | unsigned short | RAMDISK_FLAGS |
| 0x1fa | unsigned short | VGA-Mode (old one) |
| 0x1fc | unsigned short | ORIG_ROOT_DEV (high=Major, low=minor) |
| 0x1ff | char | AUX_DEVICE_INFO |
| 0x200 | | start: -- short jump to start of setup code aka "reserved" field. |
| 0x202 | 4 bytes | Signature for SETUP-header, ="HdrS" |
| 0x206 | unsigned short | Version number of header format. Current version is 0x0201... |
| 0x208 | 8 bytes | realmode_swtch: -- (used by setup.S for communication with boot loaders, look there) |
| 0x210 | char | type_of_loader: -- LOADER_TYPE, = 0 for an old one, else it is set by the loader as 0xTV: T = 0 for LILO, 1 for Loadlin, 2 for bootsect-loader, 3 for SYSLINUX, 4 for ETHERBOOT; V = version |
| 0x211 | char | loadflags: bit0 = 1: kernel is loaded high (bzImage); bit7 = 1: Heap and pointer (see below) set by boot loader. |
| 0x212 | unsigned short | setup_move_size: -- (setup.S) |
| 0x214 | unsigned long | code32_start: -- KERNEL_START, where the loader started the kernel |
| 0x218 | unsigned long | ramdisk_image: -- INITRD_START, address of loaded ramdisk image |
| 0x21c | unsigned long | ramdisk_size: -- INITRD_SIZE, size in bytes of ramdisk image |
| 0x220 | 4 bytes | bootsect_kludge: -- (setup.S) |
| 0x224 | unsigned short | heap_end_ptr: -- setup.S heap end pointer |
| 0x226 | unsigned short | pad1: |
| 0x228 | unsigned long | cmd_line_ptr: |
| 0x22c | unsigned long | ramdisk_max: |
| ... | ... | ... |
| 0x2d0 - 0x600 | | E820MAP |
| 0x600 - 0x7d4 | | EDDBUF |
| 0x800 | string, 2K max | COMMAND_LINE, the kernel commandline as copied using CL_OFFSET. Note: this will be copied once more by setup.c into a local buffer which is only 256 bytes long (#define COMMAND_LINE_SIZE 256). |
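A few of the fields in the table above can be read back with a short script. The sketch below is only an illustration of the offsets listed: it assumes little-endian i386 encoding with a 4-byte "unsigned long", the function names are invented, and it decodes only a handful of fields.

```python
import struct

CL_MAGIC_VALUE = 0xA33F


def parse_setup_header(image: bytes) -> dict:
    """Decode a few setup-header fields by their offsets from the start of the image."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("no HdrS signature at offset 0x202")
    (version,) = struct.unpack_from("<H", image, 0x206)
    type_of_loader = image[0x210]
    loadflags = image[0x211]
    (code32_start,) = struct.unpack_from("<I", image, 0x214)
    return {
        "version": version,
        "loader_id": type_of_loader >> 4,         # the T nibble of 0xTV
        "loader_version": type_of_loader & 0x0F,  # the V nibble of 0xTV
        "loaded_high": bool(loadflags & 0x01),    # bit0
        "heap_set_by_loader": bool(loadflags & 0x80),  # bit7
        "code32_start": code32_start,
    }


def command_line_address(zero_page: bytes):
    """Return 0x90000 + CL_OFFSET when CL_MAGIC matches, else None."""
    cl_magic, cl_offset = struct.unpack_from("<HH", zero_page, 0x20)
    if cl_magic != CL_MAGIC_VALUE:
        return None
    return 0x90000 + cl_offset


# Fabricated header bytes, for illustration only.
fake = bytearray(0x230)
fake[0x202:0x206] = b"HdrS"
struct.pack_into("<H", fake, 0x206, 0x0201)
fake[0x210] = 0x21   # T=2 (bootsect-loader), V=1
fake[0x211] = 0x01   # loaded high
struct.pack_into("<I", fake, 0x214, 0x100000)
print(parse_setup_header(bytes(fake)))
```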
The physical memory layout
| Address range | Region | Notes |
| 09A000 - 0A0000 | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. |
| 098000 - 09A000 | Stack/heap/cmdline | For use by the kernel real-mode code. |
| 090200 - 098000 | Kernel setup | The kernel real-mode code. |
| 090000 - 090200 | Kernel boot sector | The kernel legacy boot sector. |
| 010000 - 090000 | Protected-mode kernel | The bulk of the kernel image. |
| 001000 - 010000 | Boot loader | ← Boot sector entry point 0000:7C00 |
| 000800 - 001000 | Reserved for MBR/BIOS | |
| 000600 - 000800 | Typically used by MBR | |
| 000000 - 000600 | BIOS use only | |
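The layout above reads naturally as a lookup table. The sketch below encodes it as half-open address ranges and answers the question "which region does this address fall in?". It is just a reading aid: the ranges are taken from the layout, with each region assumed to span from its lower boundary up to the next boundary, and the function name region_of is made up.

```python
# (start, end, name); each range covers start <= address < end.
REGIONS = [
    (0x000000, 0x000600, "BIOS use only"),
    (0x000600, 0x000800, "Typically used by MBR"),
    (0x000800, 0x001000, "Reserved for MBR/BIOS"),
    (0x001000, 0x010000, "Boot loader"),
    (0x010000, 0x090000, "Protected-mode kernel"),
    (0x090000, 0x090200, "Kernel boot sector"),
    (0x090200, 0x098000, "Kernel setup"),
    (0x098000, 0x09A000, "Stack/heap/cmdline"),
    (0x09A000, 0x0A0000, "Reserved for BIOS"),
]


def region_of(address: int):
    """Return the name of the region containing the address, or None above 0x0A0000."""
    for start, end, name in REGIONS:
        if start <= address < end:
            return name
    return None


for probe in (0x7C00, 0x10000, 0x90000, 0x90200):
    print(f"{probe:#08x}: {region_of(probe)}")
```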
The situation at the end of each stage:
| End of code | bootsect position | setup position | image position | Note |
| arch/i386/boot/bootsect | 0x00090000 | 0x00090200 | 0x00010000 (arch/i386/boot/compressed/vmlinux) or 0x00100000 (arch/i386/boot/compressed/bvmlinux) | This compressed image (arch/i386/boot/compressed/vmlinux or arch/i386/boot/compressed/bvmlinux) is composed of arch/i386/boot/compressed/head.o, arch/i386/boot/compressed/misc.o, and arch/i386/boot/compressed/piggy.o (the real compressed vmlinux). Note that the vmlinux inside piggy.o is not the same file as arch/i386/boot/compressed/vmlinux; do not confuse the two. |
| arch/i386/boot/setup | 0x00090000 | 0x00090200 | 0x00001000 (arch/i386/boot/compressed/vmlinux) or 0x00100000 (arch/i386/boot/compressed/bvmlinux) | If the image was loaded low, it is moved to 0x00001000; otherwise it stays at 0x00100000. |
| arch/i386/boot/compressed/head() | 0x00090000 | 0x00090200 | 0x00100000 (decompressed piggy.o = vmlinux) | For both zImage and bzImage, after decompress_kernel() the decompressed kernel ends up at 0x00100000, and the code then jumps to 0x00100000. This is the final position of the full kernel after decompression. |