Jun 30, 2011

Simple OS - bootloader part1

The past few days, I've finished writing the bootloader. I summarize some of them and will post them on the blog.

Before we started writing our code, there are some background knowlege.

what is a bootloader?
A bootloader is a program that will load the kernel image into the memory, and jumps to it.

how does it works?
First when you press the power bottom, the bios will start first. And after the bios is loaded into the memory,
it will first check which device you want to boot and check the first sector(The MBR) of the device. If it is OK, the bios will put the MBR code into the memory address 0x7c00 and jump to it.

what is MBR?
MBR is the abbreviation of master boot record. As the name suggest the code inside the MBR is the bootloader. In most cases, the MBR is in the first sector of your devices, such as the hard disk, floopy disk , compact disk and so on.

The size of the MBR is 512 bytes. There are many fields contains in a MBR.
a. code                 440bytes
b. Disk signature   4 bytes.
c. null                   2 bytes
d. Partition tables. 64bytes
e. MBR signature 2 bytes.

BIOS interrupt:

what is bios interrupt?
bios interrupt is a low level interrupt which is loaded before the bootloader. BIOS interrupt contains many useful functions which can communicate with the I/O without fully understand the architecture.

why using bios interrupt?
as I previous mentioned, bootloader is to load the kernel image into the memory, and therefore there is no os system call or drivers to help you communicate with the I/O. The best way and the most convenient way is to use the bios interrupt to handle the I/O.

Coding time:
After understand the information, it's time to write a simple hello world program in the boot loader.
.section .text
.global main
#FAT12 file system format
#there is nothing to change in this part
        jmp start_prog
        .byte   0x90
        .ascii  "MicrMike"
        .word   512
        .byte   1
        .word   1
        .byte   2
        .word   224
        .word   2880
        .byte   0xf0
        .word   9
        .word   18
        .word   2
        .long   0
        .long   2880
        .byte   0
        .byte   0
        .byte   0x29
        .long   0x19900303
        .ascii  "HELLO-OS   "
        .ascii  "FAT12   "
        .fill   18, 1, 0
        movw $0, %ax
        movw %ax, %ss
        movw %ax, %ds
        movw %ax, %es
        movw $msg, %si
#using bios interrupt 10h
#parameter of bios interrupt 10h
#%ah: function number
#%al: the offset of the message
        movb    $0xe, %ah
        movb    (%si), %al
        cmpb    $0, %al
        je      fin
        int     $0x10
        addw    $1, %si
        jmp     loop
        jmp     fin
        .ascii  "Hello world!!."
        .byte 0
        .org    0x1fe, 0x00
        .word   0xaa55
P.S the above source code is using the at&t syntax. In this article I won't tell you how to write assembly, but you can google to find some great tutorial.
There are many things that is worth notice in the source code.
1. The red highlight is the FAT12 file system format. Do not change this part.
2. Since we are in the real mode and the ld will assume that the code is in 0x0, I need to initial the whole base register, such as ds, ss and so on.
3. In order to print a message on the screen, I use the bios interrupt.
    bios interrupt 10h
    %ah stores the function number, in this case use the $0xe.
    %al stores the address of the message.
4. in the bottom of the code, don't forget to put the MBR signature, otherwise the bios will think the MBR is     useless.

1. use gcc to compile the source code:
  gcc -c Helloworld.S
2. use ld to link the obj file into the binary:
  ld -Ttext=0x0 --oformat binary Helloworld.o -o Helloworld.bin
3. use mkdosfs to create a virtual floppy disk
  mkdosfs -C os.flp 1440
4. install the binary file into the virtual floppy disk by using dd 
  dd status=noxfer conv=notrunc if=$boot_bin of=os.flp
reference website:

using qemu to test the result.
qemu -fda os.flp
and you will see a hello world in the qemu.

Jun 29, 2011

Shell code 3(print message)

After the exit_shell.out program is finished, it's time to move on.
This time I will use a new system call which will print a message to the screen.

Background Knowlege:
the way I use the system call is still the same, using the int $0x80.
The parameters of print system call:
eax = 4  => this is the system call number.
ebx = 1  => we want to print the message to the stdout.
ecx =  msg => the address of the message.
edx = len   => the length of the message.

OK, coding time.
1.wrote the program into inline assembly.
char msg[]="Run Han";
int main(){
                 movl $4,%eax;\
                 movl $1,%ebx;\
                 movl $0x7,%edx;\
                 movl $msg,%ecx;\
                 int  $0x80;\
                 movl $1,%eax;\
                 movl $0,%ebx;\
                 int  $0x80;\
        return 0;
2.compile the source code with the following command:
gcc -g -o print.out print.c
3. execute the program

As u can see, the output message is in the global data, that's not good. I want the message is inside the shell code. Let's modified the source code a bit.
int main(){
                 movl $4,%eax;\
                 movl $1,%ebx;\
                 movl $0x7,%edx;\
                 movl $msg,%ecx;\
                 int  $0x80;\
                 movl $1,%eax;\
                 movl $0,%ebx;\
                 int  $0x80;\
                 .string "Run Han"\
        return 0;

However, in this way, how do I know the address of the message.
Thanks to this web site: http://insecure.org/stf/smashstack.html
I found the solution. The following is the modified version of the code.
int main(){
        __asm__("jmp 2f;\n\                   #2bytes
                 popl %esi;\n\                #1bytes
                 movl $4,%eax;\n\             #5bytes
                 movl $1,%ebx;\n\             #5bytes
                 movl $0x7,%edx;\n\           #5bytes
                 movl %esi,%ecx;\n\           #2bytes
                 int  $0x80;\n\               #2bytes
                 movl $1,%eax;\n\             #5bytes
                 movl $0,%ebx;\n\             #5bytes
                 int  $0x80;\n\               #2bytes
                 call 1b;\n\                  #5bytes
                 .string "Run Han";\n\
        return 0;
Use the relative jmp/call to accomplish this job.
1. we first use the relative jump to jump to the call instruction.
2. next we use the relative call instruction to transfer the execution flow to the label 1.

So how does these steps has anything to do with the address of the message.
This is a very tricky way. When u call a function, the cpu will automatically push the eip into the stack. And look what is behind the call function, it is the message. So the cpu will help us to push the address of the message to the stack.

And now let's compile the file, and test the output.
It works.

The next step is much easier than above, all I have to do is use the objdump to dump the file, and change the inline assembly into shell code.

char shellcode[]=
"Run Han";
int main(){
        int *ptr;
        int i;
         *overflow the return address.
         *transfer the execution flow to shellcode.
                ptr = (int*)&ptr+i;
                *(ptr) = (int)shellcode;
        return 0;
compile the program and don't forget to use the execstack command to enable the executable stack.
demo video:

Shell code 2 (cont.)

After the previous article, I wrote a program which use a shell code to exit normally.
However, there is a more simple way to accomplish this job. We can use the linux system call.

In linux, when you want to use a system call in assembly, just use the int $0x80.
The system call number is store in the %eax, and the parameter of the system call is store in the %ebx and so on.

Ok, it's time to use these information and write them into shell code.

First write a normal program and use inline assembly in the program.

int main(){
     __asm__("movw $1, %eax;\
              movw $0, %ebx;\
              int  $0x80;");
     return 0;

and now compile it with the following command.
gcc -static -g -o exit.out exit.c
objdump -d exit.out >> dump.txt
dump.txt main function

080482c0 <main>:
 80482c0:       55                      push   %ebp
 80482c1:       89 e5                   mov    %esp,%ebp
 80482c3:       b8 01 00 00 00          mov    $0x1,%eax
 80482c8:       bb 00 00 00 00          mov    $0x0,%ebx
 80482cd:       cd 80                   int    $0x80
 80482cf:       b8 00 00 00 00          mov    $0x0,%eax
 80482d4:       5d                      pop    %ebp
 80482d5:       c3                      ret
the machine code part is what we need.
Let's change them into shell code.


char shellcode[]=
int main(){
        int *ptr;
        int i;
                ptr = (int*)&ptr+i;
                *(ptr) = (int)shellcode;
        return 0;
P.S The for loop is to overflow the return address and transfer the execution flow to the shellcode.
compile it with the following and remember to use the execstack to enable the executable stack.
gcc -static -g -o exit_shell.out exit_shell.c
execstack -s exit_shell.out

now use gdb to verify our thoughts.
(gdb) disassem main
Dump of assembler code for function main:
   0x080482c0 <+0>: push   %ebp
   0x080482c1 <+1>: mov    %esp,%ebp
   0x080482c3 <+3>: sub    $0x10,%esp
   0x080482c6 <+6>: movl   $0x0,-0x8(%ebp)
   0x080482cd <+13>: jmp    0x80482eb <main+43>
   0x080482cf <+15>: lea    -0x4(%ebp),%eax
   0x080482d2 <+18>: mov    -0x8(%ebp),%edx
   0x080482d5 <+21>: shl    $0x2,%edx
   0x080482d8 <+24>: add    %edx,%eax
   0x080482da <+26>: mov    %eax,-0x4(%ebp)
   0x080482dd <+29>: mov    -0x4(%ebp),%eax
   0x080482e0 <+32>: mov    $0x80ce028,%edx
   0x080482e5 <+37>: mov    %edx,(%eax)
   0x080482e7 <+39>: addl   $0x1,-0x8(%ebp)
   0x080482eb <+43>: cmpl   $0x9,-0x8(%ebp)
   0x080482ef <+47>: jle    0x80482cf <main+15>
   0x080482f1 <+49>: mov    $0x0,%eax
   0x080482f6 <+54>: leave  
   0x080482f7 <+55>: ret    
End of assembler dump.
(gdb) b *(main+55)
Breakpoint 1 at 0x80482f7: file exit_shell.c, line 12.
(gdb) r
Breakpoint 1, 0x080482f7 in main () at exit_shell.c:12
12 }
(gdb) ni
0x080ce028 in shellcode ()
(gdb) x/4i $eip
=> 0x80ce028 <shellcode>: mov    $0x1,%eax
   0x80ce02d <shellcode+5>: mov    $0x0,%ebx
   0x80ce032 <shellcode+10>: int    $0x80
   0x80ce034 <shellcode+12>: add    %al,(%eax)
(gdb) ni
0x080ce02d in shellcode ()
(gdb) ni
0x080ce032 in shellcode ()
(gdb) ni
Program exited normally.
The result is just what we expected. When main function returned, it will start execute the shellcode.
And exited normally.
video demo is right here (using full screen is better):

nachos (cont.)

After building and install the nachos, the second project is to fix a bug in nachos.

Bug description:
When nachos is executing two different executable files, the result will be weird.
The following is picture is how the bug looks like.

Finding the Bug:
Now I know what the bug looks like, It's time to find out what cause the bug.
Since our TA have told us that the bug may be in the ./userprog/addrspace.h and ./userprog/addrspace.cc, my team mate and I start to understand what this two files are doing.

what we are interested are 
1. AddrSpace::Load(), which will load the image into the physical addrspace.
2. AddrSpace::AddrSpace, which is the constructor of this class, and will create the page table of this process.

In the source code of AddrSpace::Load()
these code is to load the .text and .data into the physical memory

if (noffH.code.size > 0) {        
        executable->ReadAt(&(kernel->machine->mainMemory[noffH.code.virtualAddr]),noffH.code.size, noffH.code.inFileAddr);
if (noffH.initData.size > 0) {        
        executable->ReadAt(&(kernel->machine->mainMemory[noffH.initData.virtualAddr]),noffH.initData.size, noffH.initData.inFileAddr);

The allocating process will allocate the virtual address space into the physical address space.
However as the red highlight suggest, the physical address of the process is exactly the same as the virtual address of the process.
If the second process is allocate into the memory, it will overwrite the previous process. Hence, when the previous process restore the state, it will execute the code of another process, not the one belongs to it.
Here, we may find our problem.

But not so fast, before we modified the code and fixed the bug, there is one thing that we need to be aware.Nachos is using paging to map the virtual address to physical address. Therefore, when we modified the allocating part, we also need to modified the paging (page tables).The mapping of the page table is in the constructor. As the code suggest, the page table is one to one linear mapping. Therefore, if another process is load into the memory, and the process 1 restore the state. The page table of process 1 will tell that the physical address space of process1 in in the old place where it has already been overwritten by process 2.

Fixed the bug:
Now we have the information we need. It's time to modified the code and fix the bug.
The method of our team is to add a static unsigned int variable which will record how many physical frame is being used, and when process 2 is load into the memory, it will not overwritten the address space of process 1.

1. first add a static unsigned int variable in the addrspace.h

class AddrSpace {


    TranslationEntry *pageTable;

    unsigned int numPages;
    static unsigned int usedPhys;
    bool Load(char *fileName);
    void InitRegisters();

2. initialize the static variable in the addrspace.cc
    #include "addrspace.h"
  unsigned int AddrSpace::usedPhys=0;
3. modified the AddrSpace::Load()
&(kernel->machine->mainMemory[noffH.code.virtualAddr+128*usedPhys]),noffH.code.size, noffH.code.inFileAddr);

why usedPhys multiply 128 before adding to the virtualAddr?
The reason is simple, cuz the usedPhys is the page frame which is being used, not the physical memory.
Therefore, when we are adding the usedPhys we also need to multiply the page size of the page frame, which is 128 byte per page.

4.remapping the page table
   for (unsigned int i = 0; i < numPages; i++) {
pageTable[i].virtualPage = i;
pageTable[i].physicalPage = i+usedPhys
   the numPages is the number of the page frame the current process is using.
   there is no need to map the whole page again, just those the process is using.
5. record the usedPhys
    usedPhys += numPages;
   after those steps, adding the numPages to the usedPhys. Therefore, when the process 2 is loading it will not overwritten the address space of process 1.

6. the last steps is to compile the source code again and test the result.