An executable format may contain many sections, such as .text (where your code is stored) and .data (where your global and static variables are stored).
Have you ever wondered what happens if we store executable bytes in the .data section? That is what shellcode is.
Here is an example to demonstrate how it works.
My environment is Ubuntu 10.10 and the gcc version is 4.4.5.
First, write a simple program that does nothing at all.
example1.c
#include <stdlib.h>
int main()
{
    exit(0);
    return 0;
}
makefile:
gcc -g -static -o example1.out example1.c # compile the source file with static linking
objdump -D example1.out >> example1.dump # dump the executable file
Now let's see what happened by looking at example1.dump.
080482c0 <main>:
80482c0: 55 push %ebp
80482c1: 89 e5 mov %esp,%ebp
80482c3: 83 e4 f0 and $0xfffffff0,%esp
80482c6: 83 ec 10 sub $0x10,%esp
80482c9: c7 04 24 00 00 00 00 movl $0x0,(%esp)
80482d0: e8 db 08 00 00 call 8048bb0 <exit>
The first two instructions set up the stack frame of main.
The third instruction aligns the stack pointer to a 16-byte boundary, and the fourth reserves 16 bytes of stack space.
The last two instructions store the argument on the stack and call the function exit.
What we are interested in is these last two instructions.
Let's write a second example.
example2.c
#include <stdlib.h>
int main()
{
    asm("movl $0x0,(%esp);\
    call 0x8048bb0;\
    ");
}
Let's look at the dump file.
080482c0 <main>:
80482c0: 55 push %ebp
80482c1: 89 e5 mov %esp,%ebp
80482c3: c7 04 24 00 00 00 00 movl $0x0,(%esp)
80482ca: e8 e1 08 00 00 call 8048bb0 <exit>
Now let's turn it into shellcode.
But before we do that, there is something we should know first.
When gcc translates the assembly into machine code, it uses a relative call, which encodes the target function as an offset from the address of the next instruction.
Therefore, if we turn these bytes into shellcode, the address being called will differ depending on where the shellcode is located.
So we need to use an absolute call instead of a relative call.
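To make the position dependence concrete, here is a minimal sketch that decodes the e8 call from the dumps above. The helper name call_rel32_target is hypothetical (it is not part of any library), and the sketch assumes the standard encoding of the 5-byte x86 relative call: opcode 0xe8 followed by a little-endian 32-bit displacement that is added to the address of the next instruction.

```c
#include <stdint.h>

/* Decode the target of a 5-byte x86 "call rel32" instruction (opcode 0xe8).
 * insn_addr: address at which the instruction is located.
 * insn:      the 5 instruction bytes as shown in the objdump output.
 * Returns the call target, or 0 if the opcode is not 0xe8.
 * (Hypothetical helper, for illustration only.) */
uint32_t call_rel32_target(uint32_t insn_addr, const unsigned char insn[5])
{
    uint32_t rel;

    if (insn[0] != 0xe8)
        return 0;

    /* The displacement is stored little-endian after the opcode. */
    rel = (uint32_t)insn[1]
        | (uint32_t)insn[2] << 8
        | (uint32_t)insn[3] << 16
        | (uint32_t)insn[4] << 24;

    /* Target = address of the next instruction + displacement.
     * Unsigned wrap-around also covers negative displacements. */
    return insn_addr + 5 + rel;
}
```

With the bytes from example1.dump (e8 db 08 00 00 at 0x80482d0) this gives 0x8048bb0, the address of exit. The same five bytes placed at a different address, say 0x80ce028, would decode to 0x80ce908, which is why the relative call cannot simply be copied into shellcode.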
Let's first write it in assembly language.
example3.c
#include <stdlib.h>
int main()
{
    asm("movl $0x0,(%esp);\
    movl $0x8048bb0,%eax;\
    call *%eax;\
    ");
}
And the dump file:
080482c0 <main>:
80482c0: 55 push %ebp
80482c1: 89 e5 mov %esp,%ebp
80482c3: c7 04 24 00 00 00 00 movl $0x0,(%esp)
80482ca: b8 b0 8b 04 08 mov $0x8048bb0,%eax
80482cf: ff d0 call *%eax
80482d1: 5d pop %ebp
80482d2: c3 ret
In example3, the call instruction at 0x80482cf is the main difference. Its opcode byte becomes 0xff (an indirect call through %eax) instead of 0xe8 (a relative call).
Now let's turn the instructions from 0x80482c3 through 0x80482cf into shellcode.
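The bytes can be copied by hand from the middle column of the dump. As a convenience, a small helper like the one below could print them as \xNN escapes ready to paste into a C string. This is only a sketch: bytes_to_escapes is a hypothetical name, not a tool used in this article.

```c
#include <stddef.h>
#include <stdio.h>

/* Write n bytes as a sequence of \xNN escapes into out (NUL-terminated).
 * Returns the number of characters written (4 per byte),
 * or -1 if out is too small to hold the result.
 * (Hypothetical helper, for illustration only.) */
int bytes_to_escapes(const unsigned char *bytes, size_t n,
                     char *out, size_t out_size)
{
    size_t i;

    if (out_size < 4 * n + 1)
        return -1;

    /* Each byte becomes exactly four characters: '\', 'x', two hex digits. */
    for (i = 0; i < n; i++)
        snprintf(out + 4 * i, 5, "\\x%02x", (unsigned)bytes[i]);

    out[4 * n] = '\0';
    return (int)(4 * n);
}
```

Feeding it the 14 bytes c7 04 24 00 00 00 00 b8 b0 8b 04 08 ff d0 from the dump yields the string used for the shellcode array in the next example.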
example4.c
#include <stdlib.h>
char shellcode[] = "\xc7\x04\x24\x00\x00\x00\x00\xb8\xb0\x8b\x04\x08\xff\xd0";
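int main()
{
    int *ptr;
    int i;
    for(i=0;i<10;i++)   /* walk over the 10 stack words starting at ptr itself */
    {
        ptr = (int*)&ptr+i;        /* address of the i-th word at or above ptr */
        (*ptr) = (int)shellcode;   /* overwrite that word with the shellcode address */
    }
    return 0;
}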
You may wonder about the purpose of the for loop and the int pointer.
Once the shellcode is written, the executable bytes sit in the data section.
Therefore, we need a way to transfer execution to the shellcode.
The method I use is to overwrite the return address of the main function, so that when main returns, it returns to the shellcode instead of to its caller.
Now execute it, and you will get a segmentation fault.
It's time to use gdb to debug this program.
(gdb) disassem main
Dump of assembler code for function main:
0x080482c0 <+0>: push %ebp
0x080482c1 <+1>: mov %esp,%ebp
0x080482c3 <+3>: sub $0x10,%esp
0x080482c6 <+6>: movl $0x0,-0x8(%ebp)
0x080482cd <+13>: jmp 0x80482eb <main+43>
0x080482cf <+15>: lea -0x4(%ebp),%eax
0x080482d2 <+18>: mov -0x8(%ebp),%edx
0x080482d5 <+21>: shl $0x2,%edx
0x080482d8 <+24>: add %edx,%eax
0x080482da <+26>: mov %eax,-0x4(%ebp)
0x080482dd <+29>: mov -0x4(%ebp),%eax
0x080482e0 <+32>: mov $0x80ce028,%edx
0x080482e5 <+37>: mov %edx,(%eax)
0x080482e7 <+39>: addl $0x1,-0x8(%ebp)
0x080482eb <+43>: cmpl $0x9,-0x8(%ebp)
0x080482ef <+47>: jle 0x80482cf <main+15>
0x080482f1 <+49>: mov $0x0,%eax
0x080482f6 <+54>: leave
0x080482f7 <+55>: ret
End of assembler dump.
(gdb) b *(main+55)
Breakpoint 1 at 0x80482f7: file example4.c, line 13.
(gdb) r
Starting program: example4.out
Breakpoint 1, 0x080482f7 in main () at example4.c:13
13 }
(gdb) x/a $esp
0xbffff2ec: 0x80ce028 <shellcode>
(gdb) x/3i 0x80ce028
0x80ce028 <shellcode>: movl $0x0,(%esp)
0x80ce02f <shellcode+7>: mov $0x8048bb0,%eax
0x80ce034 <shellcode+12>: call *%eax
(gdb) disassem exit
Dump of assembler code for function exit:
0x08048bd0 <+0>: push %ebp
0x08048bd1 <+1>: mov %esp,%ebp
0x08048bd3 <+3>: sub $0x18,%esp
0x08048bd6 <+6>: mov 0x8(%ebp),%eax
0x08048bd9 <+9>: movl $0x1,0x8(%esp)
0x08048be1 <+17>: movl $0x80ce03c,0x4(%esp)
0x08048be9 <+25>: mov %eax,(%esp)
0x08048bec <+28>: call 0x8048ad0 <__run_exit_handlers>
End of assembler dump.
Oh no, the address of exit() has changed: in this binary it is 0x8048bd0, not the 0x8048bb0 that the shellcode calls.
This may be our problem, so let's modify the shellcode.
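The fix amounts to rewriting the four bytes that follow the b8 opcode (the immediate operand of mov $imm32,%eax) with the new address in little-endian order. A minimal sketch of that patch, assuming a hypothetical helper named patch_mov_eax_imm32:

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite the 32-bit immediate of a "mov $imm32,%eax" instruction
 * (opcode 0xb8) located at code[opcode_off].
 * Returns 0 on success, or -1 if the offset is out of range or the
 * byte at that offset is not 0xb8.
 * (Hypothetical helper, for illustration only.) */
int patch_mov_eax_imm32(unsigned char *code, size_t len,
                        size_t opcode_off, uint32_t addr)
{
    if (opcode_off >= len || len - opcode_off < 5)
        return -1;
    if (code[opcode_off] != 0xb8)
        return -1;

    /* x86 stores immediates least significant byte first. */
    code[opcode_off + 1] = (unsigned char)(addr & 0xff);
    code[opcode_off + 2] = (unsigned char)((addr >> 8) & 0xff);
    code[opcode_off + 3] = (unsigned char)((addr >> 16) & 0xff);
    code[opcode_off + 4] = (unsigned char)((addr >> 24) & 0xff);
    return 0;
}
```

In the shellcode above the b8 opcode sits at offset 7, so patching in 0x8048bd0 changes the bytes b0 8b 04 08 to d0 8b 04 08 and leaves everything else untouched.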
example5.c
#include <stdlib.h>
char shellcode[] = "\xc7\x04\x24\x00\x00\x00\x00\xb8\xd0\x8b\x04\x08\xff\xd0";
int main()
{
    int *ptr;
    int i;
    for(i=0;i<10;i++)
    {
        ptr = (int*)&ptr+i;
        (*ptr) = (int)shellcode;
    }
    return 0;
}
Compile it and execute it, and we still get a segmentation fault.
Let's use gdb again.
(gdb) b *(main+55)
Breakpoint 1 at 0x80482f7: file example5.c, line 13.
(gdb) r
Starting program: example5.out
Breakpoint 1, 0x080482f7 in main () at example5.c:13
13 }
(gdb) x/a $esp
0xbffff2ec: 0x80ce028 <shellcode>
(gdb) x/3i 0x80ce028
0x80ce028 <shellcode>: movl $0x0,(%esp)
0x80ce02f <shellcode+7>: mov $0x8048bd0,%eax
0x80ce034 <shellcode+12>: call *%eax
(gdb) x/i exit
0x8048bd0 <exit>: push %ebp
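Everything looks OK, so where is the problem?
The cause of this problem is the non-executable stack protection: the binary is marked as not needing an executable stack, and the shellcode bytes cannot be executed.
Type sudo apt-get install execstack in your command shell, then mark the binary as requiring an executable stack:
execstack -s example5.out
After that, the program finally finishes as expected.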
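As an aside, a program can also ask for execute permission on a specific memory region at run time with mprotect, instead of changing the flags of the whole binary. The sketch below is not the method used in this article and was not part of the experiment above. The helper names page_floor and make_executable are hypothetical, and the sketch assumes a POSIX system where mprotect and sysconf are available and where the system policy allows memory that is both writable and executable.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of its page (page_size must be a power of two). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Request read, write and execute permission for every page that
 * overlaps [addr, addr + len). Returns 0 on success, -1 on failure.
 * (Hypothetical helper, for illustration only.) */
int make_executable(void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uintptr_t start, end;

    if (page_size <= 0)
        return -1;

    /* mprotect requires a page-aligned start address. */
    start = page_floor((uintptr_t)addr, (uintptr_t)page_size);
    end = (uintptr_t)addr + len;

    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Under these assumptions, calling make_executable(shellcode, sizeof shellcode) at the start of main would be the run-time counterpart of the execstack step. With 4096-byte pages, the shellcode address 0x80ce028 from the gdb session lies in the page starting at 0x80ce000.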
next article: shell code 2 (cont.)
reference website:
https://paulmakowski.wordpress.com/2011/01/25/smashing-the-stack-in-2011/
http://stackoverflow.com/questions/5850524/buffer-overflow-problem
Hi, great lessons, they are very well composed.
I have encountered a problem when trying to dump example1.c: on a VM running backtrack rs 5, the dump is substantially bigger and more complicated.
Any idea why?
Thanks.
Thanks for your comment.
When you use objdump to dump a binary, it dumps the whole code section (there are many functions in the code section). I just posted the main function. If you objdump the binary into a file, you can search for the main function in the dump file.
Got it, thanks.
Hi, thanks for this good article. I'm following your instructions step by step.
In example4.c, I'm still wondering why this can overwrite the return address and how to find the return address. I'm working on an x64 machine, so I believe some of the numbers will be different.
Thanks.
int *ptr;
for(i=0;i<10;i++)
{
    ptr = (int*)&ptr+i;
    (*ptr) = (int)shellcode;
}
This code modifies the stack frame at run time. Let's assume the stack frame of our main function looks like the following figure.
==============Figure 1 =====================
----------------------------
parameter 2 | <= (int*)&ptr + 4
----------------------------
parameter 1 | <= (int*)&ptr + 3
----------------------------
Return Address(RA) | <= (int*)&ptr + 2
----------------------------
%ebp | <= (int*)&ptr + 1
----------------------------
local variable(which is ptr)| <= (int*)&ptr + 0
----------------------------
=================End of Figure 1================
Therefore, if we change the value of *((int*)&ptr + i), we change a value in the stack frame.
For example,
*((int*)&ptr + 2) = 0x41414141
Since (int*)&ptr + 2 points to the return address, the RA is changed to 0x41414141. Hence, when the function returns, it returns to 0x41414141 instead of to the caller.
This is the basic idea of a buffer overflow attack (BOA).
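The same idea can be tried safely on a simulated frame. The sketch below models the layout of Figure 1 as a plain array of 32-bit words, so nothing on the real stack is touched. The slot names and the function overwrite_frame are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Slot indices of the simulated frame, lowest address first (as in Figure 1). */
enum {
    SLOT_LOCAL_PTR   = 0,  /* the local variable ptr */
    SLOT_SAVED_EBP   = 1,  /* saved %ebp             */
    SLOT_RETURN_ADDR = 2,  /* return address (RA)    */
    SLOT_PARAM1      = 3,
    SLOT_PARAM2      = 4,
    FRAME_SLOTS      = 5
};

/* Mimic the loop from example4.c on a simulated frame: store value into
 * the first count slots, starting at the local variable, but never write
 * past nslots. Returns the number of slots actually written.
 * (Hypothetical helper, for illustration only.) */
size_t overwrite_frame(uint32_t *frame, size_t nslots,
                       size_t count, uint32_t value)
{
    size_t i;
    size_t n = count < nslots ? count : nslots;

    for (i = 0; i < n; i++)
        frame[i] = value;
    return n;
}
```

Writing only two slots leaves frame[SLOT_RETURN_ADDR] untouched, while writing three or more replaces it. That is why the loop in example4.c, which writes ten words, is certain to hit the return address under this layout.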
The following is a great article about this kind of attack.
http://insecure.org/stf/smashstack.html
As for an x64 machine, I think the basic idea is still the same, but addresses are 64 bits instead of 32 bits.
This website describes what the stack frame looks like in the AMD64 System V ABI.
http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
Hope this helps you. :)