Hitchhiker's World (Issue #9)
http://www.infosecwriters.com/hhworld/

An Introduction to Linux Kernel Backdoors
=========================================

Ayan Chakrabarti

Introduction
------------

Today, most Linux rootkits are implemented as kernel modules, and for good reason. The kernel controls every aspect of the operating system, and code running inside it shares that control, which makes it the logical target when developing rootkits and backdoors.

In this article, I will briefly introduce some of the basic ideas behind this approach, mainly from the point of view of use in a honeypot. The example has been tested with version 2.6.3 of the Linux kernel and may not work with the 2.4 series (I haven't tested it with 2.4, so you're on your own there). Please get back to me if you have any problems or questions while using the example.

Please download the following:
http://www.infosecwriters.com/hhworld/hh9/lvtes.tgz

The Kernel Module
-----------------

Kernel modules are programs that can be dynamically loaded into and unloaded from a running kernel. The idea is to keep the memory footprint of the kernel as small as possible, loading only those drivers that are needed at the moment.

A module is quite different from a normal executable; in fact, it's more like a library. When the module is loaded, it is first "linked" with the running kernel. A module usually imports the addresses of various functions in the kernel, and these are resolved first. Other housekeeping activities, such as adding the module's name and information to a linked list of modules, are also done. Then an initialization procedure is called, in which the module is expected to set up the hardware, register callbacks, and so on. When the module is to be removed, a cleanup procedure is called and the module is unloaded: it is removed from the list of modules and any allocated memory is freed.

The following is a simple module that just "logs" messages when it is loaded and unloaded.
--------test.c----
#include <linux/init.h>     /* module_init, module_exit, __init, __exit */
#include <linux/kernel.h>   /* printk, KERN_INFO */
#include <linux/module.h>   /* needed by every module; MODULE_LICENSE */

MODULE_LICENSE("GPL");      /* avoids the "tainted kernel" warning on load */

/* Called when the module is loaded (insmod). */
static int __init test_init(void)
{
	printk(KERN_INFO "test_init called !\n");
	return 0;           /* a non-zero value would make the load fail */
}

/* Called when the module is unloaded (rmmod). */
static void __exit test_exit(void)
{
	printk(KERN_INFO "test_exit called !\n");
}

module_init(test_init);
module_exit(test_exit);
---------------------

The test_init and test_exit functions are the module's init and exit functions respectively. You specify the init and exit functions of your module with the module_init and module_exit macros. As you can guess, the printk function (the kernel-mode equivalent of printf) prints messages to the kernel log (run dmesg to check this log). KERN_INFO is the priority of the message; this priority is used to decide whether a kernel log message will also be echoed to the console.

Now, let's go on to compiling the module. The first assumption here is that the kernel currently running on the machine was compiled from source on that machine. Essentially, this means that the build subdirectory of /lib/modules/<version> (where <version> is the version of the running kernel; run uname -r to check) points to the directory from which this kernel was built. If you're running the default kernel your distro supplied, you will have to recompile your kernel. I suggest doing this anyway; running a kernel that is best suited to your system is always a good idea.

Anyway, back to compiling the module. We will use the kernel's own build mechanism to build our module; this is the easiest and most elegant approach. Create a file called Makefile in the current directory containing just the following line:

obj-m := test.o

Now run the following command (M=`pwd` is an equivalent form, and the one newer kernels require):

make -C /lib/modules/`uname -r`/build SUBDIRS=`pwd` modules

What this command does is ask the kernel build system to build all modules, including the current directory as a directory to search for modules.
The Makefile we made adds test.o as a module (the build system will figure out that it needs to compile test.c to get test.o). Since all other modules will already have been compiled, the kernel build system will go ahead and compile your module to generate test.ko.

To load the module, run "insmod ./test.ko", and to remove it, run "rmmod test". Run dmesg after each command and check the messages added to the kernel log.

This has been a very brief introduction to Linux kernel modules. While it should be sufficient for the rest of the article, before you start coding I strongly suggest going through Alessandro Rubini's book, Linux Device Drivers.

LVTES
-----

LVTES (Low Visibility Tool for Electronic Surveillance) is a Linux kernel module which logs keystrokes in local and remote sessions to a hidden directory. The module also doesn't show up in the list of modules when you run lsmod or similar utilities. The source code of LVTES should be available with this article.

The rest of the article discusses some of the concepts on which LVTES is based; it does not walk through the source code itself. I suggest you read the article and then go through the sources, preferably with a copy of Linux Device Drivers at hand, to understand the implementation better. Another great source of information is the kernel source itself. Read through as much of it as you think is relevant to get a feel for it.

System Calls and their Abuse
----------------------------

What's a system call? A system call is what you use to get anything done by the OS. Its analogue in the DOS world would be int 21h; in fact, system calls in Linux (on i386) are made through interrupt 80h. Basically, a system call is a service the OS provides to programs. If you want to read a file, you use a system call; if you want to list the files in a directory, you use a system call; even if you want to open a socket, you use a system call.
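To make the idea of "library function versus system call" concrete, here is a small user-space sketch. It writes a string into a pipe with the libc write() wrapper and reads it back by invoking the read system call directly through glibc's syscall() with the SYS_read number. The helper name roundtrip_via_raw_read is hypothetical and only for illustration. On the i386 kernels this article targets, such a call traps into the kernel via int 80h; on x86-64, glibc's syscall() uses the syscall instruction instead, but both reach the same kernel read handler.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> */
#include <assert.h>
#include <string.h>          /* strlen(), strcmp() */
#include <sys/syscall.h>     /* SYS_read */
#include <unistd.h>          /* pipe(), write(), close(), syscall() */

/*
 * Hypothetical helper: push `msg` through a pipe using the libc write()
 * wrapper, then read it back with a raw read system call.
 * Returns the number of bytes read (or -1 on error); the data is
 * NUL-terminated in `buf`.
 */
long roundtrip_via_raw_read(const char *msg, char *buf, size_t bufsize)
{
	int fds[2];
	long n;

	assert(bufsize > 0);             /* need room for the terminator */

	if (pipe(fds) == -1)
		return -1;

	/* libc wrapper around the write system call */
	if (write(fds[1], msg, strlen(msg)) == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);                   /* writer done: reader will see EOF */

	/* the same kernel entry point read() uses, without the wrapper */
	n = syscall(SYS_read, fds[0], buf, bufsize - 1);
	if (n >= 0)
		buf[n] = '\0';
	close(fds[0]);
	return n;
}
```

Calling roundtrip_via_raw_read("hello", buf, sizeof buf) with a 64-byte buf should return 5 and leave "hello" in buf. Running the compiled program under strace shows the read call appearing exactly like one made through read().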
How does the whole thing work? Let's say you call fread in your C program. The standard C library eventually implements this with the read system call, handled in the kernel by sys_read: it loads the registers with the system call number and arguments and issues an int 80h. The appropriate handler in the kernel then gets control; look at arch/i386/kernel/entry.S in your kernel source directory for this code. The handler consults the System Call Table, which holds, for every system call number, a pointer to the procedure that handles it. It finds the address of the procedure to call and passes control to it. You can see sys_call_table being defined in entry.S.

The point here is that with the privileges of a kernel module, you can hook these system calls: replace the pointer in sys_call_table with a pointer to your own procedure (after saving the old pointer, of course), and you're there.

In earlier versions of the kernel, the sys_call_table address was exported. You could just declare extern void **sys_call_table and it would work. That's no longer the case in 2.6. Here, you'll have to retrieve the address either from the System.map file (which contains the memory addresses of all symbols in the kernel) or by running nm on the vmlinux file, which is the uncompressed image of the kernel.

So why would you want to hook a system call? Let's look at the keylogging angle. Every program gets its input either by reading from its standard input, which is a sys_read on file descriptor 0, or by opening /dev/console and reading from there; the latter happens, for instance, when a program wants to turn off echoing while it reads a password. So we hook sys_read. Whenever our code gets control, we check whether the read is on file descriptor 0 and, if so, what kind of device that descriptor points to.
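Deciding "what kind of device a descriptor points to" comes down to the device's major and minor numbers. From user space the same information is available through fstat(), as in the sketch below; it is not taken from LVTES (which does its check inside the kernel), and the helper name chardev_numbers is hypothetical. On a typical Linux system /dev/null is character device 1,3, the text consoles /dev/ttyN use major 4, and Unix98 pseudo-terminals /dev/pts/N use majors starting at 136, per the kernel's devices list.

```c
#include <assert.h>
#include <fcntl.h>            /* open(), O_RDONLY */
#include <sys/stat.h>         /* fstat(), struct stat, S_ISCHR */
#include <sys/sysmacros.h>    /* major(), minor() */
#include <unistd.h>           /* close(), pipe() */

/*
 * Hypothetical helper: report the major/minor numbers of the character
 * device behind `fd`. Returns 0 on success, or -1 if fstat() fails or
 * `fd` refers to something other than a character device.
 */
int chardev_numbers(int fd, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	assert(maj != NULL && min != NULL);

	if (fstat(fd, &st) == -1)
		return -1;
	if (!S_ISCHR(st.st_mode))
		return -1;            /* regular file, pipe, socket, ... */

	/* for device special files, st_rdev holds the device number */
	*maj = major(st.st_rdev);
	*min = minor(st.st_rdev);
	return 0;
}
```

Called as chardev_numbers(0, &maj, &min) from a program started in an xterm or ssh session, it should report a pseudo-terminal major; on a text console it should report major 4. Comparing the result with ls -l on the corresponding /dev entry is an easy sanity check.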
The devices we're interested in are /dev/ttyN, which are the text-mode consoles, and /dev/pts/N, which are pseudo-terminals: xterm windows, remote ssh sessions, and so on run on these devices. Every character device is identified by a major and a minor number; all /dev/ttyN devices share the same major number but have different minor numbers. Data structures in the process record what each file descriptor points to. We check whether file descriptor 0 points to one of the devices we're interested in and, if so, which one; this lets us separate logs from different consoles into different files. If the sys_read isn't on file descriptor 0, we check whether it's a read from /dev/console and, if so, whether the process's file descriptor 0 corresponds to a device we're interested in. In either case, we log whatever is read (by calling the original sys_read) to the appropriate log file. And there you have it: a keylogger that can log both local and remote sessions.

Another interesting system call is getdents, used to list the files in a directory. You can hook it (and its 64-bit variant, getdents64) to hide files and directories, such as the directory in which you store your log files. Also, since process information is maintained as directories in /proc, and a program like ps uses getdents on /proc to list processes, a similar technique can be used to hide processes. There are many other interesting system calls you could hook; for example, you could hook sys_read to hide the contents of certain parts of files.

A very handy utility is strace. It runs a program and reports every system call that program makes, so you can run "strace ls" and find out which system calls ls uses.
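To see what ls (and ps, when it walks /proc) actually receives from the kernel, the sketch below calls getdents64 directly through syscall() and walks the variable-length records it returns. The record layout follows the getdents(2) manual page; the helper name count_dir_entries is hypothetical. On current glibc systems, "strace ls" should show getdents64 calls of this shape, since readdir() is built on top of it.

```c
#define _GNU_SOURCE          /* syscall(), O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>           /* open(), O_RDONLY, O_DIRECTORY */
#include <stdint.h>          /* uint64_t, int64_t */
#include <sys/syscall.h>     /* SYS_getdents64 */
#include <unistd.h>          /* syscall(), close() */

/* One record returned by getdents64, as documented in getdents(2). */
struct linux_dirent64 {
	uint64_t       d_ino;     /* inode number */
	int64_t        d_off;     /* offset to the next record */
	unsigned short d_reclen;  /* length of this record */
	unsigned char  d_type;    /* file type (DT_DIR, DT_REG, ...) */
	char           d_name[];  /* NUL-terminated file name */
};

/*
 * Hypothetical helper: count the entries in directory `path`
 * (including "." and "..") using raw getdents64 calls.
 * Returns the count, or -1 on error.
 */
long count_dir_entries(const char *path)
{
	_Alignas(struct linux_dirent64) char buf[4096];
	long count = 0;
	long nread;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		return -1;

	/* each call fills buf with as many whole records as fit; 0 means done */
	while ((nread = syscall(SYS_getdents64, fd, buf, sizeof buf)) > 0) {
		for (long pos = 0; pos < nread; ) {
			struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
			count++;
			pos += d->d_reclen;   /* records are not fixed-size */
		}
	}
	close(fd);
	return nread == -1 ? -1 : count;
}
```

Walking by d_reclen rather than sizeof(struct linux_dirent64) is the key design point: each record is only as long as its name requires (plus padding), so the buffer has to be treated as a packed sequence.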
Hiding the Module
-----------------

Well, what's the point of making a module that hides the log directory when all anyone has to do is run lsmod or cat /proc/modules to see your module's name in the list? Giving your module a harmless-sounding name is really lame; any decent sysadmin will notice the difference at once. One approach could be to hook sys_read on /proc/modules and filter out references to our module. A more elegant approach comes from understanding how this list of modules is maintained. Go through the kernel sources and look at what the kernel does when it loads or unloads a module. The following listing is from module.c in the kernel subdirectory of the linux-2.6.3 sources.

----listing begin-----
/* Free a module, remove from lists, etc (must hold module mutex). */
static void free_module(struct module *mod)
{
	/* Delete from various lists */
	spin_lock_irq(&modlist_lock);
	list_del(&mod->list);
	spin_unlock_irq(&modlist_lock);

	/* Arch-specific cleanup. */
	module_arch_cleanup(mod);

	/* Module unload stuff */
	module_unload_free(mod);

	/* This may be NULL, but that's OK */
	module_free(mod, mod->module_init);
	kfree(mod->args);
	if (mod->percpu)
		percpu_modfree(mod->percpu);

	/* Finally, free the core (containing the module structure) */
	module_free(mod, mod->module_core);
}
----listing end-------

Notice the list_del call. The kernel keeps a record of every loaded module in a linked list. When a module is unloaded, its entry is removed from this list, which is what list_del does in free_module. Now, if our init function deletes our own module from this list, the module becomes invisible. It also becomes impossible to unload.

Some Other Important Issues
---------------------------

Concurrency is a pretty important issue, especially with the 2.6 series of the kernel. A module should be written so as to avoid race conditions.
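As a user-space analogue of the race-condition problem (this is not kernel code and is not taken from LVTES), the sketch below has several threads increment one shared counter. With a plain long and counter++, increments can be lost, because the load, add, and store are separate steps that two threads can interleave. C11 atomic_fetch_add makes the whole read-modify-write indivisible. Inside the kernel, the corresponding tools are atomic_t with atomic_inc(), and spinlock_t with spin_lock()/spin_unlock() for longer critical sections. NTHREADS, NITERS, and run_atomic_counter are illustrative names.

```c
#include <assert.h>
#include <pthread.h>          /* pthread_create(), pthread_join() */
#include <stdatomic.h>        /* atomic_long, atomic_fetch_add() */

#define NTHREADS 4
#define NITERS   100000

static atomic_long shared_counter;   /* updated concurrently by all workers */

/* Each worker adds 1 to the shared counter NITERS times. */
static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITERS; i++)
		atomic_fetch_add(&shared_counter, 1);   /* indivisible increment */
	return NULL;
}

/*
 * Run NTHREADS workers concurrently and return the final counter value,
 * or -1 if not all threads could be started.
 */
long run_atomic_counter(void)
{
	pthread_t threads[NTHREADS];
	int created = 0;

	atomic_store(&shared_counter, 0);
	for (; created < NTHREADS; created++)
		if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
			break;
	/* always join whatever was started, even on partial failure */
	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);
	return created == NTHREADS ? atomic_load(&shared_counter) : -1;
}
```

Build with -pthread. With the atomic increment, the result should always be NTHREADS * NITERS; replacing atomic_long with a plain long and the call with shared_counter++ will typically produce a smaller, run-to-run varying total on a multi-core machine.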
The LVTES code uses spin locks and atomic operations to address this issue.

Another important thing to realize is that your variables are in kernel memory, while the data passed to system calls is in user memory. User memory may be paged out at any time (and a user-supplied pointer may simply be invalid), so you should always use functions like copy_from_user and copy_to_user to transfer data from and to user memory.

Conclusion
----------

This has been a brief introduction to the world of Linux kernel backdoors. This is just the beginning; it's possible to do a lot more. Some ideas worth looking at: hooking calls to hide specific processes; sending logged data out over a network connection; hiding the fact that specific ports are open; preventing sniffers and utilities like tcpdump from logging certain packets; and the list goes on.