Hitchhiker's World (Issue #8)
http://www.infosecwriters.com/hhworld/

A lightweight virtual machine for running user-level operating systems

Hideki Eiraku
College of Information Sciences, University of Tsukuba
mailto: hdk /at/ coins.tsukuba.ac.jp
http://www.coins.tsukuba.ac.jp/~hdk/

Yasushi Shinjo
Institute of Information Sciences and Electronics, University of Tsukuba
mailto: yas /at/ is.tsukuba.ac.jp
http://www.is.tsukuba.ac.jp/~yas/

Abstract

A user-level operating system (OS) is an operating system that runs as a regular user process on another host operating system. This article describes a lightweight virtual machine (VM) for running user-level OSes. Unlike normal virtual machines, a lightweight virtual machine emulates only a part of a computer, such as privileged instructions and serial ports, while the real CPU executes most instructions. The implementation of the lightweight virtual machine is simplified by cooperating with the host OSes. The current lightweight virtual machine can execute NetBSD and FreeBSD as user-level OSes on NetBSD or Linux as host OSes.

1 Introduction

Running multiple operating systems simultaneously on a single hardware platform has become popular. The first benefit is that we can use application programs written for different operating systems simultaneously on a single computer. The second benefit is that we can execute several versions of a single operating system on the same platform. Sometimes we run multiple copies of a single operating system to realize virtual hosting and honeypots.

There are two major methods to run multiple operating systems over a single hardware platform. The first method is to use virtual machines [Bochs] [VMware]. Typical virtual machines provide isolated execution environments for multiple operating system kernels, which can run over the native hardware. The other method is to implement user-level operating systems. A user-level operating system is an operating system that runs as a regular user process on another host operating system. For example, User Mode Linux (UML) [UML] is a user-level OS that runs on Linux and provides another Linux system image.

Conventional user-level OSes view the underlying host operating system as a specific hardware architecture. Therefore, implementing a user-level OS often requires porting an existing kernel to a new hardware architecture. For example, User Mode Linux adds a new architecture called um based on the i386 architecture (Figure 1). In general, such porting involves significant implementation effort and requires detailed knowledge of the kernel and of both the base and the new architectures. In the User Mode Linux port, the size of the new um-dependent part is 33,000 lines while the size of the base i386-dependent part is 40,000 lines.

Figure 1: The porting problem in a conventional user-level operating system (User Mode Linux)
Porting arch/i386 to arch/um.

To address the porting problem, we are developing a lightweight virtual machine for running user-level OSes [BSDCon03] [LilyVM2003]. In our implementation method of user-level operating systems, the real CPU executes most instructions of an operating system kernel. Since the CPU cannot execute all the instructions in user mode, we supplement the real CPU with a lightweight virtual machine. A lightweight virtual machine, or partial emulator, emulates only a part of a computer, such as privileged instructions, hardware interrupts, and the interaction with some peripheral devices. In contrast, we refer to a program that executes all instructions as a heavyweight virtual machine or a full emulator.

Using a lightweight virtual machine has the following advantages. First, we do not have to port existing native OSes to host OSes. We can generate a user-level OS based on a native OS without detailed knowledge about user-level OS internals (Figure 2). Second, compared with Bochs, a full PC emulator, we get much better performance. A limitation of our method is that we need the source code of the user-level OS. Furthermore, the current NetBSD on our lightweight virtual machine is slower than NetBSD on VMware and User Mode Linux on Linux.

Figure 2: Execution of a user-level operating system with the lightweight virtual machine and the real CPU.
Translating a native OS into a user-level OS by an assembler preprocessor.

In the following three sections (2-4), we describe three key techniques to run a user-level OS on our lightweight virtual machine. Section 5 describes modifications of the NetBSD kernel and the FreeBSD kernel for running as user processes. Section 6 shows the performance of the user-level NetBSD. Section 7 summarizes related work. Section 8 concludes the article and shows future directions.

2 Detecting execution of privileged instructions and some non-privileged instructions

Since an operating system kernel includes privileged instructions, the real CPU cannot execute the kernel in user mode. We supplement the real CPU with a lightweight virtual machine.

Figure 3 shows the execution of a user-level OS (including its user processes) and a lightweight VM. They run at the same level as other processes of the host OS. The machine instructions of those processes are interpreted by the real CPU. In addition to the real CPU, the lightweight virtual machine interprets some instructions of the user-level OS and its processes. Those processes also issue system calls to the host OS.

Figure 3: Execution of a user-level operating system with the real CPU, the lightweight virtual machine, and the host operating system
A user-level OS is executed by the real CPU and the lightweight virtual machine.

Since the CPU cannot execute privileged instructions in user mode, the lightweight virtual machine has to emulate the privileged instructions. This emulation is not a problem because the lightweight virtual machine can detect the execution of the privileged instructions through signals and the system call tracing facility.

The real problem in emulation is detecting the execution of some non-privileged instructions that are tightly coupled with corresponding privileged instructions. For example, ltr (load task register) of IA-32 is a privileged instruction while str (store task register) is a non-privileged instruction. We have to detect the execution of both ltr and str.

To detect execution of such non-privileged instructions, we rewrite the privileged and non-privileged instructions with subroutine calls that emulate these instructions. An example of rewriting follows:

Before: 
        ...
        ltr     %eax
        ...
        str     %eax
        ...
After: 
        ...
        call    ltr_eax
        ...
        call    str_eax
        ...
The subroutines ltr_eax and str_eax perform emulation of the corresponding instructions. We also perform inlining for simple instructions.

Note that we only have to rewrite the kernel-level program, and we do not have to rewrite application-level programs. Therefore, we can execute the same application binaries built for native OSes on user-level OSes.

Since the rewriting is performed by an assembler preprocessor, we do not have to extract files to be rewritten. The preprocessor accepts both assembly source files and the output files of the C compiler from the C source files containing the asm() statement.
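The rewriting step can be illustrated with a toy preprocessor. The following Python sketch is not the authors' tool: it assumes one instruction per line in AT&T syntax with a single register operand, it handles only the ltr and str mnemonics from the example above, and the names EMULATED, rewrite_line, and rewrite are hypothetical.

```python
import re

# Mnemonics to replace with emulation subroutines. Only ltr and str are
# taken from the article; a real preprocessor would cover more instructions.
EMULATED = {"ltr", "str"}

# Matches a line such as "        ltr     %eax" (one register operand only).
INSN = re.compile(r"^(?P<indent>\s*)(?P<op>[a-z]+)\s+%(?P<reg>[a-z]+)\s*$")


def rewrite_line(line):
    """Turn 'ltr %eax' into 'call ltr_eax'; leave every other line as is."""
    m = INSN.match(line)
    if m is None or m.group("op") not in EMULATED:
        return line
    return f"{m.group('indent')}call    {m.group('op')}_{m.group('reg')}"


def rewrite(source):
    """Rewrite a whole assembly source text line by line."""
    return "\n".join(rewrite_line(line) for line in source.split("\n"))
```

Because the sketch works on assembly text, it could in principle be applied both to hand-written assembly files and to compiler output, which is the property the paragraph above relies on.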

We have implemented a lightweight virtual machine for IA-32. The virtual machine consists of 1100 lines of C code in a separate process, plus 250 lines of C code and 300 lines of assembly language code included in the user-level OS. Our lightweight virtual machine is much smaller than Bochs, which consists of 50,000 lines of C++ code.

3 Redirection of system calls and page faults

When a user process on a user-level OS issues a system call or causes a page fault, the host OS should not interpret the event by itself. Instead, the host OS should notify the user-level OS of the event (a system call or page fault).

To implement the system call redirection, we use the process trace facility of Linux, as User Mode Linux does [UML]. In Linux, if a parent (or tracing) process specifies PTRACE_SYSCALL to the system call ptrace(), the child (or traced) process continues execution as for PTRACE_CONT, but it will stop on entry to or exit from the system call. When the child process is stopped, the parent process can examine and modify the CPU registers and the arguments or results of the system call.

When a user process of a user-level OS issues a system call, the process of the host OS is stopped on entry to the system call. The lightweight virtual machine replaces the system call number with an illegal one and continues the execution. Next, the host OS tries to execute the body of the system call in the regular way. However, the host OS cannot execute it because the system call number is wrong. Therefore, the host OS sets an error value and notifies the lightweight virtual machine of the exit from the system call. Next, the lightweight virtual machine reconstructs an interrupt frame on the user-level OS. Finally, the lightweight virtual machine changes the program counter and switches to the user-level OS. Page faults (SIGSEGV) are handled in a similar way to system calls.
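The sequence of steps above can be modeled in a few lines. The following Python sketch is a pure simulation and does not call ptrace(); the register set is reduced to eax and eip, the interrupt frame is a simplified dictionary, and the names Regs, GuestKernelStack, INVALID_SYSCALL, and GUEST_SYSCALL_ENTRY are hypothetical. Modeling the host's error value as -ENOSYS is also an assumption, since the article only says that an error value is set.

```python
import errno
from dataclasses import dataclass, field

INVALID_SYSCALL = -1          # assumed illegal number; any unused one would do
GUEST_SYSCALL_ENTRY = 0x1000  # hypothetical address of the guest trap handler


@dataclass
class Regs:
    eax: int  # system call number on entry, result on exit
    eip: int  # program counter of the stopped process


@dataclass
class GuestKernelStack:
    frames: list = field(default_factory=list)


def host_run_syscall(regs, host_table):
    """Toy host OS: run the call body, or fail if the number is unknown."""
    handler = host_table.get(regs.eax)
    regs.eax = handler() if handler else -errno.ENOSYS


def redirect_syscall(regs, stack, host_table):
    """Model one redirected system call; returns the host's error value."""
    # 1. Stopped on entry: remember the request, then replace the number.
    requested = regs.eax
    regs.eax = INVALID_SYSCALL
    # 2. The host tries to run the body and fails with an error value.
    host_run_syscall(regs, host_table)
    host_error = regs.eax
    # 3. Stopped on exit: rebuild a (simplified) interrupt frame for the guest.
    stack.frames.append({"return_eip": regs.eip, "syscall": requested})
    # 4. Restore the request and switch to the guest kernel's handler.
    regs.eax = requested
    regs.eip = GUEST_SYSCALL_ENTRY
    return host_error
```

In this model the host handler for the requested number is never invoked, which is the point of the redirection: the user-level OS, not the host OS, services the call.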

At first, we developed our lightweight virtual machine on Linux to run NetBSD as a user-level OS. After we had succeeded in running NetBSD on Linux, we started running NetBSD on NetBSD. We found that the unmodified NetBSD does not provide enough facilities to implement our partial emulator. To run NetBSD on NetBSD, we decided to add a new facility to the host NetBSD kernel. The basic task is to introduce the PTRACE_SYSCALL facility of Linux to NetBSD. We also added a facility to get the values of special registers (the error code, cr2, and the trap number).

In addition to NetBSD, we have also studied FreeBSD 5.0. The process trace facility of FreeBSD through the /proc file system is not enough to implement our partial emulator. In FreeBSD, the strace command uses the /proc file system to control a child process and to get the arguments and results of system calls. FreeBSD provides enough facilities for the strace command. However, when a process is stopped on entry to a system call, the system call number has already been copied to a local variable, so we cannot change the system call number by changing the content of the register eax. Therefore, the /proc file system of FreeBSD 5.0 is not enough to implement system call redirection.

We have confirmed that we can change the system call number by moving the stop point earlier. By using this facility, we are now porting our lightweight virtual machine to FreeBSD.

4 Emulation of essential peripheral devices

To run user-level OSes, the lightweight virtual machine has to emulate essential peripheral devices. Unlike the full PC emulator Bochs [Bochs], the lightweight virtual machine provides a minimum number of peripheral devices, including serial ports, a timer for periodic interrupts, and the real time clock (RTC).

For hard disks, we have developed a simple device driver instead of emulating IDE or SCSI disks. This method simplifies the implementation of our lightweight VM. As a base device driver, we use the memory disk driver of NetBSD. The memory disk of NetBSD is a block device, and it does not use interrupts for I/O. Instead, the memory disk reads and writes a memory region in the kernel. We have replaced these operations with system calls to the host OS.
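The idea of backing a block device with host system calls, rather than with a memory region or an emulated disk controller, can be sketched as follows. This Python example is an illustration only and is not the NetBSD driver: the class HostFileDisk, the function demo, the 512-byte block size, and the use of a regular host file as backing store are all assumptions.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size; the article does not state one


class HostFileDisk:
    """Toy block device whose storage is a file on the host OS."""

    def __init__(self, path, num_blocks):
        self.num_blocks = num_blocks
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)  # zero-filled image

    def _check(self, n):
        if not 0 <= n < self.num_blocks:
            raise IndexError(f"block {n} out of range")

    def read_block(self, n):
        # A synchronous host read replaces the copy from kernel memory.
        self._check(n)
        return os.pread(self.fd, BLOCK_SIZE, n * BLOCK_SIZE)

    def write_block(self, n, data):
        # A synchronous host write; no interrupt is needed to signal completion.
        self._check(n)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        os.pwrite(self.fd, data, n * BLOCK_SIZE)

    def close(self):
        os.close(self.fd)


def demo():
    """Write one block to a temporary image and read it back."""
    with tempfile.TemporaryDirectory() as d:
        disk = HostFileDisk(os.path.join(d, "disk.img"), num_blocks=4)
        disk.write_block(2, b"\xab" * BLOCK_SIZE)
        result = (disk.read_block(1), disk.read_block(2))
        disk.close()
        return result
```

Because each operation completes before the call returns, the sketch needs no interrupt handling, which mirrors why the memory disk driver is a convenient starting point.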

To emulate the Memory Management Unit (MMU), we use the system calls mmap() and munmap() of the host OS. Currently, we use a single process of the host OS for running all the user processes of the user-level OS. Therefore, on context switching, we have to replace all the contents of the MMU. In other words, we have to issue a number of mmap() and munmap() system calls. We plan to use several processes of the host OS as MMU context caches.
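The cost of this single-process scheme can be made concrete with a toy model. The following Python sketch does not call the real mmap() or munmap(); it only counts how many such calls a context switch would need, assuming one call per page (a real implementation might map or unmap whole ranges at once). The class SingleProcessMMU and its fields are hypothetical.

```python
class SingleProcessMMU:
    """Toy model: one host process holds the mappings of the current guest
    process, so every context switch replaces all of them."""

    def __init__(self):
        self.mapped = {}     # virtual page -> physical page currently mapped
        self.host_calls = 0  # simulated mmap()/munmap() calls issued so far

    def switch_to(self, page_table):
        # Drop every mapping of the outgoing guest process.
        for vpage in list(self.mapped):
            del self.mapped[vpage]          # stands in for munmap()
            self.host_calls += 1
        # Install every mapping of the incoming guest process.
        for vpage, ppage in page_table.items():
            self.mapped[vpage] = ppage      # stands in for mmap()
            self.host_calls += 1
```

In this model, switching between a process with three pages and one with two pages costs five simulated calls each time, which suggests why the MMU emulation contributes to the overhead.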

5 Modifications to NetBSD and FreeBSD for running as user-level operating systems

In the implementation of a user-level OS, the final goal is to generate the user-level OS from the corresponding native OS for the bare hardware automatically. However, we had to slightly modify the native NetBSD and FreeBSD. In this section, we show the modifications to NetBSD and FreeBSD for running them as user-level OSes.

5.1 Modifications to NetBSD for running as a user process

We ran NetBSD 1.5.2 as a user process by changing 6 constants to adjust the address space and removing device drivers from the configuration file. The base address of NetBSD is changed from 0xc0000000 to 0xa000000 because the memory region after 0xc0000000 is occupied by the host operating system kernel.

Note that this modification is achieved without detailed knowledge about the NetBSD kernel. This is a significant difference from conventional user-level OSes, such as User Mode Linux. User Mode Linux requires adding a new architecture called um. The code size under the um directory is 33,000 lines, and this is comparable with the code size of the native i386 architecture (44,000 lines). This porting may cause a maintenance problem. When the native i386 architecture gets a new facility, the um architecture has to catch up with it manually. In contrast, the core of our user-level NetBSD is automatically generated from the native i386 NetBSD. Therefore, we can follow the evolution of native i386 NetBSD more easily.

5.2 Modifications to FreeBSD for running as a user process

We have also executed the FreeBSD 4.7 kernel as a user process. In addition to address constants, we have changed the places that call the BIOS. We have simply commented out those places and replaced them with code that returns parameters, such as the size of memory and the type of CPU. Since we did not have BIOS code, changing the FreeBSD kernel was much easier than implementing a BIOS. Furthermore, changing the kernel reduces the effort to implement the partial emulator. Calling the BIOS requires emulation of the virtual 8086 mode of the Pentium. Our partial emulator does not have that facility.

6 Experiments

We conducted an experiment to measure the performance of our user-level OS (NetBSD 1.5.2). In this experiment, we used a PC with a 1 GHz Pentium III and 512 MB of main memory. The host operating system for our lightweight virtual machine is modified NetBSD 1.6.1 or unmodified Debian 3.0 with the Linux kernel 2.4.20. We also ran NetBSD on VMware Workstation 3.2 on Linux, NetBSD on Bochs 2.02 on NetBSD, and User Mode Linux 2.4.20-uml-6 on Linux 2.4.20.

We ran the make command for compiling the GNU patch command (Version 2.5.4), and measured the execution times. The source code of the patch command consists of 15 C files and 17 headers. The total length of those files is 9200 lines or 244 KB. Although the source files are the same on NetBSD and Linux, the header files in /usr/include are different. In this compilation, a total of 460 header files (2 MB) are included in NetBSD while 770 header files (6 MB) are included in Linux. Therefore, the execution time on NetBSD/Physical is shorter than that on Linux/Physical.

The result is shown in Table 1. In Table 1, `LVM' stands for our lightweight virtual machine.

NetBSDs on our partial emulators were faster than NetBSD on Bochs by a factor of 10. However, they were slower than NetBSD on the physical PC and VMware by a factor of 15 and 4, respectively. This slowdown is caused by overheads of the system call redirection and the MMU emulation.

Table 1: The execution times of compilation in seconds.
OS/Environment            time (sec)
NetBSD/LVM/Linux              13.7
NetBSD/LVM/NetBSD             20.4
NetBSD/Physical                3.6
NetBSD/Bochs/Linux           550.
NetBSD/VMware/Linux            3.9
Linux/Physical                 4.1
User Mode Linux/Linux          9.5
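
Relative execution times can be recomputed from the entries of Table 1. The following Python sketch uses only the numbers in the table; the names TIMES and slowdown are hypothetical. The ratios are derived from the table values alone, so they need not match the approximate factors quoted in the text.

```python
# Execution times in seconds, copied from Table 1.
TIMES = {
    "NetBSD/LVM/Linux": 13.7,
    "NetBSD/LVM/NetBSD": 20.4,
    "NetBSD/Physical": 3.6,
    "NetBSD/Bochs/Linux": 550.0,
    "NetBSD/VMware/Linux": 3.9,
    "Linux/Physical": 4.1,
    "User Mode Linux/Linux": 9.5,
}


def slowdown(environment, baseline):
    """How many times longer `environment` takes than `baseline`."""
    return TIMES[environment] / TIMES[baseline]


if __name__ == "__main__":
    for env in ("NetBSD/LVM/Linux", "NetBSD/LVM/NetBSD"):
        for base in ("NetBSD/Physical", "NetBSD/VMware/Linux"):
            print(f"{env} vs {base}: {slowdown(env, base):.1f}x")
        # Speedup of the partial emulator over the full emulator.
        print(f"Bochs vs {env}: {slowdown('NetBSD/Bochs/Linux', env):.1f}x")
```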

7 Related work

Emulating a part of computer hardware is a trend in recent virtual machine research. For example, Xen and Denali call such emulation paravirtualization. Xen is a mainframe-type virtual machine monitor for IA-32 [Xen03]. Xen replaces hardware interrupts with its own event system. Xen allows guest operating systems read access to page tables but it mediates write access. Xen hosts Linux and Windows XP. Denali is a virtual machine monitor for distributed and networked systems [Denali03]. Denali has its own memory management mechanism and interrupts. Denali runs thousands of the Ilwaco guest OS. Compared with those systems, our virtual machine differs in that we use a language processor (the assembler preprocessor). Both Xen and Denali require porting effort when they need to execute existing native operating systems.

8 Conclusion

In this article, we have described the lightweight virtual machine for running user-level operating systems. One of the main features of the virtual machine is that its implementation is simplified by cooperating with the host OSes. For example, the user-level OSes are translated at compile time to detect execution of some non-privileged instructions, so the lightweight virtual machine does not have to analyze the user-level OS code at run time. Furthermore, we embedded a special disk device driver instead of emulating physical disks. The lightweight virtual machine can execute NetBSD and FreeBSD on unmodified Linux and slightly modified NetBSD. As a result, the lightweight virtual machine makes it possible to implement user-level NetBSD and FreeBSD without large porting effort.

We are now implementing networking facilities. A user-level OS can be connected to the Internet by using PPP over a serial port. We are trying to eliminate the PPP daemon running on the host operating system.

Acknowledgment

The development of this lightweight virtual machine is partially supported by the Mito software creation grant of The Information-technology Promotion Agency (IPA), under Project Manager Masami Hagiya.

Bibliography

[Bochs]
Kevin Lawton, Bryce Denney, N. David Guarneri, Volker Ruppert, Christophe Bothamy, and Michael Calabrese.
Bochs x86 PC emulator Users Manual, 2003.
http://bochs.sourceforge.net/
[VMware]
Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim.
Virtualizing I/O devices on VMware workstation's hosted virtual machine monitor.
In Proceedings of the USENIX Annual Technical Conference, 2001.
[UML]
Jeff Dike.
A user-mode port of the Linux kernel.
In Proceedings of the 4th Annual Linux Showcase & Conference, 2000.
http://user-mode-linux.sourceforge.net/
[BSDCon03]
Hideki Eiraku and Yasushi Shinjo.
Running BSD Kernels as User Processes by Partial Emulation and Rewriting of Machine Instructions.
In Proceedings of the USENIX BSD Conference 2003 (BSDCon'03), pp.91-102, 2003.
http://www.usenix.org/events/bsdcon03/tech/eiraku.html
[LilyVM2003]
Hideki Eiraku and Yasushi Shinjo.
LilyVM, A lightweight virtual machine for running user-level operating systems.
2003.
http://lilyvm.sourceforge.net/
[Xen03]
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho and Rolf Neugebauer.
Xen and the art of virtualization.
In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP19), pp.164-177, 2003.
http://www.cl.cam.ac.uk/Research/SRG/netos/xen/
[Denali03]
Andrew Whitaker, Marianne Shaw, and Steven D. Gribble.
Scale and Performance in the Denali Isolation Kernel.
In Proceedings of the USENIX 5th Symposium on Operating Systems Design and Implementation (OSDI), 2002.
http://denali.cs.washington.edu/



Copyright © 2003, Infosecwriters.com and Authors. All rights reserved.