Running multiple operating systems simultaneously on a single hardware platform has become popular. Its first benefit is that we can use application programs written for different operating systems simultaneously on a single computer. The second benefit is that we can execute several versions of a single operating system on the same platform. Sometimes we run multiple copies of a single operating system to realize virtual hosting and honeypots.
There are two major methods to run multiple operating systems over a single hardware platform. The first method is to use virtual machines [Bochs] [VMware]. Typical virtual machines provide isolated execution environments for multiple operating system kernels, which can run over the native hardware. The other method is to implement user-level operating systems. A user-level operating system is an operating system that runs as a regular user process on another host operating system. For example, User Mode Linux (UML) [UML] is a user-level OS that runs on Linux and provides another Linux system image.
Conventional user-level OSes view the underlying host operating system as a specific hardware architecture. Therefore, the implementation of a user-level OS often requires porting of an existing kernel to a new hardware architecture. For example, User Mode Linux adds a new architecture called um based on the i386 architecture (Figure 1). In general, such porting involves significant implementation effort, and requires detailed knowledge about the kernel and the base and new architectures. In porting of User Mode Linux, the size of the new um-dependent part is 33,000 lines while the size of the base i386-dependent part is 40,000 lines.
To address the porting problem, we are developing a lightweight virtual machine for running user-level OSes [BSDCon03] [LilyVM2003]. In our implementation method of user-level operating systems, the real CPU executes most instructions of an operating system kernel. Since the CPU cannot execute all the instructions at the user mode, we supplement the real CPU with a lightweight virtual machine. A lightweight virtual machine or a partial emulator emulates a part of a computer, such as privileged instructions, hardware interrupts, and the interaction with some peripheral devices. In contrast, we refer to a program that executes all instructions as a heavyweight virtual machine or a full emulator.
Using a lightweight virtual machine has the following advantages. First, we do not have to port existing native OSes to host OSes. We can generate a user-level OS based on a native OS without detailed knowledge about user-level OS internals (Figure 2). Second, compared with Bochs, a full PC emulator, we get much better performance. A limitation of our method is that we need the source code of the user-level OS. Furthermore, the current NetBSD on our lightweight virtual machine is slower than NetBSD on VMware and User Mode Linux on Linux.
In the following three sections (2-4), we describe three key techniques to run a user-level OS on our lightweight virtual machine. Section 5 describes modifications of the NetBSD kernel and the FreeBSD kernel for running as user processes. Section 6 shows the performance of the user-level NetBSD. Section 7 summarizes related work. Section 8 concludes the article and shows future directions.
Since an operating system kernel includes privileged instructions, the real CPU cannot execute the kernel in the user mode. We supplement this with a lightweight virtual machine.
Figure 3 shows the execution of a user-level OS (including its user processes) and a lightweight VM. They run at the same level as other processes of the host OS. The machine instructions of those processes are interpreted by the real CPU. In addition to the real CPU, the lightweight virtual machine interprets some instructions of the user-level OS and its processes. Those processes also issue system calls to the host OS.
Since the CPU cannot execute privileged instructions at the user mode, the lightweight virtual machine has to emulate the privileged instructions. This emulation is not a problem because the lightweight virtual machine can detect the execution of the privileged instructions through the signal and the system call tracing facility.
The real problem in emulation is detecting the execution of some non-privileged instructions that are tightly coupled with corresponding privileged instructions. For example, ltr (load task register) of IA-32 is a privileged instruction, while str (store task register) is a non-privileged instruction. We have to detect the execution of both instructions.
To detect the execution of such non-privileged instructions, we replace the privileged and non-privileged instructions with calls to subroutines that emulate them. An example of rewriting follows:

    Before:
        ...
        ltr %eax
        ...
        str %eax
        ...
    After:
        ...
        call ltr_eax
        ...
        call str_eax
        ...

The subroutines ltr_eax and str_eax emulate the corresponding instructions. We also perform inlining for simple instructions.
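The rewriting step and the emulation subroutines can be illustrated with a minimal C sketch. It is not the actual assembler preprocessor: the function names rewrite_line, emu_ltr, and emu_str and the variable virtual_tr are hypothetical, and the sketch assumes AT&T-syntax input with a single register operand. It only shows the idea of turning "ltr %REG" or "str %REG" into a call, and of keeping the task register value in an emulator variable instead of the real register.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Virtual task register kept by the emulator (hypothetical name). */
static uint16_t virtual_tr;

/* Emulation bodies that "call ltr_eax" / "call str_eax" could end up in. */
void emu_ltr(uint16_t selector) { virtual_tr = selector; }
uint16_t emu_str(void) { return virtual_tr; }

/*
 * Rewrite one line of AT&T-syntax assembly.
 * "ltr %REG" or "str %REG" becomes "call ltr_REG" or "call str_REG";
 * every other line is copied unchanged.
 * Returns 1 if the line was rewritten, 0 otherwise.
 */
int rewrite_line(const char *in, char *out, size_t outlen)
{
    char op[8], reg[8];

    /* " %7s" reads the mnemonic, "%%" matches a literal '%',
       and "%7[a-z]" reads the register name. */
    if (sscanf(in, " %7s %%%7[a-z]", op, reg) == 2 &&
        (strcmp(op, "ltr") == 0 || strcmp(op, "str") == 0)) {
        snprintf(out, outlen, "\tcall %s_%s", op, reg);
        return 1;
    }
    snprintf(out, outlen, "%s", in);
    return 0;
}
```

In this sketch, a line such as "ltr %eax" is replaced and a line such as "movl %eax, %ebx" passes through untouched. Memory operands and other operand forms are deliberately not handled.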
Note that we only have to rewrite the kernel-level program, and we do not have to rewrite application-level programs. Therefore, we can execute the same binaries on user-level OSes as on native OSes.
Since the rewriting is performed by an assembler preprocessor, we do not have to extract files to be rewritten.
The preprocessor accepts both assembly source files and the output files of the C compiler from the C source files containing the
We have implemented a lightweight virtual machine for IA-32. The virtual machine consists of 1100 lines of C code in a separate process, and 250 lines of C code and 300 lines of assembly language code included in the user-level OS. Our lightweight virtual machine is much smaller than Bochs, which consists of 50,000 lines of C++ code.
When a user process on a user-level OS issues a system call or causes a page fault, the host OS should not interpret the event by itself. Instead, the host OS should notify the user-level OS of the event (a system call or page fault).
To implement the system call redirection, we use the process trace facility of Linux, as User Mode Linux does [UML].
In Linux, if a parent (or tracing) process specifies PTRACE_SYSCALL to the system call ptrace(), the child (or traced) process continues execution as for PTRACE_CONT, but it stops on entry to or exit from a system call.
When the child process is stopped, the parent process can examine and modify the CPU registers and arguments or results of the system call.
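The stop-on-entry and stop-on-exit behaviour of PTRACE_SYSCALL can be observed with a small Linux-only C sketch. This is not code from the lightweight virtual machine; the function name count_syscall_stops is hypothetical, and the sketch only counts the stops without examining or modifying registers, because register access is architecture-specific.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, trace it with PTRACE_SYSCALL, and count how many times
 * it stops on system call entry or exit.
 * Returns the number of stops, or -1 if tracing is not possible.
 */
int count_syscall_stops(void)
{
    int status;
    int stops = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        getppid();              /* a system call for the parent to observe */
        _exit(0);
    }

    /* Parent: wait for the initial SIGSTOP of the child. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    for (;;) {
        /* Resume the child until the next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        /* Without PTRACE_O_TRACESYSGOOD, a syscall stop shows up as SIGTRAP. */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            stops++;
    }
    return stops;
}
```

At each counted stop, a tracer that needs more than a count would read or write the registers of the child at this point, which is where a partial emulator would do its work.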
When a user process of a user-level OS issues a system call, the process of the host OS is stopped on entry of the system call.
The lightweight virtual machine replaces the system call number with an invalid one and continues the execution.
Next, the host OS tries to execute the body of the system call in a regular way.
However, the host OS cannot execute it because the system call number is wrong.
Therefore, the host OS sets an error value and notifies the lightweight virtual machine of the exit from the system call.
Next, the lightweight virtual machine reconstructs an interrupt frame on the user-level OS.
Finally, the lightweight virtual machine changes the program counter and switches to the user-level OS.
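The sequence of steps above can be modelled in plain C. The sketch below is a simplified model and not the actual implementation: it uses ordinary structs in place of the traced process state, and all names (host_regs, guest_frame, on_syscall_entry, host_run_syscall, on_syscall_exit, INVALID_SYSCALL) and the frame layout are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define INVALID_SYSCALL (-1L)   /* assumed: a number the host OS rejects */

/* Simplified registers of the traced host process. */
struct host_regs {
    long      syscall_no;
    long      result;
    uintptr_t pc;
    uintptr_t sp;
};

/* Simplified interrupt frame handed to the user-level OS kernel. */
struct guest_frame {
    long      syscall_no;
    uintptr_t user_pc;
    uintptr_t user_sp;
};

/* On entry: remember the requested number, then make it invalid. */
long on_syscall_entry(struct host_regs *r)
{
    long wanted = r->syscall_no;
    r->syscall_no = INVALID_SYSCALL;
    return wanted;
}

/* Stand-in for the host kernel: an invalid number only yields an error. */
void host_run_syscall(struct host_regs *r)
{
    if (r->syscall_no == INVALID_SYSCALL)
        r->result = -ENOSYS;
}

/* On exit: build the frame and switch to the user-level OS kernel. */
void on_syscall_exit(struct host_regs *r, long wanted, struct guest_frame *f,
                     uintptr_t guest_entry, uintptr_t guest_kstack)
{
    f->syscall_no = wanted;     /* what the user process asked for */
    f->user_pc = r->pc;         /* where to resume the user process */
    f->user_sp = r->sp;
    r->pc = guest_entry;        /* continue in the guest's syscall handler */
    r->sp = guest_kstack;
}
```

The point of the model is the ordering: the host never executes the requested system call, and the original number survives only in the frame that the user-level OS kernel receives.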
Page faults (SIGSEGV) are handled in a similar way to system calls.
At first, we developed our lightweight virtual machine on Linux to run NetBSD as a user-level OS.
After we had succeeded in running NetBSD on Linux, we started running NetBSD on NetBSD.
We found that the unmodified NetBSD does not provide enough facilities to implement our partial emulator.
To run NetBSD on NetBSD, we decided to add a new facility to the host NetBSD kernel.
The basic task is to introduce the PTRACE_SYSCALL facility of Linux to NetBSD.
We also add a facility to get the values of special registers (the error code, cr2, and the trap number).
In addition to NetBSD, we have also studied FreeBSD 5.0.
The process trace facility of FreeBSD through the /proc file system is not enough to implement our partial emulator.
In FreeBSD, the strace command uses the /proc file system to control a child process and to obtain the arguments and results of system calls.
FreeBSD provides enough facilities for this purpose.
However, when a process is stopped on entry to a system call, the system call number has already been copied to a local variable, so we cannot change the system call number by changing the content of the register.
Therefore, the /proc file system of FreeBSD 5.0 is not sufficient to implement system call redirection.
We have confirmed that we can change the system call number by moving the stop point earlier. By using this facility, we are now porting our lightweight virtual machine to FreeBSD.
To run user-level OSes, the lightweight virtual machine has to emulate essential peripheral devices. Unlike the full PC emulator Bochs [Bochs], the lightweight virtual machine provides a minimum number of peripheral devices, including serial ports, the timer for periodic interrupt, and the real time clock (RTC).
For hard disks, we have developed a simple device driver instead of emulating IDE or SCSI disks. This method simplifies the implementation of our lightweight VM. As a base device driver, we use the memory disk driver of NetBSD. The memory disk of NetBSD is a block device, and it does not use interrupts for I/O. Instead, the memory disk reads and writes a memory region in the kernel. We have replaced these operations with system calls to the host OS.
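As a rough illustration of this idea, the following C sketch reads and writes fixed-size blocks of a disk image stored in a host file. It is not the actual driver: the function names vdisk_read_block and vdisk_write_block, the 512-byte block size, and the use of pread() and pwrite() are assumptions chosen for the example.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define VDISK_BLOCK_SIZE 512    /* assumed block size */

/* Read block number blkno of the host file into buf.
   Returns 0 on success and -1 on error or short read. */
int vdisk_read_block(int host_fd, uint32_t blkno, void *buf)
{
    off_t off = (off_t)blkno * VDISK_BLOCK_SIZE;
    ssize_t n = pread(host_fd, buf, VDISK_BLOCK_SIZE, off);

    return n == VDISK_BLOCK_SIZE ? 0 : -1;
}

/* Write buf to block number blkno of the host file.
   Returns 0 on success and -1 on error or short write. */
int vdisk_write_block(int host_fd, uint32_t blkno, const void *buf)
{
    off_t off = (off_t)blkno * VDISK_BLOCK_SIZE;
    ssize_t n = pwrite(host_fd, buf, VDISK_BLOCK_SIZE, off);

    return n == VDISK_BLOCK_SIZE ? 0 : -1;
}
```

Because each request completes synchronously inside the host system call, a driver built this way needs no interrupt handling, which matches the behaviour of the memory disk it is based on.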
To emulate the Memory Management Unit (MMU), we use the system call munmap() of the host OS.
Currently, we use a single process of the host OS for running all the user processes of the user-level OS.
Therefore, on context switching, we have to replace all the contents of the MMU.
In other words, we have to issue a number of system calls.
We plan to use several processes of the host OS as MMU context caches.
In the implementation of a user-level OS, the final goal is to generate the user-level OS from the corresponding native OS for the bare hardware automatically. However, we had to slightly modify the native NetBSD and FreeBSD. In this section, we show the modifications to NetBSD and FreeBSD for running them as user-level OSes.
We ran NetBSD 1.5.2 as a user process by changing 6 constants to adjust the address space and removing device drivers from the configuration file.
The base address of NetBSD is changed from
0xa000000 because the memory region after
0xc0000000 is occupied by the host operating system kernel.
Note that this modification is achieved without detailed knowledge about the NetBSD kernel.
This is the significant difference from conventional user-level OSes, such as User Mode Linux. User Mode Linux requires adding a new architecture called um.
The code size under the um directory is 33,000 lines, which is comparable to the code size of the native i386 architecture (44,000 lines).
This porting may cause a maintenance problem. When the native i386 architecture gets a new facility, the um architecture has to be updated manually to catch up with it.
In contrast, the core of our user-level NetBSD is automatically generated from the native i386 NetBSD.
Therefore, we can follow the evolution of native i386 NetBSD more easily.
We have also executed the FreeBSD 4.7 kernel as a user process. In addition to address constants, we have changed the places that call the BIOS. We have simply commented out those places and replaced them with code that returns parameters, such as the size of memory and the type of CPU. Since we did not have BIOS code, changing the FreeBSD kernel was much easier than implementing a BIOS. Furthermore, changing the kernel reduces the effort to implement the partial emulator. Calling the BIOS requires emulation of the virtual 8086 mode of the Pentium. Our partial emulator does not have that facility.
We conducted an experiment to measure the performance of our user-level OS (NetBSD 1.5.2). In this experiment, we used a PC with a 1 GHz Pentium III and 512 MB of main memory. The host operating system for our lightweight virtual machine is modified NetBSD 1.6.1 or unmodified Debian 3.0 with the Linux kernel 2.4.20. We also ran NetBSD on VMware Workstation 3.2 on Linux, NetBSD on Bochs 2.02 on NetBSD, and User Mode Linux 2.4.20-uml-6 on Linux 2.4.20.
We ran the make command for compiling the GNU patch command (Version 2.5.4), and measured the execution times.
The source code of the patch command consists of 15 C files and 17 headers. The total length of those files is 9200 lines or 244 KB.
Although the source files are the same on NetBSD and Linux, the header files in /usr/include are different.
In this compilation, a total of 460 header files (2 MB) are included on NetBSD, while 770 header files (6 MB) are included on Linux.
Therefore, the execution time on NetBSD/Physical is shorter than that on Linux/Physical.
The result is shown in Table 1. In Table 1, `LVM' stands for our lightweight virtual machine.
NetBSD on our partial emulators was faster than NetBSD on Bochs by a factor of 10. However, it was slower than NetBSD on the physical PC and on VMware by a factor of 15 and 4, respectively. This slowdown is caused by the overheads of the system call redirection and the MMU emulation.
    OS/Environment            time (sec)
    NetBSD/LVM/Linux          13.7
    NetBSD/LVM/NetBSD         20.4
    NetBSD/Physical           3.6
    NetBSD/Bochs/Linux        550.
    NetBSD/VMware/Linux       3.9
    Linux/Physical            4.1
    User Mode Linux/Linux     9.5
In this article, we have described the lightweight virtual machine for running user-level operating systems. One of the main features of the virtual machine is that its implementation is simplified by cooperating with the user-level OSes. For example, the user-level OSes are translated at compile time to detect the execution of some non-privileged instructions, so the lightweight virtual machine does not have to analyze the user-level OS code at run time. Furthermore, we embedded a special device driver for disks instead of emulating physical disks. The lightweight virtual machine can execute NetBSD and FreeBSD on unmodified Linux and slightly modified NetBSD. As a result, the lightweight virtual machine makes it possible to implement user-level NetBSD and FreeBSD without large porting effort.
We are now implementing networking facilities. A user-level OS can be connected to the Internet by using PPP over a serial port. We are trying to eliminate the PPP daemon running on the host operating system.
The development of this lightweight virtual machine is partially supported by the Mito software creation grant of The Information-technology Promotion Agency (IPA), under Project Manager Masami Hagiya.
Copyright © 2003, Infosecwriters.com and Authors. All rights reserved.