This is an archived copy of a previous semester's site.
Please see the current semester's site.
It is often helpful to be able to isolate one application from another running on the same system.
Isolation can provide some security, keeping one compromised application from compromising others. Isolation can also provide functionality, allowing conflicting applications to coexist.
However, isolation is expensive, often requiring duplicated resources between applications and overhead in multiplexing hardware and other shared resources between the isolated apps. Isolation can also inhibit functionality, making communication between different parts of a system more complicated.
Because of these conflicting pros and cons, various forms of isolation are available with different forms best suited to different tasks.
As discussed on its own page, virtual memory is an almost-universally-implemented isolation tool that causes each process to have its own address space. Coupled with a few related techniques for separating register values between processes, this means that each process’s local variables, data structures, and running code are isolated from those of other processes.
Virtual memory does not require total isolation. Its core operation is mapping between virtual and physical addresses, with a separate OS-managed map, called a page table, for each process. The OS can put the same physical address in multiple processes’ page tables, creating shared memory.
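As a rough illustration of this mapping, the following Python sketch models each page table as a dictionary from virtual page number to physical page number. The page size, the `translate` helper, and the particular page numbers are assumptions made up for this example; real page tables are multi-level structures managed by the OS and hardware.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def translate(page_table, virtual_address):
    """Map a virtual address to a physical address using one process's page table."""
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    if virtual_page not in page_table:
        raise LookupError(f"page fault: virtual page {virtual_page} is not mapped")
    return page_table[virtual_page] * PAGE_SIZE + offset


# Hypothetical page tables for two processes (virtual page -> physical page).
table_a = {0: 7, 1: 3}
table_b = {0: 9, 5: 3}  # virtual page 5 in B shares physical page 3 with A

# The same virtual address reaches different physical memory (isolation) ...
print(translate(table_a, 0x10) != translate(table_b, 0x10))
# ... while different virtual addresses can reach the same physical memory (sharing).
print(translate(table_a, 1 * PAGE_SIZE + 8) == translate(table_b, 5 * PAGE_SIZE + 8))
```

Both lines print `True`: the two processes are isolated at virtual page 0 but share whatever is stored in physical page 3.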
Read-only shared memory is commonly used to allow many applications to use the same code for common library functions like `malloc` and `printf` without requiring that many copies of that code be present in memory. This works very well if all the processes desire the same version of that code, but if multiple versions exist then picking the right one for each process becomes a challenge. Files containing code designed to be shared in this way are called shared object files (`.so`) in most operating systems except those created by Microsoft, where they are called dynamically-linked library files (`.dll`) instead. Getting the right versions of these shared pieces of code linked together often “just works”, but when it does not it can be quite challenging to resolve.
Read/write shared memory is sometimes also used to communicate between processes. Setting up such memory requires special system calls and some way for the communicating processes to tell the OS which other processes they want to share the memory with; multiple approaches to resolving these challenges exist and are all outside the scope of this class.
In common operation, each process has its own virtual memory, but all processes share the same file system. A limited degree of isolation is supported by the file system itself using a facility known as permissions.
Several variants of the permission system exist, but the Unix model has become the most widespread and is what this section discusses.
The operating system keeps track of a set of users and a set of groups. Each process is run as, and each file is owned by, one user and one group. Each file also has a nine-bit permission, treated as nine separate Boolean flags. Three of these flags explain what processes belonging to the user may do with the file; three what processes belonging to the group may do with the file; and three what other processes may do with the file. For each, the three flags are `r`, `w`, and `x`, with the following meanings:
Flag | On a file | On a directory |
---|---|---|
`r` | read the contents of the file, viewing its bytes | read the contents of the directory, listing the files and directories it contains |
`w` | write the contents of the file (i.e. change its bytes) | write the contents of the directory, adding or removing files or directories from it |
`x` | execute the file, running it as a program | traverse the directory, gaining access to the files and directories it contains |
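To make the user/group/other split concrete, here is a small Python model of how the nine flags are consulted. The function names and the numeric IDs in the example are hypothetical, and the sketch deliberately ignores super-user accounts and any extensions beyond the nine flags; it only shows that exactly one of the three groups of flags applies to a given process.

```python
def applicable_flags(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Return the three flags (as a number 0-7) that apply to this process."""
    if proc_uid == file_uid:
        return (mode >> 6) & 0o7  # the user flags
    if file_gid in proc_gids:
        return (mode >> 3) & 0o7  # the group flags
    return mode & 0o7             # the other flags


def may(mode, file_uid, file_gid, proc_uid, proc_gids, want):
    """Check whether every flag in `want` (e.g. "r", "x", or "rw") is granted."""
    values = {"r": 4, "w": 2, "x": 1}
    needed = 0
    for letter in want:
        needed |= values[letter]
    granted = applicable_flags(mode, file_uid, file_gid, proc_uid, proc_gids)
    return granted & needed == needed


# Hypothetical file: owned by user 1000 and group 100, with flags rwxr-xr--.
mode, owner, group = 0o754, 1000, 100
print(may(mode, owner, group, 1000, {100}, "rwx"))  # the owning user
print(may(mode, owner, group, 1001, {100}, "w"))    # a member of the group
print(may(mode, owner, group, 1002, {200}, "r"))    # everyone else
```

Note that the first matching category wins: a process running as the owning user is judged only by the user flags, even if the group or other flags would have granted more.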
There are some nuances about these permissions to note:
- `w` enables modifying the contents of the directory, but the contents are only visible if the directory also has `x` permission; thus directory `w` is meaningless unless paired with `x`.
- `x` enables running the file, but it can only be run if it can be read; thus file `x` is meaningless unless paired with `r`.
- `r` vs `x` on a directory:
    - If you have `r` permissions on directory `f/` but not `x` permissions, `ls f` will list the files and directories inside `f`, but `cd f` or `ls f/a` will fail with a permission denied error.
    - If you have `x` permissions on directory `f/` but not `r` permissions, `ls f` will fail with a permission denied error, but `cd f` and `ls f/a` will succeed (assuming `f/a` exists).

    Separating these two permissions seems strange, and some non-Unix-like file systems merge the two into a single permission.
- Setting the contents of a file needs `w` permissions on that file.
- Creating a file is adding something to the directory that contains it, and thus needs `w` permissions on that directory, not that file.
- If I have `r` permissions on a file and `w` permissions on its directory, I can (1) create a new file, (2) copy the file’s contents into the new file, (3) remove the old file, and (4) rename the new file to have the same name as the old file, thus simulating `w` permission on the file. However, this is not quite the same as writing to the file: if some other process had the old file open when we did this 4-step process, that process would not see any of this happen. Removing a file does not invalidate open file handles against it, and the operating system doesn’t actually reclaim the storage space used by the file until all such file handles are closed.
Traditionally, these permissions are ordered [user, group, other] with each being ordered [read, write, execute]. They are sometimes presented as a bitvector in octal (base-8) and sometimes as a letter for present permissions and a hyphen for missing permissions.
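Both `rwxr-xr--` and `0754` refer to the same permission set: the user may read, write, and execute (`rwx` is binary 111, or 7); the group may read and execute (`r-x` is 101, or 5); and others may only read (`r--` is 100, or 4).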
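The correspondence between the two notations is mechanical, as the following Python sketch shows. The function names are my own; also note that Python spells octal literals `0o754` where C and most shells write `0754`.

```python
def mode_to_string(mode):
    """Convert nine permission bits (e.g. 0o754) into letters and hyphens."""
    letters = "rwxrwxrwx"  # [user, group, other], each [read, write, execute]
    return "".join(
        letter if mode & (1 << (8 - position)) else "-"
        for position, letter in enumerate(letters)
    )


def string_to_mode(text):
    """Convert a nine-character string such as 'rwxr-xr--' into a number."""
    mode = 0
    for character in text:
        mode = (mode << 1) | (character != "-")
    return mode


print(mode_to_string(0o754))             # rwxr-xr--
print(oct(string_to_mode("rwxr-xr--")))  # 0o754
```

Python’s standard library offers a similar conversion as `stat.filemode`, which additionally includes a leading character for the file type.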
These file permissions, coupled with various permissions for changing a process’s user and group, create fairly coarse-grained but quite reliable isolation of parts of a file system. For example, the `/usr` directory and the directories and files inside it (including most installed programs and libraries) are typically owned by `root` with permissions `rwxr-xr-x`, meaning only `root` can change this part of the file system (i.e. install or uninstall programs) but everyone can list and run those programs.
Most operating systems that handle users and permissions in any way also flag one or a few user accounts as super users. In Unix-derived OSes the only super-user account is called `root`. In some other OSes any user account can be a super-user account by being marked as an “administrator” account. A super-user account can ignore most or all permissions, doing things that other accounts cannot.
On the course VM, we give each of you a non-`root` account but also give you permissions to run a special program `sudo`, short for “super-user do”, which will launch processes as `root` instead of as you. This gives you near-total control over the system, letting you bypass most permissions, but also requires that you explicitly set out to do so by typing `sudo` in front of commands that you want to violate normal permissions, hopefully preventing you from accidentally doing something you’ll regret.
User accounts and file system permissions can be seen as a way to limit certain system calls, notably those handling files, to provide more isolation between applications than virtual memory alone provides. This idea of adding more constraints to specific system calls to add more isolation between processes in how those calls are handled can be extended in various ways.
One very successful example of modifying just a few system calls to provide much more isolation is the `chroot` system call. The goal of this call is to let specific processes have much more limited file access than the permission system normally allows, limiting all of the process’s activities to just a single directory and its subdirectories. It does this by changing what the root of the file system (i.e. directory `/`) is for that process.
After running `chroot("/tmp/jail")`, a process can only access files in the `"/tmp/jail"` directory tree. If it tries to `fopen("/usr/bin/python3", "r")` it will instead get what all other processes call `"/tmp/jail/usr/bin/python3"`.
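The renaming in this example can be modeled with a few lines of Python. This is only a sketch of the idea, not the kernel’s algorithm: the function name `path_seen_by_others` is invented here, and the model handles only absolute paths and ignores symbolic links. (Python does expose the real call as `os.chroot`, but using it typically requires super-user privileges, so this sketch only computes names.)

```python
import posixpath


def path_seen_by_others(jail, path_in_jail):
    """Model which outside path a jailed process reaches for an absolute path."""
    # Inside the jail, ".." at the root stays at the root, just as "/.." is "/"
    # on an ordinary system, so a jailed process cannot climb out with "..".
    inside = posixpath.normpath("/" + path_in_jail.lstrip("/"))
    outside_root = jail.rstrip("/")
    if inside == "/":
        return outside_root or "/"
    return outside_root + inside


print(path_seen_by_others("/tmp/jail", "/usr/bin/python3"))   # /tmp/jail/usr/bin/python3
print(path_seen_by_others("/tmp/jail", "/../../etc/passwd"))  # /tmp/jail/etc/passwd
```

The second call shows why the jail holds: no sequence of `..` components names anything above the new root.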
Most isolation techniques need some way to make exceptions; `chroot` can do this using hard links, single files that appear in multiple paths within the file system. Other approaches to sharing some parts of a file system within a chroot jail have also been added to more recent directory isolation tools.
While chroot is one of the most popular of these techniques, it is not the only one. The ability to open sockets can be disabled or replaced by some non-socket stub; the amount of CPU time or memory that can be accessed can be limited; and so on. Adding a new isolation option to a system call requires changing the system call code in the operating system kernel, which is a nontrivial process, so the set of isolations is somewhat limited, but as needs for new isolation options are recognized operating systems tend to respond by adding new options.
The word container is used for many different kinds of isolation tools, but the most common meaning is for OS-level virtualization, which basically means a streamlined system for using a combination of all of the chroot-like isolated system call tools to create something that almost looks like an entirely new computer. The best-known container platform as of 2024 is Docker but there are many other container platforms with similar feature sets, each of which is tailored to appeal to a slightly different set of use cases.
Common to many of these container tools are the following:
Among many things that differ between container tools are
As of 2024, I would characterize the container space as mature enough to use in production, but still exploring and expanding, with new options added frequently, and not yet in the “contracting and standardizing” stage.
Kubernetes is a widely deployed tool for managing a large number of containers, possibly running on many different computers, and distributing work between them. Kubernetes also has other features and is out of scope for this class, but the name is often used in Container as a Service advertising and purchasing.
Containers operate by isolating the behavior of specific system calls. Virtual machines operate by isolating every machine code instruction that would engage the operating system in the first place: system calls, failed virtual memory lookups, divide-by-zero exceptions, and so on. Normally, these events each cause a special function called a handler in the operating system to run. In virtualized mode, they are instead routed to a separate virtual OS’s handler, allowing a fully-fledged guest OS to be installed as a virtual machine running inside a host OS.
The host OS usually intervenes by pretending to be the various peripherals that a computer uses to connect with the world: screens and keyboards and mice and networks and disks and so on. The guest OS sees itself as if it were running on its own machine, but when it connects to anything outside the processor and memory, the host OS has the ability to see that connection and handle it however it wishes: forwarding it to actual hardware, translating it into some other kind of operation, ignoring it entirely, etc.
A number of virtual machines can share the same hardware, allowing fuller utilization of hardware resources. A single virtual machine can be transferred from one piece of hardware to another, allowing easier replacement of failing hardware and upgrades to new hardware. For these reasons, if you rent a “computer” or “server” that is not physically located in your building it is likely that what you are actually getting is a virtual machine.
The most complete way to isolate a process from others is to not actually run it at all, instead running a process that pretends to be a computer, parsing the machine code and updating process state to emulate computer state. Emulators let me “run” code compiled for x86-64 on an Arm chip or vice versa, or even “run” code compiled for x86-64 on an x86-64 chip without actually running it, instead running different instructions with side effects, behaviors, and limitations selected by the emulator designer.
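To give a feel for what “parsing the machine code and updating process state” means, here is a toy emulator in Python. The three-byte instruction format, the five opcodes, and the four one-byte registers are all invented for this example and do not correspond to any real ISA; real emulators are far more involved.

```python
# Invented opcodes for a toy ISA; every instruction is three bytes: opcode, a, b.
HALT, SET, ADD, SUB, JNZ = range(5)


def emulate(code, max_steps=10_000):
    """Interpret toy machine code, returning the final register values."""
    registers = [0, 0, 0, 0]  # emulated state lives in ordinary Python data
    pc = 0                    # emulated program counter (a byte offset)
    for _ in range(max_steps):  # a limitation chosen by the emulator designer
        opcode, a, b = code[pc], code[pc + 1], code[pc + 2]
        pc += 3
        if opcode == HALT:
            return registers
        elif opcode == SET:
            registers[a] = b
        elif opcode == ADD:
            registers[a] = (registers[a] + registers[b]) % 256
        elif opcode == SUB:
            registers[a] = (registers[a] - registers[b]) % 256
        elif opcode == JNZ:
            if registers[a] != 0:
                pc = b
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("step limit exceeded")


PROGRAM = bytes([
    SET, 0, 0,    # byte 0:  r0 = 0  (running total)
    SET, 1, 5,    # byte 3:  r1 = 5  (loop counter)
    SET, 2, 3,    # byte 6:  r2 = 3  (amount added each time)
    SET, 3, 1,    # byte 9:  r3 = 1  (constant one)
    ADD, 0, 2,    # byte 12: r0 += r2
    SUB, 1, 3,    # byte 15: r1 -= r3
    JNZ, 1, 12,   # byte 18: if r1 != 0, jump back to byte 12
    HALT, 0, 0,   # byte 21: stop
])

print(emulate(PROGRAM))  # [15, 0, 3, 1]
```

The emulated program never touches the real processor’s registers or memory directly; it can only change the `registers` list, and the step limit means even an infinite loop in the emulated code cannot run forever.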
Emulators can be incorporated into a full virtual machine system as part of an isolation system, but they can also be designed to translate one program into the most similar program for another ISA and run it directly, providing no isolation at all. I include them on this page not because they are always, or even often, used for strong isolation but because some other isolation tools, notably virtual machines, are often configured to use emulation if the ISA of the virtual machine and the ISA of the host machine are not the same.