Kernel mode

Your computer has at least two modes of operation: user mode and kernel mode. All the code you have ever written – and for most of you all the code you will ever write in the rest of your programming career – runs in user mode.

There are some things that user mode code cannot do on its own, including:

Each of these situations is handled by the processor pausing user code, jumping to kernel code in kernel mode, and letting the kernel decide what to do from there. The kernel code, which is delivered as a key part of an operating system, has three common strategies it might follow:

1 System calls: explicit requests to the kernel

Q: How do you write data to a file or the terminal?
A: By calling a function like printf or fwrite.


Q: How do those functions work?
A: By doing some up-front work which you could code yourself if you wanted, and then calling write.


Q: How does write work?
A: By using a special assembly instruction called a system call (for strange historical reasons, a system call is sometimes called a trap or user interrupt instead), which freezes your code, switches to kernel mode, and lets the kernel code decide what to do based on the values your code left in registers before it used that instruction.


Each operating system has its own set of system calls. Each ISA has its own system call instruction. To keep the code you write from depending too much on those details, the C library on Unix-like systems includes a set of thin wrapper functions around these system calls, such as open, read, write, and close for working with files, or sbrk for adjusting how much heap memory your code has access to. (These wrappers are specified by POSIX rather than by the ISO C standard itself.) Unless you are porting C to a new operating system or hardware, you can treat these functions as being system calls themselves.

1.1 File descriptors

C was developed alongside the operating system Unix; Unix became a model for many parts of most operating systems today. One very successful part of Unix’s design was unifying many different kernel-moderated activities in a common framework known as file descriptors.

From the perspective of user code, a file descriptor is just a non-negative integer. Conceptually, it is an index into an array of connections that the kernel manages for your process; those connections might include attached terminals, open files, network connections, or any other communication- or storage-oriented interface.

In general, file descriptors are assumed to be single-pass and byte-oriented: once you read an array of bytes, they have been consumed, and the next read returns the bytes after them; similarly, once you write some bytes, they are written, and the next write supplies the bytes that follow them. This is a great match for things like networks, which transmit information in series; it is also a common way to work with files and other navigable array-of-bytes objects.

For things like files that keep their data around and can be asked for the same information over and over, file descriptors have an additional notion of a file offset: the number of bytes into the file at which the next read or write will begin. Every read and write advances the file offset by the number of bytes read or written. The file offset can also be queried and directly modified using lseek.

Communicating over file descriptors often involves a significant wait time, as networks, disks, and so on tend to run much more slowly than processor code. Because of this, the system calls that use them tend to be blocking, meaning they suspend your program and let the computer do other things until the data you requested or provided is delivered. (In 2023 I wrote a short story about life as a process interacting with others using blocking system calls, hoping it would help the reader conceptualize how this works.)

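By default, every C program starts with three open file descriptors: 0 is standard input (stdin), typically reading from the keyboard; 1 is standard output (stdout), typically writing to the terminal; and 2 is standard error (stderr), also typically the terminal but reserved for error messages and diagnostics.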

In C++, these three are wrapped as std::cin, std::cout, and std::cerr (the c stands for "character").

Pipes

Manipulating the three standard file descriptors is one of the sources of power in command-line programming. One of the most versatile command-line tools built on file descriptors is the pipe operator |. Running ./a | ./b starts both ./a and ./b and connects ./a’s stdout to ./b’s stdin. Many command-line tools are specifically designed to read from stdin and write to stdout, which makes combining them with pipes quite useful.

1.2 FILE *

The recommended way of working with files in C is not to use open/read/write/close/lseek directly, but rather to use various functions that work with FILE * like fopen/fread/fwrite/fclose/fseek/ftell; or nicer wrappers around them like fprintf/fscanf/fputs, or versions that default to FILE *stdin or FILE *stdout like printf/scanf/puts.

FILE * is what is known as an opaque pointer, meaning it is a pointer to some type but the library isn’t telling you what that type looks like. Internally, it does something called buffered I/O: it has a file descriptor, but also a largish buffer (an array of bytes). When you ask for a small amount of data, it instead loads a large amount of data into the buffer with a single system call, then handles your next several requests from that buffer without needing more system calls until you use up the entire buffer. Writing works the same way in reverse: small writes collect in the buffer until it fills or is flushed, and are then handed to the kernel with one system call. This can make your code much faster in practice because it talks to the operating system less often.

2 Invisible context switches

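Sometimes your code switches into kernel mode without your asking it to do so. Common reasons include faults, where your code did something the processor cannot complete without the kernel's help (such as accessing memory that is not currently mapped), and interrupts, where hardware outside your code, such as a timer, disk, or network card, signals that it needs the kernel's attention.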

In general, faults and interrupts will be handled without your code even knowing they happened. They might result in a context switch, meaning your code will be paused and another program run for a while before your code is resumed. They might result in new resources being allocated or other kernel-managed modifications of the system to meet your request. They might result in data becoming available on a file descriptor that some code is blocked on, letting that code resume. And in some cases they might result in the kernel passing a failure back to user code using a signal.

A common kind of fault is called a page fault; it occurs when your program accesses a page of memory that the kernel has not yet set up for it, such as a page it has never accessed before. Handling page faults is a key part of how operating systems control how much memory each application receives.

It is common for operating systems to set up a timer that interrupts your code at least 100 times per second so they can check if there are other programs that should run instead and do other bookkeeping.