Using Containers

1 Vocabulary

There are some terms in the world of containers that are worth understanding on their own.

An image is a particular container setup, typically some Linux operating system with a set of pre-installed programs and files. It captures the state that each container should have when it is first set up.

A container is an isolated environment that behaves much like a lightweight virtual machine, although it shares the host’s operating system kernel rather than running its own. When first created, it has a state that perfectly matches some image; thereafter, programs that run in the container can change that state in the way that programs usually do, adding and removing files and so on.

A runtime is the program that makes containers, loads images onto them, and runs them.

Runtimes are generally optimized for a small number of images, each of which takes significant time to set up, and many containers, each of which can be set up quickly. That design lets us use one container per task we want to isolate, even if the task will only take a second to run.

2 Manual container management

Docker popularized a way of interacting with containers directly from the command line. Docker’s command-line interface has been copied by other container runtimes such as podman and nerdctl.

2.1 Creating images

Images are built by writing a file that describes the setup steps and then running a tool that follows those steps to create an image.

The most common form of the describing file is called a Dockerfile. (Several tools require it to be named Dockerfile with that capitalization and no file extension, which is why it is often capitalized even when not referring to the filename itself.) Each Dockerfile consists of a series of lines, each beginning with one of a pre-defined set of upper-case commands. There are many of these commands, but a common set are:
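As a minimal sketch, the shell snippet below writes a small Dockerfile into a scratch directory. It uses four widely used commands (FROM, RUN, COPY, and CMD); the base image python:3.12-slim, the requests package, and the app.py script are illustrative assumptions rather than part of any particular project.

```shell
# Work in a throwaway directory so nothing in the current project is touched.
workdir=$(mktemp -d)

# A tiny (hypothetical) program for the image to run.
printf 'print("hello from a container")\n' > "$workdir/app.py"

# Write the Dockerfile; quoting 'EOF' stops the shell from expanding anything inside.
cat > "$workdir/Dockerfile" <<'EOF'
# FROM picks the image to start from
FROM python:3.12-slim
# RUN executes a command while the image is being built
RUN pip install --no-cache-dir requests
# COPY adds files from the build directory to the image
COPY app.py /app/app.py
# CMD sets the default command for containers made from this image
CMD ["python3", "/app/app.py"]
EOF

# Show the result.
cat "$workdir/Dockerfile"
```

Lines starting with # inside a Dockerfile are comments, so the explanations travel with the file.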

Once a Dockerfile is ready, you can create an image by cding into the directory that contains it and running docker build -t myimagename . (Note that the . at the end of the docker build command is the name of the directory that contains the Dockerfile and is required.)
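A hedged sketch of the build step: the snippet creates a two-line Dockerfile in a scratch directory and builds it only if a docker command is installed. The base image alpine and the name myimagename are placeholders.

```shell
# Scratch directory holding a minimal Dockerfile (alpine is just a small example base image).
builddir=$(mktemp -d)
printf 'FROM alpine\nCMD ["echo", "built and run"]\n' > "$builddir/Dockerfile"

image=myimagename

if command -v docker >/dev/null 2>&1; then
  # Run the build from inside the directory; the trailing . names that directory.
  (cd "$builddir" && docker build -t "$image" .)
else
  echo "docker not found; skipping the build"
fi
```

The -t option gives the image its human-friendly name; without it the image can only be referred to by its ID.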

There are other ways to create images, but Dockerfiles are versatile and common enough to be sufficient for most needs.

Created images are often uploaded to an image registry; Docker Hub is one of the most popular, but most large-scale cloud providers have their own registries as well. If you try to use an image you don’t have in your runtime, either when creating a container or in the FROM line of a Dockerfile, then the runtime will check its default registry for an image of that name and download it if found.
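For illustration, pulling from and pushing to a registry might look like the following. The image python:3.12-slim is an example of an image hosted on Docker Hub; myaccount/myimagename is a hypothetical repository name, and the push assumes you have already authenticated with docker login.

```shell
base=python:3.12-slim                    # image to fetch from the default registry
local_image=myimagename                  # image built earlier (placeholder name)
remote_image=myaccount/myimagename:1.0   # hypothetical account/repository:tag

if command -v docker >/dev/null 2>&1; then
  # Download explicitly; docker run and FROM would do this automatically when needed.
  docker pull "$base"

  # Give the local image a registry-qualified name, then upload it.
  docker tag "$local_image" "$remote_image"
  docker push "$remote_image"
else
  echo "docker not found; skipping registry commands"
fi
```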

2.2 Listing and deleting images

Each image is stored internally with a large hexadecimal ID, and generally also with a human-friendly name.
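A short sketch of the usual commands for inspecting and removing images, assuming the Docker command-line interface and a placeholder image named myimagename:

```shell
image=myimagename   # placeholder image name

if command -v docker >/dev/null 2>&1; then
  # List local images with their names (repository and tag) and abbreviated IDs.
  docker image ls

  # Show the full-length IDs instead of the abbreviated ones.
  docker image ls --no-trunc

  # Delete an image by name (an ID works too); this fails if a container still uses it.
  docker image rm "$image"
else
  echo "docker not found; skipping image commands"
fi
```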

2.3 Creating containers

To create a new container we must specify what image it uses. We do this with docker run imagename

By default, docker run connects the container to very few resources. It can print output, but it can’t read keyboard input (it has no interactive terminal), can’t be reached over the network (none of its ports are published), can’t modify any files we can see, and so on. Apart from printing and using some CPU cycles, it doesn’t seem to do anything.

To make a container useful, we have to connect it to resources when we run it. There are many ways to do this, but three of the most common are listed here:

Another very common docker run argument is --rm. This doesn’t give any new permissions to the container; rather, it tells the runtime to remove the container as soon as it is finished running.
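A sketch of how such options combine with --rm. The three resource-related flags shown here (-v, -p, and -it) are widely used ones chosen for illustration; the images alpine and nginx, the container name demo-web, the host directory, and the port numbers are all assumptions.

```shell
hostdir=$(mktemp -d)     # a host directory to share with the container
echo "seen from the host" > "$hostdir/note.txt"

if command -v docker >/dev/null 2>&1; then
  # -v host_path:container_path mounts a host directory inside the container.
  docker run --rm -v "$hostdir:/data" alpine cat /data/note.txt

  # -p host_port:container_port publishes a container port on the host.
  # -d runs the container in the background so the script can continue.
  docker run --rm -d -p 8080:80 --name demo-web nginx
  docker stop demo-web     # with --rm, stopping also removes the container

  # -i keeps stdin open and -t allocates a terminal, giving an interactive shell.
  # Left commented out because it waits for keyboard input:
  # docker run --rm -it alpine sh
else
  echo "docker not found; skipping run examples"
fi
```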

Multiple-run containers

Every use for containers that I’ve seen runs a program in a container and then is done with that container forever. The --rm argument makes sense for these uses.

Docker also supports multiple runs of the same container. When you

docker run --name="mycontainer123" imagename

without --rm, its entrypoint command runs, but after it returns the container still exists. I can thereafter run another command in the same container using something like

docker exec -it mycontainer123 python3 -m asyncio

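and the new command sees any state changes that were made by previous runs in that container. Note that docker exec only works while the container is running; if the entrypoint has already exited, the container must first be restarted with docker start mycontainer123.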

Although this ability to continue using the same container with all of its state persisting sounds potentially very useful and very much like how most virtual machines work, I’ve yet to encounter a practical use of it except in tutorials and documentation describing how it works.
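The multiple-run workflow can be sketched as follows. The image alpine and the sleep 300 entrypoint are assumptions, chosen so that the container keeps running long enough for docker exec to reach it.

```shell
name=mycontainer123

if command -v docker >/dev/null 2>&1; then
  # Start a named container in the background (-d) whose entrypoint keeps running,
  # so that docker exec has a running container to work with.
  docker run -d --name "$name" alpine sleep 300

  # First extra command: change the container's state.
  docker exec "$name" touch /tmp/marker

  # Second extra command: the change made by the first is still visible.
  docker exec "$name" ls /tmp

  # Stopping does not delete the container or its files; start it again and check.
  docker stop -t 1 "$name"
  docker start "$name"
  docker exec "$name" ls /tmp

  # Clean up: -f removes the container even though it is running.
  docker rm -f "$name"
else
  echo "docker not found; skipping multiple-run example"
fi
```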

2.4 Listing and deleting containers
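A brief sketch of the usual commands for this, assuming the Docker command-line interface and a placeholder container named mycontainer123:

```shell
name=mycontainer123   # placeholder container name

if command -v docker >/dev/null 2>&1; then
  # List running containers only.
  docker ps

  # List all containers, including ones that have exited.
  docker ps -a

  # Remove a stopped container by name or ID; add -f to remove one that is still running.
  docker rm "$name"

  # Remove every stopped container at once (-f skips the confirmation prompt).
  # Left commented out because it deletes more than this example created:
  # docker container prune -f
else
  echo "docker not found; skipping container commands"
fi
```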

3 Scaling and DevOps

Containers are widely used in large-scale web services where individual containers are created and run without direct human intervention.

Kubernetes is a widely supported system for running containers on demand, often for the purpose of scaling up resources for highly concurrent tasks with highly uneven demand. It uses images just as described above, and spawns containers from them, but those containers are clustered in groups called pods and distributed across multiple machines (which Kubernetes calls nodes), with jobs being assigned to individual containers by a system called the control plane. This can, for example, allow a web app to run on just one container most of the time but scale up to thousands of containers on hundreds of servers during times of peak usage.
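To give a rough feel for this, the sketch below writes a small Kubernetes Deployment description and hands it to kubectl if that tool is installed and connected to a cluster. The names webapp and web, the image myaccount/myimagename:1.0, and the resource limits are all hypothetical.

```shell
k8sdir=$(mktemp -d)
manifest="$k8sdir/deployment.yaml"

cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3                 # how many pods to keep running
  selector:
    matchLabels:
      app: webapp
  template:                   # the pod each replica is created from
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: web
          image: myaccount/myimagename:1.0
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
EOF

if command -v kubectl >/dev/null 2>&1; then
  # Ask the control plane to create (or update) the deployment.
  kubectl apply -f "$manifest"

  # Scale up by hand; real systems usually automate this based on load.
  kubectl scale deployment/webapp --replicas=10
else
  echo "kubectl not found; manifest written to $manifest"
fi
```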

DevOps is an entire class of practices that focus on automating common tasks that software developers engage in. Two of the most common are continuous integration (CI) and continuous deployment (CD). CI systems watch a code repository for changes and, when one occurs, run automated tasks within the repository, like running tests and generating documentation. CD systems are similar but focus on changing things outside the repository, for example by updating a website when a version of the code that passes the CI tests is pushed to the code repository. Both generally operate by using containers to do all the actual work, helping ensure that no automated operation can accidentally pollute the code repository.

Kubernetes and most DevOps tools use data files that describe the work to be done. These are often written in YAML, one of the more human-readable data file formats, and generally have a mix of use-case-specific operations (including triggers like “when code arrives” and actions like “copy the repo into a new directory”) and container operations (including what image to use, what resources to give the container, and how long to wait before force-removing it).
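As one possible illustration, the snippet below writes a small CI description in the format used by GitHub Actions, chosen here only as an example of such a tool. The workflow name, the image python:3.12-slim, the ten-minute limit, and the test command are assumptions.

```shell
repo=$(mktemp -d)                     # stands in for a code repository
mkdir -p "$repo/.github/workflows"
workflow="$repo/.github/workflows/tests.yml"

cat > "$workflow" <<'EOF'
name: tests
on: push                        # trigger: run whenever code arrives
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10         # cancel the job if it runs longer than this
    container:
      image: python:3.12-slim   # image for the container the steps run in
    steps:
      - uses: actions/checkout@v4          # copy the repo into the job's workspace
      - run: python3 -m unittest discover  # run the project's tests
EOF

cat "$workflow"
```

The file mixes the two kinds of entries described above: the on and steps entries are specific to the CI tool, while the container and timeout-minutes entries describe the container that does the work.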

A full exploration of any of these tools is beyond the scope of this course.