There are some terms in the world of containers that are worth understanding on their own.
An image is a particular container setup, typically some Linux operating system with a set of pre-installed programs and files. It captures the state that each container should have when it is first set up.
A container is a lightweight, isolated environment that behaves much like a virtual machine. When first created, its state perfectly matches some image; thereafter, programs that run in the container can change that state in the way programs usually do, adding and removing files and so on.
A runtime is the program that makes containers, loads images onto them, and runs them.
Runtimes are generally optimized for a small number of images, each of which takes significant time to set up, and many containers, each of which can be set up quickly. That design lets us use one container per task we want to isolate, even if the task will only take a second to run.
Docker popularized a way of interacting with containers directly from the command line. Docker’s command-line interface has been copied by other container runtimes such as podman and nerdctl.
Images are created by writing a file that describes the setup steps and then running a tool that follows those steps to produce an image.
The most common form of this describing file is called a Dockerfile. (Several tools require it to be named Dockerfile, with that capitalization and no file extension, which is why the word is often capitalized even when not referring to the filename itself.) A Dockerfile consists of a series of lines, each beginning with one of a pre-defined set of upper-case commands. There are many of these commands, but a common set are:
FROM imagename
Initializes an image by copying another image. There is always exactly one FROM command, and it precedes all other commands.
Starting from a pre-built image not only makes writing the Dockerfile easier, it can also save disk space because most container runtimes can share the copied image by reference instead of duplicating it.
A special form, FROM scratch, copies nothing. It must be followed by local-system-based commands like COPY in order to get anything (even a basic shell) into the image.
RUN shell command —or— RUN ["executable", "arg1", "arg2", …]
Runs a command inside the image.
This step is often used to run installation commands, typically using a package manager made available by the image loaded in the FROM command. For example, the Dockerfile we used to create the image used in your VS Code used RUN to invoke apt-get, the Debian package manager.
Dockerfile RUN vs docker run
Because RUN appears inside a Dockerfile, it runs while the image is being set up; the changes it makes to the image persist and are copied into every subsequent container made from that image.
Once an image is created, commands can be run inside containers made from it using commands like docker run. Those commands affect the state of that container only; the image is not changed.
ENTRYPOINT shell command —or— ENTRYPOINT ["executable", "arg1", "arg2", …]
Sets the command to be run at the start of each new container made from the image. Typically this is used to pick a program like python that has an interactive mode, so that each container starts already in that mode.
CMD shell command —or— CMD ["executable", "arg1", "arg2", …]
Sets the default command to be run at the start of each new container made from the image. This is much like ENTRYPOINT, but it can be overridden by giving a different command to the runtime when creating a container.
COPY local/path /path/inside/image
Copies a file or directory tree from your computer into the image.
USER username
Runs any subsequent commands that execute in the image (such as RUN and CMD) as the given user, which must be a user that exists inside the image.
WORKDIR /path/inside/image
The Dockerfile version of cd.
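Putting several of these commands together, a minimal Dockerfile might look like the following sketch. The base image is the official python image from Docker Hub; the package, paths, and default command are hypothetical stand-ins for whatever your application needs.

```dockerfile
# Start from a pre-built Debian-based Python image
FROM python:3.12

# RUN executes while the image is built; here it installs an extra OS package
RUN apt-get update && apt-get install -y --no-install-recommends curl

# Copy application code from the host into the image
COPY app/ /app/

# The Dockerfile version of cd: later commands and new containers start here
WORKDIR /app

# Default command for each new container; overridable at docker run time
CMD ["python3", "main.py"]
```

Note the use of CMD rather than ENTRYPOINT at the end, so that someone creating a container can still substitute a different command, such as an interactive shell.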
Once a Dockerfile is ready, you can create an image by cd-ing into the directory that contains it and running docker build -t myimagename . (Note that the . at the end of the docker build command is the name of the directory that contains the Dockerfile and is required.)
There are other ways to create images, but Dockerfiles are versatile and common enough to be sufficient for most needs.
Often, created images are uploaded to an image registry; Docker Hub is one of the most popular, but most large-scale cloud providers have their own registries as well. If you try to use an image that is not in your runtime, either when creating a container or in the FROM line of a Dockerfile, the runtime will check its default registry for an image of that name and download it if found.
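For example, on a fresh runtime with no local copy of the official python image, the first use triggers a download from the default registry, while later uses reuse the local copy:

```shell
# First use: python:3.12 is not present locally, so it is pulled from Docker Hub
docker run python:3.12 python3 --version

# Later uses find the image already downloaded and start immediately
docker run python:3.12 python3 --version
```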
Each image is stored internally with a large hexadecimal ID, and generally also with a human-friendly name.
docker images lists all images that are usable by containers.
docker images --all (--all can be abbreviated as -a) lists all images, even unnamed ones; images are typically unnamed because they are, or once were, FROM-style dependencies of named images.
docker rmi imagename removes a single image.
docker image prune removes all dangling images (meaning old versions or dependencies of removed images).
docker system prune --all (--all can be abbreviated as -a) removes all stopped containers and all images that are not used by running containers. Often, this is every image in the runtime.
To create a new container we must specify what image it uses. We do this with docker run imagename.
By default, docker run won't connect the container to any resources other than the CPU. That means it can't display anything (it isn't connected to the terminal), can't use the internet, can't modify any files we can see, and so on. Except for using some CPU cycles, it doesn't seem to do anything.
To make a container be useful, we have to connect it to resources when we run it. There are many ways to do this, but three of the most common are listed here:
-it (-i is short for --interactive and -t is short for --tty) maps the container's stdin, stdout, and stderr to the terminal, allowing us to interact with it.
-p 5000:3456 (-p is short for --publish) causes network connections to port 5000 on the host to be forwarded into the container and remapped to port 3456 inside it.
-v /real/host/path:/inner/path (-v is short for --volume) causes the directory /real/host/path on your disk to show up in the container as /inner/path. This means that any changes the container makes to files in that path will be visible to you even after the container finishes.
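Combining these flags, a typical interactive invocation might look like the following sketch (the image name and paths are hypothetical):

```shell
# Interactive container that is reachable on host port 5000,
# shares the host's current data directory, and cleans up after itself
docker run -it --rm \
  -p 5000:3456 \
  -v "$PWD/data":/inner/data \
  myimagename
```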
Another very common docker run argument is --rm. This doesn't give any new permissions to the container; rather, it tells the runtime to remove the container as soon as it is finished running.
Every use for containers that I've seen runs a program in a container and then is done with that container forever. The --rm argument makes sense for these uses.
Docker also supports multiple runs of the same container. When you run
docker run --name="mycontainer123" imagename
without --rm, its entrypoint command runs, but after it returns the container still exists in a stopped state. I can restart it with docker start mycontainer123 and thereafter run another command inside it using something like
docker exec -it mycontainer123 python3 -m asyncio
which executes the new command in the same container, complete with any state changes that were made by previous runs.
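The whole persistent-container workflow, sketched with a hypothetical image named myimagename, looks something like this:

```shell
# First run: creates the named container and runs its entrypoint
docker run --name=mycontainer123 myimagename

# The container is now stopped but still exists; restart it
docker start mycontainer123

# Run an additional command inside the now-running container;
# it sees any state changes made by earlier runs
docker exec -it mycontainer123 python3 -m asyncio

# Clean up when done, even if the container is still running
docker container rm --force mycontainer123
```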
Although this ability to continue using the same container with all of its state persisting sounds potentially very useful and very much like how most virtual machines work, I’ve yet to encounter a practical use of it except in tutorials and documentation describing how it works.
docker container ls lists the currently running containers.
docker container ls --all (--all can be abbreviated as -a) lists all currently existing containers: both running and finished (but not yet removed).
docker container rm mycontainerid removes a container that is not running.
docker container rm --force mycontainerid (--force can be abbreviated as -f) removes a container whether it is running or not.
docker container prune removes all containers that are not running.
Containers are widely used in large-scale web services where individual containers are created and run without direct human intervention.
Kubernetes is a widely-supported system for running containers on demand, often for the purpose of scaling up resources for highly concurrent tasks with highly uneven demand. It uses images just as described above and spawns containers from them, but those containers are clustered in groups called pods and distributed across multiple machines (which Kubernetes calls nodes), with jobs assigned to individual containers by a system called the control plane.
This can, for example, allow a web app to run on just one container most of the time but scale up to thousands of containers on hundreds of servers during times of peak usage.
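As a small taste of what this looks like in practice, a minimal pod description is a YAML file along these lines (the pod name, image, and port are hypothetical):

```yaml
# A pod: Kubernetes' smallest deployable unit, wrapping one or more containers
apiVersion: v1
kind: Pod
metadata:
  name: my-web-app
spec:
  containers:
    - name: web
      image: myimagename      # pulled from a registry, as described above
      ports:
        - containerPort: 3456 # the port the app listens on inside the container
```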
DevOps is an entire class of practices that focus on automating common tasks that software developers engage in. Two of the most common are continuous integration (CI) and continuous deployment (CD). CI systems watch a code repository for changes and, when one occurs, run automated tasks within the repository, like running tests and generating documentation. CD systems are similar but focus on changing things outside the repository, for example by updating a website when a version of the code that passes CI tests is provided to the code repository. Both generally operate by using containers to do all the actual work, helping ensure that no automated operation can accidentally pollute the code repository.
Kubernetes and most DevOps tools use data files that describe the work to be done. These are often written in YAML, one of the more human-readable data file formats, and generally contain a mix of use-case-specific operations (including triggers like "when code arrives" and actions like "copy the repo into a new directory") and container operations (including what image to use, what resources to give the container, and how long to wait before force-removing it).
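For instance, a CI configuration in the style of GitHub Actions mixes a trigger, a container image, and the automated steps; the job and step details below are hypothetical:

```yaml
# Run the test suite in a container whenever code arrives in the repository
name: tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: python:3.12          # the container image the steps run inside
    steps:
      - uses: actions/checkout@v4 # copy the repo into the container
      - run: python3 -m pytest    # the automated task itself
```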
A full exploration of any of these tools is beyond the scope of this course.