Docker

Introduction

Docker is a company producing an open-source "software container platform", but the word is also used is also used to refer to the platform itself. Docker allows a user to collect all the code needed to run an application, in our case a grid simulation job, into an image. This image will contain everything from the OS-level utilities and libraries, to the lab UPS products, to the collaboration code, to our grid scripts. This image starts with a OS image, usually provided by the OS experts, and uploaded to the Docker repository. The user then adds layers of software to this base image. The computing division has added the art software layer to an OS image and uploaded this new, larger image to the repository. Mu2e can download the art image and add our own layers. This final mu2e image can be passed around and run on almost any platform with x86_64 architecture.

The Docker image can only be run on a machine with Docker software installed and the Docker daemon running. When the user issues a docker command, the daemon starts the image. The user's command can run in the image's environment, exit, and return control back to the user. The user may also run the image in an interactive mode, gain the image's prompt and issue commands like any linux command line. In effect, the user can fire up their own pseudo-virtual machine. The user's process will have a virtual working directory which can be destroyed, or saved, on exit from the container. File systems mounted on the host machine (your home directory, for example) can be mounted in the pseudo-VM and you can read and write permanently to the file system. This pseudo-VM is called a container because it is isolated in software from other copies of image that may be running.

The main point of the image/container model is that the user brings their entire software stack to the CPU, therefore the execution of the application is very reliable and reproducible. This model differs from the model of running several full virtual machines on the host in an important way. If we start a virtual machine, it requires a separate memory space and will load the shared libraries once in each virtual machine, quickly using up the memory. The Docker daemon knows if the users are running multiple containers for one image and shares as much memory as possible, such as the shared libraries, while keeping each container's data space strictly isolated. If you are running on a machine with 64 GB of memory and 64 cores, you may be able to run 64 containers of one Docker image, but you can't run 64 full virtual machines.

Docker Software

The Docker software can be installed on Mac, Windows and Linux. SL6 has an older kernel which cannot implement all the features of Docker and limits the size of a Docker image to 10GB. Windows (>=10), MacOS (>=10.11), and SL7 are all suitable to run Docker. The code needs to be installed by admin/root because it will need a higher level of access to the kernel than a typical user application, and will run a system daemon.

Creating a Docker image

A Docker image is a file, like a sophisticated tarball, which contains all the code needed to run our executables. This image will contain everything from the OS-level utilities and libraries, to the lab UPS products, to the collaboration code, to our grid scripts. Anything you would use to run the job interactively needs to be provided in the image. The image might be up to 15GB in size or it might be as small as 1 GB if it is carefully pruned.

One could start an image from scratch, but it we will always start with an operating system image, such as an SL6 image, provided by experts, and add to that base image. In fact the computing division has provided an OS image with and added layer of the mu art software manifest. We then have to add a mu2e release and some odds and ends such as the mu2egrid product.

To download the base image, you will need and account on hub.docker.com. Searching for FNAL will show the existing images. All hub images are public by default, so can be uploaded and shared from any account.

Theoretically, the cvmfs client could be added to the image and run in the container to mount and read a standard cvmfs repository. In practice this has not been debugged, so we need to include all the software directly in the image.

Running a Docker image

Submitting a Docker Image

When the Docker imagine is run at NERSC, it is actually converted to a shifter image. This conversion adds security and adds access to the aggregated disk system on the cluster (their dcache). We will write out output to these disks so it is saved after the container exits.