Docker and Data
I suspect it is easy for someone new to Docker to get a little lost when trying to understand how data is persisted. I know it took me a while to wrap my head around it. I’ll do my best to explain my understanding.
Docker images are saved file system state. They should contain the binaries and support files your container will need when it runs. The image is used to start a container from a known state. Once a container has been started from that image, its state can change. When the container is stopped and started again, it resumes with that state, but if the container is removed and a new container is started from the same original image, it will be a fresh instance, with none of the state from the first container.
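The difference between stopping and removing a container is easy to see with a small experiment. This is only a sketch: the busybox image and the container name scratch are assumptions picked for the demo, not part of my setup.

```shell
#!/usr/bin/env bash
# Sketch only: "busybox" and the container name "scratch" are illustrative assumptions.
demo_container_state() {
  # Start a long-running container and change its file system state.
  docker run -d --name scratch busybox sleep 3600
  docker exec scratch sh -c 'echo hello > /state.txt'

  # Stop and start the same container: the file is still there.
  docker stop -t 1 scratch
  docker start scratch
  docker exec scratch cat /state.txt

  # Remove it and start a new container from the same image: the file is gone,
  # so cat is expected to fail here.
  docker rm -f scratch
  docker run --rm busybox cat /state.txt || echo "fresh container has no /state.txt"
}

# Usage (needs a working Docker daemon): demo_container_state
```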
Docker allows you to manage file system data through volumes. A volume marks a folder that lives outside of the normal union file system operations. When a container is removed, its volumes need to be removed explicitly, or they are left orphaned. Volumes can be shared with other containers so that two containers use the same folder on the file system, and a volume can be mapped to a specific folder on the host operating system, so that you can easily interact with the same folder the container is using.
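Those three uses of volumes map onto a handful of docker flags. The sketch below is illustrative only: the container name web-data, the host path /srv/shared, the mount point /data, and the busybox image are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: "web-data", "/srv/shared", "/data" and "busybox" are hypothetical.
volume_examples() {
  # Declare a volume at /data in a container that exits right away.
  docker run --name web-data -v /data busybox true

  # Share that container's volumes with a second container.
  docker run --rm --volumes-from web-data busybox ls /data

  # Map a host folder onto /data instead.
  docker run --rm -v /srv/shared:/data busybox ls /data

  # Remove the container together with its anonymous volumes,
  # so that nothing is left orphaned.
  docker rm -v web-data
}

# Usage (needs a working Docker daemon): volume_examples
```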
For my purposes, I don’t need any fancy mapping to the host OS, but I do want to make sure that my PostgreSQL data persists through restarts/removals/upgrades of the PostgreSQL server process. So what I did was create a stopped container to be used purely for data. This allows me to link the running PostgreSQL to the exited container, and be able to remove the running PostgreSQL container without losing the pgdata folder, which is in the linked data-only stopped container.
Here is the rundown of bash commands to set this up:
# pull postgres container
docker pull postgres

# create data container using the postgres image, give it a name so you can access it
docker run --name pgdata postgres echo "data only"

# it is now important to never run `docker rm pgdata` (not that it looks like a command you should run)
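If you want to confirm that the data container really carries a volume, docker inspect can list its mounts. A minimal sketch, assuming a Docker version recent enough to expose the Mounts field; the function name show_data_mounts is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: show_data_mounts is a hypothetical helper name.
# Prints the mounts of a container as JSON (defaults to the pgdata container).
show_data_mounts() {
  docker inspect --format '{{ json .Mounts }}' "${1:-pgdata}"
}

# Usage (needs a working Docker daemon): show_data_mounts pgdata
```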
Running it with systemd
Now we can set up the PostgreSQL service with systemd. Here is the systemd service file:
[Unit]
Description=Run pg
After=docker.service
Requires=docker.service

[Service]
Restart=always
RestartSec=30s
# Clean up any leftover container; the leading "-" ignores failures
ExecStartPre=-/usr/bin/docker kill pg
ExecStartPre=-/usr/bin/docker rm pg
ExecStart=/usr/bin/docker run --rm --name pg --volumes-from pgdata postgres
ExecStop=/usr/bin/docker kill pg
ExecStopPost=-/usr/bin/docker rm pg

[Install]
WantedBy=multi-user.target
I’m not going to dive deep into systemd yet, but there are a couple of interesting notes on this service file. The ExecStartPre lines use =- (a leading - on the command), which means that it is OK if that command fails. This is important because on a fresh start of the service there is no existing container to kill or remove, so those commands will fail.
For the actual start line, the --volumes-from pgdata is important so that we use the file system we set up with docker run --name pgdata postgres echo "data only". The --name pg is also important so that we can refer to this running container when linking other containers.
Finally we connect our service file to systemd and then we can interact with our running PostgreSQL instance.
# Link and start postgres service
sudo systemctl enable /home/core/website/coreos/pg.service
sudo systemctl start pg.service

# `docker ps` should show the pg container running

# interactive database console to create a database
docker run -it --rm --link pg:pg postgres psql -h pg -U postgres
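If psql is refused right after the service starts, the server may simply not be ready yet. Here is a rough sketch of a readiness check using pg_isready, which ships with PostgreSQL 9.3 and later; the function name wait_for_pg and the retry count are my own assumptions, not something the service file requires.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_pg and its default of 10 attempts are illustrative assumptions.
# Polls the linked pg container until pg_isready reports success.
wait_for_pg() {
  local tries="${1:-10}"
  while [ "$tries" -gt 0 ]; do
    if docker run --rm --link pg:pg postgres pg_isready -h pg -U postgres; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage (needs the pg service running): wait_for_pg 30 && echo "postgres is up"
```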
Interacting with PostgreSQL
You are now on a psql shell, inside a temporary docker container that is network linked to your running PostgreSQL container! You can then create your database, do any other maintenance you’d like, or just look around.
Back on your host (after quitting the psql shell) keep in mind you can restore a database dump, but the command isn’t obvious, so here it is:
# Restore backup
cat backup.sql.dump | docker run -i --rm --link pg:pg postgres psql -h pg -U postgres database_name
That command puts the dump file on stdin for the psql process that docker runs in this new container. The container runs in interactive mode (-i), but with no tty (no -t), so that psql reads from stdin.
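Going the other direction works the same way. This is a sketch of how a dump could be created with pg_dump from a temporary linked container, assuming the server accepts the connection without a password prompt, as the psql commands above do. The function name dump_database is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: dump_database is a hypothetical helper name.
# Writes a plain SQL dump of the given database to stdout.
dump_database() {
  local db="$1"
  docker run --rm --link pg:pg postgres pg_dump -h pg -U postgres "$db"
}

# Usage (needs the pg service running):
#   dump_database database_name > backup.sql.dump
```

No -i or -t is needed here, since pg_dump only writes to stdout and the redirect happens on the host.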
And those are the basics for the PostgreSQL portion of my website stack. There is a lot more to come in this series examining the tech behind my website relaunch. Check back soon!