Postgres Backups to S3 with Docker and systemd

Now that I’ve got my website running nicely on CoreOS with Docker, it is time to consider disaster recovery.

I find it very important to ship my PostgreSQL database backups off my server for safekeeping. Here is the process by which those backups are extracted and pushed into Amazon S3.

Extracting the Backup with Docker

I use my normal PostgreSQL Docker image to run the container that extracts the backup from PostgreSQL. Here is the command:

/usr/bin/docker run --rm --link pg:pg -v /tmp:/tmp postgres pg_dump -O --inserts -b -h pg -U postgres -f /tmp/backup.sql.dump ghost

I link to the running PostgreSQL container and mount the CoreOS host’s /tmp directory so that I can access the resulting SQL dump from CoreOS. Other than that, it is pretty much a standard pg_dump command.
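
Because the dump is plain SQL (thanks to --inserts and pg_dump’s default plain format), restoring it is just a matter of feeding it back through psql. A sketch, assuming an empty ghost database already exists on the target server:

/usr/bin/docker run --rm --link pg:pg -v /tmp:/tmp postgres psql -h pg -U postgres -f /tmp/backup.sql.dump ghost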

Uploading with Ruby in Docker

Now that the backup file has been placed on the CoreOS file system, it is time to push it into S3, which I also do within Docker.

Note: Why even upload to S3 from a Docker container? CoreOS is a read-only system, so I only install things into Docker images and manage them from my home directory; I can’t modify any of the software that is part of the host OS.

Here is the Dockerfile for my upload setup:

FROM ruby
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
COPY Gemfile /usr/src/app/Gemfile
COPY Gemfile.lock /usr/src/app/Gemfile.lock
RUN bundle config build.nokogiri --use-system-libraries
RUN bundle install
COPY backup.rb /usr/src/app/backup.rb
CMD ["ruby", "backup.rb"]

Not much to do here. I did have to build Nokogiri with the system libraries flag, so that config option is set for Bundler before the install. Speaking of Bundler, here is the Gemfile:

source "https://rubygems.org"
gem 'aws-sdk'

Pretty simple! Here is the backup.rb file:

#!/usr/bin/env ruby
require 'time'
require 'aws-sdk'
require 'fileutils'

bucket_name = ''
project_name = ''

ENV['AWS_ACCESS_KEY_ID'] = ''
ENV['AWS_SECRET_ACCESS_KEY'] = ''

time = Time.now.strftime("%Y-%m-%d-%H-%M-%S")
filename = "ghost-pg-backup.#{time}.sql.dump"
filepath = "/tmp/#{filename}"

# Move the backup file from docker run
FileUtils.mv('/tmp/backup.sql.dump', filepath)

# verify file exists and file size is > 0 bytes
unless File.exist?(filepath) && File.size(filepath) > 0
  raise "Database was not backed up"
end

s3 = AWS.s3
bucket = s3.buckets[bucket_name]
object = bucket.objects["#{project_name}/#{filename}"]
object.write(Pathname.new(filepath), {
  :acl => :private,
})

if object.exists?
  FileUtils.rm(filepath)
else
  raise "S3 Object wasn't created"
end

DAYS_30 = 30 * 24 * 60 * 60
# Prune backups older than 30 days so the bucket doesn't grow forever
old_objects = bucket.objects.select do |obj|
  obj.last_modified < Time.now - DAYS_30
end
old_objects.each(&:delete)

This script finds the file that was extracted by the postgres container, renames it based on the current date and time, and then places it in an S3 bucket. Finally, it runs through all the objects in the bucket (the backups) and removes any that are older than 30 days. That way our S3 usage doesn’t grow forever, but we can always get back to any state we were in within the last 30 days.
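
With the Dockerfile, Gemfile, and backup.rb together in one directory, the image only needs to be built once. A sketch, assuming the image is tagged backup to match the run command in the next section:

docker build -t backup .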

Running with systemd

The script to invoke a backup is pretty simple:

/usr/bin/docker run --rm --link pg:pg -v /tmp:/tmp postgres pg_dump -O --inserts -b -h pg -U postgres -f /tmp/backup.sql.dump ghost
/usr/bin/docker run --rm -v /tmp:/tmp backup
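
As a complete file, backup.sh is just those two commands; the shebang and set -e below are my own additions in this sketch, so that a failed dump stops the script before the upload runs:

#!/usr/bin/env sh
set -e

# Dump the ghost database out of the linked pg container
/usr/bin/docker run --rm --link pg:pg -v /tmp:/tmp postgres pg_dump -O --inserts -b -h pg -U postgres -f /tmp/backup.sql.dump ghost

# Upload the dump to S3 and prune backups older than 30 days
/usr/bin/docker run --rm -v /tmp:/tmp backup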

So how do we best run this periodically?

Systemd has a concept of timers. So I created a timer unit to kick off a backup service.

Here is the backup service unit file:

[Unit]
Description=backup oneshot
Requires=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/sh /home/core/website-ghost/backup/backup.sh

This is a oneshot service, meaning that it runs once and exits, and that is the expected behavior. The backup.sh script is the script I mentioned at the top of this section.
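
Before relying on the timer, the service can be kicked off by hand to make sure the whole pipeline works (assuming the unit file above is installed as backup.service):

sudo systemctl start backup.service
sudo journalctl -u backup.service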

And the backup timer unit file:

[Unit]
Description=backup oneshot
Requires=docker.service

[Timer]
OnCalendar=*-*-* 02:00:00

[Install]
WantedBy=timers.target

This timer unit runs daily at 02:00, as specified by the OnCalendar directive. It invokes the matching service by unit name (backup.timer invokes backup.service).
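
Enabling and starting the timer is a one-time step (assuming both unit files have been copied somewhere systemd looks, such as /etc/systemd/system), and list-timers confirms when the next run is scheduled:

sudo systemctl enable backup.timer
sudo systemctl start backup.timer
systemctl list-timers backup.timer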

I use Cyberduck on my Mac to view my S3 buckets.

You may notice there are a few extra backups; I can take a backup whenever I’d like just by running the /home/core/website-ghost/backup/backup.sh script. Once those backups are over 30 days old, they will get swept up by the backup Ruby script just like the timer-initiated ones.

That is how I keep off-server backups of my Ghost blog database. There is still a little bit more to come in this series examining the tech behind my website relaunch. Check back soon!