A Ruby Script to Deploy Rails Docker App to Hetzner

When new code is committed, I need to build a new Docker image and repeat the process of:

  1. docker save the image locally and docker load it on the server over ssh
  2. Stop old Rails app Docker container
  3. Start new Rails app Docker container
  4. Remove old Rails app Docker image
  5. Tag new Rails app Docker image

The Problem

This whole process is kind of boring, and I want to automate it, the way we used to have "kamal deploy" and "git push dokku main".

The best outcome would be that I can just run "ruby deploy.rb", go get a cup of tea, and when I come back the new codebase is live under my domain name!

A Ruby Script for Automated Deploy

Once I had set up the server and deployed the first image manually, I wrote a Ruby script to automate the subsequent deploys.

First I need to set some project-specific variables, such as environment variables, the app name, and the Docker network and container names.

require 'colorize'

app_name = "my-app" # specify your app name
rails_master_key = File.read('config/master.key').strip # strip any trailing newline so the key interpolates cleanly into shell commands
db_password = "dumbpass" # same as the postgresql container's setting
db_host_or_ip = "app-pg" # the container name of the pg db, so Rails' database.yml knows where to connect
docker_subnet = "app-net" # the internal Docker network shared by the rails, db, frp and redis containers; can also be "bridge" or "host"; create it once with docker network create
server_user = "root" # the user on the remote server (a pi, a Hetzner VPS, etc.)
server_ip = "docker-app-ip" # the IP of the server
use_bzip = false # whether to compress the docker image while sending it to the server; zip it if the connection is slow, skip it if the server decompresses slowly or traffic is free
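
The Docker network above is a one-time setup on the server rather than part of the deploy script. For reference, assuming the variable values I just defined, creating it looks something like this:

# One-time setup, not part of deploy.rb: create the shared network on the server
# that the Rails, Postgres, Redis and FRP containers all join.
system("ssh #{server_user}@#{server_ip} docker network create #{docker_subnet}")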

Then I wrote two procs to run commands on my local development machine and on the remote server (the VPS). I also record the time elapsed for each task, so I know where errors come from and how long every step takes.

deploy_start_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) # to time how many seconds the whole deploy process takes

# Run a shell command on remote machine
run_remote = Proc.new do |cmd|
  start_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  puts "Start CMD: ssh #{server_user}@#{server_ip} #{cmd} 2>&1".colorize(:blue)  # 2>&1 will show the stderr in return value
  out = `ssh #{server_user}@#{server_ip} #{cmd} 2>&1`
  finish_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  puts "Finished CMD: #{(finish_at - start_at).round(1)} seconds.\n\n".colorize(:green)
  out
end

# Run a shell command on local machine
run_local = Proc.new do |cmd| 
  start_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  puts "Start CMD: #{cmd} 2>&1".colorize(:blue)
  out = `#{cmd} 2>&1`
  finish_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  puts "Finished CMD: #{(finish_at - start_at).round(1)} seconds.\n\n".colorize(:green)
  out
end

Then I wrote a proc for the health check. The idea is borrowed from Kamal, but I didn't look into Kamal's implementation; I just use a cURL Docker image running in the same Docker network to "ping" the "/up" route.

# Healthcheck that the Rails app is running by inspecting the container status and then GETting /up.
test_app_running = Proc.new do |container_name|
  healthy = false
  10.times do |i|
    puts "#{server_ip}: Healthcheck new container status - Round: #{i}"
    out = run_remote.call("docker inspect --format \"{{.State.Running}}\" #{container_name}")
    if out.include? "true"
      out = run_remote.call("docker run --rm --network #{docker_subnet} curlimages/curl --silent -LI -o /dev/null #{container_name}:3000/up -w '%{http_code}\n'")
      if out.include? "200"
        puts "#{server_ip}: Container is ready."
        puts "#{server_ip}: Success!"
        healthy = true
        break
      else
        puts "#{server_ip}: Container is up but Rails server is not ready, waiting 3 seconds..."
        sleep(3)
      end
    else
      puts "#{server_ip}: Container is not ready, waiting 1 second..."
      sleep(1)
    end
  end
  raise "#{server_ip}: Container is not healthy after 10 checks. Aborting. Manual fix is required." unless healthy
end
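
The check relies on the Rails health check endpoint being routed. On Rails 7.1+ the generated config/routes.rb already wires this up; for reference, the relevant line looks like this:

# config/routes.rb: maps GET /up to the built-in health controller (generated by default on Rails 7.1+)
Rails.application.routes.draw do
  get "up" => "rails/health#show", as: :rails_health_check
end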

Those are the building blocks. Now I need to run some commands on my Mac and on the Hetzner VPS.

The Deploy Process

Phase 1 builds the Docker image on my development machine, using the app-specific variables defined above. If the build succeeds, it pipes docker save through ssh into docker load on the Hetzner server, so the image arrives there tagged "new_build".

# Phase 1: Build Docker Image for Rails App
puts "local: Building Docker image for #{app_name}."
build_image_cmd = "docker build -t #{app_name}:new_build ."
out = run_local.call(build_image_cmd)
if out.include? "DONE"
  puts "local: Docker Build Successful."
elsif out.include? "ERROR"
  puts out
  raise "local: Docker Build Error."
else
  raise "local: Docker build process status unknown, check manually."
end

puts "local+remote: Compressing Image and Loading it in the Remote Server"
out = run_local.call("docker save #{app_name}:new_build #{use_bzip ? "| bzip2 |" : "|"} ssh #{server_user}@#{server_ip} docker load")

Phase 2 tests whether the new image can run on the server. At this point, the old and new Rails app containers run side by side.

# Phase 2: Test the new Rails app image on the remote server. If it starts successfully, stop it and re-run it later.
# Docker container names (and port mappings) can't conflict, and FRP points at a fixed app container name in frps.ini
# that can't be changed dynamically. So the new container has to be stopped and re-run later under the old name.
puts "#{server_ip}: Starting Rails app with new built image"
out = run_remote.call("docker run --name #{app_name}-app-new-build -e RAILS_MASTER_KEY=#{rails_master_key} -e POSTGRES_PASSWORD=#{db_password} -e DB_HOST=#{db_host_or_ip} --network #{docker_subnet} -d #{app_name}:new_build")
puts "#{server_ip}: #{out}"

puts "Test if app can start"
test_app_running.call("#{app_name}-app-new-build")

puts "#{server_ip}: Stopping new container, will restart it later after stopping the old one."
out = run_remote.call("docker container stop #{app_name}-app-new-build")

puts "#{server_ip}: Removing new build container"
out = run_remote.call("docker container rm #{app_name}-app-new-build")

Since I can't find an easy way to simply point the reverse proxy at the new container, I have to stop both the old and new containers and then start the new one again, which causes a few seconds of downtime. I haven't found a good solution for that yet. Kamal does this with Traefik, but I don't use Traefik, and I heard Kamal 2 won't use Traefik either, so I didn't take the time to dig into their implementation. The script works for me because I don't need zero-downtime deploys.
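
That said, if the gap ever matters, one way to shrink it is to chain the swap into a single ssh invocation instead of several separate round trips, so the downtime is only the container restart itself. A rough sketch of that variant (not what the script below actually does), reusing the run_remote proc:

# Hypothetical variant: stop, remove and restart in one remote shell invocation.
# The surrounding single quotes make the whole chain run on the server, not locally.
swap_cmd = [
  "docker container stop #{app_name}-app",
  "docker container rm #{app_name}-app",
  "docker run --name #{app_name}-app -e RAILS_MASTER_KEY=#{rails_master_key} " \
  "-e POSTGRES_PASSWORD=#{db_password} -e DB_HOST=#{db_host_or_ip} " \
  "--network #{docker_subnet} -d #{app_name}:new_build"
].join(" && ")
run_remote.call("'#{swap_cmd}'")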

Phase 3 stops the old container, does some cleanup (removing the old container and image, and re-tagging the new image from "new_build" to "latest"), and then starts the new container.

# Phase 3: Stop old container, start new container for the Rails app
puts "#{server_ip}: Stopping old Rails container"
out = run_remote.call("docker container stop #{app_name}-app")

puts "#{server_ip}: Removing old Rails container"
out = run_remote.call("docker container rm #{app_name}-app")

puts "#{server_ip}: Removing old Rails Image"
out = run_remote.call("docker image rm #{app_name}")

puts "#{server_ip}: Renaming new rails image tag from 'new_build' to 'latest'"
out = run_remote.call("docker image tag #{app_name}:new_build #{app_name}:latest")
out = run_remote.call("docker image rm #{app_name}:new_build")

puts "#{server_ip}: Starting Rails app..."
out = run_remote.call("docker run --name #{app_name}-app -e RAILS_MASTER_KEY=#{rails_master_key} -e POSTGRES_PASSWORD=#{db_password} -e DB_HOST=#{db_host_or_ip} --network #{docker_subnet} -d #{app_name}")

The last phase tests that the new Rails app container is working correctly. Once it's done, I can open a shell in the container with docker exec -it and run any pending db migrations manually.
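
If I wanted the script to handle migrations too, a minimal sketch, assuming the image ships the standard Rails binstubs (as the default Rails Dockerfile does), would be:

# Optional: run pending migrations inside the freshly started container.
# Assumes bin/rails is available in the image.
puts run_remote.call("docker exec #{app_name}-app bin/rails db:migrate")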

If I'm confident, I can also re-tag the image on my development machine from "new_build" to "latest" and remove the old image. The risk is that if the new container turns out not to work properly, I can't roll back to the old version quickly. So whether to remove the old image depends on how critical uptime is for the project; a rollback-friendly variant of the cleanup is sketched after the code below.

# Phase 4: Test new container is ready.
puts "Test if app is running"
test_app_running.call("#{app_name}-app")

puts "Deployment finished in #{(Process.clock_gettime(Process::CLOCK_MONOTONIC) - deploy_start_at).round(1)} seconds."

# Optional: Clean up local docker images
puts "local: Retag new build Docker image to latest. Remove old image file."
old_image_id = run_local.call("docker images -q #{app_name}:latest").strip
out = run_local.call("docker image tag #{app_name}:new_build #{app_name}:latest")
out = run_local.call("docker image rm #{old_image_id}")
out = run_local.call("docker image rm #{app_name}:new_build")

And that's it! With this deploy script, the Docker build takes around 30-60 seconds when there are no significant changes to the codebase, and the whole process finishes within 2 minutes, including compressing the image with bzip2, transferring about 180 MB to the server at 5 MB/s, and decompressing it there.

Cheers!