Move Active Storage From Cloud to Local, and then to Another Cloud
(A heads-up first: this will cause downtime, otherwise consistency between the two S3 services can't be guaranteed.)
The Problem
The problem: I run an in-house web app for our company in China. Our salespeople don't have a good connection to the Internet outside of China, so I can't use AWS or Cloudflare; I can only use S3-like services provided by local companies. Let's call one of them Company T.
Our app uses Company T's S3 service (let's call it S3-T), but our new server at Hetzner can't reliably fetch objects from S3-T: there's a lot of random packet loss when pulling files from China to Germany.
Luckily, I found another company, Company A, which offers an S3-like service out of Japan (let's call it S3-A). It's reasonably fast when I access it from China, and it's also stable when accessed from Hetzner.
So I decided to move the files from S3-T to S3-A.
But before proceeding, I wanted to test whether downloading the files from S3-T and putting them into my "/storage" folder works in my local Rails environment. That's useful both for testing and for potential self-hosting later.
Like DHH said: "Moving off the cloud!" Woo hoo!
Move From Cloud S3 to Local Storage
This is the first time I've tinkered with Active Storage's files. In development, Rails stores files in the "/storage" folder, and inside it are many two-letter subfolders like "hz", "va", "tc", etc. Those names come from the blobs' keys (those random-looking tokens), split into folder and file names so Rails can quickly find what it needs.
But when I look at the files in Company T's web dashboard, there are no subfolders, just flat files with hash-like names such as "zwjhu5t9d0a62w2mpdlzn3tnehki" and "zfhssey9spjsg0z1g9ild3qfnz1k".
That's odd. Is it because Company T uses a different structure from real AWS S3? Not likely: I only have the "aws-sdk-s3" gem installed, not some "t-s3-sdk" gem, so S3-T must speak the same protocol as AWS S3.
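The real reason is that the layout is chosen by Active Storage's service, not by the provider: the Disk service nests each file under two two-character subfolders taken from the start of the blob key, while the S3 service uses the bare key as the object name, so the bucket ends up flat. Roughly, for the same key (the variable names below are mine, just for illustration):

key = "zwjhu5t9d0a62w2mpdlzn3tnehki"

# Disk service: nested under two-character prefixes taken from the key
disk_path = File.join("storage", key[0..1], key[2..3], key)
# => "storage/zw/jh/zwjhu5t9d0a62w2mpdlzn3tnehki"

# S3 service: the key itself is the object name, which is why the bucket looks flat
s3_object_key = key
# => "zwjhu5t9d0a62w2mpdlzn3tnehki"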
To convert the flat files into that nested layout, I found a gist on GitHub: https://gist.github.com/LucasKuhn/f436c4e94991bc44c43b2669ae0057fe. It renames the files downloaded from AWS S3 and creates the intermediate subfolders, so the folder structure matches what Rails' :local Disk service expects.
# script.rb
storage_folder = Rails.root.join('storage')
files = storage_folder.children.select { |file| file.file? && !file.empty? }

files.each do |path_name|
  dir, basename = path_name.split
  file_name = basename.to_s
  sub_folders = dir.join(file_name[0..1], file_name[2..3])
  sub_folders.mkpath                           # Create the subfolders used by Active Storage's Disk service
  path_name.rename(sub_folders.join(basename)) # Move the file into its subfolder
end
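(If you save this as script.rb in the app root, you can also run it in one go with bin/rails runner instead of pasting it into the console:

$ bin/rails runner script.rb

I used the console, as shown below.)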
Looks promising. With some confidence, I downloaded the files from S3-T to my local machine to give this script a try.
To download all the files from S3-T, Company T provides a tool called "coscli". AWS has a similar program, "awscli", that I can brew install, but the two aren't interchangeable. So I downloaded "coscli", configured the keys (the API secret_id and secret_key), and then it's a single simple command:
$ ./coscli cp -r cos://bucket-images-66666 ./backup/
# Succeed: Total num: 1223, size: 404,661,084 Byte (385.91 MB). OK num: 1223(download 1223 objects).
Now I've got all the images of our in-house Rails app on my local machine.
Then I backed up my development database and the development "/storage" folder, changed my storage.yml, and started testing the downloaded folder with the script above.
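For reference, switching development to local storage just means using the stock Disk entry that ships with a new Rails app in config/storage.yml and setting config.active_storage.service = :local in config/environments/development.rb (these are the Rails defaults; your names may differ):

# config/storage.yml
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>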
$ rails console
# run the script above...
$ bin/dev
Refresh the page in the browser...
Nice, the images are displaying!
That was easy! To move Active Storage files from cloud S3 to local storage, just:
- Modify storage.yml.
- Download the files from S3-T.
- Rename them with the script above to create the intermediate subfolders.
Move From Local Storage to Another Cloud S3 Service
Now that I've tested it, I can host Active Storage files on my own server whenever I want. The next step is to move from local storage to another cloud S3 service: for a 12-factor app, I don't want to copy the files around every time I migrate the app, so using an S3 service is very beneficial.
I've got around 1,200 files to upload, and they're not very large, about 380 MB in total, so I uploaded them through S3-A's web dashboard in the browser.
This step is even easier than the last: upload the files to a bucket, modify storage.yml again, and that's it.
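One detail worth noting for S3-compatible providers that aren't AWS: the S3 service entry in storage.yml needs an endpoint pointing at the provider's API, otherwise the aws-sdk-s3 gem will talk to AWS itself. Something along these lines; the endpoint, region, and bucket below are placeholders, not S3-A's real values:

# config/storage.yml
my_s3_service:
  service: S3
  endpoint: https://s3.example-provider.jp   # your provider's S3-compatible endpoint (placeholder)
  region: ap-northeast-1                     # placeholder
  bucket: my-bucket                          # placeholder
  access_key_id: <%= Rails.application.credentials.dig(:s3a, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:s3a, :secret_access_key) %>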
Verifying on my development machine, I can see in the browser's developer console that the URLs of my image assets have changed from "https://s3-t...jpg" to "https://s3-a...jpg", and the images display correctly. After setting the CORS rules correctly, uploading and deleting files also works perfectly, and I can see new images appear in S3-A's dashboard after uploading from my Rails app.
Perfect.
Rename the storage.yml Service Name
Since I migrated Active Storage from S3-T to S3-A, I also want to change the service name in my storage.yml.
S3-T:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:s3t, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:s3t, :secret_access_key) %>

# To

S3-A:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:s3a, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:s3a, :secret_access_key) %>
Renaming S3-T to S3-A spares future me from wondering which S3 service this app is actually using. But the active_storage_blobs table has a "service_name" column, and its values are still "S3-T". I can't just rename the key in storage.yml; I have to update the column too, otherwise existing blobs would point at a service that no longer exists in the config.
# Can use a migration as well.
# But console is more convenient!
ActiveStorage::Blob.where(service_name: 'S3-T').update_all(service_name: 'S3-A')
If you instead want to use the :local Disk service and serve the files from the "/storage" folder, like I did in the "Move From Cloud S3 to Local Storage" section, you can:
ActiveStorage::Blob.where(service_name: 'S3-T').update_all(service_name: 'local')
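Either way, a quick sanity check in the console shows whether any blobs are still pointing at the old service name:

# Should list only the new service name(s), e.g. {"S3-A" => 1223}
ActiveStorage::Blob.group(:service_name).count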
Done!
Now our in-house salespeople can upload files like they used to, and my Hetzner server running on ARM CPUs can fetch the images for resizing and so on. After three days of testing and thinking, I think I've got the best of both worlds.
(BTW, Docker images built on an M1 Mac can run directly on Raspberry Pi and Hetzner ARM machines!)