By Michael Gebis, Sun 02 October 2022, in category Tips
I got a lot of feedback after my first article on Upside-Down Backups made it to Hacker News. There were a lot of good suggestions and it's worth a follow-up article to incorporate some of them.
The conventional wisdom of backups is to follow Peter Krogh's "3-2-1 Rule": three copies on two media types, one stored offsite. This is still good advice. The cloud can serve as the offsite backup. But if you go a little deeper, it pays to think about the various scenarios that could affect data recovery.
Saving your data on a cloud provider protects you against some of these scenarios but exposes you to others. Cloud data might be corrupted either on purpose (malware) or by accident (yours or theirs), and you want to make sure your backups can survive these disasters.
Many HN users pointed this out: relying on rclone sync alone for backups is dangerous. Because sync makes the local copy match the cloud, it propagates corruption and deletions rather than catching them. Valid point! Fortunately, there's a way to protect yourself against this scenario: use a snapshotting filesystem such as Btrfs.
In my case, I'm storing backups on a USB drive using the Btrfs filesystem. After every rclone sync, I just take a Btrfs snapshot. If a file is later corrupted or lost in the cloud, and a future rclone sync propagates that error, all is not lost: earlier snapshots should still have the old data.
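Because a Btrfs snapshot appears as an ordinary directory, getting an old version of a file back is just a copy out of the dated snapshot. Here is a minimal sketch, assuming the /mnt/t/.snapshots/NAME-DATE layout used in this article. The function names, the SNAPSHOT_DIR variable, and the example file path are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: find dated snapshots and copy one file back out of a snapshot.
# Assumptions: SNAPSHOT_DIR, list_snapshots and restore_file are made-up
# names; the snapshot layout is the one used elsewhere in this article.
set -eu

SNAPSHOT_DIR="${SNAPSHOT_DIR:-/mnt/t/.snapshots}"

# List the snapshots of subvolume $1. Glob results come back sorted, and
# the YYYY-MM-DD datestamp sorts chronologically, so oldest comes first.
list_snapshots() {
    for snap in "$SNAPSHOT_DIR/$1"-*; do
        if [ -d "$snap" ]; then
            printf '%s\n' "$snap"
        fi
    done
}

# Copy a file out of a snapshot.
#   $1 = snapshot directory
#   $2 = file path relative to the snapshot
#   $3 = destination path
restore_file() {
    cp -a -- "$1/$2" "$3"
}

# Example (docs/report.odt is a hypothetical file):
#   list_snapshots gdrive
#   restore_file /mnt/t/.snapshots/gdrive-2022-09-30 docs/report.odt /tmp/report.odt
```

Copying to a scratch location such as /tmp, rather than straight back into the synced directory, avoids having the next rclone sync overwrite the recovered file before you can re-upload it.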
Here's an introduction to Btrfs snapshots. And here's my personal cheat-sheet of the commands I need:
# Creating snapshot subvolumes
sudo btrfs subvolume create /mnt/t/gdrive
sudo btrfs subvolume create /mnt/t/gphotos
# Turning on quotas which enables us to see snapshot sizes
# A large snapshot indicates a lot of data changed, and if you
# were not expecting this, you should investigate
# https://dustymabe.com/2013/09/22/btrfs-how-big-are-my-snapshots/
sudo btrfs quota enable /mnt/t/gdrive
sudo btrfs quota enable /mnt/t/gphotos
# Taking a read-only snapshot with embedded datestamp
# (the .snapshots directory must exist before the first snapshot)
today=$(date +"%Y-%m-%d")
sudo mkdir -p /mnt/t/.snapshots
sudo btrfs subvolume snapshot -r /mnt/t/gdrive "/mnt/t/.snapshots/gdrive-$today"
sudo btrfs subvolume snapshot -r /mnt/t/gphotos "/mnt/t/.snapshots/gphotos-$today"
# Viewing quota info
# (snapshot names use the same YYYY-MM-DD datestamp as above)
# https://unix.stackexchange.com/questions/699035/how-to-display-btrfs-snapshot-size
sudo btrfs subvolume show /mnt/t/.snapshots/gdrive-2022-09-30
sudo btrfs subvolume show /mnt/t/.snapshots/gphotos-2022-09-30
Automate the snapshots as above, and your data should be more resilient to corruption issues. I should admit that I have never actually had to dig into any of my snapshots, so fingers crossed it works as it should.
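One way to automate this is a small script that runs the sync and takes the snapshot only if the sync succeeded. The following is a sketch under stated assumptions, not my actual setup: the rclone remote names, the BACKUP_ROOT and DRY_RUN variables, and the function names are all made up for illustration.

```shell
#!/bin/sh
# Sketch: sync one rclone remote into its Btrfs subvolume, then snapshot it.
# Assumptions (not from the article): the remote names passed in by the
# caller, the BACKUP_ROOT and DRY_RUN variables, and all function names.
set -eu

BACKUP_ROOT="${BACKUP_ROOT:-/mnt/t}"  # mount point of the Btrfs drive
DRY_RUN="${DRY_RUN:-0}"               # 1 = print commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Print the snapshot path for a subvolume name ($1) and a datestamp ($2).
snapshot_path() {
    printf '%s/.snapshots/%s-%s\n' "$BACKUP_ROOT" "$1" "$2"
}

# Sync remote $2 into subvolume $1, then take a read-only snapshot.
# If the sync fails, no snapshot is taken.
backup_and_snapshot() {
    run rclone sync "$2" "$BACKUP_ROOT/$1" || return 1
    run sudo mkdir -p "$BACKUP_ROOT/.snapshots"
    run sudo btrfs subvolume snapshot -r "$BACKUP_ROOT/$1" \
        "$(snapshot_path "$1" "$(date +"%Y-%m-%d")")"
}

# Example calls (remote names are placeholders for your own rclone config):
#   backup_and_snapshot gdrive  gdrive:
#   backup_and_snapshot gphotos gphotos:
```

Setting DRY_RUN=1 prints the commands without running them, which is a cheap way to check the paths before scheduling the script with cron or a systemd timer.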
Another user on HN pointed out a limitation I had missed. I guess I never read the rclone page on the Google Photos API limitations, but as of October 2022 there's no way to use the Google Photos API to get your original photos out: you can only get a re-encoded version that may be lower resolution and may have some EXIF data stripped out. It's the same deal with video, and supposedly the re-encoded video is highly compressed in many cases.
Sadly, there's no easy way around this; Google simply doesn't provide any automated API to get the original data, so rclone can't access the original photos either. This may be a deal breaker for some.
As far as I can tell, the only way to get the originals is with Google Takeout. This is not really automatable. The best you can do is use the Takeout web interface to create a "scheduled export" to a cloud provider, which will happen every 2 months for a year. You can then use rclone to download those exports locally. After a year you will have to request another set of "scheduled exports". To be honest, I have not gone through the trouble of doing this.
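If you do set up scheduled exports, the download step could look something like the sketch below. It is untested against a real Takeout export and rests on assumptions: that the exports land in a Google Drive folder reachable as gdrive:Takeout, and that /mnt/t/takeout is where you want them. Both names, and the DRY_RUN switch, are hypothetical.

```shell
#!/bin/sh
# Sketch: download Google Takeout exports with rclone.
# Assumptions: the remote path "gdrive:Takeout" and the destination
# "/mnt/t/takeout" are placeholders; adjust both to your own setup.
set -eu

TAKEOUT_REMOTE="${TAKEOUT_REMOTE:-gdrive:Takeout}"
TAKEOUT_DEST="${TAKEOUT_DEST:-/mnt/t/takeout}"
DRY_RUN="${DRY_RUN:-0}"   # 1 = print the command instead of running it

# Download the exports. "rclone copy" is used instead of "rclone sync"
# because copy never deletes files at the destination, so exports that
# are later removed from the cloud stay in the local copy.
fetch_takeout() {
    set -- rclone copy "$TAKEOUT_REMOTE" "$TAKEOUT_DEST"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example:
#   DRY_RUN=1 fetch_takeout   # show the command
#   fetch_takeout             # actually download
```

The export archives are large, so it is worth checking free space on the backup drive before the first run.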