Creating a Simple, Cheap, and Automated Backup Solution with Tarsnap

Background:

So I host a variety of small websites on a VPS at Ramnode (affiliate link). I’ve been extremely happy with their service, and their performance per dollar ratio. Previously I had been using DigitalOcean, but their VPS performance lately was a bit lacking compared to other providers (sorry DigitalOcean, I still love ya). As part of my evaluation of a handful of providers I performed extensive benchmarking to determine which VPS provider would be best for my (amateur) needs. It was also an excuse to use Excel again — oh Excel how I miss thee — but I digress.

I’ve been a very happy camper at Ramnode until I realized the weaknesses of having picked OpenVZ Linux containers vs. the KVM virtualization I’ve used in the past. Long story short, with OpenVZ containers the user (me) does not have access to much of the low-level system (including the kernel). This leads to problems with things like iptables logging, syslog, or accessing information about a given partition within your container. This lack of partition information unfortunately means that when you try to back up your data with a traditional backup solution like R1Soft, you (as a lowly user) do not have the right permissions to read and then back up your data within your own container. Not a problem, I said: Ramnode provides customers with regular backups. That was one of the reasons I picked them.

Well, that was the case until recently, when they casually announced (https://clientarea.ramnode.com/announcements.php?id=368) that they had disabled the weekly automated backup system. So that sucks, a lot.

My VPS provider decided to stop backing up my data (even though they sold me a plan saying they would), and due to OpenVZ limitations many of the common automated backup tools simply won’t work.

So I needed to come up with a solution.

Tarsnap

Say hello to Tarsnap:

I’d recommend reading over the content on the Tarsnap website for technical details. The short version: Tarsnap acts as a simple front end for Amazon’s S3 storage service. The Tarsnap CLI handles complex things such as client-side encryption and de-duplication before it sends off your data to be stored on Amazon’s massive (and stable) platform for safekeeping. You don’t need to set up your own Amazon account; you just sign up for Tarsnap.

Review the pricing at Tarsnap closely. It sounds a bit confusing, but you’ll be shocked how little you will pay for your backups. I’m backing up roughly 1.3GB of data, with not much changing day over day. My daily charges for Tarsnap look like this:

2014-08-02 Client->Server bandwidth 372140 bytes 0.000093035000000000
2014-08-02 Daily storage 1488737019 bytes 0.012005943509517804
2014-08-02 Server->Client bandwidth 196191 bytes 0.000049047750000000

That’s right, my daily backups come to roughly $0.01 per day. Not bad. That implies that with the same kind of storage and upload needs, 13GB of backups would only cost you about $0.10 per day. That’s an approximation of course; your data transfer and storage requirements will vary.
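To make the arithmetic explicit, here is a small sketch that adds up the three charges from the usage lines above and projects them over a month. The function name `estimate_cost` and the 30-day month are assumptions for illustration; the three figures are copied from the 2014-08-02 lines.

```shell
# Sum one day's Tarsnap charges (upload, storage, download, in dollars)
# and project the total over an assumed 30-day month.
estimate_cost() {
    awk -v u="$1" -v s="$2" -v d="$3" 'BEGIN {
        daily = u + s + d
        printf "daily=$%.4f monthly=$%.2f\n", daily, daily * 30
    }'
}

# Figures from the 2014-08-02 usage lines.
estimate_cost 0.000093035 0.012005943509517804 0.00004904775
# prints: daily=$0.0121 monthly=$0.36
```

Storage dominates here, so the bill should grow roughly in proportion to how much data you keep archived.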

After you’ve set up Tarsnap (which includes compiling the code), it’s simple to create a single backup of a given folder (or file). You’d just type:

tarsnap -c -f mybackup /home/user1

Simple, but what about creating multiple backups of this folder? If you want to run a daily backup of the same folder (or file) this will get tedious. To make another backup of your /home/user1 folder the next day you’d need to do something like:

tarsnap -c -f mybackup2 /home/user1

The day after that it’s mybackup3, and so on. That’s going to get annoying fast.

We’re going to want to automate that. The other problem: unless you want to store an infinite number of backups (sure, in a perfect world), you’re going to want to clean up old backups after some number of days. Let’s solve that problem and then automate it too.

This post: http://blog.thomasupton.com/2010/12/automated-backups-with-tarsnap/ was definitely an influence for the scripts used here. The code in the mentioned post however did not (or does not) work on Ubuntu so I had to tweak things for my particular needs.

Disclaimer 1: I don’t consider myself even remotely any kind of command line ninja. I get by, but I am no expert. I’m sure my code has problems and I’m sure it could be optimized in any number of ways. It does what I need, and I managed to hack this together using my (limited) skill set.

Disclaimer 2: Your mileage may vary. I’m doing all of this on Ubuntu 14.04. Most Linux distros should work just fine with these scripts. Not sure if they will work on a Mac — give it a try and let me know how it goes in the comments.

Step 1: tarsnap-backup.sh

Let’s look at what is going on here.

The 1st line: this outputs the current date and time followed by some text indicating which file we’re backing up. We’ll pass the $2 variable when we call the script (more on that shortly); it will be the path to the file or folder we want to back up. We write this line to a log file, which is a good idea for anything you plan to automate. Really, why would you not write to a log? Log rotation is simple to set up, but beyond the scope of this article.

The 2nd line: this is the important line. We’re running the actual backup here. The $1 variable will also be passed in when we call the script (more on that in a moment) and will be the name of the backup. Notice that we’re adding the current date to the name of the backup. This gives us an archive name like:

mybackup-2014-08-02

All this gets written to the same log file; again, it’s just good practice. The &>> operator (a bash feature) appends both the standard output and the error output of the command to the log file. Without it, your log file won’t include the output from the command.

The 3rd line: similar to the 1st line. We’re writing out when the backup completed and the path to the data being backed up. You could easily change this to $1 if you’d rather it say “Completed backup of mybackup” instead.
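The script itself isn’t reproduced above, so here is a minimal sketch reconstructed from the three-line description. Treat it as an approximation rather than the original: the log path, the wording of the log messages, and the `backup` function name are assumptions, and it uses `>> file 2>&1`, which does the same job as bash’s `&>>` but also works in plain sh. Passing tarsnap’s exit status back to the caller is an addition that goes beyond the description.

```shell
#!/bin/bash
# tarsnap-backup.sh: sketch reconstructed from the description above.
# Usage: tarsnap-backup.sh <archive-name> <path-to-back-up>

# Assumed log location; the post does not name the file.
LOG="${TARSNAP_LOG:-/var/log/tarsnap-backup.log}"

backup() {
    # Line 1: log when the backup started and what is being backed up.
    echo "$(date) Starting backup of $2" >> "$LOG"
    # Line 2: create an archive named <name>-YYYY-MM-DD, logging all output.
    tarsnap -c -f "$1-$(date +%Y-%m-%d)" "$2" >> "$LOG" 2>&1
    status=$?   # addition: remember whether tarsnap succeeded
    # Line 3: log when the backup finished.
    echo "$(date) Completed backup of $2" >> "$LOG"
    return "$status"
}

# Only run when both arguments were supplied.
if [ "$#" -eq 2 ]; then
    backup "$1" "$2"
fi
```

Returning the status means a failed backup stops a chain of commands joined with &&, instead of being hidden by the final echo.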

Make sure you chmod +x your script whenever you’re creating a script by hand, otherwise it won’t be executable.

To use the script you’d run (Note: adjust for wherever your script is located):

/home/tarsnap-backup.sh mybackup /home/user1

Note that we’re passing the two variables we just discussed. $1 corresponds with “mybackup” (the name of your backup) and $2 corresponds with “/home/user1” (the path to what you’d like to back up).

The script will then take “mybackup” and add the current date, so you get a backup named “mybackup-2014-08-02”.

You can chain together multiple backups by doing something like:

/home/tarsnap-backup.sh mybackup /home/user1 && /home/tarsnap-backup.sh otherbackup /var/log/something/important

Rinse, repeat for however many folders (or files) you’d like to back up.

You can check what you have archived using:

tarsnap --list-archives
mybackup-2014-07-31
otherbackup-2014-07-31
mybackup-2014-08-01
otherbackup-2014-08-01
mybackup-2014-08-02
otherbackup-2014-08-02

You’ll see we now have uniquely named individual backups for each day. Perfect. Now let’s deal with cleaning up old backups we’d like to remove.
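One aside before the clean up step: once several backup names share an account, the full listing gets noisy. Here is a small sketch of a filter that shows the archives for a single backup name, oldest first. The helper name `archives_for` is hypothetical, and it assumes archive names follow the name-YYYY-MM-DD pattern used in this post.

```shell
# Print the archives belonging to one backup name, oldest first.
# Reads the output of `tarsnap --list-archives` on standard input.
archives_for() {
    grep "^$1-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\$" | sort
}

# Usage (needs a configured tarsnap):
#   tarsnap --list-archives | archives_for mybackup
```

Because the date is written year first, a plain alphabetical sort is also a chronological one.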

Step 2: tarsnap-delete.sh

This code is pretty similar to the backup script.

The 1st line: outputs the date and time when the script is run, takes a variable ($1), and calculates the date 7 days in the past. All of this is written to the log file.

The 2nd line: the actual clean up/deletion step. We’re taking the $1 variable we’ll pass in, deleting the archive with that name from 7 days ago, and writing the entire output to the log file (note the &>> again). This means when we call the script with “mybackup” it’ll run the delete for the archive named “mybackup-2014-XX-XX” (whatever the date was 7 days ago). This 7 day value can be altered if you’d prefer to keep, say, 30 days of daily backups.

The 3rd line: we’re just outputting that the clean up completed, with the date and time, and writing this to the log.
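As with the backup script, the listing isn’t reproduced here, so this is a minimal sketch based on the description above rather than the original code. The log path, the message wording, the `KEEP_DAYS` variable, and the `delete_old` function name are assumptions. It relies on GNU date (the version on Ubuntu) for the “7 days ago” calculation; the date command on a Mac uses a different syntax, so this line would need changing there.

```shell
#!/bin/bash
# tarsnap-delete.sh: sketch reconstructed from the description above.
# Usage: tarsnap-delete.sh <archive-name>

LOG="${TARSNAP_LOG:-/var/log/tarsnap-backup.log}"   # assumed log location
KEEP_DAYS=7                                         # e.g. 30 to keep a month

delete_old() {
    # GNU date syntax; works on Ubuntu, not on stock macOS.
    old_date="$(date -d "$KEEP_DAYS days ago" +%Y-%m-%d)"
    # Line 1: log which archive is about to be removed.
    echo "$(date) Deleting archive $1-$old_date" >> "$LOG"
    # Line 2: delete the archive from KEEP_DAYS ago, logging all output.
    tarsnap -d -f "$1-$old_date" >> "$LOG" 2>&1
    status=$?   # addition: remember whether tarsnap succeeded
    # Line 3: log when the clean up finished.
    echo "$(date) Finished clean up of $1-$old_date" >> "$LOG"
    return "$status"
}

# Only run when the archive name was supplied.
if [ "$#" -eq 1 ]; then
    delete_old "$1"
fi
```

Note that this removes exactly one archive per run: the one dated KEEP_DAYS before today.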

To use the script you’d run (Note: adjust for wherever your script is located, and remember to chmod +x the script):

/home/tarsnap-delete.sh mybackup

Cool. So running that will delete the backup from 7 days ago.

Step 3: That’s cool, but let’s automate this

You’ll want to look up how to create/edit a cron job based on your OS of choice. I’m using Ubuntu 14.04 on my VPS, so I just need to type:

crontab -e

and you’ll now be able to add/edit a cron job.

Helpful tip: http://cronchecker.net/ is great if you’re struggling with setting up your cron jobs. It’ll give you the plain English version of all those different time and frequency values in crontab.

For my needs (adjust for your own) I want to run a daily backup and then a weekly clean up job to get rid of old backups.

To do this:

## Daily, 6AM PST backup
0 6 * * * /home/tarsnap-backup.sh mybackup /home/user1 && /home/tarsnap-backup.sh otherbackup /var/log/something/important

## Weekly, 6AM PST clean up of backups older than 7 days
0 6 * * 7 /home/tarsnap-delete.sh mybackup && /home/tarsnap-delete.sh otherbackup

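That’ll work just fine. Every day at 6AM PST the backup script will run, and every Sunday at 6AM PST the delete script will run. One caveat: the delete script only removes the archive dated exactly 7 days earlier, so on a weekly schedule it removes one archive per backup name each week and the other six stay put. If you want a rolling 7 day window, schedule the delete job daily (0 6 * * *) as well.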

I went a step further though. I use a service called Dead Man’s Snitch (affiliate link). With this service I make a call to their servers whenever I run a cron job. If that cron job fails to run for any reason, Dead Man’s Snitch won’t get the expected ping from me and they’ll send me an email (or push notification) alerting me to the problem. This is a great addition for something as important as a backup job.

My cron jobs look like this after adding Dead Man’s Snitch:

## Daily, 6AM PST backup
0 6 * * * m=`time ( /home/tarsnap-backup.sh mybackup /home/user1 && /home/tarsnap-backup.sh otherbackup /var/log/something/important &> /dev/null) 2>&1` && curl -d "m=$m" https://nosnch.in/XXXXXXXX

## Weekly, 6AM PST clean up of backups older than 7 days
0 6 * * 7 m=`time ( /home/tarsnap-delete.sh mybackup && /home/tarsnap-delete.sh otherbackup &> /dev/null) 2>&1` && curl -d "m=$m" https://nosnch.in/XXXXXXXX

Definitely a bit more going on. Let’s take a look.

With m=`time (…)` we’re measuring the time each of our jobs takes to run and assigning that output to the m variable. After the job completes we use curl to send the m variable to Dead Man’s Snitch. On the Dead Man’s Snitch website you’ll get output like this:

2014-08-02 16:31:08 UTC real 1m15.358s user 0m0.878s sys 0m0.083s

This indicates the job completed in roughly 1 minute and 15 seconds. If the job fails to run for whatever reason, Dead Man’s Snitch will alert you.
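The one-liners above lean on bash features (the time keyword around a subshell, and &>), while cron runs commands with /bin/sh unless told otherwise, and on Ubuntu /bin/sh is not bash. If the crontab line misbehaves, one option is to move the logic into a small wrapper script. The sketch below is hypothetical and not part of my setup: the script name, the `run_and_report` function, and the message format are made up for illustration, and the snitch URL is the same placeholder as above.

```shell
#!/bin/bash
# run-and-report.sh: hypothetical wrapper around any command.
# Runs the command, then pings Dead Man's Snitch only if it succeeded.
# Usage: run-and-report.sh <command> [args...]

SNITCH_URL="https://nosnch.in/XXXXXXXX"   # placeholder, use your own snitch

run_and_report() {
    start=$(date +%s)
    # Run the command quietly; on failure skip the ping so the snitch alerts.
    "$@" > /dev/null 2>&1 || return 1
    elapsed=$(( $(date +%s) - start ))
    # Report the elapsed time in the m field, as the cron one-liner does.
    curl -s -d "m=completed in ${elapsed}s" "$SNITCH_URL" > /dev/null
}

# Only run when a command was supplied.
if [ "$#" -gt 0 ]; then
    run_and_report "$@"
fi
```

The crontab entry then shrinks to something like 0 6 * * * /home/run-and-report.sh /home/tarsnap-backup.sh mybackup /home/user1, with no shell-specific syntax left in the crontab itself.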

Wrap up:

So that’s it. At this point I am considering changing the clean up job to keep 30 days of backups instead of 7. You could also get fancy and do things like keep 7 daily backups, keep a month’s worth of Sunday backups to have “weekly” restore points, and then keep 12 backups from the 1st of each month to have “monthly” restore points. Go nuts with it. Please share if you expand on my extremely basic scripts.
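For anyone who wants to try the fancier retention idea, here is a rough sketch of just the decision logic: given an archive’s date and today’s date, should the archive be kept? The function name `should_keep` and the exact cut-offs (31 days for Sundays, 366 days for the 1st of the month) are assumptions chosen to approximate the scheme described above. It needs GNU date, so it fits Ubuntu but not a stock Mac.

```shell
# Decide whether an archive dated $1 (YYYY-MM-DD) should be kept as of $2.
# Returns 0 to keep, 1 to delete. Requires GNU date (date -d).
should_keep() {
    # Whole days between the archive date and "today", computed in UTC.
    age_days=$(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
    weekday=$(date -u -d "$1" +%u)        # 1 = Monday ... 7 = Sunday
    day_of_month=$(date -u -d "$1" +%d)   # 01 ... 31

    # Keep the 7 most recent daily archives.
    [ "$age_days" -lt 7 ] && return 0
    # Keep Sunday archives for about a month.
    [ "$weekday" -eq 7 ] && [ "$age_days" -le 31 ] && return 0
    # Keep archives from the 1st of the month for about a year.
    [ "$day_of_month" = "01" ] && [ "$age_days" -le 366 ] && return 0
    return 1
}

# Example: as of 2014-08-02, a Sunday archive from 2014-07-20 is kept.
should_keep 2014-07-20 2014-08-02 && echo "keep"
```

A clean up script could loop over the archive names, strip the backup name to get the date, and run tarsnap -d -f on every archive this function rejects.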

As mentioned in “disclaimer 1” — I’m sure there’s a better way to do much of this. Hell, I’m sure I’m doing at least 1 dumb thing in this process. I’d be more than happy to hear any feedback, suggestions, and even some trolling on this post.