My last post was an overview of our home backup system. Part of that system includes storing a subset of files online with Amazon S3. Here are some of the details.

The Cost

You'll be hard pressed to find an online backup solution that doesn't cost something. The difference with Amazon S3 is that the costs grow unbounded. This can be a scary thought: the more data you have, the more it costs. However, I'm willing to make this tradeoff for a few reasons:

  • Trust - Amazon is a trusted company that's been around for ages (in Internet time) and has always put users first.
  • Flexibility - S3 has a simple programming interface with a wide variety of frontends for interacting with the data. There aren't any limits on what kind of data I can store or where that data comes from. (Some online backup services limit you to a single computer, don't allow network attached storage, or have size limits.)
  • Cost - While the cost does grow unbounded, it grows slowly. 100 gigs of data would be $15/month, and I won't have close to that much data for some time. Storing 30 gigs puts you at about $5/month, which is similar to many of the other online backup services. And hopefully over the years, as the cost of storage decreases, so will the cost of Amazon S3.
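To make the arithmetic behind those numbers concrete, here is a rough sketch. The flat $0.15 per gig per month rate is an assumption backed out of the "$15/month for 100 gigs" figure above, not Amazon's actual price list, and the function name is made up for illustration. It only covers storage, not data transfer or request charges.

```python
# Rough monthly storage estimate. The rate is an ASSUMPTION derived from
# the post's own figure ($15/month for 100 gigs), not an official price.
PRICE_PER_GB_MONTH = 0.15  # USD per gig per month (assumed flat rate)


def monthly_storage_cost(gigabytes: float, rate: float = PRICE_PER_GB_MONTH) -> float:
    """Return the estimated monthly storage cost in USD, rounded to cents."""
    return round(gigabytes * rate, 2)


if __name__ == "__main__":
    for size in (30, 100):
        print(f"{size} gigs -> ${monthly_storage_cost(size):.2f}/month")
```

At that assumed rate, 30 gigs comes out to $4.50, which is where the "about $5/month" figure comes from.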

The Tools: Jungle Disk

There are a bunch of tools out there for backing up your data to S3, everything from simple shell scripts to high-octane GUI apps. I started by looking at Jungle Disk. I remember evaluating Jungle Disk years ago; it was one of the first S3 clients, with a reliable development schedule. If I recall correctly, it was even open source.

Jungle Disk has a large suite of useful features. But it also has a monthly fee ($2/month, first 5 gigs free, $3/month to back up a network drive, which I need). If you're looking for an easy-to-use and powerful solution, Jungle Disk sounds great. But I just can't bring myself to pay another monthly fee on top of the Amazon S3 costs.

The Tools: Other options

I was surprised that most of the popular S3 backup clients were pay apps. My ideal client would be a feature rich open source application or script. Among these, s3sync and s3fs kept coming up.

s3sync is a Ruby script that syncs folders between your machine and S3. The app itself hasn't been updated since 2008, but the forums are active. Unfortunately, when I tried s3sync it threw an exception. I didn't bother debugging why. Lazy me.

s3fs uses FUSE to mount an S3 bucket as a folder. However, I ran into some errors using MacFUSE (and I read that others have too), the app hasn't had any substantial code updates in a while, and it looks like the developer is focused on a pay product.

Now I'm not faulting any of these applications for having a pay component. I'm glad to see these apps are thriving and developers are being rewarded for their hard work. However, I must say I'm surprised there aren't more open source options for S3 backup. S3 has been around for years and the interface is well documented and stable, so I guess I expected more.

The Tools: jets3t

I finally settled on jets3t, which is an open source Java command-line app. It was written by James Murty, who literally wrote the book on Amazon S3. Jets3t is well supported, simple to use, and most of all, It Just Works (for me). Jets3t offers a whole suite of tools; I ended up using the command line Synchronize app. After a few test runs between a small set of data and an S3 bucket, I was ready to turn it loose on my personal files.

Here's the Jets3t command I used to run the backup.

bin/synchronize.sh -g -b UP bucketName/backups/readynas /Volumes/personal

It took about a week to back up all 30 gigs of my personal folder. Synchronize is remarkably robust; while it encountered S3 errors every now and then, it retried the requests accordingly and never crashed. I was lucky that Amazon was offering free data transfer for the month of June (they have since extended that offer until November).

This command runs every two weeks. The initial upload took a while, but each incremental backup after that shouldn't take too long.
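For anyone wanting to set up a similar two-week schedule, here is one possible sketch in Python. It is not how my setup is actually wired up; the function names, the idea of tracking the last run time, and calling the script through `subprocess` are all assumptions for illustration. The command itself is the one shown above, unchanged.

```python
import subprocess
from datetime import datetime, timedelta
from typing import Optional

# Command copied from the post; it must be run from the jets3t directory.
SYNC_COMMAND = [
    "bin/synchronize.sh", "-g", "-b", "UP",
    "bucketName/backups/readynas", "/Volumes/personal",
]


def backup_due(last_run: Optional[datetime], now: datetime,
               interval_days: int = 14) -> bool:
    """Return True if no backup has run yet or the interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= timedelta(days=interval_days)


def run_backup() -> None:
    """Run the Synchronize command; raises CalledProcessError on failure."""
    subprocess.run(SYNC_COMMAND, check=True)
```

A daily scheduled job could call `backup_due(...)` with the stored time of the last successful run and only call `run_backup()` when it returns True.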

Looking forward

This system works well for now, but I keep wondering if it will scale. I use 100 gigs as an imaginary limit. Storing 100 gigs will cost $15/month. I'd be willing to pay that much for peace of mind. I estimate I'm still years away from 100 gigs, but who knows. As we take more photos, and as personal video takes a more prominent role in our media, we may reach it faster than I think.

I also have to consider the cost of restoring 100Gigs of data. At our present bandwidth, that could take weeks. Amazon does offer an Import/Export service (you send them a drive, they put your data on it and send it back); that sounds like a much better option for large amounts of data. Hopefully by the time I reach that much data, bandwidth speeds will have improved, or there will be better options at a lower cost.
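As a back-of-the-envelope check on that "weeks" estimate, the sketch below scales up the one data point I have: the initial upload of 30 gigs in about a week. Treating a restore as running at that same rate is an assumption (and probably a pessimistic one, since home downloads are usually faster than uploads); the function name is made up for illustration.

```python
# Observed during the initial backup described in this post.
OBSERVED_GB = 30.0
OBSERVED_DAYS = 7.0  # "about a week"


def estimated_transfer_days(gigabytes: float,
                            observed_gb: float = OBSERVED_GB,
                            observed_days: float = OBSERVED_DAYS) -> float:
    """Estimate transfer time in days, ASSUMING the observed rate holds."""
    gb_per_day = observed_gb / observed_days
    return gigabytes / gb_per_day


if __name__ == "__main__":
    days = estimated_transfer_days(100)
    print(f"100 gigs: about {days:.0f} days ({days / 7:.1f} weeks)")
```

At the observed rate, 100 gigs works out to a little over three weeks, which is why shipping a drive starts to look attractive.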