Backup advice sought

The serious sysadmins of my acquaintance may wish to avert their eyes from the following, or risk being horrified by my heretofore laissez-faire attitude to backups.

For the past mumble years, my backup needs have been minimal. I have had a small amount of personal data that I cared about on my Macbook, my email (in GMail), a web hosting account with my websites and some other crap on it, and source code to certain coding projects (mostly open source). Thus, my backup solution has been:

  • offlineimap to regularly pull down copies of my GMail to my laptop
  • rsync backup of my web host to my laptop (roughly as sketched just below this list)
  • source code kept in version control systems somewhere on the interwebs (currently in the process of moving most of my stuff to GitHub, using a private account for the non-open-source projects)
  • Time Machine backup of my laptop to external hard drive
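
A rough sketch of that rsync step, for the curious (the hostname and paths here are placeholders, not my real setup):

    #!/bin/sh
    # Pull a mirror of the web host down to the laptop.
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete keeps the local copy an exact mirror of the remote.
    rsync -az --delete webhost:public_html/ "$HOME/Backups/webhost/public_html/"

A crontab entry along the lines of 0 3 * * * ~/bin/backup-webhost.sh is all the scheduling it needs.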

This has suited me just fine. Of course if my house burns down, taking my laptop and external hard drive with it, I’ll lose some stuff — the music I have in iTunes, some fanvids I’m working on, years and years worth of collected porn — but whatever, it’s all replaceable, none of it’s mission-critical. The most important things to me are my email, websites, and source code, and I feel pretty comfortable about where they’re at.

Now I’m moving into an area where my work is going to take up more space than source code does. I recorded some demos for a band a while ago, just a couple of hours of stuff, and the folder where I’m keeping the project is 4GB. A larger recording project could easily take orders of magnitude more space than that. And unless I want some nasty surprises, I’m going to want to back it up properly.

I’m also going to be in Australia, land of overpriced, download-capped Internet (sample ADSL2+ plans from a decent Australian ISP; I’m likely to share something in the middle of that range with a housemate or two). So, given the amount of data I’m likely to have, I don’t think cloud-based backup options will work well for me. I mean, imagine a situation where I have to do a full restore of half a terabyte of data or something — at a realistic ADSL2+ speed of around 1MB/s that’s the better part of a week of solid downloading, never mind what it would do to the quota. It would be completely infeasible.

So, in my situation, what would you do? Remember that I’m going to be on a student-ish budget for the next couple of years.

I’m thinking I’ll just get a couple of larger drives for Time Machine, keep one at home and one somewhere else, and switch them every so often. In addition to this, maybe a smallish cloud-based backup solution for “current projects” — hopefully just in the tens of GB at any given time, costing under $20/month. What do you think?

[Photo: art wall near 23rd and Valencia]

Random photo du jour: art wall near 23rd and Valencia. I photographed this a couple of weeks ago, and two days later, the whole thing had been painted over by some asshole taggers.

21 thoughts on “Backup advice sought”

  1. Manually changing time machine drives is painful, because time machine really isn’t set up to deal with the idea of multiple backup drives. :-/ Not sure if it’s any better in Lion, though.

    But that said, multiple time machine drives are most likely the go. I have a script which quietly switches time machine’s config files based on what drive names exist and are attached, which I run between work and home. It’s a bit evil, but works – email me if you want it. (It’s untested with Lion but I imagine it won’t take much to fix it.)
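
    The gist of it is roughly this (just a sketch of the general idea, not the actual script; the drive and file names are made up, and the real thing does more sanity checking):

      #!/bin/sh
      # Keep one saved copy of the Time Machine prefs per backup drive, and
      # drop the matching one into place depending on which drive is mounted.
      PREFS=/Library/Preferences/com.apple.TimeMachine.plist
      if [ -d "/Volumes/TM-Home" ]; then
          sudo cp /Library/Preferences/tm-home.plist "$PREFS"
      elif [ -d "/Volumes/TM-Work" ]; then
          sudo cp /Library/Preferences/tm-work.plist "$PREFS"
      fi

    (On Lion, I gather sudo tmutil setdestination /Volumes/TM-Work does the switch in one step, which would make most of the above unnecessary.)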

    I think Time Machine can be quietly fooled if you just “restore” a working time machine drive to a blank using disk utility, but I haven’t actually tested that properly yet. It should certainly quietly work out where the disk is at and do the right thing to bring it up to date. It might complain that the disk id has changed, but I’m not sure about that.

    The only other issue to consider is where you’re storing the offsite disk(s) and what rotation policy you have. Two disks isn’t enough to have complete site redundancy during rotation, unless you’re not carrying your laptop with you. You may need three – one plugged in at home, one unplugged at home, one at storage.

  2. I forgot to mention – crashplan.com does let you back up “to a friend’s machine” as well as to local attached disks for free… and has a pretty cheap unlimited storage online plan – http://www.crashplan.com/consumer/crashplan-plus.html

    Crashplan has a feature where you can take a disk to “your friend” and they can import it to save bandwidth too.

    The problem really is that backing up even just your “current project” is going to be just as bad for your bandwidth utilisation. Storage is a non-issue as far as cost goes – it’s bandwidth that sucks. :-/

  3. Thanks for the heads-up re: multiple time machine drives. I hadn’t tried it but had assumed, from OSX asking me “do you want to make this a time machine drive?” every time I plug in something largish, that it could deal with it. I’ll do a little more research.

    Re: redundancy during rotation, are you suggesting that the problem is if, for instance, I have drive A at home and B at work/school/friend’s house, and then I take drive A to work/school/friend’s house to swap them, and a bomb goes off while I’m in the process of swapping, before I can get drive B home again? If so, that’s something I’m prepared to risk.

  4. But as I understand it, that backup “to a friend’s machine” happens over the network, right? So even assuming you have a trusted friend and give them a large hard drive, you still need good bandwidth at both ends. Though at least if the friend’s in the same city, if there’s a catastrophe you can just go round and pick up the hard drive, physically, rather than restoring over the network.

  5. Multiple TM: Yeah, I think once you say yes, it’s sticky – if you put in another disk, it won’t necessarily find that. Hence the script which does some under-the-hood messing about with moving config files.. :-/

    Redundancy: Yeah, or lightning strike/fire/emergency at the third location requiring you to exit in a hurry leaving behind everything which then burns down. It’s a very low likelihood risk, but it is a definite single point of failure risk.

  6. Yeah, the “to a friend’s machine” is still over the network – but you can seed it or do large differentials using a disk rather than having to do the whole thing over the network, or perhaps even just back up over-the-LAN when you’re at the friend’s place. Although I would recommend making sure there’s wired ethernet for that if you’re not staying overnight – even 802.11n gets a bit slow if you’re doing large media files.

    It also does open up an option if you have a friend next door, or a granny flat on the same block, and as you say, catastrophic restore is at least “go around to friend’s place and get disk.”

    That said – crashplan plus will also send you a disk from their online archive – but it’s US address only, and they expect it back within a fairly short period (10 days I think?). If I ever have to do a catastrophic restore, I’ll probably suck it up and do that (via someone in .us) and pay the fine for non-return of disk.

    Internode’s Power Pack option also stops uploads counting towards the quota, which has meant that I can push lots of data up to crashplan without worrying about that at least. In combination with the Fritz!box 7390 and its extremely good QoS, shoving a huge pile of crashplan up the line doesn’t even affect normal web browsing. Takes a good long while to send stuff, of course, but it does go.

  7. I use multiple cloud backup servers, so I don’t know if this is sane, but I’ve often considered doing incremental backups to a drive in a fireproof safe (meaning, I would use that as my switch).

    I’ve only done a few web searches for fire-proof safes, so I can’t say for sure if that is an affordable option. Does anyone else have insight into that?

    Also, and not very easy to set up in MacOS X (at least for me), Tahoe-LAFS is pretty cool. It is super-redundant, and I believe that you can do partial recoveries (because you are choosing which directories and files to store, rather than imaging a whole OS). Dealing with 4 gig files is tough in the bandwidth climate you are entering, but that isn’t a bad thing to have going in addition to anything else you are looking at. ^_^
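
    For reference, the command-line flow once you have a grid to join goes roughly like this (from memory, so treat it as a sketch; you still need to point tahoe.cfg at the grid’s introducer before any of it works):

      tahoe create-client              # creates ~/.tahoe; edit tahoe.cfg for your grid
      tahoe start
      tahoe create-alias backups
      tahoe backup ~/Projects backups: # incremental; only changed files are re-uploaded
      tahoe ls backups:Latest          # browse the most recent snapshot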

  8. This may sound like a weird solution – but do you happen to have a real office of some kind? Or some other place outside of your home where you could store an external hard disk?

    I’m keeping a backup of my home machine on an encrypted external HD at my workplace (20 miles ought to be far enough away from my home to cover most disasters up to about 150kt TNT as there’s a large hill between the two sites ;-)

  9. No, I won’t have an office. But as I said in the post, I’m thinking of getting two drives and storing one offsite (probably at a friend’s house or, if I have some kind of storage locker at school, maybe there).

  10. Hi Skud,
    I’d just say: Don’t write off Cloud backup here; you just need to be a bit sneaky. If you choose your ISP carefully (it’s a checkbox on bc.whirlpool.net.au) you can select one who gives unmetered traffic to and from the (fast) WAIX peering network. My hosting crowd (ld.net.au) allow unlimited unmetered traffic between WAIX and their hosting services, which include (1) an AU$1/GB/yr bulk-storage service.

    (1) LD sell me a 100GB storage service as described, but 90% of the products (including hosted storage) have recently fallen off their website. I am making inquiries and will keep you updated if you’re interested.

  11. I would need to do a little more research on whether the controllers would all play nicely with each other, but this is what I would pursue, personally:

    1) there are a couple of manufacturers who make external raid hdd cases that hold two drives and mirror them; you want to find one that’ll play nice with time machine. There are a lot of them out there now – some don’t actually mirror (they just pool multiple drives so they appear as a single large drive), but some do, such as this: http://www.buy.com/cat/raid-external-hard-drive/65152.html

    2) buy some extra drives that’ll work together with the raid

    3) when you want to move data to the offsite backup, swap the drives. in theory you should be able to set it up so that you can pull one drive out, drop it at a friend’s, put a clean one in and be on your way.

    Another option would be, if you are running a tower, to install a raid controller on your board. You could, in theory, set it with 3 disks and run the cables for the third outside the case so that it’s an easy swap.

    Long story short, I would use a raid controller to mirror disks rather than trying to have multiple time machine instances. Hardware controllers are the way to go for simple and efficient mirroring. That’s my opinion anyway.
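
    For what it’s worth, the same mirror-and-swap idea can also be done in software on OS X if a hardware enclosure turns out to be a budget problem. A rough sketch (the disk identifiers are examples only, check diskutil list first, and note that creating the set erases the member disks):

      diskutil list                                      # find the two external disks
      diskutil appleRAID create mirror Backup JHFS+ disk2 disk3
      diskutil appleRAID list                            # shows the set and its members
      # later, swap a fresh disk in and let the mirror rebuild in the background:
      diskutil appleRAID add member disk4 /Volumes/Backup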

  12. +1 for RAID-capable external enclosures.

    Personally I use a Guardian Maximus that I equipped with two WD20EARS drives (2 TByte):

    http://eshop.macsales.com/shop/firewire/usb/raid_1/Gmax

    The Gmax looks nice, in particular when attached to my black PowerBook Pismo (my main work machine). There’s a garage sale on macsales.com every month where you can get refurbished units for a reduced price. Check the OWC blog for announcements.

    I create manual full backups to the Gmax like this:

    hdiutil create -verbose -srcfolder /Volumes/psystem -volname psystem-backup -format UDRO -nocrossdev -scrub /Volumes/gmax/backup/pismo-DATE.dmg 2> /Volumes/gmax/backup/pismo-DATE.log
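
    (To fill in DATE automatically, a tiny wrapper along these lines would work – a sketch only, using the same flags as above:)

      #!/bin/sh
      # Sketch: generate the date stamp, then run the same hdiutil command as above.
      DATE=$(date +%Y-%m-%d)
      hdiutil create -verbose -srcfolder /Volumes/psystem -volname psystem-backup \
          -format UDRO -nocrossdev -scrub \
          "/Volumes/gmax/backup/pismo-$DATE.dmg" 2> "/Volumes/gmax/backup/pismo-$DATE.log"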

  13. If you’re looking at redundancy for large (i.e. media) files have you looked into git-annex?

    You could use your multi-disk approach, but make it less dependent on the vagaries of time machine – you can then also host the git-repo on github AND know *exactly* where your media files are saved…
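
    The basic flow is something like this (paths and remote names are made up; only the metadata ends up on GitHub, while the big file contents stay on whichever drives you copy them to):

      cd ~/Recordings
      git init && git annex init "laptop"
      git annex add band-demos/                # big files go into the annex
      git commit -m "add demos"
      # make a clone on the external drive and register it as a remote:
      git clone ~/Recordings /Volumes/Offsite/Recordings
      (cd /Volumes/Offsite/Recordings && git annex init "offsite-drive")
      git remote add offsite /Volumes/Offsite/Recordings
      git annex sync offsite                   # exchange location-tracking info
      git annex copy --to=offsite band-demos/  # push the actual file contents
      git annex whereis band-demos/            # lists which repos hold each file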

  14. Admittedly this post is a bit late to the party (ok, a month late to the party).

    If you like the simplicity of TM, and you intend to use it exclusively on your Mac, check out Carbon Copy Cloner (aka ‘CCC’, http://www.bombich.com/). It does nice things like automatically running the backup when you plug in the external drive. It can make bootable clones of your entire hard drive (or just part of it), happily handles multiple drives, and it’s free (as in beer).

    It does NOT let you go ‘back in time’ quite like TM does, but if your only goal is data duplication, this is a great path-of-least-resistance option.

    Were I to be you, I’d make sure I had two external drives, and use CCC on at least one of them (maybe TM on the other). Keep one at a friend’s house.

    Of course, this doesn’t do anything for you if your work fills your internal hard drive. Having said that, duplicating data that’s already external isn’t any harder. Just try to keep three copies of all your data in at least two locations (one ‘offsite’), and you’ll be fine. It sounds harder than it is.

  15. Hi Thorfi,

    I’d very much like your script which detects which disc is attached & updates timemachine’s config data to back up to the right place. Any chance you can email it to me? hodgie59 at gmail

    Thanks
    Steve

  16. thx Skud, but it doesn’t seem to work for me. Have backed up to a new drive. Went back to the old, pointed TimeMachine at the old disc (it didn’t allow me to choose the explicit archive – just the disc it sits on), and it tries to create a brand new archive on that disc (same name but with a “1” suffix). It doesn’t seem to realise I want the existing dump on that disc updated. Have googled the problem, can’t see any evidence that it’s fixed. :(

  17. I don’t create a new archive or anything, I just plug disk #2 in and magic happens. (I think at one point I told it that yes, I wanted to make TM backups on that disk.) There’s no additional fiddling around each time I plug things in, though.

Comments are closed.