lazyweb: help me back up to a shell account


So Dreamhost, odd place they are, have apologized to their users for a long stint of server and network problems (which seem to be resolved, knock on wood) by doubling our monthly bandwidth and 10x-ifying our plans’ disk quotas. That means I’ve got 413GB of storage there now. I don’t have 413GB worth of website and mail. (That’s also $10/month for 200GB of data, which is $0.05/GB, or 1/3 of Amazon S3’s price!)

So since I recently lost 200GB of data at home when both disks in a RAID set failed at the same time*, I figure I might as well use that for backing up. I do have a 35/70 AIT drive at home, but the new server has 320GB total, and that sounds like a lot of work changing tapes compared to shoving it over the network.

But to back up there I have a few requirements, or at least I think they are requirements, and I want to poll the lazyweb to see if I’m way off base or if there’s something that will already do all this for me. So, my requirements:

  1. Incremental. I can’t upload all 320GB nightly or even weekly. It doesn’t have to be completely bandwidth-cheap like rsync, but it has to at least be incremental at the file level.
  2. Encrypted. I’m backing up things like tax records, so I don’t want anyone at Dreamhost to be able to see the data. I don’t even want them to know what it is, so both the file and directory names and the file contents need to be hidden from prying eyes. I’m not worried about the existence of directory structures; it’s ok to reproduce the tree exactly as long as the directory and filenames are unreadable.
  3. Individual files. Dreamhost themselves offer Netapp-style .snapshot directories, so I need to upload individual files and not, say, one giant encrypted cpio archive, in order to be able to take advantage of those .snapshot directories for restoring files from versions prior to the last backup.
  4. Metadata-friendly. I want to keep file ownerships and permissions so I can restore these files later without having to worry about fixing ownerships and permissions on hundreds of GB of data, but I only have a single userid on the far end with which to store them.
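
To make requirements 1, 2 and 4 a bit more concrete, here is a rough Python sketch of the bookkeeping side only. It is not how brackup or any existing tool works; every name in it (`opaque_name`, `scan_tree`, `changed_entries`, the manifest layout) is made up for illustration. It assumes a secret key kept at home, uses an HMAC of each relative path as the unreadable remote name, records uid/gid/mode in a manifest so a single remote userid is enough, and treats a file as changed when its size or mtime differs from the previous run. It does not encrypt file contents, and it flattens the tree rather than mirroring it.

```python
import hashlib
import hmac
import os
import stat


def opaque_name(key: bytes, rel_path: str) -> str:
    """Derive a stable, unreadable remote name from a relative path."""
    return hmac.new(key, rel_path.encode("utf-8"), hashlib.sha256).hexdigest()


def scan_tree(root: str, key: bytes) -> dict:
    """Map each regular file's opaque name to the metadata needed to restore it."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # assumption: this sketch skips symlinks, devices, sockets
            rel = os.path.relpath(full, root)
            manifest[opaque_name(key, rel)] = {
                "path": rel,                       # real name, kept only in the manifest
                "uid": st.st_uid,                  # ownership stored as data, not in the remote inode
                "gid": st.st_gid,
                "mode": stat.S_IMODE(st.st_mode),  # permission bits only
                "size": st.st_size,
                "mtime": int(st.st_mtime),
            }
    return manifest


def changed_entries(previous: dict, current: dict) -> list:
    """Return opaque names that are new or whose size/mtime differ from last run."""
    changed = []
    for name, meta in current.items():
        old = previous.get(name)
        if old is None or (old["size"], old["mtime"]) != (meta["size"], meta["mtime"]):
            changed.append(name)
    return sorted(changed)
```

Only the files listed by `changed_entries` would need to be encrypted and uploaded on a given night. The manifest itself contains the real paths, so it would have to be encrypted before it goes anywhere near the remote side.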

Is there anything out there that comes close to what I need, that’d be extensible in the way I describe, or do I have to write this all myself? Are there any other requirements I’m forgetting? If it helps, at the Dreamhost end I have a Linux shell account via SSH, plus FTP, DAV, and maybe SMB access.

(Brad, brackup does all of this, right? :-)

* Yeah, I should know better. But I had RAID-1 and an unused tape drive, what could go WRONG? What went wrong: a scheduled power outage at home, after which the power must have come back up badly. That took out what was apparently already a marginal power supply in my file server, which in turn managed to let the magic smoke out of everything it could, including two identical hard drives. Incidentally, as of this server I no longer run identical hard drives in my software RAID sets! The guy at the computer store was confused, but once I explained he seemed as though he was going to start recommending the same to everyone.


9 responses to “lazyweb: help me back up to a shell account”

  1. Yes, brackup does all of this. Except ownership metadata… it does keep permissions metadata. Adding ownership would be possible, if you run it as root.

  2. At home, I originally got the Incremental and Individual Files parts with rsync over ssh. (-r --rsh=ssh --links --perms --owner --times --delete and optionally --size-only) As someone else mentioned, it needs to be run as root for the ownership stuff to work correctly (plus, the same usernames and/or username-to-id mappings need to be on both machines or the files are owned by numeric user ids that don’t exist in /etc/passwd.)

    I then changed the process to be more Metadata-Friendly by creating a sparse disk image (it’s an OS X thing that’s roughly equivalent to a loopback filesystem with a nice GUI tool and mounter), accessing that over Samba, then using the same rsync command, but on the local filesystem tree instead of over ssh. This fixes issues with keeping usernames/ids in sync and running as root as well as the metadata thing.

    Not that I’ve done this, but I would imagine that if you could create a similar loopback disk image on the Dreamhost server, access that over a remote protocol (FTP won’t do it, DAV might, Samba certainly would, as would NFS), you would have the same thing. Adding encryption is relatively easy (well, easy-ish for being a Linux fs driver, from what I remember back when I last played around with it), using an encrypted loopback device. The image on the remote share is mounted through the loopback, then a local rsync is run to synchronize the files.

    On the other hand, I had never heard of Brackup until just now, and that’s looking pretty nifty.

  3. The problem with a big loopback disk image is that then you have a single file on the Dreamhost end. I considered doing that, but then I realized that if I wanted to take advantage of their Netapp .snapshots then I’d need individual files (or else I’d have to deal with snapshots of a single 100s-of-GBs file).

    Incidentally, you can avoid your “same usernames” problem with rsync’s --numeric-ids option.

    But even if brackup doesn’t work I don’t think this’ll be too hard to get working. Working elegantly might be another story, but still.

  4. Thanks — but those won’t store ownership information. With only a single account at the far end, ownerships need to be stored as metadata, rather than just in the inode at the far end.

    Brackup is looking promising, though, once I write an sftp target for it and get it storing ownership in metadata.

  5. Almost on topic – I think I’m going to sign up for Dreamhost and ditch my overpriced VPS thing that I don’t need.

    Do you do their affiliate thing? Give me a code or something (?) and I’ll hook you up…

  6. Awesome! Thanks. You can sign up here, or list rich+dreamhost at lafferty dot ca as the person that referred you (in normal email address form, of course).