Building Your Own Dropbox Equivalent

My previous dog food post did generate a couple of requests for more detailed instructions on how to build your own private cloud storage repository, so here goes.

First, you need to have a server somewhere that will always be on, or at least on as much of the time as you want to access your cloud storage. I happen to have a server in the Rackspace Cloud that I leave running for various purposes at $10/month. If you already have a cloud server or VPS that you’re running somewhere, or a home/work machine, that works fine too.

Next, you need to be running GlusterFS on your server and clients. There are already pre-packaged versions available for some Linux distributions, though some are more current than others. For example, Fedora is at 3.1.3 which is pretty close to current, while Ubuntu is at 3.0.5 which is simply too old to be useful. Downloads are also available from Gluster themselves and elsewhere. However, you’re highly likely to run into a problem with many of the pre-packaged versions. Prior to 3.1.4, the servers would not accept connections from non-privileged ports and if you’re using any sort of NAT/tunnel/VPN you’ll probably be assigned just such a port. At least in 3.1.4, they added options to override this misguided code (which doesn’t even check the port’s privilege status the right way BTW) but my choice was to remove it entirely and build my own RPMs. I’ll provide patches and fixed RPMs soon; bug me if I forget. Anyway, install whatever 3.1+ version you want and move on to the next step.

At first, you’ll want to create your volumes and start your server-side daemons the “normal” way, something like this.

# mkfs -t ext4 /dev/sdwhatever
# mkdir -p /bricks/mycloud
# mount -o noatime,user_xattr /dev/sdwhatever /bricks/mybrick1
# gluster volume create mycloud my.external.ip.address:/bricks/mybrick1
# gluster volume start mycloud

At this point, you would actually be able to mount the “mycloud” volume from any GlusterFS client, unless you have firewall issues (which you often will in a public cloud including Rackspace). Here’s one simple way to get around that, though I actually don’t recommend this method for reasons I’ll get to in a moment.

# netstat -lpn | grep glusterfsd
# iptables -I INPUT -p tcp --dport 24009 -j ACCEPT

24009 is just the port I happened to get. Also, you’d probably want to make the IP tables rule more specific to the hosts you use, save it as part of your private config, yadda yadda. However, the reason you probably don’t want to do this is that it provides no security at all. Instead, you’ll want to run this through some sort of tunnel, and in that case, you’ll effectively be making a local connection to the server. Here’s how you’d do it with ssh.

# ssh -L24009:localhost:24009

The other part of this is avoiding some of Gluster’s port-mapping silliness. Grab the client-side “volfile” describing the mount from /etc/glusterd/vols/mycloud/mycloud-fuse.vol and copy it to whatever machines you’ll be mounting on. Edit the “protocol/client” part of the copy to look something like this:

volume mycloud-client-0
    type protocol/client
    option remote-host localhost
    option remote-port 24009
    option remote-subvolume /bricks/mybrick1
    option transport-type tcp

Now your mount command can change something like this.

# mount -t glusterfs /mnt/mycloud # glusterfs way = wrong
# mount -f ~/mycloud-fuse.vol /mnt/mycloud # cloudfs way = right

At this point, you have a remotely accessible directory with encrypted communication, but data on the remote server that’s still accessible to anyone who can get into that server. In other words, you’ll have something practically equivalent to most of the cloud-storage offerings out there – including the ones you pay for. What you need, both to preserve your privacy and to comply with many companies’ information security policies (such as Red Hat’s), is pure client-side encryption. That’s where CloudFS comes in . . . or at least part of it. At this point, you need to be a bit of a coder or fairly advanced sysadmin, because I’m going to make you pull the code and build your own RPM. I’ll provide some RPMs shortly; also, note that you specifically want the “aes” branch for now. To make it work for you, you’ll need to edit that client-side volfile again, adding a new “encryption/crypt” stanza at the end like this:

volume mycloud-crypt
    type encryption/crypt
    option key %0123456789abcdef0123456789abcdef
    subvolumes mycloud-client-0

The key can be whatever you want, at 128/192/256 bits, and can also be stored in a file (use a first character of “/” instead of “%” in the volfile). One easy way to generate such a key is like this:

dd if=/dev/urandom bs=16 count=1 | od -tx1 -An | tr -d '\n '

In any case, the resulting volfile contains nothing but the two stanzas I’ve shown, in that order. All of the “performance” translators which can interfere with proper operation of the encryption code need to be stripped out.

At this point you should be able to mount using your custom volfile to have full encryption both on the network and on disk, with all of the keys safely on your own client system(s). Performance might not be all that great for some kinds of operations, but I’m using it myself and it seems adequate for most purposes. Some day I’ll finish the UID-mapping translator so that you can use this from different UIDs on different machines without getting permission errors, and I’ll finish the built-in SSL transport so you can connect directly instead of needing an ssh tunnel. Then it’ll be really cool, but you know what might be even cooler? With GlusterFS 3.2 you could even replicate this data across two different cloud providers, giving you all of that “multi cloud” goodness that has been all the rage since the EBS outage. With CloudFS 1.0 you’ll be able to do it an even better way, with better consistency and decent performance and so on. At that point I’ll seriously start to question the sanity of anyone who’s using some other solution that doesn’t offer the same performance and the same levels of data protection from both accident and intrusion, or which isn’t open-source like everything I’ve talked about.

UPDATE: Some people on the Hacker News thread (thanks Jesse) have rightly pointed out that what I’ve described here isn’t really equivalent to Dropbox. Rather, it’s a directly mountable filesystem which I feel is even better. If you really want something equivalent to Dropbox, you’d have to do two things. To get the sync/work-offline functionality, you’d have to set up this plus a second directory and use something like rsync/Unison/lsyncd to sync between them. To get a GUI, there must be a few dozen web apps you can plop onto the server pointing at a local mount of the same volume. Note, though, that you give up some security when you do this because anybody who can get onto the server can bypass encryption by looking in the mount used by the web app. Since the whole point is to have better security than Dropbox, not to repeat their mistakes, I can’t recommend that. The nearest thing I can think of would be to build the encryption pieces directly into the web app (GlusterFS does actually make this possible) but even then the data would still exist transiently in the web app’s memory on the server so it’s not much an improvement. If anybody knows of a good way to have a web UI that does the encryption/decryption entirely on the client side – big blob of JavaScript? – please let us know in the comments. It might be an interesting project, though probably not one I’m interested/qualified to attempt personally.