A jacket for a motorcycle is a bike gear with numerous functions. The work of a cloud storage developer involves long hours of sitting and being indoors just to ensure the programming language is on point without disappointing the clients. Riding a motorcycle is an unwinding activity to allow him to relieve anxiety and relax. A jacket comes in handy as a protective gear and a safety measure.
In some states, you are not allowed to ride without a jacket. Imagine of an impact, apart from the helmet the upper limbs are protected by the jacket made from a light mesh material for summer riding. The outdoor event causes health challenges, especially during the cold season. The extreme exposure of your chest to cold might cause asthma, pneumonia and other health-related conditions.
Yes, you want to have an outdoor activity, and you feel riding is the best option. A helmet, boots, and a long pant are protective gears for all seasons.
What entails the best motorcycle jacket for a cloud storage developer?
Choose a jacket made from the material with the ability to regulate heat and temperature to enhance comfortability. The inner lining should be made of cotton renowned for this feature such that when it is very cold, the heat is absorbed towards the skin to keep the rider warm. When it is hot, it should also radiate the heat away from the body.
Are you able to ride during the rainy season and still have fun? This is only possible with a jacket with a waterproof lining, to keep you warm despite the winter season or rainy season. In addition, the waterproof ability allows you to adventure and ride in stormy and muddy waters without a hindrance.
What happens when you are in a rugged landscape? How comfortable will you be in the jacket? Ensure the material has a shockproof ability to withstand rough terrain at the same time the sliding material should protect you from bruises in the case of an impact or a fall.
The lighter the motorcycle jacket, the better the coziness. This is possible with a light material which does not compromise on the safety. The jacket has then two to three layers, each of the layers has a purpose with aesthetic value.
Riding in the dark is inevitable. A mechanical problem might arise or rather get lost in the course of your riding adventure. A good motorcycle jacket should have a reflector as a safety measure especially when riding on a busy highway. It alerts other motorists of your presence on the road.
Despite the safety and protection, never compromise on your taste of color. You are at liberty to choose a color of choice to match your passion and enhance your sense of fashion when riding.
Whether a cloud software developer rides as a workout plan or for fun or adventure, a jacket is a must-have clothing for protection and safety for him or her as well as prevent endangering other motorists and road users.
My previous dog food post did generate a couple of requests for more detailed instructions on how to build your own private cloud storage repository, so here goes.
First, you need to have a server somewhere that will always be on, or at least on as much of the time as you want to access your cloud storage. I happen to have a server in the Rackspace Cloud that I leave running for various purposes at $10/month. If you already have a cloud server or VPS that you’re running somewhere, or a home/work machine, that works fine too.
Next, you need to be running GlusterFS on your server and clients. There are already pre-packaged versions available for some Linux distributions, though some are more current than others. For example, Fedora is at 3.1.3 which is pretty close to current, while Ubuntu is at 3.0.5 which is simply too old to be useful. Downloads are also available from Gluster themselves and elsewhere. However, you’re highly likely to run into a problem with many of the pre-packaged versions. Prior to 3.1.4, the servers would not accept connections from non-privileged ports and if you’re using any sort of NAT/tunnel/VPN you’ll probably be assigned just such a port. At least in 3.1.4, they added options to override this misguided code (which doesn’t even check the port’s privilege status the right way BTW) but my choice was to remove it entirely and build my own RPMs. I’ll provide patches and fixed RPMs soon; bug me if I forget. Anyway, install whatever 3.1+ version you want and move on to the next step.
At first, you’ll want to create your volumes and start your server-side daemons the “normal” way, something like this.
# mkfs -t ext4 /dev/sdwhatever # mkdir -p /bricks/mycloud # mount -o noatime,user_xattr /dev/sdwhatever /bricks/mybrick1 # gluster volume create mycloud my.external.ip.address:/bricks/mybrick1 # gluster volume start mycloud
At this point, you would actually be able to mount the “mycloud” volume from any GlusterFS client, unless you have firewall issues (which you often will in a public cloud including Rackspace). Here’s one simple way to get around that, though I actually don’t recommend this method for reasons I’ll get to in a moment.
# netstat -lpn | grep glusterfsd # iptables -I INPUT -p tcp --dport 24009 -j ACCEPT
24009 is just the port I happened to get. Also, you’d probably want to make the IP tables rule more specific to the hosts you use, save it as part of your private config, yadda yadda. However, the reason you probably don’t want to do this is that it provides no security at all. Instead, you’ll want to run this through some sort of tunnel, and in that case, you’ll effectively be making a local connection to the server. Here’s how you’d do it with ssh.
# ssh -L24009:localhost:24009 firstname.lastname@example.org
The other part of this is avoiding some of Gluster’s port-mapping silliness. Grab the client-side “volfile” describing the mount from /etc/glusterd/vols/mycloud/mycloud-fuse.vol and copy it to whatever machines you’ll be mounting on. Edit the “protocol/client” part of the copy to look something like this:
volume mycloud-client-0 type protocol/client option remote-host localhost option remote-port 24009 option remote-subvolume /bricks/mybrick1 option transport-type tcp end-volume
Now your mount command can change something like this.
# mount -t glusterfs my.cloud.server:mycloud /mnt/mycloud # glusterfs way = wrong # mount -f ~/mycloud-fuse.vol /mnt/mycloud # cloudfs way = right
At this point, you have a remotely accessible directory with encrypted communication, but data on the remote server that’s still accessible to anyone who can get into that server. In other words, you’ll have something practically equivalent to most of the cloud-storage offerings out there – including the ones you pay for. What you need, both to preserve your privacy and to comply with many companies’ information security policies (such as Red Hat’s), is pure client-side encryption. That’s where CloudFS comes in . . . or at least part of it. At this point, you need to be a bit of a coder or fairly advanced sysadmin, because I’m going to make you pull the code and build your own RPM. I’ll provide some RPMs shortly; also, note that you specifically want the “aes” branch for now. To make it work for you, you’ll need to edit that client-side volfile again, adding a new “encryption/crypt” stanza at the end like this:
volume mycloud-crypt type encryption/crypt option key %0123456789abcdef0123456789abcdef subvolumes mycloud-client-0 end-volume
The key can be whatever you want, at 128/192/256 bits, and can also be stored in a file (use a first character of “/” instead of “%” in the volfile). One easy way to generate such a key is like this:
dd if=/dev/urandom bs=16 count=1 | od -tx1 -An | tr -d '\n '
In any case, the resulting volfile contains nothing but the two stanzas I’ve shown, in that order. All of the “performance” translators which can interfere with proper operation of the encryption code need to be stripped out.
At this point you should be able to mount using your custom volfile to have full encryption both on the network and on disk, with all of the keys safely on your own client system(s). Performance might not be all that great for some kinds of operations, but I’m using it myself and it seems adequate for most purposes. Some day I’ll finish the UID-mapping translator so that you can use this from different UIDs on different machines without getting permission errors, and I’ll finish the built-in SSL transport so you can connect directly instead of needing an ssh tunnel. Then it’ll be really cool, but you know what might be even cooler? With GlusterFS 3.2 you could even replicate this data across two different cloud providers, giving you all of that “multi cloud” goodness that has been all the rage since the EBS outage. With CloudFS 1.0 you’ll be able to do it an even better way, with better consistency and decent performance and so on. At that point I’ll seriously start to question the sanity of anyone who’s using some other solution that doesn’t offer the same performance and the same levels of data protection from both accident and intrusion, or which isn’t open-source like everything I’ve talked about.
For a long time now, I’ve been running a server in the Rackspace cloud to do various things for me, including a sort of Dropbox equivalent to which I sync various files I want to be accessible from everywhere. Historically this has involved a combination of sshfs and encfs, but it’s about time I started eating my own dog food and using (some parts of) CloudFS for this. Yes, I know some people don’t like the “dog food” terminology, but the oft-cited “drinking our own champagne” alternative is even less applicable in this case. I wrote it. It’s still dog food until I say otherwise. Even as I write this, I’m copying files from the old setup to the new one based on pretty vanilla GlusterFS plus the at-rest encryption translator from CloudFS, all mounted on my desktop at work. I’ll have to run that through an ssh tunnel for now – until I finish writing the SSL translator – to deal with authentication and in-flight encryption issues. Similarly, I need to finish the ID-mapping translator before I could recommend this for use by more than one person per machine. With those caveats, though, I’d still say that the result is usable and secure enough for my own purposes (including compliance with Red Hat’s infosec policies). If anybody else is interested in getting on the “personal CloudFS” bandwagon, I’ll post some detailed instructions on how you can do this yourself.
Extended attributes are one of the best kept secrets in modern filesystems. Here you have a fully general feature to attach additional information to files, supported by most modern filesystems, and yet hardly anybody seems to use it. As it turns out, though, GlusterFS uses extended attributes – xattrs – quite extensively to do almost all of the things that it does from replication to distribution to striping. Because they’re such a key part of how GlusterFS works, and yet so little understood outside of that context, xattrs have become the subject of quite a few questions which I’ll try to answer. First, though, let’s just review what you can do with xattrs. At the most basic level, an xattr consists of a string-valued key and a string or binary value – usually on the order of a few bytes up to a few dozen. There are operations to get/set xattrs, and to list them. This alone is sufficient to support all sorts of functionality. For example, SELinux security contexts and POSIX ACLs are both stored as xattrs, with the underlying filesystems not needing to know anything about their interpretation. In fact, I was just dealing with some issues around these kinds of xattrs today . . . but that’s a story for another time.
The sneaky bit here is that the act of getting or setting xattrs on a file can trigger any kind of action at the server where it lives, with the potential to pass information both in (via setxattr) and out (via getxattr). That amounts to a form of RPC which components at any level in the system can use without requiring special support from any of the other components in between, and this trick is used extensively throughout GlusterFS. For example, the rebalancing code uses a magic xattr call to trigger recalculation of the “layouts” that determine which files get placed on which servers (more about this in a minute). The “quick-read” translator uses a magic xattr call to simulate an open/read/close sequence – saving two out of three round trips for small files. There are several others, but I’m going to concentrate on just two: the trusted.glusterfs.dht xattr used by the DHT (distribution) translator, and the trusted.afr.* xattrs used by the AFR (replication) translator.
The way that DHT works is via consistent hashing, in which file names are hashed and the hashes looked up in a table where each range of hashes is assigned exactly one “brick” (volume on some server). This assignment is done on each directory when it’s created. Directories must exist on all bricks, with each copy having a distinct trusted.glusterfs.dht xattr describing what range of hash values it’s responsible for. This xattr contains the following (all 32-bit values):
- The count of ranges assigned to the brick (for this directory). This is always one currently, and other values simply won’t work.
- A format designator for the following ranges. Currently zero, but this time it’s not even checked so it doesn’t matter.
- For each range, a starting and ending hash value. Note that there’s no way in this scheme to specify a zero-length range, nor can ranges “wrap around” from 0xffffffff to 0.
When a directory is looked up, therefore, all the code needs to do is collect these xattrs and combine the ranges they contain into a table. There’s also code to look for gaps and overlaps, which seem to have been quite a problem lately. It doesn’t take long to see that there are some serious scalability issues with this approach, such as the requirement for directories to exist on every brick or the need to recalculate xattrs on every brick whenever new bricks are added or removed. I have to address these issues, but for now the scheme works pretty well.
The most complicated usage of xattrs is not in DHT but in AFR. Here, the key is the trusted.afr.* xattrs, where the * can be the name of any brick in the replica set other than the one where the xattr appears. Huh? Well, let’s say you have an AFR volume consisting of subvolumes test1-client-0 (on server0) and test1-client-1 (on server1). A file on test1-client-0 might therefore have an xattr called trusted.afr.test1-client-1. The reason for this is that the purpose of AFR is to recover from failures. Therefore, the state of an operation can’t just be recorded in the same place where a failure can wipe out both the operation and the record of it. Instead, operations are done one place and recorded everywhere else. (Yes, this is wasteful when there are more than two replicas; that’s another thing I plan to address some day). The way this information is stored is as “pending operation counts” with xattrs recording counts (each 32-bit) for three different kinds of operations:
- Data operations – mostly writes but also e.g. truncates
- Metadata operations – e.g. chmod/chown/chgrp, and xattrs (yes, this gets recursive)
- Namespace operations – create, delete, rename, etc.
Whenever a modifcation is made to the filesystem, the counters are updated everywhere first. In fact, GlusterFS defines a few extra xattr operations (e.g. atomic increments) just to support AFR. Once all of the counters have been incremented, the actual operation is sent to all replicas. As each node completes the operation, the counters everywhere else are decremented once more. Ultimately, all of the counters should go back to zero. If a node X crashes in the middle, or is unavailable to begin with, then every other replica’s counter for X will remain non-zero. This state can easily be recognized the next time the counters are fetched and compared – experienced GlusterFS users probably know that a stat() call will do this. The exact relationships between all of the counters will usually indicate which brick is out of date, so that “self-heal” can go in the right direction. The most fearsome case, it should be apparent, is when the xattrs for the same file/directory on two bricks have non-zero counters for each other. This is the infamous “split brain” case, which can be difficult or even impossible for the system to resolve automatically.
Those are the two most visible uses of xattrs in GlusterFS but, as I said before, there are others. For example, trusted.gfid is a sort of generation number used to detect duplication of inode numbers (because DHT’s local-to-global mapping function is prone to such duplication whenever the server set changes). My personal favorite is trusted.glusterfs.test which appears (with the value “working”) in the root directory of every brick. This is used as a “probe” to determine whether xattrs are supported, but then never cleaned up even after the probe has yielded its result. The result of all this xattr use and abuse is, of course, a confusing plethora of xattrs attached to everything in a GlusterFS volume. That’s why it’s so important when saving/restoring to use a method that handles xattrs properly. Hopefully, I’ve managed to show how crucial these little “extra” bits of information are and perhaps given people some ideas for how to spot or fix problems related to their use.
As the name implies, CloudFS is a file system for the cloud. What does that mean? First, it means that it’s a filesystem, with the behaviors that people – and programs – expect of filesystems, and not some completely different set of behaviors characteristic of a database or blob store or something else. Here are some examples:
- You access data in a filesystem by mounting it and issuing a familiar set of open/close/read/write calls so that every language and library and program under the sun that can use a filesystem can use this one.
- Files are arranged into directories which can be nested arbitrarily, without requiring the user to establish and follow some separate convention on top of a single-level hierarchy.
- The data model is a byte stream – not blocks, not records or rows – in which reads and writes can be done at any offset for any length.
- Performance and consistency for small writes in large files are not reduced to near zero by doing read/modify/write on whole files.
- Files and directories have owners, permissions, and other information associated with them besides their contents.
You might notice some things that are missing – e.g. locks or atomic cross-directory rename. That’s because most applications don’t rely on these features, often because they’re supported poorly or inconsistently by existing filesystems, and they’re things that anyone working specifically in the cloud should try to avoid. That last part, in turn, is because some features are impossible (or at least nearly so) to implement acceptable in the cloud. If nobody would be satisfied with the result anyway then trying is a waste of time – time that can be better spent implementing other features that really will be needed.
Many of the things that need a filesystem are not whole applications being developed from the ground up. using every possible filesystem feature. They’re libraries and frameworks that are used by other applications, and they just need basic filesystem functionality as I’ve outlined above. If you’ve constructed your application out of a dozen such pieces, the very last thing you want to do or should be doing is to dive into each and every one of these alien bits of code to make them use some other kind of data store instead. If you can build your entire application from the ground up to use something else that’s a better fit for your needs than a traditional filesystem, then that’s great. More power to you. Meanwhile, I believe that a much larger number of programmers would be better served by having a plain old-fashioned filesystem . . . albeit one that’s based on the latest technology.
OK, so much for the filesystem part. What about the cloud part? Aren’t there already filesystems you can use in the cloud? There sure are. In fact, I do exactly that myself all the time. It’s a fine thing. However, there are a few problems with doing things this way. One is that you have to manage the servers and their configuration yourself. Not only is that just one more burden as you’re trying to do Something Else, but it doesn’t allow you to take advantage of shared-service economies. Part of the cloud value proposition is supposed to be that when you aggregate resources across many users their individually unpredictable growth or bursts balance out (James Hamilton’s “non-correlated peaks” idea). The resulting aggregate predictability (plus the ability to amortize the cost of hiring real experts) allows things like capacity provisioning and monitoring to be done more efficiently in one place than if they had to be done separately by each user. Also, when you run heavy I/O within your computer resources you’re using the wrong tool for the job. It’s much better to run physical I/O on machines provisioned and tuned for that kind of thing than to run virtual I/O on machines provisioned and tuned for something else entirely. For all of these reasons, having the provider set up a filesystem as a permanent, shared resource is preferable to having each user set up their own . . . so long as it’s done in a way that preserves the users’ and providers’ needs. On the user side, you want to share between all of a user’s machines but you don’t want users reading each others’ data (let alone writing it). On the provider side, you need features such as quotas and accurate billing so users can actually be charged for what they use.
This brings us to the “why” of CloudFS: because sometimes people need a filesystem, because anything you put in the cloud as a shared service needs to meet certain requirements, and because none of the filesystems that might otherwise fit people’s needs meet those requirements. Without getting too deep into the details of exactly what features CloudFS providers to fill this gap – that will be my next post – I think it’s safe to say a big gap that needs filling.
Yesterday, I wrote about why there’s a need for CloudFS. Today, I’m going to write about how CloudFS attempts to satisfy that need. As before, there’s a “filesystem” part and a “cloud” part, which I’ll address in that order. Let’s start with the idea that writing a whole new cloud filesystem from scratch would be a mistake. It takes a lot of time and resources to develop, then it’s another long struggle to have it accepted by users. Instead, let’s start by considering what an existing filesystem would need to be like for us to use it as a base for further cloud-specific enhancements.
- It must support basic filesystem functionality, as outlined in the last post.
- It must be open source.
- It must be mature enough that users will at least consider trusting their data to it.
- It must be shared, i.e. accessible over a network.
- It must be horizontally scalable. Cloud economics are based on maximizing the ratio between users and resource pools, so the bigger we can make the pools the more everyone will benefit. Since this is a general-purpose and not HPC-specific environment, we need to consider metadata scalability (ops/second) as well as data (MB/second) and we should be very wary of “solutions” that centralize any kind of operation in one place ( distributing too broadly either).
- It must provide decent data protection, even on commodity hardware with only internal storage. I’m not even going to get into the technical merits of shared storage vs. “shared nothing” because the “fact on the ground” is that the people building large clouds prefer the latter regardless of what we might think.
- It should ideally support RDMA as well as TCP/IP. This is not strictly necessary, but is highly useful for a server-private “back-end” network and is becoming more and more relevant for “front end” networks as higher-performance cloud environments gain traction.
There are really only three production-level distributed/parallel filesystems that come close to these requirements: Lustre, PVFS2, and GlusterFS. There’s also pNFS, which is a protocol rather than a ready-to-use implementation and addresses neither metadata scaling nor data protection, and Ceph, which is in many ways more advanced than any of the others but is still too young to meet the “trust it in production” standard for most people. Then there are dozens of other projects, most of which either don’t provide basic filesystem behavior, aren’t sufficiently mature, and/or don’t provide real horizontal scalability. Some day one of those might well emerge and distinguish itself, but there’s work to be done now.
As tempting as it might be to go into detail about the precise reasons why I consider various alternatives unsuitable, I’m going to refrain. The key point is that GlusterFS is the only one that can indisputably claim to meet all of the requirements above. It does so at a cost, though. It is almost certainly the slowest of those I’ve mentioned, per node on equivalent hardware, at least as such things are traditionally measured. This is not primarily the result of it being based on FUSE, which is often maligned – without a shred of supporting evidence – as the source of every problem in filesystems based on it. The difference is minimal unless you’re using replication, which is necessary to provide data protection within the filesystem itself instead of falling back to the “just use RAID” and “just install failover software (even though we can’t tell you how)” excuses used by some of the others. Every one of these options has at least one major “deficiency” making it an imperfect fit for CloudFS, so the question becomes which problems you’d rather be left with when it comes time to deploy your filesystem as shared infrastructure. When you already have horizontal scalability, low per-node performance is a startlingly easy problem to work around.
The other main advantage of GlusterFS, compared to the others, is extensibility. Precisely because it’s FUSE-based, and because it has a modular “translator” interface besides, it makes extension such as we need to do much easier than if the new code had to be strongly intertwined with old code and debugged in the kernel. Many of the libraries we’ll need to use, such as encryption, are trivially available out in user-space but would require a major LKML fight to use from the kernel. No, thanks. The advantages of higher development “velocity” are hard to understate and are already apparent in how far GlusterFS has come relative to its competitors since about two years ago (when I tried it and it was awful). If somebody decides later on that they do want to wring every last bit of performance out of CloudFS by moving the implementation into the kernel, that’s still easier after the algorithms and protocols have been thoroughly debugged. I’ve been developing kernel code by starting in user space for a decade and a half, and don’t intend to stop now.
At this point, we’re ready to ask: what features are still missing? If we have decent scalability and decent data protection and so on, what must we still add before we can deploy the result as a shared resource in a cloud environment? It turns out that most of these features have to do with protection – with making it safe to attempt such a deployment.
- We need strong authentication, not so much for its own sake but because the cloud user’s identity is an essential input to some of the other features we need.
- We need encryption. Specifically, we need encryption to protect not only against other users seeing our data but also against the much more likely “insider threat” of the cloud provider doing so. This rules out the easy approaches that involve giving the provider either symmetric or private keys, no matter how securely such a thing might be done. Volume level encryption is out, both for that reason but also because users share volumes so there’s no one key to use. The only encryption that really works here is at the filesystem level and on the client side.
- We need namespace isolation – e.g. separate subdirectories for each cloud user.
- We need UID/GID isolation. Since we’re using a filesystem interface, each request is going to be tagged with a user ID and one or more group IDs which serve as the basis for access control. However, my UID=50 is not the same as your UID=50, nor is it the same as the provider’s UID=50. Nonetheless, my UID=50 needs to be stored as some unique UID within CloudFS on the server side and then converted back later.
- We need to be able to set and enforce a quota, to prevent users from consuming more space than they’re entitled to.
- We need to track usage, precisely, including space used and data transferred in/out, on a per-user basis for billing purposes.
This is where the translator architecture really comes into its own. Every one of these features can be implemented as a separate translator, although in some cases it might make sense to combine several functions into one translator. For example, the namespace and UID/GID isolation are similar enough that they can be done together, as are a quota and usage tracking. What we end up with is a system offering each user similar performance and similar security to what they’d have with a privately deployed filesystem using volume-level encryption on top of virtual block devices, but with only one actual filesystem that needs to be managed by the provider.