A jacket for a motorcycle is a bike gear with numerous functions. The work of a cloud storage developer involves long hours of sitting and being indoors just to ensure the programming language is on point without disappointing the clients. Riding a motorcycle is an unwinding activity to allow him to relieve anxiety and relax. A jacket comes in handy as a protective gear and a safety measure.
In some states, you are not allowed to ride without a jacket. Imagine of an impact, apart from the helmet the upper limbs are protected by the jacket made from a light mesh material for summer riding. The outdoor event causes health challenges, especially during the cold season. The extreme exposure of your chest to cold might cause asthma, pneumonia and other health related conditions.
Yes, you want to have an outdoor activity, and you feel riding is the best option. A helmet, boots, and a long pant are protective gears for all seasons.
What entails the best motorcycle jacket for a cloud storage developer?
Choose a jacket made from the material with the ability to regulate heat and temperature to enhance comfortability. The inner lining should be made of cotton renowned for this feature such that when it is very cold, the heat is absorbed towards the skin to keep the rider warm. When it is hot, it should also radiate the heat away from the body.
Are you able to ride during the rainy season and still have fun? This is only possible with a jacket with a waterproof lining, to keep you warm despite the winter season or rainy season. In addition, the waterproof ability allows you to adventure and ride in stormy and muddy waters without a hindrance.
What happens when you are in a rugged landscape? How comfortable will you be in the jacket? Ensure the material has a shockproof ability to withstand rough terrain at the same time the sliding material should protect you from bruises in the case of an impact or a fall.
The lighter the motorcycle jacket, the better the coziness. This is possible with a light material which does not compromise on the safety. The jacket has then two to three layers, each of the layers has a purpose with aesthetic value.
Riding in the dark is inevitable. A mechanical problem might arise or rather get lost in the course of your riding adventure. A good motorcycle jacket should have a reflector as a safety measure especially when riding on a busy highway. It alerts other motorists of your presence on the road.
Despite the safety and protection, never compromise on your taste of color. You are at liberty to choose a color of choice to match your passion and enhance your sense of fashion when riding.
Whether a cloud software developer rides as a workout plan or for fun or adventure, a jacket is a must-have clothing for protection and safety for him or her as well as prevent endangering other motorists and road users.
You judge a company by the interior and exterior decor. Right from the furniture to the fittings, all these have an impact on the attitude of the clients towards the company. A cloud software company should invest in the best heavy-duty router tables from a renowned woodworking website to make a choice.
In as much as it deals with the tech field where most of the virtual clients are, their office design should have a positive impact just like the applications they develop.
Cloud storage is the current digital solution in the storage of data for both personal and commercial use. With the correct applications, be sure of an influx of clients fighting to have space on your storage platform.
Ask yourself the following questions
· As they wait for service from your office, how comfortable are they in the woodwork furniture?
· What is the outlook of your office desk?
· Are your fittings in line with the color scheme?
· Does your woodwork design add aesthetic value?
These are the guiding principles of the best office design for your cloud software company. Here are some of the ideas to achieve this goal.
As a company you have a company color, ensure you spray your woodwork design with the color theme in mind to enhance the welcoming effect as well as the vision and the goal of the company. Different colors portray different effect and all these should be portrayed in your company merchandise.
You have to utilize the office space without it being cluttered or too empty due to a poor arrangement. In the setting, ensure there is free movement and joints of the woodwork furniture are correctly tacked on the wall to enhance durability and avoid minor accidents within the office.
Creativity and uniqueness of the designs
As a cloud software company, you take pride in creativity and originality of the application, depict this in the office furniture and woodwork designs in the office. Use online tools and renowned woodworker to make designs that every visitor in your office never fails to notice. It will give you an upper hand in closing a business deal. “If they invest in the best office designs, what of the products?” a client laments. It gives a client a sense of professionalism and integrity in service delivery of your products.
Imagine of a cloud software company with a detergent as a cover picture. You get the confusion in the relevance of the theme. Why not place a computer in one of your woodwork designs to enhance the applicability of your products?
You can also portray your creativity in woodwork designs in the partitions of your office. If you lease a business premise without partition, it is time to portray your woodwork skills by enhancing its beauty through unique partitions.
Woodwork is a vintage skill that finds relevance in the contemporary society. Although there are new methods, the incorporation of old designs still gives a cloud software company a facelift Despite the virtual nature of their work. Engage a professional woodworker or DIY projects.
Bathroom remodeling is one of the top ranked home improvement projects today. This is because it is one of the essential rooms that allow you to have an easier time as you do your private business. Modernizing your bathroom will not only increase your home’s value but also maximize its functionality, comfort, and style. Remodeling a bathroom is not a simple process especially if you do not do proper preparations. There are many things bathroom essentials to consider installing, the design plans, and the toilet brand to choose. Finding the right mix of the renovation planning and appliances to use will help you to build a bathroom that you have always wanted.
To start with, you need to consider the toilet brand to use. Today there are very many brands on the market. With so many products on the market, it is easy to get confused on the brand to select. One of the trusted brands is the Toto brand. This manufacturer is well known for producing functional and high-quality products that give your bathroom a unique and stylish look. Since the brand has a variety of models, it is easy to get the model that fits your demands.
One of the best toilet models available on the market is the corner toilet. This toilet has a great design that allows you to add a toilet right in the corner of your bathroom rather than along one of the walls. This toilet comes with a triangular tank that fits in the corner perfectly. Besides giving the bathroom with a stylish look, this toilet will also help you to save a lot of space in your bathroom. This is one of the best models to use especially if you have a smaller bathroom.
Another great model available is the tankless toilet. Most toilets come with a cistern on the back, making it occupy a large footprint in your bathroom. Unlike another type of toilets, the tankless models do not depend on gravity fed water tanks; instead, they rely on a pump to get rid of the waste. This means that there is no tank saving a lot of space in the bathroom.
Another reason for considering toilets from a reliable manufacturer is that all the models high quality and compact sizes making them excellent addition for any bathroom. On the other hand, the models are packed with features that make them stylish and functional.
Comparing the various brands on the market is not simple. If you are planning to go through various models and give a comprehensive review, the best way to do this is to compile the review as a video. You can post this video to a review site to help other clients when making an informed decision. Saving the video as a cloud file system of toilet brands/ companies comparison will ensure that your work is well stored and anyone can access it whenever they are trying to find the best toilet for their bathrooms
Since there are many brands on the market, it is easy to find a toilet and bathroom accessories that meet your requirement. Take your time to do research online for you to find a model that matches your bathroom décor and space requirement.
Most of us love taking part in outdoor games and recreation. However, cold weather in certain periods of the year such as the winter makes it hard enjoy the outdoor games. This is the time to enjoy the indoor games. On the other hand, some individuals prefer the indoor games. If you enjoy indoor games, then the ping-pong or table tennis game is a great choice. The ping-pong is similar to the tennis game only that you play it on a table instead of a court. The game has a net at the center, which marks the boundary between you and your opponent.
The ping-pong game is a great game to spend quality time with your family. If you have children in your home, it is important to understand that they desire to spend time with their parents. Family time is important when it comes to the development of your children both intellectually and psychologically. Besides assisting in the development of your children, it also helps in optimizing your family bonds. This article will discuss some of the benefits playing ping-pong game with your family.
One of the benefits of playing this game is that it offers your children with learning opportunities. This is a great family game fun at home provides you with an entertaining way to learn new skills. There are many essential skills and benchmark concepts that you pass to your children when playing this game. Fist this game will provide your kids with the ability to optimize their ability to recognize certain words and letters which in turns improve your kid is reading skills. When playing this game, your child will also benefit from color recognition and improvement of visuals.
Besides enhancing your children intellectual growth, this game will also improve your child social skills. Since the game requires you to communicate verbally, your child will learn how to express themselves when out with their children. By interacting with other family members, the game will provide the child with an opportunity to open up and communicate openly with other members of the family. This will foster relationship within the family unit and build closer bonds between you and your family.
Besides bringing your family together, your child will learn how to play a game that he/she enjoys. Through the exercise, your child will gain all the basic knowledge of the game, and this can give them the skill they need to compete with other kids. This will also give your child a favorite pastime activity, and this can keep them occupied. Additionally, ping-pong game will help the children to exercise. With the rise of obesity in children, playing ping-pong is very helpful when it comes to exercising. Exercise will keep their body fit and avoid the diseases that are associated with being overweight.
Today’s technology has enhanced the convenience of doing things. You can record the game and store it so that you can remember the fun time you spent with your family. Today there are many storage devices that you can keep the data. You can decide to keep the data on your computer’s hard disk or memory cards. However, your computer can crash, and this may make you lose all your data. The Cloud storage can also save Ping-pong match video. Saving the video in cloud storage will allow you to keep it safe and you can retrieve it anytime you want. This technology allows you to save your videos for a lifetime without any risk of losing them.
My previous dog food post did generate a couple of requests for more detailed instructions on how to build your own private cloud storage repository, so here goes.
First, you need to have a server somewhere that will always be on, or at least on as much of the time as you want to access your cloud storage. I happen to have a server in the Rackspace Cloud that I leave running for various purposes at $10/month. If you already have a cloud server or VPS that you’re running somewhere, or a home/work machine, that works fine too.
Next, you need to be running GlusterFS on your server and clients. There are already pre-packaged versions available for some Linux distributions, though some are more current than others. For example, Fedora is at 3.1.3 which is pretty close to current, while Ubuntu is at 3.0.5 which is simply too old to be useful. Downloads are also available from Gluster themselves and elsewhere. However, you’re highly likely to run into a problem with many of the pre-packaged versions. Prior to 3.1.4, the servers would not accept connections from non-privileged ports and if you’re using any sort of NAT/tunnel/VPN you’ll probably be assigned just such a port. At least in 3.1.4, they added options to override this misguided code (which doesn’t even check the port’s privilege status the right way BTW) but my choice was to remove it entirely and build my own RPMs. I’ll provide patches and fixed RPMs soon; bug me if I forget. Anyway, install whatever 3.1+ version you want and move on to the next step.
At first, you’ll want to create your volumes and start your server-side daemons the “normal” way, something like this.
# mkfs -t ext4 /dev/sdwhatever # mkdir -p /bricks/mycloud # mount -o noatime,user_xattr /dev/sdwhatever /bricks/mybrick1 # gluster volume create mycloud my.external.ip.address:/bricks/mybrick1 # gluster volume start mycloud
At this point, you would actually be able to mount the “mycloud” volume from any GlusterFS client, unless you have firewall issues (which you often will in a public cloud including Rackspace). Here’s one simple way to get around that, though I actually don’t recommend this method for reasons I’ll get to in a moment.
# netstat -lpn | grep glusterfsd # iptables -I INPUT -p tcp --dport 24009 -j ACCEPT
24009 is just the port I happened to get. Also, you’d probably want to make the IP tables rule more specific to the hosts you use, save it as part of your private config, yadda yadda. However, the reason you probably don’t want to do this is that it provides no security at all. Instead, you’ll want to run this through some sort of tunnel, and in that case, you’ll effectively be making a local connection to the server. Here’s how you’d do it with ssh.
# ssh -L24009:localhost:24009 email@example.com
The other part of this is avoiding some of Gluster’s port-mapping silliness. Grab the client-side “volfile” describing the mount from /etc/glusterd/vols/mycloud/mycloud-fuse.vol and copy it to whatever machines you’ll be mounting on. Edit the “protocol/client” part of the copy to look something like this:
volume mycloud-client-0 type protocol/client option remote-host localhost option remote-port 24009 option remote-subvolume /bricks/mybrick1 option transport-type tcp end-volume
Now your mount command can change something like this.
# mount -t glusterfs my.cloud.server:mycloud /mnt/mycloud # glusterfs way = wrong # mount -f ~/mycloud-fuse.vol /mnt/mycloud # cloudfs way = right
At this point, you have a remotely accessible directory with encrypted communication, but data on the remote server that’s still accessible to anyone who can get into that server. In other words, you’ll have something practically equivalent to most of the cloud-storage offerings out there – including the ones you pay for. What you need, both to preserve your privacy and to comply with many companies’ information security policies (such as Red Hat’s), is pure client-side encryption. That’s where CloudFS comes in . . . or at least part of it. At this point, you need to be a bit of a coder or fairly advanced sysadmin, because I’m going to make you pull the code and build your own RPM. I’ll provide some RPMs shortly; also, note that you specifically want the “aes” branch for now. To make it work for you, you’ll need to edit that client-side volfile again, adding a new “encryption/crypt” stanza at the end like this:
volume mycloud-crypt type encryption/crypt option key %0123456789abcdef0123456789abcdef subvolumes mycloud-client-0 end-volume
The key can be whatever you want, at 128/192/256 bits, and can also be stored in a file (use a first character of “/” instead of “%” in the volfile). One easy way to generate such a key is like this:
dd if=/dev/urandom bs=16 count=1 | od -tx1 -An | tr -d '\n '
In any case, the resulting volfile contains nothing but the two stanzas I’ve shown, in that order. All of the “performance” translators which can interfere with proper operation of the encryption code need to be stripped out.
At this point you should be able to mount using your custom volfile to have full encryption both on the network and on disk, with all of the keys safely on your own client system(s). Performance might not be all that great for some kinds of operations, but I’m using it myself and it seems adequate for most purposes. Some day I’ll finish the UID-mapping translator so that you can use this from different UIDs on different machines without getting permission errors, and I’ll finish the built-in SSL transport so you can connect directly instead of needing an ssh tunnel. Then it’ll be really cool, but you know what might be even cooler? With GlusterFS 3.2 you could even replicate this data across two different cloud providers, giving you all of that “multi cloud” goodness that has been all the rage since the EBS outage. With CloudFS 1.0 you’ll be able to do it an even better way, with better consistency and decent performance and so on. At that point I’ll seriously start to question the sanity of anyone who’s using some other solution that doesn’t offer the same performance and the same levels of data protection from both accident and intrusion, or which isn’t open-source like everything I’ve talked about.
For a long time now, I’ve been running a server in the Rackspace cloud to do various things for me, including a sort of Dropbox equivalent to which I sync various files I want to be accessible from everywhere. Historically this has involved a combination of sshfs and encfs, but it’s about time I started eating my own dog food and using (some parts of) CloudFS for this. Yes, I know some people don’t like the “dog food” terminology, but the oft-cited “drinking our own champagne” alternative is even less applicable in this case. I wrote it. It’s still dog food until I say otherwise. Even as I write this, I’m copying files from the old setup to the new one based on pretty vanilla GlusterFS plus the at-rest encryption translator from CloudFS, all mounted on my desktop at work. I’ll have to run that through an ssh tunnel for now – until I finish writing the SSL translator – to deal with authentication and in-flight encryption issues. Similarly, I need to finish the ID-mapping translator before I could recommend this for use by more than one person per machine. With those caveats, though, I’d still say that the result is usable and secure enough for my own purposes (including compliance with Red Hat’s infosec policies). If anybody else is interested in getting on the “personal CloudFS” bandwagon, I’ll post some detailed instructions on how you can do this yourself.
Extended attributes are one of the best kept secrets in modern filesystems. Here you have a fully general feature to attach additional information to files, supported by most modern filesystems, and yet hardly anybody seems to use it. As it turns out, though, GlusterFS uses extended attributes – xattrs – quite extensively to do almost all of the things that it does from replication to distribution to striping. Because they’re such a key part of how GlusterFS works, and yet so little understood outside of that context, xattrs have become the subject of quite a few questions which I’ll try to answer. First, though, let’s just review what you can do with xattrs. At the most basic level, an xattr consists of a string-valued key and a string or binary value – usually on the order of a few bytes up to a few dozen. There are operations to get/set xattrs, and to list them. This alone is sufficient to support all sorts of functionality. For example, SELinux security contexts and POSIX ACLs are both stored as xattrs, with the underlying filesystems not needing to know anything about their interpretation. In fact, I was just dealing with some issues around these kinds of xattrs today . . . but that’s a story for another time.
The sneaky bit here is that the act of getting or setting xattrs on a file can trigger any kind of action at the server where it lives, with the potential to pass information both in (via setxattr) and out (via getxattr). That amounts to a form of RPC which components at any level in the system can use without requiring special support from any of the other components in between, and this trick is used extensively throughout GlusterFS. For example, the rebalancing code uses a magic xattr call to trigger recalculation of the “layouts” that determine which files get placed on which servers (more about this in a minute). The “quick-read” translator uses a magic xattr call to simulate an open/read/close sequence – saving two out of three round trips for small files. There are several others, but I’m going to concentrate on just two: the trusted.glusterfs.dht xattr used by the DHT (distribution) translator, and the trusted.afr.* xattrs used by the AFR (replication) translator.
The way that DHT works is via consistent hashing, in which file names are hashed and the hashes looked up in a table where each range of hashes is assigned exactly one “brick” (volume on some server). This assignment is done on each directory when it’s created. Directories must exist on all bricks, with each copy having a distinct trusted.glusterfs.dht xattr describing what range of hash values it’s responsible for. This xattr contains the following (all 32-bit values):
- The count of ranges assigned to the brick (for this directory). This is always one currently, and other values simply won’t work.
- A format designator for the following ranges. Currently zero, but this time it’s not even checked so it doesn’t matter.
- For each range, a starting and ending hash value. Note that there’s no way in this scheme to specify a zero-length range, nor can ranges “wrap around” from 0xffffffff to 0.
When a directory is looked up, therefore, all the code needs to do is collect these xattrs and combine the ranges they contain into a table. There’s also code to look for gaps and overlaps, which seem to have been quite a problem lately. It doesn’t take long to see that there are some serious scalability issues with this approach, such as the requirement for directories to exist on every brick or the need to recalculate xattrs on every brick whenever new bricks are added or removed. I have to address these issues, but for now the scheme works pretty well.
The most complicated usage of xattrs is not in DHT but in AFR. Here, the key is the trusted.afr.* xattrs, where the * can be the name of any brick in the replica set other than the one where the xattr appears. Huh? Well, let’s say you have an AFR volume consisting of subvolumes test1-client-0 (on server0) and test1-client-1 (on server1). A file on test1-client-0 might therefore have an xattr called trusted.afr.test1-client-1. The reason for this is that the purpose of AFR is to recover from failures. Therefore, the state of an operation can’t just be recorded in the same place where a failure can wipe out both the operation and the record of it. Instead, operations are done one place and recorded everywhere else. (Yes, this is wasteful when there are more than two replicas; that’s another thing I plan to address some day). The way this information is stored is as “pending operation counts” with xattrs recording counts (each 32-bit) for three different kinds of operations:
- Data operations – mostly writes but also e.g. truncates
- Metadata operations – e.g. chmod/chown/chgrp, and xattrs (yes, this gets recursive)
- Namespace operations – create, delete, rename, etc.
Whenever a modifcation is made to the filesystem, the counters are updated everywhere first. In fact, GlusterFS defines a few extra xattr operations (e.g. atomic increments) just to support AFR. Once all of the counters have been incremented, the actual operation is sent to all replicas. As each node completes the operation, the counters everywhere else are decremented once more. Ultimately, all of the counters should go back to zero. If a node X crashes in the middle, or is unavailable to begin with, then every other replica’s counter for X will remain non-zero. This state can easily be recognized the next time the counters are fetched and compared – experienced GlusterFS users probably know that a stat() call will do this. The exact relationships between all of the counters will usually indicate which brick is out of date, so that “self-heal” can go in the right direction. The most fearsome case, it should be apparent, is when the xattrs for the same file/directory on two bricks have non-zero counters for each other. This is the infamous “split brain” case, which can be difficult or even impossible for the system to resolve automatically.
Those are the two most visible uses of xattrs in GlusterFS but, as I said before, there are others. For example, trusted.gfid is a sort of generation number used to detect duplication of inode numbers (because DHT’s local-to-global mapping function is prone to such duplication whenever the server set changes). My personal favorite is trusted.glusterfs.test which appears (with the value “working”) in the root directory of every brick. This is used as a “probe” to determine whether xattrs are supported, but then never cleaned up even after the probe has yielded its result. The result of all this xattr use and abuse is, of course, a confusing plethora of xattrs attached to everything in a GlusterFS volume. That’s why it’s so important when saving/restoring to use a method that handles xattrs properly. Hopefully, I’ve managed to show how crucial these little “extra” bits of information are and perhaps given people some ideas for how to spot or fix problems related to their use.
As the name implies, CloudFS is a file system for the cloud. What does that mean? First, it means that it’s a filesystem, with the behaviors that people – and programs – expect of filesystems, and not some completely different set of behaviors characteristic of a database or blob store or something else. Here are some examples:
- You access data in a filesystem by mounting it and issuing a familiar set of open/close/read/write calls so that every language and library and program under the sun that can use a filesystem can use this one.
- Files are arranged into directories which can be nested arbitrarily, without requiring the user to establish and follow some separate convention on top of a single-level hierarchy.
- The data model is a byte stream – not blocks, not records or rows – in which reads and writes can be done at any offset for any length.
- Performance and consistency for small writes in large files are not reduced to near zero by doing read/modify/write on whole files.
- Files and directories have owners, permissions, and other information associated with them besides their contents.
You might notice some things that are missing – e.g. locks or atomic cross-directory rename. That’s because most applications don’t rely on these features, often because they’re supported poorly or inconsistently by existing filesystems, and they’re things that anyone working specifically in the cloud should try to avoid. That last part, in turn, is because some features are impossible (or at least nearly so) to implement acceptable in the cloud. If nobody would be satisfied with the result anyway then trying is a waste of time – time that can be better spent implementing other features that really will be needed.
Many of the things that need a filesystem are not whole applications being developed from the ground up. using every possible filesystem feature. They’re libraries and frameworks that are used by other applications, and they just need basic filesystem functionality as I’ve outlined above. If you’ve constructed your application out of a dozen such pieces, the very last thing you want to do or should be doing is to dive into each and every one of these alien bits of code to make them use some other kind of data store instead. If you can build your entire application from the ground up to use something else that’s a better fit for your needs than a traditional filesystem, then that’s great. More power to you. Meanwhile, I believe that a much larger number of programmers would be better served by having a plain old-fashioned filesystem . . . albeit one that’s based on the latest technology.
OK, so much for the filesystem part. What about the cloud part? Aren’t there already filesystems you can use in the cloud? There sure are. In fact, I do exactly that myself all the time. It’s a fine thing. However, there are a few problems with doing things this way. One is that you have to manage the servers and their configuration yourself. Not only is that just one more burden as you’re trying to do Something Else, but it doesn’t allow you to take advantage of shared-service economies. Part of the cloud value proposition is supposed to be that when you aggregate resources across many users their individually unpredictable growth or bursts balance out (James Hamilton’s “non-correlated peaks” idea). The resulting aggregate predictability (plus the ability to amortize the cost of hiring real experts) allows things like capacity provisioning and monitoring to be done more efficiently in one place than if they had to be done separately by each user. Also, when you run heavy I/O within your computer resources you’re using the wrong tool for the job. It’s much better to run physical I/O on machines provisioned and tuned for that kind of thing than to run virtual I/O on machines provisioned and tuned for something else entirely. For all of these reasons, having the provider set up a filesystem as a permanent, shared resource is preferable to having each user set up their own . . . so long as it’s done in a way that preserves the users’ and providers’ needs. On the user side, you want to share between all of a user’s machines but you don’t want users reading each others’ data (let alone writing it). On the provider side, you need features such as quotas and accurate billing so users can actually be charged for what they use.
This brings us to the “why” of CloudFS: because sometimes people need a filesystem, because anything you put in the cloud as a shared service needs to meet certain requirements, and because none of the filesystems that might otherwise fit people’s needs meet those requirements. Without getting too deep into the details of exactly what features CloudFS providers to fill this gap – that will be my next post – I think it’s safe to say a big gap that needs filling.
Yesterday, I wrote about why there’s a need for CloudFS. Today, I’m going to write about how CloudFS attempts to satisfy that need. As before, there’s a “filesystem” part and a “cloud” part, which I’ll address in that order. Let’s start with the idea that writing a whole new cloud filesystem from scratch would be a mistake. It takes a lot of time and resources to develop, then it’s another long struggle to have it accepted by users. Instead, let’s start by considering what an existing filesystem would need to be like for us to use it as a base for further cloud-specific enhancements.
- It must support basic filesystem functionality, as outlined in the last post.
- It must be open source.
- It must be mature enough that users will at least consider trusting their data to it.
- It must be shared, i.e. accessible over a network.
- It must be horizontally scalable. Cloud economics are based on maximizing the ratio between users and resource pools, so the bigger we can make the pools the more everyone will benefit. Since this is a general-purpose and not HPC-specific environment, we need to consider metadata scalability (ops/second) as well as data (MB/second) and we should be very wary of “solutions” that centralize any kind of operation in one place ( distributing too broadly either).
- It must provide decent data protection, even on commodity hardware with only internal storage. I’m not even going to get into the technical merits of shared storage vs. “shared nothing” because the “fact on the ground” is that the people building large clouds prefer the latter regardless of what we might think.
- It should ideally support RDMA as well as TCP/IP. This is not strictly necessary, but is highly useful for a server-private “back-end” network and is becoming more and more relevant for “front end” networks as higher-performance cloud environments gain traction.
There are really only three production-level distributed/parallel filesystems that come close to these requirements: Lustre, PVFS2, and GlusterFS. There’s also pNFS, which is a protocol rather than a ready-to-use implementation and addresses neither metadata scaling nor data protection, and Ceph, which is in many ways more advanced than any of the others but is still too young to meet the “trust it in production” standard for most people. Then there are dozens of other projects, most of which either don’t provide basic filesystem behavior, aren’t sufficiently mature, and/or don’t provide real horizontal scalability. Some day one of those might well emerge and distinguish itself, but there’s work to be done now.
As tempting as it might be to go into detail about the precise reasons why I consider various alternatives unsuitable, I’m going to refrain. The key point is that GlusterFS is the only one that can indisputably claim to meet all of the requirements above. It does so at a cost, though. It is almost certainly the slowest of those I’ve mentioned, per node on equivalent hardware, at least as such things are traditionally measured. This is not primarily the result of it being based on FUSE, which is often maligned – without a shred of supporting evidence – as the source of every problem in filesystems based on it. The difference is minimal unless you’re using replication, which is necessary to provide data protection within the filesystem itself instead of falling back to the “just use RAID” and “just install failover software (even though we can’t tell you how)” excuses used by some of the others. Every one of these options has at least one major “deficiency” making it an imperfect fit for CloudFS, so the question becomes which problems you’d rather be left with when it comes time to deploy your filesystem as shared infrastructure. When you already have horizontal scalability, low per-node performance is a startlingly easy problem to work around.
The other main advantage of GlusterFS, compared to the others, is extensibility. Precisely because it’s FUSE-based, and because it has a modular “translator” interface besides, it makes extension such as we need to do much easier than if the new code had to be strongly intertwined with old code and debugged in the kernel. Many of the libraries we’ll need to use, such as encryption, are trivially available out in user-space but would require a major LKML fight to use from the kernel. No, thanks. The advantages of higher development “velocity” are hard to understate and are already apparent in how far GlusterFS has come relative to its competitors since about two years ago (when I tried it and it was awful). If somebody decides later on that they do want to wring every last bit of performance out of CloudFS by moving the implementation into the kernel, that’s still easier after the algorithms and protocols have been thoroughly debugged. I’ve been developing kernel code by starting in user space for a decade and a half, and don’t intend to stop now.
At this point, we’re ready to ask: what features are still missing? If we have decent scalability and decent data protection and so on, what must we still add before we can deploy the result as a shared resource in a cloud environment? It turns out that most of these features have to do with protection – with making it safe to attempt such a deployment.
- We need strong authentication, not so much for its own sake but because the cloud user’s identity is an essential input to some of the other features we need.
- We need encryption. Specifically, we need encryption to protect not only against other users seeing our data but also against the much more likely “insider threat” of the cloud provider doing so. This rules out the easy approaches that involve giving the provider either symmetric or private keys, no matter how securely such a thing might be done. Volume level encryption is out, both for that reason but also because users share volumes so there’s no one key to use. The only encryption that really works here is at the filesystem level and on the client side.
- We need namespace isolation – e.g. separate subdirectories for each cloud user.
- We need UID/GID isolation. Since we’re using a filesystem interface, each request is going to be tagged with a user ID and one or more group IDs which serve as the basis for access control. However, my UID=50 is not the same as your UID=50, nor is it the same as the provider’s UID=50. Nonetheless, my UID=50 needs to be stored as some unique UID within CloudFS on the server side and then converted back later.
- We need to be able to set and enforce a quota, to prevent users from consuming more space than they’re entitled to.
- We need to track usage, precisely, including space used and data transferred in/out, on a per-user basis for billing purposes.
This is where the translator architecture really comes into its own. Every one of these features can be implemented as a separate translator, although in some cases it might make sense to combine several functions into one translator. For example, the namespace and UID/GID isolation are similar enough that they can be done together, as are a quota and usage tracking. What we end up with is a system offering each user similar performance and similar security to what they’d have with a privately deployed filesystem using volume-level encryption on top of virtual block devices, but with only one actual filesystem that needs to be managed by the provider.