User Space Filesystems

Apparently Linus has made another of his grand pronouncements, on a subject relevant to this project (thanks to Pete Zaitcev for bringing it to my attention).

"People who think that userspace filesystems are realistic for anything but toys are just misguided."

I beg to differ, on the basis that many people are deploying user-space filesystems in production to good effect, and that by definition means they're not toys. Besides the obvious example of GlusterFS, PVFS2 is almost entirely in user space and it has been used to solve some very serious problems on some seriously large systems for years. Everything Linus has worked on is a toy compared to this. There are several other examples, but that one should be sufficient.

So where does Linus's dismissive attitude come from? Only he can say, of course, but I've seen the same attitude from many kernel hackers and in many cases I do know where it comes from. A lot of people who have focused their attention on the minutiae of what's going on inside processors and memory and interrupt controllers tend to lose track of things that might happen past the edge of the motherboard. This is a constant annoyance to people who work on external networking or storage, and the problem is particularly acute with distributed systems that involve both. Sure, there are inefficiencies in moving I/O out to user space, but those can be positively dwarfed by inefficiencies that occur between systems. A kernel implementation of a bad distributed algorithm is most emphatically not going to beat a user-space implementation of a better one. When you're already dealing with the constraints of a high-performance distributed system, having to deal with the additional constraints of working in the kernel might actually slow you down. It's not that it can't be done; it's just not the best way to address that class of problems.

The inefficiency of moving I/O out to user space is also somewhat self-inflicted. A lot of that inefficiency has to do with data copies, but let's consider the possibility that there might be fewer such copies if there were better ways for user-space code to specify actions on buffers that it can't actually access directly. We actually implemented some of these at Revivio, and they worked. Why aren't such things part of the mainline kernel? Because the gatekeepers don't want them to be. Linus's hatred of microkernels and anything like them is old and well known. Many other kernel developers have similar attitudes. If they think a feature only has one significant use case, and it's a use case they oppose for other reasons, are they going to be supportive of work to provide that feature? Of course not. They're going to reject it as needless bloat and complexity, which shouldn't be allowed to affect the streamlined code paths that exist to do things the way they think things should be done. There's not actually anything wrong with that, but it does mean that when they claim that user-space filesystems will incur unnecessary overhead they're not expressing an essential truth about user-space filesystems. They're expressing a truth about their support of user-space filesystems in Linux, which is quite different.
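To give a flavor of what "specifying actions on buffers you can't access directly" might look like, here is a minimal sketch. It is not the Revivio mechanism, which isn't described in this post. It uses the existing Linux sendfile(2) call through Python's os.sendfile, which covers only one narrow case (moving bytes from a regular file to another descriptor). The function name copy_without_touching and the chunk size are made up for the illustration, and the sketch assumes a Linux kernel recent enough to allow a regular file as the destination.

```python
import os
import tempfile


def copy_without_touching(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst by telling the kernel which bytes to move.

    The process only hands over file descriptors, an offset and a length;
    the data itself is never read into a user-space buffer here.
    (Hypothetical helper for illustration; assumes Linux sendfile(2).)
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        while total < size:
            # Ask the kernel to move up to `chunk` bytes starting at `total`.
            sent = os.sendfile(dst.fileno(), src.fileno(), total,
                               min(chunk, size - total))
            if sent == 0:  # source ended early (e.g. it was truncated)
                break
            total += sent
    return total


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "src.bin")
    dst = os.path.join(workdir, "dst.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(256 * 1024))
    moved = copy_without_touching(src, dst)
    print("bytes moved by the kernel:", moved)
```

The point of the sketch is only the shape of the interface: user space names the source, destination and range, and the kernel does the moving. A filesystem daemon would need something much more general than this one call.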

A lot of user-space filesystems, perhaps even a majority, really are toys. Then again, is anybody using kernel-based exofs or omfs more seriously than Argonne is using PVFS? If you make something easier to do, more people will do it. Not all of those people will be as skilled as those who would have done it The Hard Way. FUSE has definitely made it easier to write filesystems, and a lot of tyros have made toys with it, but it's also possible for serious people to make serious filesystems with it. Remember, a lot of people once thought Linux and the machines it ran on were toys. Many still are, even literally. I always thought that broadening the community and encouraging experimentation were supposed to be good things, without which Linux itself wouldn't have succeeded. Apparently I'm misguided.