POSIX Limitations in FUSE

Because GlusterFS (and thus CloudFS) is based on FUSE, people often bring up the issue of FUSE limitations. Most often their concerns are about performance, and I think those concerns are a bit misplaced. Modern versions of FUSE do quite well, and Sage Weil even points out that some FUSE filesystems such as PLFS significantly outperform their native cousins for many important workloads. FUSE performance is fine, especially for a horizontally scalable distributed filesystem.

As Sage also points out, though, there are some functional issues with FUSE. I think he overestimates the importance of integrating tightly with the kernel for memory management and cache coherency, but the problems are real, so it’s worthwhile to understand where the “rough edges” are. This post is my attempt to put together a list of ways in which FUSE filesystems might violate either POSIX standards or people’s expectations based on those standards. Here are the first few things that come to mind.

Many thanks to Anand Avati at Gluster for re-explaining some of these to me, providing updated information about others, and helping to fill out the list.

  • Shared writable mmap
    This is the best known FUSE limitation, because it really does bring the coherency and memory-management issues to the fore. Nonetheless, versions of FUSE since Linux kernel 2.6.27 do support it. I happen to believe shared writable mmap is something you shouldn’t be doing on a distributed filesystem because it will never perform well and introduces some extremely nasty fault-handling problems, but for some people it’s still a real issue and not solved in the versions of FUSE that they have.
  • Atomic rename
    This is more of a distributed-filesystem issue generally, but there is a FUSE component to it as well. In a nutshell, the problem is that POSIX requires that if a file exists at a certain path before a rename, then users must be able to open that file at any point even if it’s the subject of a rename. Unfortunately, since FUSE uses a handle-based interface, the open is actually in two parts – first a lookup to get the handle, then the open itself. The file could be renamed away in between, causing a POSIX-violating failure on the second part. This is really hard to address without speculatively locking the entire directory, which is just nasty in a whole bunch of ways.
  • Forgetting inodes
    Kernels have the right to evict inodes from their caches under pressure, but this can introduce a problem if the inode evicted on a server is still in use on a client. The result is a spurious ENOENT error on the client. Again, FUSE has actually addressed this – long before the mmap fixes, in fact – with some callbacks to notify user space, but these callbacks are not widely used and GlusterFS specifically doesn’t have those hooks yet. NFS doesn’t always handle these cases well either, by the way.
  • O_DIRECT
    OK, this one’s not POSIX, but still. FUSE actively filters out O_DIRECT flags, for reasons that escape me at the moment. Gluster has a FUSE patch that will turn O_DIRECT into something else that FUSE does support and that’s nearly equivalent, and just yesterday Anand Avati submitted a second patch that is even more fully integrated with the rest of how FUSE works, so maybe soon people won’t have to choose between stock FUSE and FUSE that supports O_DIRECT.

Now, I know this list is incomplete. Are there any other areas people can think of where FUSE filesystems can’t do things that in-kernel filesystems can? Please let me know in the comments so we can have a comprehensive list and point people to it when they ask.
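
For anyone who wants to probe the atomic-rename item on their own mount, here is a minimal Python sketch of a stress test. It exercises one form of the guarantee that is easy to check from user space: while new files are repeatedly renamed over an existing name, a concurrent reader should never fail to open that name. The function names (`count_failed_opens`, `check_local_temp_directory`) and the file names are made up for illustration, and a non-zero count only says that some open failed, not which layer was responsible.

```python
import os
import tempfile
import threading


def count_failed_opens(directory, iterations=2000):
    """Rename new files over `target` while a reader keeps opening it.

    Returns how many opens failed with FileNotFoundError. On a filesystem
    that honors atomic rename, the expectation is that none do.
    """
    target = os.path.join(directory, "target")
    scratch = os.path.join(directory, "scratch")
    with open(target, "w") as f:
        f.write("initial\n")

    done = threading.Event()
    failures = 0

    def reader():
        nonlocal failures
        while not done.is_set():
            try:
                with open(target) as f:
                    f.read()
            except FileNotFoundError:
                failures += 1

    thread = threading.Thread(target=reader)
    thread.start()
    try:
        for i in range(iterations):
            with open(scratch, "w") as f:
                f.write(f"version {i}\n")
            # os.replace maps to rename(2): the old target is replaced in one step.
            os.replace(scratch, target)
    finally:
        done.set()
        thread.join()
    return failures


def check_local_temp_directory(iterations=2000):
    """Run the test in a throwaway directory on the default temp filesystem."""
    with tempfile.TemporaryDirectory() as directory:
        return count_failed_opens(directory, iterations)
```

To test a FUSE mount rather than the local temp filesystem, call `count_failed_opens` with a directory on that mount. A single client on a single machine is the gentlest case; the distributed version of the problem would need the renamer and the reader on different clients.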

 

2 Responses

  1. Jeff Darcy says:

    The folks in #gluster immediately pounced on another FUSE issue: auxiliary GIDs. There’s just no place for them in the FUSE API, so Gluster had to work around the limitation by reading /proc/$pid/status and parsing the result into a list. I guess it works, but it’s certainly less efficient than how a kernel filesystem can deal with the same issue.
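
To give a feel for what that workaround involves, here is a minimal Python sketch that pulls the supplementary group list out of the `Groups:` line of `/proc/<pid>/status`. This is not Gluster's code, which is written in C; the function names are hypothetical, and the sketch assumes a Linux `/proc` where that line holds whitespace-separated numeric GIDs.

```python
def parse_supplementary_gids(status_text):
    """Return the GIDs listed on the 'Groups:' line of a /proc status dump."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # Everything after the label is a whitespace-separated GID list.
            return [int(field) for field in line.split()[1:]]
    return []


def supplementary_gids_of(pid):
    """Read and parse /proc/<pid>/status for the given process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_supplementary_gids(status_file.read())
```

Calling `supplementary_gids_of(os.getpid())` after `import os` lists the caller's own groups. The cost is easy to see: every lookup is an open, a read, and a text parse of a file, where an in-kernel filesystem can read the same list straight from the calling task's credentials.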

  2. Stef Bon says:

    FUSE is not aware of inotify.

    Setting a watch on a FUSE filesystem works, and changes are reported as long as they are performed on the mounted fs.

    But when something changes in the backend, which can happen when it’s shared, it’s not reported. This is because FUSE is not aware that an inotify watch is set on an inode. If FUSE “knew” that, it could watch the corresponding item in the backend, react to changes there, and make the VFS aware that something has changed.