Imagine SocialFS…
I’ve been kicking around the idea of a web-app file system for quite some time, but I haven’t been able to find the time to work up a head of steam on any implementation details. Now that I’m revisiting the idea for a graduate class, I keep finding little pieces of this vision slowly creeping out elsewhere, like this recent WWD article: Data Portability and the File System.
A great example of what I mean by a web-app file system is flickrfs, which is really only a couple of Python scripts that glue the Flickr API to a FUSE implementation. It allows a user to mount their Flickr account as a local file system and interact with it like it’s just another thumbdrive. While this is a clever hack, its tight coupling to Flickr means it’s certainly not the answer to data portability or the Open Web, but it’s a useful demonstration of what could be…
Imagine for a second that on your desktop sat another thumbdrive-looking volume, but this one’s got some special sauce. Mounting it just requires pointing to a resource (e.g. a personal domain: http://home.deanlandolt.com/), and at this location sits a pretty simple piece of software that exposes its contents according to how you connect. You could drop files and folders in there, and they would be available to anyone who could mount this share (OpenID for authentication, naturally). Of course, this is still nothing special, and has been done to death. But let’s take this a step further and say that if you drop specific things in there, like a bunch of vCard files, what comes out is more than files in this filesystem: they are parsed into objects (in this case, Contact objects) and, as such, can be exposed in any number of ways. As vCards, obviously, but also as hCards, a microformat based on vCard. But the Contact class can go a lot further than that, and expose each Contact object or group of Contacts as XFN, another microformat, or FOAF, if you’re the more academic type.
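To make the vCard-in, hCard-out idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made up for illustration: the Contact class, its two fields, and the method names are hypothetical, and the parser only understands the simplest "KEY:value" lines of a vCard (no line folding, no structured N property, no escaping rules).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Contact:
    """Hypothetical Contact object; the field set is deliberately tiny."""
    name: str
    email: str = ""

    @classmethod
    def from_vcard(cls, text):
        # Naive parser: keeps only the property name before any ";" parameters
        # and ignores everything except FN and EMAIL.
        fields = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.split(";")[0].upper()] = value.strip()
        return cls(name=fields.get("FN", ""), email=fields.get("EMAIL", ""))

    def to_vcard(self):
        # Emit a minimal vCard 4.0 body (FN is the only required property).
        lines = ["BEGIN:VCARD", "VERSION:4.0", f"FN:{self.name}"]
        if self.email:
            lines.append(f"EMAIL:{self.email}")
        lines.append("END:VCARD")
        return "\r\n".join(lines)

    def to_hcard(self):
        # hCard carries the same data as HTML class names: vcard, fn, email.
        parts = [f'<span class="fn">{escape(self.name)}</span>']
        if self.email:
            parts.append(
                f'<a class="email" href="mailto:{escape(self.email)}">'
                f"{escape(self.email)}</a>"
            )
        return '<div class="vcard">' + " ".join(parts) + "</div>"
```

The point of the sketch is only that the file dropped into the share and the markup served from it are two renderings of one object; XFN or FOAF output would be further methods on the same class.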
This metaphor can be extended in various directions with interesting results. For instance, other base data types can be ripped straight from the microformats world: calendaring data (hCalendar, iCal, etc.), CMS data (hAtom), location information (addresses, geodata), all the way to the various metadata schemes for image, audio, and video formats. By leveraging the great work of the microformats and data portability communities, we can distill the easy bits into a standard class library of sorts and accommodate the complex through extensions.
Of course, this can’t be your average run-of-the-mill file system, but just because we gain special properties for certain file types (a more descriptive data model, file type polymorphism, even free-text indexing) doesn’t mean we can’t accommodate generic file system resources (plain old inodes: files and folders) as well. Handler extensions can always be added to further enhance a given object type. But even without special handlers, regular folder and file types can gain interesting properties through associable mixins. When associated with a Tag object, any resource, including a generic file, can be tagged. What do we end up with? A file system that supports search folders and tag clouds and all that wonderful stuff promised and never fulfilled by the likes of WinFS.
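A tagging mixin of this kind might look something like the sketch below. The names (Taggable, TaggedFile, search_folder, tag_cloud) are invented for illustration, and File is a stand-in that knows nothing beyond its path; a real store would persist tags rather than keep them on in-memory objects.

```python
from collections import Counter


class Taggable:
    """Hypothetical mixin: lets any resource carry a set of tags."""

    def tag(self, *names):
        # Create the tag set lazily so host classes need no special __init__.
        if not hasattr(self, "tags"):
            self.tags = set()
        self.tags.update(n.lower() for n in names)
        return self


class File:
    """Stand-in for a plain file; only the path matters here."""

    def __init__(self, path):
        self.path = path


class TaggedFile(Taggable, File):
    """A generic file that gained tagging just by mixing it in."""


def search_folder(resources, tag):
    """A 'search folder' is a saved query: paths of resources with this tag."""
    wanted = tag.lower()
    return sorted(r.path for r in resources if wanted in getattr(r, "tags", ()))


def tag_cloud(resources):
    """Count how many resources carry each tag."""
    counts = Counter()
    for r in resources:
        counts.update(getattr(r, "tags", ()))
    return dict(counts)
```

For example, tagging /docs/thesis.doc and /docs/notes.txt with "school" makes both show up in search_folder(files, "school") regardless of where they live in the hierarchy.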
Tagging is old hat in web apps these days, but file systems are still way behind the times. But the Tag class could even be extended to support more exacting arbitrary groups. Of course, it’s not the only mixin that could usefully extend otherwise-boring files and folders. Another great mixin would be a Revision class which, when associated with any given file, would automatically track revisions, so if you were to Revision a specific Microsoft Word document, you could finally give Word version control that doesn’t suck. But when combined with a folder, Revision becomes something much more potent: a virtual version control repository, tracking and versioning all contents in the folder’s hierarchy. It’s not too hard to imagine emulating the common protocols for interfacing with version control systems, allowing a developer to utilize their normal tool chain.
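As a rough illustration of the difference between revisioning a single file and revisioning a folder, here is a small in-memory sketch. All class and method names are hypothetical, full copies are stored instead of diffs, and nothing here speaks any real version control protocol.

```python
class Revisioned:
    """Hypothetical mixin: keeps every saved state instead of overwriting."""

    def _revs(self):
        # Create the history list lazily on first use.
        if not hasattr(self, "_history"):
            self._history = []
        return self._history

    def save(self, content, message=""):
        self._revs().append((message, content))
        return len(self._revs())  # revision numbers start at 1

    @property
    def content(self):
        revs = self._revs()
        return revs[-1][1] if revs else None

    def revision(self, number):
        return self._revs()[number - 1][1]

    def log(self):
        return [(i + 1, msg) for i, (msg, _) in enumerate(self._revs())]


class RevisionedFile(Revisioned):
    """A single document with history, e.g. that Word file."""


class RevisionedFolder(Revisioned):
    """A folder whose revisions are snapshots of everything beneath it."""

    def __init__(self):
        self.files = {}  # relative path -> current content

    def write(self, path, content):
        self.files[path] = content

    def commit(self, message=""):
        # Copy the mapping so later writes do not alter old snapshots.
        return self.save(dict(self.files), message)
```

The same mixin serves both cases; the folder version simply treats its whole path-to-content mapping as the thing being saved, which is what makes it behave like a tiny repository.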
But Revision doesn’t just add magic to boring files and folders; it can be associated with more robust resources as well. Imagine a Page object, a simple base class to represent a given web page. Mix in a little revisionism and what do you get? A Wiki object. Well, a wiki-like page without comments enabled. Which means we just need a Comment mixin as well. Now that we have Comments, we can mix it with a Page object and we have a Post object. All of a sudden we have the data model for a full-fledged content management system. Through the magic of microformats like hAtom we can also fall ass-backwards into many of the other CMS features, like syndication, as a bonus. All that’s left would be some pretty templates and localization (sounds like another mixin). Interestingly, a lot of this, especially revisioning and localization, has already been hashed out by Pagoda (or whatever they’re calling it these days), a TurboGears CMS based on SQLAlchemy.
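The composition described above maps fairly directly onto Python's multiple inheritance. The sketch below is only an assumption about how it could be spelled: Page, Revisioned, Commentable, Wiki and Post are illustrative names, not classes from Pagoda or any existing SocialFS code, and the mixins are kept to a few lines each.

```python
class Page:
    """Hypothetical base class: a title plus a body."""

    def __init__(self, title, body=""):
        self.title = title
        self.body = body

    def edit(self, body):
        self.body = body


class Revisioned:
    """Minimal revision mixin: remembers each body before it is replaced."""

    def edit(self, body):
        # Store the outgoing body, then let the next class in the MRO
        # (Page, in the classes below) perform the actual edit.
        self.__dict__.setdefault("history", []).append(self.body)
        super().edit(body)


class Commentable:
    """Minimal comment mixin: an ordered list of (author, text) pairs."""

    def comment(self, author, text):
        self.__dict__.setdefault("comments", []).append((author, text))


class Wiki(Revisioned, Page):
    """Page + Revision: a wiki-like page."""


class Post(Commentable, Page):
    """Page + Comment: a blog-style post."""
```

Listing the mixin before Page matters: it puts Revisioned.edit ahead of Page.edit in the method resolution order, so the old body is recorded before it is overwritten.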
I’d like to try to do the same thing, just with a much wider problem space. It may sound a bit ambitious, but it certainly doesn’t seem all that difficult, especially given all the overlap that exists between web application domains. And wouldn’t it be worth it?
Imagine that your home folder (or if you’re a Windows user, the contents of your Documents and Settings folder, where My Documents hangs out) were just a pointer to one of these file stores. Imagine if some of your web applications started using it (APIs make this less important, as flickrfs shows). Better yet, imagine if some of your desktop applications started using it (special file handlers can render this moot as well).
If such a proposition were even possible, we wouldn’t need a Data Bill of Rights.
I’ve created a SocialFS project on Google Code if you’re interested. I don’t have any code yet, but I’ll make some time soon. (Yes, I understand the irony of using Google Code, the very kind of service I’m explaining how to usurp. Getting the project far enough along to be self-hosting should be an early priority.)
June 24th, 2008 at 4:26 pm
One of the problems that I see with traditional file systems, and one that limits their scalability and limits their use as a foundation for what you are proposing, is that they are essentially containers. Files in these systems can exist in only one place. This is partially remedied by things like aliases or links, but the fundamental problem remains. What is needed is a file system that treats folders (to use a common metaphor) as classifications, and allows a file to be placed in any one of them that makes sense to the user of the system. Tags are something like this, but backwards.
The other (related) fundamental limitation is the hierarchical nature of the folders. Information is not really organized that way; it is just a convenient computer hack to do so. It represents a familiar metaphor, but one that ignores the available technology, imposing artificial limits on how one can organize the stuff kept on a computer or on the web. The web, with its hyperlinks, would need to be “flattened” to fit within a traditional file system.
I’m not saying that file systems are the wrong foundation, just that until legacy file systems can deal easily and naturally with web-shaped data representations and the appearance of a file in multiple arbitrary categories, a flexible mountable SocialFS will be a forced fit.
BTW, consider the (categorization) folder system as a way to selectively provide visibility and access.
There is a lot more to say about versioning — I like your approach there — and actual implementation — like keeping a single copy that may be visible in multiple folders — and all the privacy and access control, data portability (and how that builds out). I’ll try to get back to this with some expansions.
This is a semi-coherent rant about a strongly held belief of mine that the file system people need to move on to better ways of organizing data than were available in the simpler days of Unix, CP/M, etc. I could be wrong.