Laxonomy as a file system…

It’s been a little bit since I’ve posted anything of substance. Work’s been a bitch and I’ve been trying to polish up an article for publication in my free time. At some point I’ll post that pain in my ass, but in the meantime, I’ve come up with a devilishly simple scheme for an application of my paper’s laxonomy idea. Here’s the very long version…

Most files don’t have much metadata: timestamp, permissions, MIME type, etc. Music, on the other hand, inherently has a ton. As such, mp3s got something called ID3 tags bolted onto the format (a de facto convention rather than part of the MP3 spec itself) to allow you to enter in as much track information as you want: artist, album, track number, genre, year…yada…yada…yada. It was a real pain in the ass to fill all of that out. Thanks to p2p downloading, others often filled it out for you (file sharing as folksonomy?), or it was pulled from a database as you ripped your discs onto your computer.

It was probably around 2002 when WinAMP introduced their Media Library feature. What this little piece of software did was take your mp3 collection and index each file’s associated metadata. With all this id3 tag information, you could slice and dice your music collection in any way imaginable. As my data set grew exponentially, it was almost euphoric to be able to query my tracks like that. I could find anything…fast. Even better, I could set up complex rules for virtual folders that would dynamically maintain my playlists for me. It was like GMail for music — pre-GMail; and better.
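To make the "rules for virtual folders" idea concrete, here is a minimal sketch in Python. It is not Winamp's actual query syntax; the `smart_folder` function and the sample tracks are made up for illustration, and the only assumption is that each track's tags are available as a plain dictionary.

```python
# Hypothetical sample library; the titles and artists are placeholders.
tracks = [
    {"title": "Track A", "artist": "Artist 1", "genre": "Jazz", "year": 1959},
    {"title": "Track B", "artist": "Artist 2", "genre": "Jazz", "year": 1998},
    {"title": "Track C", "artist": "Artist 3", "genre": "Rock", "year": 1967},
]


def smart_folder(library, **rules):
    """Return the tracks whose tags satisfy every rule.

    A rule is either a literal (exact match) or a callable predicate
    that receives the tag value and returns True or False.
    """
    def matches(track):
        for field, rule in rules.items():
            value = track.get(field)
            if callable(rule):
                if value is None or not rule(value):
                    return False
            elif value != rule:
                return False
        return True

    # Evaluated on every call, so the "folder" always reflects the library.
    return [t for t in library if matches(t)]


old_jazz = smart_folder(tracks, genre="Jazz", year=lambda y: y < 1970)
```

Because the folder is just a stored query, adding a new 1950s jazz track to the library makes it show up the next time the folder is opened, with no manual playlist upkeep.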

In their zeal to emphasize search, Google stopped short of the holy grail of organization — “search folders” or “virtual directories” or whatever you want to call them. Sure, you could use labels (tags! when will they get over themselves and just call the damn things tags like everybody else!), but in their implementation labels are flat (no hierarchy) and isolated (no pairing). Before I ever used GMail, this was a feature I naively assumed they’d included; how could they miss this when they so clearly nailed everything else? They just started adding in folders (in their Documents & Spreadsheet product), but we’re still talking about real hierarchy; rigid hierarchy; a step back. Still no flexible, tag-based artificial hierarchy that strikes me as so…Googley.

Music, email, ehhh. These are just specialty, niche files. What’s been taunting me for years now is a way to manage all my files this way. It doesn’t strike me as particularly hard — except to implement it you would have to write a whole new file system; to implement it in any meaningful way, you’d have to write whole new shell hooks to allow users to create and manage this arbitrary file metadata…

Well, around 2004 Microsoft started talking up what it termed WinFS — it was going to ship with Longhorn, and it was going to change the world. It was a “relational” file system — files were stored in a database, and it would allow you to manage them as such. I read this as a play to bring me what I’ve craved for so long, what I described above. The more I learned, the more I learned I was wrong — it wasn’t revolutionary at all, nor would it be released with (the newly renamed and stripped down) Vista. I was bitter. Actually, bitter enough to start looking a lot closer at trying out Linux…

I’m glad I did. And not just because I like the OS better — it opened up a whole new world of fantastic applications. There were even dozens of competing file systems to choose from. So of course, I start reading up on these file systems. Maybe one of them has these simple features I crave. Of course, I’m still not sure how I would be able to apply metadata to a given file — there would need to be some sort of GUI, I imagine — but hell, Gnome has something called “emblems” you can apply to files, so maybe this is what they’re for. It didn’t take long to come to the painful conclusion that no, emblems are not tags; they’re little more than eye-candy for the file browser. And worse, there’s still no file system that will let me make a folksonomy out of my home directory. So Linux just won’t cure what ails me (though symbolic links are a breath of fresh air, and could allow me to do some of this manually). At least I can still interoperate nicely with Windows file shares using Samba — a project that’s been around since 1992 but I’d just come to learn about…
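For what it's worth, the manual symlink approach mentioned above might look something like this rough sketch. The `link_into_tags` helper and the one-folder-per-tag layout are my own assumptions, not an existing tool; it only uses the standard `os` module.

```python
import os


def link_into_tags(file_path, tags, tag_root):
    """Create tag_root/<tag>/<filename> symlinks that point at file_path.

    The real file stays where it is; each tag folder just holds a link.
    Returns the list of link paths, one per tag.
    """
    target = os.path.abspath(file_path)
    links = []
    for tag in tags:
        folder = os.path.join(tag_root, tag)
        os.makedirs(folder, exist_ok=True)
        link = os.path.join(folder, os.path.basename(target))
        # Skip links that already exist so the helper can be re-run safely.
        if not os.path.islink(link):
            os.symlink(target, link)
        links.append(link)
    return links
```

The obvious weakness is the one the post is complaining about: nothing keeps these links in sync when files move or get renamed, and two files with the same name under one tag would collide.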

I’ve been using Samba now for a year, almost every day to get to our Windows file shares at work, and to get to my Linux installations from Windows. It’s so damn simple I don’t even think about it: it’s just native and intuitive. Just recently it occurred to me that a simple modification to the Samba daemon could give me everything I was hoping Microsoft, Google, anybody would build right into the OS.

Why it took me so long to realize this, I have no idea. How hard could it be to hack Samba so that, instead of serving up a real directory tree, it just emulates a folder tree for a given share, based on a metadata index of the files? Tags could be applied to the files by adding them to the notes section of the file. Of course, they could also be applied by copying and pasting the file to a different “tag” folder in the virtual hierarchy. The file wouldn’t actually get copied: once the modified Samba share received the command to write a file, it would first check the md5 sum. If it’s a new file, it gets saved and tagged. If the md5 sum already exists, as in the copy/paste scenario, the existing file would just get a new tag.
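The write path described above can be sketched in a few lines of Python. This is a toy in-memory model, not Samba code: the `TagStore` class, its method names, and the choice to treat every folder in the virtual path as a tag are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class TagStore:
    """Toy index keyed by content hash, standing in for the modified share."""

    def __init__(self):
        self.blobs = {}               # md5 hex digest -> file bytes
        self.names = {}               # md5 hex digest -> display name
        self.tags = defaultdict(set)  # md5 hex digest -> set of tags

    def write(self, virtual_path, data):
        """Handle a write to a virtual path such as 'work/reports/q3.txt'.

        Every folder in the path is treated as a tag. Returns True if the
        content was new and got stored, False if it was only retagged.
        """
        *folders, name = virtual_path.strip("/").split("/")
        # md5 as in the post; a stronger hash could be swapped in.
        digest = hashlib.md5(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
            self.names[digest] = name
        self.tags[digest].update(folders)
        return is_new

    def listdir(self, virtual_path):
        """List files that carry every tag named in the virtual folder path."""
        wanted = {p for p in virtual_path.strip("/").split("/") if p}
        return sorted(
            self.names[d] for d, t in self.tags.items() if wanted <= t
        )
```

Copying `q3.txt` from `work/` into `finance/` would store the bytes once and leave the file visible under `work/`, `finance/`, `work/finance/`, and `finance/work/`, since folder order stops mattering when folders are just tags.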

Hell, each search folder could be a library unto itself, even taking advantage of new visual stimuli like Apple’s Cover Flow to give users a finer-honed instrument once they’ve found roughly where to look.

Any number of additional helper applications could be envisioned: for instance, Windows shell hooks, like TortoiseSVN’s, to tag any file in one of these modified Samba shares. Perhaps a simple web application on the Samba server to help manage all of these files more easily, as well as associated permissions (of course, permissions could just be tags too) and any number of other tasks.

Speaking of other tasks, while a “folksified” file system would solve a lot of problems in both an individual and group setting, there are a whole host of other benefits you’d realize right away. For instance, versioning. Right now versioning is a pain. If you had a virtual file system, the daemon could just save every copy and keep track of the current version, like a wiki. Or, through a management interface, you could set it up to keep logarithmic backups (all copies from this week, one from last week, one from last month, year, etc.). Of course, throw a nice diff application in the mix and you’ve got something pretty useful. Hell, why not a diff application for every file type that can be diffed, even images? No problem.
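One possible reading of "logarithmic backups" is sketched below in Python. The doubling bucket sizes (7 to 13 days, 14 to 27 days, 28 to 55 days, and so on) are my own approximation of "last week, last month, last year", and `versions_to_keep` is a hypothetical helper, not part of any existing tool.

```python
from datetime import datetime, timedelta


def versions_to_keep(timestamps, now):
    """Pick which saved versions survive a thinning pass.

    Keeps every version younger than 7 days, then only the newest version
    in each older bucket, where each bucket spans twice as many days as
    the one before it.
    """
    keep = set()
    newest_in_bucket = {}
    for ts in timestamps:
        age_days = (now - ts).days
        if age_days < 7:
            keep.add(ts)
            continue
        # 7-13 days -> bucket 0, 14-27 -> bucket 1, 28-55 -> bucket 2, ...
        bucket = (age_days // 7).bit_length() - 1
        best = newest_in_bucket.get(bucket)
        if best is None or ts > best:
            newest_in_bucket[bucket] = ts
    keep.update(newest_in_bucket.values())
    return sorted(keep)
```

The point of the doubling is that the number of retained copies grows with the logarithm of the file's age, so a document edited daily for years still only costs a handful of old versions.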

Why not a Spotlight- or Beagle-like app to build a full-text index of every file? Now you’ve got further points of comparison in your index, just another way to let the users of this file share slice and dice the files. Trying to go paperless and have quite a few scanned images on your share? Why not throw OCRopus at them? Now you’ve got a full-text searchable, selectable index plus the original image, neatly contained in a single “document”, with versioning.

All the benefits of an expensive content repository like a Documentum would be simple. And yet, the beauty of such a system is that current applications don’t even have to be modified; everything will run just as expected on any Windows file share. Even to add useful metadata to the files, you wouldn’t need any junk software or to even kick off a web browser. Once you have this virtual file share, there are all kinds of great software ideas that could make it more useful, but the best part is, it’d still be so incredibly useful without any of this. Just a user and a file browser and a mouse. It’s so…painfully…simple!

One Response to “Laxonomy as a file system…”

  1. rutger Wessels Says:

    Tag based filesystem is a great idea. I am considering creating one using either Fuse (http://fuse.sourceforge.net) or WebDAV.

    Samba wouldn’t be a good place to implement it. It is a transport layer, not storage. Using Fuse, you can create your own file systems so Samba can run on top of that.

    WebDAV is a web based solution. But WebDAV can be mounted in Windows so it is probably an option too.

Rambling semi-coherently since 2006…
