My rant on the state of the computer interface


WARNING: This is probably going to be a really long, angry, and potentially upsetting rant, involving an issue I have been interested in and done various bits of development on in the past.

A long time ago in a place about 8 hours south of me, some engineers from Apple went to visit an office in Palo Alto and saw three things: Smalltalk, Ethernet, and the Alto. This is a historically significant event in that it drove Apple to notice the first two and kind of play with them in their future products, and to consumerize the third. I think the first mistake was decoupling the first thing, Smalltalk, from the concept of the third, the GUI, and I think this decision has left interface development in a bit of a rut.

I'm going to lead with a bit of theory and then dive into a lot of examples. My theory rests on some assumptions which I imagine others will not agree with:

  1. As a basic concept the UNIX philosophy works very well
  2. One interface for a particular system (IM, Mail, etc) does not suit all tasks using that system.
  3. Almost all uses of the computer interface involve visualization and manipulation of information
  4. Organizing the information I intend to visualize or manipulate by metadata is better than organizing it by a single name.

While not strictly necessary to buy into my argument, the following issues probably play a role:

  1. Though people may not want to believe it, computers still have limited resources
  2. User-friendly is too often an excuse to take capabilities away from those that can make use of them
  3. Obligatory configuration is the root of many evils.

With that said, I'll touch on various things I've seen, worked on, or been involved with. My first target is a project I became involved with in 2005 whose architecture exemplifies what I think is a better way of handling interface development and the organization of data: XMMS2. XMMS2 is a music player built around a managed media library. It provides a daemon, xmms2d, which various clients can connect to and manipulate.

The first right decision was to push functionality out into clients. If, for example, one wanted to collect album art for one's music collection, that would be handled by a client that writes to the medialib, not by a plugin to the daemon. This idea is currently being developed further with the concept of Service Clients. Service clients open up a whole new world of functionality by allowing any client to issue method calls to another client that registers a service and a set of methods. This kind of functionality is why we jokingly refer to XMMS2 as the music-playing microkernel.

This gets right at my second assumption and at the first two of my personal views. The way I work with XMMS2 depends on what exactly I'm trying to do. When I want to browse my music collection I prefer something visual, where I can walk through the collection and search it in various ways; but when I just want to pause a song I can use a simple playback control client, or the command line client and simply issue xmms2 pause at any terminal. I can even do it from a Python interpreter if for some reason I need to:

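import xmmsclient
# Connect to the running xmms2d daemon through the synchronous client wrapper
xc = xmmsclient.XMMSSync()
xc.connect()
# Same effect as running "xmms2 pause" in a terminal
xc.playback_pause()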

That leads right into the second big advantage of XMMS2's open interface design: I can integrate XMMS2 into any other system by virtue of its design, instead of having to hack up some kind of remote interface like so many older applications require, or hoping that an application's developers provide a usable API over DCOP, D-Bus, or whatever the trendy remote interface standard is today. The API is used by everyone developing a client, so it is held to a much higher standard of usability than a secondary API tacked on to provide the ability to put the song you're listening to in an email or a blog post.
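The service-client idea described above can be illustrated in a few lines. This is a minimal in-process sketch, assuming a hypothetical ServiceBus class standing in for the daemon's routing role; it is not the XMMS2 client API (which talks to xmms2d over IPC), and the "albumart"/"lookup" names are made up.

```python
from typing import Any, Callable, Dict, List


class ServiceBus:
    """Hypothetical stand-in for the daemon's routing of service-client calls."""

    def __init__(self) -> None:
        self._services: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def register(self, service: str, method: str, func: Callable[..., Any]) -> None:
        # A client advertises one method under a named service.
        self._services.setdefault(service, {})[method] = func

    def methods(self, service: str) -> List[str]:
        # Other clients can discover what a service offers.
        return sorted(self._services.get(service, {}))

    def call(self, service: str, method: str, *args: Any) -> Any:
        # The caller never needs to know which client provides the method.
        try:
            func = self._services[service][method]
        except KeyError:
            raise LookupError(f"no such method: {service}.{method}") from None
        return func(*args)


if __name__ == "__main__":
    bus = ServiceBus()
    covers = {"Some Album": "some_album.jpg"}  # made-up album art cache
    bus.register("albumart", "lookup", covers.get)
    print(bus.methods("albumart"))                       # ['lookup']
    print(bus.call("albumart", "lookup", "Some Album"))  # some_album.jpg
```

The caller only knows a service name and a method, so a GUI, a command-line tool, or a throwaway script can all use the same service.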

The client-server design and open common API are not the only advantages that XMMS2 provides. The media library in XMMS2 has two amazing innovations:

  1. Use of a sourced entity-attribute-value model to which any client is allowed to write
  2. The Collections organization and querying system

XMMS2 indexes metadata from media into an SQLite database whose Media table schema is something like (id, key, value, source). This is basically a dictionary, hash table, or associative array, depending on your language of choice. There are a few disadvantages off the top of my head:

  1. Querying this is slow because in SQL it requires a lot of self JOINs on the Media table
  2. Typing the value is difficult and onerous at best
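To make the self-JOIN cost concrete, here is a minimal SQLite sketch of an (id, key, value, source) table. The layout follows the description above, but the exact XMMS2 schema, the ids_matching helper, and the source labels are assumptions for illustration.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One row per (entry, property, source); there is no column per tag.
    conn.execute("CREATE TABLE Media (id INTEGER, key TEXT, value TEXT, source TEXT)")
    return conn


def ids_matching(conn: sqlite3.Connection, **constraints: str) -> List[int]:
    """Ids whose properties match every key=value constraint.

    Each constraint beyond the first costs one more self-JOIN on Media.
    """
    if not constraints:
        return [r[0] for r in conn.execute("SELECT DISTINCT id FROM Media ORDER BY id")]
    tables, where, params = [], [], []
    for i, (key, value) in enumerate(constraints.items()):
        alias = f"m{i}"
        tables.append(f"Media AS {alias}" if i == 0
                      else f"JOIN Media AS {alias} ON {alias}.id = m0.id")
        where.append(f"{alias}.key = ? AND {alias}.value = ?")
        params += [key, value]
    sql = (f"SELECT DISTINCT m0.id FROM {' '.join(tables)} "
           f"WHERE {' AND '.join(where)} ORDER BY m0.id")
    return [r[0] for r in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO Media VALUES (?, ?, ?, ?)", [
        (1, "artist", "Artist A", "plugin/id3v2"),
        (1, "genre", "Jazz", "plugin/id3v2"),
        (2, "artist", "Artist A", "plugin/id3v2"),
        (2, "genre", "Rock", "plugin/id3v2"),
        # The rating lands in the same TEXT column: the typing problem.
        (2, "rating", "5", "client/rater"),
    ])
    print(ids_matching(conn, artist="Artist A"))                # [1, 2]
    print(ids_matching(conn, artist="Artist A", genre="Jazz"))  # [1]
```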

The advantages are much greater. First, any sort of data can be indexed and made available to the user, so users can arbitrarily tag their music in their own way. A user might decide to keep a list of people_who_like a certain song, and later use this information to build a playlist for a party based on who's attending: one they think most of their friends in attendance will be happy with. More importantly, it reduces development headaches: we don't need to provide an upgrade path for the database every time we decide to index a new tag property from the media files, and we don't end up with a massive list of columns.
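The party-playlist example above might look like this. It is a sketch over a hypothetical in-memory MediaLib (a sourced key/value store, not the real medialib), with made-up song ids, names, and a made-up "client/party" source; the point is that a client adds the people_who_like key without any schema change.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class MediaLib:
    """Hypothetical in-memory sourced key/value store: (song, key) -> {source: value}."""

    def __init__(self) -> None:
        self._props: Dict[Tuple[int, str], Dict[str, object]] = {}

    def set(self, song: int, key: str, value: object, source: str) -> None:
        # Any client may write any key under its own source; no schema change needed.
        self._props.setdefault((song, key), {})[source] = value

    def get(self, song: int, key: str, source: str) -> Optional[object]:
        return self._props.get((song, key), {}).get(source)


def party_playlist(lib: MediaLib, songs: Iterable[int], attendees: Set[str],
                   source: str) -> List[int]:
    """Songs liked by more than half of the attendees, most-liked first."""
    scored = []
    for song in songs:
        fans = set(lib.get(song, "people_who_like", source) or ())
        score = len(fans & attendees)
        if score * 2 > len(attendees):
            scored.append((-score, song))
    return [song for _, song in sorted(scored)]


if __name__ == "__main__":
    lib = MediaLib()
    # A hypothetical "party" client tags songs with who likes them.
    lib.set(1, "people_who_like", ["ann", "bob", "cy"], "client/party")
    lib.set(2, "people_who_like", ["ann"], "client/party")
    lib.set(3, "people_who_like", ["bob", "cy"], "client/party")
    print(party_playlist(lib, [1, 2, 3], {"ann", "bob", "cy"}, "client/party"))  # [1, 3]
```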

On to our second friend, the Collection. Collections provide a way to organize music by certain constraints and then later query for a list of matching songs. More details about the specifics can be read in the article linked above, which gives lots of pretty examples of what this functionality can do. What's really more important is the basic idea: arbitrarily constrained collections of metadata that can be linked together, saved, and queried later. Because the constraints are saved rather than the results, the results are always up to date when a query occurs.
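The collection idea can be sketched as composable query objects that are saved and re-evaluated on demand. This is a minimal sketch loosely inspired by the description above; the class names (Match, Intersection, Union, Complement, Reference) and the dict-based library are chosen for illustration and should not be read as the real XMMS2 collections API.

```python
from dataclasses import dataclass
from typing import Dict, Set

Library = Dict[int, Dict[str, str]]  # song id -> metadata


class Coll:
    """Base class for a hypothetical collection: a saved query, not a saved result."""

    def ids(self, lib: Library, saved: Dict[str, "Coll"]) -> Set[int]:
        raise NotImplementedError

    # Operators let collections be combined into larger ones.
    def __and__(self, other: "Coll") -> "Coll":
        return Intersection(self, other)

    def __or__(self, other: "Coll") -> "Coll":
        return Union(self, other)

    def __invert__(self) -> "Coll":
        return Complement(self)


@dataclass
class Match(Coll):
    key: str
    value: str

    def ids(self, lib, saved):
        return {i for i, meta in lib.items() if meta.get(self.key) == self.value}


@dataclass
class Intersection(Coll):
    a: Coll
    b: Coll

    def ids(self, lib, saved):
        return self.a.ids(lib, saved) & self.b.ids(lib, saved)


@dataclass
class Union(Coll):
    a: Coll
    b: Coll

    def ids(self, lib, saved):
        return self.a.ids(lib, saved) | self.b.ids(lib, saved)


@dataclass
class Complement(Coll):
    a: Coll

    def ids(self, lib, saved):
        return set(lib) - self.a.ids(lib, saved)


@dataclass
class Reference(Coll):
    name: str

    def ids(self, lib, saved):
        # Resolved at query time, so changes to the saved collection are picked up.
        return saved[self.name].ids(lib, saved)


if __name__ == "__main__":
    lib = {1: {"artist": "A", "genre": "Jazz"},
           2: {"artist": "A", "genre": "Rock"},
           3: {"artist": "B", "genre": "Jazz"}}
    saved = {"jazz": Match("genre", "Jazz")}
    query = Reference("jazz") & ~Match("artist", "B")
    print(sorted(query.ids(lib, saved)))  # [1]
    lib[4] = {"artist": "C", "genre": "Jazz"}
    print(sorted(query.ids(lib, saved)))  # [1, 4]
```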

Having gone into how I think XMMS2 does the Right Thing (tm), I'd now like to discuss some other systems: some of which also do the Right Thing (tm), possibly in a different way, and some of which try to but fail by being too unwilling to break down the barriers that have been built up so strongly in the past.

The basic concept of cross-application integration has been very popular in the recent past. Apple highlighted the fact that Mail.app could tell you whether the person you were writing an email to was on AIM at the time. The GNOME crowd quickly picked up on this as some kind of killer feature and offered a large bounty to see it implemented between Evolution and gaim. Initially this was done by providing a simple gaim-remote command line tool and library that could control certain specific functions of gaim in a way that enabled exactly this feature. Put more bluntly, the plan wasn't really to open up gaim's interface, but just to make this one feature possible. It's an example of missing the big picture of cross-application integration and picking up only on use-cases that had already been developed. Apple was most likely able to implement this functionality using OpenStep's Distributed Objects, which allows objects to be exported over some kind of channel so that other applications can manipulate them. (I assume this is how it was done, since the functionality was already there and it seems silly to reinvent the wheel.) NeXT had used the same functionality for collaboration tools in which one user could transparently watch another developer working over the network. This functionality was used in the development of Doom and Quake.
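Python's standard-library XML-RPC modules are a much cruder cousin of Distributed Objects, but they show the general shape of exporting an object so that another application can call it. In this sketch, BuddyList, export, and the addresses are hypothetical; the point is that the whole object is exposed rather than one purpose-built feature.

```python
import sys
import threading
from typing import Tuple
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class BuddyList:
    """Hypothetical IM-client state exported for other applications to query."""

    def __init__(self) -> None:
        self._presence = {"alice@example.com": "online", "bob@example.com": "away"}

    def status(self, address: str) -> str:
        return self._presence.get(address, "offline")

    def buddies(self) -> list:
        return sorted(self._presence)


def export(obj: object) -> Tuple[SimpleXMLRPCServer, str]:
    # Port 0 asks the OS for any free port; requests are served on a background thread.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(obj)  # public methods only; _private names stay hidden
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/"


if __name__ == "__main__":
    server, url = export(BuddyList())
    # A mail client (or any script) talks to the proxy as if the object were local.
    im = ServerProxy(url)
    print(im.status("alice@example.com"), file=sys.stdout)
    print(im.buddies())
    server.shutdown()
    server.server_close()
```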

Mind you, I'm not championing NeXT and certainly not Apple. I think that Apple, while having the ability to really open up interfaces and create an integrated overall interface, has focused too much on separate, abstract applications and on consumer friendliness (eye candy, music, video). This was not always the case, though. I consider the Apple Newton to be a champion of the kind of interface I'm calling for.

What's so special about the Newton? The Newton actually is amazing in two ways:

  1. Integration of objects between the various tools on the Newton
  2. Not-so-novel ways of managing and visualizing information that seem to have been avoided or forgotten

The Newton didn't use files for storing data; instead it had an object database that allowed objects to reference each other and store fairly arbitrary data. This made it easy to extend the address book with new properties that third-party add-ons to the built-in applications could make use of. Because objects could reference each other, an address book entry could be included in a note as a link to the address book rather than a copy of the information about the person. That way, modifying the information in the note could potentially change it in the address book, and changes in the address book would be propagated back to the note. This is the kind of functionality that should really be the focus when the issue of cross-application integration comes up. More than just a killer feature, it should, nay, needs to be ubiquitous: something that's trivial for a developer to use, and ideally accessible enough that a normal user has it available with no or minimal programming skills.
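The linked-note behavior can be sketched with a tiny object store in which a note holds a reference to a contact rather than a copy of it. This is a hypothetical model for illustration (ObjectStore, Ref, and render are made-up names), not a description of how the Newton's object store actually worked.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Ref:
    """A link to another object in the store, rather than a copy of its data."""
    oid: int


class ObjectStore:
    def __init__(self) -> None:
        self._objects: Dict[int, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def add(self, **props: Any) -> Ref:
        oid = next(self._ids)
        self._objects[oid] = dict(props)
        return Ref(oid)

    def get(self, ref: Ref) -> Dict[str, Any]:
        # Returns the live object, so edits are visible to everything that links to it.
        return self._objects[ref.oid]

    def render(self, note: Ref) -> str:
        # Dereference links at display time so the note always shows current data.
        parts = []
        for part in self.get(note)["body"]:
            parts.append(self.get(part)["name"] if isinstance(part, Ref) else part)
        return "".join(parts)


if __name__ == "__main__":
    store = ObjectStore()
    card = store.add(kind="contact", name="Jane Doe", phone="555-0100")
    note = store.add(kind="note", body=["Call ", card, " about the demo"])
    print(store.render(note))  # Call Jane Doe about the demo

    store.get(card)["name"] = "Jane Smith"  # edit in the address book...
    print(store.render(note))  # Call Jane Smith about the demo

    # ...and an edit made by following the link from the note reaches the card.
    linked = next(p for p in store.get(note)["body"] if isinstance(p, Ref))
    store.get(linked)["phone"] = "555-0199"

    # A third-party add-on can attach a new property without any schema change.
    store.get(card)["birthday"] = "1970-01-01"
```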

The second point is also rather cool, but nothing special; similar functionality was presented in Doug Engelbart's Mother of All Demos. In particular, I'm referring to the structural nature of lists in the Newton's note-taking program. The ability to collapse, reorganize, and re-visualize list items is a really worthwhile feature. Engelbart demonstrates some really cool functionality, such as changing a hierarchical list into a flat list and back again. He even demonstrates visualizing that list as a tree, and jokingly ends the demonstration of the list, which in this case was a shopping list, by presenting a line-drawn map of the route he would have to take home to pick up all the items on it. What was a joke in 1968 (I repeat, 1968) is very possible today. By attaching an address property to certain list headers, it would be fairly easy to provide a Google Maps view of the route home.
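Converting between a hierarchical list and a flat one, and collapsing it to a given depth, is straightforward once the list keeps its structure instead of being plain text. A minimal sketch, assuming a list item is a hypothetical (text, children) pair; the shopping-list contents are made up.

```python
from typing import List, Tuple

# Hypothetical list item: (text, children), where children is a list of items.
Item = Tuple[str, list]


def flatten(items: List[Item], depth: int = 0) -> List[Tuple[int, str]]:
    """Hierarchical list -> flat list of (depth, text), in reading order."""
    flat = []
    for text, children in items:
        flat.append((depth, text))
        flat.extend(flatten(children, depth + 1))
    return flat


def nest(flat: List[Tuple[int, str]]) -> List[Item]:
    """Flat (depth, text) list -> hierarchical list; the inverse of flatten."""
    root: List[Item] = []
    stack = [(-1, root)]  # (depth, list that new children are appended to)
    for depth, text in flat:
        while stack[-1][0] >= depth:
            stack.pop()
        item: Item = (text, [])
        stack[-1][1].append(item)
        stack.append((depth, item[1]))
    return root


def collapse(items: List[Item], max_depth: int) -> List[Item]:
    """Hide everything deeper than max_depth (0 keeps only the top level)."""
    if max_depth <= 0:
        return [(text, []) for text, _ in items]
    return [(text, collapse(children, max_depth - 1)) for text, children in items]


if __name__ == "__main__":
    shopping = [
        ("Produce", [("apples", []), ("onions", [])]),
        ("Hardware", [("nails", [])]),
    ]
    flat = flatten(shopping)
    for depth, text in flat:
        print("  " * depth + text)
    assert nest(flat) == shopping  # round trip back to the hierarchy
    print(collapse(shopping, 0))
```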

Digressing into the Mother of All Demos for a moment, it seems clear that many of the ideas presented there were lost in favor of fawning over the mouse and video conferencing. Engelbart's focus on dealing with "wicked problems" would have pushed computer development into an entirely different arena, where the focus was on tools that were more general and could be put together to deal with specialized problems, instead of on applications developed to address specialized problems. The tool architecture was heavily used in UNIX development but has made few appearances in the world of the graphical user interface. There are a few (partial) exceptions, Plan 9 being a classic example, but in general the UNIX philosophy itself has even begun to break down in the development of CLI tools.

A classic complaint about git is its toolkit design, which amounts to over 100 separate commands. The basic concept is that a certain set of them (the plumbing) can be strung together to create the actual functionality of git. For example, the commit process is essentially git-write-tree, which writes the index to a tree object, followed by git-commit-tree, which writes a commit object referencing that tree object to the repository. (The actual git-commit tool does slightly more, but that is the basic principle.) This tool-based approach is seen as a drawback, with constant complaints that it's difficult to use and not user friendly. Those who make this claim are missing the point. Scripts (porcelain, in git speak) exist to consolidate these processes for the user, but the toolkit approach allows a level of power and flexibility that is impossible if the functionality is internalized behind a closed interface. I'd go further and say that the closed interface makes it harder for users to actually do what they want to do.
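The plumbing split can be illustrated with a toy, in-memory model of git's object format. This is a sketch under stated assumptions, not git itself: hash_object, write_tree, commit_tree, and commit are hypothetical Python stand-ins named after the plumbing commands, the index is a flat dict with no subdirectories, objects are kept uncompressed, and nothing touches a real repository. The naming rule (SHA-1 over "<type> <size>\0<body>") is git's, so a blob containing "hello\n" gets the same id here as from git hash-object.

```python
import hashlib
import time
from typing import Dict, Optional

# Hypothetical in-memory object database: object id -> raw (uncompressed) object.
# Real git also zlib-compresses loose objects on disk; that is skipped here.
Store = Dict[str, bytes]


def hash_object(store: Store, kind: str, body: bytes) -> str:
    # git names every object by the SHA-1 of "<type> <size>\0<body>".
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid


def write_tree(store: Store, index: Dict[str, bytes]) -> str:
    """Like git-write-tree for a flat index: file name -> file contents."""
    entries = b""
    for name in sorted(index, key=lambda n: n.encode()):
        blob = hash_object(store, "blob", index[name])
        # Tree entry: "<mode> <name>\0" followed by the 20-byte binary blob id.
        entries += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return hash_object(store, "tree", entries)


def commit_tree(store: Store, tree: str, message: str,
                parent: Optional[str] = None,
                who: str = "A U Thor <author@example.com>",
                when: Optional[int] = None) -> str:
    """Like git-commit-tree: write a commit object that points at a tree."""
    when = int(time.time()) if when is None else when
    header = [f"tree {tree}"]
    if parent:
        header.append(f"parent {parent}")
    header.append(f"author {who} {when} +0000")
    header.append(f"committer {who} {when} +0000")
    body = "\n".join(header) + "\n\n" + message + "\n"
    return hash_object(store, "commit", body.encode())


def commit(store: Store, index: Dict[str, bytes], message: str,
           parent: Optional[str] = None, when: Optional[int] = None) -> str:
    # The "porcelain": nothing more than the two plumbing steps chained together.
    return commit_tree(store, write_tree(store, index), message, parent, when=when)


if __name__ == "__main__":
    store: Store = {}
    first = commit(store, {"README": b"hello\n"}, "Initial commit", when=0)
    second = commit(store, {"README": b"hello, world\n"}, "Update README",
                    parent=first, when=60)
    # Print the commit body (everything after the "<type> <size>\0" header).
    print(store[second].split(b"\0", 1)[1].decode())
```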

The entire user-friendliness argument amounts to what I consider a general disrespect for the user. It basically says that the user is not smart enough to learn, or that powerful functionality is usable by so small a subset of an application's users that it isn't worth having. This is why the open interface is a truly superior approach: generic cases can easily be built on top of it to make something generally considered easy to use, while the interface stays open to anyone who wants to just play with it. Amazingly, I'm going to cite an example from Microsoft that follows this line.

Microsoft's Windows PowerShell exposes much of the .NET class library to a user-accessible shell with a simple scripting language. This is amazing in a lot of ways: it allows a moderately competent user to experiment with .NET classes and interfaces from a fairly non-threatening environment (no need to compile, instant error notification, etc.), and thus gives them more power over their computer's functionality and capabilities. Here is an example of this power presented directly to the user:

PS> $rssUrl = "http://blogs.msdn.com/powershell/rss.aspx"
PS> $blog = [xml](new-object System.Net.WebClient).DownloadString($rssUrl)
PS> $blog.rss.channel.item | select title -first 8

Here the .NET objects are used directly to download a string and cast it to an XML object, which is then queried with a simple syntax. This is something an eager user could learn to do, or that could easily be scripted into a function returning the title and URL of the top eight posts in an RSS feed provided by the user. It could even be pushed into a graphical application that presented the information as a list view. Power is immediately made accessible to any user willing to try and learn, with millions of examples all over the Internet that they can adapt to better suit their needs (I learned to program by a similar process). If interfaces are locked up and hard to work with and learn, people become less likely to experiment with them, and we get exactly the kind of people we always cast as the "Joe Blow" user. If you assume people are stupid, you don't give them a chance to change that.
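For comparison, roughly the same thing in Python, packaged as the kind of reusable function described above that returns titles and URLs. This is a sketch assuming a plain RSS 2.0 feed (rss/channel/item with title and link children); top_items and fetch are hypothetical names, and namespaced or Atom feeds are not handled.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def fetch(url: str) -> bytes:
    # Return raw bytes so ElementTree can honor the feed's own encoding declaration.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def top_items(rss: Union[str, bytes], n: int = 8) -> List[Tuple[str, str]]:
    """Return (title, link) for the first n <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(rss)
    items = root.findall("./channel/item")[:n]
    return [(item.findtext("title", default=""), item.findtext("link", default=""))
            for item in items]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_items.py <feed-url>
    for title, link in top_items(fetch(sys.argv[1])):
        print(f"{title}\n  {link}")
```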

This has mainly been about my first issue: that toolkit-based design is more adaptable, provides more power directly to the user, and allows for better integration of functionality. I think that by approaching software development from the first issue, the others can in some ways fall into place, but I would like to address them more directly.

I will address my second issue with a use case that I hope will point out a common problem. Say I'm looking at a website, for example the Wikipedia article on Portland, Oregon, and I see a table of temperature and rainfall data. I would rather see a graph of this data, as I visualize this sort of thing better graphically. I'm presented with a few options: if I'm on Windows I might cut and paste the data into Excel and tell it to spit out a graph; on *nix I would load up gnuplot and input the data. The ability to visualize this information in different ways is clearly desirable, and while I have viable options, both are time-consuming (one more so than the other) and require me to make decisions about the data to be graphed. HTML is a semantic format: the table is actually marked up as a table, so converting a table of numbers into a graph (with some best guesses about how to group the data, and the ability to be more specific if necessary) could easily be implemented given open access to the table's DOM object. These issues arise often: different methods of presentation suit different people and situations, yet too often we focus on unified presentation instead of specialized presentation. As discussed before, lists are another common structured format that would suit various forms of visualization depending on the person and situation.
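Given access to a table's structure, a first best guess at plottable series takes little code. This sketch uses Python's html.parser in place of a browser DOM and assumes well-formed markup (every cell explicitly closed) and one table per page; TableGrabber, to_number, and guess_series are hypothetical names, and the numbers in the example are placeholders, not Portland's actual climate data.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class TableGrabber(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_number(text: str) -> Optional[float]:
    # Tolerate thousands separators and the Unicode minus sign.
    try:
        return float(text.replace(",", "").replace("\u2212", "-"))
    except ValueError:
        return None


def guess_series(rows: List[List[str]]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Best guess: first row is the x-axis header; every fully numeric row is a series."""
    if not rows:
        return [], {}
    header, body = rows[0], rows[1:]
    series: Dict[str, List[float]] = {}
    for row in body:
        parsed = [to_number(cell) for cell in row[1:]]
        if parsed and None not in parsed:
            series[row[0]] = parsed
    return header[1:], series


if __name__ == "__main__":
    # Placeholder numbers, not real climate data.
    html = """
    <table>
      <tr><th>Month</th><th>Jan</th><th>Feb</th><th>Mar</th></tr>
      <tr><th>Avg high</th><td>1</td><td>2</td><td>3</td></tr>
      <tr><th>Rainfall</th><td>4.5</td><td>5.5</td><td>6.5</td></tr>
      <tr><th>Source</th><td>n/a</td><td>n/a</td><td>n/a</td></tr>
    </table>
    """
    grabber = TableGrabber()
    grabber.feed(html)
    xs, series = guess_series(grabber.rows)
    for name, ys in series.items():
        # Ready to hand to a plotting library; here just printed as (x, y) pairs.
        print(name, list(zip(xs, ys)))
```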

The computer is an information manipulator. It's designed to organize and access information, and it should be thought of that way. Playing music on a computer is an example of this, far more than playing it on a stereo is. Most media library music players are focused on presenting the organization of your media rather than the actual playing of it. Playlists exist to better organize music, and cover art exists just as much to ease recognition of music as to provide eye candy. These capabilities are often ignored or missed. The simple office mailbox has none of the sorting or organizational capabilities provided by even the most basic mail programs. With that in mind, organizational capabilities should be a focus of computers, with the emphasis on relations between pieces of information and on groupings better suited to the person whose information is being organized. This is paramount: just because it works for me doesn't mean it will work for you, so the focus should be on providing a system dynamic enough to be adapted for whoever opts to use it.

Filenames don't cut it, and embedded metadata is barely (and often not) enough. Filenames require you to reduce all of the properties you associate with a piece of information to a single fixed string. While it's possible to serialize various properties into a filename, the result is difficult to query and, ultimately, to identify. It makes more sense to catalog the metadata for various bits of information and allow it to be queried in various ways. This capability should be integrated wherever possible, so that when looking at an email you can quickly jump to other emails by that sender, or to other information related to the sender. Much like XMMS2's collections, there should be ways to store constraints for later use or build them on the fly, and all of the operators should fit together so that an extra level of filtering can be applied on top of the current one. The current search systems I've seen (Spotlight, Beagle) don't go far enough, and their functionality is rarely available outside of their specific applications. Apple has been making strides to shoehorn (and I use that term with a heavy hand) Spotlight into various other parts of the OS, but nothing exactly amazing, and in general I think Spotlight is too oriented around searching by file and a subset of the available file metadata.
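The "jump to other emails by that sender" idea, with filters that stack on top of the current view, might look like this. It is a sketch with hypothetical names (View, refine, pivot) over plain metadata dictionaries with made-up addresses, not a description of how Spotlight or Beagle work.

```python
from typing import Dict, List, Sequence, Tuple

Item = Dict[str, str]  # hypothetical metadata record, e.g. one email


class View:
    """A list of items plus a stack of (key, value) filters; refine() adds a level."""

    def __init__(self, items: Sequence[Item],
                 filters: Tuple[Tuple[str, str], ...] = ()) -> None:
        self._items = items  # shared, not copied, so newly added items show up
        self._filters = filters

    def refine(self, key: str, value: str) -> "View":
        # Returns a new view; the current one stays usable, like a saved search.
        return View(self._items, self._filters + ((key, value),))

    def results(self) -> List[Item]:
        return [it for it in self._items
                if all(it.get(k) == v for k, v in self._filters)]


def pivot(items: Sequence[Item], item: Item, key: str) -> View:
    """Jump from one item to every item sharing its value for `key`."""
    return View(items).refine(key, item[key])


if __name__ == "__main__":
    mail = [
        {"id": "1", "from": "ann@example.com", "tag": "work"},
        {"id": "2", "from": "ann@example.com", "tag": "home"},
        {"id": "3", "from": "bob@example.com", "tag": "work"},
    ]
    by_sender = pivot(mail, mail[0], "from")
    print([m["id"] for m in by_sender.results()])                        # ['1', '2']
    print([m["id"] for m in by_sender.refine("tag", "home").results()])  # ['2']
```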

I probably upset someone. Sorry about that; maybe it's time to consider that what was revolutionary twenty years ago is tired by now, and that the foundation on which work is done is itself flawed. I had initially intended this article to be a call for ideas, help, and support on some stuff I have sitting around, but I decided it would be better to present a manifesto of just what is wrong and what possibilities there are to fix it. I want this manifesto to be thought of as a working draft of possibilities for the future rather than a rock-solid statement of the way things should be done. I will post later about some of the projects I am working on that follow this general philosophy.


published at Thu Jun 21 18:30:24 2007 (-0700) by alexbl
Tags: desktop, tags, metadata, organization, userfriendly