WARNING: This is probably going to be a really long, angry, and potentially
upsetting rant, involving an issue I have been interested in and done various
bits of development on in the past.
A long time ago in a place about 8 hours south of me, some engineers from Apple
went to visit an office in Palo Alto and saw three things: Smalltalk, Ethernet,
and the Alto. This is a historically significant event in that it drove Apple
to notice the first two and kind of play with them in their future products,
and to consumerize the third.
I think the first mistake was decoupling the first thing, Smalltalk, from the
concept of the third, the GUI, and I think this decision has left interface
development in a bit of a rut.
I'm going to lead with a bit of theory and then drive into a lot of examples.
My theory is going to be based on some assumptions on my part which I imagine
others will not agree with:
- As a basic concept the UNIX philosophy works very well.
- One interface for a particular system (IM, Mail, etc) does not suit all
  tasks using that system.
- Almost all uses of the computer interface involve visualization and
  manipulation of information.
- Organizing the information I intend to visualize or manipulate by metadata
  is a better way of organizing data than a single name.
While not exclusively necessary to buy into my argument the following issues
probably play a role:
- Though people may not want to believe it, computers still have limited
  resources.
- User-friendly is too often an excuse to take capabilities away from those
  that can make use of them.
- Obligatory configuration is the root of many evils.
With that being said, I'll touch on various things I've seen, worked on,
or been involved with. My first target is going to be a project
I became involved with in 2005 whose architecture exemplifies what I think
is a better way of handling interface development and organization of data.
The project specifically is XMMS2. XMMS2 is a
media library managed music player that provides a daemon, xmms2d, which
various clients can connect to and manipulate. The first right decision made
was to abstract functionality into clients. If, for example, one wanted to
collect album art for one's music collection, this functionality would be
handled by a client which would write to the medialib instead of by a plugin to
the daemon. Currently this functionality is being developed further with the
concept of Service Clients.
Service clients open up a whole new world of functionality by allowing clients
to issue method calls to any client which registers a service and set of
methods. This kind of functionality is why we jokingly refer to XMMS2 as the
music-playing microkernel. What this means though is that we get right at my
second thesis and at my first and second personal views. The way in which I
work with XMMS2 is really set around what exactly I'm trying to do. When I
want to browse my music collection I would prefer something visual where I
could walk through the collection and search it in various ways, but when I
just want to pause a song I can either have a simple playback control client
or I can use the command line client and simply issue xmms2 pause at any
terminal. I can even do it from a Python interpreter if I for some reason
need to:
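import xmmsclient

# Synchronous client; connect() with no arguments uses the default daemon path.
xc = xmmsclient.XMMSSync()
xc.connect()
xc.playback_pause()  # same effect as running "xmms2 pause"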
Which leads right into the second big advantage of the open interface design
of XMMS2. I can integrate XMMS2 into any other system by nature of its design
instead of having to hack some kind of remote interface like so many older
applications have or hoping that the developers of an application provide a
usable API with dcop, or dbus, or whatever the trendy remote interface
standard is today. The API is used by anyone developing a client and so it has
a much higher standard of usability than a secondary API tacked on to provide
the ability to put the current song you're listening to in an email or a blog
post.
The client-server design and open common API are not the only advantages that
XMMS2 provides. The media library in XMMS2 has two amazing innovations:
- Use of a sourced entity-attribute-value model to which any client is allowed
  to write
- The Collections organization and querying system
XMMS2 indexes metadata off of media into a sqlite database whose Media table
schema is something like: id, key, value, source. This is basically a
dictionary, hash table, or associative array, depending on your language of
choice.
There are a few disadvantages off the top of my head:
- Querying this is slow because in SQL it requires a lot of self JOINs on the
  Media table
- Typing of the value is difficult and ominous at best
The advantages are far more numerous. Firstly, any sort of data can be indexed
and made available to the user, such that a user has the ability to arbitrarily
tag their music in their own way. A user might decide to make a list of
people_who_like a certain song, and then use this information later to build a
playlist for a party based on who's attending: a playlist that they think most
of their friends in attendance will be happy with. More importantly it
reduces development headaches in that we don't need to provide an upgrade path
for the database every time we decide that we want to index a new tag property
from the media files, and also so that we don't end up with a massive list
of columns.
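
To make the trade-off concrete, here is a minimal sketch of a sourced entity-attribute-value table using Python's built-in sqlite3 module. The table layout follows the "id, key, value, source" description above, but the real XMMS2 schema may well differ, and the song data, source names, and the titles_liked_by helper are all made up for illustration.

```python
import sqlite3

# Hypothetical schema modelled on "id, key, value, source";
# this is an assumption, not the actual XMMS2 medialib schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Media (id INTEGER, key TEXT, value TEXT, source TEXT)")

rows = [
    (1, "artist", "Artist A", "plugin/id3"),
    (1, "title", "Song One", "plugin/id3"),
    (1, "people_who_like", "alice", "client/party-planner"),
    (2, "artist", "Artist A", "plugin/id3"),
    (2, "title", "Song Two", "plugin/id3"),
    (3, "artist", "Artist B", "plugin/id3"),
    (3, "title", "Song Three", "plugin/id3"),
    (3, "people_who_like", "alice", "client/party-planner"),
]
conn.executemany("INSERT INTO Media VALUES (?, ?, ?, ?)", rows)


def titles_liked_by(conn, artist, person):
    """Titles by `artist` that `person` likes: one self JOIN per property."""
    query = """
        SELECT t.value
        FROM Media AS t
        JOIN Media AS a ON a.id = t.id AND a.key = 'artist'
        JOIN Media AS p ON p.id = t.id AND p.key = 'people_who_like'
        WHERE t.key = 'title' AND a.value = ? AND p.value = ?
        ORDER BY t.value
    """
    return [row[0] for row in conn.execute(query, (artist, person))]


# A brand-new property needs no ALTER TABLE, just another row.
conn.execute("INSERT INTO Media VALUES (2, 'mood', 'rainy', 'client/tagger')")

print(titles_liked_by(conn, "Artist A", "alice"))  # ['Song One']
```

Every extra property in the question costs another JOIN against the same table, which is the slowness complained about above, while the new 'mood' tag went in without touching the schema.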
Onto our second friend, the Collection. Collections provide a way to organize
music by certain constraints and then later query for a list of matching songs.
More details about the specifics can be read in the article linked above which
gives lots of pretty examples of the capabilities that this functionality
provides. What's really more important is the basic idea of allowing
arbitrarily constrained collections of metadata which can be linked together
and saved, and then queried later. The results are thus always up to date when
a query occurs.
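
The basic idea can be sketched in a few lines of Python. This is not the XMMS2 Collections API; the Collection class, the has helper, and the sample library are hypothetical names I'm using only to show saved constraints that are linked together and evaluated at query time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Song = Dict[str, str]


@dataclass(frozen=True)
class Collection:
    """A saved constraint; it stores a rule, never a list of results."""
    matches: Callable[[Song], bool]

    def __and__(self, other: "Collection") -> "Collection":
        # Link two collections: a song must satisfy both.
        return Collection(lambda song: self.matches(song) and other.matches(song))

    def __or__(self, other: "Collection") -> "Collection":
        # Link two collections: a song may satisfy either.
        return Collection(lambda song: self.matches(song) or other.matches(song))

    def query(self, library: List[Song]) -> List[str]:
        # Evaluated against the library as it is right now.
        return [song["title"] for song in library if self.matches(song)]


def has(key: str, value: str) -> Collection:
    """Constraint: the song's `key` property equals `value`."""
    return Collection(lambda song: song.get(key) == value)


library = [
    {"title": "Song One", "artist": "Artist A", "genre": "ambient"},
    {"title": "Song Two", "artist": "Artist B", "genre": "ambient"},
]

saved = has("artist", "Artist A") & has("genre", "ambient")
print(saved.query(library))  # ['Song One']

# New media shows up in the saved collection without re-saving anything.
library.append({"title": "Song Three", "artist": "Artist A", "genre": "ambient"})
print(saved.query(library))  # ['Song One', 'Song Three']
```

Because only the rule is stored, the second query picks up the newly added song on its own.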
Having gone into how I think XMMS2 does the Right Thing (tm), I'd now like to
discuss some other systems some of which also do the Right Thing (tm), possibly
in a different way, and some of which try to do it, but fail by being too
unwilling to break down the barriers that have been built up so strongly
in the past.
The basic concept of cross-application integration has been very popular in the
recent past. Apple highlighted the fact that Mail.app could tell you if the
person to whom you were writing an email was on AIM at the time you were
writing that email. The GNOME crowd quickly picked up on this as some kind of
killer feature and offered a large bounty to see it implemented between
Evolution and gaim. Initially such functionality was done
by providing a simple gaim-remote command line tool and library which could
be used to control certain specific functions of gaim in a way that facilitated
this exact functionality. More directly the plan wasn't really to open up
gaim's interface, but rather just to make this one feature possible. It's
really an example of missing the big picture of cross-application integration
and instead only picking up on already developed use-cases for it. Apple was
most likely able to implement this functionality by using OpenStep's
Distributed Objects functionality, which allows the exporting of objects over
some kind of channel
so that other applications can manipulate them. (I assume this is how it was
done since the functionality was already there and it seems silly to reinvent
the wheel in this case.) NeXT had used this same functionality for
collaboration tools where one user could watch another developer working
transparently over the network. This functionality was used in the development
of Doom and Quake.
Mind you, I'm not championing NeXT and certainly not Apple. I think that Apple,
while having the ability to really open up interfaces and create an integrated
overall interface, has focused too much on separate abstract applications
and consumer friendliness (eye candy, music, video). This was not always the
case though. I consider the Apple Newton to be a champion of the kind of
interface for which I'm calling.
What's so special about the Newton? The Newton actually is amazing in two ways:
- Integration of objects between the various tools on the Newton
- Not so novel ways of managing and visualizing information that seem
  to be avoided or forgotten
The Newton didn't use files for storing data but rather had an object database
which allowed objects to reference each other and store fairly arbitrary data.
This made it easy to extend the address book by adding new properties which
could be made useful by third party add-ons to the builtin applications. This
also allowed objects to reference each other, so an address book entry could
be included in a note such that it linked directly to the address book instead
of just including the information about the person. This way modifying the
information in the note could potentially change it in the address book, and
changes in the address book would be propagated back to the note. This is the
kind of functionality that should really be focused on when the issue of
cross-application integration comes up. More than just a killer feature, it
should, nay, needs to be ubiquitous: something that's trivial for a developer
to use, and hopefully accessible enough that a normal user has this
functionality available to them with no or minimal programming skills.
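
As a rough illustration of linking rather than copying, here is a small Python sketch. It is in no way the Newton's actual object store or API; the Entry and Note classes and the sample data are hypothetical, and the only point is that the note holds a live reference to the address book object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Entry:
    """A stored object: a bag of fairly arbitrary named properties."""
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Note:
    """A note whose parts are plain text or live references to other objects."""
    parts: List[Union[str, Entry]] = field(default_factory=list)

    def render(self) -> str:
        out = []
        for part in self.parts:
            if isinstance(part, Entry):
                # Read the referenced object at display time, not a stale copy.
                out.append(f"{part.properties['name']} <{part.properties['phone']}>")
            else:
                out.append(part)
        return " ".join(out)


alice = Entry({"name": "Alice", "phone": "555-0100"})
note = Note(["Call", alice, "about the party"])

alice.properties["phone"] = "555-0199"    # edit made in the address book
alice.properties["birthday"] = "June 1"   # new property, as an add-on might store
print(note.render())  # Call Alice <555-0199> about the party
```

An edit made through the note (note.parts[1].properties) lands on the very same object, so the address book sees it too.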
The second issue is also rather cool, but nothing special. Similar
functionality was presented in Doug Engelbart's Mother of All Demos.
In particular the issue being referred to is the structural nature of lists
in Newton's Note program. The idea of being able to collapse, reorganize, and
re-visualize list items is a really worthwhile feature. Engelbart demonstrates
some really cool functionality such as changing a hierarchical list into a
flat list, and back again. He even demonstrates visualizing that list as a
tree, and jokingly ends the demonstration of the list, which in this case
was a shopping list, with a presentation of a line-drawn map of the path he
would have to take home to pick up all the items he has on the list. What was
a joke in 1968 (I repeat 1968) is very possible today. By applying an address
property to certain list headers it would be fairly easy to provide a Google
Map view of the path one would have to take to get home.
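
Switching a list between hierarchical and flat views is a small amount of code once the list is kept as structure instead of text. The sketch below is my own illustration, not anything from Engelbart's system or the Newton; the representation (text plus children, or depth plus text) and the function names are assumptions, and unflatten expects depths that never jump by more than one level.

```python
from typing import List, Tuple

Tree = List[Tuple[str, list]]   # each item is (text, children)
Flat = List[Tuple[int, str]]    # each item is (depth, text)


def flatten(items: Tree, depth: int = 0) -> Flat:
    """Hierarchical list -> flat list that remembers each item's depth."""
    flat: Flat = []
    for text, children in items:
        flat.append((depth, text))
        flat.extend(flatten(children, depth + 1))
    return flat


def unflatten(flat: Flat) -> Tree:
    """Flat list with depths -> the hierarchical list it came from."""
    root: Tree = []
    stack = [root]  # stack[d] is the child list that depth-d items go into
    for depth, text in flat:
        del stack[depth + 1:]           # climb back up to the right parent
        children: Tree = []
        stack[depth].append((text, children))
        stack.append(children)          # later, deeper items land here
    return root


shopping = [
    ("Produce", [("Bananas", []), ("Carrots", [])]),
    ("Dairy", [("Milk", [])]),
]

flat = flatten(shopping)
print(flat)
print(unflatten(flat) == shopping)  # True
```

The same structure could feed a tree view, or a map view if the headers carried an address property.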
Diverging into the Mother of All Demos, it seems clear that many of the ideas
presented in this presentation were lost in favor of fawning over the mouse and
video conferencing. Engelbart's focus on dealing with "wicked problems" would
have pushed computer development into an entirely different arena where the
focus was on tools that were more general and could be put together to deal
with specialized problems, instead of focusing on the development of
applications to address specialized problems. The tool architecture is
something that was heavily utilized in UNIX development but has had little
appearance in the world of the graphical user interface. There are a few
(partial) exceptions to this: Plan 9 being a classic example, but in general
the UNIX philosophy itself has even begun to break down in the development of
CLI tools. A classic complaint about git is its toolkit
design, which amounts to over 100 separate commands. The basic concept being
that a certain set of them (the plumbing) can be strung together to create
the actual functionality of git. For example the commit process is actually
the combination of git-write-tree, which writes the index to a tree object,
followed by git-commit-tree, which writes a commit object to the repository
which references the tree object. (There is slightly more involved in the
actual process done by the git-commit tool, but that is the basic principle.)
This tool based approach is seen as a drawback with constant citations of how
it's difficult to use and not user friendly. Those that make this claim are
missing the point. Scripts (porcelain in git speak) exist to consolidate these
processes for the user, but the toolkit approach allows a level of power and
flexibility that is impossible if the functionality is internalized into a
closed interface. I'd go further to say that the closed interface makes it
harder for the user to actually do what they want to do.
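
To show how little glue a porcelain needs, here is a rough Python sketch that strings the plumbing together. The commit_with_plumbing and run_git helpers are hypothetical names of mine, not part of git, and the final update-ref step is my guess at part of the "slightly more" that the real git-commit does; treat it as a sketch, not a replacement for git commit.

```python
import subprocess
from typing import Callable, List, Optional


def run_git(args: List[str]) -> str:
    """Run one git command and return its trimmed standard output."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_with_plumbing(
    message: str,
    parent: Optional[str] = None,
    git: Callable[[List[str]], str] = run_git,
) -> str:
    """Hypothetical porcelain: commit the current index using plumbing only."""
    tree = git(["write-tree"])            # index -> tree object
    args = ["commit-tree"]
    if parent is not None:
        args += ["-p", parent]            # link to the previous commit
    args += ["-m", message, tree]
    commit = git(args)                    # tree -> commit object
    git(["update-ref", "HEAD", commit])   # point the current branch at it
    return commit


# Usage, inside a repository with staged changes:
#   parent = run_git(["rev-parse", "HEAD"])
#   commit_with_plumbing("Describe the change", parent=parent)
```

The git parameter is only there so the sequence of plumbing calls can be swapped out or inspected; anyone unhappy with what this porcelain does can rearrange the same three tools into their own.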
The entire user-friendliness argument amounts to what I consider a general
disrespect for the user. It basically says that the user is not smart enough
to learn, or that powerful functionality is usable by such a small subset of
people that use an application that it amounts to not being worth having. This
is why the open interface is a truly superior approach: generic cases can
easily be built on top of the open interface so as to make for something that's
generally considered easy to use, but the interface becomes immediately
open to anyone that wants to just play with it. I'm going to amazingly cite an
example provided by Microsoft that follows along such a line.
Microsoft's Windows PowerShell exports many of the .NET classes to a user
accessible shell with a simple scripting language. This is amazing in a lot of
ways. Specifically, it allows a moderately competent user to experiment with the
.NET classes and interfaces from a fairly non-threatening environment (no need
to compile, instant error notification, etc), and thus gives them more power
over their computer's functionality and capabilities. Some examples of this
power presented directly to the user are:
PS> $rssUrl = "http://blogs.msdn.com/powershell/rss.aspx"
PS> $blog = [xml](new-object System.Net.WebClient).DownloadString($rssUrl)
PS> $blog.rss.channel.item | select title -first 8
Here access to the .NET objects is used directly to download a string and
then cast it as an XML object, which is then queried with a simple syntax.
This is functionality that an eager user could learn to do, or that
could easily be scripted, providing a function that returns a title and URL
for the top eight blog posts on an RSS feed provided by the user. This could
even be pushed into a graphical application which presented the information as
a list view. Immediately power is made accessible to a user that is willing to
try and learn, with access to millions of examples all over the Internet which
they can adapt to better suit their needs (I learned to program by a similar
process). If interfaces are locked up and hard to work with and learn then
we end up with people less likely to try and experiment with them and we get
the kind of people that we always cast as the "Joe Blow" user. If you assume
people are stupid, you don't give them a chance to change that.
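
The scripted function mentioned above, the one returning a title and URL for the top eight posts, might look something like this in Python. This is a sketch using only the standard library; top_posts and the sample feed are hypothetical, and it assumes a plain RSS 2.0 layout (rss, channel, item, title, link).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def top_posts(rss_text: str, count: int = 8) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for the first `count` items of an RSS feed."""
    root = ET.fromstring(rss_text)                   # root is the <rss> element
    items = root.findall("./channel/item")[:count]
    return [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in items
    ]


# Made-up feed so the example runs without a network connection.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

for title, link in top_posts(SAMPLE):
    print(title, link)
```

For a live feed, the text could come from something like urllib.request.urlopen(url).read(), and the resulting pairs could just as easily be handed to a graphical list view.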
This has mainly been about my first issue, that toolkit based design is more
adaptable and provides more power directly to the user and also allows for
better integration of functionality. I think that by approaching software
development from the first issue, the others can in some ways fall into place
but I would like to address them more directly.
I will address my second issue by giving a use-case situation that I hope will
point out a common issue. I'm looking at a website, for example Portland,
Oregon on Wikipedia, and I see this table of temperature and rainfall data. I
personally would rather see a graph of this data, as I visualize this sort of
thing better graphically. I'm presented with a few options. If I'm on Windows I
might cut and paste the data into Excel and tell it to spit out a graph; on
*nix I would load up gnuplot and input the data. It's clear that the ability to
visualize this information in different ways is desirable and while I have
viable options to deal with this problem, both of them are time consuming (one
more so than the other), and require that I make decisions about the data to be
graphed. HTML is a semantic format: the table is actually marked as a table and
so the ability to convert a table of numbers into a graph (with some best
guesses made about how to group the data and the ability to be more specific
if necessary) could easily be implemented with open access to the table DOM
object. In general these issues arise often, with various methods of
presentation being better suited for different people and different situations
and too often we focus on unified presentation instead of specialized
presentation. As discussed before lists are another common structured format
that would be suited to various forms of visualization depending on person
and situation.
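
Here is a rough sketch of the table-to-graph idea in Python, working on the HTML text instead of a live DOM object. The TableReader class and both helper functions are hypothetical, the rainfall numbers are invented and are not Wikipedia's, and the "best guess" is hard-coded: the first row holds the labels and every later row is one named series of numbers.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class TableReader(HTMLParser):
    """Collect the text of every table cell in the document, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_to_series(html: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Best guess: first row is labels, each later row is one numeric series."""
    reader = TableReader()
    reader.feed(html)
    reader.close()
    rows = [[cell.strip() for cell in row] for row in reader.rows if len(row) > 1]
    labels = rows[0][1:]
    series = {row[0]: [float(cell) for cell in row[1:]] for row in rows[1:]}
    return labels, series


def bar_chart(labels: List[str], values: List[float]) -> str:
    """Crude text graph: one '#' per whole unit."""
    return "\n".join(
        f"{label:>4} {'#' * int(value)}" for label, value in zip(labels, values)
    )


# Invented numbers, for illustration only.
SAMPLE_HTML = """<table>
<tr><th>Month</th><th>Jan</th><th>Feb</th><th>Mar</th></tr>
<tr><th>Rain (in)</th><td>5</td><td>4</td><td>3.5</td></tr>
</table>"""

labels, series = table_to_series(SAMPLE_HTML)
print(bar_chart(labels, series["Rain (in)"]))
```

A real version would need the ability to be more specific about grouping, as noted above, and would hand the series to a proper plotting tool instead of printing hash marks.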
The computer is an information manipulator. It's designed to organize and
access information. It should be thought of that way. Playing music on a
computer is an example of this: the computer is far more of an information
manipulator than a stereo is. Most media library music playing tools are
focused on presenting the organization of your media instead of the actual
playing of that media. Playlists exist to better organize the music, and cover
art exists just as much to ease recognition of music as it does to provide eye
candy. These capabilities are often ignored or missed. The simple office
mailbox has none of the sorting or organizational capabilities provided by even
some of the most basic mail programs. With that in mind, those organizational
capabilities should be a focus of computers, with the emphasis on relations
between information and on providing information and grouping that is better
suited to the person whose information is being organized. This is paramount:
just because it works for me doesn't mean it will work for you, so the focus
should be on providing a system so dynamic that it can be adapted for whoever
is opting to use it.
Filenames don't cut it, and embedded metadata is barely (and often not) enough.
Filenames require you to essentialize all of the properties you associate with
a piece of information into a fixed string. While it is possible to serialize
various properties into the filename, it makes for something difficult to query
and ultimately to identify. Instead it makes more sense to catalog the metadata for
various bits of information and allow it to be queried in various ways. This
capability should necessarily be integrated wherever possible so that when
looking at an email you can quickly jump to other emails by that sender or
other information with a relation to the sender. Capabilities should exist much
like XMMS2's collections to store constraints to be used later, or build them
on the fly, and all of the various operators should fit together so that an
extra level of filtering can be applied on top of the current one. The current
searching systems I've seen (Spotlight, Beagle) don't really go far enough, and
their functionality is rarely available outside of the specific applications.
Apple has been making strides to shoehorn (and I use that term with a heavy
hand) Spotlight into various other parts of the OS, but nothing that's exactly
amazing, and in general I think Spotlight is too oriented around searching by
file and a subset of available file metadata.
I probably upset someone. Sorry about that, maybe it's time to consider that
what was revolutionary twenty years ago is tired by now, and that the
foundation on which work is done itself is flawed. I had initially intended
this article to be a call for ideas, help, and support on some stuff I have
sitting around, but I decided that it would be better to present a manifesto
of just what is wrong, and what possibilities there are to fix it. I want this
manifesto to be thought of as a working draft of the possibilities for the
future instead of a rock solid statement on the way things should be done. I
will post about some of the projects I am working on that follow this general
philosophy later.