The Importance of Ubiquitous Media

I would like to talk about design, obviously, but more particularly about `Ubiquitous Media' and `Augmented Reality', and what I mean by that I hope will become clear during this talk.

The running theme of this presentation is interactivity. We typically think of the term in respect to humans and technology, but I want to emphasise that my spin on this is slightly different for the most part. It's interacting with people as well, but mediated by technology. For example you are working with me, while I happen to be in Canada and you in Holland.

The other aspect of interactivity that we don't normally think of is in the sense of interactivity in the foreground (vs. background), which is what I'm doing now - well, it's a monologue rather than a conversation - but it's a foreground, conscious, intentional activity. At the same time however, there's interactivity with the environment, the periphery; there's background interactivity. And again, I want to push on that towards the end of the talk - it's actually where I think most of the action is, so to speak, both in respect to interactivity with people and with technology.

Now, we have a problem. And the problem is very much one of design - I would almost say we have a crisis of design. Simply stated it's this: based on our performance to-date, we as engineers, we as designers have absolutely no credibility to deliver on the promises which we are making to the public about this wonderful, glorious, wired future which everyone is hearing about but nobody is delivering. We can't deliver VCRs that work, we can't deliver phones that even the phone company knows how to forward calls on, we can't deliver copiers that Xerox people know how to make copies on. How in Hell are we going to deliver these other things we're promising?

This is all summarised in this highly-scientific graph that I have here. This axis represents the growth of functionality over time - which is this dimension here. I want to point out this little red line here: some people call it the Complexity Barrier, I call it the Threshold of Frustration. It basically is the line where people are not willing or able to go beyond in pursuit of some activity. And all I want to say is: things like VCRs, copiers and telephones are already right at that threshold, if not above - I'm being generous now - and things like multimedia, computer-supported collaborative work, virtual reality, telepresence... all this stuff, is so far above the threshold of frustration that I don't care if you can do me a laboratory demonstration: it simply does not exist in terms of a cognitive or social accessibility. We're wasting our time unless we change how we design. Based on the way we're doing things right now, it's a disaster, and we simply won't be able to deliver. I think one of the purposes of conferences like this is to explore how we might achieve some new paradigm of design.

My point of departure - not by accident as I work half time at Xerox PARC - is one of my colleagues Mark Weiser and this very important article he wrote in Scientific American in 1991 called The Computer for the 21st Century, in which he introduced the notion of `Ubiquitous Computing' (UbiComp). To give one other footnote: there's a special issue of the Communications of the ACM that came out in July 1993 which is also devoted to this question of `Ubiquitous Computing' and `Augmented Reality'.

I'm going to quickly summarise in a couple of slides the whole notion of Ubiquitous Computing and then I'll go on with my examples. It's really simple: `Ubiquitous Computing' means computers are everywhere, or access to computers is everywhere. But the computers and technology are invisible.

This in itself seems to be a paradox, but let me tell you what Weiser talks about in terms of computers: he talks about having in your office and milieu maybe hundreds of very small computers. Things like this watch: my watch happens to be about the power of an Apple II. It has an RS232 port, I download my day agenda into it every day, I have a HyperCard stack; it's a peripheral computer, a standalone from my main computer.

Or this, which is called a ParcTab - it's a personal locator. It's got a wireless network: I walk through buildings, it tells me where I am, my phone follows me - it does all kinds of things. I can look up words in the dictionary. It's basically a very portable window into a wide wireless network where I can read email, and so on.

Scaled up to things like this: notepads, which are wireless, so I can walk around, take notes, send faxes. Or scaled up to the size of the wall here: instead of being simply a one-way presentation thing, I can walk up and write on it. One way or the other, in the end it all comes down to the notion of embedding these things within the environment; that's the kind of thing Mark started talking about in his original article.

What I want to address is the issue of how to resolve this paradox between ubiquity and transparency. How can something be everywhere yet be invisible? I want to do that with a very simple example, to give a sense of the design perspective here. It's a perspective based on the notion of skill, whereby we get the invisibility.

Here's a problem: we have a person sitting here on this island and he wants to get to that one and there's water in between. The question is: how do you go about it? With the first technology - which is manual labour - what one could do is to swim across. But in the process you have to learn a new skill. You have to swim, which is not something we come out of the womb knowing - actually, it probably is, but we forget very shortly afterwards. Moreover it's fraught with other dangers. I might have eaten beforehand so I get a cramp, or I might get eaten by sharks as I swim across. So we go beyond manual labour - and engineers must also have a job - so let's build some technology.

This is the current technology: this is what the engineer of the 1990s - of the last few decades - would have done. He would have built a boat. This is very interesting, because first of all we have to build it. It's a high tech industry to build the boat, then we have to get petrol to fuel the boat, and of course the petrol leaks and pollutes the water. At the same time I have to learn how to drive the boat, and the engine is going to break down every six months, so I'm going to have to fix it. There's a whole infrastructure there. The main thing is there are all kinds of dangers, maintenance and overhead going on here - and most importantly there's a complete set of new skills that I have to acquire before I can make the crossing.

Here is the `Ubiquitous Computing' approach: I build a bridge. You could say in one sense: 'I can see the bridge. It's not invisible, so what is this business of invisibility?' I'm trying to make the point here that it's invisible from the perspective of how I perform the task, which is to move from one space to another. To walk on the island requires exactly the same skill set as walking from island to island. To cross the bridge requires exactly the same set of skills we have already acquired from a lifetime of living in the everyday world. It's transparent. And it's this distinction between the motor boat and the bridge that we need to make in terms of bridging information systems in higher technologies.

The first thing I want to do to expand upon Weiser's work is to introduce the notion of `Ubiquitous Video' (UbiVid). One of the tenets of `Ubiquitous Computing' is that it is inappropriate to channel all of your computational activities through a single workstation. You have a maze of computational devices, all of which are specialised for specific tasks and embedded in the environment in the appropriate places. I say exactly the same thing about `Ubiquitous Video'.

We've seen media spaces in terms of Bellcore, Xerox PARC, our work in Toronto and so on, and what we want to say is that it's equally inappropriate to channel all your video activity through a single camera, monitor and loudspeaker. This, by the way, is the kiss of death to 'desktop video' as it is trying to become known.

One of the key points here - it's a running theme - is to pay attention to architectural and social space, and this is my second point and we'll talk more about this.

Where this leads if you take `Ubiquitous Video' and the media space on the one side and combine it with `Ubiquitous Computing' on the other, is something which is the larger picture: `Ubiquitous Media' (UbiMedia) - UbiComp plus UbiVid.

Let's talk about UbiVid for a while. We'll start with the stereotype we have right now in too many magazines, and the sooner we get rid of it the better. This stereotype is: 'Here's what we're going to do: desktop video.' We'll give you this and you're gonna sit at your desk with this monstrosity in front of you with a camera on top and you'll be able to `videoconference'. And this is what virtually anyone who's doing desktop video conferencing is currently working with, with some notable exceptions. This is the enemy! This is the monolithic approach.

There's a couple of design principles that are going to follow through from what I speak about for the rest of the talk. One of them is this concept: when I'm dealing with video, telepresence, I'm going to treat electronic visitations and entities the same way, as much as possible - and by `the same way' I mean with the same skill set - as physical visitors, and I'm going to pay attention to the long-evolved, hard-learned social mores of function and location. What I mean by that is, for example, that anyone walking into this room, by simply seeing where I'm standing and you're sitting, will know that I'm the speaker and you're the audience, and that the technology should take advantage of similar space-function relationships. We'll see more of that as we go along.

I probably won't dwell on this slide: it's simply contrasting two views of an office. Here's an office with a desk, a visitor's chair and a desk for a secondary person to work on. Here's the same office: these dark black things are video monitors. The traditional way of doing things is I have an office and this workstation on my desk with a monitor. In my office - and this is not hypothetical - in my actual office, I have on my desk a camera and monitor, so that when I'm dealing with someone at my desk in a side-by-side working relationship, that's where they appear and that's how we work.

On the other hand, if I'm sitting around with a group of people in my office, there's the visitor's chair. Right behind the visitor's chair is the video camera/monitor, so that if you're with me electronically in a meeting, you sit at a chair like anybody else, in the same spatial relationship. You'll see some other examples of this notion of preserving the sense and use of space in an architectural sense, and this is done by infusing the space with more technology, not less. And ironically, by putting more in, it is less visible.

Likewise in terms of video conferences. I don't know how many of you have taken part in video conferences, where you're giving a talk like I am right now, and half the audience is someplace else. In every single case where I've done that, the live audience was in front of me and the electronic audience behind me because the video conferencing gear sits at the front of the room.

What does that mean? It means the video conferencing audience and technology are invading the physical space where I am sitting, the space where I am supposed to be. They're in the speaker's space, not among the audience, so I have to make a choice between showing either the remote people or you the back of my head. So one of the things you have to do, again as an example of using architectural space, is have back-to-front video conferencing. So I'm the speaker at the front and the audience sits at the back around the table, whether they're here electronically or physically. The most important thing about that is that the user-interface disappears. If somebody wants to ask a question, they put up their hand. If they look confused, they look confused, no matter what. I see it and I see it in the same way whether they're there electronically or physically.

Here's an example which you see on videotape, and it's about the notion of how to have an around-the-table conversation electronically. There's four of us, we're sitting around the table, we're having a brainstorming meeting and there's all kinds of very complex social interactions going on, like me looking at you, you looking at me, head-turning gaze awareness, breaking into parallel conversations, and so on. So what we want to do in this situation is simulate electronically the case where there's four people sitting around the table - and there's a design solution to this problem.

These are Tracy, Abby and my daughter Kate.

You'll notice that the space occupied by each remote participant is very small, and yet they have a distinct location, which means that the sound comes from their own personal space. Furthermore, each person can see where each other person is directing their attention, their gaze. Eye contact can be maintained because of the small angle between the camera and the monitor and there's enough space on the desk if I need to do other work. Therefore, through the use of industrial design, we're able to give an implementation of a concept that is actually testable and usable in a real world context.

The principal concept there is that those are not cameras and monitors, they are video surrogates of the remote participants and so it's not a camera, it's their surrogate eyes; it's not a loudspeaker, it's their surrogate mouth. And the combination of that with the monitors defines the social and functional space that they occupy. And you can even push this further, so you can lean towards them and make private asides. If you notice, there is no user interface. The way of dealing with this preserves the social skills that you have already established in a similar geometry of conversation.

Let me give you another example of a video surrogate and how social function applies to this. In my office above the door there's a camera and monitor and loudspeaker. When people come and approach me, they enter from the door whether they come electronically or physically. And if I invite them in, then they come and sit down in the visitor's chair or at my desk.

Now the point about this is that there are very important social graces of approach and departure and if you just have some monolith monitor on your desk and people just appear suddenly from somewhere, it's just as inappropriate as if (climbs off stage) I was standing here this close to you. This is absolutely violating the social graces we have learned. Yet that's how we treat video (climbs back).

Sometimes it's good to keep your distance. That distance is culturally embedded and we have to respect it. And we have to respect it by not allowing technology to intrude in our space and simply be rude. So let's see an example.

It's a short example, but it's real. It's no more complicated than what happens in the physical case, but the point is that you preserve social graces and I have an environment that I'm comfortable to live in. We use the principle here of using the same social protocols for electronic and physical interactions.

This goes even further in terms of doorways and so on. One of the first things people talk about when you start putting this technology around is privacy. One of the things you can do is use existing social conventions to control your own accessibility, in a sense that I can programme my accessibility by having my door, whether electronic or physical, open, ajar, closed or boarded shut, which gives strong social cues as to how approachable I am.

Now, the important thing about these door icons that propagate throughout the network is that, when people want to reach me by any electronic means, they preserve the metaphor. They have a slight problem: if I have them on my computer, I have to close my physical door and I also have to change the door icons at my workstation. What I'm about to say here is that I can actually instrument the physical door itself to be sensed. And this is important - this is the pun of the day! - it's the perception of the doors, as opposed to the doors of perception. All we've done is that we got a Macintosh mouse, screwed it on the wall on the shaft and cord, we put a little belt under the hinge, so as we open and close the door, the mouse senses that. It is fed as a second mouse input into the computer and that propagates to the world. So everybody knows whether I'm available or not. It's this notion that you can start to instrument and have your environment sense the background - we're getting back to interactivity: background interactivity, of context and environment.
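The door-sensing idea can be sketched in a few lines of code. This is only an illustration, not the system used in the office described: it assumes the mouse reports signed movement counts as the door swings, and the calibration constant, the thresholds, the `DoorSensor` class and the `publish` callback are all hypothetical names and values. `boarded shut' cannot be sensed from the hinge, so it is left out and would have to be set by hand.

```python
from enum import Enum


class DoorState(Enum):
    OPEN = "open"
    AJAR = "ajar"
    CLOSED = "closed"


# Hypothetical calibration: mouse counts between fully closed and fully open.
COUNTS_FULLY_OPEN = 400
AJAR_FRACTION = 0.05  # below this fraction the door counts as closed
OPEN_FRACTION = 0.60  # at or above this fraction the door counts as open


def classify(position):
    """Map an accumulated mouse position to a door state."""
    fraction = position / COUNTS_FULLY_OPEN
    if fraction < AJAR_FRACTION:
        return DoorState.CLOSED
    if fraction < OPEN_FRACTION:
        return DoorState.AJAR
    return DoorState.OPEN


class DoorSensor:
    """Accumulates mouse deltas and publishes only when the state changes."""

    def __init__(self, publish):
        self._position = 0              # assume the door starts closed
        self._state = DoorState.CLOSED
        self._publish = publish         # hypothetical hook to the network

    def on_mouse_delta(self, delta):
        # Clamp so that sensor slip cannot push the position out of range.
        moved = self._position + delta
        self._position = max(0, min(COUNTS_FULLY_OPEN, moved))
        new_state = classify(self._position)
        if new_state is not self._state:
            self._state = new_state
            self._publish(new_state)
        return self._state


if __name__ == "__main__":
    sensor = DoorSensor(lambda state: print("door is now", state.value))
    for delta in (10, 90, 300, -400):
        sensor.on_mouse_delta(delta)
```

Publishing only on a change of state keeps the interaction in the background: nobody on the network hears about the door until its meaning for accessibility actually changes.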

Now here's the audience participation part: you now have 20 seconds to draw a computer. You're designers - get your paper out and start drawing now! I have to organise my talk. As soon as you finish put up your hand and I'll grab a couple.

Okay. You can check for evidence later, but people found the stereotype. This is what we got: how many of you did not draw something like that? You've heard me speak before! Here's the deal: I'll give you a second chance. Draw a computer from 20 years ago - in your mind's eye - we don't have enough time. Here's what you would draw. I'd draw that: some registers, an address bus, ALU. Here's the question: of the drawings, which one has remained true? Mine. More or less. This is a very important point: what nearly all of you drew were the input/output transducers, the screen and mouse, the keyboard. Some people drew the box that had the computer, but most drew the I/O transducers.

Very important observation, especially when we contrast the 1963 with the 1993 drawing: One, the I/O transducers were different. Furthermore, the fact that that is what you drew tells us about the power of the I/O devices to shape your concept of what the computer is. Second, the I/O devices are accidents of history. On the one hand to say they're accidents and on the other to say they're powerful is the most important thing that we can grok as designers, because it says to us we can design and adopt new I/O transducers - because of course they are accidents of history. In so doing we have the most powerful tool to shape the conception of what future computers are about. It's not designing computers, it's designing interactivity - the input and output - and thereby shaping the concept, the perception. One of the most important ways of doing this, which brings us back to the notion of Ubiquitous Computing, is to do it in such a way that you embed the technology into the environment.

What do I mean by that? Here's sort of multimedia - which I deplore - I hate the term and hate the concept. Back in 1293 if you were in a castle, you peered between the castellations of the castle. Now I'm a serious scientist, I've published and stuff like that, what do I do? I sit at my desk and peer between all of this technology, right? Where's the progress? Has there been any progress at all in the intervening 700 years? Do you really want to continue to live that way? The reason is: we're being sold a bill of goods by the computer manufacturers that say `we're going to solve your problems by delivering multimedia on your desktop in some 'super appliance' that does everything.' Now this `super appliance' means a whole bunch of appliances all cabled together, like you see in front of you right now, and the operational definition of multimedia is simply any three boxes cabled together and connected to any other three boxes on your desktop so you have no room to work!

Now, no appliances allowed! And the most important design thing to say is this: the box into which your solutions should fit is the box in which you live or the box in which you sit. No desktop computer, no desktop metaphor - I want my desktop back!

And we can do that. I'll show you a very quick clip:

In this case, he has a window in front of him to look out at the landscape. His desk: there's no computer on it. The desktop itself is embedded; it's his environment. He's got video conferencing capabilities: the units you already saw. He works at it like a draughting table, he's a designer. If you're a designer, you sit at a draughting table and work the way you always have. The point here is the technology is embedded into the environment and all of his appliances disappear.

Okay, let me conclude. You've seen that how we deploy this here and how we hide it is critical in terms of the perceptions. And the thing that should drive our design is not what Intel will sell us, not what Microsoft will give us, it's what are the skills that people have? In terms of motor-sensory capabilities, skills in problem solving and cognitive capabilities, and skills in terms of social functions, working together as social/political animals. And the technology has to begin to respect this.

My final example: we're just down the road from the Concertgebouw Orchestra. Think about how much the bow - not the violin, the bow - of the concert master of the Concertgebouw Orchestra costs. The bow, the input/output transducer? I'll tell you: the bow of the concert master costs more than a Silicon Graphics Iris workstation. That's an important economy to put it in terms of. Just the bow. Who makes more money, the engineer or the musician? Who can afford to spend the most on an input/output device? Of course, the engineer.

If you go to Silicon Graphics or Apple and say 'I need something better than this piece of garbage', they say 'we can't afford it. It's too expensive.' This (mouse) costs five guilders to make. The musician is willing to spend the price of an entire workstation - because the instrument respects the skill and human capabilities to articulate powerful ideas through some mechanical or technological medium, namely a cello or a violin. Until we pay the same attention to detail and our designs are capable of capturing the same subtle nuances of skill, both motor-sensory, cognitive and social, as the bow does for the musician, our designs are still inappropriate and immature and we haven't done our job as designers for our customers. And I'll leave you with that.


updated 1993