multimodal web

Bob's Perfect Virtual World as the 3D Web

In this blog entry, I'd like to address the three blog posts by Bob Sutor (of IBM) about his requirements for a perfect 3D world, and how that world could be implemented as a direct extension to the World Wide Web, as described in my 3D Web design concept.

A pure offline Mode

I think this is part of a wider requirement for certain web applications to work offline. With the recently announced Google Gears and other projects from major industry players, such as Adobe's Apollo, Mozilla's Firefox 3 (and Parakey, currently vapourware), Django's Offline Toolkit, Microsoft's Silverlight and Joyent's Slingshot, this is going to become an extremely hot topic. I expect the boundary between web server and web user agent to blur considerably into "Web Servents". In short, an offline mode can use the same technology as an online 3D web, with a local server or a local cache of data, logic and presentation.

A peer-to-peer model

By using web technology, we can take this for granted to the extent that anyone can run their own 3D web server and we can make hyperlinks between them. The peer to peer idea could be taken a lot further than this though, by users in the same virtual space swarming the data between each other. I don't know about that bit.

A model of many planets

Again, this is basically what the web is.

Much better zoning

This almost touches on the controversial subject of the .xxx domain.

However, in Second Life the geography works in much the same way as in First Life, with blocks of land having permanent neighbours. This is a limitation of real physical space that, while it might be nice to reflect in virtual worlds, is not necessary. We could have lots of areas of "virtual land" whose boundaries are defined only by their own content, with portals (hyperlinks) that allow you to move into another space. There is no reason to have permanent neighbours because your neighbours are simply whatever you link to, which is under your control. In this way zoning just becomes a result of the links people make, which works reasonably well on the current web.

If you do want to build a planet with geography like the real world (like Second Life), you can still do so, but you could decide to ban certain activities in that particular planet. That way, if there's some content you don't like you simply don't link to it, and it is only as close as 6 degrees of separation dictate.

In-world Secure Chat

I would argue that secure chat in general isn't particularly widespread on the Internet yet, so this is an issue for the Internet in general. However, see later for more discussion on in-world chat. In short, XMPP encryption extensions.

AI

This would just be part of a web application. A 3D web application that is a 3D game may have AI controlling faux avatars and objects; a sales site may have an AI shop assistant or a human-AI hybrid. It would be built with server side scripting languages and JavaScript manipulating X3D files.

World-to-world communication

XMPP (Jabber) and either Jingle or SIP for voice (and video?) would be great for person to person chat. A couple of interesting points spring to mind:

Firstly, should the Jabber client be part of the 3D Web user agent or should it just be another web interface like Google Talk in GMail? This matters especially with regard to advertising the presence or status of the user (available, away, busy, offline).

Secondly, how do we deal with the issue of hearing people around you in a virtual space and adjusting the sound as people move, in addition to person-to-person conversation between worlds? We certainly don't have standards for this yet, so it will be interesting to watch Linden Lab.

World to world teleportation

Hyperlinks.

Do I need a membership in the other world or is there a notion of guest?

We have the same issue on the web. I think distributed authentication like OpenID is a giant leap forward in this field.

How do I deal with cross-world identity?

By using a URI as a person's identity as in OpenID. You can still have your friendly screename in Jabber, but the URI uniquely identifies you.

Can I bring my money with me?

That's a good question, but I think the answer is that if you want some kind of virtual currency, it simply becomes a service like PayPal where you buy credits of some kind and they sort out "exchange rates". You could then use that currency in any world or any web site by using that service as a broker for payments. I'm obviously making this sound a lot simpler than it really is; nothing is straightforward where money is involved.

Can I bring my clothes with me?

Yes, your avatar and everything your avatar wears is hosted on an avatar server (just a 3D web server) and can simply be included into a scene. This only works if all the worlds use the X3D (or other) standard, which is one of the fundamental requirements of a 3D web in my opinion.

Can I bring more general objects between worlds?

The same as above, "objects" can be an X3D file hosted somewhere on the web which can be included into another X3D file dynamically. This requires a certain level of write access to all 3D web servers, which is probably going to cause all sorts of spam problems like we have on wikis. (Imagine a spamming company putting up billboards everywhere).
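As a rough illustration of including remotely hosted avatars and objects in a scene, here is a minimal sketch in Python that writes an X3D scene whose content is pulled in with Inline nodes. The function name, the example URLs and the choice of the Immersive profile are my own assumptions for illustration, not part of any existing tool.

```python
import xml.etree.ElementTree as ET


def build_scene(included_urls):
    """Return an X3D document (as a string) that inlines each given URL."""
    # Profile and version are illustrative assumptions.
    x3d = ET.Element("X3D", {"profile": "Immersive", "version": "3.1"})
    scene = ET.SubElement(x3d, "Scene")
    for url in included_urls:
        # The url field of Inline is a list of strings in X3D,
        # so each URL is wrapped in its own double quotes.
        ET.SubElement(scene, "Inline", {"url": '"%s"' % url})
    return ET.tostring(x3d, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical avatar and object hosted on other 3D web servers.
    print(build_scene([
        "http://avatars.example.org/tola.x3d",
        "http://objects.example.org/billboard.x3d",
    ]))
```

The hosting world only stores links, so the owner of the avatar or object keeps control of the file itself. The write-access problem mentioned above is about who is allowed to add such links to somebody else's scene.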

Search

Don't worry, Google will sort that out ;). Seriously though, it could work the same way as the web with spiders and giant indexes.

Device and world compatible link redirection

Now that is a very interesting topic which you could call the Device Independent or Multimodal Web. I think this can be solved with HTTP Accept headers and content negotiation. This is a major part of what my Webscope project is about, a Multimodal Web User Agent.
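To make the content negotiation idea a little more concrete, here is a simplified sketch in Python of how a server might pick a representation from an Accept header. The function names are hypothetical, and the matching is deliberately naive: it honours q-values and wildcards but ignores the specificity rules of the HTTP specification.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "image/*" becomes "image/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None
```

A 3D browser might send "model/x3d+xml, application/xhtml+xml;q=0.8" while a voice browser sends "application/voicexml+xml", and both would get a suitable representation from the same URI.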

tola – Sat, 2007-06-09 13:12

Tux Droid VoiceXML Web Browser

Synopsis

A VoiceXML web browser using a "Tux Droid" as the human-computer interface.

Codename

Vux

Rationale

A proof-of-concept for the multimodal web and a bit of fun.

Features

  • Audio output via audio playback and TTS (text to speech)
  • Input via the XML form of DTMF grammars described in the W3C Speech Recognition Grammar Specification, triggered by buttons on the Tux Droid remote.
  • Input via speech recognition grammar data in the XML Form of the W3C Speech Recognition Grammar Specification.
  • Record audio received from the user
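For the DTMF input feature, a grammar in the XML form of SRGS could be generated along these lines. This is only a sketch in Python: the function name and the rule id "key" are my own choices, and the coloured buttons are left out because how they would map onto DTMF tones is not decided here.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
# Serialise SRGS elements with a default namespace rather than a prefix.
ET.register_namespace("", SRGS_NS)


def build_dtmf_grammar(keys):
    """Return an SRGS XML grammar (as a string) accepting one of the keys."""
    grammar = ET.Element(
        "{%s}grammar" % SRGS_NS,
        {"version": "1.0", "mode": "dtmf", "root": "key"},
    )
    rule = ET.SubElement(
        grammar, "{%s}rule" % SRGS_NS, {"id": "key", "scope": "public"}
    )
    one_of = ET.SubElement(rule, "{%s}one-of" % SRGS_NS)
    for key in keys:
        item = ET.SubElement(one_of, "{%s}item" % SRGS_NS)
        item.text = key
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    remote_keys = [str(n) for n in range(1, 10)] + ["*", "#"]
    print(build_dtmf_grammar(remote_keys))
```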

Extra input and output

Input

  • Push button on head - home
  • Menu button - home
  • Buttons 1-9, *, #, red, green, blue and yellow - DTMF tones
  • Left direction button - back
  • Right direction button - forward
  • Vol+ & Vol- - volume control

Output

  • Beak movements during TTS audio output
  • Error, status and notification messages using flashing eyes, moving wings and spinning

Use Cases

  • Voice webmail - reading email messages
  • Voice interface for home automation
  • Voice interface for music collection

Implementation

Possibly a front end to an existing VoiceXML interpreter such as Public VoiceXML which uses OpenVXI.

Developer Resources

"Megafreeze" development broken, Abstract User Interfaces

Melt the Megafreeze, let it trickle

Tuomo Valkonen writes that The megafreeze development model is broken in GNU/Linux distributions. He argues for a very long release cycle for an extremely stable base system (in line with Kernel releases) and then separate repositories for applications which are constantly upgraded.

I've often thought that in a world where security updates can be trickled over the Internet as they become available, it's odd that new features come in big chunks with each new release of a distribution. With Ubuntu, I upgrade every 6 months to see new features. Why can't the features just appear as they become available, like we're used to with Software as a Service?

Sam has tried to explain the reasons for the status quo to me on numerous occasions (him knowing a lot more about building Linux distributions than I), but like Valkonen I still remain unconvinced that the Megafreeze is the best approach.

Abstract User Interfaces: "Plasticity"

While I was on Tuomo Valkonen's homepage I noticed the Ion window manager that he developed. I found the UI ideas very interesting because they're very similar to a lot of things I'm trying to achieve with Webscope.

Ion has "tiling workspaces with tabbed frames" and the screen is always filled at any one time, like the multi-level resource tabs I want to create.

Ion also has a "query module" which "implements a line editor similar to mini buffers in many text editors. It is used to implement many different queries with tab-completion support: show manual page, run program, open SSH session, view file, goto named client window or workspace, etc." which is a similar concept to the Natural Language Command Line I am trying to develop.

In a paper entitled Vis/Vapourware Interface Synthesiser, Valkonen describes a system for describing user interface semantics and then automatically generating actual interfaces based on users' preferences with the use of stylesheets. This seems very much like a transform view in a Model View Controller design pattern, and he's essentially talking about doing for the desktop what I want to do for the multimodal web: starting with a semantic description of a user interface (e.g. using DIAL) and then transforming that semantic description into various different presentations using XSL stylesheets.

In his bibliography, he links to papers which use the term "Plasticity" in user interfaces, which I might explore further. User interfaces these days have to go "above the level of a single device" -- O'Reilly.

tola – Wed, 2007-03-14 12:15

Why *not* to make the "Metaverse" a direct extension of the web

Further to my previous blog entry, Why I would make the "Metaverse" a direct extension of the web I have found a strong argument to the contrary in the documentation of the Virtual Object System.

In a section of their manual called The 3D Web the authors point out "three basic limitations of HTTP which have caused 10 years of pain, suffering and hacky workarounds for developers trying to build interactive applications over the web. These are that HTTP is a stateless protocol, that URLs represent opaque handles to resources, on which no reliable introspection is possible, and that HTTP is explicitly asymmetric so that a server typically cannot initiate sending new data to a client."

The response of the Virtual Object System community is to create an entirely new protocol stack which is a mirror of the technologies used on the web, but with a new technology for each layer:

  • VIP is like TCP
  • VOS is like HTTP
  • A3DL is like HTML
  • CSVOSA3DL is like an HTML rendering engine such as Gecko or KHTML
  • Ter'Angreal is like the web browser

The fact that HTTP is a synchronous, stateless protocol has come up in the past with regards to web applications - raising the possibility that AJAX is just a hack, waiting for a new protocol to replace it. Perhaps a replacement or extension of HTTP is due.

The current approach I am taking to a 3D Web client for Webscope is:

  • TCP is TCP
  • HTTP is HTTP
  • X3D is like XHTML
  • FreeWRL (and others) are like an HTML rendering engine such as Gecko
  • Webscope is the web browser.

Because of the limitations of HTTP I have considered building a protocol like XMPP into Webscope, and the argument the Virtual Object System community makes will certainly prompt me to explore alternatives further.

What I think I would like to see is a solution that sits somewhere between the plain X3D over HTTP approach and the radical VOS approach of replacing the whole protocol stack. I don't want to throw away HTTP entirely because of its Content Negotiation abilities and the vision of the Multimodal Web.

I'd like to see some discussion on this by some people who know more about networking than I do.

tola – Sun, 2007-03-11 12:11

Why I would make the "Metaverse" a direct extension of the web

In answer to Bob Sutor's question "If we didn’t have web browsers as we do today and started today to do everything that you imagine [for a distributed 3D virtual world], what would you create to do all that?"

I would probably create something very much like Second Life and open source the server source code.

Anything anyone ever creates is based somehow on someone else's ideas (standing on the shoulders of giants and all that). If we didn't have the web but we had video games, I would start with an existing gaming engine. Then in the absence of a worldwide network of linked information resources, I would take the next best thing to existing technology, science fiction. I'd buy Snow Crash by Neal Stephenson and start writing network protocols and file formats!

I'd start by separating the storage of content, logic and presentation into different formats and come up with some kind of distributed TCP/IP streaming protocol with heavy compression.

I suspect that you're asking whether the web is really a suitable platform for all this, whether if we weren't stuck in the mind set of the existing world wide web we might come up with a better solution. Perhaps.

But if I was creating the web from scratch (but happened to benefit from the hindsight of all the great minds that came after me), I wouldn't use XML-like syntax for web pages, I would use something more efficient. I would try to make the DNS system more decentralised and URIs would be of the form http:uk.co.companyname.department/resource instead of http://department.companyname.co.uk/resource. I might make HTTP requests asynchronous, build comment spam protection and Denial of Service protection into the protocols of the web. However, I wouldn't necessarily attempt to make those changes now.
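To show what the reordered URI form would look like in practice, here is a small sketch in Python that rewrites an ordinary URL into it. The function name is hypothetical, and ports, query strings and fragments are ignored to keep it short.

```python
from urllib.parse import urlsplit


def to_big_endian(url):
    """Rewrite scheme://a.b.c/path as scheme:c.b.a/path."""
    parts = urlsplit(url)
    # Reverse the host labels so the most general one comes first.
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return "%s:%s%s" % (parts.scheme, reversed_host, parts.path)


if __name__ == "__main__":
    print(to_big_endian("http://department.companyname.co.uk/resource"))
```

With the host written this way, the whole identifier reads from most general to most specific, left to right.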

What's amazing about the web for me isn't that it's perfect technology that could not have been done better, it's that its openness and adoption have made it almost ubiquitous in the world. Creating new protocols suited to new applications is definitely a good idea, but if the online 3D virtual world is to become as ubiquitous as the World Wide Web, we should learn from the lessons of how web technology was created and build on an already ubiquitous platform. Adoption of a well defined standard is more important than a perfect technology.

Another motivation behind making Stephenson's "Metaverse" a direct extension of the web is device independence. It's all very well creating a 3D virtual world which requires a large amount of processing to render, but what if I want to access the information on a small information appliance with little processing power? What if I live in a developing country and want to be able to access some information but only have a text based browser? What if I'm blind and can't see the virtual world and want to hear it instead? We need not carry over all the limitations of First Life into Second Life. I don't know about you, but I hate having to pay for physical objects and I love flying!

tola – Tue, 2007-02-27 12:24

3D Web

Synopsis

Browsing the world wide web as a three dimensional virtual world.

Rationale

Virtual online worlds like Second Life are like the AOL of the 3D web: they provide a walled garden in the virtual 3D world using proprietary software. Although the Second Life client is now Open Source, the server software remains closed and only Linden Lab is able to run Second Life servers. Just as AOL eventually had to open up users' access to the rest of the Internet, these innovative but proprietary solutions will eventually give way to a ubiquitous online space which is a direct extension of the web and uses web standards. Anyone will be able to host a 3D web server.

Features

3D Web Server

  • An existing HTTP server which serves 3D web pages to client requests with the relevant HTTP Accept header
  • Server side scripting
  • XML transformation if required
  • 3D web pages or "spaces" written in X3D and ECMAScript (with hyperlinks between spaces).
  • Web interface to chat server

Chat Server

  • Chat server, possibly using the XMPP protocol

Avatar Server

A special type of 3D web server which holds a person's 3D avatar. This could optionally include a distributed authentication mechanism like OpenID which identifies a user securely to others in a 3D space. When a user visits a 3D space, their avatar is served to that 3D world so that they appear to other users.

3D Web Browser

(Could be part of a Multimodal Web User Agent).

  • Rendering X3D
  • Executing ECMAScript
  • Sending HTTP Requests with the relevant Accept headers to ask for a 3D representation of a resource
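On the browser side, asking for a 3D representation could be as simple as the following Python sketch. The function name and the particular q-values are illustrative assumptions; model/x3d+xml is the media type for the XML encoding of X3D.

```python
from urllib.request import Request


def build_3d_request(url):
    """Build an HTTP request that prefers X3D but will fall back to XHTML."""
    accept = "model/x3d+xml, application/xhtml+xml;q=0.5, text/plain;q=0.1"
    return Request(url, headers={"Accept": accept})


if __name__ == "__main__":
    request = build_3d_request("http://example.org/space")
    print(request.full_url, request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would send it. A server that cannot supply X3D is free to return the XHTML version instead.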

Implementation

Web3D Consortium
Metaverse Roadmap


Metaverse Roadmap, Convergence at CES

In a comment on my last blog post I mentioned that:

"I have a vision of something which basically *is* the web, but in 3D. In fact, I think the user should be able to choose how they wish to view a given web resource - in plain text, 2D shapes, 3D shapes, simulated speech etc. This can be done with content negotiation in HTTP. The same resource could be rendered by lots of different devices, from a light switch to a 3D headset."

I then found the Metaverse Roadmap, a "public ten-year forecast and visioning survey of 3D Web technologies". They have a wiki where you can input your thoughts. I was going to add my own vision statement about how the 3D web could just be one mode of interaction with a multimodal web (as mentioned above). I found this vision statement which is a similar idea:

"The world will be the metaverse. People often think of Stephenson’s metaverse as an “other” place, and the web as a window onto cyberspace, but as Paul Saffo and Mike Liebhold of Institute for the Future note, the best model for the metaverse of 2016 may be an information-drenched world, where the 3D web is just one particular instantiation. Mixed reality is likely to be the dominant user experience. You will use virtual worlds when they are an appropriate mode of interaction, but they are not your primary mode of communication – you have your chat, your email, your augmented reality, your 2D and 3D browser, etc. While people will continue to use online spaces and media centers for particularly high quality 3D content, the pervasiveness of information access and augmented reality will give world itself new layers of “metaverse-itivity.” The ubiquity of small, portable Sidekick-like and wearable devices will enable immediate access. Voice will be used for many basic queries, but text, even IM text, is private and unobtrusive, so it will not disappear."

Someone also mentions the need for a new type of browser which will allow us to access "all our 3D access through one piece of software" and says that "Open standards will be particularly important for this". I've downloaded FreeWRL, the X3D renderer I want to use for Webscope.

In other news...

It seems CES is all about convergence again this year with Apple's iPhone being announced alongside the Nokia N800, Apple TV and Windows Home Server. The iPhone was inevitable but it sure is pretty now it's here, very nice design touches like motion sensors and multi-touch screen that I didn't expect to see yet. Note the lack of 3G and the presence of WiFi. This is the kind of hardware we should be thinking about for future web software development.

tola – Tue, 2007-01-09 20:39

Second Life Client Open Sourced

In my fourth blog post of the day: Linden Lab has Open Sourced the client for Second Life, announced in a blog post entitled Embracing the Inevitable.

Linden Lab always said that Open Sourcing the code was part of the long term plan; I remember an interview on LUGRadio a while back. It's a shame it's only the client and not the server-side code, but they say they are staying open minded about that. One step at a time.

My dream (as I described in March) would be a distributed system where anyone could set up their own server. It would use web standards and would just be like a collection of 3D web pages in X3D. It might be difficult to attain the same kind of user experience you get with Second Life, but it would be a great extension of the web.

Update: I've started a wiki page posing the question "What would be required to create a 3D web with a similar user experience to that of online virtual worlds like Second Life?". You can log in with username:iwontspam password:ipromise or start a new account. I'd value input.

tola – Mon, 2007-01-08 15:45

Device Independent Web Server

Synopsis

A web server which uses content negotiation to return different representations of a web resource depending on the HTTP Accept header passed from a user agent.

Multimodal Web User Agent

Codename

Scope

Synopsis

A multimodal user agent for the web. Scope aims to combine a multimodal web browser and media player into a single application which runs in full screen mode by default. It will have a very minimalistic appearance and be able to render formats such as SVG, VoiceXML, X3D and multimedia as well as the traditional XHTML/CSS/JavaScript. The format used to represent a web resource will be negotiated between the user agent and web server using content negotiation. The user agent is intended to run in full screen mode on an information appliance or replace a traditional desktop environment and window manager, but can be run as a traditional desktop application if desired.

Rationale

The web is often assumed to be a collection of web pages, but really it is a collection of resources identified by Uniform Resource Identifiers (URIs); a web page is only one representation of an abstract resource. These resources are not limited to being represented by text and images you "browse" on your "desktop". You should be able to walk around the web, listen to it, watch it, have a conversation with it, interact with it and change it. You should be able to carry it in your pocket and hang it on your wall.

"A computer terminal is not some clunky old television with a typewriter in front of it. It is an interface where the mind and body can connect with the universe and move bits of it about." -- Douglas Adams

Scope aims to be a general purpose user agent which can render resources in many formats such as a vector image, voice synthesis, 3D environment and perhaps some kind of tactile interface in the future. The format used for representation can be negotiated with a web server using content negotiation.

Features

Rendering Engines

  • Content & Structure
    • XHTML - Structured text (Gecko)
    • SVG - Vector image
    • VoiceXML - Voice synthesis
    • X3D - 3D environment
    • Binary Enclosures
      • PNG & JPEG - Image
      • MP3, OGG & FLAC - Sound
      • MPEG - Video
  • Layout & Style
    • CSS - layout and style for the above
  • Logic
    • JavaScript - logic for the above

Widgets

  • Back button
  • Forward button
  • Home button
  • Graphical address bar
  • Tabbed browsing
  • Command box/suggestion menu/progress bar combined
  • Clock
  • Power button
  • Error and notification message stripes

Content Negotiation

Scope will implement under-used parts of the HTTP specification for content negotiation to negotiate a resource representation format based on user preferences, abilities, environment or usage scenario. This feature will require server side logic on the web server where a resource may be transformed from a base format to the user's preferred format using XSLT or simply be stored in multiple formats. An implementation of XSLT inside the client itself for client-side transformation may also prove useful.

Natural language command line

Before graphical user interfaces, computers were operated through a "command line": a series of text-based commands with a strict structure, where individual commands had to be memorised. Scope will include an experimental "natural command line" which could be represented as a text box or used in combination with speech recognition to enable a user to give commands to software in natural language. Example commands would be "email jack", "turn off the lounge light" or "play some classical music". The commands are input into the user agent, which may need to do additional processing such as speech recognition. The command could then be passed as a string of text over HTTP with a URI such as:

http://example.uk.home/?q=command goes here

It is then up to a server side application to interpret this command and execute an action as a result.
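Passing a natural language command over HTTP might look something like this Python sketch. The function names are hypothetical, and the endpoint is just the example address used above. The spaces in the command have to be encoded before it can travel in a query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def command_uri(command, endpoint="http://example.uk.home/"):
    """Encode a natural language command as the q parameter of a URI."""
    return endpoint + "?" + urlencode({"q": command})


def command_from_uri(uri):
    """Recover the command text on the server side."""
    return parse_qs(urlsplit(uri).query)["q"][0]


if __name__ == "__main__":
    uri = command_uri("turn off the lounge light")
    print(uri)
    print(command_from_uri(uri))
```

Everything after decoding (working out what "turn off the lounge light" actually means) is left to the server side application.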

Use Cases

Although the initial development will be for common desktop operating systems like Windows, Mac and GNU/Linux, the primary intended platform is a new breed of "information appliances". This will include portable devices, television-like devices, touchscreens and a lot of other obscure hardware. Use cases include:

  • Voice interface for handsfree operation or for visually impaired users
  • The use of a 3D headset and hand sensors to navigate a 3D environment
  • Full screen video playback
  • Multi-modal interaction with a combination of web pages and voice
  • A screen on your fridge or TV
  • A media player on a small portable device

Implementation

See Webtop, previously Webscope
