From real cities to virtual worlds using an
open modular architecture
Sergiy Byelozyorov, Rainer Jochem,
Vincent Pegoraro & Philipp Slusallek
The Visual Computer
International Journal of Computer
Graphics
ISSN 0178-2789
Vis Comput
DOI 10.1007/s00371-012-0717-9
Author's personal copy
Original Article
© Springer-Verlag 2012
Abstract The technologies for the Web and virtual worlds
are currently converging, but although some efforts have been made to integrate them with each other, these typically rely on technologies foreign to most Web developers. In this paper, we present a new open architecture that combines several emerging and established technologies to provide convenient tools for developing virtual worlds directly in the
on technologies foreign to most Web developers. In this paper, we present a new open architecture that combines several emerging and established technologies to provide convenient tools for developing virtual worlds directly in the
Web. These technologies are easy to learn and understand
by the Web community and allow for quick prototyping.
Overall, the modular architecture allows virtual worlds to be developed more quickly and deployed more widely. Additionally, we demonstrate that creating an adequate virtual
environment can be an easy task when applying the principles of crowd-sourcing. We present an application that uses
one of the largest available open data sources of geospatial
information to bring 3D cities from the real world into the
virtual environment.
Keywords Virtual worlds · The Web · Open architecture ·
Geographic information system
S. Byelozyorov (✉)
Saarland University, Saarbrücken, Germany
e-mail: [email protected]
R. Jochem
DFKI, Saarbrücken, Germany
e-mail: [email protected]
V. Pegoraro
Saarland University and MMCI, Saarbrücken, Germany
e-mail: [email protected]
P. Slusallek
Saarland University and DFKI, Saarbrücken, Germany
e-mail: [email protected]
1 Introduction
Developments in Web technology over the last decade have had a significant impact not only on businesses, but also on society, as people from different backgrounds and cultures communicate with each other using
Web-based social networks, such as Facebook and Twitter,
thus increasingly merging the real world with a new borderless digital space. The Web has become so significant that the Internet itself has become synonymous with it.
At the same time, we have seen a tremendous development in graphics in the last decade. Modern hardware is able
to display highly realistic environments in real time, making
games and other applications an immersive and entertaining experience. The general increase in network bandwidth has allowed game companies to create a new type of game, the massively multiplayer online game, where large numbers of players can play together in a single shared virtual world.
We are currently at the point where these two developments can be combined by creating next-generation virtual
worlds that are fully integrated into the Web. In the same
way social networks have become a daily part of our society, we expect such shared environments to create another
way of interaction and communication on the Internet. However, there is currently no widely accepted and easy way to
integrate virtual worlds into the Web.
In this paper, we propose an open modular architecture
based on a combination of emerging technologies, such as
XML3D [22], Xflow [9], and Sirikata [6], as well as established and widely available Web components. This combination has a number of advantages:
Firstly, it makes the integration of virtual worlds in the
browser much simpler for Web developers, as no special
knowledge in the areas of networks, graphics, or system architecture is required beyond standard Web technologies. In addition, developers can quickly prototype their applications using high-level concepts offered by the underlying technologies.
(Map data ©OpenStreetMap contributors, CC-BY-SA)
Fig. 1 An example of a virtual world running in the browser using our architecture. On the left, our modified Chromium browser runs on Windows and uses rasterization to render the scene, while on the right, the same browser running on Ubuntu Linux uses ray tracing for more realistic rendering (e.g., correct shadows)
The approach is highly flexible due to the modular architecture, which allows replacing components easily. In the
Web community, this approach is known as mashup, where
a developer combines data and functionalities from two or
more sources to create new services and applications.
Wider and easier deployment is achieved as users do not
have to install additional specialized software, but instead
use the browser that is typically shipped preinstalled on their
computer. An illustration of the world running in different
operating systems and using different rendering algorithms
is shown in Fig. 1.
Finally, an important property of such a design is that it is
an open system, i.e., it is based on open-source code, open
protocols, and an open specification. The development of
the Web has shown that open systems are more successful
than the ones based on closed or proprietary technologies,
thanks to a more dynamic response to user feedback via
typically shorter integration cycles, and their sustainability
beyond the life expectancy of a particular contributing entity.
As we have shown earlier [2], such an open modular architecture provides an effective framework for interactively exploring virtual worlds and interacting with other users within them. In order to make it a truly end-to-end platform, this paper
additionally shows how the framework can be extended to
support the creation of the virtual environments themselves.
Regarding the data available in today’s Web 2.0 with its social media and crowd-sourcing movements, one can observe
that there is a large amount of information with meaningful
spatial data about the real world already available: people
map the world on OpenStreetMap [15], publish their photographs along with Global Positioning System (GPS) coordinates, send Twitter messages from specific locations,
or publish their movements and “check-in” at real-world
venues on the Web. Using the example of spatial data from
OpenStreetMap, we show that the integration of real-world
data into virtual worlds is easily possible.
The remainder of this paper is organized as follows. Section 2
gives an overview of the related work and motivates our
choice of technologies. The selected technologies are detailed in Sect. 3, which also presents the prototype library
that we have developed. Potential applications of this library
are then illustrated in Sect. 4, one of which is creating real
cities in the virtual world as shown in Sect. 5. Limitations
and future work are discussed in Sect. 6.
2 Related work
In this section, we discuss the components needed to integrate virtual worlds into the Web and provide an overview
of existing solutions for each of them: 3D graphics technologies, 3D animation techniques, networking components for
the communication with the virtual world platform, and the
use of data from the OpenStreetMap project.
2.1 3D graphics
This subsection reviews some technologies for embedding
3D graphics into Web pages and analyses their pros and
cons.
One of the approaches used to integrate graphics into the
Web is known as X3DOM [1]. This technology is developed
at the Fraunhofer IGD institute and is an adaptation of the
ISO-standardized Extensible 3D Graphics (X3D) [32] file
format to HyperText Markup Language 5 (HTML-5) [27].
The advantage of this technology is that it is based on the existing standard and thus is backward compatible to existing
content. However, X3D, which is the successor of the Virtual Reality Modeling Language (VRML) [24], was built on different underlying technologies than the Web. As such, X3D has to balance backward compatibility against seamless integration into HTML and the Document Object Model (DOM) [26]. These differences lead to a number
of inconsistencies in key areas such as events, scene graph
structure, scripting, material models, use of Cascading Style
Sheets (CSS) [27], as well as others.
An alternative technology for 3D graphics available in
modern browsers is WebGL [8], developed by the Khronos
Group providing direct JavaScript bindings to the OpenGL
ES 2.0 API [5]. It is already implemented by most browser
vendors. WebGL allows for delegating heavy processing to
the parallel graphics hardware available on most consumer
PCs, such that the central processing unit may perform other
tasks. On the other hand, WebGL does not define a declarative approach to representing 3D content similar to HTML. Instead,
a JavaScript-based scene graph library must be used, which
is inconsistent with other parts of a Web document and limits
performance.
Google has tried to address this problem by providing a
fast scene graph JavaScript API in their project O3D [4].
While this solved the performance issue and allowed for larger scenes, it is still unintuitive for web developers to keep the content separate from the DOM. The project has since been discontinued.
Finally, XML3D, which is developed at Saarland University together with the German Research Center for Artificial Intelligence (DFKI) and the Intel Visual Computing Institute, defines a format for encoding 3D graphics as an extension to
HTML. However, unlike X3D, it is free to adopt the best and
most widely available Web technologies familiar to Web developers. The XML3D specification defines a 3D document
as part of the DOM model in the browser. It reuses ideas
from Scalable Vector Graphics (SVG) [29], HTML-5, CSS,
and more. Elements that encode different aspects of the 3D
scene (such as geometry, material, lights, etc.) provide interfaces to JavaScript that allow an easy and real-time modification of the document. XML3D is efficient because it uses
vertex arrays to natively encode geometry, which can be directly mapped to the hardware. This technology is the most
adequate for our needs, and we present it in more detail in
Sect. 3.1.
2.2 Animation
Without animation, a virtual world is a monotonous and
static environment, while we would like to use rich and immersive environments. Additionally, we need to be able to
efficiently synchronize animations over the network using
as little bandwidth as possible. Therefore we need an easy,
intuitive, and efficient way to encode animations.
The most straightforward way to animate an object is to
use a script that regularly changes the transformations of objects and vertices. This is also known as DOM scripting and
typically uses JavaScript. While this method is familiar to Web developers, intuitive, and easy to learn, it is highly inefficient for complex animations. In order to create a smooth animation of
an avatar character, a script would need to move thousands
of vertices many times per second. This would typically require a substantial amount of code and expensive hardware
to have a smooth realistic animation.
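To make this cost concrete, the following sketch (the mesh size, function name, and displacement are our own illustrative assumptions, not code from the paper's system) updates every vertex of a 10,000-vertex mesh for a single frame entirely in JavaScript:

```javascript
// Animating a mesh purely from script means touching every vertex on
// every frame. Here a hypothetical 10,000-vertex mesh is displaced by a
// time-dependent offset.
function animateVertices(positions, t) {
  // positions is a flat [x0, y0, z0, x1, ...] array, as in a vertex buffer.
  const out = new Float32Array(positions.length);
  for (let i = 0; i < positions.length; i += 3) {
    out[i]     = positions[i];                    // x unchanged
    out[i + 1] = positions[i + 1] + Math.sin(t);  // y bobs over time
    out[i + 2] = positions[i + 2];                // z unchanged
  }
  return out;
}

const mesh = new Float32Array(3 * 10000);  // 10,000 vertices at the origin
const frame = animateVertices(mesh, Math.PI / 2);
// All 30,000 floats were visited in JavaScript for this single frame.
```

In a real page this loop would run 30 to 60 times per second inside a timer or requestAnimationFrame callback and write the result back into the DOM, which is exactly the workload that overwhelms scripted animation.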
We can solve this problem by moving such modifications
to parallel hardware by implementing them as vertex shaders
that are executed on the Graphics Processing Unit (GPU)—
and thus are much more efficient. However, at this point,
it becomes highly unintuitive for web developers, who
are typically not familiar with shading languages. Learning
them requires deep knowledge of graphics and GPU programming, which means that only experts will be able to
create such animations.
Instead of writing scripts, we can describe animations
declaratively as an extension to HTML. Such an approach
was suggested by the Synchronized Multimedia Integration
Language (SMIL). However, SMIL does not fit well into
the DOM concepts, and its elements are not very well supported by authoring tools. Instead, most Web applications
use the aforementioned DOM scripting capabilities for dynamic content.
An alternative is Xflow, which is an extension to XML3D
and defines a data flow declaratively in the document. The
main idea behind it is that we can represent animations as
graphs where only a few data elements are changed, such
as bone positions, but influence many more data elements,
such as the final mesh vertices of a character. With this
approach, a programmer defines a flow in the document
and only changes a few parameters in JavaScript that define the animation, while the underlying system propagates
data through the Xflow graph and creates a smooth animation on the screen. Moreover, in order to synchronize such
an animation, we will need to send the graph configuration
once and only send updates to the initial data elements later.
Animations will be automatically computed on each client
locally. Due to these advantages, we have selected it as a basis for describing animations in the virtual worlds. A more
detailed description can be found in Sect. 3.2.
2.3 Networking
Aside from 3D graphics and animations, we need a real-time
networking technology to create a shared virtual world. An
established and convenient way of transferring additional
data to the web application after the page has already been
loaded is Ajax [31], which is a technology based on asynchronous requests to the server. This approach has shown its
reliability and usability in thousands of rich Web 2.0 applications. It is, however, designed for occasional data exchange,
rather than for continuous data streams that are present in applications like virtual worlds. The communication overhead
created for each request can easily exceed the transferred
data in size, thus making this technology inefficient in terms
of the network load.
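A back-of-envelope sketch illustrates this argument; the header size and update rate below are assumed typical values, not measurements:

```javascript
// Illustrative comparison of per-message overhead: a position update for
// an avatar is tiny, but each Ajax request carries full HTTP headers,
// while a WebSocket frame adds only a couple of bytes.
const PAYLOAD = 24;          // e.g., three 64-bit coordinates per update
const HTTP_OVERHEAD = 500;   // assumed typical request + response headers
const WS_OVERHEAD = 2;       // per-message framing cited for WebSockets

function bytesPerSecond(overhead, updatesPerSecond) {
  return (PAYLOAD + overhead) * updatesPerSecond;
}

const ajax = bytesPerSecond(HTTP_OVERHEAD, 30);  // 30 updates per second
const ws   = bytesPerSecond(WS_OVERHEAD, 30);
// ajax = 15720 bytes/s vs ws = 780 bytes/s: the headers dominate the payload.
```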
An emerging technology introduced with the upcoming HTML-5 standard is WebSockets [30].
Similarly to WebGL, it is implemented in most major
browsers and supports continuous data streams with only
2 bytes of overhead for each message. In addition, it can
provide similar functionality in business environments,
where all ports other than 80 are blocked, by multiplexing several WebSocket connections over a single Hypertext
Transfer Protocol (HTTP) [28] connection. This is important, since we would like virtual worlds to be available for
everyone despite the limitations created by local firewalls or
network policies. These benefits and tight integration with
existing Web technologies make it the most suitable networking technology for virtual worlds on the Web.
2.4 Virtual world platform
In addition to the client components, we need a virtual world
platform to provide the data and synchronize the state of the
world. This subsection presents a number of existing and
emerging technologies in the area.
Many popular online games, such as World of Warcraft,
Eve Online, EverQuest, and many others, are typically based
on proprietary server and client software. While being extremely immersive and realistic, they typically do not allow users to create content, because such content has to be specially designed for the game’s rendering engine. This type of platform is not suited for the Web, since such systems do not allow
interoperability, require special client software per world,
and do not allow users to change the world.
One of the first virtual worlds that allowed users to create content and gained public attention is Second Life [10], launched in 2003. This platform features an internal market, where
users can trade their content with each other, opening opportunities for businesses. Second Life is developed by Linden Lab and runs on their private servers. Unfortunately, the
system is closed and not designed for interoperability with
other worlds, which is an essential part of the Web. Other
systems that have similar limitations include Blue Mars,
Twinity, and HiPiHi.
Open Simulator [14] is a project that provides open world
server software that is compatible with the Second Life
client. The implementation inherits Second Life’s server architecture, known as the grid, but also extends it to a hypergrid mode, where external worlds can be connected together.
In the grid and hypergrid architecture, the world is divided
into fixed-sized square regions, each being supported by a
single server only. Moreover, when the server is close to its
maximum capacity, users are prohibited from entering the region, which sets an upper bound on the number of users that
can visit one region at a time. This creates an issue when an
event of interest, e.g., a virtual concert, is happening in a
given region and many users are trying to visit it. Servers
have to be designed with excessive capacity, which is used
at a fraction of its potential most of the time.
The Open Cobalt [13] platform offers a fully decentralized open-source world browser and toolkit, which has embedded tools for collaborative research and education. It is
based on the Squeak programming language, which allows
modifying the program at runtime, in particular, for extending the world functionality when discovering new object
types by sharing their implementation code. Unfortunately,
this solution has security limitations as such code can be arbitrary and thus potentially malicious. Additionally, the current architecture requires each peer to store the entire world
and run all downloaded scripts locally, which is not scalable
to large worlds.
Red Dwarf [21], a successor to Project Darkstar originally developed by Sun, focuses its architecture on scalability and reuse of resources. With this approach, the
simulation of the world regions does not map directly to the
servers, but is distributed dynamically depending on the load
of each region. This allows resources to be used more efficiently and decreases the cost of running a world. This provides a good solution for a world that is run by a single
company. However, on the Web, different companies typically run their websites independently and do not share
their hardware due to security issues.
Project Multiverse [11] offers an architecture that is designed for games. The world is separated into zones, and
zones into regions. When switching between zones, a user
has to wait until the other zone is loaded. However, moving
between the regions is seamless as loading of adjacent regions starts in advance when the user is close to the region
boundary. Additionally, there are instances that are similar
to zones, because they also require a loading screen, but apply when entering a certain closed structure, e.g., a house or
a cave. This can be used, for example, to create spaces that
are larger than their apparent outer size. Unfortunately, all
these features are very game-specific and do not fit into the
generic definition of the virtual worlds that we would like to
see on the Web.
Distributed Scene Graph [7] was suggested by Intel as a
solution for scalable virtual environments. In this approach
the scene graph is decoupled from other components, such
as client management, physical simulation, script processing,
or scene persistence, each of which is executed in a separate
server process. The scene graph itself is split into partitions that
are also executed in different server processes. This allows
one to dynamically distribute the load depending on the current situation in the world. For example, if at one place there
is a massive event that many users are attending, then it
makes sense to dynamically allocate one server to perform
user management tasks for that particular place and maybe
just one more server to handle users in all other parts of the
world. This allows one to efficiently reuse the hardware and
thus decrease the cost of running a virtual world. While this
architecture is very promising, we have chosen not to use it
as it is not yet mature: animations, interaction, and some other traditional features of a rich 3D environment are not implemented.
The Sirikata project, which is developed at Stanford University, offers an architecture that separates network nodes
into space servers and object hosts, where space servers
are responsible for managing a certain constrained space in
the shared world, while object hosts run the objects within
spaces. This approach offers a better security model, is
scalable to a large number of players, and offers a federated world, where parts of it can be controlled by different organizations. Additionally, Stanford developers provide
a ready-to-use implementation of the server and client libraries, which helps in developing new Web clients. Moreover,
they support WebSockets, which we identified above as our networking technology of choice. Based on
these assets, Sirikata appears as the most appropriate platform for our needs. The detailed platform architecture is presented in Sect. 3.3.
2.5 Geographical datasets
Creating a copy of a real city requires access to its associated geoinformation. Usually, this sort of spatial data is provided either by surveying offices, often imposing restrictions that prohibit free usage, or by commercial data providers,
again requiring license fees. This difficulty led to the creation of OpenStreetMap (OSM) in 2004, which has since
then become the central resource for crowdsourced geospatial data of the entire world and therefore forms a promising
source for our needs.
OSM is a huge, rather unstructured, but very detailed and mainly 2D geodatabase, maintained by roughly 500,000 users worldwide. Initially, the map data was only
collected by volunteers using handheld GPS devices in order to record roads that were then uploaded into the OSM
database. Since then, public domain datasets have been integrated, and commercial and governmental data providers
have also either opened their datasets in order to, e.g., use
aerial imagery to map features, or have directly released
their data under appropriate licenses to have them included
in the OSM databases.
Today, the total number of nodes exceeds 1.3 billion
including approximately 115 million streets and approximately 48 million buildings [18, 19]. The entire planet.osm
dataset is 250 GB (16 GB compressed) in size. It is licensed
under the Creative Commons Attribution-Share Alike 2.0
License but is currently in the process of relicensing under
the Open Database License. Recent research [12] also shows
that the quality of the German street network data in OSM is
expected to be comparable to proprietary data sets by mid-2012.
The data collected by the OSM community consists not
only of the street networks, but also of points of interest
(POI), landuse areas, and building footprints. For buildings,
in some areas there is also height information available,
which can then be used to extrude simple box-shaped building models out of the footprints. If no height information is
available, reasonable defaults can be used in order to create a somewhat realistic impression. Together with information from additional tags associated with the individual features, one can easily place the appropriate 3D geometry on
a POI, e.g., for trees, bus stops, etc., or try to apply generic
textures for certain building types, e.g., restaurants, shops,
pharmacies, etc. This makes OpenStreetMap a very interesting data source for creating a virtual copy of the real
world. Nevertheless, as the system described in Sect. 5.1
is designed using common standards, it is not restricted to
OSM data and other free or non-free datasets could be used
as well.
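As a minimal sketch of the extrusion step described above (the function and constant names are our own illustration; the actual pipeline is presented in Sect. 5.1), a footprint can be lifted into a box-shaped building by duplicating its outline at ground level and at roof height:

```javascript
// Turn a 2D building footprint into simple box-like walls. A default
// height stands in when OSM carries no height tag for the building.
const DEFAULT_HEIGHT = 9; // assumed fallback, e.g., roughly three storeys

function extrudeFootprint(footprint, height = DEFAULT_HEIGHT) {
  // footprint: array of [x, y] outline points; returns a flat vertex array
  // with one bottom ring and one top ring, as a vertex buffer stores them.
  const vertices = [];
  for (const [x, y] of footprint) vertices.push(x, 0, y);       // ground ring
  for (const [x, y] of footprint) vertices.push(x, height, y);  // roof ring
  return vertices;
}

const square = [[0, 0], [10, 0], [10, 10], [0, 10]];
const verts = extrudeFootprint(square);  // 8 vertices, 24 numbers
```

Connecting corresponding ring vertices with quads then yields the walls, and the roof ring can be triangulated to close the volume.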
There already exist some efforts [17] to generate 3D
worlds out of OSM data, the most advanced of these projects
being OSM-3D [16] where the entire globe can be displayed
on the web using a Java3D applet. For the scope of this work,
we will use our own approach based on XML3D for visualization as described in Sect. 5.1.
3 System architecture
This section first describes the XML3D technology, which
enables interactive graphics in the Web browser, and subsequently discusses an extension to XML3D for animations,
namely Xflow. We then provide an overview of the Sirikata
virtual world platform before describing the prototype library that we have created to seamlessly integrate all of the
above components into a flexible framework.
3.1 XML3D
XML3D [22] is based on XML and can be used as a
standalone document or embedded into eXtensible HyperText Markup Language (XHTML), using a separate
XML namespace. Being inside an XHTML document, certain parts of the HTML specification such as <img> and
<video> elements can be directly reused in XML3D as
well (e.g., for textures).
The root of an XML3D scene is declared by the
<xml3d> element, which creates a 3D canvas at the place
of declaration. This canvas can be formatted via built-in
browser CSS properties, e.g., borders, size, and background.
The descendants of the <xml3d> element define the content of the scene by mapping a scene graph onto the DOM
tree. Typically in the scene graph, replicated content is
stored only once and instantiated many times to save memory. Consequently, similarly to SVG, we introduce a definition subtree starting with a <defs> element, which includes all the content resources that may be used in the scene
but are not rendered directly (e.g., geometry, shaders, vertex
arrays, etc.). Its content is referenced by other parts of the
scene, such as <mesh> elements that instantiate referenced
geometry.
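As an illustrative sketch of this structure (the element set follows the XML3D design described here, while the concrete ids, attribute values, and the shader URN are our own assumptions), a minimal scene might look like:

```html
<xml3d xmlns="http://www.xml3d.org/2009/xml3d"
       style="width: 640px; height: 480px; border: 1px solid gray;">
  <defs>
    <!-- Resources declared under <defs> are not rendered directly -->
    <shader id="red" script="urn:xml3d:shader:phong">
      <float3 name="diffuseColor">1 0 0</float3>
    </shader>
    <data id="cubeData">
      <!-- vertex arrays with named semantics would go here -->
    </data>
  </defs>
  <!-- The <mesh> instantiates the referenced geometry with the shader -->
  <group shader="#red">
    <mesh src="#cubeData" type="triangles"/>
  </group>
</xml3d>
```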
In 3D graphics, shading can be highly complex with
many different shaders, each having their individual parameters. CSS cannot provide this as it only allows for a fixed set
of predefined properties. Instead, a user can declare a shader
with parameters using a <shader> element. The code for the shader can be declared using a <script> element, reused from HTML, which is then referenced by the shader.
In HTML documents, a CSS style defines the size, position, and appearance of a box-shaped layout element.
XML3D translates this concept also to 3D content, where
it describes the transformations and appearance properties
of <mesh> elements. Transformations and shaders can be
assigned to XML3D elements using CSS, in which case all
children of the node are affected. For the description of 3D
transformations, the CSS 3D Transforms [25] specification
can also be used.
As XML3D is designed to be a thin additional layer on
top of existing Web technology, its functionality is kept generic, orthogonal to other technologies, and limited
to the minimum needed to display and interact with 3D content. For example, XML3D is a part of the DOM, which
already provides scripting and event services—most additional functionalities can usually be implemented through
scripts instead of new special-purpose elements.
Harnessing JavaScript’s built-in timer functions, it is easy
to create animations by performing changes of the DOM
over time. Generic DOM event listeners reacting to mouse
movements and clicks can be connected directly to individual scene elements. XML3D supports attaching event handlers statically via an element’s attribute, e.g., “onclick” or
“onmousemove,” or dynamically via standard DOM methods like addEventListener.
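For illustration (the element id and the handler bodies are hypothetical), the two registration styles might look as follows:

```html
<mesh id="cube" src="#cubeData" onclick="alert('cube clicked')"/>

<script>
  // Dynamic registration via the standard DOM API
  document.getElementById("cube").addEventListener("mousemove",
    function (evt) {
      console.log("pointer over:", evt.target.id);
    });
</script>
```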
While the event system described by the DOM events
specification of W3C is also applicable to 3D graphics, the
predefined event types need to be extended to provide additional information that might be needed for certain 3D applications. For instance, specialized 3D mouse events should
also provide information such as world-space coordinates and surface normals.
The presented design contains only the minimal additions needed to
introduce 3D graphics to the Web and reuses many design
ideas from HTML, CSS, SVG, JavaScript, and DOM. Deep
integration with widely adopted technologies allows one to
easily combine 3D and 2D content and makes it easier for
web developers to extend existing Web applications with
novel 3D features.
XML3D is currently implemented in both a modified version of Firefox, called RTfox, and a modified version of
Chromium which, similarly to Safari and other open-source
browsers, is based on the WebKit rendering engine. XML3D
is scheduled to support a portable shading language extension called AnySL in the near future.
3.2 Xflow
In its original specification, XML3D lacks mesh processing
capabilities comparable to vertex shaders, which support flexible animations. Even though it is possible to modify the mesh data from JavaScript, this is not efficient for large geometries. Xflow is an extension to XML3D that provides a
solution to this issue. It has all the necessary tools to define
a data flow graph declaratively in the document, which can
be reused to effectively generate complex animations.
A core element that represents a node in the flow graph
is the <data> element, which can contain raw data and
other <data> elements. Xflow encodes raw data via array
elements based on the data types they describe: <float>,
<float2>, <float3>, <int>. Each array element can
have an attribute “name” that associates semantics for the
data, e.g., position, normal, or weight. Data elements can
be referenced by <data>, <mesh>, and <shader> elements, where the last two function as data sinks. Figure 2
illustrates a flow graph.
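Schematically, such a flow could be declared and driven as follows (the skinning URN and the exact attribute syntax are our illustration, not the normative Xflow grammar):

```html
<defs>
  <data id="bonePose">
    <!-- small, frequently updated input -->
    <float3 id="boneTranslation" name="translation">0 0 0</float3>
  </data>
  <data id="skinnedMesh" script="urn:xml3d:xflow:skinning">
    <data src="#bonePose"/>
    <data src="#restPose"/>  <!-- large, static vertex data -->
  </data>
</defs>
<mesh src="#skinnedMesh" type="triangles"/>

<script>
  // Only the few driving parameters are touched from JavaScript; the
  // system re-evaluates the rest of the flow automatically.
  document.getElementById("boneTranslation").textContent = "0 1 0";
</script>
```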
The flow begins with source <data> elements that contain only raw data and propagates this data down to other
elements via references. Eventually, we also plan to support sensors as <data> sources, which is particularly interesting on mobile devices, e.g., for camera input. Some <data>
nodes may have scripts attached to them, which provide processing capabilities to the node. Xflow defines a number of
built-in scripts that define basic processing tasks, such as
generating displacement, UV mapping, simple geometry or
noise functions, as well as more complex tasks, such as skinning, morphing, tessellation, forward kinematics, etc. When
Fig. 2 In Xflow, a flow describes how the data from the source elements is transformed by processing elements, which have attached
scripts, before arriving at the sink elements, such as a mesh or a shader.
This allows creating highly realistic and fast animations that can be
mapped directly to hardware
custom processing is needed, the user can define their own
scripts using an associated <script> element.
Animations are typically driven by one or a few simple
parameters in the source <data> elements such as the morphing weight or skinning matrices. These parameters can be
changed dynamically from JavaScript, triggering the evaluation of the rest of the flow using highly efficient algorithms
that can be directly mapped to the OpenGL vertex shaders
or OpenCL code to be executed on a GPU or on the CPU
or both, whichever offers better overall performance. While
Xflow introduces a new animation concept for web developers, it frees them from learning shader languages and provides an easy and intuitive way to encode complicated fast
animations.
Similarly to animations, another key component required
for interactive 3D scenes is physics simulation. In the past, adding physics properties required major changes to the scene graph; in [23], we present a declarative annotation framework that cleanly separates the generic 3D scene
description from its physics model. As physical properties
are orthogonal to 3D geometry, we offer this framework
as an extension to XML3D and have implemented it as a
browser plug-in.
3.3 Sirikata
Sirikata is an open-source virtual world platform developed
at Stanford University. The goal of the platform is to provide
a set of open libraries and protocols that can be used to easily deploy a virtual world. The platform design is based on
five core principles: scalability, security, federation, extensibility, and flexibility.
Security has played an important role on the Internet and in virtual worlds, whose publicly available servers enable many different kinds of attacks. It is important to consider security from the start, which is why Sirikata has made it one of its design principles.

Fig. 3 Sirikata separates the virtual world into three concepts: space simulation, object simulation, and content delivery. Space servers handle communication between objects and synchronize frequently changing data, such as positions. Object hosts, which can be clients or hosting servers, store the full object data and run the scripts associated with the objects. Finally, the content delivery network offloads the bulk content that does not change frequently from the communication link to the space server, allowing for greater scalability
Federation is a general principle underlying the Web architecture. On the Internet, many different companies control Web pages, but no single company controls the entire network. Instead, these companies have established a set of standard protocols that allow for interoperability. Similar concepts can be applied to virtual worlds, where parts of a large seamless world can be controlled by different entities.
The Stanford team has designed their platform to support
extensible virtual worlds, which means that users and developers should be able to extend the world by creating their
own content, scripts, or even plugins that can offer the whole
world new features.
Flexibility, on the other hand, allows their platform to be
used in different scenarios. As on the Web, where different
Web sites can be online shops, social networks, news, search
engines, etc., developers should be able to create different
kinds of worlds: online games, collaborative worlds, virtual
museums, recreations of the real world, and others.
All of these principles are reflected in Sirikata’s system
architecture by splitting the functionality of the platform
into three concepts: space management, object simulation,
and content delivery, as shown in Fig. 3. This is different
from the traditional approach, where all the objects with
their scripts and data are simulated on a single server or a
cluster. Instead, Sirikata separates space simulation on space
servers—such as discovery of nearby objects, position synchronization, world physics, etc.—from object simulation
on object hosts, such as running scripts that define the behavior of the objects.
This separation is important because it allows implementing two of the aforementioned principles. Object hosts can be run by different companies on servers under their own control, effectively allowing them to interoperate through the space server, thus creating a federation. Additionally, it allows one to implement a better security model, because the scripts and full object data remain controlled by their owner and do not have to be uploaded to a space server or to object hosts controlled by other organizations.
Additionally, Sirikata suggests using an independent content delivery network, which serves data that is mostly static or changes rarely, e.g., model meshes or textures. This offloads traffic from the space server, leaving it more resources for communicating frequently updated information, e.g., object positions and orientations. This design allows a space server
to handle many more objects in the world. Furthermore, a
given space server can connect to other space servers that
simulate other parts of the world, thus effectively increasing
scalability for space simulations.
The space server and object host implementations provided by Sirikata are designed as sets of modules connected to a core that implements only the very basic functionality required of every participant in the world. By adding new modules or extending existing ones, developers can provide new features to the world or enhance existing ones, providing substantial flexibility. The object host to space server communication protocol allows objects to be registered dynamically, thus enabling users to create content at runtime, which makes the world extensible.
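The separation of concerns described above can be sketched in a few lines of code. This is purely illustrative (all type and method names are our own, not Sirikata's API): the space server sees only frequently changing state, while the object host keeps the full object and a CDN URL for its bulk content:

```typescript
// Illustrative sketch (all names hypothetical) of Sirikata's split:
// the space server synchronizes only frequently changing state,
// object hosts keep the full object data, and bulk content (meshes,
// textures) is addressed via a content delivery network URL.
interface SpaceServer {
  updatePosition(objectId: string, pos: [number, number, number]): void;
  positionOf(objectId: string): [number, number, number] | undefined;
}

class SimpleSpace implements SpaceServer {
  private positions = new Map<string, [number, number, number]>();
  updatePosition(id: string, pos: [number, number, number]): void {
    this.positions.set(id, pos);
  }
  positionOf(id: string): [number, number, number] | undefined {
    return this.positions.get(id);
  }
}

// An object host runs the object's script locally and pushes only
// position updates to the space server; the mesh stays on the CDN.
class ObjectHost {
  constructor(
    private space: SpaceServer,
    public id: string,
    public meshUrl: string
  ) {}
  moveTo(pos: [number, number, number]): void {
    this.space.updatePosition(this.id, pos); // only dynamic state is shared
  }
}
```

Because the space server never receives scripts or meshes, object hosts run by different organizations can federate through it without surrendering their data.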
Based on the five outlined principles, the Sirikata platform is a good fit for virtual worlds on the Web.
3.4 Interconnect prototype library
In order to facilitate the development process, we have implemented a prototype library that can be used to embed virtual worlds into the browser. The library extends KataJS, a JavaScript library created by the Sirikata developers, to which we have added XML3D and Xflow support. There are several advantages to using KataJS beyond the fact that it was initially designed to work with Sirikata.
Firstly, KataJS implements the WebSockets protocol for communication with the Sirikata space server. The library is optimized for modern multicore architectures because it uses Web Workers [33], which are part of the upcoming HTML5 standard. It also has a modular structure that makes it easy to replace components and parts of the system, an important feature that allows one to quickly integrate the latest research results.
In particular, each module has a communication protocol with the other modules, which defines its interface but leaves the programmer a choice of implementation. The well-defined abstraction for the graphics subsystem allowed us to integrate the XML3D concepts into the system quickly and easily. The current implementation of the XML3D module allows loading an initial world containing the static, unmodifiable parts of the scene that need not be synchronized by Sirikata, e.g., terrain and trees. Additionally, it allows new objects to be created dynamically when information about them arrives from the space server. Such information is first received by the networking module and then translated into a message for the graphics subsystem. Upon receiving such a message, the XML3D module loads the associated geometric data (mesh) from the content delivery network, and once this data is loaded and parsed, a new object is added to the scene and appears on the user's screen. Note that loading the mesh is asynchronous, so it does not block the XML3D module from processing other messages.
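The non-blocking behavior of this message handling can be sketched as follows. This is a simplified illustration, not the actual KataJS or XML3D module API; the loader callback stands in for the asynchronous CDN download:

```typescript
// Sketch (hypothetical API): the graphics module adds an object to
// the scene only once its mesh has arrived; message handling is not
// blocked while the download is in flight.
type MeshLoader = (url: string, onLoaded: (mesh: string) => void) => void;

class XML3DModule {
  scene: string[] = [];
  constructor(private loadMesh: MeshLoader) {}

  onObjectCreated(id: string, meshUrl: string): void {
    // Returns immediately; the scene is updated from the callback
    // once the content delivery network has delivered the mesh.
    this.loadMesh(meshUrl, (mesh) => this.scene.push(`${id}:${mesh}`));
  }
}
```

With an asynchronous loader, `onObjectCreated` returns before the mesh is parsed, so further messages from the networking module can be processed in the meantime.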
Apart from geometry data, a mesh can also contain named animations. The animations are controlled by messages to the graphics subsystem containing the name and parameters of the animation to be played. Each object can have its own mesh and animations specific to the object's structure. For instance, even though KataJS sends the same walking animation message to every avatar, each can behave differently: a human walks on two legs, a tiger uses all four legs, and a car rotates its wheels.
In order to integrate this system with Xflow, we have added custom XML nodes that map animation names to the parameters that need to be updated regularly at the source nodes of the flow graph. While these nodes are not essential, they provide a convenient way to store the animation state of each avatar. When an animation message from KataJS arrives at the XML3D module, a simple script is triggered that uses the information from these nodes to drive the animation.
4 Applications
In order to illustrate the previously outlined benefits of our system, we have extended KataSpace, a demo virtual world application built by the Sirikata developers on top of the KataJS library.
Thanks to its flexibility, our prototype library allows developers to quickly create new features or functionalities needed by a specific application. For instance, assume that one wishes to increase the speed of the avatar while the shift key is held. As the default animation message does not have a parameter controlling the animation speed, the latter would remain unchanged, resulting in an artificial
look. However, by simply creating a new message and implementing its handling in the XML3D module, the animations may then be easily run at different speeds.

Fig. 4 (left) An example feature illustrating how quickly developers can introduce new functionalities into the system. An interface allows overriding existing or newly created animations for the avatar by selecting them from the library and setting additional parameters. In this case, we have mapped the jumping animation to the idle state, in which the avatar would typically stay still. (right) A browser's built-in inspector may be used to modify the scene in a running application, which is a convenient tool for content creators. This screenshot shows how the impact of changing a given color value in the shader is immediately visible on the screen (avatar's shirt)

Moreover, by exploiting the flexibility of Xflow, it is also possible to add another message that allows creating new animations for
the loaded avatar. The user is then able to select an animation from the animation library, set a number of parameters,
such as starting frame, length, and whether the animation
should loop, and map it to the animation name at run-time,
as shown in Fig. 4 (left). This functionality would be nontrivial to achieve in the majority of the existing virtual world
architectures, while using our system, the feature was implemented only in a couple of hours.
Moreover, the proposed platform exposes the full set of
debugging tools provided by modern browsers, allowing one
to create virtual worlds more quickly and easily. We greatly
benefited from this feature when designing the prototype library and application, for instance, in order to determine the
best approach for animation. These tools can also be used
by designers to create virtual world content in a similar way
as web developers design web pages—instead of changing
the source code and reloading the web page many times, one
can modify some parameters in a running browser and once
satisfied with the changes, sync them back with the original
code. An example of such modification is presented in Fig. 4
(right).
In addition, the modularity of the architecture provides great flexibility to the overall system, allowing the substitution of individual components so as to better match the
needs of a specific application. On a larger scale, even the
virtual world platform itself can be transparently substituted
if features beyond those provided by Sirikata are required,
without the need to change graphics, scripting or any other
component (see Fig. 5).
5 Real world in a virtual environment
In this section, we describe the approach used to generate
real 3D cities in a virtual environment. The first subsection gives an overview of the XML3DMapServer, a system we created to generate XML3D from OpenStreetMap data, while the second subsection covers the integration of this system into our prototype library.

Fig. 5 Illustration of the networking part of our prototype library. The WebSocket functionality can be replaced with another socket implementation, which can be provided as a plugin to the browser. As long as the new socket class provides the same interface as WebSocket, there is no need to replace the protocol implementation or any other components. However, should one need features other than those provided by Sirikata, the protocol implementation itself may also be seamlessly replaced in order to connect to another virtual world platform
5.1 XML3DMapServer
The OpenStreetMap datasets are publicly available, either in
the form of the planet.osm dataset that contains the complete
OSM data, or in the form of various extracts for continents,
states, or other areas, provided through a number of mirrors.
While the planet.osm dataset is updated weekly, the update frequency of these third-party extracts differs from mirror
to mirror. For the purpose of this work, we used extracts
of several German states provided by geofabrik.de [3]. The
data is provided in the OSM XML file format and is imported into a PostGIS database, the de-facto standard open-source Geographic Information System (GIS) database, using open-source tools such as osmosis [20]. PostGIS implements the Simple Features for SQL standard as defined by the Open Geospatial Consortium (OGC), which gives us a standards-conformant interface that can also hold other GIS data, such as governmental cadastral data. PostGIS also provides a number of spatial functions and operations for processing the different forms of point, line, and polygon geometries.
One of the main design concepts is the use of layers and map tiles in order to serve 3D geometry in an efficient and flexible way. Splitting the world into several layers covering different content is a common approach in GIS. We provide layers for 3D buildings, different forms of landuse, and different types of POIs, which allows our users to pick only the information they are interested in. The individual layers are predefined in the central database, and the
relevant features of the OSM data are mapped into the corresponding layer based on their attributes. For some subsets,
preprocessing is necessary in order to filter, join, or clean-up
the data.
The tiling mechanism is based on Google's map tiling system, as shown in Fig. 6: a square map of the earth in Mercator projection is recursively split into square regions. Initially, a minified map forms the source tile, which is assigned the coordinates (x, y, z) = (0, 0, 0), where x corresponds to the position of the tile from west to east, with 0 starting at 180°W; y goes from north to south, with 0 located at 90°N; and z indicates the zoom factor, where z = 1 indicates that the initial map is split into four regions, and so on. With this scheme, every resulting tile can be identified by a unique triple of numbers. Currently,
can be identified by a unique triple of numbers. Currently,
the maximum defined zoom level is 21, which results in a
total number of n = 421 ≈ 4, 398 × 1012 tiles. This system
allows for generating a variety of tiles with different resolutions for various purposes. In fact, such a scheme is rather standard and broadly used in various mapserver implementations.

Fig. 6 Google map tiles at zoom level 1: A square world map is split into four sub-regions. The resulting tile coordinates (x, y, z) are (0, 0, 1) at the upper left, (1, 0, 1) at the upper right, (0, 1, 1) at the lower left, and (1, 1, 1) at the lower right. Subsequent zoom levels z > 1 will again split each region into four tiles following the same scheme. (Source: maptiler.org)

Fig. 7 Illustration of the architecture: OSM data is imported into a PostGIS spatial database which stores the 2D vector data of points, linestrings, and polygons as well as the overall system configuration. Upon requesting a set of layers for a specific tilecode, the webserver either delivers the tiles out of its local disk cache or, if not present, generates the tiles out of the information stored in the database using the XML3DMapServ binary
In our system, such a map tile code, e.g., (4400, 2686, 13) for the area around the Brandenburger Tor in Berlin, can therefore be passed as a query parameter to the XML3DMapServer, together with a list of layers, to request a tile containing the 3D geometry for this area instead of an image map as in the case of Google. Tiles essentially serve as containers for a certain area, including all relevant geometry. This way, individual tiles can also be cached on disk in order to minimize the overhead of generating them.
The architecture of the XML3DMapServer, as shown in Fig. 7, follows the principles of existing Web Map Services by receiving requests via HTTP and offloading the main work to Common Gateway Interface (CGI) programs executed by the web server. The CGI program, written in C/C++ in our case, is responsible for accessing the PostgreSQL database and constructing the XML3D geometry. The Web service itself consists of a number of PHP scripts that execute the CGI binary and are run from an Apache web server. The Web service can be queried using GET requests, which return either a list of the available layers with their unique IDs in JSON, or the 3D geometry for the layers of a certain tile as defined above. If tiles are not present in the local disk cache on the Web server, the CGI program is run in order to generate (and then cache) the requested tile.
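The cache-or-generate logic just described can be sketched as follows (function and parameter names are ours; the real implementation caches on disk and invokes the CGI binary rather than a callback):

```typescript
// Sketch of the tile cache: generation (the expensive CGI step in
// the real system) runs only on a cache miss, and the result is
// stored so subsequent requests for the same tile are served cheaply.
function serveTile(
  key: string,
  cache: Map<string, string>,
  generate: (key: string) => string
): string {
  let tile = cache.get(key);
  if (tile === undefined) {
    tile = generate(key); // run the generator only on a cache miss
    cache.set(key, tile);
  }
  return tile;
}
```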
In order to generate the tiles, the Google map tile code is transcoded into a square region in an actual spatial coordinate reference system so that the necessary source data can be queried efficiently. Depending on the requested layer and its predefined configuration, the stored point, line, and polygon vector data can then be interpreted as needed.
For example, point data that is tagged as a tree results in
an XML3D instance of a generic tree model or a specific
tree if other attributes are available, transformed to its geographical position.

Fig. 8 On the left side of the figure, one can see a rendering of a virtual city at the position and direction that correspond to the red marker and arrow on the OpenStreetMap on the right. The user can walk their avatar through the generated city as they would in real life. (Map data ©OpenStreetMap contributors, CC-BY-SA)

Polygonal areas covering landuse become patches of terrain, and linestrings of road networks are
converted to street geometry, where different road classes
will result in different lane widths. For buildings, the polygonal footprint is extruded to the actual height of the building if the OSM tag “building:height” is set, or to the average storey height times the number of levels if the latter is available via “building:levels”; otherwise, a reasonable building height is chosen from a predefined range. The amenity of a building, or POIs located within the area of a building footprint, e.g., a restaurant, influences the texturing of that building. For everything else, a randomly chosen texture is applied, but extending the rules for choosing specific textures is merely a matter of finding appropriate textures for each specific case.
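The height rule above can be sketched as a small function. The tag names follow the OSM conventions cited in the text, but the 3 m storey height and the 8–20 m fallback range are our assumptions; the paper only speaks of an "average storey height" and "a predefined range":

```typescript
// Sketch of the building extrusion rule: explicit height wins,
// then levels times an assumed storey height, then a random
// fallback from an assumed predefined range.
function buildingHeight(tags: Record<string, string>): number {
  if (tags["building:height"] !== undefined) {
    return parseFloat(tags["building:height"]);
  }
  if (tags["building:levels"] !== undefined) {
    return 3.0 * parseFloat(tags["building:levels"]); // ~3 m per storey (assumed)
  }
  return 8 + Math.random() * 12; // assumed fallback range of 8–20 m
}
```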
5.2 Integration
As the XML3DMapServer application was developed independently from the interconnect prototype library, an integration step was necessary. An essential step was the introduction of a tile server that would work as an object host in
the Sirikata architecture and serve as a communication point
between the two systems.
It was also necessary to define how generated objects would be registered in the virtual world. As mentioned in the previous section, the world generated from the OpenStreetMap data is structured as layered tiles. While we could potentially break each layer into smaller objects, such as buildings, trees, or roads, we instead decided to register each layer as a single object to avoid unnecessary overhead. This approach is still efficient, since all of the generated objects are static and are not modified by the application. Also, tiles are small enough to allow KataJS to efficiently load and unload them as the user moves through the world.
Many objects generated by the XML3DMapServer have common properties, e.g., appearance, geometry, etc. This was taken into account during the design and development of the XML3DMapServer so that shared data is efficiently reused by means of references in the XML3D document. In order to exploit this, the client was designed such that shared data is downloaded once and reused for every tile. Although this helps decrease network and memory load, it would also prevent other clients from viewing the generated city. In order to resolve this issue, we have introduced a custom communication protocol that allows optimized clients to detect objects originating from the XML3DMapServer. While normal clients can ignore this protocol and simply use the full model, optimized clients are able to communicate back to the XML3DMapServer and download only the data unique to each tile.
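The client-side effect of this optimization can be sketched as a simple resource cache. This is an illustration of the idea only (names and the resource-ID scheme are hypothetical, not the actual protocol): resources shared between tiles are fetched once and reused:

```typescript
// Hypothetical sketch of the shared-data optimization: a client
// caches resources referenced by several tiles and only fetches
// those it has not seen before.
class TileClient {
  private cache = new Map<string, string>();
  fetches = 0; // counts actual network hits

  constructor(private fetchResource: (id: string) => string) {}

  loadTile(resourceIds: string[]): string[] {
    return resourceIds.map((id) => {
      if (!this.cache.has(id)) {
        this.cache.set(id, this.fetchResource(id)); // network hit
        this.fetches++;
      }
      return this.cache.get(id)!;
    });
  }
}
```

Loading two tiles that both reference a shared resource thus costs three fetches rather than four.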
As a result of the integration, we have created a system that allows a user to walk their avatar through the virtual city, as depicted in Fig. 8.
6 Discussion and future work
While our prototype library increases the speed of developing virtual worlds, it still suffers from the absence of high-level protocols in Sirikata. The system provides only generic messages between objects, and most of the features needed by an application require adding new message types, which are typically application-specific and thus complicate the task of creating interoperable worlds. On the other hand, Sirikata is a flexible system for research, as one can build any protocol on top of it. As one of the next steps, we plan to work on a high-level protocol that would provide useful building blocks, simplify application development, and allow for better interoperability.
Additionally, the architecture of our interconnect prototype library could be improved by adding inherent support for sharing data between objects. This approach could also be extended to other applications beyond the XML3DMapServer.
In order to promote the open nature of the system, we
have created the Community Group for Declarative 3D at
the World Wide Web Consortium (W3C), which is an international community that develops standards for the Web. In
this group, we will work on the standardization of declarative 3D graphics in the Web, which is an essential step to
integrate virtual worlds into the Web.
7 Conclusion
In this paper, we have introduced a new open architecture combining well-established Web components with various emerging technologies, such as XML3D, Xflow, and
Sirikata. The system greatly simplifies the development and
design of Web-based virtual worlds. Moreover, the prototype library that we have created serves as a proof-of-concept application and offers an opportunity to the virtual world community by allowing nonexperts to quickly and easily create their own virtual worlds. Overall, the modular and flexible design allows the creation of many different types of worlds, such as gaming, social, and collaborative environments. Such worlds would be readily accessible
through the browser to all Web users. We have also shown that crowd-sourcing can help bring real cities into a virtual environment from openly accessible sources, such as OpenStreetMap.
Acknowledgements We would like to thank the Embodied Agents
Research Group that has kindly provided an Amber model for the
user’s avatar. Also, we would like to thank the many contributors of
OpenStreetMap, who have provided high-quality data enabling us to
generate a 3D version of our world based on their data.
References
1. Behr, J., Eschler, P., Jung, Y., Zöllner, M.: X3DOM: A DOM-based HTML5/X3D integration model. In: Proceedings of the 14th
International Conference on 3D Web Technology, Web3D ’09,
pp. 127–135. ACM Press, New York (2009)
2. Byelozyorov, S., Pegoraro, V., Slusallek, P.: An open modular architecture for effective integration of virtual worlds in the web. In:
Gavrilova, M.L. (ed.) CyberWorlds, pp. 46–53. IEEE Press, New
York (2011)
3. Geofabrik: (2012). http://download.geofabrik.de/osm/
4. Google: (2010). http://code.google.com/apis/o3d/
5. Khronos Group: OpenGL ES common profile specification version
2.0.25. (2010). http://www.khronos.org/registry/gles/specs/2.0/
es_full_spec_2.0.25.pdf
6. Horn, D., Cheslack-Postava, E., Mistree, B.F.T., Azimy, T., Terrace, J., Freedman, M.J., Levis, P.: To infinity and not beyond: scaling communication in virtual worlds with Meru. Tech. rep., Stanford University (2010)
7. Intel: Scalable Virtual Environments (2011). http://software.intel.
com/en-us/articles/scalable-virtual-environments/
8. Khronos Group: WebGL Specification (2011). https://www.
khronos.org/registry/webgl/specs/1.0/
9. Klein, F., Rubinstein, D., Byelozyorov, S., Sons, K., Slusallek, P.: Xflow—declarative data processing for the Web (2012). Planned to be published in Web3D
10. Linden Lab: Second Life (2012). http://secondlife.com/
11. Multiverse (2012). http://www.multiverse.net
12. Neis, P., Zielstra, D., Zipf, A.: The street network evolution of crowdsourced maps: OpenStreetMap in Germany 2007–
2011. Future Internet 4(1), 1–21 (2011). doi:10.3390/fi4010001.
http://www.mdpi.com/1999-5903/4/1/1/
13. Open Cobalt (2012). http://www.opencobalt.org/
14. Open Simulator (2012). http://opensimulator.org
15. OpenStreetMap (2012). http://www.openstreetmap.org/
16. OpenStreetMap 3D (2012). http://www.osm-3d.de
17. OpenStreetMap 3D Development (2012). http://wiki.openstreetmap.org/wiki/3D_Development
18. General OpenStreetMap Statistics (2012). http://wiki.openstreetmap.org/wiki/Statistics
19. Statistics on Tag Use in OpenStreetMap (2012). http://taginfo.openstreetmap.org/
20. Osmosis (2012). http://wiki.openstreetmap.org/wiki/Osmosis
21. Red Dwarf (2012). http://www.reddwarfserver.org/
22. Sons, K., Klein, F., Rubinstein, D., Byelozyorov, S., Slusallek,
P.: XML3D: interactive 3D graphics for the web. In: Proceedings of the 15th International Conference on Web 3D Technology,
pp. 175–184 (2010)
23. Sons, K., Slusallek, P.: XML3D physics: Declarative physics simulation for the Web. In: Workshop on Virtual Reality Interaction
and Physical Simulation (VRIPHYS). Eurographics Association
(2011)
24. W3C: VRML97 and Related Specifications (1997). http://www.
web3d.org/x3d/specifications/vrml/
25. W3C: CSS 3D Transforms Module Level 3 (2009). http://www.w3.org/TR/css3-3d-transforms/
26. W3C: Document Object Model (2011). http://www.w3.org/
DOM/
27. W3C: HTML5 Draft Specification (2011). http://dev.w3.org/
html5/spec/
28. W3C: Hypertext Transfer Protocol (2011). http://www.w3.org/
Protocols/rfc2616/rfc2616.html
29. W3C: Scalable Vector Graphics. Tech. rep., W3C (2011)
30. W3C: The WebSocket API (2011). http://dev.w3.org/html5/
websockets/
31. W3C: XMLHttpRequest. Tech. rep., W3C (2011). http://www.w3.
org/TR/XMLHttpRequest/
32. Web3DConsortium: ISO/IEC 19775:200x—Extensible 3D (X3D)
(2008). http://www.web3d.org/x3d/specifications/
33. WHATWG: Web Workers (2011). http://whatwg.org/ww
Sergiy Byelozyorov was born in 1987 in Ukraine. He received his Bachelor's Degree in Computer Science with honors from the National Technical University of Ukraine Kyiv Polytechnic Institute and his Master's Degree in Computer Science with honors from Saarland University, Germany, in 2010. Currently, he is pursuing his Ph.D. in Computer Science at the Computer Graphics Chair. His research interests are in computer graphics, virtual environments, and web technologies.
Rainer Jochem finished his studies of Computer Science at Saarland University in 2008 and has since been working for the Agents and Simulated Reality Lab at DFKI. His work focuses on geoinformation systems and related applications, with particular interest in the automated processing of geodata, the use of Web 2.0 technologies, and interactive visualizations.
Vincent Pegoraro is a Postdoctoral Researcher in the Computer Graphics Laboratory (CGUdS)
within the department of Computer
Science at Saarland University, Germany. As part of the Multimodal
Computing and Interaction (M2CI)
Cluster of Excellence, his research
interests broadly include predictive
rendering, efficient global illumination, photorealistic image synthesis,
computer graphics, and scientific
visualization. He received a Ph.D.
degree in Computer Science from
the University of Utah, USA, in
2009. As a Research Assistant at the Scientific Computing and Imaging
(SCI) Institute, his dissertation focused on efficiently simulating physically based light transport in participating media within the framework
of the Center for the Simulation of Accidental Fires and Explosions
(C-SAFE).
Philipp Slusallek is a Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he has headed the research area Agents and Simulated Reality since 2008. He is also Director for Research at the Intel Visual Computing Institute, a central research institute of Saarland University co-founded by him in 2009. At Saarland University, he has been a professor for Computer Graphics since 1999 and a Principal Investigator at the German Excellence Cluster on “Multimodal Computing and Interaction” since 2007. Before coming to Saarland University, he was a
Visiting Assistant Professor at Stanford University, USA. He studied
physics in Frankfurt and Tübingen (Diploma/M.Sc.) and got a Ph.D. in
Computer Science from Erlangen University. His research interests are
focused on 3D Internet technology and application, integrating graphics and artificial intelligence, immersive collaborative environments,
distributed simulation and visualization, as well as high-performance
graphics and computation.