From real cities to virtual worlds using an open modular architecture

Sergiy Byelozyorov · Rainer Jochem · Vincent Pegoraro · Philipp Slusallek

The Visual Computer: International Journal of Computer Graphics
ISSN 0178-2789, DOI 10.1007/s00371-012-0717-9
© Springer-Verlag 2012

Abstract The technologies for the Web and virtual worlds are currently converging, but although some efforts have been made to integrate them with each other, these typically rely on technologies foreign to most Web developers. In this paper, we present a new open architecture that combines several emerging and established technologies to provide convenient tools for developing virtual worlds directly in the Web. These technologies are easy for the Web community to learn and understand, and they allow for quick prototyping. Overall, the modular architecture allows virtual worlds to be developed more quickly and deployed more widely. Additionally, we demonstrate that creating an adequate virtual environment can be an easy task when applying the principles of crowd-sourcing.
We present an application that uses one of the largest available open data sources of geospatial information to bring 3D cities from the real world into the virtual environment.

Keywords Virtual worlds · The Web · Open architecture · Geographic information system

S. Byelozyorov (✉), Saarland University, Saarbrücken, Germany, e-mail: [email protected]
R. Jochem, DFKI, Saarbrücken, Germany, e-mail: [email protected]
V. Pegoraro, Saarland University and MMCI, Saarbrücken, Germany, e-mail: [email protected]
P. Slusallek, Saarland University and DFKI, Saarbrücken, Germany, e-mail: [email protected]

1 Introduction

The developments in Web technology over the last decade have had a significant impact not only on businesses but also on society, as people from different backgrounds and cultures communicate with each other using Web-based social networks, such as Facebook and Twitter, thus increasingly merging the real world with a new borderless digital space. The Web has become so significant that the Internet itself has become synonymous with it. At the same time, we have seen a tremendous development in graphics over the last decade. Modern hardware is able to display highly realistic environments in real time, making games and other applications an immersive and entertaining experience. The general increase in network bandwidth has allowed game companies to create a new type of game, the massively multiplayer online game, where large numbers of players can play together in a single shared virtual world. We are currently at the point where these two developments can be combined by creating next-generation virtual worlds that are fully integrated into the Web. In the same way that social networks have become a daily part of our society, we expect such shared environments to create another way of interacting and communicating on the Internet. However, there is currently no widely accepted and easy way to integrate virtual worlds into the Web.
In this paper, we propose an open modular architecture based on a combination of emerging technologies, such as XML3D [22], Xflow [9], and Sirikata [6], as well as established and widely available Web components. This combination has a number of advantages: Firstly, it makes the integration of virtual worlds in the browser much simpler for Web developers, as no special knowledge in the areas of networks, graphics, or system architecture is required, other than Web technologies. In addition, developers can quickly prototype their applications using high-level concepts offered by the underlying technologies. The approach is highly flexible due to the modular architecture, which allows components to be replaced easily. In the Web community, this approach is known as a mashup, where a developer combines data and functionalities from two or more sources to create new services and applications. Wider and easier deployment is achieved as users do not have to install additional specialized software, but instead use the browser that is typically shipped preinstalled on their computer. An illustration of the world running on different operating systems and using different rendering algorithms is shown in Fig. 1.

Fig. 1 An example of a virtual world running in the browser using our architecture. On the left, our modified Chromium browser runs on Windows and uses rasterization to render the scene, while on the right, the same browser running on Ubuntu Linux uses ray tracing for more realistic rendering (e.g., correct shadows). (Map data © OpenStreetMap contributors, CC-BY-SA)

Finally, an important property of such a design is that it is an open system, i.e., it is based on open-source code, open protocols, and an open specification.
The development of the Web has shown that open systems are more successful than those based on closed or proprietary technologies, thanks to a more dynamic response to user feedback via typically shorter integration cycles, and to their sustainability beyond the life expectancy of any particular contributing entity. As we have shown earlier [2], such an open modular architecture provides an effective framework for interactively exploring virtual worlds and interacting with other users in them. In order to make it a truly end-to-end platform, this paper additionally shows how the framework can be extended to support the creation of the virtual environments themselves. Regarding the data available in today's Web 2.0 with its social media and crowd-sourcing movements, one can observe that a large amount of information with meaningful spatial data about the real world is already available: people map the world on OpenStreetMap [15], publish their photographs along with Global Positioning System (GPS) coordinates, send Twitter messages from specific locations, or publish their movements and "check in" at real-world venues on the Web. Using the example of spatial data from OpenStreetMap, we show that the integration of real-world data into virtual worlds is easily possible.

The remainder of this paper is structured as follows. Section 2 gives an overview of the related work and motivates our choice of technologies. The selected technologies are detailed in Sect. 3, which also presents the prototype library that we have developed. Potential applications of this library are then illustrated in Sect. 4, one of which, creating real cities in the virtual world, is shown in Sect. 5. Limitations and future work are discussed in Sect. 6.
2 Related work

In this section, we discuss the components needed to integrate virtual worlds into the Web and provide an overview of existing solutions for each of them: 3D graphics technologies, 3D animation techniques, networking components for the communication with the virtual world platform, and the use of data from the OpenStreetMap project.

2.1 3D graphics

This subsection reviews some technologies for embedding 3D graphics into Web pages and analyses their pros and cons. One of the approaches used to integrate graphics into the Web is known as X3DOM [1]. This technology is developed at the Fraunhofer IGD institute and is an adaptation of the ISO-standardized Extensible 3D Graphics (X3D) [32] file format to HyperText Markup Language 5 (HTML-5) [27]. The advantage of this technology is that it is based on an existing standard and is thus backward compatible with existing content. However, X3D, which is the successor of the Virtual Reality Modeling Language (VRML) [24], was developed with different underlying technologies than the Web. As such, X3D has to bridge between backward compatibility and seamless integration into HTML and the Document Object Model (DOM) [26]. These differences lead to a number of inconsistencies in key areas such as events, scene graph structure, scripting, material models, and the use of Cascading Style Sheets (CSS) [27], among others. An alternative technology for 3D graphics available in modern browsers is WebGL [8], developed by the Khronos Group, which provides direct JavaScript bindings to the OpenGL ES 2.0 API [5]. It is already implemented by most browser vendors. WebGL allows for delegating heavy processing to the parallel graphics hardware available on most consumer PCs, such that the central processing unit may perform other tasks. On the other hand, WebGL does not define a declarative approach to representing 3D content similar to HTML.
Instead, a JavaScript-based scene graph library must be used, which is inconsistent with other parts of a Web document and limits performance. Google has tried to address this problem by providing a fast scene graph JavaScript API in their project O3D [4]. While this solved the performance issue and allowed for larger scene sizes, it is still unintuitive for Web developers to keep the content separate from the DOM. The project has since been discontinued. Finally, XML3D, which is developed at Saarland University together with the German Research Center for Artificial Intelligence (DFKI) and the Intel Visual Computing Institute, defines a format for encoding 3D graphics as an extension to HTML. However, unlike X3D, it is free to adopt the best and most widely available Web technologies familiar to Web developers. The XML3D specification defines a 3D document as part of the DOM model in the browser. It reuses ideas from Scalable Vector Graphics (SVG) [29], HTML-5, CSS, and more. Elements that encode different aspects of the 3D scene (such as geometry, materials, lights, etc.) provide interfaces to JavaScript that allow easy, real-time modification of the document. XML3D is efficient because it uses vertex arrays to natively encode geometry, which can be directly mapped to the hardware. This technology is the most adequate for our needs, and we present it in more detail in Sect. 3.1.

2.2 Animation

Without animation, a virtual world is a monotonous and static environment, whereas we would like to offer rich and immersive environments. Additionally, we need to be able to efficiently synchronize animations over the network using as little bandwidth as possible. Therefore, we need an easy, intuitive, and efficient way to encode animations. The most straightforward way to animate an object is to use a script that regularly changes the transformations of objects and vertices. This is also known as DOM scripting and typically uses JavaScript.
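As a concrete illustration, the following sketch shows how DOM scripting might drive a simple rotation. The element id, the attribute name, and the axis-angle encoding are hypothetical and serve only to illustrate the approach; they are not taken from the XML3D specification.

```javascript
// A pure function computing a rotation value for a given time stamp.
// The "axis x y z angle" string layout is a hypothetical encoding.
function rotationAt(timeMs, degreesPerSecond) {
  // Rotation about the y-axis, angle in degrees, wrapped to [0, 360).
  var angle = (timeMs / 1000) * degreesPerSecond % 360;
  return "0 1 0 " + angle.toFixed(2);
}

// In a browser, the value would be written back to the DOM every frame,
// e.g. (hypothetical element id "myTransform"):
// function tick(t) {
//   document.getElementById("myTransform")
//           .setAttribute("rotation", rotationAt(t, 90));
//   requestAnimationFrame(tick);
// }
// requestAnimationFrame(tick);
```

As the paper notes next, this per-frame scripting is easy to write but scales poorly once thousands of vertices must be updated each frame.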
While this method, being familiar to Web developers, is intuitive and easy to learn, it is highly inefficient when it comes to complex animations. In order to create a smooth animation of an avatar character, a script would need to move thousands of vertices many times per second. This would typically require a substantial amount of code and expensive hardware to achieve a smooth, realistic animation. We can solve this problem by moving such modifications to parallel hardware, implementing them as vertex shaders that are executed on the Graphics Processing Unit (GPU) and are thus much more efficient. However, at this point, the approach becomes highly unintuitive for Web developers, who are typically not familiar with shading languages. Learning them requires deep knowledge of graphics and GPU programming, which means that only experts would be able to create such animations. Instead of writing scripts, we can describe animations declaratively as an extension to HTML. Such an approach was suggested by the Synchronized Multimedia Integration Language (SMIL). However, SMIL does not fit well into the DOM concepts, and its elements are not well supported by authoring tools. Instead, most Web applications use the aforementioned DOM scripting capabilities for dynamic content. An alternative is Xflow, which is an extension to XML3D and declaratively defines a data flow in the document. The main idea behind it is that we can represent animations as graphs in which only a few data elements are changed, such as bone positions, but influence many more data elements, such as the final mesh vertices of a character. With this approach, a programmer defines a flow in the document and only changes a few parameters in JavaScript that drive the animation, while the underlying system propagates data through the Xflow graph and creates a smooth animation on the screen.
Moreover, in order to synchronize such an animation, we only need to send the graph configuration once and later send updates to the initial data elements. Animations will then be computed automatically and locally on each client. Due to these advantages, we have selected it as a basis for describing animations in the virtual worlds. A more detailed description can be found in Sect. 3.2.

2.3 Networking

Aside from 3D graphics and animations, we need a real-time networking technology to create a shared virtual world. An established and convenient way of transferring additional data to a Web application after the page has already been loaded is Ajax [31], a technology based on asynchronous requests to the server. This approach has shown its reliability and usability in thousands of rich Web 2.0 applications. It is, however, designed for casual data exchange rather than for the continuous data streams that are present in applications like virtual worlds. The communication overhead created for each request can easily exceed the transferred data in size, thus making this technology inefficient in terms of network load. An emerging technology introduced with the upcoming HTML-5 standard is known as WebSockets [30]. Similarly to WebGL, it is implemented in most major browsers and supports continuous data streams with only 2 bytes of overhead per message. In addition, it can provide similar functionality in business environments, where all ports other than 80 are blocked, by multiplexing several WebSocket connections over a single Hypertext Transfer Protocol (HTTP) [28] connection. This is important, since we would like virtual worlds to be available to everyone despite the limitations created by local firewalls or network policies. These benefits and the tight integration with existing Web technologies make it the most suitable networking technology for virtual worlds on the Web.
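To make the streaming use case concrete, the following sketch shows how position updates might be framed as messages for a WebSocket connection. The JSON message layout is our own invention for illustration and is not the wire protocol of any virtual world platform discussed here.

```javascript
// Encode a position update as a compact JSON frame (illustrative format).
function encodeUpdate(objectId, position) {
  return JSON.stringify({ type: "pos", id: objectId, p: position });
}

// Decode a frame back into an update object, rejecting unknown types.
function decodeUpdate(message) {
  var m = JSON.parse(message);
  if (m.type !== "pos") throw new Error("unexpected message type");
  return { id: m.id, position: m.p };
}

// In a browser, the frames would travel over the standard WebSocket API,
// e.g. (hypothetical endpoint):
// var ws = new WebSocket("ws://example.org/world");
// ws.onopen = function () { ws.send(encodeUpdate("avatar-1", [1, 2, 3])); };
// ws.onmessage = function (ev) { applyUpdate(decodeUpdate(ev.data)); };
```

Note that unlike Ajax, each such frame carries only a few bytes of protocol overhead, which is what makes continuous streams practical.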
2.4 Virtual world platform

In addition to the client components, we need a virtual world platform to provide the data and synchronize the state of the world. This subsection presents a number of existing and emerging technologies in the area. Many popular online games, such as World of Warcraft, Eve Online, and EverQuest, are typically based on proprietary server and client software. While being extremely immersive and realistic, they typically do not allow users to create content, because such content has to be specially designed for the game's rendering engine. This type of platform is not suited for the Web, since it does not allow interoperability, requires special client software per world, and does not allow users to change the world. One of the first virtual worlds that allowed users to create content and gained public attention is Second Life [10], launched in 2003. This platform features an internal market where users can trade their content with each other, opening opportunities for businesses. Second Life is developed by Linden Lab and runs on their private servers. Unfortunately, the system is closed and not designed for interoperability with other worlds, which is an essential part of the Web. Other systems with similar limitations include Blue Mars, Twinity, and HiPiHi. OpenSimulator [14] is a project that provides open world server software compatible with the Second Life client. The implementation inherits Second Life's server architecture, known as the grid, but also extends it with a hypergrid mode, in which external worlds can be connected together. In the grid and hypergrid architecture, the world is divided into fixed-size square regions, each supported by a single server only. Moreover, when the server is close to its maximum capacity, users are prohibited from entering the region, which sets an upper bound on the number of users that can visit one region at a time.
This creates an issue when an event of interest, e.g., a virtual concert, is happening in a given region and many users are trying to visit it. Servers have to be designed with excess capacity, which is used at a fraction of its potential most of the time. The Open Cobalt [13] platform offers a fully decentralized open-source world browser and toolkit, which has embedded tools for collaborative research and education. It is based on the Squeak programming language, which allows modifying the program at runtime, in particular, extending the world functionality when discovering new object types by sharing their implementation code. Unfortunately, this solution has security limitations, as such code can be arbitrary and thus potentially malicious. Additionally, the current architecture requires each peer to store the entire world and run all downloaded scripts locally, which does not scale to large worlds. RedDwarf [21], a successor to Project Darkstar originally developed by Sun, focuses its architecture on scalability and the reuse of resources. With this approach, the simulation of the world regions does not map directly to the servers, but is distributed dynamically depending on the load of each region. This allows resources to be used more efficiently and decreases the cost of running a world. This provides a good solution for a world that is run by a single company. However, on the Web, different companies typically run their websites independently and do not share their hardware due to security issues. Project Multiverse [11] offers an architecture that is designed for games. The world is separated into zones, and zones into regions. When switching between zones, a user has to wait until the other zone is loaded. However, moving between regions is seamless, as the loading of adjacent regions starts in advance when the user is close to a region boundary.
Additionally, there are instances, which are similar to zones in that they also require a loading screen, but apply when entering a certain closed structure, e.g., a house or a cave. This can be used, for example, to create spaces that are larger than their outer apparent size. Unfortunately, all these features are very game-specific and do not fit the generic definition of the virtual worlds that we would like to see on the Web. Distributed Scene Graph [7] was suggested by Intel as a solution for scalable virtual environments. In this approach, the scene graph is decoupled from other components, such as client management, physical simulation, script processing, or scene persistence, each of which is executed in a separate server process. The scene graph itself is split into partitions that are also executed in different server processes. This allows one to dynamically distribute the load depending on the current situation in the world. For example, if at one place there is a massive event which many users are attending, then it makes sense to dynamically allocate one server to perform user management tasks for that particular place and perhaps just one more server to handle users in all other parts of the world. This allows one to efficiently reuse the hardware and thus decrease the cost of running a virtual world. While this architecture is very promising, we have chosen not to use it, as it is not yet mature enough, i.e., animations, interaction, and some other traditional features of a rich 3D environment are not implemented yet. The Sirikata project, which is developed at Stanford University, offers an architecture that separates network nodes into space servers and object hosts, where space servers are responsible for managing a certain constrained space in the shared world, while object hosts run the objects within spaces.
This approach offers a better security model, is scalable to a large number of players, and offers a federated world, where parts of it can be controlled by different organizations. Additionally, the Stanford developers provide a ready-to-use implementation of the server and client libraries, which helps in developing new Web clients. Moreover, they support WebSockets, which we have identified as the technology that we would like to use for networking. Based on these assets, Sirikata appears to be the most appropriate platform for our needs. The detailed platform architecture is presented in Sect. 3.3.

2.5 Geographical datasets

Creating a copy of a real city requires access to its associated geoinformation. Usually, this sort of spatial data is provided either by surveying offices, often imposing restrictions that prohibit free usage, or by commercial data providers, again requiring license fees. This difficulty led to the creation of OpenStreetMap (OSM) in 2004, which has since become the central resource for crowd-sourced geospatial data of the entire world and therefore forms a promising source for our needs. OSM is a huge, rather unstructured, but very detailed, and mainly 2D geodatabase, maintained by roughly 500,000 users world-wide. Initially, the map data was collected only by volunteers using handheld GPS devices to record roads that were then uploaded into the OSM database. Since then, public domain datasets have been integrated, and commercial and governmental data providers have also either opened their datasets in order to, e.g., use aerial imagery to map features, or have directly released their data under appropriate licenses to have them included in the OSM databases. Today, the total number of nodes exceeds 1.3 billion, including approximately 115 million streets and approximately 48 million buildings [18, 19]. The entire planet.osm dataset is 250 GB (16 GB compressed) in size.
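To illustrate how building footprints stored in such a database can be turned into 3D geometry, the following sketch extrudes a 2D outline into a simple box model. The function names, the data layout, and the default height are our own choices for illustration and are not part of any OSM tooling.

```javascript
// Default building height in meters, used when no height tag is available.
var DEFAULT_HEIGHT = 10;

// Extrude a closed 2D footprint (array of [x, y] points) into a box:
// a bottom ring at z = 0, a top ring at z = height, and two triangles
// per footprint edge for the walls (roof/floor omitted for brevity).
function extrudeFootprint(footprint, height) {
  var h = (height === undefined) ? DEFAULT_HEIGHT : height;
  var vertices = [];
  footprint.forEach(function (p) { vertices.push([p[0], p[1], 0]); });
  footprint.forEach(function (p) { vertices.push([p[0], p[1], h]); });
  var n = footprint.length;
  var faces = [];
  for (var i = 0; i < n; i++) {
    var j = (i + 1) % n;
    faces.push([i, j, n + j]);      // lower triangle of the wall quad
    faces.push([i, n + j, n + i]);  // upper triangle of the wall quad
  }
  return { vertices: vertices, faces: faces };
}
```

The resulting vertex and index arrays map naturally onto the vertex-array geometry encoding that XML3D uses.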
It is licensed under the Creative Commons Attribution-ShareAlike 2.0 License but is currently in the process of being relicensed under the Open Database License. Recent research [12] also shows that the quality of the German street network data in OSM is expected to be comparable to proprietary data sets by mid-2012. The data collected by the OSM community consists not only of street networks, but also of points of interest (POI), landuse areas, and building footprints. For buildings, in some areas there is also height information available, which can be used to extrude simple box-shaped building models from the footprints. If no height information is available, reasonable defaults can be used in order to create a somewhat realistic impression. Together with information from additional tags associated with the individual features, one can easily place appropriate 3D geometry on a POI, e.g., for trees, bus stops, etc., or try to apply generic textures for certain building types, e.g., restaurants, shops, pharmacies, etc. This makes OpenStreetMap a very interesting data source for creating a virtual copy of the real world. Nevertheless, as the system described in Sect. 5.1 is designed around common standards, it is not restricted to OSM data, and other free or non-free datasets could be used as well. There already exist some efforts [17] to generate 3D worlds from OSM data, the most advanced of these projects being OSM-3D [16], in which the entire globe can be displayed on the Web using a Java3D applet. For the scope of this work, we will use our own approach based on XML3D for visualization, as described in Sect. 5.1.

3 System architecture

This section first describes the XML3D technology, which enables interactive graphics in the Web browser, and subsequently discusses an extension to XML3D for animations, namely Xflow.
We then provide an overview of the Sirikata virtual world platform before describing the prototype library that we have created to seamlessly integrate all of the above components into a flexible framework.

3.1 XML3D

XML3D [22] is based on XML and can be used as a standalone document or embedded into eXtensible HyperText Markup Language (XHTML) using a separate XML namespace. Inside an XHTML document, certain parts of the HTML specification, such as the <img> and <video> elements, can be directly reused in XML3D as well (e.g., for textures). The root of an XML3D scene is declared by the <xml3d> element, which creates a 3D canvas at the place of declaration. This canvas can be formatted via built-in browser CSS properties, e.g., borders, size, and background. The descendants of the <xml3d> element define the content of the scene by mapping a scene graph onto the DOM tree. Typically, replicated content in a scene graph is stored only once and instantiated many times to save memory. Consequently, similarly to SVG, we introduce a definition subtree starting with a <defs> element, which includes all the content resources that may be used in the scene but are not rendered directly (e.g., geometry, shaders, vertex arrays, etc.). Its content is referenced by other parts of the scene, such as <mesh> elements that instantiate referenced geometry. In 3D graphics, shading can be highly complex, with many different shaders, each having individual parameters. CSS cannot provide this, as it only allows for a fixed set of predefined properties. Instead, a user can declare a shader with parameters using a <shader> element. The code for the shader can be declared using a <script> element, which is reused from HTML and referenced by the shader. In HTML documents, a CSS style defines the size, position, and appearance of a box-shaped layout element.
XML3D also translates this concept to 3D content, where it describes the transformations and appearance properties of <mesh> elements. Transformations and shaders can be assigned to XML3D elements using CSS, in which case all children of the node are affected. For the description of 3D transformations, the CSS 3D Transforms [25] specification can also be used. As XML3D is designed to be a thin additional layer on top of existing Web technology, its functionality is designed to be generic, orthogonal to other technologies, and kept to the minimum needed to display and interact with 3D content. For example, XML3D is a part of the DOM, which already provides scripting and event services; most additional functionalities can usually be implemented through scripts instead of new special-purpose elements. Harnessing JavaScript's built-in timer functions, it is easy to create animations by performing changes to the DOM over time. Generic DOM event listeners reacting to mouse movements and clicks can be connected directly to individual scene elements. XML3D supports attaching event handlers statically via an element's attributes, e.g., "onclick" or "onmousemove," or dynamically via standard DOM methods like addEventListener. While the event system described by the W3C DOM events specification is also applicable to 3D graphics, the predefined event types need to be extended to provide additional information that might be needed for certain 3D applications. For instance, specialized 3D mouse events should also provide information such as global world coordinates and surface normals. The presented design contains the minimal additions needed to introduce 3D graphics to the Web and reuses many design ideas from HTML, CSS, SVG, JavaScript, and the DOM. Deep integration with widely adopted technologies allows one to easily combine 3D and 2D content and makes it easier for Web developers to extend existing Web applications with novel 3D features.
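Putting these elements together, a minimal XML3D scene might look roughly as follows. This is a sketch only: the shader URN, the <group> element used to attach the shader, and the attribute details are simplified and should be checked against the XML3D specification.

```xml
<!-- A minimal XML3D scene showing one red triangle (illustrative). -->
<xml3d style="width: 300px; height: 200px;">
  <defs>
    <!-- Shader with parameters, as described in the text. -->
    <shader id="red" script="urn:xml3d:shader:phong">
      <float3 name="diffuseColor">1 0 0</float3>
    </shader>
    <!-- Geometry as named vertex arrays, not rendered directly. -->
    <data id="triangle">
      <int name="index">0 1 2</int>
      <float3 name="position">-1 0 0  1 0 0  0 1 0</float3>
      <float3 name="normal">0 0 1  0 0 1  0 0 1</float3>
    </data>
  </defs>
  <!-- The mesh instantiates the referenced geometry with the shader. -->
  <group shader="#red">
    <mesh src="#triangle" type="triangles"/>
  </group>
</xml3d>
```

Because the scene lives in the DOM, the <shader> parameters or the <mesh> reference can be modified from JavaScript at runtime like any other element.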
XML3D is currently implemented in both a modified version of Firefox, called RTfox, and a modified version of Chromium, which, similarly to Safari and other open-source browsers, is based on the WebKit rendering engine. XML3D is scheduled to support a portable shading language extension called AnySL in the near future.

3.2 Xflow

In its original specification, XML3D lacks mesh processing capabilities comparable to vertex shaders supporting flexible animations. Even though it is possible to modify the mesh data from JavaScript, this is not efficient for large geometries. Xflow is an extension to XML3D that provides a solution to this issue. It has all the necessary tools to declaratively define a data flow graph in the document, which can be reused to efficiently generate complex animations. The core element representing a node in the flow graph is the <data> element, which can contain raw data and other <data> elements. Xflow encodes raw data via array elements based on the data types they describe: <float>, <float2>, <float3>, <int>. Each array element can have an attribute "name" that associates semantics with the data, e.g., position, normal, or weight. Data elements can be referenced by <data>, <mesh>, and <shader> elements, where the last two function as data sinks. Figure 2 illustrates a flow graph. The flow begins with source <data> elements that contain only raw data and propagates this data down to other elements via references. Eventually, we plan to also support sensors as <data> sources, which is particularly interesting on mobile devices, e.g., for cameras. Some <data> nodes may have scripts attached to them, which provide processing capabilities to the node. Xflow defines a number of built-in scripts for basic processing tasks, such as generating displacements, UV mapping, simple geometry, or noise functions, as well as for more complex tasks, such as skinning, morphing, tessellation, forward kinematics, etc.
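As an illustration, a morphing flow in the spirit of this graph structure might be declared roughly as follows. The script URN, the attachment syntax, and the exact nesting are illustrative rather than normative Xflow syntax.

```xml
<!-- Illustrative Xflow graph: a base mesh morphed toward a target.
     The script URN and nesting are hypothetical sketches. -->
<data id="morphed" script="urn:xml3d:xflow:morph">
  <float name="weight">0.0</float>   <!-- the single animation parameter -->
  <data src="#baseMesh"/>            <!-- source positions and normals -->
  <data src="#morphTarget"/>         <!-- displaced target positions -->
</data>

<!-- The mesh element acts as the data sink of the flow: -->
<mesh src="#morphed" type="triangles"/>

<!-- Per frame, JavaScript only updates the single "weight" value; the
     Xflow graph then recomputes all affected mesh vertices. -->
```

This captures the key asymmetry of the approach: a script touches one scalar, while the flow graph updates thousands of dependent vertices efficiently.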
Fig. 2 In Xflow, a flow describes how the data from the source elements is transformed by processing elements, which have attached scripts, before arriving at the sink elements, such as a mesh or a shader. This allows creating highly realistic and fast animations which can be mapped directly to hardware

When custom processing is needed, the user can define their own scripts using an associated <script> element. Animations are typically driven by one or a few simple parameters in the source <data> elements, such as the morphing weight or skinning matrices. These parameters can be changed dynamically using JavaScript, triggering the evaluation of the rest of the flow using highly efficient algorithms that can be mapped directly to OpenGL vertex shaders or OpenCL code to be executed on the GPU, the CPU, or both, whichever offers better overall performance. While Xflow introduces a new animation concept for Web developers, it frees them from learning shader languages and provides an easy and intuitive way to encode complicated, fast animations. Similarly to animations, another key component required for interactive 3D scenes is physics simulation. In the past, adding physics properties required major changes to the scene graph, whereas in [23] we present a declarative annotation framework that cleanly separates the generic 3D scene description from its physics model. As physical properties are orthogonal to 3D geometry, we offer this framework as an extension to XML3D and have implemented it as a browser plug-in.

3.3 Sirikata

Sirikata is an open-source virtual world platform developed at Stanford University. The goal of the platform is to provide a set of open libraries and protocols that can be used to easily deploy a virtual world. The platform design is based on five core principles: scalability, security, federation, extensibility, and flexibility.
Security has always played an important role on the Internet, and virtual worlds, which use publicly available servers, enable many different kinds of attacks. It is important to consider security from the start, which is why Sirikata has outlined it as one of its design principles.

Fig. 3 Sirikata separates the virtual world into three concepts: space simulation, object simulation, and content delivery. Space servers handle communication between objects and synchronize frequently changing data, such as position. Object hosts, which can be clients or hosting servers, store the objects' full data and run the scripts associated with them. Finally, the content delivery network offloads the bulk content that does not change frequently from the communication link to the space server, allowing for greater scalability

Federation is a general principle underlying the Web architecture. On the Internet, many different companies control Web pages, but no single company controls the entire network. Instead, these companies have established a set of standard protocols that allow for interoperability. Similar concepts can be applied to virtual worlds, where parts of one big seamless world can be controlled by different entities. The Stanford team has designed their platform to support extensible virtual worlds, which means that users and developers should be able to extend the world by creating their own content, scripts, or even plugins that offer new features to the whole world. Flexibility, on the other hand, allows the platform to be used in different scenarios. Just as different Web sites can be online shops, social networks, news portals, or search engines, developers should be able to create different kinds of worlds: online games, collaborative worlds, virtual museums, recreations of the real world, and others.
All of these principles are reflected in Sirikata's system architecture by splitting the functionality of the platform into three concepts: space management, object simulation, and content delivery, as shown in Fig. 3. This differs from the traditional approach, where all the objects with their scripts and data are simulated on a single server or a cluster. Instead, Sirikata separates space simulation on space servers—such as discovery of nearby objects, position synchronization, world physics, etc.—from object simulation on object hosts, such as running the scripts that define the behavior of the objects. This separation is important because it allows implementing two of the aforementioned principles. Object hosts can be run by different companies on servers controlled by each company respectively, effectively allowing them to interoperate through the space server, thus creating a federation. It also enables a better security model, because the scripts and full object data are controlled by their owner and do not have to be uploaded to the space server or to object hosts controlled by other organizations. Additionally, Sirikata suggests using an independent content delivery network, which synchronizes data that is mostly static or changes rarely, e.g., model meshes or textures. This offloads network load from the space server and provides it with more resources for communicating frequently updated information, e.g., object position and orientation. This design allows a space server to handle many more objects in the world. Furthermore, a given space server can connect to other space servers that simulate other parts of the world, thus effectively increasing the scalability of space simulations.
The space server and object host implementations provided by Sirikata are designed as sets of modules connected to a core that implements only the very basic functionality required of every participant in the world. By adding to or extending the existing modules in the system, developers can provide new features to the world or enhance existing ones, which affords substantial flexibility. The communication protocol between object hosts and the space server allows objects to be registered dynamically, thus enabling users to create content at runtime, which makes the world extensible. Based on the five outlined principles, the Sirikata platform is a good fit for virtual worlds on the Web.

3.4 Interconnect prototype library

In order to facilitate the development process, we have implemented a prototype library that can be used to embed virtual worlds into the browser. The library is an extension of KataJS, a JavaScript library created by the Sirikata developers, to which we have added XML3D and Xflow support. There are several advantages to using KataJS apart from the fact that it was initially designed to work with Sirikata. Firstly, KataJS implements the WebSockets protocol for communication with the Sirikata space server. The library is optimized for modern multicore architectures because it uses Web Workers [33], which are part of the upcoming HTML5 standard. It also has a modular structure, allowing one to easily replace components and parts of the system, an important feature that allows the latest research results to be integrated quickly. In particular, each module communicates with other modules through a protocol that defines its interface but leaves the choice of implementation to the programmer. The well-defined abstraction for the graphics subsystem allowed us to integrate the XML3D concepts into the system quickly and easily.
The current implementation of the XML3D module allows loading an initial world that contains the static, unmodifiable parts of the world that are not synchronized by Sirikata, e.g., terrain and trees. Additionally, it allows new objects to be created dynamically when information about them arrives from the space server. Such information is first received by the networking module and then translated into a message for the graphics subsystem. When receiving such a message, the XML3D module loads the associated geometric data (mesh) from the content delivery network, and once this data is loaded and parsed, a new object is added to the scene and appears on the user's screen. Note that the loading of the mesh is asynchronous, so it does not block the XML3D module from processing other messages. Apart from geometry data, a mesh can also contain named animations. The animations are controlled by messages to the graphics subsystem containing the name and parameters of the animation to be played. Each object can have its own mesh and animations specific to the object's structure. For instance, even though KataJS sends the same walking animation message to every avatar, they can behave differently: a human walks on two legs, a tiger uses all four legs, and a car rotates its wheels. In order to integrate this system with Xflow, we have added custom XML nodes that map animation names to the parameters that need to be regularly changed at the source nodes of the flow graph. While these nodes are not essential, they represent a convenient way to save the animation state for each avatar. When an animation message from KataJS arrives at the XML3D module, a simple script is triggered that uses the information from these nodes to drive the animation.
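The message flow just described can be sketched in plain JavaScript. The message shapes, the loader callback, and the method names below are illustrative assumptions, not the actual KataJS interface.

```javascript
// Illustrative sketch of the graphics-module message handling described
// above. Message fields and the loader interface are assumptions.
function createGraphicsModule(loadMesh) {
  const scene = new Map(); // object id -> loaded mesh

  return {
    scene,
    onMessage(msg) {
      if (msg.type === "create") {
        // Asynchronous mesh load: the callback fires once the mesh has
        // been fetched and parsed, so other messages are not blocked.
        loadMesh(msg.meshUrl, (mesh) => scene.set(msg.id, mesh));
      } else if (msg.type === "animation") {
        // Animations are addressed by name; each mesh interprets the
        // name according to its own structure (human, tiger, car, ...).
        const mesh = scene.get(msg.id);
        if (mesh) mesh.playAnimation(msg.name, msg.params);
      }
    },
  };
}
```

A networking module would translate incoming space-server packets into such messages and feed them to onMessage.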
4 Applications

In order to illustrate the previously outlined benefits of our system, we have extended KataSpace, a demo virtual world application created by the Sirikata developers that uses the KataJS library. Thanks to its flexibility, our prototype library allows developers to quickly create new features or functionalities needed by a specific application. For instance, let us assume that one wishes to increase the speed of the avatar when the shift key is held. As the default animation message does not have a parameter to control the animation speed, the latter would remain unchanged, resulting in an artificial look. However, by simply creating a new message and implementing its handling in the XML3D module, the animations may then easily be run at different speeds. Moreover, by exploiting the flexibility of Xflow, it is also possible to add another message that allows creating new animations for the loaded avatar. The user is then able to select an animation from the animation library, set a number of parameters, such as the starting frame, the length, and whether the animation should loop, and map it to an animation name at run-time, as shown in Fig. 4 (left).

Fig. 4 (left) An example feature illustrating how quickly developers can introduce new functionalities into the system. An interface allows overriding existing or newly created animations for the avatar by selecting them from the library and setting additional parameters. In this case, we have mapped the jumping animation to the idle state, in which the avatar would typically stay still. (right) A browser's built-in inspector may be used to modify the scene in a running application, which is a convenient tool for content creators. This screenshot shows how the impact of changing a given color value in the shader is immediately visible on the screen (avatar's shirt)
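The run-time override mechanism of Fig. 4 (left) can be sketched as a small message handler. The entry parameters (starting frame, length, loop) come from the text above, while the field names, the speed parameter, and the message shape are hypothetical.

```javascript
// Hypothetical sketch of run-time animation overrides. Field names and
// the message format are illustrative, not the actual KataJS messages.
const animationLibrary = {
  idle: { startFrame: 0, length: 30, loop: true, speed: 1.0 },
  walk: { startFrame: 30, length: 24, loop: true, speed: 1.0 },
};

// A new message type merges user-chosen parameters into an entry, e.g.
// mapping the jumping animation onto the idle state, or running the
// walking animation at double speed while the shift key is held.
function handleOverrideMessage(library, msg) {
  library[msg.name] = { ...(library[msg.name] || {}), ...msg.override };
}

handleOverrideMessage(animationLibrary, { name: "walk", override: { speed: 2.0 } });
handleOverrideMessage(animationLibrary, { name: "idle", override: { startFrame: 54, length: 18 } });
```

Because overrides only merge parameters, unspecified fields such as the loop flag keep their previous values.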
This functionality would be nontrivial to achieve in the majority of existing virtual world architectures, whereas with our system the feature was implemented in only a couple of hours. Moreover, the proposed platform exposes the full set of debugging tools provided by modern browsers, allowing one to create virtual worlds more quickly and easily. We greatly benefited from this feature when designing the prototype library and application, for instance, in order to determine the best approach for animation. These tools can also be used by designers to create virtual world content in the same way that web developers design web pages: instead of changing the source code and reloading the web page many times, one can modify some parameters in a running browser and, once satisfied with the changes, sync them back into the original code. An example of such a modification is presented in Fig. 4 (right). In addition, the modularity of the architecture provides great flexibility to the overall system, allowing the substitution of individual components so as to better match the needs of a specific application. On a larger scale, even the virtual world platform itself can be transparently substituted if features beyond those provided by Sirikata are required, without the need to change the graphics, scripting, or any other component (see Fig. 5).

Fig. 5 Illustration of the networking part of our prototype library. The WebSocket functionality can be replaced with some other socket, which can be implemented as a plugin to the browser. As long as the new socket class provides the same interface as WebSocket, there is no need to replace the protocol implementation or any other components. However, should one need other features than those provided by Sirikata, the protocol implementation itself may also be seamlessly replaced in order to connect to another virtual world platform

5 Real world in a virtual environment

In this section, we describe the approach used to generate real 3D cities in a virtual environment. The first subsection gives an overview of the XML3DMapServer, a system we created to generate XML3D from OpenStreetMap data, while the second subsection covers the integration of this system into our prototype library.

5.1 XML3DMapServer

The OpenStreetMap datasets are publicly available, either in the form of the planet.osm dataset, which contains the complete OSM data, or in the form of various extracts for continents, states, or other areas, provided through a number of mirrors. While the planet.osm dataset is updated weekly, the update frequency of these third-party extracts differs from mirror to mirror. For the purpose of this work, we used extracts of several German states provided by geofabrik.de [3]. The data is provided in the OSM XML file format and is imported into a PostGIS database, the de facto standard open-source Geographic Information System (GIS) database, using open-source tools like Osmosis [20]. PostGIS implements the Simple Features for SQL standard as defined by the Open Geospatial Consortium (OGC), which allows us to work with a more standards-conformant interface that can also hold other GIS data, such as governmental cadastral data. In addition, PostGIS provides a number of spatial functions and operations to process the different forms of point, line, or polygon geometries. One of the main design concepts is the use of layers and map tiles in order to serve 3D geometry in an efficient and flexible way. Splitting the world into several layers covering different content is a common approach in a GIS. We provide layers for 3D buildings, different forms of landuse, and different types of POIs, which allows our users to pick only the information they are interested in.
The individual layers are predefined in the central database, and the relevant features of the OSM data are mapped into the corresponding layer based on their attributes. For some subsets, preprocessing is necessary in order to filter, join, or clean up the data. The tiling mechanism is based on Google's map tiling system as shown in Fig. 6: a square map of the earth in Mercator projection is recursively split into square regions. Initially, a minified map forms the source tile. It is assigned the coordinates (x, y, z) = (0, 0, 0), where x corresponds to the position of the tile from west to east, with 0 starting at 180° W. y consequently goes from north to south, where 0 is located at 90° N. z indicates the zoom factor, where z = 1 indicates that the initial map is split into four regions, and so on. With this scheme, every resulting tile can be identified by a unique triple of numbers. Currently, the maximum defined zoom level is 21, which results in a total number of n = 4^21 ≈ 4.398 × 10^12 tiles. This system allows for generating a variety of tiles with different resolutions for various purposes. In fact, such a scheme is rather standard and broadly used in various mapserver implementations. It is therefore possible in our system to use such a map tile code, e.g., (4400, 2686, 13) for the area around the Brandenburger Tor in the city of Berlin, as a query parameter, in addition to a list of layers, to the XML3DMapServer in order to request a tile containing 3D geometry for this area instead of an image map as in the case of Google. Tiles essentially serve as containers for a certain area, including all relevant geometry. This way, individual tiles can also be cached on disk in order to minimize the overhead of generating them.

Fig. 6 Google map tiles at zoom level 1: A square world map is split into four sub-regions. The resulting tile coordinates (x, y, z) are (0, 0, 1) at the upper left, (1, 0, 1) at the upper right, (0, 1, 1) at the lower left, and (1, 1, 1) at the lower right. Subsequent zoom levels z > 1 will again split each region into four tiles following the same scheme. (Source: maptiler.org)

Fig. 7 Illustration of the architecture: OSM data is imported into a PostGIS spatial database, which stores the 2D vector data of points, linestrings, and polygons as well as the overall system configuration. Upon requesting a set of layers for a specific tile code, the webserver either delivers the tiles out of its local disk cache or, if not present, generates the tiles out of the information stored in the database using the XML3DMapServer binary

The architecture of the XML3DMapServer, as shown in Fig. 7, follows the principles of existing Web Map Services by receiving requests via HTTP and offloading the main work to Common Gateway Interface (CGI) programs executed by the web server. The CGI program itself is written in C/C++ in our case and is responsible for accessing the PostgreSQL database and performing the XML3D geometry construction. The Web service itself is written as a number of PHP scripts that execute the CGI binary and are run from an Apache web server. The Web service can be queried using GET requests, in order to return either a list of the available layers with their unique IDs in JSON, or the 3D geometry for the layers of a certain tile as defined above. If tiles are not present in the local disk cache on the Web server, the CGI program is run in order to generate (and then cache) the requested tile. In order to generate the tiles, the Google map tile code is transcoded into a square region in an actual spatial coordinate reference system, so that the necessary source data can be queried efficiently.
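The transcoding from a tile code (x, y, z) back to geographic coordinates follows directly from the Mercator tiling scheme described above; the inverse-Mercator formulas below are the standard ones for this scheme, while the request URL at the end is purely illustrative, since the paper does not specify the service's parameter names.

```javascript
// Convert a Google-style tile code (x, y, z) to the WGS84 bounding box
// of the tile, using the inverse Web Mercator projection. The scheme
// (x from 180° W, y from 90° N, 4^z tiles at zoom z) is as in the text.
function tileToLon(x, z) {
  return (x / Math.pow(2, z)) * 360 - 180;
}

function tileToLat(y, z) {
  const n = Math.PI * (1 - (2 * y) / Math.pow(2, z));
  return (Math.atan(Math.sinh(n)) * 180) / Math.PI;
}

function tileBounds(x, y, z) {
  return {
    west: tileToLon(x, z),
    east: tileToLon(x + 1, z),
    north: tileToLat(y, z),
    south: tileToLat(y + 1, z),
  };
}

// The example tile from the text: (4400, 2686, 13) covers the area
// around the Brandenburger Tor in Berlin (roughly 13.36° E, 52.5° N).
const berlin = tileBounds(4400, 2686, 13);

// A request to the XML3DMapServer might then look as follows; the
// parameter names here are hypothetical:
//   GET /tile?x=4400&y=2686&z=13&layers=buildings,landuse
```

The server would query the PostGIS database for features intersecting this bounding box and construct the XML3D geometry for the requested layers.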
Depending on the requested layer and its predefined configuration, the stored point, line, and polygon vector data can then be interpreted as needed. For example, point data that is tagged as a tree results in an XML3D instance of a generic tree model, or of a specific tree if other attributes are available, transformed to this geographical position. Polygonal areas covering landuse become patches of terrain, and linestrings of road networks are converted to street geometry, where different road classes result in different lane widths. For buildings, the polygonal footprint is extruded to the actual height of the building if the OSM tag "building:height" is set, to the average storey height times the number of levels if the latter is available via "buildings:level," or otherwise to a reasonable building height assumed out of a predefined range. The amenity of a building, or POIs set within the area of a building footprint, e.g., a restaurant, influences the texturing of that building. For everything else, a randomly chosen texture is applied, but extending the rules for choosing specific textures is merely a matter of finding appropriate textures for each specific case.

Fig. 8 On the left side of the figure one can see a rendering of a virtual city at the position and direction that correspond to the red marker and arrow on the OpenStreetMap map on the right. The user can walk their avatar through the generated city as they would in real life. (Map data © OpenStreetMap contributors, CC-BY-SA)

5.2 Integration

As the XML3DMapServer application was developed independently from the interconnect prototype library, an integration step was necessary. An essential step was the introduction of a tile server that acts as an object host in the Sirikata architecture and serves as a communication point between the two systems.
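The building extrusion rule from Sect. 5.1 amounts to a small decision function. The tag names are quoted from the text, while the storey height of 3 m, the fallback range, and the function shape are assumptions.

```javascript
// Sketch of the building-height rule described in Sect. 5.1. The tag
// names "building:height" and "buildings:level" are taken from the
// text; the storey height and the fallback range are assumptions.
function buildingHeight(tags, rng = Math.random) {
  if (tags["building:height"] !== undefined) {
    return Number(tags["building:height"]); // explicit height wins
  }
  if (tags["buildings:level"] !== undefined) {
    const storeyHeight = 3.0; // assumed average storey height in meters
    return Number(tags["buildings:level"]) * storeyHeight;
  }
  // Otherwise assume a reasonable height out of a predefined range.
  const [min, max] = [6.0, 12.0];
  return min + rng() * (max - min);
}
```

Passing the random generator as a parameter keeps the fallback case reproducible, which is convenient when regenerating cached tiles.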
It was also necessary to define how the generated objects would be registered in the virtual world. As mentioned in the previous section, the world generated from the OpenStreetMap data is structured as layered tiles. While we could potentially break each layer into smaller objects, such as buildings, trees, or roads, we have instead decided to register each layer as a single object to avoid unnecessary overhead. This approach is still efficient, since all of the generated objects are static and are not modified by the application. Also, tiles are small enough to allow KataJS to efficiently load and unload them as the user moves through the world. Many objects generated by the XML3DMapServer have common properties, e.g., appearance, geometry, etc. This was taken into account during the design and development of the XML3DMapServer so that the shared data is efficiently reused by means of references in the XML3D document. In order to exploit this, the client was designed such that shared data is downloaded once and reused for every tile. Although this helps decrease network and memory load, it also prevents clients without this optimization from viewing the generated city. In order to resolve this issue, we have introduced a custom communication protocol that allows optimized clients to detect objects originating from the XML3DMapServer. While normal clients can ignore this protocol and simply use the full model, the optimized clients are able to communicate back to the XML3DMapServer and only download the unique data for each tile. As a result of the integration, we have created a system that allows a user to walk their avatar through the virtual city, as depicted in Fig. 8.

6 Discussion and future work

While improved, the speed of development of virtual worlds with our prototype library still suffers from the absence of high-level protocols in Sirikata.
Only generic messages between different objects are provided by the system, and most of the features needed by an application involve adding new message types, which are typically specific to the application and thus complicate the task of creating interoperable worlds. On the other hand, Sirikata is a flexible system for research, as one can build any protocol on top of it. As one of the next steps, we are planning to work on a high-level protocol that would provide useful building blocks, simplify application development, and allow for better interoperability. Additionally, the architecture of our interconnect prototype library could be improved by adding inherent support for sharing data between objects. This approach could also be extended to other applications beyond the XML3DMapServer. In order to promote the open nature of the system, we have created the Community Group for Declarative 3D at the World Wide Web Consortium (W3C), an international community that develops standards for the Web. In this group, we will work on the standardization of declarative 3D graphics in the Web, which is an essential step toward integrating virtual worlds into the Web.

7 Conclusion

In this paper, we have introduced a new open architecture combining well-established Web components with various emerging technologies, such as XML3D, Xflow, and Sirikata. The system greatly simplifies the development and design of Web-based virtual worlds. Moreover, the prototype library that we have created serves as a proof-of-concept application and offers an opportunity for the virtual world community by allowing non-experts to quickly and easily create their own virtual worlds. Overall, the modular and flexible design allows the creation of many different types of worlds, such as gaming, social, and collaborative environments. Such worlds would be readily accessible through the browser to all Web users.
We have also shown that crowd-sourcing can help bring real cities into a virtual environment from openly accessible sources, such as OpenStreetMap.

Acknowledgements We would like to thank the Embodied Agents Research Group, which has kindly provided the Amber model for the user's avatar. We would also like to thank the many contributors of OpenStreetMap, who have provided high-quality data enabling us to generate a 3D version of our world based on their data.

References

1. Behr, J., Eschler, P., Jung, Y., Zöllner, M.: X3DOM: A DOM-based HTML5/X3D integration model. In: Proceedings of the 14th International Conference on 3D Web Technology, Web3D '09, pp. 127–135. ACM Press, New York (2009)
2. Byelozyorov, S., Pegoraro, V., Slusallek, P.: An open modular architecture for effective integration of virtual worlds in the web. In: Gavrilova, M.L. (ed.) CyberWorlds, pp. 46–53. IEEE Press, New York (2011)
3. Geofabrik (2012). http://download.geofabrik.de/osm/
4. Google: O3D (2010). http://code.google.com/apis/o3d/
5. Khronos Group: OpenGL ES Common Profile Specification Version 2.0.25 (2010). http://www.khronos.org/registry/gles/specs/2.0/es_full_spec_2.0.25.pdf
6. Horn, D., Cheslack-Postava, E., Mistree, B.F.T., Azim, T., Terrace, J., Freedman, M.J., Levis, P.: To infinity and not beyond: scaling communication in virtual worlds with Meru. Tech. rep., Stanford University (2010)
7. Intel: Scalable Virtual Environments (2011). http://software.intel.com/en-us/articles/scalable-virtual-environments/
8. Khronos Group: WebGL Specification (2011). https://www.khronos.org/registry/webgl/specs/1.0/
9. Klein, F., Rubinstein, D., Byelozyorov, S., Sons, K., Slusallek, P.: Xflow—Declarative data processing for the Web (2012). To appear in Web3D
10. Linden Lab: Second Life (2012). http://secondlife.com/
11. Multiverse (2012). http://www.multiverse.net
12. Neis, P., Zielstra, D., Zipf, A.: The street network evolution of crowdsourced maps: OpenStreetMap in Germany 2007–2011.
Future Internet 4(1), 1–21 (2011). doi:10.3390/fi4010001. http://www.mdpi.com/1999-5903/4/1/1/
13. Open Cobalt (2012). http://www.opencobalt.org/
14. Open Simulator (2012). http://opensimulator.org
15. OpenStreetMap (2012). http://www.openstreetmap.org/
16. OpenStreetMap 3D (2012). http://www.osm-3d.de
17. OpenStreetMap 3D Development (2012). http://wiki.openstreetmap.org/wiki/3D_Development
18. General OpenStreetMap Statistics (2012). http://wiki.openstreetmap.org/wiki/Statistics
19. Statistics on tag use in OpenStreetMap (2012). http://taginfo.openstreetmap.org/
20. Osmosis (2012). http://wiki.openstreetmap.org/wiki/Osmosis
21. Red Dwarf (2012). http://www.reddwarfserver.org/
22. Sons, K., Klein, F., Rubinstein, D., Byelozyorov, S., Slusallek, P.: XML3D: Interactive 3D graphics for the web. In: Proceedings of the 15th International Conference on Web 3D Technology, pp. 175–184 (2010)
23. Sons, K., Slusallek, P.: XML3D physics: Declarative physics simulation for the Web. In: Workshop on Virtual Reality Interaction and Physical Simulation (VRIPHYS). Eurographics Association (2011)
24. W3C: VRML97 and Related Specifications (1997). http://www.web3d.org/x3d/specifications/vrml/
25. W3C: CSS 3D Transforms Module Level 3 (2009). http://www.w3.org/TR/css3-3d-transforms/
26. W3C: Document Object Model (2011). http://www.w3.org/DOM/
27. W3C: HTML5 Draft Specification (2011). http://dev.w3.org/html5/spec/
28. W3C: Hypertext Transfer Protocol (2011). http://www.w3.org/Protocols/rfc2616/rfc2616.html
29. W3C: Scalable Vector Graphics. Tech. rep., W3C (2011)
30. W3C: The WebSocket API (2011). http://dev.w3.org/html5/websockets/
31. W3C: XMLHttpRequest. Tech. rep., W3C (2011). http://www.w3.org/TR/XMLHttpRequest/
32. Web3D Consortium: ISO/IEC 19775:200x—Extensible 3D (X3D) (2008). http://www.web3d.org/x3d/specifications/
33. WHATWG: Web Workers (2011).
http://whatwg.org/ww

Sergiy Byelozyorov was born in 1987 in Ukraine. He received his Bachelor's degree in Computer Science with honors from the National Technical University of Ukraine Kyiv Polytechnic Institute and his Master's degree in Computer Science with honors from Saarland University, Germany, in 2010. He is currently pursuing a Ph.D. in Computer Science at the Computer Graphics Chair. His research interests are in computer graphics, virtual environments, and web technologies.

Rainer Jochem finished his studies of Computer Science at Saarland University in 2008 and has since been working for the Agents and Simulated Reality Lab at DFKI. His work focuses on geoinformation systems and related applications, with particular interest in the automated processing of geodata, the use of Web 2.0 technologies, and interactive visualizations.

Vincent Pegoraro is a postdoctoral researcher in the Computer Graphics Laboratory (CGUdS) within the Department of Computer Science at Saarland University, Germany. As part of the Multimodal Computing and Interaction (M2CI) Cluster of Excellence, his research interests broadly include predictive rendering, efficient global illumination, photorealistic image synthesis, computer graphics, and scientific visualization. He received a Ph.D. degree in Computer Science from the University of Utah, USA, in 2009. As a research assistant at the Scientific Computing and Imaging (SCI) Institute, his dissertation focused on efficiently simulating physically based light transport in participating media within the framework of the Center for the Simulation of Accidental Fires and Explosions (C-SAFE).

Philipp Slusallek is a Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he has headed the research area Agents and Simulated Reality since 2008.
He is also Director for Research at the Intel Visual Computing Institute, a central research institute of Saarland University that he co-founded in 2009. At Saarland University, he has been a professor of Computer Graphics since 1999 and a Principal Investigator at the German Excellence Cluster on "Multimodal Computing and Interaction" since 2007. Before coming to Saarland University, he was a Visiting Assistant Professor at Stanford University, USA. He studied physics in Frankfurt and Tübingen (Diploma/M.Sc.) and received a Ph.D. in Computer Science from Erlangen University. His research interests focus on 3D Internet technology and applications, integrating graphics and artificial intelligence, immersive collaborative environments, distributed simulation and visualization, as well as high-performance graphics and computation.