3D Realistic Animation of a Tennis Player

Mustafa Kasap1, Matthias Holländer2, Anil Aksay3, Philip Kelly4, David Monaghan4, Ciarán Ó Conaire4, Nadia Magnenat-Thalmann1, Tamy Boubekeur2, Ebroul Izquierdo3 and Noel E. O'Connor4

1 MIRALab, University of Geneva (UNIGE), Switzerland
2 Computer Graphics Group, Telecom ParisTech, France
3 Multimedia and Vision Group (MMV), Queen Mary University of London, UK
4 CLARITY: Centre for Sensor Web Technologies, Dublin City University, Ireland

[email protected], [email protected], [email protected], [email protected]

Abstract. In this paper we present the progress of ongoing collaborative work on the realistic 3D animation of a tennis player within the 3DLife project. The main focus of this work is producing a realistic 3D virtual avatar of an athlete, capturing realistic motion data of the athlete via a pre-recorded motion database, streaming the motion over a network, and displaying the virtual human model of the athlete in a fully interactive virtual environment. While we focus on tennis in this paper, we believe our methods can be generalised to a wide range of other activities, both in sports and elsewhere.

Keywords: Body Scanning, Scalable Rendering, Motion Capture, Inertial Measurement Units, Immersive Environment

1 Introduction

3DLife [1] is a European Union funded project that aims to integrate research conducted within Europe in the field of the Media Internet. The next generation of Media Computing services will become a cornerstone of the Information Society, having a significant impact not only on the entertainment industry, but also on the way society delivers key services such as health care, learning and commerce, irrespective of geographical location or access device. As such, 3DLife is devoted to developing and integrating enabling technologies that make interaction between humans in on-line virtual environments easier, more reliable and more realistic.
Achieving this goal requires integrating recent progress in 3D data acquisition and processing, autonomous avatars, real-time rendering, interaction in virtual worlds and networking. In this work, we present a framework, built in the first six months of the 3DLife network of excellence, that integrates some of the enabling technologies required for this strategic vision. In this paper, we combine a number of technologies from 3DLife consortium members into a system framework that can simulate the realistic movements of a real-world human body in an on-line virtual environment. The real-world motion of a human body is captured and rendered as a realistic avatar in an immersive virtual environment. As a real-world scenario, the simulation of a tennis player in motion is demonstrated. Several advanced tools and computer graphics methods are used to achieve visual realism.

The production pipeline of this work consists of the following stages. Firstly, a human model is scanned and post-processed to generate a high quality mesh with texture. Secondly, these models are processed for scalable rendering. Thirdly, the resulting high quality mesh is mapped to an animation library. Finally, the resulting model is streamed over a network and rendered in an interactive environment for visual demonstration. By simulating a realistic virtual animation of a tennis player and displaying it in the context of a compelling interactive virtual environment, we believe that we can provide a level of realism and immersion not previously experienced.

Section 2 outlines our proposed system framework. Section 3 gives an overview of the digital creation of the realistic avatars, including surface reconstruction and character skinning. Section 4 describes the rendering process.
The procedure for acquiring accurate and realistic motion animation data is explained in Section 5, and networking is outlined in Section 6. An example screenshot of the system is presented in Section 7, and conclusions are given in Section 8.

2 System Overview

Figure 1 illustrates an overview of the different stages of the proposed system framework.

Fig. 1. Overall system outline.

The first stage involves generating a realistic virtual human body model of the real-world tennis player. In this work, this is achieved using a full body human laser scanner to obtain highly detailed 3D scans of the human subject. These scans are then post-processed in order to map a skeletal structure, skin, facial textures and virtual clothing to the human model, as detailed in Section 3. As the system incorporates a scalable rendering framework (see Section 4), the model is also processed for scalable, real-time rendering.

The second phase of the proposed system framework allows realistic movements from real-world human subjects to be conferred on virtual human models. Motion is incorporated into the model animation by employing human motion reconstruction techniques that use a number of wireless wearable accelerometers and a motion database. This motion, when transferred to the skeletal structure of the model, provides realistic human-like movements of the virtual human body model (see stage three in Figure 1).

The fourth stage of the system framework securely transfers the acquired motion data over a network. This data can then be streamed to a client and loaded onto a previously downloaded model, thus allowing real-time motion to be transmitted and reconstructed via the virtual human model at a different location. The client consists of an interactive rendering environment, such as a virtual tennis court, that allows either the streaming of pre-recorded animation
files or real-time animation motion data via a network connection – see the final stage depicted in Figure 1.

3 Human body models

3D human body models are core elements of computer graphics environments such as games, virtual modelling tools and virtual reality applications. Recent advances in computer graphics techniques and hardware have made it possible to use more realistic models in these virtual environments, including, but not restricted to, muscle and fat tissue deformation effects during animation. The primary characters in the majority of these virtual environments are human body models, and they require specific techniques for real-time animation and realism. Humans spend enormous amounts of time and brain-power visually inspecting facial and hand movements, primarily for communication, and this has made us very astute at detecting artificial humans. Because of this, specialised research areas have focused on face, hand, skin and muscle modelling and on skeleton attachment, in order to generate realistic models that improve the quality and realism of these environments. The human body models to which all these techniques are applied are either designed by 3D model designers or acquired by 3D body scanners such as those from Human Solutions [19].

3.1 Surface Reconstruction

Raw data generated from a full body human laser scanner can consist of a million point coordinates captured from the subject's surface. The resulting raw data requires post-processing to generate a mesh from the point cloud, see Figure 2 [9].

Fig. 2. Scanner generated point cloud to 3D textured model.

This post-processing operation consists of several stages such as outlier removal, noise reduction, simplification, normal estimation and triangulation [14]. In addition, the final mesh is textured with images of the scanned subject captured from multiple angles.
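The outlier-removal and normal-estimation stages above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the project's implementation: the brute-force neighbour search, the `k` and `factor` parameters and the PCA plane fit are our own simplifying assumptions.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k-nearest neighbours of every point (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def remove_outliers(points, k=8, factor=1.5):
    """Drop points whose mean k-NN distance exceeds factor * global average."""
    nn = knn_indices(points, k)
    mean_d = np.linalg.norm(points[nn] - points[:, None, :], axis=-1).mean(axis=1)
    return points[mean_d <= factor * mean_d.mean()]

def estimate_normal(points, nn_of_p):
    """Fit a plane to a point's neighbours via PCA; the plane normal is the
    singular vector associated with the smallest singular value."""
    nbrs = points[nn_of_p]
    centred = nbrs - nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]  # unit normal of the fitted plane
```

The same fitted plane serves both purposes described above: projecting a vertex onto it smooths the noise, and its normal is assigned to the vertex as the estimated surface normal.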
Almost all of the post-processing stages use the k-Nearest Neighbours (k-NN) of each point in the input data set. To remove outliers, vertices whose distance to their k-NN is above the average are eliminated. To reduce noise, the following operation is applied to each vertex of the input set: a plane is fitted to the k-NN of the vertex under consideration, and the vertex is then projected onto this plane, yielding a smoother surface along its neighbourhood. Similarly, the normal direction of each vertex in the input set is estimated by assigning the fitted plane's normal vector to the vertex under consideration. To reconstruct a surface from the captured points, the post-processed input point set with estimated normals is used to compute an implicit function; the iso-surface of this function is then extracted to reconstruct the scanned surface [17].

In addition to these stages, the reconstructed model surface is textured to increase visual realism. For the texture mapping, several images of the subject are captured from different view angles. These images are then mapped onto the 3D model using a ray-casting approach together with the colour camera coordinates, see Figure 3.

Fig. 3. Rays are cast from colour cameras to construct the texture map.

3.2 Character Skinning

Visually realistic character simulation depends not only on a realistic 3D model but also on the model being animated with life-like motion. Adding motion to a static 3D model requires additional information about the model's skeleton and skin structure, see Figure 4. This information is generated by attaching a virtual skeleton to the model and skinning it. Skinning is the process of assigning deformation weights to model vertices [15]. These weights define the deformation relation between each vertex and the skeleton: when the underlying virtual skeleton moves, the corresponding vertices follow the skeleton in proportion to their pre-defined skin weights. To animate the skeleton, motion capture devices are mainly used.

Fig. 4. Character skinning. From left-to-right: (a) Static model; (b) Virtual skeleton attached; (c) Skinned (left thigh weights visualised); (d-e) Animated.

4 Scalable rendering

Scalable rendering means adapting the rendering process on the fly to the capabilities of the client's machine. In many cases, this means using a high resolution mesh for high-end computers and low resolution meshes for low-cost computers. Usually this involves up-sampling a low resolution mesh for high-end computers or down-sampling a high resolution mesh for low-end machines. Today's graphics processors are capable of performing the former on-the-fly by means of real-time tessellation. As a consequence, animation methods (e.g. skinning) can be applied to the low resolution mesh before up-sampling it to a higher resolution for high quality display. This drastically reduces all non-rendering-related computations (e.g. animation, physics interaction) compared with slow animation on a high resolution mesh. Moreover, it lowers the memory footprint, as only a low resolution mesh has to be kept in memory, and minimises data transfer between main memory and the graphics card.

In this work, a specific piece of software has been designed to support this stage of the system framework. The program is able to load the virtual human model obtained in Section 3 and display it on a wide range of graphics hardware. To achieve this, we maintain a simplified version of the scan in main memory and use the ability of modern graphics cards to process and up-sample geometry quickly and efficiently, thus improving the geometric and visual quality of real-time rendering.
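The skinning step that is applied to the coarse mesh before up-sampling can be sketched as standard linear blend skinning, in which each deformed vertex is the weight-blended sum of its rest position transformed by each bone's matrix. The sketch below is our own minimal illustration (shapes and names are assumptions), not the project's code:

```python
import numpy as np

def skin_vertices(rest_verts, weights, bone_mats):
    """Linear blend skinning.
    rest_verts: (V, 3) rest positions; weights: (V, B), rows sum to 1;
    bone_mats: (B, 4, 4) current bone transforms (bind pose already folded in)."""
    V = rest_verts.shape[0]
    homo = np.hstack([rest_verts, np.ones((V, 1))])       # (V, 4) homogeneous
    per_bone = np.einsum('bij,vj->vbi', bone_mats, homo)  # each bone's transform
    blended = np.einsum('vb,vbi->vi', weights, per_bone)  # blend by skin weights
    return blended[:, :3]
```

A vertex weighted half-and-half between a static bone and a translated bone moves exactly half the translation, which is the "follow the skeleton in proportion to the skin weight" behaviour described above.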
Real-time tessellation methods offer the ability to up-sample 3D surface meshes on-the-fly during rendering. This up-sampling relies on three major components: (a) a tessellation kernel, which can be implemented on the GPU (graphics processing unit) or is available as a hardware unit; (b) the surface model, which defines the positions of the newly inserted vertices – we focus on recent visually smooth models; and (c) the adaptability, which tailors the spatially varying distribution of newly inserted samples on the input surface. In this work we focus on the last component and introduce a new view-dependent adaptability metric that builds upon both intrinsic and extrinsic criteria. As a result, we obtain better vertex distributions around important features in the tessellated mesh.

We start by simplifying the raw input body scan mesh using a mesh decimation approach. We recursively apply an edge collapse operator [8] driven by a combination of the seminal Quadric Error Metric [7] and a normal-based metric derived from the variational shape approximation technique [6], which is more suitable for preserving sharp features. A single iteration of this pre-processing stage is performed, and the simplified mesh can be set to a resolution suitable for the target platform (on the order of 3,000 triangles for a standard GPU). At run time, we use the cheap, visually smooth Phong Tessellation operator [2], combined with an adaptive tessellation kernel based on refinement patterns and hardware instantiation [3,4], to perform the aforementioned up-sampling of the mesh. The latter procedure can be replaced seamlessly by recent hardware tessellation units. The tessellation process, shown in Figure 5, is performed entirely on the GPU and has almost no prerequisites.

Fig. 5. From a simplified scanned model, real-time tessellation.

Furthermore, it can reach high frame-rates while adapting the level of detail of the character to its screen-space occupancy – as such, it can balance the rendering workload dynamically. Note that right before up-sampling we apply skinning to all vertices of the coarse mesh. This process can be implemented on the GPU, but at such a low resolution the CPU is not a bottleneck here.

5 Motion data

In order to realistically simulate an activity in a virtual environment, the 3D movements of each person participating in the activity must be faithfully and efficiently captured. With accurate 3D motion information, each character in the environment can be animated in a realistic manner, resulting not only in naturalistic character movement but also in an increased ability for characters to interact efficiently with the environment surrounding them.

The traditional gold standard technique for motion capture is an expensive multi-camera optical system, such as the Vicon [21] or Coda [5] systems. In either system, markers (which may be active [21] or passive [5]) are fixed to the human body in a number of predefined locations. These markers are tracked in 3D by a number of pre-calibrated cameras. At any given time instant, if the position of each marker on the human body is known, a pre-defined 3D human skeleton can be fitted to the measured marker positions. Over a time period, these individual body poses are combined to obtain an animation of the captured 3D movement of the human character. The advantages of such systems are their high degree of accuracy and high frame-rate, allowing precise and rapid human motion to be captured. As such, these techniques are regularly employed by sports science professionals to analyse movement and maximise athlete training and performance, by animators and video game designers to make their animations more realistic, and by the film industry for naturalistic motion in special effects.
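For a single rigid body segment, the per-frame skeleton fitting described above reduces to a least-squares rotation-plus-translation alignment of a marker template to the measured marker positions. The sketch below uses the classic Kabsch/SVD solution as an illustration; it is our own example, not necessarily the solver used by the Vicon or Coda software.

```python
import numpy as np

def rigid_fit(template, markers):
    """Least-squares rigid transform (R, t) mapping a segment's marker
    template (N, 3) onto measured marker positions (N, 3), via Kabsch."""
    ct, cm = template.mean(axis=0), markers.mean(axis=0)
    H = (template - ct).T @ (markers - cm)   # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cm - R @ ct
    return R, t
```

Solving this for every segment at every frame, subject to the skeleton's joint constraints, yields the sequence of body poses that the optical systems output.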
Although accurate, optical motion capture systems are cumbersome for capturing some activities – especially sporting scenarios. Motion capture rigs are typically expensive, difficult to set up, and confine motion to a pre-calibrated area that tends to be small unless a large number of cameras is available – a difficulty for athletes in particular, who often move very quickly through large volumes of space during a match. Increasing the capture area to even a relatively small sporting field of play, such as a tennis court, would require a very expensive installation – unobtainable for amateur athletes or home enthusiasts. In addition, and perhaps most importantly for real-time human-computer interaction in a virtual environment, few motion capture systems provide real-time motion of a character's skeleton – rather, once the data is captured, a semi-automatic clean-up process requiring human interaction and intervention is needed to process and manipulate the captured data. This clean-up process ensures that the reconstructed motion is correct at times when the tracking of markers may be lost due to rapid movement or occlusion.

In this work, we would like the system to be relatively cheap, near real-time and unobtrusive to wear. In order to obtain viable biomechanical information on player movement, the reconstructed motion should also be as accurate as possible when compared to the real movements of the player. For sports with a high level of motion, such as tennis, golf and baseball, the most important motions to be captured are usually high velocity motions that take place in a very small time interval; as such, the proposed technique should have a high frame-rate in order to capture these rapid motions accurately. Recent work on the use of cheap inertial sensors to extract real-time body motion data provides an insight into possible techniques for achieving this aim.
In our approach, the motion of an actor (or athlete) is captured using a small number of ±10G tri-axial accelerometers placed on the body. These accelerometers are lightweight, relatively cheap, easy to set up and unobtrusive for the athlete to wear during the vast majority of motions. Data from each accelerometer is sampled at 100Hz, time-stamped and transmitted via a serial wireless link to a data-collecting base station. In our experiments, one accelerometer was placed on each forearm and shin, one was positioned on the chest, and one was located on the lower back. These positions were strategically chosen to obtain input data from each limb, the root position and the torso. However, the locations used depend on where on the body accurate motion reconstruction is required. In cases where no inertial measurement units are used on the limbs, the system will recreate plausible movements based solely on the accelerometer input from other areas of the body.

Inertial wireless accelerometers are small and lightweight enough not to interfere with an athlete's performance when worn in clothing. An example of this technology is embedded in the Nike+iPod Sports Kit [16], a device which measures and records the distance and pace of a walk or run. It consists of a small wireless accelerometer, attached to or embedded in a shoe, that detects when a foot is in contact with the ground; this information is transmitted and combined with timing information to determine statistics such as the elapsed time of the workout, the distance travelled, the pace, or the calories burned by the wearer. Although not accurate enough for sports scientists (accurate to plus or minus 5% [13]), it is perceived to be accurate enough for the needs of an average user. The accelerometers in this sports kit serve solely as a data-gathering technique, supplying personal metrics on human performance in sports.
It does not support real-time interaction between the user and a virtual environment, nor does it reconstruct the pose of the person using it. The accelerometers in the Nintendo Wii Remote [24] can supply this information at a coarse degree of accuracy. Gaming with the Wii Remote can be regarded as a performance animation experience [18] – the accelerometers allow a player to interact with virtual objects and make coarse movements to perform actions required in the game. Similar applications using the built-in accelerometers of smart phones (such as the iPhone) to control movement in games have also been created. These technologies, however, only acquire movement or position information for a small part of the body.

Techniques for full body human motion capture have also been developed. When held in a static position, a tri-axial accelerometer outputs a gravity vector that points towards the earth. This alone is enough to determine the sensor's pitch and roll. Techniques such as [12,20] use this information, plus the assumption of low-acceleration motion, to obtain temporal poses from a previously determined body position, and use the double integration of accelerations to obtain movement between consecutive time segments. XSens [25] extends this approach by combining accelerometers and gyroscopes in a single inertial measurement unit (IMU). These techniques require no external cameras, emitters or markers for relative motions; in addition, they are highly portable and suitable for large capture areas. However, they are prone to positional drift, which can compound over time, and they do not handle high velocity motion sequences well. To prevent this drift, acoustic-inertial trackers have also been proposed [22], which use an acoustic subsystem to measure distances between markers; however, the suit itself could be cumbersome for some activities, such as high energy sports.
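The pitch-and-roll observation above can be written down directly from a static gravity reading. The sketch below is our own illustration; the axis convention (x forward, y left, z up, readings in units of g) and the assumption of zero linear acceleration are ours:

```python
import math

def pitch_roll_from_gravity(ax, ay, az):
    """Estimate pitch and roll (radians) of a static tri-axial accelerometer
    from its gravity reading (ax, ay, az). Yaw is unobservable from gravity
    alone, which is why [12,20] need extra assumptions and integration."""
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return pitch, roll
```

A level sensor (reading gravity purely along z) yields zero pitch and roll, while a reading purely along the negative x axis corresponds to a 90 degree pitch.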
In this work, we adopt the approach of [18] to reconstruct human motion using a number of wireless accelerometers. In addition, we extend their work to reconstruct the motion of the full body (and not just the top half). We also improve the approach of [18] at transition points. Blending at transitions (i.e. between two or more distinct movements in the database), while making the animation look smooth and naturalistic, may introduce artefacts into the animation that did not exist in the original motions. The system of [18] avoids the problem of temporal drift by not integrating accelerations; instead it determines the motion of a character by comparing the input accelerometer values against those stored in a pre-captured motion database. The section of the database whose accelerometer readings are closest to the input from the real-world character is chosen to represent that character's movement.

The adopted approach can be broken down into three main stages: (1) before capturing an athlete, a database of motion data is built up in the lab, consisting of motions specific to the real-world movements expected at run-time (in [18], this consisted of a set of upper body actions including stretching, jumping jacks and mock tennis swings); (2) a calibration stage, whereby the accelerometers in the database are aligned with those of the real-world player; and (3) a search stage, whereby segments of the motion capture database with accelerations similar to those performed by the real-world player are retrieved and provided as the output of the system. In order to produce smooth motion, blending of motions, or "jumps" between locations in the database, is performed. In the work of [18], only the motion of the upper half of the character was obtained – the motion of the root and legs was not reconstructed.
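The search stage can be illustrated with a naive sliding-window match over the database. This is only a conceptual sketch of our own; [18] uses a more elaborate search, and the window length and cost function here are assumptions:

```python
import numpy as np

def best_database_segment(db, query):
    """Slide the query window of accelerometer frames over the database and
    return the start index of the closest segment (sum of squared differences).
    db: (N, C) frames x channels; query: (W, C) with W <= N."""
    W = len(query)
    costs = [np.sum((db[s:s + W] - query) ** 2)
             for s in range(len(db) - W + 1)]
    return int(np.argmin(costs))
```

The skeletal poses stored alongside the winning database segment are then played back (with blending at transitions) as the reconstructed motion.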
A common unwanted artefact is footskate [10], a distracting error in which the character's feet move when they ought to remain planted on the ground. We remove footskate using a technique similar to the inverse kinematics based approach proposed by [11].

6 Networking Implementation

In order to accommodate users with varying hardware and networking capabilities, virtual human models are converted into several resolutions. At the beginning of a session, the client downloads a player avatar at the appropriate resolution, chosen – in a manner similar to SDP (Session Description Protocol) negotiation – according to user preference or network bandwidth. Virtual human models are stored in the COLLADA file format, with data sizes between 2 and 30 MBytes. To ensure error-free transmission, virtual human models are transmitted before the session starts using the TCP/IP protocol. To accommodate real-time animation, motion data is transmitted using the UDP protocol with basic error detection capabilities such as in-packet CRCs (cyclic redundancy checks). After validation of the packets, the animation motion data is sent to the animation module for rendering.

7 Working Prototype

In this work, three human avatars were created, using over 50 body scans in the process. For the motion capture module, a pre-captured database of sports-specific motion data must be obtained. This database can be captured in short segments in the lab, but the preferred approach is to capture it in a sports arena during a competitive match. When a player is asked to move around a lab providing sample shots, the player performs without a specific goal, whereas in a competitive scenario there are multiple goals: the player must play the most tactically relevant shot, get the ball over the net, keep it in play, and move around the court to get into position for the next shot.
We captured 5 minutes and 12 seconds of data from a high performance tennis athlete on court at 120Hz, using a 12-camera Vicon motion capture system. During the capture session, the player was asked to play a variety of tennis shots – serves, forehands and backhands – into specific areas of the court, with an opposing player feeding balls for the non-serve shots. He was asked to perform typical tennis movements around the capture area, hitting shots at specific points, to provide us with realistic player motion dynamics in a competitive scenario. However, the entire court could not be covered by the motion capture system. Rather, a smaller area around the court baseline was used, corresponding to the region where players spend most of their time, and the player was asked to stay within these boundaries. At the end of the capture session, two players participated in a competitive match to win individual points using standard tennis rules.

An evaluation of the proposed technique revealed a mean joint angle error of between 10 and 20 degrees over 16 tennis motion sequences consisting of 6 different types of tennis movement (Serve, Forehand, Backhand, Motion, Move and Hit, Standard Games). For all these evaluations, a remove-and-test evaluation approach was adopted. In general, if a motion was performed in a similar manner to a motion in the database, good reconstruction was obtained. However, poor reconstruction resulted when a player performed significantly novel motions.

Figure 6 shows the virtual tennis environment created in this work, with realistically animated human models and accurate, life-like motion inferred on the models. The 3D movements of each human model in the environment can be viewed from any angle and replayed or slowed down by the user. In this figure, the rendered tennis players are located in a virtual tennis court environment obtained from Google 3D Warehouse [23].
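For reference, a mean joint angle error of the kind reported above can be computed as follows. This is our own sketch; the per-frame averaging and the angle-wrapping convention are assumptions, as the paper does not specify them:

```python
import numpy as np

def mean_joint_angle_error(pred, truth):
    """Mean absolute per-joint angle error in degrees over a sequence.
    pred, truth: (frames, joints) arrays of joint angles in degrees.
    Differences are wrapped into [-180, 180) so 359 vs 1 counts as 2 degrees."""
    d = (pred - truth + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(d)))
```

In a remove-and-test evaluation, each of the 16 sequences would be withheld from the database in turn, reconstructed from its accelerometer readings, and scored against the Vicon ground truth with a metric like this.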
8 Conclusions and Future Work

In this paper, we presented the progress of ongoing collaborative work on the realistic 3D animation of a tennis player within the 3DLife project. The main focus of this work is producing a realistic 3D virtual avatar of an athlete, capturing realistic motion data of the athlete via a pre-recorded motion database, streaming the motion over a network, and displaying the virtual human model of the athlete in a fully interactive virtual environment.

Fig. 6. A working prototype of the virtual tennis environment with animated human models.

With accurate 3D motion information, obtained via motion capture and a motion database, each character in the environment can be animated in a realistic manner, resulting not only in naturalistic character movement but also in an increased ability for characters to interact efficiently with the environment surrounding them. While we have focused on tennis in this paper, we believe our methods can be generalised to a wide range of other activities, both in sports and elsewhere. Future work includes synchronising audio data transmission over the network, efficiently tracking the tennis ball and integrating this data into the virtual environment, full real-time tennis game simulation, and increasing the level of detail in the simulations.

Acknowledgements

This research was partially supported by the European Commission under contract FP7-247688 3DLife. This work is supported by Science Foundation Ireland under grant 07/CE/I1147.

References

1. 3DLife: http://www.3dlife-noe.eu (2010)
2. Boubekeur, T., Alexa, M.: Phong tessellation. In: ACM Transactions on Graphics (2008)
3. Boubekeur, T., Schlick, C.: Generic mesh refinement on GPU. In: SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware. pp. 99–104 (2005)
4. Boubekeur, T., Schlick, C.: A flexible kernel for adaptive mesh refinement on GPU. In: Computer Graphics Forum. pp.
102–113 (2007)
5. Coda: http://www.codamotion.com (2010)
6. Cohen-Steiner, D., Alliez, P., Desbrun, M.: Variational shape approximation. In: ACM Transactions on Graphics. pp. 905–914 (2004)
7. Garland, M., Heckbert, P.S.: Simplifying surfaces with color and texture using quadric error metrics. In: SIGGRAPH. pp. 263–269 (1998)
8. Hoppe, H.: Progressive meshes. In: SIGGRAPH / ACM Special Interest Group on Computer Graphics and Interactive Techniques. pp. 99–108 (1996)
9. Hoppe, H., DeRose, T., Duchamp, T., McDonald, J., Stuetzle, W.: Surface reconstruction from unorganized points. In: SIGGRAPH / ACM Special Interest Group on Computer Graphics and Interactive Techniques. pp. 71–77 (1992)
10. Ikemoto, L., Arikan, O., Forsyth, D.: Knowing when to put your foot down. In: I3D '06: Proceedings of the 2006 Symposium on Interactive 3D Graphics and Games. pp. 49–53. ACM, New York, NY, USA (2006)
11. Kovar, L., Schreiner, J., Gleicher, M.: Footskate cleanup for motion capture editing. In: SCA '02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. pp. 97–104. ACM, New York, NY, USA (2002)
12. Lee, J., Ha, I.: Real-time motion capture for a human body using accelerometers. Robotica 19(6), 601–610 (2001)
13. McClusky, M.: The Nike experiment: How the shoe giant unleashed the power of personal metrics. Med-Tech: Health (Wired) 17(7) (2009)
14. Kazhdan, M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. In: SIGGRAPH / ACM Symp. on Geometry Processing. pp. 61–70 (2006)
15. Magnenat-Thalmann, N., Laperrière, R., Thalmann, D.: Joint-dependent local deformations for hand animation and object grasping. In: Canadian Information Processing Society. pp. 26–33 (1989)
16. Nike+iPod: http://www.apple.com/ipod/nike/ (2010)
17. Rineau, L., Yvinec, M.: A generic software design for Delaunay refinement meshing. In: Elsevier Science Publishers B.V. pp. 100–110 (2007)
18. Slyper, R., Hodgins, J.: Action capture with accelerometers. In: 2008 ACM SIGGRAPH / Eurographics Symposium on Computer Animation (Jul 2008)
19. Human Solutions: http://www.human-solutions.com (2010)
20. Tiesel, J., Loviscach, J.: A mobile low-cost motion capture system based on accelerometers. pp. II: 437–446 (2006)
21. Vicon: http://www.vicon.com (2010)
22. Vlasic, D., Adelsberger, R., Vannucci, G., Barnwell, J., Gross, M., Matusik, W., Popović, J.: Practical motion capture in everyday surroundings. ACM Trans. Graph. 26(3), 35 (2007)
23. Google 3D Warehouse: http://sketchup.google.com/3dwarehouse/ (2010)
24. Nintendo Wii: http://www.nintendo.com/wii (2010)
25. XSens: http://www.xsens.com (2010)