REAL-TIME NOVEL RENDERING ARCHITECTURE FOR 3D DISPLAY
Wei-Hao Huang, Wei-Jia Huang, Kai-Che Liu and Ludovic Angot
Electronics and Optoelectronics Research Laboratories,
Industrial Technology Research Institute, Taiwan, R.O.C
{huangweihao, weijiahuang, lkcjack, LudoAngot}@itri.org.tw
Abstract - This paper proposes a novel rendering architecture based on GPU massive multithreading for generating multi-view content for 3D displays. With 2D-plus-depth images as input, the system supports stereoscopic as well as autostereoscopic displays. In order to achieve real-time, high-quality depth image based rendering (DIBR), anti-aliasing pixel rendering, unoccupied pixel inpainting and multi-view interlacing are proposed and optimized for a parallel architecture. In addition, GPU-based resizing is developed using the CUDA texture interpolation function. The anti-aliasing pixel rendering algorithm, which considers the occluding effect simultaneously, solves the depth aliasing and disparity quantization problems by using floating-point based mapping to check the occupying proportion. Verification on an NVIDIA GeForce GTX 280 GPU demonstrates the real-time, high-quality performance of the proposed novel rendering architecture.
Keywords: 3D display, anti-aliasing, depth image based rendering, GPU, CUDA
I. INTRODUCTION
Recently, the movie “Avatar” has successfully pushed the entertainment industry toward 3D. 3D has definitely become mainstream and amazes everyone nowadays. The related display technologies range from anaglyph glasses (also called red-blue glasses) in the beginning to shutter glasses and polarized glasses. More recently, autostereoscopic displays have been introduced for viewing 3D without the need to wear any particular glasses. Besides the display hardware, 3D content is required to support both stereoscopic and autostereoscopic displays. As shown in Fig. 1, the connection between 3D content and the 3D display is the rendering that synthesizes the required display format. In this paper, a 2D image with its depth map represents the input 3D content format. On the other hand, the display formats vary from stereoscopic to autostereoscopic: 2-view or multiple (N)-view images are required and interlaced into the output format to show 3D.
Berretty et al. [1] presented a procedure for rendering 3D on a lenticular autostereoscopic display from 2D plus depth; however, the synthesized image quality is degraded by aliasing. Moreover, Cheng et al. [2] proposed an algorithm to recover large disocclusion regions in order to maintain the synthesis quality, but it requires much more computation time. Therefore, in this paper, a novel rendering architecture is proposed to achieve real-time, high-quality depth image based rendering (DIBR). The novel DIBR framework consists of anti-aliasing pixel rendering, unoccupied pixel inpainting and multi-view interlacing, and solves the disparity aliasing by considering the occupying proportion during disparity mapping. An optimization for the massively parallel architecture is developed so that the proposed algorithm achieves real-time floating-point operation on the NVIDIA GPU and CUDA architecture. Meanwhile, GPU-based resizing is also designed by using the GPU CUDA interpolation function to adjust the input and output resolutions in real time.

Figure 1. The rendering framework and the system

Figure 2. The proposed novel rendering architecture
The paper is organized as follows. Section I gives a brief introduction to the rendering problem. The framework of the rendering is addressed in section II. The overall architecture of the NVIDIA GPU and the proposed parallel multithreading design are presented in section III, including the architecture design of the anti-aliasing pixel rendering and the unoccupied pixel inpainting. The implementation on the NVIDIA GPU is shown in section IV. Finally, we conclude the paper.
II. NOVEL RENDERING FRAMEWORK
To achieve high-quality rendering with anti-aliasing, the framework of the proposed algorithm is shown in Fig. 2. First of all, the architecture adjusts the input resolution to fit the display output requirements by resizing the input 2D image and depth map using CUDA texture memory. The “cudaFilterModeLinear” mode is used so that the value returned when fetching the texture is interpolated from the input texture coordinates. After resizing the input, the rendering framework is performed in three steps: anti-aliasing pixel rendering, unoccupied pixel inpainting and multi-view interlacing, the latter fitting the display format. The following describes the algorithms of steps 1 and 2 in detail.
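As an illustration of this resizing step, the following minimal CUDA sketch binds the input to a 2D texture with “cudaFilterModeLinear” so that the hardware performs the bilinear interpolation during the fetch. The kernel and helper names, the uchar4 pixel format, the launch configuration and the use of the era-appropriate texture reference API are our own assumptions, not details taken from the paper.

// Minimal sketch of GPU resizing through CUDA texture filtering (assumed uchar4 pixels).
#include <cuda_runtime.h>

texture<uchar4, cudaTextureType2D, cudaReadModeNormalizedFloat> texSrc;

__global__ void resizeKernel(uchar4* dst, int dstW, int dstH, float scaleX, float scaleY)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstW || y >= dstH) return;

    // cudaFilterModeLinear makes tex2D return a bilinearly interpolated value.
    float4 v = tex2D(texSrc, (x + 0.5f) * scaleX, (y + 0.5f) * scaleY);
    dst[y * dstW + x] = make_uchar4(v.x * 255.f, v.y * 255.f, v.z * 255.f, v.w * 255.f);
}

void resizeOnGpu(const uchar4* h_src, int srcW, int srcH, uchar4* d_dst, int dstW, int dstH)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaArray* srcArray;
    cudaMallocArray(&srcArray, &desc, srcW, srcH);
    cudaMemcpy2DToArray(srcArray, 0, 0, h_src, srcW * sizeof(uchar4),
                        srcW * sizeof(uchar4), srcH, cudaMemcpyHostToDevice);

    texSrc.filterMode     = cudaFilterModeLinear;   // hardware bilinear filtering
    texSrc.addressMode[0] = cudaAddressModeClamp;
    texSrc.addressMode[1] = cudaAddressModeClamp;
    cudaBindTextureToArray(texSrc, srcArray, desc);

    dim3 block(16, 16);
    dim3 grid((dstW + block.x - 1) / block.x, (dstH + block.y - 1) / block.y);
    resizeKernel<<<grid, block>>>(d_dst, dstW, dstH,
                                  (float)srcW / dstW, (float)srcH / dstH);

    cudaUnbindTexture(texSrc);
    cudaFreeArray(srcArray);
}

The same texture-based fetch can be applied to the depth map, which only needs a single channel.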
A. Anti-aliasing Pixel Rendering
The anti-aliasing pixel rendering algorithm, which considers occlusion at the same time, solves the depth aliasing and disparity quantization problems by using floating-point based mapping. This step is divided into two parts: the first computes the amount of disparity from the depth map for each pixel of each rendered view, and the second solves the aliasing problem by checking the occupying proportion and mixing the mapped color information according to the weight of that proportion.

Figure 3. Geometry of a point P to be rendered on the screen in relation to the viewer location

Figure 4. Anti-aliasing pixel rendering: (a) floating-point disparity mapping, (b) occupying proportion and (c) the proposed mapping algorithm
(a) Disparity Mapping
Faugeras [3] established the framework for the geometrical computation, which was also used in [1]. The disparity mapping method used in this paper is directly derived from these two references. The disparity computation is based on the relationship between the depth and the pixel location. The disparity of each pixel in a new view is computed relative to the position of the corresponding pixel in the central view. As shown in Fig. 3, the equation is:

    s_i = s_0 + k · (b/2) · ρ·p_z / (d + p_z)    (1)

where s_0 is the pixel position in the central view, k is the view index (e.g. from -4 to +4 for nine views), b is the stereo interval, p_z is the distance from the point P to be rendered to the eyes of the viewer, ρ is the ratio of the screen definition in number of pixels to the width of the display, and d is the distance between the viewer and the screen.
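Equation (1) maps directly to a small per-pixel device function. The sketch below is only illustrative; the parameter names follow the symbols defined above, and the unit assumptions (centimeters) are ours.

// Sketch of Eq. (1): shifted pixel position s_i in view k (assumed units: cm).
__device__ float shiftedPosition(float s0,   // pixel position in the central view
                                 int   k,    // view index, e.g. -4..+4 for nine views
                                 float pz,   // distance from point P to the viewer
                                 float b,    // stereo interval, e.g. 6.6 cm
                                 float rho,  // pixels per unit display width, e.g. 32 cm^-1
                                 float d)    // viewer-to-screen distance, e.g. 150 cm
{
    // s_i = s_0 + k * (b / 2) * (rho * p_z) / (d + p_z)
    return s0 + (float)k * 0.5f * b * (rho * pz) / (d + pz);
}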
(b) Anti-aliasing by Checking Occupying Proportion
During the disparity mapping procedure, the pixel disparity value is often truncated to an integer for an easy implementation of pixel-based mapping. As shown in Fig. 4(a), the disparity is then quantized to integer pixel coordinates. However, this causes “depth aliasing” and considerably reduces the image quality, especially when the image contains straight lines. In order to reduce the aliasing problem, a floating-point data structure is needed to calculate the pixel disparity. In that case, mapping a pixel to the synthesized image may spread it across two pixels, as shown in Fig. 4(b). Therefore, the occupying proportion has to be considered: α for pixel x+1 and (1 − α) for pixel x, respectively, where pixels x and x+1 belong to the synthesized image. Based on the occupying proportion, the intensity contributions for mapping a pixel C of the original 2D image to pixels x+1 and x are C·α and C·(1 − α). The synthesized intensity of every pixel is then given by ∑_{j=1}^{n} α_j·C_j. However, as shown in Fig. 4(c), the occupation ratio affected by other pixels has to be taken into account, and the previous expression becomes C_fg·β + C_bg·(α − γ), where β and α are the occupying proportions and γ is the occlusion ratio between the foreground C_fg and the background C_bg. Nonetheless, exactly computing every occlusion among n pixels takes O(n²) operations.
In order to simplify the procedure, C_obj and C_pre are defined as the background and foreground intensities with occupying proportions β and (α + β − γ), where the occlusion ratio γ is determined as max(0, α + β − 1). Each pixel keeps its previous color C_pre and previous occupation ratio α. Finally, the new anti-aliasing mapping equation becomes

    Pixel_new = β·C_obj + ((α − γ)/α)·C_pre    (2)

The simplified procedure only requires O(n) computation. Nevertheless, it works correctly only when the occluding effect, described in detail in subsection C, is taken into account; that is, the disparity mapping has to be performed in the proper order.
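A possible per-row realization of this mapping is sketched below. It reflects our own reading of Eq. (2): each destination pixel keeps a running color and occupation ratio, each incoming source pixel is split over two destination pixels according to its fractional position, and the accumulated color is updated with Eq. (2). The data structure and function names are illustrative, and the row is assumed to be traversed in the order prescribed by Table I (see subsection C).

// Sketch of the anti-aliasing mapping of Eq. (2); assumes Table I traversal order.
struct AAPixel {
    float3 color;   // accumulated color (C_pre for the next contribution)
    float  alpha;   // accumulated occupation ratio
};

__device__ void accumulate(AAPixel& dst, float3 cObj, float beta)
{
    if (beta <= 0.f) return;
    float gamma = fmaxf(0.f, dst.alpha + beta - 1.f);                     // occlusion ratio
    float keep  = (dst.alpha > 0.f) ? (dst.alpha - gamma) / dst.alpha : 0.f;
    // Eq. (2): Pixel_new = beta * C_obj + ((alpha - gamma) / alpha) * C_pre
    dst.color = make_float3(beta * cObj.x + keep * dst.color.x,
                            beta * cObj.y + keep * dst.color.y,
                            beta * cObj.z + keep * dst.color.z);
    dst.alpha = fminf(1.f, dst.alpha + beta);
}

__device__ void mapToRow(AAPixel* row, int width, float s, float3 c)
{
    // The floating-point target position s spreads the source color over
    // pixels x and x+1 with proportions (1 - a) and a.
    int   x = (int)floorf(s);
    float a = s - (float)x;
    if (x >= 0     && x < width)     accumulate(row[x],     c, 1.f - a);
    if (x + 1 >= 0 && x + 1 < width) accumulate(row[x + 1], c, a);
}

Pixels left with an occupation ratio below one are handled by the inpainting step of subsection B.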
Figure 5 shows the verification of the proposed anti-aliasing pixel rendering, using the movie “The Fifth Element” to compare rendering with and without anti-aliasing. The proposed anti-aliasing procedure effectively reduces the depth aliasing and disparity quantization problems.

Figure 5. (a) Traditional rendering and (b) the proposed anti-aliasing pixel rendering
B. Unoccupied Pixel Inpainting
In the anti-aliasing pixel rendering step, some pixels of the synthesized image may be unoccupied or not fully filled, which means that the final occupied proportion α of the pixel is 0 or lies between 0 and 1 (0 < α < 1). Both cases are called “unoccupied pixels”. Two steps are proposed to efficiently inpaint them. When 0 < α < 1, the color intensity is replaced by C/α; when α is zero, the color values of the pixels located at the border of the hole are copied, as shown in Fig. 6. The key point is to follow the occluding effect described in subsection C, because the inpainting depends on the direction and order of processing. After anti-aliasing pixel rendering and unoccupied pixel inpainting, all synthesized pixels are fully filled and the multiple views are rendered from the 2D image and its depth map.

Figure 6. (a) Traditional unoccupied pixel inpainting using interpolation and (b) the proposed copying of border pixel color values
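The two inpainting rules can be sketched for one synthesized row as follows. The direction argument follows Table I so that holes are filled with the background border color; AAPixel is the illustrative accumulator from the previous sketch, not a structure defined in the paper.

// Sketch of unoccupied-pixel inpainting for one row; step is +1 or -1 (Table I).
__device__ void inpaintRow(AAPixel* row, int width, int step)
{
    int    start  = (step > 0) ? 0 : width - 1;
    float3 border = make_float3(0.f, 0.f, 0.f);    // last filled color seen so far

    for (int i = start; i >= 0 && i < width; i += step) {
        if (row[i].alpha <= 0.f) {
            row[i].color = border;                 // hole: copy the border pixel color
        } else if (row[i].alpha < 1.f) {
            float inv = 1.f / row[i].alpha;        // partially filled: replace by C / alpha
            row[i].color = make_float3(row[i].color.x * inv,
                                       row[i].color.y * inv,
                                       row[i].color.z * inv);
            border = row[i].color;
        } else {
            border = row[i].color;
        }
        row[i].alpha = 1.f;                        // the row is now fully filled
    }
}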
C. Occluding Effect
In subsections A and B, the occluding effect is an important issue that directs the mapping order and the inpainting direction. Here, an example is introduced to evaluate the disparity for an HD 3D display with 1920 horizontal pixels and a width of 60 cm. The parameters are defined as ρ = 32 cm⁻¹, b = 6.6 cm for the stereo interval, p_z = 70 cm, and a distance of 150 cm from the viewing zone to the screen. Fig. 7 describes the two extreme rendered views, k = -4 and k = 4, i.e. the leftmost and rightmost views with respect to the original 2D image (view 0). Starting from view 0, two objects are defined at depth 255 (near) and depth zero (far), where case 1 and case 2 indicate two different rendering effects: the mapping from depth to disparity places the objects behind the display for case 1 and in front of it for case 2.
Observing the mapping results in Fig. 7, the direction and order for rendering and inpainting can be derived as listed in Table I. For rendering, in both case 1 and case 2, the pixel rendering direction should go “from right to left” for views on the left (k < 0), so that a near object is never occluded by a far object. Meanwhile, for unoccupied pixel inpainting performed by copying nearby pixel colors, the order should go “from left to right” so that holes are filled with background pixels (the far object). According to these rules, rendering and inpainting for views on the right (k > 0) should be performed in the opposite directions. Based on the occluding effect principles summarized in Table I, the simple rendering and inpainting procedures proposed in subsections A and B work correctly and efficiently.
Table I. Processing direction for different views

  Method                          View     Direction
  Anti-aliasing pixel rendering   k > 0    left to right
  Anti-aliasing pixel rendering   k < 0    right to left
  Unoccupied pixel inpainting     k > 0    right to left
  Unoccupied pixel inpainting     k < 0    left to right
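The directions of Table I reduce to a sign test on the view index k; the two helpers below express that rule (names are ours, not from the paper).

// Traversal step derived from Table I: +1 means left to right, -1 means right to left.
__host__ __device__ inline int renderStep(int k)  { return (k > 0) ? +1 : -1; }
__host__ __device__ inline int inpaintStep(int k) { return -renderStep(k); }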
III. MULTITHREADING ARCHITECTURE DESIGN
In a matter of just a few years, the programmable Graphics Processing Unit (GPU) has evolved into an absolute computing workhorse [4]. With multiple cores driven by very high memory bandwidth, today's GPUs offer considerable resources for both graphics and non-graphics processing. More specifically, the GPU is especially well suited to problems that can be expressed as data-parallel computations, so the proposed rendering framework is a good candidate for optimization on this massively multithreaded architecture.
Compute Unified Device Architecture (CUDA) is the GPU built-in technology that provides a unified hardware and software solution for data-intensive computing [5]. Its thread computing model takes advantage of the massively threaded NVIDIA GPU architecture. CUDA's high-performance, scalable computing architecture accelerates complex parallel computation much faster than traditional CPU-based architectures, without requiring a mapping to a graphics API. Taking the NVIDIA GeForce GTX 280 GPU as an example, up to 240 parallel computing cores provide massive floating-point processing power for maximum application performance. Moreover, the CUDA SDK unlocks the power of the GPU using the industry-standard C language. As shown in Fig. 8, the GTX 200 architecture is based on 10 clusters with 24 cores per cluster. In this paper, the GTX 200 core is selected as the target platform and programmed using CUDA. The following describes the parallel architecture design and the framework optimization based on this GPU architecture.
Figure 8. The GPU parallel architecture of GTX 200 series
Figure 7. Occluding effect description for near and far objects
Since displays vary from stereoscopic to autostereoscopic types, the optimization strategies for rendering 2-view and multiple (N)-view content are independent. The interlaced format of 2-view is not the same as that of multiple N-view formats such as 7-view or 9-view. For a 2-view stereoscopic display, odd and even (or even and odd) rows are taken from the left- and right-eye images, respectively, to compose the interlaced image. In the proposed framework, the original 2D image is used as one of the two views, so only the other view needs to be rendered. To avoid wasting time, (Height_view / 2) × Width_view threads are created, each in charge of moving one pixel of an odd (or even) row of the original 2D image into its corresponding position in the interlaced image, where Height_view and Width_view are the resolution of the interlaced view. Since the proposed rendering and inpainting framework operates line by line, the three steps can be integrated into a single procedure within one thread. This means that only Height_view / 2 threads are needed to perform anti-aliasing pixel rendering, unoccupied pixel inpainting and 2-view interlacing in one step for the remaining half of the rows of the interlaced output. The 2-view stereoscopic multithreading architecture is summarized in Table II below.
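The first row of Table II corresponds to a simple copy kernel; a minimal sketch under our own naming assumptions is given here, with one thread per pixel of the odd rows. The remaining even rows are produced by the combined rendering/inpainting/interlacing kernel using Height_view / 2 threads.

// Sketch: (Height/2) x Width threads copy the odd rows of the original view
// into the row-interleaved 2-view output (kernel name and uchar4 format assumed).
__global__ void copyOddRows(const uchar4* src, uchar4* interlaced, int width, int height)
{
    int x   = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // 0 .. height/2 - 1
    if (x >= width || row >= height / 2) return;

    int y = 2 * row + 1;                               // odd rows keep the original 2D view
    interlaced[y * width + x] = src[y * width + x];
}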
Table II. Thread arrangement for the 2-view display

  Method description                                               Thread number
  Copy odd (or even) rows from the original 2D image               (Height_view / 2) * Width_view
  Anti-aliasing pixel rendering, unoccupied pixel inpainting
  and interlacing as one step                                      Height_view / 2

Figure 9. Interlace configuration: (a) top right corner of view k and (b) top right corner of the final interlaced image
However, for an N-view autostereoscopic display, multiple views have to be rendered and interlaced in a particular format. For example, one interlacing method based on a slanted sub-pixel arrangement is shown in Fig. 9. It is not a line-by-line arrangement, so the multithreading for interlacing should be separated from rendering and inpainting. Moreover, when rendering and inpainting multiple views, the direction and order need to be taken into account in the proposed framework. Because the proper order is known in advance, one thread can be assigned to process one row and perform anti-aliasing pixel rendering and inpainting in the same step. Furthermore, as listed in Table I, the rendering and inpainting orders are reversed, so the two procedures can be merged into one. The advantage is not only a simpler thread arrangement but also less time spent re-computing memory addresses, because inpainting can immediately follow the data pointer left by rendering. As image resolutions grow in the future, this performance improvement will become even more noticeable. Since each thread is responsible for the data of one row, the number of threads is Height_view × (Num_total_view − 1). On the other hand, two approaches were considered to implement and optimize the multi-view interlacing. The first uses a pre-computed look-up table kept in memory to map each synthesized pixel of the multiple views to its particular sub-pixel coordinate in the interlaced image. The second creates a total of Width_interlace × Height_interlace × 3 threads, as listed in Table III, where Width_interlace and Height_interlace are the resolution of the autostereoscopic display. Each thread then stores three corresponding components: the view number, the pixel position and the sub-pixel index. GPU-based multithreading performs the proposed architecture very well.

Table III. Thread arrangement for the N-view display

  Method description                                               Thread number
  Anti-aliasing pixel rendering and unoccupied pixel inpainting    Height_view * (Num_total_view - 1)
  Multi-view interlacing                                           Width_interlace * Height_interlace * 3
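The second interlacing approach of Table III can be sketched as one thread per sub-pixel of the interlaced image. The view-selection formula below is only a generic placeholder for a slanted arrangement; the real mapping depends on the optics of the target panel and, in the paper, each view is rendered at a lower resolution, so the source coordinate would additionally be scaled.

// Sketch: Width * Height * 3 threads, one per sub-pixel of the interlaced image.
// 'views' is a device array of per-view RGB buffers assumed to be at display resolution.
__global__ void interlaceNView(const unsigned char* const* views,
                               unsigned char* interlaced,
                               int width, int height, int numViews)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global sub-pixel index
    if (idx >= width * height * 3) return;

    int c = idx % 3;                  // sub-pixel: R, G or B
    int x = (idx / 3) % width;        // pixel column
    int y = idx / (3 * width);        // pixel row

    int v = (3 * x + c + y) % numViews;   // placeholder slanted view assignment

    interlaced[idx] = views[v][idx];
}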
IV. IMPLEMENTATION AND VERIFICATION
In order to verify the proposed real-time novel rendering architecture, including the framework of section II and the multithreading design of section III, the implementation and several comparisons were performed on a platform composed of a 3 GHz Intel Core 2 Duo CPU and an NVIDIA GeForce GTX 280 GPU. Table 4 shows the specifications of the tested 2-, 7- and 9-view displays. Table 5 lists the execution times of the proposed architectures for the 2-view and N-view displays, where rendering, inpainting and interlacing are executed in one combined step and in separate steps, respectively. For both the 2-view and the N-view displays, the proposed architecture achieves better than real-time performance. Moreover, as listed in Table 6, comparing the CPU-based sequential implementation with the GPU-based multithreading architecture shows 6x, 12x and 10x efficiency improvements for the 2-view, 7-view and 9-view displays, respectively.
Table 4. Specification of the N-view displays

  N-view display   View number for rendering   Each view resolution (RGB)   Display resolution (RGB)
  2                1                           1680*1050*3                  1680*1050*3
  7                6                           823*400*3                    1920*1200*3
  9                8                           640*360*3                    1920*1080*3

Table 5. Execution time for the 2-, 7- and 9-view displays (ms)

  Step                                    2-view   7-view   9-view
  Data transition & resizing              3.94     3.65     3.15
  Step 1: pixel rendering                 -        10.22    7.22
  Step 2: inpainting                      -        5.24     5.20
  Step 3: interlacing                     -        8.05     14.18
  2-view procedure (Step1+Step2+Step3)    18.55    -        -
  Total                                   22.49    27.16    29.75

Table 6. Comparison of CPU and GPU results

            CPU (ms)   CPU (fps)   GPU (ms)   GPU (fps)   Speed-up
  2-view    149.79     6.7         22.49      44.5        6.6x
  7-view    333.15     3.0         27.16      36.8        12.3x
  9-view    298.87     3.3         29.75      33.6        10.0x
Figure 10. Rendering example from a 2D image and its depth map: input frame i, depth map of frame i, rendered view -4 and rendered view +4
According to this verification, the proposed novel rendering architecture with anti-aliasing achieves real-time performance and better rendering quality than traditional DIBR. Figure 10 shows another example of rendering multiple views from a 2D image and its depth map. Viewing the result on a barrier-type 9-view 1920x1080 autostereoscopic display confirms good 3D quality in real time. In the future, as display resolutions increase, for example to 4x full HD, the proposed architecture will still be able to reach real-time performance by adjusting the thread arrangement on NVIDIA's new generation GPU “Fermi”, since the proposed architecture is a thread-based framework.
V. CONCLUSION
This paper proposed a novel rendering framework based on a massive multithreading architecture, comprising anti-aliasing pixel rendering, unoccupied pixel inpainting and multi-view interlacing. The anti-aliasing pixel rendering algorithm substantially improves the synthesized image quality by checking the occupying proportion and calculating the occlusion ratio with floating-point operations. The simplified procedure is obtained by taking the direction and order of the occluding effect into account. The parallel multithreading architecture of the proposed framework achieves real-time rendering for stereoscopic and autostereoscopic displays from a 2D image and its depth map. Optimization strategies for 2-view and N-view displays are designed according to the required interlacing format. The implementation on an NVIDIA GPU shows an efficiency improvement of around 6-12x compared to the CPU. The novel rendering architecture thus demonstrates real-time rendering and display of 3D content with high image quality.
REFERENCES
[1] R.-P. M. Berretty, F. J. Peters and G. T. G. Volleberg, "Real Time Rendering for Multiview Autostereoscopic Displays," Proc. Stereoscopic Displays and Applications Conference, SPIE vol. 6055, pp. 208-219, January 2006.
[2] C. M. Cheng, S. J. Lin, S. H. Lai and J. C. Yang, "Improved Novel View Synthesis from Depth Image with Large Baseline," International Conference on Pattern Recognition, 2008.
[3] O. Faugeras, "Three-Dimensional Computer Vision: A Geometric Viewpoint," MIT Press, 1994.
[4] NVIDIA, "NVIDIA GeForce 8800 GPU Architecture Overview," Technical Brief, 2006.
[5] NVIDIA, "NVIDIA CUDA Compute Unified Device Architecture," Programming Guide, 2008.