Skewed stereo time-of-flight camera for translucent object imaging
Seungkyu Lee a, Hyunjung Shim b
a Department of Computer Engineering, Kyung Hee University, South Korea
b School of Integrated Technology, Yonsei University, South Korea
Article info
Article history:
Received 11 October 2014
Received in revised form 27 July 2015
Accepted 2 August 2015
Available online 18 September 2015
Keywords:
Translucent object imaging
ToF depth camera
Three-dimensional image processing
Abstract
Time-of-flight (ToF) depth cameras have been widely used in many applications such as 3D imaging, 3D reconstruction, human interaction and robot navigation. However, conventional depth cameras are incapable of imaging translucent objects, which occupy a substantial portion of real world scenes. Such a limitation prohibits realistic imaging using depth cameras. In this work, we propose a new skewed stereo ToF camera for detecting and imaging translucent objects with minimal prior knowledge of the environment. We find that the depth calculated by a ToF camera observing a translucent object presents a systematic distortion due to the superposition of light rays reflected from multiple surfaces. We propose a stereo ToF camera setup and derive a generalized depth imaging formulation for translucent objects. The distorted depth values are refined using an iterative optimization. Experimental evaluation shows that our proposed method reasonably recovers the depth image of translucent objects.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Due to their ability to directly acquire three-dimensional geometry, consumer depth cameras have been widely applied to many applications such as 3D reconstruction, human interaction, mixed reality and robotics. However, depth images recorded by ToF depth cameras present limited accuracy because 1) commercial active sensors have fairly limited infrared emission power and 2) their performance varies with the characteristics of the surface material, such as reflectivity and translucency, which is inherent in the ToF sensing principle. Many postprocessing algorithms for 3D imaging have been proposed under the Lambertian assumption of a matte surface. Depth cameras also work under the Lambertian assumption. In particular, depth cameras cannot detect and image translucent objects, even though such objects occupy a substantial portion of a real world scene. In order to achieve accurate interaction and realistic imaging, it is critical to handle translucent objects.
Many researchers have tried to detect transparent and translucent object regions [7] using conventional color cameras [9], laser beams [8] or prior shape knowledge [4]. This is particularly important for navigating a robot or vehicle because collision detection for transparent objects is essential in real situations [10]. McHenry et al. [12] and McHenry and Ponce [11] consider distinct properties, such as the texture distortion and specularity of transparent objects in color images, to characterize the transparency. Lysenkov et al. [4] require prior knowledge of the shape of the transparent object. During training, they obtain a 3D model of the object and try to match the model with
the captured depth image. However, having such a shape prior is often
unrealistic and it is hard to extend this idea to general applications.
Wang et al. [5] detect the transparent region from aligned depth and
color distortion. Phillips et al. [6] use stereo color cameras to segment
transparent object regions in each color image. Murase [20,21] introduces a method for water surface reconstruction: a pattern is placed at the bottom of a water tank and the water surface is captured by a fixed color camera. If the water surface changes over time, the captured pattern at the bottom is distorted accordingly. Given the camera position, the water surface normal is estimated at each frame, reconstructing the shape of the water surface. Similarly, Kutulakos and Steger [17,18] use a piece-wise light triangulation method to recover the refractive surface with a known refractive index and a pattern. Recently, Morris and Kutulakos [19] developed a stereo imaging system using a known background pattern and reconstructed a dynamic liquid surface without a known refractive index. Inoshita et al. [16] assume a homogeneous surface with a known refractive index and explicitly calculate the height of a translucent object surface in the presence of inter-reflection. Meriaudeau et al. [13] introduce new imaging architectures for transparent object reconstruction, such as shape from polarization using IR light.
Some researchers have used multiple depth cameras for shape
recovery. They collect transparent object silhouettes from multiple
viewpoints and apply a visual hull for reconstruction. Klank et al. [2]
assume a planar background and take multiple IR intensity images of a
transparent object using a ToF depth camera. Since each shadow on the
planar background represents the silhouette of the transparent object
corresponding to one camera, the volume of the object is estimated by
a visual hull. Similarly, Albrecht and Marsland [1] use shadows on a planar background and take multiple depth images to collect the shadow
regions of transparent or highly reflective objects at different viewpoints.
Alt et al. [3] propose a similar framework using multiple depth images
from moving cameras and show rough reconstruction results. In these
frameworks, the number of viewpoints significantly affects the quality
of volume recovery and it is only applicable to a static scene. Kadambi
et al. [14] propose a coded time-of-flight camera and perform sparse deconvolution of the modulated light to separate multipath components from the reflected signals. The depth of a transparent object is recovered by extracting the light path reflected from the transparent object's surface. These previous methods work only under special background conditions or require many cameras to obtain reasonable reconstruction performance. Translucent object reconstruction without environmental constraints using a practical imaging setup remains a challenging problem.
In this paper, we propose a new approach to detecting a translucent
object and recovering its distorted depth using a skewed stereo ToF
depth camera. Our approach is implemented with a pair of commercial depth cameras without any assumption except that the observed scene is composed of two layers: a single foreground and a background. Klank et al. [2] show that ray intersections observed by multiple color cameras reveal the rough location of the original depth. We are inspired by this preliminary idea and generalize it for imaging arbitrary translucent surfaces with detailed shape. In fact, translucent surfaces present a systematic depth distortion which prevents the direct use of a traditional stereo matching scheme. In order to account for depth distortions on translucent surfaces, we develop a new framework for modeling these systematic depth distortions based on an understanding of the sensing principle and an empirical study. The proposed algorithm consists
of two stages. Utilizing the behavior of depth distortion on translucent
surfaces, we first detect the translucent surface region. To process the
translucent regions, we formulate an energy function for our depth
optimization so as to recover the depth of translucent surfaces
(Sec. 3.2). The optimization process is constrained by three cost terms:
a modified stereo depth matching cost accounting for systematic depth distortions, a regularization cost for noise elimination and a depth topology cost for shape recovery of the target object. In particular, the topology term is iteratively updated by the IR intensity observation (e.g., the texture image of the ToF depth sensor) and the analysis of the ToF principle in translucent objects (Sec. 2). As a result, we can recover the original shape of the translucent object (Sec. 3.3). The contributions of the proposed work include (1) a generalized framework for detecting a translucent object and recovering its distorted depth with minimal prior knowledge, and (2) a thorough analysis of the ToF principle on translucent surfaces for the recovery of their detailed shape.
2. ToF principle in translucent objects
A time-of-flight camera emits the IR signal with a fixed wavelength
and measures the traveling time from a target object to the camera.
Knowing the speed of light and its traveling time, it is easy to derive
its traveling distance. For the practical implementation, the phase delay of the reflected IR relative to the emitted IR is used in place of the traveling time. For commercial depth cameras, the phase delay is defined by the relation between N different electric charge values collected in different time slots. In this paper, we choose the case of N = 4, and
Fig. 1(a) illustrates the example of depth calculation using four electric
charges. Considering the emitted IR signal as a square wave, we can
derive the depth D as follows:
$$ D = \frac{c}{2}\,\tan^{-1}\!\left(\frac{A(Q_3 - Q_4)}{A(Q_1 - Q_2)}\right) = \frac{c}{2}\,\tan^{-1}\!\left(\frac{Q_3 - Q_4}{Q_1 - Q_2}\right) \qquad (1) $$
where A is the amplitude of the reflected IR, $Q_1 \sim Q_4$ are the normalized electric charges and c represents the speed of light. Note that the amplitude A varies with the distance and the albedo of the surface. From Fig. 1, we know that $|Q_1 - Q_2| + |Q_3 - Q_4| = K$ and $Q_1 + Q_2 = Q_3 + Q_4 = K$, where K is a constant. This principle, however, becomes invalid for a translucent object because it assumes a single surface producing a single reflected light path.
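For concreteness, the following minimal sketch implements the four-phase depth calculation of Eq. (1). It is not from the paper: it assumes a sinusoidal correlation model and makes the modulation frequency explicit (the paper folds the frequency-dependent scaling into the compact factor c/2); the charge values and the 15 MHz frequency are illustrative only.

```python
import math

C = 3e8  # speed of light (m/s)

def tof_depth(q1, q2, q3, q4, f_mod=15e6):
    """Depth from four phase-shifted charge samples, in the spirit of Eq. (1).

    The phase delay is atan((Q3 - Q4) / (Q1 - Q2)); atan2 keeps the full
    0..2*pi range. The amplitude A cancels because it scales the numerator
    and the denominator equally.
    """
    phase = math.atan2(q3 - q4, q1 - q2) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * f_mod)  # unambiguous range: C / (2 * f_mod)

# Illustrative charge values for a single opaque surface (arbitrary units).
print(tof_depth(0.8, 0.2, 0.7, 0.3))
```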
On a translucent surface, a subset of the incident rays is reflected and the rest penetrates the object medium, depending on its translucency. We consider a target scene composed of a translucent foreground and an opaque background, namely a two-layer surface model, as seen in Fig. 2. Layer 1 is the translucent foreground and Layer 2 is the opaque background, where τ (0 ≤ τ ≤ 1) is the translucency. The observed IR signal has two components: IR reflected from Layer 1, $(1-\tau)I$, and IR reflected from Layer 2, $\tau^2\rho_2 I$, as shown in Fig. 2. The reflected IR is thus the superposition of two different IR signals. Under this circumstance, the single-path assumption in Eq. (1) is invalid and the resulting depth is incorrect. Under this two-layer translucent model, Eq. (1) ought to be rewritten as:
$$ D_{tr} = \frac{c}{2}\,\tan^{-1}\!\left\{\frac{(1-\tau)A_f(Q_{f3} - Q_{f4}) + \tau^2 A_b(Q_{b3} - Q_{b4})}{(1-\tau)A_f(Q_{f1} - Q_{f2}) + \tau^2 A_b(Q_{b1} - Q_{b2})}\right\} \qquad (2) $$
where τ is the normalized translucency (0 ≤ τ ≤ 1), with τ = 1 meaning that the object is perfectly transparent, and the subscripts f and b denote foreground and background, respectively.
Fig. 1. Example of a time-of-flight camera depth calculation: Four electric charge values Q1 ~ Q4 are collected when the emitted IR signal is a square wave. Eq. (1) is used to calculate
distance.
Fig. 2. Two-layer model of translucent object in ToF camera. Layer 1 is translucent
foreground, Layer 2 is opaque background and τ (0 ≤ τ ≤ 1) is translucency.
This modified formulation explains that the ToF camera observes the overlapped IR signals reflected from the foreground and background objects. In this case, the depth value is determined by the translucency (τ) and the two amplitudes ($A_f$ and $A_b$) of the reflected IR signals arriving at the sensor. Unlike in Eq. (1), the IR amplitude terms $A_f$ and $A_b$ are not eliminated in the modified depth formula. Note that the IR amplitudes $A_f$ and $A_b$ vary with the traveling distance and the surface texture of the target object. This introduces a new and critical depth distortion: for example, objects at the same distance but with different amplitudes present different depth values.
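This dependence on the background amplitude can be reproduced numerically. The sketch below is not from the paper; it simulates the charge differences with a sinusoidal correlation model and an assumed 5 m unambiguous range, then evaluates the two-layer mixture of Eq. (2) for a fixed geometry while varying A_b. All amplitudes and distances are illustrative.

```python
import math

MAX_RANGE = 5.0  # assumed unambiguous range of the ToF camera (m)

def charges(depth, amplitude):
    """Simulated (Q1 - Q2, Q3 - Q4) pair for a single surface at 'depth'."""
    phase = 2.0 * math.pi * depth / MAX_RANGE
    return amplitude * math.cos(phase), amplitude * math.sin(phase)

def translucent_depth(d_f, d_b, a_f, a_b, tau):
    """Depth reported for a two-layer scene, following Eq. (2)."""
    f1, f2 = charges(d_f, a_f)   # foreground charge differences
    b1, b2 = charges(d_b, a_b)   # background charge differences
    num = (1 - tau) * f2 + tau ** 2 * b2
    den = (1 - tau) * f1 + tau ** 2 * b1
    phase = math.atan2(num, den) % (2.0 * math.pi)
    return MAX_RANGE * phase / (2.0 * math.pi)

# Same foreground (1 m) and background (3 m), same translucency, different
# background amplitudes: a brighter background pulls the reported depth
# further away from the true foreground location.
for a_b in (0.2, 0.5, 1.0):
    print(a_b, round(translucent_depth(1.0, 3.0, a_f=0.3, a_b=a_b, tau=0.5), 3))
```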
Fig. 3 illustrates the depth distortions affected by the background
texture behind a translucent object. Fig. 3(a) shows both the IR intensity
and depth map of a normal opaque object. The IR amplitude varies along
the texture on the object while the calculated depth shows identical
distance from the camera because the IR amplitude does not affect the
depth calculation, as shown in Eq. (1). In Fig. 3(b), however, we put a translucent object between the opaque pattern object and the depth camera. Now Eq. (1) is no longer valid and we have to use a multi-layer object model, such as our two-layer model in Eq. (2), to analyze the overlapped IR light rays. First of all, the depth of a foreground object with high translucency cannot be correctly measured because most of the emitted IR light rays penetrate the object and travel to the background (e.g., a textured object). Furthermore, the IR amplitude of the background object affects the depth calculation in this case. We can see in Fig. 3(b) that different IR amplitudes of the background object lead to different foreground depth values. In Fig. 3(c), we
move the translucent object toward the depth camera. Because the
traveling distance of reflected light from the translucent foreground is
decreased, the attenuation of corresponding IR amplitude is also
decreased. As a result, Af becomes more dominant in Eq. (2), especially
in the region of darker background. Therefore, the calculated depth
becomes closer to the real translucent foreground object location
(becomes darker blue in Fig. 3(c)). However, the region of brighter
background keeps its background depth value, where Af is relatively
less dominant due to higher Ab. We can see that the foreground depth
error in darker background decreases when we move the translucent
object closer to the camera (from (b) to (c) in Fig. 3). If we can control
the background and decrease Ab, foreground terms will be more dominant and the calculated depth will be closer to the ground truth foreground depth value. However, we have no such control or knowledge
of the background object in real situations and we need a generalized
solution for translucent object imaging.
We observe a similar effect with translucency (τ) variation that also
changes the dominance of Af and Ab in Eq. (2). This experiment shows
that the previous depth calculation fails to obtain the correct depth of
the translucent object. We have verified that our new Eq. (2) for the
two layer model can explain what happens with a translucent foreground object in front of opaque background in real situations.
Another interesting behavior of the depth of translucent objects is
depth reversal due to its miscalculated depth value. In other words,
the miscalculated depth following Eq. (2) does not always lie in
between the ground truth depth of the translucent foreground and
opaque background. Under some conditions, the miscalculated depth
can be farther or closer than both the foreground and the background
(Fig. 4). Fig. 4 shows two sample depth images of a translucent object.
In these two cases, the translucent foreground object distance from
the camera has been changed with a fixed background object located
at the maximum operating range (5 m in this camera). In Fig. 4(b),
the foreground object is located 2.5 m away from the camera and the
observed depth of the translucent objects is pushed toward the background object. In the case of Fig. 4(c), however, the foreground object
Fig. 3. Depth errors of translucent foreground and opaque background objects. First row images are IR intensity images, second row images are depth images and the last row shows the
relative locations of depth camera, translucent foreground and opaque background objects, respectively.
Fig. 4. (a) A translucent object and opaque background. Background is at the end of the sensing range (5 m). (b) Foreground object is located at 2.5 m away from the camera.
(c) Foreground object is located at 1 m away from the camera.
is located at around 1 m away from the camera and the observed depth
of the translucent objects is pulled toward the camera that is closer than
both the foreground and the background. This situation can easily be explained mathematically using Eq. (2). Based on this two-layer model, we examine whether there is any case where the following condition is satisfied.
$$ \frac{c}{2}\,\tan^{-1}\!\left\{\frac{(1-\tau)A_f(Q_{f3} - Q_{f4}) + \tau^2 A_b(Q_{b3} - Q_{b4})}{(1-\tau)A_f(Q_{f1} - Q_{f2}) + \tau^2 A_b(Q_{b1} - Q_{b2})}\right\} - \frac{c}{2}\,\tan^{-1}\!\left(\frac{Q_{f3} - Q_{f4}}{Q_{f1} - Q_{f2}}\right) > 0 \qquad (3) $$
Let us consider a special case when $1-\tau = \tau^2$ and replace $A_b(Q_{b1} - Q_{b2}) = B_1$, $A_b(Q_{b3} - Q_{b4}) = B_2$, $A_f(Q_{f1} - Q_{f2}) = F_1$ and $A_f(Q_{f3} - Q_{f4}) = F_2$. We rewrite the condition as follows:
$$ \tan^{-1}\!\left(\frac{F_1 B_2 - F_2 B_1}{F_1(F_1 + B_1) + F_2(F_2 + B_2)}\right) > 0 \qquad (4) $$
Fig. 1(b) shows the four possible cases of the $(Q_1 - Q_2)$ and $(Q_3 - Q_4)$ status. We can easily find examples satisfying this condition. For example, when our background is located at the maximum operating range of the ToF camera (in the fourth quadrant) and the foreground is in the first quadrant of Fig. 1(b), then $B_1 > 0$, $B_2 \approx 0$, $F_1 > 0$ and $F_2 > 0$, satisfying Eq. (4). As a result, we will observe a situation like Fig. 4(c). In this work, we adopt the two-layer translucent model, assuming that there is only one medium change in the translucent object. The study of the ToF principle in this section will be employed in Sec. 3.3 for the recovery of the original shape of a translucent object.
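The reversal of Fig. 4(c) can also be reproduced numerically with the same kind of two-layer phase mixing. A minimal sketch, not from the paper, assuming a sinusoidal correlation model with a 5 m unambiguous range so that a background at the maximum range has a phase near 2π; the unit weights stand for (1 − τ)A_f and τ²A_b and are illustrative:

```python
import math

MAX_RANGE = 5.0  # assumed unambiguous range (m); the background sits at this limit

def mixed_depth(d_f, d_b, w_f, w_b):
    """Depth reported when a foreground contribution (weight w_f = (1 - tau) * A_f)
    and a background contribution (weight w_b = tau**2 * A_b) are superposed."""
    pf = 2.0 * math.pi * d_f / MAX_RANGE
    pb = 2.0 * math.pi * d_b / MAX_RANGE
    q13 = w_f * math.cos(pf) + w_b * math.cos(pb)   # mixed (Q1 - Q2)
    q24 = w_f * math.sin(pf) + w_b * math.sin(pb)   # mixed (Q3 - Q4)
    phase = math.atan2(q24, q13) % (2.0 * math.pi)
    return MAX_RANGE * phase / (2.0 * math.pi)

# Foreground at 1 m, background at the 5 m limit (phase ~ 2*pi, i.e. B2 ~ 0, B1 > 0).
d = mixed_depth(1.0, 5.0, w_f=1.0, w_b=1.0)
print(d)         # smaller than 1 m: the observed depth is pulled closer
print(d < 1.0)   # than both surfaces, as in Fig. 4(c)
```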
3. Proposed method
3.1. Detection of translucent object
Separating superposed multiple light rays from a single observation
with unknown translucency (τ) and IR amplitude (Af and Ab) in Eq. (2)
is an ill-posed problem. In our framework, we propose to use a pair of
observations from different viewpoints using a stereo ToF camera as
shown in Fig. 5(a). By fixing the stereo setup, we perform an extrinsic
calibration [15] between two ToF cameras to compute the geometric
transformation. Given the extrinsic parameters, we can register two
point clouds captured by a stereo depth camera in the same coordinate
system.
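As a sketch of this registration step (not the paper's code), once the extrinsic rotation R and translation t between the two cameras are known from the calibration, the right camera's point cloud can be expressed in the left camera's coordinate system; the extrinsic values below are placeholders.

```python
import numpy as np

def register(points_right, R, t):
    """Transform an (N, 3) point cloud from the right camera frame into the
    left camera frame using the calibrated extrinsics: p_left = R @ p_right + t."""
    return points_right @ R.T + t

# Placeholder extrinsics: identity rotation and a 30 cm baseline along x.
R = np.eye(3)
t = np.array([0.30, 0.0, 0.0])
cloud_right = np.random.rand(100, 3) * 2.0          # stand-in for a captured cloud
cloud_right_in_left = register(cloud_right, R, t)   # now comparable to the left cloud
```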
Given an opaque object with a totally matte surface (i.e., a
Lambertian surface), observed 3D geometry at different viewpoints
should overlap each other in the world coordinate except in occluded
regions. Occlusion always appears with the changes in viewpoint or
field-of-view. In general, an occluded 3D point of one camera does not
have any corresponding depth point at the other camera.
Fig. 5. Stereo ToF camera: Translucent object captured at each camera is either closer or farther from the real location along the respective light ray directions.
Similarly, a translucent surface point has no corresponding point at the other viewpoint, because the distortion due to translucency appears along the
respective IR ray direction (Fig. 5(b)). Utilizing this view dependent
property, we detect the candidate of translucent points by comparing
two point clouds from the stereo ToF camera. In order to detect the
inconsistent points and distinguish translucent points from occluded
points, we employ the following scheme (Fig. 6).
$$ P = \left\{\, p_i^L,\, p_j^R \;\middle|\; \left\lVert p_i^L - p_m^R \right\rVert > \delta,\ \left\lVert p_j^R - p_n^L \right\rVert > \delta,\ \text{for all } p_n^L \in \Phi^L,\ p_m^R \in \Phi^R \,\right\} \qquad (5) $$
where P represents the set of inconsistent points, δ is the minimum distance used to determine an inconsistent point, and $\Phi^L$ and $\Phi^R$ are the point clouds inferred by the left and right ToF cameras, respectively. We choose the points of one view $p_i^L \in \Phi^L$ that are located farther than δ from all 3D points of the other view $p_m^R \in \Phi^R$ and denote them as an inconsistent point set P. Among the inconsistent points in P, we remove the points that are out of the field-of-view of the other ToF camera. We then project the remaining inconsistent points toward the other camera. During the projection, if any inconsistent point of one view approaches any point of the other view within the minimum distance δ in Eq. (5), we conclude that the point is occluded in the other camera's viewpoint. We remove all these occluded points and denote all remaining points as translucent points $P_{tr}^L$ and $P_{tr}^R$. Note that even though this detection method finds most translucent points in general, translucent points can be mistakenly removed in some complicated scenes. This affects the translucent point recovery performance. (We will show experimental evaluations for our detection and recovery results as well as recovery-only results using ground truth translucent region information.)
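A minimal sketch of the detection rule of Eq. (5), using a k-d tree for the nearest-neighbor distances. The threshold δ and the field-of-view predicate are hypothetical, and the occlusion test by re-projection is only indicated, not implemented.

```python
import numpy as np
from scipy.spatial import cKDTree

def inconsistent_points(cloud_a, cloud_b, delta=0.02):
    """Indices of points in cloud_a farther than delta from every point of
    cloud_b (both registered in the same frame), cf. Eq. (5)."""
    dist, _ = cKDTree(cloud_b).query(cloud_a, k=1)
    return np.nonzero(dist > delta)[0]

def translucent_candidates(cloud_l, cloud_r, in_fov_of_other, delta=0.02):
    """Keep inconsistent points that lie inside the other camera's field of view.

    Occluded points, which re-project to within delta of the other cloud,
    would be removed by an additional projection test omitted here.
    """
    idx = inconsistent_points(cloud_l, cloud_r, delta)
    return idx[[in_fov_of_other(cloud_l[i]) for i in idx]]
```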
Fig. 7(a) shows two registered point clouds (blue and black) from both ToF cameras. The translucent parts are pushed away in different directions toward the background. Fig. 7(b) shows the translucent region detection results (indicated in red). The opaque foreground object gives an idea of where the translucent foreground object was originally located. In real depth images, we can easily observe inconsistent
points that are neither an occlusion case nor a translucent case due to
imaging noise and low resolution (Fig. 7(b) shows some noisy points).
These noisy points (usually around the translucent object boundary)
will be removed by 3D spatial filtering in the next step.
Stereo ToF setup for translucent object imaging works well with an
object with a narrow body or located close enough to the camera so as
to have no overlapped points between the translucent object regions
(see Fig. 5). Frequently, however, overlapping occurs between the two translucent object observations, either when the translucent object is located far from the stereo ToF camera or when the body of the translucent object is wide, as illustrated in Fig. 8. In these cases, the baseline distance of the stereo setup is not long enough to separate the inferred translucent surfaces, and the overlapped observations of the translucent object are not detected. Only the partial (non-overlapped) translucent regions will be detected and reconstructed. If we increase the baseline distance to alleviate this problem, the size of the occluded region increases and the stereo setup becomes larger and less practical. We propose a simple modification of our framework to alleviate this problem. Fig. 9 shows a skewed stereo ToF camera, where the ToF camera on the right is slightly moved backward. The skewed stereo ToF rarely suffers from such an overlap problem, which arises only in the singular case $D_{tr}^R = D_{tr}^L$.
Using Eq. (2), the condition for the singular case is as follows.
$$ \frac{(1-\tau)F_2 + \tau^2 B_2}{(1-\tau)F_1 + \tau^2 B_1} = \frac{(1-\tau)(F_2 + \Delta_2) + \tau^2(B_2 + \Delta_2)}{(1-\tau)(F_1 + \Delta_1) + \tau^2(B_1 + \Delta_1)} \qquad (6) $$
where $F_1 = A_f(Q_{f1} - Q_{f2})$, $F_2 = A_f(Q_{f3} - Q_{f4})$, $B_1 = A_b(Q_{b1} - Q_{b2})$, $B_2 = A_b(Q_{b3} - Q_{b4})$, and $\Delta_1$, $\Delta_2$ ($|\Delta_1| = |\Delta_2|$) are the amounts of electric charge change determined by the distance difference between the two ToF cameras and the distance-electric charge relations illustrated in Fig. 1(a).
This equation is simplified as follows:
$$ \tau^2(B_1 - B_2) + (1-\tau)(F_1 - F_2) = 0 \qquad (7) $$
that is the singular condition of our skewed stereo ToF framework. Now,
our skewed stereo setup provides a crucial shape cue of translucent
surface that will be used in our depth refinement step (Fig. 9). Detection
of translucent region works as described in the previous section
(Fig. 10).
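The singular condition of Eq. (7) can be checked per point pair; a small sketch with F_1, F_2, B_1, B_2 and τ as defined above (the values are chosen only to illustrate a case where the condition holds):

```python
def is_singular(F1, F2, B1, B2, tau, eps=1e-6):
    """True when Eq. (7) holds, i.e. even the skewed pair cannot separate the
    two translucent observations (D_tr^R = D_tr^L)."""
    return abs(tau ** 2 * (B1 - B2) + (1 - tau) * (F1 - F2)) < eps

print(is_singular(F1=0.4, F2=0.1, B1=0.2, B2=0.8, tau=0.5))   # True for this choice
```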
3.2. Depth refinement of translucent object
To recover the depth of the translucent object, we minimize the
distance between two translucent surface observations moving each
point along its ray direction. $P_{tr}^L$ and $P_{tr}^R$ are the detected translucent point sets from each ToF camera. If a translucent point is inferred by both cameras and has moved from its original location as shown in Fig. 5(b), we assume that the original location of the translucent point should be around the intersection of the two light rays from both cameras. Our method minimizes the distance between the two point sets by moving them along their respective light rays bidirectionally. We introduce an iterative energy minimization framework to find the optimal intersections of the two interconnected point sets. Our energy function at the t-th iteration, for each 3D point connected to its eight neighboring pixels, is defined as follows:
$$ \text{Energy} = O_t + \lambda R_t + \sigma T_t \qquad (8) $$
where λ and σ determine the weights of the score terms. The energy is minimized over iterations by moving the 3D points along their respective ray directions. The iteration stops when the average change of the energy falls below a decision boundary.
Fig. 6. Sample translucent object with stereo ToF camera.
Fig. 7. Detection using stereo ToF camera. Translucent foreground points are indicated in red. Some noisy points are misclassified as translucent foreground in (b).
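One possible way to organize this minimization, sketched here purely as an illustration, is a greedy per-point search: each detected translucent point is tentatively moved a small step forward or backward along its own viewing ray, the move is kept only if the total energy of Eq. (8) decreases, and the loop stops when the energy change falls below a tolerance. The cost is passed in as a callable standing for O_t + λR_t + σT_t; the step size and tolerance are illustrative, and the paper does not prescribe this particular optimizer.

```python
import numpy as np

def refine(points, rays, energy, step=0.005, tol=1e-4, max_iter=100):
    """Greedy per-point line search along each viewing ray.

    points: (N, 3) translucent points, rays: (N, 3) unit ray directions,
    energy: callable mapping an (N, 3) array to the scalar of Eq. (8).
    """
    prev = energy(points)
    for _ in range(max_iter):
        for i in range(len(points)):
            for sign in (1.0, -1.0):
                trial = points.copy()
                trial[i] += sign * step * rays[i]      # move along the light ray
                if energy(trial) < energy(points):
                    points = trial
                    break
        cur = energy(points)
        if abs(prev - cur) < tol:                      # stopping criterion
            break
        prev = cur
    return points
```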
Data Cost: O is the data cost for finding intersections between
translucent point rays from both cameras. In reality, a light ray of one
camera never intersects with any light ray of the other camera because
the cameras have limited resolution and no exact correspondences are
guaranteed. Furthermore, discrete 3D points of a non-planar surface
from different viewpoints cannot match each other perfectly as
illustrated in Fig. 11. We detect a set of intersecting points between a
light ray of a translucent point of one camera and the translucent surface of the other camera. Our data cost O is the summation of the minimum Euclidean distance from each translucent point of one camera to the translucent point cloud of the other camera.
$$ O_t = \sum_i \min_j \left\lVert p_i^L - p_j^R \right\rVert + \sum_j \min_i \left\lVert p_j^R - p_i^L \right\rVert \qquad (9) $$

where $p_i^L \in P_{tr}^L$ and $p_j^R \in P_{tr}^R$ are the observed translucent points from both cameras, and i and j are the indices of the translucent points. Note that the locations of $p_i^L$ and $p_j^R$ are updated at each iteration along the respective light ray directions. With this cost, each point tries to find the closest corresponding point from the other camera independently, approaching the intersection between its light ray and the corresponding surface of the other camera.

Regularization Cost: R is the regularization score enforcing depth similarity among the valid depth values of the eight neighboring pixels in the depth image ($p_n^L \in P_{tr}^L$ and $p_m^R \in P_{tr}^R$). n and m are neighbor pixel indices for each translucent point cloud. The depth similarity term favors a surface having similar depth values rather than keeping the observed shape of the translucent surface. In fact, the depth distortion of a translucent surface does not vary linearly with its ground truth depth, and the depth observation shows a nonlinear distortion from the original shape. Furthermore, translucent surfaces are often highly reflective, which introduces an additional depth distortion.

$$ R_t = \sum_i \sum_n \left\lVert p_i^L - p_n^L \right\rVert + \sum_j \sum_m \left\lVert p_j^R - p_m^R \right\rVert \qquad (10) $$
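The two costs can be evaluated as below; this is only a sketch, again using a k-d tree for the closest-point queries of Eq. (9), and assuming that the 8-neighbor index lists for Eq. (10) have been precomputed from the depth-image grid.

```python
import numpy as np
from scipy.spatial import cKDTree

def data_cost(pts_l, pts_r):
    """Eq. (9): symmetric sum of closest-point distances between the two
    translucent point sets."""
    d_lr, _ = cKDTree(pts_r).query(pts_l, k=1)
    d_rl, _ = cKDTree(pts_l).query(pts_r, k=1)
    return d_lr.sum() + d_rl.sum()

def regularization_cost(pts, neighbors):
    """Eq. (10) for one view: sum of distances between each point and its
    valid 8-connected neighbors (neighbors[i] is a list of point indices)."""
    return sum(np.linalg.norm(pts[i] - pts[j])
               for i in range(len(pts)) for j in neighbors[i])
```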
Topology Cost: The topological consistency term T favors keeping the relative depth locations of the observed points within the surface points.
Fig. 8. Overlapped translucent observations with a translucent object located far from the cameras or having a wide body.
Fig. 9. Skewed stereo ToF Camera.
Different from the regularization term R, the topological consistency term is introduced to recover the original surface shape from the nonlinear distortion.

$$ T_t = \sum_i t_i^L + \sum_j t_j^R, \qquad t_i^L = \begin{cases} \left\lVert \hat{p}_i^L - \hat{p}_{i+1}^L \right\rVert & \text{if } \hat{p}_i^L < \hat{p}_{i+1}^L \\ 0 & \text{otherwise} \end{cases} \qquad (11) $$

where $\hat{p}_i^L$ and $\hat{p}_j^R$ are the depth points sorted by their depth values in descending order ($t_j^R$ is defined analogously). In our translucent surface model, we know that the depth distortion occurs along each ray direction. Based on this observation, the order of the depth values in the image plane can represent the detailed shape of the object surface. Any small or partial change from the original shape will change this sorted depth order. If there is any reversal in the order, the related points are penalized so as to keep the original shape. However, the current observations used in our optimization framework have a limitation in recovering the original shape, because any information about the original shape of translucent objects is totally lost in those distorted depth images. The optimization merely tunes to favor either depth similarity or the observed (distorted) topological consistency during the iterations. In most circumstances, it projects the detected translucent depth points onto somewhere in the intersecting regions having similar depth values, partially keeping the current topology (Fig. 12). This problem is addressed in the next section.
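A sketch of the topology term of Eq. (11) for one view: the indices that sorted the initial depth observations in descending order are kept fixed, and any neighboring pair that later violates that order contributes its depth gap as a penalty. The per-point depth values along the viewing rays are assumed to be given.

```python
import numpy as np

def topology_cost(depths, order):
    """Eq. (11) for one view.

    depths: current per-point depths; order: indices that sorted the initial
    depths in descending order. A reversal (a later point now deeper than an
    earlier one) is penalized by the size of the gap.
    """
    d = depths[order]
    gaps = d[1:] - d[:-1]          # should stay <= 0 for a descending order
    return float(np.sum(gaps[gaps > 0]))

# Example: the initial order was [2.0, 1.8, 1.5]; after an update the second
# point moved behind the first one, producing a penalty of about 0.1.
order = np.argsort(-np.array([2.0, 1.8, 1.5]))
print(topology_cost(np.array([2.0, 2.1, 1.5]), order))
```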
3.3. Shape recovery from modified topological cost
Shape cue from IR intensity images. As we pointed out in the previous section, the observed shape of a translucent object is distorted such that it cannot be directly used to recover the details of the original shape. The main reason for this distortion is the non-linear characteristic of the translucent depth calculation studied in detail in Sec. 2. However, based on the knowledge of this non-linear distortion, we can recover the original location of each point iteratively as follows. In order to recover the fine details of the original translucent surfaces, we utilize a pair of depth images and the corresponding IR intensity images taken from our skewed stereo ToF camera. As investigated in Eq. (2) and in the experiments shown in Fig. 3, different distances to an object give different IR amplitudes $A_f$ and $A_b$. Therefore, Eq. (2) calculated from multiple depth images can give new information about the relation between the electric charge values of the foreground and the background, so that we can infer the location of the original translucent depth in terms of $(Q_{f1} - Q_{f2})$ and $(Q_{f3} - Q_{f4})$. The unknown variables in this framework are the IR amplitudes $A_f$ and $A_b$, the translucency τ, and the real depth values at the first camera, $D_f$ and $D_b$, of the translucent foreground and the opaque background behind it. With our skewed stereo setup, if a background point of one camera can be seen from the other camera, we can specify $A_b$ and $D_b$, reducing the number of unknown variables. In general, a translucent foreground object has a homogeneous surface, i.e., identical translucency and opaque reflectivity. We can then extend our calculations, with the obtained homogeneous translucency τ and reflectivity $A_f$, to the neighboring foreground points whose background cannot be seen from either of the stereo cameras.

Fig. 10. Detection using skewed stereo ToF camera. Translucent foreground points are indicated in red.
Fig. 11. Discrete 3D points of a non-planar surface from different viewpoints are taken from different surface locations.

Let the two opaque background observations of corresponding foreground depth points from our skewed stereo camera be $D_b^L$ and $D_b^R$. In reality, finding a pair of corresponding foreground depth points from a stereo camera requires accurate registration of the depth images. However, the observed depth points cannot be used to register each other, because they are all potential translucent points with wrong depth values before our recovery step. Therefore, we consider that the background points share similar IR intensity and depth values. In Sec. 5.2, we experimentally show the effect of the IR intensity variation of the background on our depth recovery result. From Eq. (1), we express $D_b^L$ and $D_b^R$ in terms of $B_1 = A_b(Q_{b1} - Q_{b2})$, $B_2 = A_b(Q_{b3} - Q_{b4})$, Δ, $\beta_1$ and $\beta_2$.

$$ D_b^L = \frac{c}{2}\tan^{-1}\!\left(\frac{B_2}{B_1}\right), \qquad D_b^R = \frac{c}{2}\tan^{-1}\!\left(\frac{B_2 \pm \Delta\beta_2}{B_1 \pm \Delta\beta_1}\right) \qquad (12) $$

where Δ is the increment of the camera distance along the depth direction, and $\beta_1$ and $\beta_2$ are the slopes of the $(Q_{f1} - Q_{f2})$ and $(Q_{f3} - Q_{f4})$ graphs along the distance in Fig. 1(b). $B_1$ and $B_2$ become

$$ B_1 = \frac{\Delta\beta_2 \mp \Delta\beta_1 Z_2}{Z_2 - Z_1}, \qquad B_2 = Z_1 B_1 \qquad (13) $$

where $Z_1 = \tan\!\left(\frac{2D_b^L}{c}\right)$, $Z_2 = \tan\!\left(\frac{2D_b^R}{c}\right)$, and $\pm\Delta\beta_2$ and $\pm\Delta\beta_1$ are all known values from our observations and Fig. 1(b). Finally, we can determine $A_b$ from the IR intensity $I_b^L$ observed by the left camera and the constraint $|Q_1 - Q_2| + |Q_3 - Q_4| = \frac{Q_1 + Q_2 + Q_3 + Q_4}{2} = K$ as depicted in Fig. 1. Note that the amplitude of a reflected IR signal (the strength of the reflected IR signal at a time instance) cannot be observed directly and can only be induced from the IR intensity (the strength of the reflected IR signal during a fixed time duration, such as the integration time of the ToF camera).

$$ A_b = \frac{I_b^L}{2(|B_1| + |B_2|)} \qquad (14) $$

Similarly, $I_f^L$ and $I_f^R$ are the two IR intensity observations of a point on the translucent foreground surface, defined as follows.

$$ I_f^L = (1-\tau)A_{f1}K + \tau^2 A_{b1}K, \qquad I_f^R = (1-\tau)A_{f2}K + \tau^2 A_{b2}K = (1-\tau)A_{f1}\left(\frac{D_f}{D_f+\Delta}\right)^2 K + \tau^2 A_{b1}\left(\frac{D_b}{D_b+\Delta}\right)^2 K \qquad (15) $$

In practice, we need to account for the light attenuation, which is inversely proportional to the square of the flying distance. For that, we substitute $A_{f2} = A_{f1}\left(\frac{D_f}{D_f+\Delta}\right)^2$ and $A_{b2} = A_{b1}\left(\frac{D_b}{D_b+\Delta}\right)^2$. From Eq. (15), we derive the following relation.

$$ \left(\frac{D_f}{D_f+\Delta}\right)^2 = \frac{I_f^R - \tau^2 C_2}{I_f^L - \tau^2 C_1} \qquad (16) $$

Note that $I_f^L$, $I_f^R$, $C_1 = A_b K$ and $C_2 = A_b K\left(\frac{D_b}{D_b+\Delta}\right)^2$ are known constants from our previous derivations and observations. The value $\left(\frac{D_f}{D_f+\Delta}\right)^2$ is used to infer the original shape of the translucent surface in the following.
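Eq. (16) is used in both directions during the iterations. Given a current foreground depth estimate D_f and the camera offset Δ, the left-hand ratio is fixed and τ² can be solved for; conversely, given τ, the right-hand side yields the ratio from the intensity pair. A small sketch, assuming the intensities and the constants C_1, C_2 are available from the derivations above:

```python
def ratio_from_depth(d_f, delta):
    """Left-hand side of Eq. (16): (D_f / (D_f + delta))**2."""
    return (d_f / (d_f + delta)) ** 2

def tau_squared(i_f_left, i_f_right, c1, c2, d_f, delta):
    """Solve Eq. (16) for tau**2, given a foreground depth estimate D_f."""
    r = ratio_from_depth(d_f, delta)
    return (i_f_right - r * i_f_left) / (c2 - r * c1)

def ratio_from_tau(i_f_left, i_f_right, c1, c2, tau):
    """Right-hand side of Eq. (16), used to compare candidate depths."""
    return (i_f_right - tau ** 2 * c2) / (i_f_left - tau ** 2 * c1)
```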
Fig. 12. (a) Depth similarity and (b) topological consistency terms force 3D points to move as illustrated.
Table 1. Experimental results of our translucent surface detection and depth recovery (rows: test object, depth observation, translucent depth points detected and recovered, translucent surface reconstruction). The solid line box in the second row represents the original depth region of each translucent surface; the dotted line box in the fourth row indicates the reconstructed translucent object.

Topological consistency update. Let us revisit our topological consistency term. $p_i^L$ and $p_{i+1}^L$ are two neighboring depth pixels in our sorted list of depth points on a translucent object surface. In the current iteration, they have corresponding points $p_j^R$ and $p_{j+1}^R$, respectively. Then we can calculate $\left(\frac{D_f}{D_f+\Delta}\right)^2$ from the $p_i^L$ and $p_j^R$ pair, and $\left(\frac{D'_f}{D'_f+\Delta}\right)^2$ from the $p_{i+1}^L$ and $p_{j+1}^R$ pair. Finally, if $\left(\frac{D_f}{D_f+\Delta}\right)^2 > \left(\frac{D'_f}{D'_f+\Delta}\right)^2$, then we can conclude $D_f > D'_f$, and vice versa. This assigns new topology constraints on $\hat{p}_i^L$ and $\hat{p}_{i+1}^L$ as well as on $\hat{p}_j^R$ and $\hat{p}_{j+1}^R$, recovering the original shape of the translucent object. In order to drive this topology inference, we iteratively update the translucency term τ. First, we initialize $D_f(1)$ by the data term $O_1$ and put it into Eq. (16) to calculate the initial τ. Given the initial τ, we can activate our topology inference so as to compute an initial topological consistency score $T_1$ in Eq. (11). In the next step, the data term updates the foreground depth value to $D_f(2)$, and τ and the topology inference are updated accordingly; following these iterations, the estimates approach the ground truth translucency and the original shapes.
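The pairwise ordering test reduces to comparing the Eq. (16) ratios of neighboring points; a minimal sketch, where each point carries its intensity pair and constants from the skewed stereo observations (all names are illustrative):

```python
def deeper(point_a, point_b, tau):
    """True if point_a is farther from the camera than point_b.

    Each point is a tuple (I_f_left, I_f_right, C1, C2). Since D / (D + delta)
    grows with D, a larger Eq. (16) ratio implies a larger foreground depth.
    """
    def ratio(p):
        i_l, i_r, c1, c2 = p
        return (i_r - tau ** 2 * c2) / (i_l - tau ** 2 * c1)
    return ratio(point_a) > ratio(point_b)
```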
4. Experimental results
To build our skewed stereo ToF camera setup, we use a pair of MESA SR4000 ToF depth cameras. The baseline distance between the two cameras is set to 30 cm and the right ToF camera is moved backward by 25 cm. We perform both qualitative and quantitative experiments to evaluate the proposed method. In our experiments, the weights λ and σ in Eq. (8) are empirically set to 1.2 and 0.8, respectively.
Fig. 13. Translucent depth refinement results on various challenging objects in both 2D and 3D depth image representations.
Fig. 14. Translucent depth refinement results on various challenging objects in both 2D and 3D depth image representations (continued).
Fig. 15. The two-layer model-based method reconstructs a rather flat surface compared to the original shape for three-layer objects such as Q3.
4.1. Qualitative evaluation
Table 1 illustrates several experimental results as 3-dimensional point clouds. T1 is a planar translucent object attached to a planar opaque object. From the opaque region (lower body), we can infer where the translucent region has to be located (solid line box). The 3D points on the translucent surface (red dots in the fourth row) are correctly relocated to the original location (dotted line box). T2 is a
transparent bottle with an opaque label and cap on top. Except for these opaque regions, all other points on the transparent surface are scattered backward and present severe shape distortion. This is because its surface includes variable thickness or varying specularities, introducing additional distortions in the observation. T3 and T4 are relatively less translucent and the observed depth points are not so far from the original location. In T3, the two observed point clouds overlap each other. As a result, only a subset of the translucent points is chosen as inconsistent points. Nevertheless, our recovered depth shows the complete original shape, since the observations from the two viewpoints complement the missing points. T4 has both translucent and transparent regions. In particular, the observation of the transparent region appears near the background. Although the target surface has inhomogeneous material characteristics, we can successfully recover its original shape. As long as they are identified as inconsistent points, they can be treated as transparent points and undistorted, as shown in this result. Our method reconstructs the original shape by virtue of the modified topological consistency. In the last column of Table 1, we visualize the close-up view of our recovered depth points for T3 and T4 inside the thick
boxes. These illustrations show that our method recovers detailed
shapes of the original translucent object.
Figs. 13 and 14 summarize our translucent depth recovery results on
various challenging objects using both 2D and 3D representations. Each
object is located on a gray box or a clamp indicating their ground truth
positions. 3D mesh visualization of objects Q1 and Q2 and their depth
images before and after our recovery show that our method is able to
recover detailed original shape of translucent objects. In Q3 and Q4,
overall translucent regions are well recovered yet some distorted points
on a specular surface are not restored correctly. Q5 and Q6 have almost
transparent regions that are correctly recovered. However, the boundary
of these transparent objects in the reconstructed depth image has many
discontinuities. In the translucent object detection step, the points
around the boundary of a transparent object have difficulties in finding
corresponding points from the other camera view due to the interference
of background points. Q8 and Q10 are thin objects that coincide well
with our two-layer model. Q3, Q4, Q5 and Q6 have two thin layers in
those foreground objects (total of three layers including background),
and our two-layer model-based method reconstructs a rather flat surface
compared to the original shape (Fig. 15). Q7 and Q9 are thick objects that
have light refraction and attenuation within the foreground media. Our
method shows robust results in these cases. Different amounts of refraction at the surface points yield different estimated translucency values, even for a uniformly translucent object. As a result, the recovered shape will be distorted according to the differences in the amount of refraction.
Fig. 16 shows object Q9 from another viewpoint with a close-up view of its surface details. The detailed shape of the object is recovered better at the upper part of the object after our refinement. Even though it appears that the original surface detail is lost in the observed depth, our shape recovery step reverses such a distortion process using a two-layer translucent surface model accounting for the ToF principle. As a result, we can recover the original shape of the translucent object. The average processing time for our dataset is around 360 s
using an Intel i3-3.2GHz desktop PC. If there is a multi-layer translucent
object or complex background situation that cannot be explained well
with our two-layer model, recovered surface shape will present
additional distortions. This issue will be discussed in Sec. 5.
4.2. Quantitative evaluation
In order to measure the performance of our method, we recover the
shape of a transparent cup having controlled multiple levels of translucency. We vary the translucency by adding white opaque ink into the
water in stages. Fig. 17 shows some samples of target objects for the
evaluation. Overall, we have 11 stages from transparent to opaque.
The most opaque stage serves as our ground truth. We compute the
average error for all other 10 stages (τ-Index = 1 to 10) by comparing
with the ground truth before and after our depth refinement.
Fig. 16. Detailed surface shape refinement of Object Q9: (a) before and (b) after refinement.
Fig. 17. Sample translucent objects for quantitative evaluation: Translucency index (τ-Index) 1, 4, 7 and 10.
Fig. 18 shows our quantitative evaluation results. The proposed method is evaluated in two ways. In the first experiment, both our translucent region detection and recovery are evaluated. In the second experiment, only translucent region recovery is performed, based on ground truth translucent region information. For the first 9 cases, our method shows reasonable refinement results. The overall average depth errors in the translucent object before and after our proposed detection and refinement are 76 mm and 10 mm, respectively. (The average accuracy of the SR4000 camera with opaque objects is known to be around 10 mm.) Thus 86.79% of the distance error is removed from the original observation of the translucent object, achieving opaque-object-level accuracy. In the last (10th) case, the observed depth image is completely transparent, so it is difficult to correctly detect the translucent region. Consequently, only a subset of the translucent region is correctly recovered, and the average error after our refinement increases significantly.
5. Discussion
5.1. Validity of our two-layer model
Our two-layer model assumes that the foreground object is thin enough to be considered a single layer causing reflection. Furthermore, the background surface albedo is set to 1, meaning that all IR light bounces back to the camera. For some real objects, a generalized multi-layer model with reflectivity terms less than 1 is required to correctly represent and reconstruct their shapes in arbitrary situations. A refraction and attenuation model of the foreground medium can also be included in the generalized model. However, this extension adds extra unknown variables to our formulation, requiring extra observations. In order to keep our solution practical, we propose to use a two-layer model that effectively approximates most regular translucent objects, as shown in our experimental results.
5.2. Complex background
In our shape recovery step, complex background such as black and
white patterns or complex geometry can introduce additional errors
in the refinement result. This is because our method assumes that an occluded background surface from one viewpoint has characteristics similar to the background surface seen from the other viewpoint of our stereo camera. If a translucent object is located close enough to the camera that every background surface can be seen from either camera, it is not necessary to consider the type of background. However, in many cases, there exist background surfaces that cannot be seen from one or both of our stereo cameras. Fig. 19 is an example of our refinement result with a black and white pattern on the background. A flat translucent object surface is reconstructed for both background cases, showing that the complex background results in a distorted surface.
Fig. 18. Quantitative evaluation results.
5.3. Effect of the smoothness term
In our energy function in Eq. (8), the regularization cost $R_t$ yields a smooth reconstructed surface. It helps to alleviate environmental variation in shape recovery and keeps our refinement robust to unknown noise sources. On the other hand, minute surface shape details can be lost after refinement. In our experiments, the weight λ for the regularization cost is empirically set to 1.2, while the weight σ for the topology cost is empirically set to 0.8. This means that the configuration prefers a smoother surface that removes unexpected noise over the reconstruction of detailed shape on the translucent surface. If an object has a homogeneous foreground with a uniform background without much geometrical variation, and the object is thin enough to be explained by our simple two-layer model, we can increase the ratio σ/λ to focus more on the reconstruction of detailed shape.
Fig. 19. Translucent surface refinement with uniform and non-uniform background.
6. Conclusion
In this paper, we propose a skewed stereo ToF depth camera for
transparent and translucent object detection and distorted depth
refinement. We find that translucent surfaces present a systematic depth distortion which prevents the direct use of a traditional stereo matching scheme. In order to account for depth distortions on translucent surfaces, we model these systematic depth distortions based on an understanding of the sensing principle and an experimental study. On top of
the theoretical model, we develop an iterative optimization framework.
Our optimization framework includes a topological consistency term
which helps in recovering the details of surfaces.
In the future, we plan to extend our proposed method to detecting and reconstructing highly reflective (specular) surfaces. In general, the surface of a translucent object is highly reflective as well. Reflective surface detection and recovery prior to our translucent object refinement would improve the overall refinement performance. Increasing the number of cameras to more than two would also improve the accuracy of the depth refinement, because it gives a better approximation for the topology consistency term.
The proposed method can improve not only the quality of three-dimensional reconstruction, but also human-computer interaction and robot navigation. For instance, an autonomous vehicle equipped with our method can correctly locate translucent obstacles and evade them successfully.
Acknowledgment
This work was supported by Kyung Hee University in 2013 under
grant KHU-20130684. This work was also supported by the Global
Frontier R&D Program on “Human-centered Interaction for Coexistence”
funded by the National Research Foundation of Korea grant funded by
the Korean Government (MSIP) (2012M3A6A3057376).
References
[1] S. Albrecht, S. Marsland, Seeing the Unseen: Simple Reconstruction of Transparent
Objects from Point Cloud Data, 2nd Workshop on Robots in Clutter, 2013.
[2] U. Klank, D. Carton, M. Beetz, Transparent Object Detection and Reconstruction on a
Mobile Platform, IEEE International Conference on Robotics and Automation, 5971–
5978, 2011.
[3] N. Alt, P. Rives, E. Steinbach, Reconstruction of Transparent Objects in Unstructured
Scenes with a Depth Camera, IEEE International Conference on Image Processing,
2013.
[4] I. Lysenkov, V. Eruhimov, G. Bradski, Recognition and Pose Estimation of Rigid Transparent Objects with a Kinect Sensor, Robotics: Science and Systems Conference, 2012.
[5] T. Wang, X. He, N. Barnes, Glass object localization by joint inference of boundary and depth, International Conference on Pattern Recognition, 2012.
[6] C.J. Phillips, K.G. Derpanis, K. Daniilidis, A Novel Stereoscopic Cue for Figure-Ground
Segregation of Semi-Transparent Objects, IEEE International Conference on Computer
Vision Workshops, 2011.
[7] I. Ihrke, K. Kutulakos, H. Lensch, M. Magnor, W. Heidrich, State of the art in transparent
and specular object reconstruction, STAR Proc. of Eurographics (2008).
[8] S. Yang, C. Wang, Dealing with laser scanner failure: Mirrors and windows, Proc. of the IEEE Intl. Conf. on Robotics and Automation, 2008.
[9] M. Fritz, M. Black, G. Bradski, T. Darrell, An Additive Latent Feature Model for Transparent Object Recognition, Advances in Neural Information Processing Systems, 2009.
[10] V.R. Kompella, P. Sturm, Detection and Avoidance of Semi-Transparent Obstacles
Using a Collective-Reward Based Approach, IEEE International Conference on
Robotics and Automation, 2011.
[11] K. McHenry, J. Ponce, A Geodesic Active Contour Framework for Finding Glass, IEEE
Conference on Computer Vision and Pattern Recognition, 2006.
[12] K. McHenry, J. Ponce, D. Forsyth, Finding Glass, IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2005.
[13] F. Mériaudeau, R. Rantoson, D. Fofi, C. Stolz, Review and comparison of non-conventional imaging systems for three-dimensional digitization of transparent objects, Journal of Electronic Imaging 21 (2) (2012) 021105.
[14] A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, R. Raskar,
Coded Time of Flight Cameras: Sparse Deconvolution to Address Multipath
Interference and Recover Time Profiles, ACM Transactions on Graphics, 2013.
[15] Z. Zhang, A Flexible New Technique for Camera Calibration, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 2000.
[16] C. Inoshita, Y. Mukaigawa, Y. Matsushita, Y. Yagi, Shape from Single Scattering for
Translucent Objects, In Proceedings of European Conference on Computer Vision, 2012.
[17] K.N. Kutulakos, E. Steger, A Theory of Refractive and Specular 3D Shape by Light-Path
Triangulation, In Proceedings of IEEE International Conference on Computer Vision,
2005.
[18] K.N. Kutulakos, E. Steger, A Theory of Refractive and Specular 3D Shape by Light-Path Triangulation, International Journal of Computer Vision, 2008.
[19] N.J. Morris, K.N. Kutulakos, Dynamic Refraction Stereo, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2011.
[20] H. Murase, Surface Shape Reconstruction of an Undulating Transparent Object,
International Conference on Computer Vision, 1990.
[21] H. Murase, Surface Shape Reconstruction of a Nonrigid Transparent Object Using
Refraction and Motion, IEEE Transactions on Pattern Analysis and Machine Intelligence,
1992.