Image and Vision Computing 43 (2015) 27–38

Skewed stereo time-of-flight camera for translucent object imaging☆

Seungkyu Lee a,1, Hyunjung Shim b
a Department of Computer Engineering, Kyung Hee University, South Korea
b School of Integrated Technology, Yonsei University, South Korea

☆ This paper has been recommended for acceptance by Richard Bowden, PhD.
E-mail address: [email protected] (S. Lee).
1 Tel.: +82 1091979285.

Article history: Received 11 October 2014; Received in revised form 27 July 2015; Accepted 2 August 2015; Available online 18 September 2015.

Keywords: Translucent object imaging; ToF depth camera; Three-dimensional image processing.

Abstract. Time-of-flight (ToF) depth cameras are widely used in applications such as 3D imaging, 3D reconstruction, human interaction and robot navigation. However, conventional depth cameras cannot image translucent objects, which occupy a substantial portion of real-world scenes, and this limitation prevents realistic imaging with depth cameras. In this work, we propose a new skewed stereo ToF camera for detecting and imaging translucent objects with minimal prior knowledge of the environment. We find that the depth calculated by a ToF camera observing a translucent object is systematically distorted, because the sensor observes superposed light reflected from multiple surfaces. We propose a stereo ToF camera setup and derive a generalized depth imaging formulation for translucent objects. The distorted depth values are refined by an iterative optimization. Experimental evaluation shows that the proposed method reasonably recovers the depth image of translucent objects.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Owing to their ability to acquire three-dimensional geometry directly, consumer depth cameras have been widely applied to 3D reconstruction, human interaction, mixed reality and robotics. However, depth images recorded by ToF depth cameras have limited accuracy because 1) commercial active sensors have fairly limited infrared emission power and 2) their performance varies with surface material characteristics such as reflectivity and translucency, a limitation inherited from the ToF sensing principle. Many post-processing algorithms for 3D imaging have been proposed under the Lambertian assumption of a matte surface, and depth cameras themselves also operate under this assumption. In particular, depth cameras cannot detect and image translucent objects, even though such objects occupy a substantial portion of real-world scenes. In order to achieve accurate interaction and realistic imaging, it is critical to handle translucent objects.

Many researchers have tried to detect transparent and translucent object regions [7] using conventional color cameras [9], laser beams [8], or prior shape knowledge [4]. This is particularly important for navigating a robot or vehicle, because collision detection for transparent objects is essential in real situations [10]. McHenry et al. [12] and McHenry and Ponce [11] characterize transparency using distinctive properties of transparent objects in color images, such as texture distortion and specularity. Lysenkov et al. [4] require prior knowledge of the shape of the transparent object: during training they obtain a 3D model of the object and try to match the model with
the captured depth image. However, such a shape prior is often unrealistic, and it is hard to extend this idea to general applications. Wang et al. [5] detect the transparent region from aligned depth and color distortion. Phillips et al. [6] use stereo color cameras to segment transparent object regions in each color image. Murase [20,21] introduced a method for water surface reconstruction: a pattern is placed at the bottom of a water tank and the water surface is captured by a fixed color camera. As the water surface changes over time, the captured pattern at the bottom is distorted accordingly. Given the camera position, the water surface normal is estimated at each frame, reconstructing the shape of the water surface. Similarly, Kutulakos and Steger [17,18] use a piece-wise light triangulation method to recover a refractive surface with a known refractive index and a known pattern. Recently, Morris and Kutulakos [19] developed a stereo imaging system using a known background pattern and reconstructed a dynamic liquid surface without knowing the refractive index. Inoshita et al. [16] assume a homogeneous surface with a known refractive index and explicitly calculate the height of a translucent object surface in the presence of inter-reflection. Meriaudeau et al. [13] review new imaging architectures for transparent object reconstruction, such as shape from polarization using IR light.

Some researchers have used multiple depth cameras for shape recovery. They collect transparent object silhouettes from multiple viewpoints and apply a visual hull for reconstruction. Klank et al. [2] assume a planar background and take multiple IR intensity images of a transparent object using a ToF depth camera. Since each shadow on the planar background represents the silhouette of the transparent object for one camera, the volume of the object is estimated by a visual hull. Similarly, Albrecht and Marsland [1] use shadows on a planar background and take multiple depth images to collect the shadow regions of transparent or highly reflective objects at different viewpoints. Alt et al. [3] propose a similar framework using multiple depth images from moving cameras and show rough reconstruction results. In these frameworks the number of viewpoints significantly affects the quality of the volume recovery, and the approach is only applicable to static scenes. Kadambi et al. [14] propose a coded time-of-flight camera and perform sparse deconvolution of the modulated light to separate multipath components from the reflected signal; the depth of a transparent object is recovered by extracting the light path reflected from the transparent object's surface. These previous methods work only under special background conditions or require many cameras to obtain reasonable reconstruction performance. Translucent object reconstruction without environmental constraints, using a practical imaging setup, remains a challenging problem.

In this paper, we propose a new approach to detecting a translucent object and recovering its distorted depth using a skewed stereo ToF depth camera. Our approach is implemented with a pair of commercial depth cameras, without any assumption except that the observed scene is composed of two layers: a single foreground and a background. Klank et al. [2] show that ray intersections observed by multiple color cameras reveal the rough location of the original depth.
We are inspired by this preliminary idea and generalize it to image arbitrary translucent surfaces with detailed shape. In fact, translucent surfaces exhibit a systematic depth distortion that prevents a traditional stereo matching scheme from being applied directly. To account for the depth distortion of translucent surfaces, we develop a new framework that models this systematic distortion based on an understanding of the sensing principle and on an empirical study.

The proposed algorithm consists of two stages. Utilizing the behavior of the depth distortion on translucent surfaces, we first detect the translucent surface region. To process the translucent regions, we formulate an energy function for depth optimization so as to recover the depth of translucent surfaces (Sec. 3.2). The optimization is constrained by three cost terms: a modified stereo depth matching cost that accounts for the systematic depth distortion, a regularization cost for noise elimination, and a depth topology cost for shape recovery of the target object. In particular, the topology term is iteratively updated using the IR intensity observation (i.e., the texture image of the ToF depth sensor) and the analysis of the ToF principle for translucent objects (Sec. 2). As a result, we can recover the original shape of the translucent object (Sec. 3.3). The contributions of this work are (1) a generalized framework for detecting a translucent object and recovering its distorted depth in real time with minimal prior knowledge, and (2) a thorough analysis of the ToF principle on translucent surfaces for the recovery of detailed surface shape.

2. ToF principle in translucent objects

A time-of-flight camera emits an IR signal with a fixed wavelength and measures the time the signal takes to travel between the camera and the target object. Knowing the speed of light and the traveling time, the traveling distance follows directly. In practical implementations, the phase delay between the reflected and the emitted IR is used instead of measuring the traveling time directly. For commercial depth cameras, the phase delay is defined by the relation between N different electric charge values collected in different time slots. In this paper we consider the case N = 4, and Fig. 1(a) illustrates an example of depth calculation using four electric charges. Considering the emitted IR signal as a square wave, we can derive the depth D as follows:

D = \frac{c}{2}\tan^{-1}\frac{A(Q_3 - Q_4)}{A(Q_1 - Q_2)} = \frac{c}{2}\tan^{-1}\frac{Q_3 - Q_4}{Q_1 - Q_2},    (1)

where A is the amplitude of the reflected IR, Q_1, ..., Q_4 are the normalized electric charges and c is the speed of light. Note that the amplitude A varies with the distance and with the albedo of the surface. From Fig. 1 we know that |Q_1 - Q_2| + |Q_3 - Q_4| = K and Q_1 + Q_2 = Q_3 + Q_4 = K, where K is a constant.

This principle, however, becomes invalid for a translucent object, because it assumes a single surface producing a single reflected light path. On a translucent surface, a subset of the incident rays is reflected and the rest penetrate the object medium according to its translucency. We consider a target scene composed of a translucent foreground and an opaque background, namely a two-layer surface model as shown in Fig. 2. Layer 1 is the translucent foreground and Layer 2 is the opaque background, where τ (0 ≤ τ ≤ 1) is the translucency. The observed IR signal has two components: IR reflected from Layer 1, (1 − τ)I, and IR reflected from Layer 2, τ²ρ₂I, as shown in Fig. 2.
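Before extending the formulation to two layers, the single-path calculation of Eq. (1) can be summarized as a minimal Python sketch. The function name is ours and the plain c/2 scaling mirrors the paper's schematic form; a physical sensor additionally normalizes by the modulation frequency, so this should be read as a didactic illustration rather than a sensor-accurate model.

```python
import numpy as np

def single_path_depth(q1, q2, q3, q4, c=3.0e8):
    # Eq. (1): the amplitude A cancels in the ratio, so surface texture
    # does not affect the computed depth for an opaque, single-layer scene.
    return 0.5 * c * np.arctan2(q3 - q4, q1 - q2)
```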
Then the reflected IR is the superposition of two different IR signals. Under this circumstance the single-path assumption of Eq. (1) is invalid and the resulting depth is incorrect. Under the two-layer translucent model, Eq. (1) has to be rewritten as

D_{tr} = \frac{c}{2}\tan^{-1}\frac{(1-\tau)A_f(Q_{f3}-Q_{f4}) + \tau^2 A_b(Q_{b3}-Q_{b4})}{(1-\tau)A_f(Q_{f1}-Q_{f2}) + \tau^2 A_b(Q_{b1}-Q_{b2})},    (2)

where τ is the normalized translucency (0 ≤ τ ≤ 1), with τ = 1 meaning that the object is perfectly transparent, and the subscripts f and b denote foreground and background, respectively. This modified formulation shows that the ToF camera observes the overlapped IR signals reflected from the foreground and background objects. In this case, the depth value is determined by the translucency τ and by the two amplitudes A_f and A_b of the reflected IR signals arriving at the sensor.

Fig. 1. Example of time-of-flight depth calculation: four electric charge values Q_1–Q_4 are collected when the emitted IR signal is a square wave; Eq. (1) is used to calculate the distance.
Fig. 2. Two-layer model of a translucent object in a ToF camera. Layer 1 is the translucent foreground, Layer 2 is the opaque background and τ (0 ≤ τ ≤ 1) is the translucency.

Unlike Eq. (1), the IR amplitude terms A_f and A_b are not eliminated in the modified depth formula. Note that A_f and A_b vary with the traveling distance and with the surface texture of the target object. This introduces a new and critical depth distortion: for example, objects at the same distance but with different amplitudes produce different depth values. Fig. 3 illustrates the depth distortions caused by the background texture behind a translucent object. Fig. 3(a) shows the IR intensity and the depth map of a normal opaque object. The IR amplitude varies with the texture on the object, while the calculated depth indicates an identical distance from the camera, because the IR amplitude does not affect the depth calculation in Eq. (1). In Fig. 3(b), however, we place a translucent object between the opaque patterned object and the depth camera. Now Eq. (1) is no longer valid, and we have to use a multi-layer object model, such as our two-layer model in Eq. (2), to analyze the overlapped IR light rays. First of all, the depth of a foreground object with high translucency cannot be measured correctly, because most of the emitted IR rays penetrate the object and travel on to the background (here, a textured object). Furthermore, the IR amplitude of the background object now affects the depth calculation: in Fig. 3(b), different IR amplitudes of the background produce different foreground depth values. In Fig. 3(c) we move the translucent object toward the depth camera. Because the traveling distance of the light reflected from the translucent foreground decreases, the attenuation of the corresponding IR amplitude also decreases. As a result, A_f becomes more dominant in Eq. (2), especially in the region with a darker background, and the calculated depth moves closer to the real location of the translucent foreground (darker blue in Fig. 3(c)). The region with a brighter background, however, keeps its background depth value, because there A_f is relatively less dominant due to the higher A_b. We can see that the foreground depth error over the darker background decreases when we move the translucent object closer to the camera (from (b) to (c) in Fig. 3).

Fig. 3. Depth errors of translucent foreground and opaque background objects. The first row shows IR intensity images, the second row depth images, and the last row the relative locations of the depth camera, the translucent foreground and the opaque background.
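The two-layer distortion of Eq. (2) can be sketched in the same style as the single-path case. The helper below is illustrative (the name and the c/2 scaling are assumptions carried over from the sketch of Eq. (1)); it takes the charge samples each layer would produce in isolation and mixes them with the translucency τ.

```python
import numpy as np

def two_layer_depth(tau, A_f, Q_f, A_b, Q_b, c=3.0e8):
    # Eq. (2): Q_f and Q_b are 4-tuples (Q1..Q4) of the foreground-only and
    # background-only responses; A_f and A_b are their IR amplitudes.
    num = (1 - tau) * A_f * (Q_f[2] - Q_f[3]) + tau**2 * A_b * (Q_b[2] - Q_b[3])
    den = (1 - tau) * A_f * (Q_f[0] - Q_f[1]) + tau**2 * A_b * (Q_b[0] - Q_b[1])
    return 0.5 * c * np.arctan2(num, den)
```

Because A_f and A_b no longer cancel, two pixels at identical foreground and background distances but with different background reflectivity return different D_tr, which is exactly the texture-dependent error visible in Fig. 3(b).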
If we could control the background and decrease A_b, the foreground terms would become more dominant and the calculated depth would move closer to the ground-truth foreground depth. In real situations, however, we have no such control over, or knowledge of, the background object, and we need a generalized solution for translucent object imaging. We observe a similar effect when varying the translucency τ, which also changes the dominance of A_f and A_b in Eq. (2). This experiment shows that the conventional depth calculation fails to obtain the correct depth of a translucent object, and it verifies that our new Eq. (2) for the two-layer model explains what happens with a translucent foreground object in front of an opaque background in real situations.

Another interesting behavior of translucent-object depth is the depth reversal of the miscalculated depth value. In other words, the miscalculated depth following Eq. (2) does not always lie between the ground-truth depths of the translucent foreground and the opaque background: under some conditions it can be farther or closer than both (Fig. 4). Fig. 4 shows two sample depth images of a translucent object. In these two cases, the distance of the translucent foreground object from the camera is changed while the background object is fixed at the maximum operating range of the camera (5 m). In Fig. 4(b) the foreground object is located 2.5 m from the camera, and the observed depth of the translucent object is pushed toward the background object. In Fig. 4(c), however, the foreground object is located about 1 m from the camera, and the observed depth of the translucent object is pulled toward the camera, closer than both the foreground and the background.

Fig. 4. (a) A translucent object and an opaque background; the background is at the end of the sensing range (5 m). (b) The foreground object is located 2.5 m from the camera. (c) The foreground object is located 1 m from the camera.

This situation can easily be explained mathematically using Eq. (2). Based on the two-layer model, we look for cases in which the following condition is satisfied:

\frac{c}{2}\tan^{-1}\frac{(1-\tau)A_f(Q_{f3}-Q_{f4}) + \tau^2 A_b(Q_{b3}-Q_{b4})}{(1-\tau)A_f(Q_{f1}-Q_{f2}) + \tau^2 A_b(Q_{b1}-Q_{b2})} - \frac{c}{2}\tan^{-1}\frac{Q_{f3}-Q_{f4}}{Q_{f1}-Q_{f2}} > 0.    (3)

Let us consider the special case 1 − τ = τ², and let B_1 = A_b(Q_{b1} − Q_{b2}), B_2 = A_b(Q_{b3} − Q_{b4}), F_1 = A_f(Q_{f1} − Q_{f2}) and F_2 = A_f(Q_{f3} − Q_{f4}). Then the condition can be rewritten as

\tan^{-1}\frac{F_1 B_2 - F_2 B_1}{F_1(F_1 + B_1) + F_2(F_2 + B_2)} > 0.    (4)

Fig. 1(b) shows the four possible cases of the signs of (Q_1 − Q_2) and (Q_3 − Q_4). We can easily find examples satisfying this condition. For example, when the background is located at the maximum operating range of the ToF camera (in the fourth quarter) and the foreground lies in the first quarter of Fig. 1(b), we have B_1 > 0, B_2 ≈ 0, F_1 > 0 and F_2 > 0, satisfying Eq. (4). As a result, we observe situations such as Fig. 4(c). In this work we adopt the two-layer translucent model, assuming that there is only one medium change in the translucent object.
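The reversal condition can be probed numerically. The sketch below evaluates the left-hand side of Eq. (4) (up to the positive factor c/2) for given F and B values; a positive result corresponds to the condition of Eq. (3) holding under the special case 1 − τ = τ². The function name is ours and the denominator is assumed to be nonzero.

```python
import numpy as np

def eq4_lhs(F1, F2, B1, B2):
    # Argument of the arctangent in Eq. (4); its sign decides whether the
    # superposed (observed) depth exceeds the true foreground depth (Eq. (3)).
    num = F1 * B2 - F2 * B1
    den = F1 * (F1 + B1) + F2 * (F2 + B2)
    return np.arctan(num / den)
```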
The analysis of the ToF principle in this section will be used in Sec. 3.3 to recover the original shape of a translucent object.

3. Proposed method

3.1. Detection of translucent object

Separating the superposed light rays from a single observation with unknown translucency τ and IR amplitudes A_f and A_b in Eq. (2) is an ill-posed problem. In our framework we therefore use a pair of observations from different viewpoints, captured with a stereo ToF camera as shown in Fig. 5(a). With the stereo setup fixed, we perform an extrinsic calibration [15] between the two ToF cameras to compute the geometric transformation. Given the extrinsic parameters, we can register the two point clouds captured by the stereo depth cameras in the same coordinate system.

Fig. 5. Stereo ToF camera: the translucent object captured by each camera appears either closer or farther than its real location along the respective light ray direction.

Given an opaque object with a completely matte (i.e., Lambertian) surface, the 3D geometry observed from different viewpoints should overlap in the world coordinate system, except in occluded regions. Occlusion always appears with changes in viewpoint or field of view; in general, an occluded 3D point of one camera has no corresponding depth point in the other camera. Similarly, a translucent surface point has no corresponding point in the other viewpoint, because the distortion due to translucency appears along the respective IR ray direction (Fig. 5(b)). Utilizing this view-dependent property, we detect candidate translucent points by comparing the two point clouds from the stereo ToF camera. In order to detect the inconsistent points and distinguish translucent points from occluded points, we employ the following scheme (Fig. 6):

P = \{\, p^L_i, p^R_j \mid \|p^L_i - p^R_m\| > \delta,\ \|p^R_j - p^L_n\| > \delta \ \text{for all } p^L_n \in \Phi^L,\ p^R_m \in \Phi^R \,\},    (5)

where P is the set of inconsistent points, δ is the minimum distance used to decide that a point is inconsistent, and Φ^L and Φ^R are the point clouds inferred by the left and right ToF cameras, respectively. We choose the points of one view p^L_i ∈ Φ^L that are farther than δ from all 3D points of the other view p^R_m ∈ Φ^R and denote them as the inconsistent point set P. Among the inconsistent points in P, we remove the points that are outside the field of view of the other ToF camera. We then project the remaining inconsistent points toward the other camera. During the projection, if an inconsistent point of one view approaches any point of the other view within the minimum distance δ of Eq. (5), we conclude that the point is occluded in the other camera's viewpoint. We remove all such occluded points and denote the remaining points as the translucent points P^L_tr and P^R_tr. Note that although this detection finds most translucent points in general, translucent points can be mistakenly removed in some complicated scenes, which affects the recovery performance. (We report experimental evaluations both for the combined detection and recovery and for recovery alone using ground-truth translucent region information.)

Fig. 7(a) shows the two registered point clouds (blue and black) from both ToF cameras. The translucent parts are pushed toward the background in different directions. Fig. 7(b) shows the detected translucent region (indicated in red). The opaque foreground object gives an idea of where the translucent foreground object was originally located.
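A minimal sketch of the inconsistency test in Eq. (5) is given below, assuming the two clouds are already registered into a common frame as described above. It uses nearest-neighbor queries to find the points farther than δ from every point of the other view; the subsequent occlusion filtering by projection into the other camera is omitted here, and the function name is ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def inconsistent_points(cloud_L, cloud_R, delta):
    # Eq. (5): keep the points of each view whose nearest neighbor in the
    # other view is farther than delta. cloud_L and cloud_R are (N, 3) arrays.
    d_L, _ = cKDTree(cloud_R).query(cloud_L)   # left point -> nearest right point
    d_R, _ = cKDTree(cloud_L).query(cloud_R)   # right point -> nearest left point
    return cloud_L[d_L > delta], cloud_R[d_R > delta]
```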
In real depth images we can easily observe inconsistent points that are neither occlusions nor translucent points, caused by imaging noise and low resolution (Fig. 7(b) shows some noisy points). These noisy points, usually around the translucent object boundary, are removed by 3D spatial filtering in the next step.

Fig. 6. Sample translucent object with the stereo ToF camera.
Fig. 7. Detection using the stereo ToF camera. Translucent foreground points are indicated in red. Some noisy points are misclassified as translucent foreground in (b).

The stereo ToF setup for translucent object imaging works well when the object has a narrow body or is located close enough to the camera that the two translucent object observations do not overlap (see Fig. 5). Frequently, however, the two translucent observations do overlap, either when the translucent object is far from the stereo ToF camera or when its body is wide, as illustrated in Fig. 8. In these cases, the baseline of the stereo setup is not long enough to separate the inferred translucent surfaces, the overlapping observations on the translucent object are not detected, and only the partial (non-overlapping) translucent regions are detected and reconstructed. Increasing the baseline to alleviate this problem enlarges the occluded regions and makes the stereo rig larger and less practical.

Fig. 8. Overlapping translucent observations when the translucent object is located far from the cameras or has a wide body.

We propose a simple modification of our framework to alleviate this problem. Fig. 9 shows a skewed stereo ToF camera, in which the right ToF camera is moved slightly backward. The skewed stereo ToF rarely encounters the overlap problem (D^R_tr = D^L_tr). Using Eq. (2), the condition for this singular case is

\frac{(1-\tau)F_2 + \tau^2 B_2}{(1-\tau)F_1 + \tau^2 B_1} = \frac{(1-\tau)(F_2 + \Delta_2) + \tau^2(B_2 + \Delta_2)}{(1-\tau)(F_1 + \Delta_1) + \tau^2(B_1 + \Delta_1)},    (6)

where F_1 = A_f(Q_{f1} − Q_{f2}), F_2 = A_f(Q_{f3} − Q_{f4}), B_1 = A_b(Q_{b1} − Q_{b2}), B_2 = A_b(Q_{b3} − Q_{b4}), and Δ_1, Δ_2 (|Δ_1| = |Δ_2|) are the changes in electric charge determined by the distance difference between the two ToF cameras and by the distance–charge relation illustrated in Fig. 1(a). This equation simplifies to

\tau^2(B_1 \mp B_2) + (1-\tau)(F_1 \mp F_2) = 0,    (7)

which is the singular condition of our skewed stereo ToF framework. In addition, the skewed stereo setup provides a crucial shape cue for the translucent surface that is used in our depth refinement step (Fig. 9). Detection of the translucent region works as described in the previous section (Fig. 10).

Fig. 9. Skewed stereo ToF camera.
Fig. 10. Detection using the skewed stereo ToF camera. Translucent foreground points are indicated in red.

3.2. Depth refinement of translucent object

To recover the depth of the translucent object, we minimize the distance between the two translucent surface observations by moving each point along its ray direction. P^L_tr and P^R_tr are the translucent point sets detected by each ToF camera. If a translucent point is inferred by both cameras and has moved from its original location as shown in Fig. 5(b), we assume that the original location of the translucent point lies around the intersection of the two light rays from both cameras. Our method minimizes the distance between the two point sets by moving them along their respective light rays in both directions. We introduce an iterative energy minimization framework to find the optimal intersections of the two interconnected point sets. The energy at the t-th iteration, for each 3D point connected to its eight neighboring pixels, is defined as

\mathrm{Energy} = O^t + \lambda R^t + \sigma T^t,    (8)

where λ and σ weight the cost terms. The energy is minimized over the iterations by moving the 3D points along their respective ray directions, and the iteration stops when the average change of the energy falls below a threshold.
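The overall objective and its stopping rule can be summarized as follows. This is a sketch under our own assumptions: the default weights follow the values reported in Sec. 4, the windowed average change is one possible reading of the paper's "average change of the energy" criterion, and the three cost terms themselves are sketched after Eqs. (9)–(11) below.

```python
def energy(O_t, R_t, T_t, lam=1.2, sigma=0.8):
    # Eq. (8): weighted sum of the data, regularization and topology costs.
    return O_t + lam * R_t + sigma * T_t

def converged(history, eps=1e-3, window=5):
    # Stop when the mean absolute change of the energy over the last
    # `window` iterations drops below eps (illustrative threshold).
    if len(history) <= window:
        return False
    diffs = [abs(history[-k] - history[-k - 1]) for k in range(1, window + 1)]
    return sum(diffs) / window < eps
```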
Data cost: O is the data cost for finding intersections between the translucent point rays of the two cameras. In reality, a light ray of one camera never exactly intersects a light ray of the other camera, because the cameras have limited resolution and no exact correspondences are guaranteed. Furthermore, discrete 3D points sampled from a non-planar surface at different viewpoints cannot match each other perfectly, as illustrated in Fig. 11. We therefore detect intersections between the light ray of a translucent point of one camera and the translucent surface of the other camera. The data cost O is the sum of the minimum Euclidean distances from each translucent point of one camera to the translucent point cloud of the other camera:

O^t = \sum_i \min_j \|p^L_i - p^R_j\| + \sum_j \min_i \|p^L_i - p^R_j\|,    (9)

where p^L_i ∈ P^L_tr and p^R_j ∈ P^R_tr are the observed translucent points from both cameras and i and j are the indices of the translucent points. Note that the locations of p^L_i and p^R_j are updated at each iteration along their respective light ray directions. With this cost, each point independently tries to find the closest corresponding point from the other camera, approaching the intersection between its light ray and the corresponding surface of the other camera.

Fig. 11. Discrete 3D points of a non-planar surface from different viewpoints are sampled at different surface locations.

Regularization cost: R is the regularization term enforcing depth similarity among the valid depth values of the eight neighboring pixels in the depth image (p^L_n ∈ P^L_tr and p^R_m ∈ P^R_tr), where n and m are the neighbor pixel indices of each translucent point cloud:

R^t = \sum_i \sum_n \|p^L_i - p^L_n\| + \sum_j \sum_m \|p^R_j - p^R_m\|.    (10)

The depth similarity term favors a surface with similar depth values rather than one that keeps the observed shape of the translucent surface. In fact, the depth distortion of a translucent surface does not vary linearly with its ground-truth depth, so the depth observation is a nonlinear distortion of the original shape. Furthermore, translucent surfaces are often highly reflective, which introduces an additional depth distortion.

Topology cost: the topological consistency term T favors keeping the relative depth ordering of the observed points within the surface. Different from the regularization term R, the topological consistency term is introduced to recover the original surface shape from the nonlinear distortion:

T^t = \sum_i t^L_i + \sum_j t^R_j, \qquad t^L_i = \begin{cases} |\hat{p}^L_i - \hat{p}^L_{i+1}| & \text{if } \hat{p}^L_i < \hat{p}^L_{i+1} \\ 0 & \text{otherwise,} \end{cases}    (11)

where \hat{p}^L_i and \hat{p}^R_j are the depth points sorted by their depth values in descending order, and t^R_j is defined analogously. In our translucent surface model the depth distortion occurs along each ray direction; based on this observation, the order of the depth values in the image plane can represent the detailed shape of the object surface. Any small or partial change from the original shape will change this sorted depth order, and if there is any reversal of the order, the related points are penalized so as to keep the original shape.
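The three cost terms can be sketched as follows, for one camera's point set where Eqs. (10) and (11) are concerned (the full costs sum the analogous terms of both cameras). The neighbor map and the reference depth ordering are assumed to be precomputed from the depth image and from the topology inference of Sec. 3.3; the helper names are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def data_cost(P_L, P_R):
    # Eq. (9): bidirectional sum of nearest-neighbor distances between the
    # two detected translucent point sets (both (N, 3) arrays).
    d_lr, _ = cKDTree(P_R).query(P_L)
    d_rl, _ = cKDTree(P_L).query(P_R)
    return d_lr.sum() + d_rl.sum()

def regularization_cost(points, neighbors):
    # Eq. (10): depth-similarity term over the valid 8-neighbors of each
    # translucent pixel; `neighbors` maps a point index to its neighbor indices.
    return sum(np.linalg.norm(points[i] - points[n])
               for i, nbrs in neighbors.items() for n in nbrs)

def topology_cost(depths, order):
    # Eq. (11): penalize reversals of a reference depth ordering. `order`
    # lists point indices sorted by inferred depth (descending); a pair is
    # penalized when the current depths violate that order.
    cost = 0.0
    for a, b in zip(order[:-1], order[1:]):
        if depths[a] < depths[b]:
            cost += abs(depths[a] - depths[b])
    return cost
```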
However, the current observations used in our optimization framework have a limitation in recovering the original shape, because any information about the original shape of a translucent object is lost in the distorted depth images. The optimization merely tunes toward either the depth similarity or the observed (distorted) topological consistency during the iterations. In most circumstances, it projects the detected translucent depth points onto the intersecting regions with similar depth values, only partially keeping the current topology (Fig. 12). This problem is addressed in the next section.

Fig. 12. (a) The depth similarity and (b) the topological consistency terms force the 3D points to move as illustrated.

3.3. Shape recovery from modified topological cost

Shape cue from IR intensity images: As pointed out in the previous section, the observed shape of a translucent object is distorted such that it cannot be used directly to recover the details of the original shape. The main reason for this distortion is the nonlinear characteristic of the translucent depth calculation studied in Sec. 2. Based on the knowledge of this nonlinear distortion, however, we can recover the original location of each point iteratively as follows. In order to recover the fine details of the original translucent surface, we use a pair of depth images and the corresponding IR intensity images taken by our skewed stereo ToF camera. As investigated in Eq. (2) and in the experiments of Fig. 3, different distances to an object give different IR amplitudes A_f and A_b. Therefore, Eq. (2) evaluated for multiple depth images provides new information about the relation between the electric charge values of the foreground and the background, from which we can infer the original translucent depth in terms of (Q_{f1} − Q_{f2}) and (Q_{f3} − Q_{f4}).

The unknown variables in this framework are the IR amplitudes A_f and A_b, the translucency τ, and the real depths at the first camera, D_f and D_b, of the translucent foreground and of the opaque background behind it. With our skewed stereo setup, if a background point seen by one camera can also be seen by the other camera, we can determine A_b and D_b, reducing the number of unknowns. In general, a translucent foreground object has a homogeneous surface, i.e., identical translucency and opaque reflectivity. We can then extend the calculation, with the obtained homogeneous translucency τ and reflectivity A_f, to the neighboring foreground points whose background cannot be seen by either of the stereo cameras. Let D^L_b and D^R_b be the two opaque background observations corresponding to a foreground depth point in our skewed stereo camera. In reality, finding a pair of corresponding foreground depth points from the stereo camera requires accurate registration of the depth images; however, the observed depth points cannot be used to register each other, because before our recovery step they are all potential translucent points with wrong depth values. Therefore, we assume that background points share similar IR intensity and depth values. In Sec. 5.2 we experimentally show the effect of the IR intensity variation of the background on our depth recovery result.
From Eq. (1), we express D^L_b and D^R_b in terms of B_1 = A_b(Q_{b1} − Q_{b2}), B_2 = A_b(Q_{b3} − Q_{b4}), Δ, β_1 and β_2:

D^L_b = \frac{c}{2}\tan^{-1}\frac{B_2}{B_1}, \qquad D^R_b = \frac{c}{2}\tan^{-1}\frac{B_2 \pm \Delta\beta_2}{B_1 \pm \Delta\beta_1},    (12)

where Δ is the increment of the camera distance along the depth direction, and β_1 and β_2 are the slopes of the (Q_{f1} − Q_{f2}) and (Q_{f3} − Q_{f4}) graphs along the distance in Fig. 1(b). B_1 and B_2 then become

B_1 = \frac{\Delta\beta_2 \mp \Delta\beta_1 Z_2}{Z_2 - Z_1}, \qquad B_2 = Z_1 B_1,    (13)

where Z_1 = \tan(2D^L_b/c), Z_2 = \tan(2D^R_b/c), and ±Δβ_2 and ±Δβ_1 are all known from our observations and from Fig. 1(b). Finally, we can determine A_b from the IR intensity I^L_b observed by the left camera and from the constraint |Q_1 − Q_2| + |Q_3 − Q_4| = (Q_1 + Q_2 + Q_3 + Q_4)/2 = K depicted in Fig. 1. Note that the amplitude of the reflected IR signal (its strength at a time instant) cannot be observed directly; it can only be induced from the IR intensity (its strength accumulated over a fixed duration, such as the integration time of the ToF camera):

A_b = \frac{I^L_b}{2(|B_1| + |B_2|)}.    (14)

Similarly, I^L_f and I^R_f are the two IR intensity observations of a point on the translucent foreground surface, defined as

I^L_f = (1-\tau)A_{f1}K + \tau^2 A_{b1}K, \qquad I^R_f = (1-\tau)A_{f2}K + \tau^2 A_{b2}K = (1-\tau)A_{f1}\left(\frac{D_f}{D_f+\Delta}\right)^2 K + \tau^2 A_{b1}\left(\frac{D_b}{D_b+\Delta}\right)^2 K.    (15)

In practice we need to account for the light attenuation, which is inversely proportional to the traveled distance; for that, we substitute A_{f2} = A_{f1}\left(\frac{D_f}{D_f+\Delta}\right)^2 and A_{b2} = A_{b1}\left(\frac{D_b}{D_b+\Delta}\right)^2. From Eq. (15) we derive the following relation:

\left(\frac{D_f}{D_f+\Delta}\right)^2 = \frac{I^R_f - \tau^2 C_2}{I^L_f - \tau^2 C_1}.    (16)

Note that I^L_f, I^R_f, C_1 = A_b K and C_2 = A_b K\left(\frac{D_b}{D_b+\Delta}\right)^2 are known constants from the previous derivations and observations. The value \left(\frac{D_f}{D_f+\Delta}\right)^2 is used to infer the original shape of the translucent surface in the following.

Topological consistency update: Let us revisit the topological consistency term. p^L_i and p^L_{i+1} are two neighboring depth pixels in our sorted list of depth points on the translucent object surface, and in the current iteration they have corresponding points p^R_j and p^R_{j+1}, respectively. We can then calculate \left(\frac{D_f}{D_f+\Delta}\right)^2 from the (p^L_i, p^R_j) pair and \left(\frac{D'_f}{D'_f+\Delta}\right)^2 from the (p^L_{i+1}, p^R_{j+1}) pair. Finally, if \left(\frac{D_f}{D_f+\Delta}\right)^2 > \left(\frac{D'_f}{D'_f+\Delta}\right)^2, we can conclude D_f > D'_f, and vice versa. This assigns new topology constraints on \hat{p}^L_i and \hat{p}^L_{i+1}, as well as on \hat{p}^R_j and \hat{p}^R_{j+1}, recovering the original shape of the translucent object.

To drive this topology inference, we iteratively update the translucency τ. First, we initialize D_f(1) with the data term O^1 and put it into Eq. (16) to calculate an initial τ. Given this initial τ, we activate the topology inference and compute an initial topological consistency score T^1 in Eq. (11). In the next step, the data term updates the foreground depth to D_f(2), and τ and the topology inference are updated accordingly, iterating toward the ground-truth translucency and the original shape.
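The shape cue of Eq. (16) and the resulting ordering test can be written as a short sketch. Since x ↦ (x/(x+Δ))² is monotonically increasing for positive depths, comparing the ratios of two neighboring points directly yields the sign of D_f − D'_f used to update the topology constraints. The function names are ours, and the inputs are assumed to come from the derivations above.

```python
import numpy as np

def depth_ratio(I_L_f, I_R_f, tau, C1, C2):
    # Eq. (16): (D_f / (D_f + Delta))^2 from the two IR intensity
    # observations of a foreground point and the background constants C1, C2.
    return (I_R_f - tau**2 * C2) / (I_L_f - tau**2 * C1)

def depth_order(ratio_i, ratio_j):
    # Topological consistency update: +1 if point i is farther than point j,
    # -1 if nearer, 0 if the ratios are equal (valid because the ratio is
    # monotonically increasing in D_f for D_f, Delta > 0).
    return int(np.sign(ratio_i - ratio_j))
```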
4. Experimental results

To build our skewed stereo ToF camera setup, we use a pair of MESA SR4000 ToF depth cameras. The baseline distance between the two cameras is 30 cm, and the right ToF camera is moved backward by 25 cm. We perform both qualitative and quantitative experiments to evaluate the proposed method. In our experiments, the weights λ and σ in Eq. (8) are empirically set to 1.2 and 0.8, respectively.

4.1. Qualitative evaluation

Table 1. Experimental results of our translucent surface detection and depth recovery (rows: test object, depth observation, translucent depth points detected and recovered, translucent surface reconstruction). The solid-line box in the second row marks the original depth region of each translucent surface; the dotted-line box in the fourth row marks the reconstructed translucent object.
Fig. 13. Translucent depth refinement results on various challenging objects in both 2D and 3D depth image representations (columns: object, input, our result, 3D mesh representation).
Fig. 14. Translucent depth refinement results on various challenging objects in both 2D and 3D depth image representations (continued).
Fig. 15. For a three-layer object such as Q3, the two-layer model-based method reconstructs a rather flat surface compared to the original shape.

Table 1 illustrates several experimental results as 3-dimensional point clouds. T1 is a planar translucent object attached to a planar opaque object. From the opaque region (the lower body) we can infer where the translucent region has to be located (solid-line box). The 3D points on the translucent surface (red dots in the fourth row) are correctly relocated to the original location (dotted-line box). T2 is a transparent bottle with an opaque label and a cap on top. Except for these opaque regions, all points on the transparent surface are scattered backward and show severe shape distortion; this is because the surface has variable thickness and varying specularity, which introduce additional distortions in the observation. T3 and T4 are relatively less translucent, and the observed depth points are not far from their original locations. In T3 the two observed point clouds overlap each other, so only a subset of the translucent points is selected as inconsistent points. Nevertheless, our recovered depth shows the complete original shape, since the observations from the two viewpoints complement the missing points. T4 has both translucent and transparent regions; in particular, the observation of the transparent region appears near the background. Although the target surface has inhomogeneous material characteristics, we successfully recover its original shape: as long as points are identified as inconsistent, they can be treated as transparent points and undistorted, as shown in this result. Our method reconstructs the original shape by virtue of the modified topological consistency. In the last column of Table 1 we visualize close-up views of our recovered depth points for T3 and T4 inside the thick boxes; these illustrations show that our method recovers detailed shapes of the original translucent objects.

Figs. 13 and 14 summarize our translucent depth recovery results on various challenging objects using both 2D and 3D representations. Each object is located on a gray box or a clamp indicating its ground-truth position. The 3D mesh visualizations of objects Q1 and Q2, together with their depth images before and after recovery, show that our method recovers the detailed original shape of translucent objects. In Q3 and Q4 the overall translucent regions are well recovered, but some distorted points on the specular surface are not restored correctly. Q5 and Q6 have almost transparent regions that are correctly recovered; however, the boundaries of these transparent objects in the reconstructed depth image show many discontinuities. In the translucent object detection step, the points around the boundary of a transparent object have difficulty finding corresponding points in the other camera view due to interference from background points.
Q8 and Q10 are thin objects that fit our two-layer model well. Q3, Q4, Q5 and Q6 have two thin layers in the foreground objects (three layers in total, including the background), and our two-layer model-based method reconstructs a rather flat surface compared to the original shape (Fig. 15). Q7 and Q9 are thick objects with light refraction and attenuation inside the foreground medium; our method shows robust results in these cases. Different amounts of refraction at different surface points yield different estimated translucencies even for a uniformly translucent object, so the recovered shape is distorted according to the variation in refraction. Fig. 16 shows object Q9 from another viewpoint in a close-up view of its surface details. The detailed shape of the upper part of the object is recovered better after our refinement. Even though the original surface detail appears to be lost in the observed depth, our shape recovery step reverses the distortion process using the two-layer translucent surface model that accounts for the ToF principle, and as a result we can recover the original shape of the translucent object. The average processing time on our dataset is around 360 s on an Intel i3 3.2 GHz desktop PC. If there is a multi-layer translucent object or a complex background that cannot be explained well by our two-layer model, the recovered surface shape shows additional distortions; this issue is discussed in Sec. 5.

Fig. 16. Detailed surface shape refinement of object Q9: (a) before and (b) after refinement.

4.2. Quantitative evaluation

To measure the performance of our method, we recover the shape of a transparent cup with multiple controlled levels of translucency. We vary the translucency by adding white opaque ink to the water in stages. Fig. 17 shows some of the target objects used for the evaluation. Overall, we have 11 stages from transparent to opaque; the most opaque stage serves as the ground truth. We compute the average error for the other 10 stages (τ-Index = 1 to 10) by comparing them with the ground truth before and after our depth refinement.

Fig. 17. Sample translucent objects for quantitative evaluation: translucency index (τ-Index) 1, 4, 7 and 10.

Fig. 18 shows the quantitative evaluation results. The proposed method is evaluated in two ways: in the first experiment, both our translucent region detection and the recovery are evaluated; in the second experiment, only the translucent region recovery is performed, based on ground-truth translucent region information. For the first 9 cases our method shows reasonable refinement results. The overall average depth error in the translucent object before and after our proposed detection and refinement is 76 mm and 10 mm, respectively (the average accuracy of the SR4000 camera on opaque objects is known to be around 10 mm). Thus 86.79% of the distance error is removed from the original observation of the translucent object, reaching opaque-object-level accuracy. In the last (10th) case, the object is almost completely transparent in the observed depth image, so it is difficult to detect the translucent region correctly; consequently, only a subset of the translucent region is recovered and the average error after refinement increases significantly.

5. Discussion

5.1. Validity of our two-layer model

Our two-layer model assumes that the foreground object is thin enough to be treated as a single layer causing reflection.
Furthermore, the background surface albedo is assumed to be 1, meaning that all IR light reaching the background bounces back to the camera. For some real objects, a generalized multi-layer model with reflectivity terms smaller than 1 would be required to represent and reconstruct the shape correctly in arbitrary situations, and a refraction and attenuation model for the foreground medium could also be included. However, such an extension adds extra unknown variables to our formulation and therefore requires extra observations. In order to keep our solution practical, we use a two-layer model, which effectively approximates most regular translucent objects, as shown in our experimental results.

5.2. Complex background

In our shape recovery step, a complex background, such as a black-and-white pattern or complex geometry, can introduce additional errors in the refinement result. This is because our method assumes that a background surface occluded from one viewpoint has characteristics similar to the background surface seen from the other viewpoint of our stereo camera. If the translucent object is located close enough to the camera that every background surface is visible from at least one camera, the type of background does not matter. In many cases, however, there are background surfaces that cannot be seen from one or both of our stereo cameras. Fig. 19 shows an example of our refinement result with a black-and-white pattern in the background. A flat translucent object surface is reconstructed in both background cases, showing that the complex background results in a distorted surface.

Fig. 18. Quantitative evaluation results.
Fig. 19. Translucent surface refinement with uniform and non-uniform backgrounds.

5.3. Effect of the smoothness term

In our energy function in Eq. (8), the regularization cost R^t yields a smooth reconstructed surface. It helps to alleviate environmental variation in the shape recovery and keeps our refinement robust to unknown noise sources; on the other hand, minute surface details can be lost after refinement. In our experiments, the weight λ for the regularization cost is empirically set to 1.2 and the weight σ for the topology cost to 0.8. This configuration prefers a smoother surface that removes unexpected noise over the reconstruction of detailed shape on the translucent surface. If an object has a homogeneous foreground and a uniform background without much geometric variation, and the object is thin enough to be explained by our simple two-layer model, we can increase the ratio σ/λ to put more emphasis on reconstructing detailed shape.

6. Conclusion

In this paper we propose a skewed stereo ToF depth camera for detecting transparent and translucent objects and refining their distorted depth. We find that translucent surfaces exhibit a systematic depth distortion that prevents the direct use of a traditional stereo matching scheme. To account for this distortion, we model it based on an understanding of the sensing principle and on an experimental study, and on top of the theoretical model we develop an iterative optimization framework. Our optimization includes a topological consistency term that helps recover surface details. In the future, we plan to extend the proposed method to detecting and reconstructing highly reflective (specular) surfaces.
In general, the surface of a translucent object is often highly reflective as well. Detecting and recovering reflective surfaces prior to our translucent object refinement would therefore improve the overall refinement performance. Increasing the number of cameras beyond two would also improve the accuracy of the depth refinement, because it gives a better approximation for the topology consistency term. The proposed method can improve not only the quality of three-dimensional reconstruction, but also human–computer interaction and robot navigation; for instance, an autonomous vehicle equipped with our method could correctly locate a translucent obstacle and avoid it.

Acknowledgment

This work was supported by Kyung Hee University in 2013 under grant KHU-20130684. This work was also supported by the Global Frontier R&D Program on "Human-centered Interaction for Coexistence" funded by the National Research Foundation of Korea grant funded by the Korean Government (MSIP) (2012M3A6A3057376).

References

[1] S. Albrecht, S. Marsland, Seeing the unseen: simple reconstruction of transparent objects from point cloud data, 2nd Workshop on Robots in Clutter, 2013.
[2] U. Klank, D. Carton, M. Beetz, Transparent object detection and reconstruction on a mobile platform, IEEE International Conference on Robotics and Automation, 2011, pp. 5971–5978.
[3] N. Alt, P. Rives, E. Steinbach, Reconstruction of transparent objects in unstructured scenes with a depth camera, IEEE International Conference on Image Processing, 2013.
[4] I. Lysenkov, V. Eruhimov, G. Bradski, Recognition and pose estimation of rigid transparent objects with a Kinect sensor, Robotics: Science and Systems Conference, 2012.
[5] T. Wang, X. He, N. Barnes, Glass object localization by joint inference of boundary and depth, International Conference on Pattern Recognition, 2012.
[6] C.J. Phillips, K.G. Derpanis, K. Daniilidis, A novel stereoscopic cue for figure-ground segregation of semi-transparent objects, IEEE International Conference on Computer Vision Workshops, 2011.
[7] I. Ihrke, K. Kutulakos, H. Lensch, M. Magnor, W. Heidrich, State of the art in transparent and specular object reconstruction, STAR Proc. of Eurographics, 2008.
[8] S. Yang, C. Wang, Dealing with laser scanner failure: mirrors and windows, IEEE International Conference on Robotics and Automation, 2008.
[9] M. Fritz, M. Black, G. Bradski, T. Darrell, An additive latent feature model for transparent object recognition, Advances in Neural Information Processing Systems, 2009.
[10] V.R. Kompella, P. Sturm, Detection and avoidance of semi-transparent obstacles using a collective-reward based approach, IEEE International Conference on Robotics and Automation, 2011.
[11] K. McHenry, J. Ponce, A geodesic active contour framework for finding glass, IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[12] K. McHenry, J. Ponce, D. Forsyth, Finding glass, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
[13] F. Mériaudeau, R. Rantoson, D. Fofi, C. Stolz, Review and comparison of non-conventional imaging systems for three-dimensional digitization of transparent objects, Journal of Electronic Imaging 21 (2) (2012) 021105.
[14] A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, R. Raskar, Coded time of flight cameras: sparse deconvolution to address multipath interference and recover time profiles, ACM Transactions on Graphics, 2013.
[15] Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[16] C. Inoshita, Y. Mukaigawa, Y. Matsushita, Y. Yagi, Shape from single scattering for translucent objects, European Conference on Computer Vision, 2012.
[17] K.N. Kutulakos, E. Steger, A theory of refractive and specular 3D shape by light-path triangulation, IEEE International Conference on Computer Vision, 2005.
[18] K.N. Kutulakos, E. Steger, A theory of refractive and specular 3D shape by light-path triangulation, International Journal of Computer Vision, 2008.
[19] N.J. Morris, K.N. Kutulakos, Dynamic refraction stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
[20] H. Murase, Surface shape reconstruction of an undulating transparent object, International Conference on Computer Vision, 1990.
[21] H. Murase, Surface shape reconstruction of a nonrigid transparent object using refraction and motion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992.