Light Field Based Digital Refocusing Using a DSLR Camera
with a Pinhole Array Mask
Chih-Chieh Chen, Yi-Chang Lu, Ming-Shing Su
Graduate Institute of Electronics Engineering, National Taiwan University
{b92901073, yichanglu, mingshingsu}@ntu.edu.tw
ABSTRACT
In this paper, a computational photography system is utilized to sample 4D light fields. The system is implemented using a normal DSLR camera with a mask printed using the Kodak LVT technique. We reconfigure the camera by inserting a pinhole array mask in front of the sensor. The mask blocks part of the light and samples the rest, which passes through the pinholes. With the 4D light field data, refocused images are obtained by rearranging the captured sub-images. The range of refocusing is also studied to avoid the diffraction blurring effect in the refocused images.
Index Terms— Light field, pinhole array mask, digital
refocusing
1. INTRODUCTION
Conventional cameras capture only a 2-dimensional projection of a scene. To acquire more information, the techniques of computational photography can be applied by redesigning a conventional camera. With the additional information and novel post-processing techniques, higher quality images can be obtained.
Our camera is designed to collect 4D light fields, where a light field is a function representing the amount of energy carried by different rays of light. The light field $L(u, v, s, t)$ represents the light ray that intersects the uv-plane at $(u, v)$ and the st-plane at $(s, t)$. Each light ray can be uniquely described by these two intersections.
The information in the light field is 4-dimensional. As mentioned earlier, a traditional camera records only 2-dimensional information; the data of the other two dimensions are lost in the process of charge integration within the sensor. For example, we can regard the sensor plane as the st-plane and the lens as the uv-plane. Each sensor pixel integrates the charges generated by all the light rays falling on it, so the distinct information of a single point on the uv-plane is lost.
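To see why two dimensions vanish, consider a toy sketch (our own illustration; the array shapes are arbitrary assumptions) of the integration a sensor pixel performs:

```python
import numpy as np

# Toy 4D light field L[u, v, s, t]: a hypothetical 8x8 angular by 16x16
# spatial sampling with random radiance values.
L = np.random.rand(8, 8, 16, 16)

# Each sensor pixel at (s, t) integrates over the entire aperture, i.e. over
# all (u, v). The result is a 2D image: the angular dimensions are gone and
# cannot be recovered from it.
image = L.sum(axis=(0, 1))    # shape (16, 16)
```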
The concept of the 4D light field was first presented in [1], where a moving camera captures the light field in a time-consuming process. Another method is to use a large camera array [2]. Both methods require relatively bulky equipment and can only work in certain environments. Hand-held light field acquisition cameras were later presented in [3-7]. In [3], a microlens array is inserted in front of the sensors, and each microlens samples the radiance at its position.
In [4], a programmable aperture with an electronically controlled liquid crystal array is designed, and the coded apertures are used in a time-multiplexed light field acquisition and post-processing technique. In [5], a technique using a cosine mask is proposed, in which the results are computed in the frequency domain instead; this work also inspired our choice of mask material. The researchers of [6] calculate depth based on the defocus of a sparse set of dots projected onto the scene. The researchers of [7] place a patterned mask within the aperture of the camera lens and use statistical models to retrieve depth information.
The concept of our system is a micro camera array inside a DSLR camera. Each pinhole acts as a camera, sampling the radiance passing through it; the data are recorded by the sensor pixels at the corresponding locations.
The structure of this paper is as follows: Section 2
explains the design of our camera. In Section 3 we describe
how to synthesize the refocused image using the information
captured by our camera. In Section 4, the experimental
results are presented and the range of refocusing is
discussed. Finally, we conclude our work in Section 5.
2. CAMERA DESIGN
To capture the light field inside the camera, we put a mask in front of the image sensor. An array of pinholes is arranged at a fixed pitch on the mask. Only the rays of light passing through the pinholes can reach the photo sensor; all other rays are blocked by the mask.
The rays of light passing through a pinhole form a circular sub-image in the corresponding region of the sensor. The diameter of the sub-image is $A \times (d_{sensor} - d_{mask}) / d_{mask}$, where $A$ is the diameter of the aperture, $d_{mask}$ is the distance between the mask and the lens, and $d_{sensor}$ is the distance between the sensor and the lens. To collect as much information as possible, each sub-image should cover as many sensor pixels as possible without overlapping its neighbors. We can adjust the relative aperture to meet this criterion so that the sub-images are not corrupted, as illustrated in Figure 1.
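To make this sizing rule concrete, the following sketch (our own illustration; the parameter values and function names are assumptions, not the paper's calibration) computes the sub-image diameter and checks the matched case of Figure 1(c), in which the sub-image diameter equals the spacing of the sub-image centers on the sensor:

```python
# Sketch of the Section 2 geometry (hypothetical values, in millimeters).

def subimage_diameter(A, d_mask, d_sensor):
    """Diameter of the circular sub-image behind one pinhole:
    A * (d_sensor - d_mask) / d_mask, by similar triangles through the pinhole."""
    return A * (d_sensor - d_mask) / d_mask

def subimage_pitch(pinhole_pitch, d_mask, d_sensor):
    """Spacing of sub-image centers on the sensor: each pinhole is projected
    from the aperture center, so the mask pitch is magnified by d_sensor/d_mask."""
    return pinhole_pitch * d_sensor / d_mask

# Matched case (Figure 1(c)): choose the aperture so the sub-image just
# fills the pitch, i.e. A = pitch * d_sensor / (d_sensor - d_mask).
d_mask, d_sensor, pitch = 45.0, 46.5, 0.10           # assumed geometry
A = pitch * d_sensor / (d_sensor - d_mask)
assert abs(subimage_diameter(A, d_mask, d_sensor)
           - subimage_pitch(pitch, d_mask, d_sensor)) < 1e-9
```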
Figure 1: (a) The arrangement of the lens and the pinholes. The relative aperture of the lens and the pinhole interval have to be matched; (b) mismatched, insufficient resolution; (c) matched; (d) mismatched, corrupted sub-images.

Figure 2: The object to be captured must be placed inside the focus plane of a conventional camera. Different parts of the letter 'A' are projected to different sub-images.

Each pixel in a sub-image collects the light that passes through the lens and the corresponding pinhole on the mask. The light field on the mask, $L_{mask}(u, v, s, t)$, represents the light passing through $(u, v)$ on the lens plane, $S_{lens}(u, v)$, and $(s, t)$ on the mask plane, $S_{mask}(s, t)$. Tracing these rays of light back, they intersect at a point on the plane $S_{trace}$, which is $d_{trace}$ away from the lens, where $1/d_{trace} + 1/d_{mask} = 1/f$ by the thin lens equation and $f$ is the focal length of the lens. If an object is placed between the plane $S_{trace}$ and the lens, the detected light field provides enough information for the refocusing process (Figure 2). However, if the object is placed exactly on the plane $S_{trace}$, all the data within a sub-image are collected from the same point on the object, which fails to provide sufficient information.

3. DIGITAL REFOCUSING

Figure 3: The relationship between the uv, st, and xy-planes. The coordinates on one plane are a linear combination of those on the other two.
An object in an image taken by a conventional camera is sharp if the camera is focused on the object and blurred if the object is out of focus. When a conventional camera focuses on an object, the rays of light emitted from a point on the object converge to the same area on the sensor plane, which makes the object image sharp and clear. The distance $d_{focus}$ between the focused object and the lens is determined by $d_{sensor}$ and $f$. However, a conventional camera can only focus on one plane in a single shot, and thus lacks the ability to refocus afterwards.
In our camera design, because of the pinhole mask, the light field $L_{mask}(u, v, s, t)$ of an image array is captured. With the captured $L_{mask}(u, v, s, t)$, the image can be synthesized on any user-assigned virtual film plane $S_{synthetic}$. For a given virtual film plane $S_{synthetic}$, the distance $d_{refocus}$ of the refocused image can be found from the equation $1/d_{refocus} + 1/d_{synthetic} = 1/f$, where $d_{synthetic}$ is the distance between the lens and the virtual plane $S_{synthetic}$ (Figure 3). Let the parameter $r$ be $d_{synthetic}/d_{mask}$; then digital refocusing can be achieved by adjusting the parameter $r$ using the information from a single camera shot.
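For illustration, a small helper (our own sketch, with assumed focal length and mask distance, not the paper's) derives $r$ for a desired refocus distance by combining the two thin-lens relations above:

```python
# Hypothetical helper: derive the refocusing parameter r = d_synthetic/d_mask
# for an object at distance d_refocus, using 1/d_refocus + 1/d_synthetic = 1/f.
# Units are meters; f and d_mask are assumed values, not the paper's.

def refocus_parameter(f, d_mask, d_refocus):
    d_synthetic = 1.0 / (1.0 / f - 1.0 / d_refocus)   # conjugate image distance
    return d_synthetic / d_mask

f, d_mask = 0.050, 0.051                   # 50 mm lens, mask 51 mm behind it
d_trace = 1.0 / (1.0 / f - 1.0 / d_mask)   # ~2.55 m: objects must be closer
r = refocus_parameter(f, d_mask, d_refocus=1.0)   # r ~ 1.03 > 1
```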
To render the image on the virtual xy-plane $S_{synthetic}$, the image pixel value at a point $(x_0, y_0)$ is given by all the rays passing through the lens and that point, expressed by Equation (1) with the light field $L_{synthetic}(u, v, x_0, y_0)$ of the uv-plane and the xy-plane:

$$P(x_0, y_0) = \iint_{uv} L_{synthetic}(u, v, x_0, y_0)\, du\, dv. \quad (1)$$
The light field $L_{synthetic}(u, v, x_0, y_0)$ on an assigned virtual film plane $S_{synthetic}$ can be derived from the light field $L_{mask}(u, v, s, t)$ captured by our pinhole mask camera (Figure 3). A ray passing through $s$ and $x$ also intersects the lens plane at $u = (rs - x)/(r - 1)$, and likewise $v = (rt - y)/(r - 1)$. Thus,

$$L_{synthetic}(u, v, x, y) = L_{mask}\left(\frac{rs - x}{r - 1}, \frac{rt - y}{r - 1}, s, t\right). \quad (2)$$
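The mapping can be sanity-checked numerically; the snippet below (our own, with arbitrary coordinates) verifies that the recovered lens-plane point lies on the straight ray through the mask point and the synthetic-plane point:

```python
# Illustrative check (hypothetical coordinates) of the mapping in Equation (2):
# a ray hitting the synthetic plane at x and the mask at s crosses the lens
# plane at u = (r*s - x) / (r - 1), and likewise for v.

def lens_intersection(x, s, r):
    """u-coordinate on the lens plane for the ray through mask point s and
    synthetic-plane point x, with r = d_synthetic / d_mask (r != 1)."""
    return (r * s - x) / (r - 1.0)

# Consistency check by linear interpolation along the ray: starting at u on
# the lens (depth 0) and passing s at depth d_mask, the ray reaches depth
# r * d_mask at u + r * (s - u), which should equal x again.
u = lens_intersection(x=2.0, s=1.5, r=1.2)
assert abs(u + 1.2 * (1.5 - u) - 2.0) < 1e-9
```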
Figure 4: (a) The pre-processed image. (b) The refocused image on the closer green box. (c) The refocused image on the
plane between the two boxes. (d) The refocused image on the farther blue box.
In our design, we only have discrete $s_i$ and $t_j$ values on the plane $S_{mask}(s, t)$. Let $(s_i, t_j)$ be the designed pinhole positions, where $i$, $j$ are integers. The light field passing through $(s_i, t_j)$ on the plane $S_{mask}(s, t)$ and $(u, v)$ on the plane $S_{lens}(u, v)$ is captured by the corresponding point value at the sensor, $Q(u, v, s_i, t_j)$.
The pinhole resolution can be measured by the distances between pinholes, $\Delta s = s_{i+1} - s_i$ and $\Delta t = t_{j+1} - t_j$, respectively. The refocused image value $P(x, y)$ at $S_{synthetic}(x, y)$ is then synthesized by accumulating the corresponding point values $Q(u, v, s_i, t_j)$ at the sensors. Thus, the equation becomes:

$$P(x, y) \approx \frac{1}{N_Q} \sum_i \sum_j Q(u, v, s_i, t_j) = \frac{1}{N_Q} \sum_i \sum_j Q\left(\frac{r s_i - x}{r - 1}, \frac{r t_j - y}{r - 1}, s_i, t_j\right). \quad (3)$$

Due to the limitations of the sensor area and pinhole resolution, only the light field samples actually captured, $Q(u, v, s_i, t_j)$, are accumulated. $P(x, y)$ should be normalized by the count, defined as $N_Q$, of the accumulated valid values.
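As a concrete rendering of Equation (3), here is a minimal NumPy sketch (our own simplification, assuming the sub-images have already been extracted into a 4D array and using nearest-neighbor sampling on the lens plane; all names are hypothetical):

```python
import numpy as np

def refocus(Q, r, s, t, u, v, xs, ys):
    """Nearest-neighbor evaluation of Equation (3).

    Q      : 4D array; Q[i, j, a, b] is the sensor sample through pinhole
             (s[i], t[j]) at lens coordinate (u[a], v[b]).
    r      : refocusing parameter d_synthetic / d_mask (r != 1).
    xs, ys : 1D coordinate arrays of the synthetic plane to render.
    """
    P = np.zeros((len(xs), len(ys)))
    N = np.zeros_like(P)                      # N_Q: valid samples per pixel
    du, dv = u[1] - u[0], v[1] - v[0]         # lens-plane sample spacing
    for i, si in enumerate(s):
        for j, tj in enumerate(t):
            # Lens-plane intersection of the ray through (x, y) and (si, tj).
            uu = (r * si - xs[:, None]) / (r - 1.0)
            vv = (r * tj - ys[None, :]) / (r - 1.0)
            a = np.round((uu - u[0]) / du).astype(int)
            b = np.round((vv - v[0]) / dv).astype(int)
            # Accumulate only rays that actually hit a captured sample.
            valid = (a >= 0) & (a < len(u)) & (b >= 0) & (b < len(v))
            a = np.clip(a, 0, len(u) - 1)
            b = np.clip(b, 0, len(v) - 1)
            P += np.where(valid, Q[i, j][a, b], 0.0)
            N += valid
    return P / np.maximum(N, 1)               # normalize by N_Q
```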
4. RESULTS AND DISCUSSION
The DSLR camera we use is a Nikon D80. The pre-processed image is shown in Figure 4(a). The green box is placed 3 meters from the camera and the blue one 5 meters away. Figures 4(b), (c), and (d) show the results of dynamic digital refocusing. Figure 4(b) is focused on the closer green box (r=1.21), so the characters on the blue box in the background are not clear. As we move the plane of refocusing toward the blue box, Figures 4(c) (r=1.16) and 4(d) (r=1.11) show the differences.
In a conventional camera, an image is the result of convolving the scene with the point spread function of the lens. An in-focus object in the image is convolved with a narrower window function, while an out-of-focus object is convolved with a wider window function. In our design, the array of sub-images is the captured light field information sampled in the angular and spatial domains, and the sub-images are samples of the convolution results at different locations. According to Equation (3), the dynamically refocused image can be synthesized by shifting and averaging these sub-image samples to reconstruct the window functions of different
focusing planes. However, when an object is too far away from the in-focus plane, there will be zeros in the reconstructed window due to the lack of angular sampling resolution, and the phenomenon of diffraction blurring will occur (Figure 5).

Figure 5: The limitation of mask resolution leads to diffraction blurring. (a) The refocused image (r=1.11). (b) The out-of-focus image (r=1.08). (c) The diffraction-blurred image (r=1.05).

Thus, given the r value of a refocusing plane, say $r_{refocus}$, we can find the range $[d_{NEAR}, d_{FAR}]$ such that objects inside this boundary do not suffer from diffraction blurring. Equation (3) reveals that, for an object at distance $d$, the uv-coordinates of corresponding points in two neighboring sub-images differ by $\Delta\mu = \frac{r}{r-1} \Delta s$ in the u direction and $\Delta\nu = \frac{r}{r-1} \Delta t$ in the v direction (Figure 6), where $r$ is the value required to refocus at $d$.

Figure 6: The v-coordinates of the characters in two neighboring sub-images are displaced by $\Delta\nu$.

For out-of-focus objects in the refocused image, the $\Delta\mu$ value of the objects, $\Delta\mu_{object}$, differs from $\Delta\mu_{refocus}$. We define this difference as the shift error. When the shift error is larger than a threshold of $K$ pixels in the sub-images, diffraction blurring occurs.
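The shift error is straightforward to evaluate; the sketch below (our own, with hypothetical names, assuming $K$ and $\Delta s$ are expressed in the same units) makes the threshold test explicit:

```python
# Hypothetical shift-error check: an object whose in-focus parameter is
# r_object appears displaced between neighboring sub-images by
# delta_mu = r/(r-1) * ds; the shift error compares this displacement with
# that of the rendered refocusing plane.

def delta_mu(r, ds):
    return r / (r - 1.0) * ds

def shift_error(r_object, r_refocus, ds):
    return abs(delta_mu(r_object, ds) - delta_mu(r_refocus, ds))

# Diffraction blurring occurs when the shift error exceeds K pixels
# (K = 1 and ds = 1 are assumed values for illustration).
blurred = shift_error(r_object=1.05, r_refocus=1.11, ds=1.0) > 1.0
```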
For the upper bound $d_{FAR}$, the corresponding r value is $r_{FAR}$, and $\Delta\mu_{FAR}$ is $\frac{r_{FAR}}{r_{FAR}-1} \Delta s$. To avoid diffraction blurring, the shift error should satisfy $(\Delta\mu_{FAR} - \Delta\mu_{refocus}) \le K$, or

$$\Delta s \times \left( \frac{r_{FAR}}{r_{FAR} - 1} - \frac{r_{refocus}}{r_{refocus} - 1} \right) \le K. \quad (4)$$
Similarly, for the lower bound $d_{NEAR}$,

$$\Delta s \times \left( \frac{r_{refocus}}{r_{refocus} - 1} - \frac{r_{NEAR}}{r_{NEAR} - 1} \right) \le K. \quad (5)$$
For given $r_{refocus}$ and $K$, the values of $r_{NEAR}$ and $r_{FAR}$ can be calculated from Equations (4) and (5). The corresponding range of refocusing $[d_{NEAR}, d_{FAR}]$ can then be determined.
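The boundary values admit a closed form because $g(r) = r/(r-1)$ is its own inverse; the sketch below (our own solver, again assuming $K$ and $\Delta s$ share the same units) implements this calculation:

```python
# Hypothetical solver for the refocusing range of Equations (4) and (5).
# g(r) = r / (r - 1) is an involution: g(g(x)) == x, which lets us invert
# the boundary conditions directly.

def g(r):
    return r / (r - 1.0)

def refocus_range(r_refocus, K, ds):
    """r values at which the shift error just reaches the threshold K."""
    r_far = g(g(r_refocus) + K / ds)    # boundary of Equation (4)
    r_near = g(g(r_refocus) - K / ds)   # boundary of Equation (5)
    return r_near, r_far

# Example with assumed numbers: refocusing at r = 1.16 (as in Figure 4(c))
# with K/ds = 1 gives r_near ~ 1.19 and r_far ~ 1.14; the distances d_NEAR
# and d_FAR then follow from the thin-lens relations of Section 3.
r_near, r_far = refocus_range(1.16, K=1.0, ds=1.0)
```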
5. CONCLUSION

The authors have designed a camera system for light field acquisition using a normal DSLR camera with a mask inserted in front of the sensor. We also proposed a post-processing algorithm that refocuses scenes using the information captured in a single camera shot. Diffraction blurring, which is caused by the lack of angular resolution, is also studied, and the range of refocusing is derived to avoid the diffraction blurring effect.

6. ACKNOWLEDGEMENTS

This research was partially supported by the National Science Council, Taiwan, under NSC-97-2622-E-002-012-CC1.

7. REFERENCES

[1] M. Levoy and P. Hanrahan, "Light Field Rendering," Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 31-42, 1996.
[2] B. Wilburn, N. Joshi, V. Vaish, E. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, "High Performance Imaging Using Large Camera Arrays," ACM Trans. Graphics, vol. 24, pp. 765-776, 2005.
[3] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, "Light Field Photography with a Hand-held Plenoptic Camera," Computer Science Technical Report CSTR 2005-02, Stanford University, Stanford, CA, USA, 2005.
[4] C.K. Liang, T.H. Lin, B.Y. Wong, C. Liu, and H.H. Chen, "Programmable Aperture Photography: Multiplexed Light Field Acquisition," ACM Trans. Graphics, vol. 27, no. 3, 2008.
[5] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, "Dappled Photography: Mask Enhanced Cameras for Heterodyned Light Fields and Coded Aperture Refocusing," ACM Trans. Graphics, vol. 26, issue 3, no. 69, 2007.
[6] F. Moreno-Noguer, P.N. Belhumeur, and S.K. Nayar, "Active Refocusing of Images and Videos," ACM Trans. Graphics, vol. 26, issue 3, no. 67, 2007.
[7] A. Levin, R. Fergus, F. Durand, and W.T. Freeman, "Image and Depth from a Conventional Camera with a Coded Aperture," ACM Trans. Graphics, vol. 26, issue 3, no. 70, 2007.