
DETERMINING CAMERA RESPONSE FUNCTIONS FROM COMPARAGRAMS OF
IMAGES WITH THEIR RAW DATAFILE COUNTERPARTS
Corey Manders, Steve Mann
University of Toronto
Dept. of Electrical and Computer Engineering
10 King’s College Rd.
Toronto, Canada
ABSTRACT
Many digital cameras now have an option of raw data output. We
first show, by way of superposigrams, that this raw data output is
quantimetrically (i.e. in terms of the camera’s response to light)
linear, in the case of the Nikon D2H digital SLR camera. Next, we
perform comparametric analysis on compressed images together
with their corresponding raw data images in order to determine
the camera’s response function.
1. INTRODUCTION: TYPICAL CAMERAS AND
TRADITIONAL IMAGE PROCESSING
Most cameras do not provide an output that varies linearly with
light input. Instead, most cameras contain a dynamic range compressor, as illustrated in Fig. 1. Historically, the dynamic range
compressor in video cameras arose because it was found that televisions did not produce a linear response to the video signal. In
particular, it was found that early cathode ray screens provided a
light output approximately equal to voltage raised to the exponent
2.5. Rather than build a circuit into every television to compensate
for this nonlinearity, a partial compensation (exponent 1/2.22) was
introduced into the television camera, at much lower cost since there
were far more televisions than television cameras in those days.
Coincidentally, the logarithmic response of human visual perception is approximately the same as the inverse of the response
of a television tube (i.e. human visual response turns out to be approximately the same as the response of the television camera) [1].
For this reason, processing done on typical video signals will be on
a perceptually relevant tone scale. Moreover, any quantization of
such a video signal (e.g. quantization into 8 bits) will be close to
ideal in the sense that each step of the quantizer corresponds to a
roughly equal perceptual change.
Most still cameras also provide dynamic range compression
built into the camera. For example, the Nikon D2H camera captures
internally in 12 bits (per pixel per color), applies dynamic range
compression, and outputs the range-compressed images in 8 bits
(per pixel per color). Fortunately, the Nikon D2H camera also
allows output of images in a non-range-compressed 12-bit (per
pixel per color) format.
Fig. 1: Typical camera and display: light from subject matter passes through a lens
(typically approximated with simple algebraic projective geometry, e.g. an idealized
“pinhole”) and is quantified in units “q” by a sensor array, where noise nq is also
added, to produce an output which is compressed in dynamic range by a typically
unknown function f. Further noise nf is introduced by the camera electronics, including quantization noise if the camera is a digital camera and compression noise if
the camera produces a compressed output such as a JPEG image, giving rise to an output image f1(x, y). The apparatus that converts light rays into f1(x, y) is labelled
CAMERA. The image f1 is transmitted or recorded and played back into a DISPLAY
system, where the dynamic range is expanded again. Most cathode ray tubes exhibit a
nonlinear response to voltage, and this nonlinear response is the expander. The block
labelled “expander” is therefore not usually a separate device. Typical print media
also exhibit a nonlinear response that embodies an implicit “expander”.
1.1. Why Stockham was wrong
When video signals are processed using linear filters, there is an
implicit homomorphic filtering operation on the photoquantity (a
measure of the quantity of light present on a sensor array element [2]). As should be evident from Fig. 1, operations of storage,
transmission, and image processing take place between two approximately reciprocal nonlinear functions: dynamic range compression and dynamic range expansion.
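The camera model of Fig. 1 can be sketched numerically as follows. This is a minimal sketch, not the actual in-camera processing: the compressor exponent 1/2.22 matches the value discussed in Section 1, while the Gaussian noise magnitudes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def camera(q, gamma=2.22, sigma_q=0.001, sigma_f=0.001):
    """Toy model of Fig. 1: photoquantity q is perturbed by sensor
    noise nq, compressed by f(q) = q**(1/gamma), then perturbed by
    electronic/quantization noise nf."""
    nq = rng.normal(0.0, sigma_q, np.shape(q))
    nf = rng.normal(0.0, sigma_f, np.shape(q))
    return np.clip(q + nq, 0.0, 1.0) ** (1.0 / gamma) + nf

q = np.linspace(0.0, 1.0, 5)   # normalized photoquantities
f1 = camera(q)                 # range-compressed output image values
```

Note how the mid-tone values of f1 are brightened relative to q, which is exactly the behavior of the dynamic range compressor.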
Many users of image processing methodology are unaware of
this fact, because of a common misconception that cameras produce
a linear output and that displays respond linearly. It is likewise
often assumed that nonlinearities in cameras and displays arise
from defects and poor-quality circuits, when in actual fact these
nonlinearities are fortuitously present in display media and
deliberately present in most cameras. Thus the effect
of processing signals such as f1 in Fig. 1 with linear filtering is,
whether one is aware of it or not, homomorphic filtering. Stockham advocated a kind of homomorphic filtering operation in which
the logarithm of the input image was taken, followed by linear filtering (e.g. linear space invariant filters), followed by taking the
antilogarithm [3].
In essence, what Stockham did not appear to realize is that
such homomorphic filtering is already manifest in simply doing
ordinary linear filtering on ordinary picture signals (whether from
video, film, or otherwise). In particular, the compressor gives an
image f1 = f(q) = q^(1/2.22) = q^0.45 (ignoring noise nq and nf),
which has approximately the same effect as f1 = f(q) = log(q + 1)
(roughly the same shape of curve, and roughly the same effect,
namely to brighten the mid-tones of the image prior to processing). Similarly, a typical video display has the effect of undoing
(approximately) this compression, i.e. darkening the mid-tones
of the image after processing, with q̂ = f̃^(-1)(f1) = f1^2.5. Thus
in some sense what Stockham did, without really realizing it, was
to apply dynamic range compression to already range compressed
images, then do linear filtering, then apply dynamic range expansion to images being fed to already expansive display media.
1.2. On the value of doing the exact opposite of what Stockham advocated
There exist certain kinds of image processing for which it is preferable to operate linearly on the photoquantity q. Such operations
include sharpening an image to undo the effect of the point
spread function (PSF) blur of a lens, or increasing the camera's
gain retroactively. We may also add two or more differently illuminated images of the same subject matter if the processing is
done in photoquantities. What is needed in these forms of photoquantigraphic image processing is an anti–homomorphic filter.
The manner in which an anti–homomorphic filter is inserted into
the image processing path is shown in Fig. 2.
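The anti–homomorphic filtering path can be sketched as follows, using an assumed gamma-type estimate f̂(q) = q^(1/2.22) in place of a recovered response function:

```python
import numpy as np

GAMMA = 2.22  # assumed compressor exponent; real cameras vary

def f_hat(q):        # estimated forward response (range compression)
    return np.clip(q, 0.0, None) ** (1.0 / GAMMA)

def f_hat_inv(f1):   # estimated inverse response (range expansion)
    return np.clip(f1, 0.0, None) ** GAMMA

def antihomomorphic_filter(f1, linear_op):
    """Apply a linear operation in photoquantity space, as in Fig. 2."""
    q_est = f_hat_inv(f1)       # imagespace -> estimated photoquantities
    q_proc = linear_op(q_est)   # e.g. sharpening, gain, superposition
    return f_hat(q_proc)        # back to a compressed tone scale

# Example: increasing the camera's gain retroactively by a factor of 2.
img = np.array([0.2, 0.5, 0.8])
out = antihomomorphic_filter(img, lambda q: 2.0 * q)
```

Note that doubling in lightspace corresponds to scaling by 2^(1/2.22) ≈ 1.37 in imagespace, which is exactly the homogeneity property discussed later in the paper.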
Fig. 2: The anti–homomorphic filter: Two new elements fˆ−1 and fˆ have been
inserted, as compared to Fig. 1. These are estimates of the inverse and forward
nonlinear response function of the camera. Estimates are required because the exact nonlinear response of a camera is generally not part of the camera specifications.
(Many camera vendors do not even disclose this information if asked.) Because of
noise in the signal f1 , and also because of noise in the estimate of the camera nonlinearity f , what we have at the output of fˆ−1 is not q, but, rather, an estimate, q̃.
This signal is processed using linear filtering, and then the processed result is passed
through the estimated camera response function, fˆ, which returns it to a compressed
tone scale suitable for viewing on a typical television, computer, or the like, or for
further processing.
Previous work has dealt with the insertion of an anti–homomorphic
filter in the image processing chain. However, in the case of using a camera in which the raw 12-bit data is available, processing
using the raw data (NEF files) may proceed as shown in figure 3.
Fig. 3: A modified method of photoquantimetric image processing (shown in figure 2), in which the raw data is available, and consequently no anti–homomorphic
filter is necessary. Moreover, a comparison (e.g. comparametric analysis) between
compressed and raw data is possible.
2. A SIMPLE CAMERA MODEL
While the geometric calibration of cameras is widely practiced and
understood [4], often much less attention is given to the camera's
quantimetric (the neither radiometric nor photometric manner in
which the camera responds to light [5, 6, 7]) response function.
In digital cameras, the camera response function maps the actual
quantity of light impinging on each element of the sensor array to
the pixel values that the camera outputs.
Linearity (which is typically not exhibited by most camera response functions) implies the following two conditions:
1. Homogeneity: A function is said to exhibit homogeneity if
and only if f(ax) = af(x), for all scalar a.
2. Superposition: A function is said to exhibit superposition if
and only if f(x + y) = f(x) + f(y).
In image processing, homogeneity arises when we compare
differently exposed pictures of the same subject matter. Superposition arises when we superimpose (superpose) pictures taken
from differently illuminated instances of the same subject matter,
using a simple law of composition such as addition (i.e. using the
property that light is additive).
A variety of techniques have been proposed to recover camera response functions, such as using test patterns of known reflectance, and using different exposures of the same subject matter [5][6][7]. Recently, a method using a superposigram [8] was
used. The method differed from other methods in that it required
neither test patterns nor a camera capable of adjusting its exposure.
The following technique is used: in a dark environment, set up
two distinct light sources. Take three pictures, one with each light
on individually (pa, pb), and one with the two lights on together
(pc).
3. THE SUPERPOSIGRAM AND LINEARITY
From the three images taken in the method described, we may form
a superposigram. To do this, each pixel location is considered in
the three images. Note that this may be done using both raw data
files and range compressed files (such as PPM or JPEG), which are
available from virtually all digital cameras. For each pixel location
there exist three values, one value from each of the three images.
If the range of the data is relatively low (such as x ∈ [0, 255], x ∈
N in the case of typical pixels), a three dimensional array may be
used to store the data as a superposigram. To do this, the array is
initialized to all 0. For each pixel position, the three values from
the three images become an index into the three dimensional array,
and the bin corresponding to this index is incremented. The procedure
is repeated for each pixel position. This yields a superposigram.
The situation becomes slightly more complicated when dealing with raw data. If the complete superposigram structure were
to be constructed as a typical array, at least 128 gigabytes would
be needed for storage. Of course, much of the array will remain
zero after the superposigram has been constructed. For this reason, superposigram data was stored in point–dictionary form as
(x, y, x + y, count). To efficiently store this data, a structure similar to a hash table was used. The C code used to perform this task
is available at http://comparametric.sourceforge.net, and is freely
distributable under the GNU license.
If the camera response of the raw data is truly linear, then the
third axis should be the summation of points on the first and second
axes. That is to say, the superposigram should define a plane of the
form:
q1 + q2 − q1+2 = 0,        (1)
where q is the photoquantimetric value recorded from the raw
data of the sensor. The superposigram resulting from this situation
is shown in figure 4.
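The superposigram construction, with the sparse point-dictionary storage described above, can be sketched as follows (a Python dictionary stands in for the C hash-table-like structure):

```python
from collections import defaultdict
import numpy as np

def superposigram(pa, pb, pc):
    """Build a superposigram from three co-registered images:
    pa, pb -- each light on individually; pc -- both lights on.
    Stored sparsely as {(va, vb, vc): count}, since a dense array
    over 12-bit data would be prohibitively large."""
    bins = defaultdict(int)
    for va, vb, vc in zip(pa.ravel(), pb.ravel(), pc.ravel()):
        bins[(int(va), int(vb), int(vc))] += 1
    return bins

# For truly linear (raw) data, every occupied bin satisfies
# equation (1): q1 + q2 - q_{1+2} = 0, i.e. the points lie on a plane.
qa = np.array([[10, 20], [30, 40]])
qb = np.array([[5, 5], [5, 5]])
sg = superposigram(qa, qb, qa + qb)
assert all(a + b == c for (a, b, c) in sg)
```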
Fig. 6: A superposigram from the non-linear, range compressed data of the JPEG
output. The data used to form the superposigram is shown in figure 5. As expected,
the superposigram is a convex surface rather than a plane.
Fig. 4: A superposigram from the raw data of a Nikon D2H digital SLR camera. The
points which exhibit clipping (the data value has reached its maximum possible value)
have been removed to simplify the plot and more clearly demonstrate the linearity of
the data. The data used to form the superposigram is shown in figure 5.
4. THE SUPERPOSIGRAM AND TYPICAL CAMERA
RESPONSE FUNCTIONS
Unlike the linear response present in the raw data of the Nikon
D2H camera, the data typically available as JPEGs from cameras
is non-linear. This is immediately apparent when viewing the
superposigram constructed using the JPEG data from a camera.
Unlike the plane shown in figure 4, the superposigram is a convex
surface, as shown in figure 6.
Though the superposigram may be used to solve for the response function, as shown in [8], if the raw data is available (as
is the case with the Nikon D2H), the response function is easily
found by noticing that the comparagram [9][5] between the raw linear data and the range compressed data (such as the decompressed
JPEG images) is the camera response function. One expects the
non-linearity when working with pixels from a JPEG or PPM image. In particular, if the pixels of a typical image were doubled,
we would not expect the same result as doubling the exposure time
of the image or increasing the f-stop by a factor of √2.
5. CALCULATING THE RESPONSE FUNCTION AND
DETERMINING ERROR
As mentioned, the comparagram between the raw data and the
range compressed data is the response function of the camera. In detail, the following may be done: an array of dimensions 4096 × 256
is created and initialized to 0. The dimensions are such because
the raw data from the camera is 12 bits per pixel, whereas the range
compressed data is 1 byte. Each pixel position on the two images
becomes a coordinate into the array, and for each pixel position the
corresponding element is incremented. In the case of a Nikon D2H
digital camera, the result is shown in figure 7.
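The comparagram construction just described can be sketched as follows; a synthetic gamma-type compressor stands in for the unknown in-camera response, purely for illustration:

```python
import numpy as np

def comparagram(raw, jpeg, raw_levels=4096, jpeg_levels=256):
    """Joint histogram ("comparagram") of a 12-bit raw image against
    its 8-bit range-compressed counterpart: each pixel position
    increments the bin at (raw value, compressed value)."""
    cg = np.zeros((raw_levels, jpeg_levels), dtype=np.int64)
    np.add.at(cg, (raw.ravel(), jpeg.ravel()), 1)  # unbuffered accumulate
    return cg

# Illustrative image pair covering every 12-bit level once:
raw = np.arange(4096)
jpeg = np.round(255.0 * (raw / 4095.0) ** (1.0 / 2.22)).astype(np.int64)
cg = comparagram(raw, jpeg)
```

`np.add.at` is used rather than plain indexing so that repeated (raw, jpeg) pairs accumulate correctly.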
To simplify the computation of the response function, the 12-bit data may be reduced to 8-bit data by dividing by 16 and rounding, which retains the linearity of the data. The comparagram procedure
may once again be repeated (this time with a 256 × 256 array) to
produce a simplified version of the response function. A very good
approximation to the response function may be found by composing a discrete function from the maximum bin counts across the
rows of the comparagram. This simplified comparagram, along
with the resulting discrete function, is shown in figure 8.
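The range reduction and the maximal-bin extraction can be sketched together; as before, a synthetic gamma response stands in for the real camera data:

```python
import numpy as np

def response_from_comparagram(raw, jpeg):
    """Reduce the 12-bit raw data to 8 bits (divide by 16), build a
    256 x 256 comparagram, and take the maximal bin in each row as a
    discrete estimate of the response function f."""
    raw8 = (raw // 16).astype(np.int64)   # 0..4095 -> 0..255, still linear
    cg = np.zeros((256, 256), dtype=np.int64)
    np.add.at(cg, (raw8.ravel(), jpeg.ravel().astype(np.int64)), 1)
    return np.argmax(cg, axis=1)          # most frequent output per input

# Synthetic stand-in for the unknown in-camera response:
raw = np.arange(4096)
jpeg = np.round(255.0 * (raw / 4095.0) ** (1.0 / 2.22)).astype(np.int64)
f_hat = response_from_comparagram(raw, jpeg)
```

For a monotonic camera response, the recovered discrete function f_hat is itself monotonic, which is a useful sanity check on real data.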
Fig. 8: A range-reduced comparagram of the raw camera data against the range compressed data, with the discrete camera response function plotted as the maximal bin
counts of the comparagram.
5.1. Confirming the correctness of the camera response function by homogeneity
Fig. 5: One of the many datasets used in the computation of the superposigram. Leftmost: Picture in Deconism Gallery with only the upper lights turned on. Middle:
Picture with only the lower lights turned on. Rightmost: Picture with both the upper
and lower lights turned on together.
The first measure described is termed a homogeneity-test of the
camera response function (regardless of how it was obtained). The
homogeneity-test requires two differently (by a scalar factor of k)
exposed pictures, f (q) and f (kq), of the same subject matter.
Fig. 7: The 256 by 4096 comparagram of the raw and range compressed data.
Method used to determine the response function | Superposition Error | Homogeneity Error
Direct from Raw Data | 7.2018 | 8.1201
Homogeneity with parametric solution (Previous Work [9]) | 8.8096 | 9.9827
Homogeneity, direct solution | 8.6751 | 9.4011
Superposition, direct solution | 8.5450 | 9.5361
Table 1: This table shows the per-pixel errors observed in using lookup tables arising from several methods of calculating f and f^(-1). The leftmost column denotes
the method used to determine the response function. The middle column denotes
how well the resulting response function superimposes images, based on testing the
candidate response function on pictures of subject matter taken under different lighting positions. The rightmost column denotes how well the resulting response function
amplitude-scales images, and was determined based on using differently exposed pictures of the same subject matter. The entries in the rightmost two columns are mean
squared error divided by the number of pixels in an image.
To conduct the test, the dark image f(q) is lightened and then
tested to see how close it is (in the mean squared error sense) to
f(kq). The mean squared difference is termed the homogeneity
error. To lighten the dark image, it is first converted from imagespace, f, to lightspace, q, by computing f^(-1)(f(q)). The
photoquantities q are then multiplied by a constant value, k. Finally,
we convert back to imagespace by applying f. Alternatively, we
could apply f^(-1) to both images, multiply the first by k, and
compare them in lightspace (as photoquantities).
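Both error measures (this homogeneity test, and the superposition test of Sec. 5.2) can be sketched as follows. An assumed gamma response plays the role of the recovered f, so both errors vanish up to floating-point rounding; with an estimated response on real images they would not:

```python
import numpy as np

def homogeneity_error(img_dark, img_light, k, f, f_inv):
    """Lighten the dark image by k in lightspace, return to
    imagespace, and report the mean squared difference (Sec. 5.1)."""
    lightened = f(k * f_inv(img_dark))
    return np.mean((lightened - img_light) ** 2)

def superposition_error(pa, pb, pc, f, f_inv):
    """Add the photoquantities of the two single-light images and
    compare with the both-lights image (Sec. 5.2)."""
    return np.mean((f(f_inv(pa) + f_inv(pb)) - pc) ** 2)

# With a perfectly known response, both errors are ~0:
f = lambda q: q ** (1.0 / 2.22)
f_inv = lambda v: v ** 2.22
qa = np.array([0.10, 0.20, 0.30])
qb = np.array([0.05, 0.10, 0.15])
err_h = homogeneity_error(f(qa), f(2.0 * qa), 2.0, f, f_inv)
err_s = superposition_error(f(qa), f(qb), f(qa + qb), f, f_inv)
```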
5.2. Confirming the correctness of the camera response function by superposition
Another test of a camera response function, termed the superposition-test, requires three pictures pa = f(qa), pb = f(qb) and pc =
f(qa+b). The inverse response function is applied to pa and pb,
and the resulting photoquantities qa and qb are added. We then
compare this sum (in either imagespace or lightspace) with pc (or
qc). The resulting mean squared difference is the superposition
error.
5.3. Comparing homogeneity and superposition errors in response functions found by each of various methods
The homogeneity and superposition errors of response functions
found by various methods (including previously published work)
are compared in Table 1. As expected, the direct method using the
raw data produces the lowest error. Note, however, that the error is
not 0, due to the noise imposed primarily by the lossy compression
of the JPEG data.
6. ACKNOWLEDGMENTS
The authors would like to thank Nikon Camera for their various
donations of digital cameras, lenses, and funding.
7. CONCLUSION
In this paper we showed how an unknown nonlinear camera response function can be recovered using the homogeneity and/or superposition properties of light. The easiest method to implement,
which also gives rise to the lowest error (as evaluated for both
homogeneity and superposition), was to simply compute a comparagram between a range compressed image and its raw datafile
counterpart, available on many cameras. Rather than using test
charts, or minimizing a sum of squares error resulting from the
camera's non-linearity, the method relied on the comparagram, a
very simple data structure presented in earlier work, to solve for
the function directly. The method may also be used as a baseline
for other methods which solve for the response function indirectly
when raw data is also available.
8. REFERENCES
[1] Charles Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, 1996.
[2] Steve Mann, Intelligent Image Processing, John Wiley and Sons, November 2 2001, ISBN: 0-471-40637-6.
[3] T. G. Stockham, Jr., “Image processing in the context of a visual model,” Proc. IEEE, vol. 60, no. 7, pp. 828–842, July 1972.
[4] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, NJ, 1998.
[5] F. M. Candocia, “A least squares approach for the joint domain and range registration of images,” IEEE ICASSP, vol. IV, pp. 3237–3240, May 13-17 2002, avail. at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.
[6] S. Mann and R. Mann, “Quantigraphic imaging: Estimating the camera response and exposures from differently exposed images,” CVPR, pp. 842–849, December 11-13 2001.
[7] S. Mann, “Compositing multiple pictures of the same scene,” in Proceedings of the 46th Annual IS&T Conference, Cambridge, Massachusetts, May 9-14 1993, The Society of Imaging Science and Technology, pp. 50–52, ISBN: 0-89208-171-6.
[8] C. Aimone, C. Manders, and S. Mann, “Camera response function recovery from different illuminations of identical subject matter,” IEEE ICIP 2004, to be published, 2004.
[9] S. Mann, “Comparametric equations with practical applications in quantigraphic image processing,” IEEE Trans. Image Proc., vol. 9, no. 8, pp. 1389–1406, August 2000, ISSN 1057-7149.