DETERMINING CAMERA RESPONSE FUNCTIONS FROM COMPARAGRAMS OF IMAGES WITH THEIR RAW DATAFILE COUNTERPARTS

Corey Manders, Steve Mann
University of Toronto, Dept. of Electrical and Computer Engineering
10 King's College Rd., Toronto, Canada

ABSTRACT

Many digital cameras now have an option of raw data output. We first show, by way of superposigrams, that this raw data output is quantimetrically linear (i.e. linear in terms of the camera's response to light), in the case of the Nikon D2H digital SLR camera. Next, we perform comparametric analysis on compressed images together with their corresponding raw data images in order to determine the camera's response function.

1. INTRODUCTION: TYPICAL CAMERAS AND TRADITIONAL IMAGE PROCESSING

Most cameras do not provide an output that varies linearly with light input. Instead, most cameras contain a dynamic range compressor, as illustrated in Fig. 1. Historically, the dynamic range compressor in video cameras arose because it was found that televisions did not produce a linear response to the video signal. In particular, it was found that early cathode ray screens provided a light output approximately equal to voltage raised to the exponent 2.5. Rather than build a circuit into every television to compensate for this nonlinearity, a partial compensation (an exponent of 1/2.22) was introduced into the television camera, at much lesser cost, since there were far more televisions than television cameras in those days.

Coincidentally, the logarithmic response of human visual perception is approximately the same as the inverse of the response of a television tube (i.e. human visual response turns out to be approximately the same as the response of the television camera) [1]. For this reason, processing done on typical video signals takes place on a perceptually relevant tone scale. Moreover, any quantization of such a video signal (e.g. quantization into 8 bits) will be close to ideal in the sense that each step of the quantizer corresponds to a roughly equal perceptual change.

Most still cameras also provide dynamic range compression built into the camera. For example, the Nikon D2H camera captures internally in 12 bits (per pixel per color), applies dynamic range compression, and finally outputs the range-compressed images in 8 bits (per pixel per color). Fortunately, the Nikon D2H also allows output of images in a non-range-compressed 12-bit (per pixel per color) format.

Fig. 1: Typical camera and display: light from subject matter passes through a lens (typically approximated with simple algebraic projective geometry, e.g. an idealized "pinhole") and is quantified in units q by a sensor array, where noise n_q is also added, to produce an output which is compressed in dynamic range by a typically unknown function f. Further noise n_f is introduced by the camera electronics, including quantization noise if the camera is a digital camera and compression noise if the camera produces a compressed output such as a JPEG image, giving rise to an output image f1(x, y). The apparatus that converts light rays into f1(x, y) is labelled CAMERA. The image f1 is transmitted or recorded and played back into a DISPLAY system, where the dynamic range is expanded again. Most cathode ray tubes exhibit a nonlinear response to voltage, and this nonlinear response is the expander; the block labelled "expander" is therefore not usually a separate device. Typical print media also exhibit a nonlinear response that embodies an implicit "expander".
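To make this compressor/expander pairing concrete, the following minimal C sketch (ours, for illustration; it is not from the paper) compresses a normalized photoquantity q with the camera exponent 1/2.22 and then expands with the display exponent 2.5, showing that the round trip q^(2.5/2.22) is close to, but not exactly, the identity:

    /* Illustrative sketch: camera compressor f(q) = q^(1/2.22) followed
     * by the implicit CRT expander v^2.5, for q normalized to [0, 1]. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        for (int i = 1; i <= 10; i++) {
            double q  = i / 10.0;
            double f1 = pow(q, 1.0 / 2.22);   /* camera compressor */
            double qh = pow(f1, 2.5);         /* display expander  */
            printf("q = %4.2f  f1 = %6.4f  expanded = %6.4f\n", q, f1, qh);
        }
        return 0;
    }

Since 2.5/2.22 is about 1.13, the reproduced mid-tones come out slightly darker than a strictly linear reproduction would give, consistent with the compensation in the camera being only partial.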
1.1. Why Stockham was wrong

When video signals are processed using linear filters, there is an implicit homomorphic filtering operation on the photoquantity (a measure of the quantity of light present on a sensor array element [2]). As should be evident from Fig. 1, the operations of storage, transmission, and image processing take place between approximately reciprocal nonlinear functions of dynamic range compression and dynamic range expansion. Many users of image processing methodology are unaware of this fact, because of a common misconception that cameras produce a linear output and that displays respond linearly. In fact, there is a common misconception that nonlinearities in cameras and displays arise from defects and poor-quality circuits, when in actual fact these nonlinearities are fortuitously present in display media and deliberately present in most cameras. Thus the effect of processing signals such as f1 in Fig. 1 with linear filtering is, whether one is aware of it or not, homomorphic filtering.

Stockham advocated a kind of homomorphic filtering operation in which the logarithm of the input image was taken, followed by linear filtering (e.g. linear space-invariant filters), followed by taking the antilogarithm [3]. In essence, what Stockham did not appear to realize is that such homomorphic filtering is already manifest in simply doing ordinary linear filtering on ordinary picture signals (whether from video, film, or otherwise). In particular, the compressor gives an image f1 = f(q) = q^(1/2.22) = q^0.45 (ignoring noise n_q and n_f), which has approximately the same effect as f1 = f(q) = log(q + 1): roughly the same shape of curve, and roughly the same effect, namely to brighten the mid-tones of the image prior to processing. Similarly, a typical video display has the effect of (approximately) undoing this compression, darkening the mid-tones of the image after processing, with q̂ = f̃^(-1)(f1) = f1^2.5. Thus, in some sense, what Stockham did, without really realizing it, was to apply dynamic range compression to already range-compressed images, then do linear filtering, then apply dynamic range expansion to images being fed to already expansive display media.

1.2. On the value of doing the exact opposite of what Stockham advocated

There exist certain kinds of image processing for which it is preferable to operate linearly on the photoquantity q. Such operations include sharpening an image to undo the effect of the point spread function (PSF) blur of a lens, or increasing the camera's gain retroactively. We may also add two or more differently illuminated images of the same subject matter, provided the processing is done in photoquantities. What is needed in these forms of photoquantigraphic image processing is an anti-homomorphic filter. The manner in which an anti-homomorphic filter is inserted into the image processing path is shown in Fig. 2.

Fig. 2: The anti-homomorphic filter: two new elements, f̂^(-1) and f̂, have been inserted, as compared to Fig. 1. These are estimates of the inverse and forward nonlinear response functions of the camera. Estimates are required because the exact nonlinear response of a camera is generally not part of the camera specifications. (Many camera vendors do not even disclose this information when asked.)

Because of noise in the signal f1, and also because of noise in the estimate of the camera nonlinearity f, what we have at the output of f̂^(-1) is not q but rather an estimate, q̃. This signal is processed using linear filtering, and the processed result is then passed through the estimated camera response function f̂, which returns it to a compressed tone scale suitable for viewing on a typical television, computer, or the like, or for further processing.

Previous work has dealt with the insertion of an anti-homomorphic filter into the image processing chain. However, in the case of a camera from which the raw 12-bit data is available, processing may proceed on the raw data (NEF files) directly, as shown in Fig. 3.

Fig. 3: A modified version of the photoquantimetric image processing method of Fig. 2, in which the raw data is available and consequently no anti-homomorphic filter is necessary. Moreover, a comparison (e.g. comparametric analysis) between compressed and raw data is possible.
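Returning to the anti-homomorphic filter of Fig. 2, the following minimal C sketch (ours, under assumed names: fhat_inv, a 256-entry lookup table estimating f^(-1), and fhat, a function mapping a photoquantity back to a pixel value) shows the structure, with a simple one-dimensional moving average standing in for the linear filter:

    /* Anti-homomorphic filtering sketch: imagespace -> lightspace via an
     * estimate of f^-1, linear filtering on the photoquantities, and back
     * to imagespace via an estimate of f.  fhat_inv and fhat are assumed
     * lookup-table/function estimates of the camera response. */
    void antihomomorphic_filter(const unsigned char *in, unsigned char *out,
                                long n, const double fhat_inv[256],
                                unsigned char (*fhat)(double))
    {
        for (long i = 0; i < n; i++) {
            double q = 0.0;
            int count = 0;
            for (long d = -1; d <= 1; d++) {     /* moving average as a   */
                if (i + d >= 0 && i + d < n) {   /* stand-in for any LSI  */
                    q += fhat_inv[in[i + d]];    /* filter                */
                    count++;
                }
            }
            out[i] = fhat(q / count);            /* back to imagespace    */
        }
    }

In a real implementation the moving average would be replaced by whichever linear space-invariant filter is desired (e.g. a deblurring kernel), applied over the full two-dimensional image.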
2. A SIMPLE CAMERA MODEL

While the geometric calibration of cameras is widely practiced and understood [4], often much less attention is given to the camera's quantimetric response function (the neither radiometric nor photometric manner in which the camera responds to light [5, 6, 7]). In digital cameras, the camera response function maps the actual quantity of light impinging on each element of the sensor array to the pixel values that the camera outputs. Linearity (which is typically not exhibited by camera response functions) implies the following two conditions:

1. Homogeneity: a function f is said to exhibit homogeneity if and only if f(ax) = af(x), for all scalars a.
2. Superposition: a function f is said to exhibit superposition if and only if f(x + y) = f(x) + f(y).

In image processing, homogeneity arises when we compare differently exposed pictures of the same subject matter. Superposition arises when we superimpose (superpose) pictures taken of differently illuminated instances of the same subject matter, using a simple law of composition such as addition (i.e. using the property that light is additive).

A variety of techniques have been proposed to recover camera response functions, such as using test patterns of known reflectance, and using different exposures of the same subject matter [5, 6, 7]. Recently, a method using a superposigram [8] was introduced. The method differs from the others in that it requires neither test patterns nor a camera capable of adjusting its exposure. The technique is as follows: in a dark environment, set up two distinct light sources and take three pictures, one with each light on individually (p_a, p_b), and one with the two lights on together (p_c).

3. THE SUPERPOSIGRAM AND LINEARITY

From the three images taken as just described, we may form a superposigram. To do this, each pixel location is considered in the three images. Note that this may be done using both raw data files and range-compressed files (such as PPM or JPEG), which are available from virtually all digital cameras. For each pixel location there exist three values, one from each of the three images. If the range of the data is relatively low (such as x ∈ [0, 255], x ∈ N, in the case of typical pixels), a three-dimensional array may be used to store the data as a superposigram. To do this, the array is initialized to all zeros. For each pixel position, the three values from the three images become an index into the three-dimensional array, and the bin corresponding to that index is incremented. Repeating the procedure for each pixel position yields the superposigram.
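The construction just described can be written compactly; the sketch below (ours, not the authors' released code) builds the dense 256 x 256 x 256 superposigram for three 8-bit images:

    /* Superposigram construction for 8-bit data: the triple of pixel
     * values (pa[i], pb[i], pc[i]) indexes a 3-D bin array, and the
     * addressed bin is incremented for every pixel position. */
    #include <stdlib.h>

    #define LEVELS 256

    unsigned long *build_superposigram(const unsigned char *pa,
                                       const unsigned char *pb,
                                       const unsigned char *pc,
                                       size_t npixels)
    {
        /* 256^3 bins of unsigned long: about 128 MiB on a 64-bit
         * machine, feasible for 8-bit data (unlike the 12-bit case
         * discussed below). */
        unsigned long *bins = calloc((size_t)LEVELS * LEVELS * LEVELS,
                                     sizeof *bins);
        if (bins == NULL)
            return NULL;
        for (size_t i = 0; i < npixels; i++)
            bins[((size_t)pa[i] * LEVELS + pb[i]) * LEVELS + pc[i]]++;
        return bins;
    }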
The situation becomes slightly more complicated when dealing with raw data. If the complete superposigram structure were constructed as a typical dense array, at least 128 gigabytes would be needed for storage (the 4096^3 bins alone, at two bytes per count, occupy 128 GiB). Of course, much of the array remains zero after the superposigram has been constructed. For this reason, the superposigram data was stored in point-dictionary form as (x, y, x + y, count). To store this data efficiently, a structure similar to a hash table was used. The C code used to perform this task is available at http://comparametric.sourceforge.net, and is freely distributable under the GNU license.

If the camera response of the raw data is truly linear, then the value along the third axis should be the sum of the values along the first and second axes. That is to say, the superposigram should define a plane of the form

    q_1 + q_2 - q_(1+2) = 0,    (1)

where q is the photoquantimetric value recorded from the raw data of the sensor. The superposigram resulting from this situation is shown in Fig. 4.

Fig. 4: A superposigram from the raw data of a Nikon D2H digital SLR camera. Points which exhibit clipping (where the data value has reached its maximum possible value) have been removed to simplify the plot and more clearly demonstrate the linearity of the data. The data used to form the superposigram is shown in Fig. 5.

Fig. 5: One of the many datasets used in the computation of the superposigram. Leftmost: picture in Deconism Gallery with only the upper lights turned on. Middle: picture with only the lower lights turned on. Rightmost: picture with both the upper and lower lights turned on together.

4. THE SUPERPOSIGRAM AND TYPICAL CAMERA RESPONSE FUNCTIONS

Unlike the linear response present in the raw data of the Nikon D2H camera, the data typically available as JPEGs from cameras is nonlinear. This is immediately apparent on viewing the superposigram constructed from the JPEG data of a camera. Unlike the plane shown in Fig. 4, the superposigram is a convex surface, as shown in Fig. 6.

Fig. 6: A superposigram from the nonlinear, range-compressed data of the JPEG output. The data used to form the superposigram is shown in Fig. 5. As expected, the superposigram is a convex surface rather than a plane.

One expects this nonlinearity when working with pixels from a JPEG or PPM image. In particular, if the pixel values of a typical image were doubled, we would not expect the same result as doubling the exposure time of the image or increasing the f-stop by a factor of √2.

Though the superposigram may be used to solve for the response function, as shown in [8], if the raw data is available (as is the case with the Nikon D2H), the response function is easily found by noticing that the comparagram [9][5] between the raw linear data and the range-compressed data (such as the decompressed JPEG images) is the camera response function.

5. CALCULATING THE RESPONSE FUNCTION AND DETERMINING ERROR

As mentioned, the comparagram between the raw data and the range-compressed data is the response function of the camera. In detail, the following may be done: an array of dimensions 4096 x 256 is created and initialized to 0; the dimensions are such because the raw data from the camera is 12 bits per pixel whereas the range-compressed data is 1 byte per pixel. Each pixel position in the two images becomes a coordinate into the array, and for each pixel position the corresponding element is incremented. In the case of a Nikon D2H digital camera, the result is shown in Fig. 7.

Fig. 7: The 4096 x 256 comparagram of the raw data against the range-compressed data.
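The comparagram of Fig. 7 can be computed in a few lines; the following sketch (ours, not the authors' released code) forms the joint histogram of the 12-bit raw image and its 8-bit range-compressed counterpart:

    /* Comparagram construction: the raw value (12 bits, 0..4095) and the
     * range-compressed value (8 bits, 0..255) at each pixel position
     * jointly index a 4096 x 256 array of counts, allocated by the
     * caller. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define RAW_LEVELS  4096
    #define JPEG_LEVELS 256

    void build_comparagram(const uint16_t *raw, const uint8_t *jpeg,
                           size_t npixels,
                           unsigned long bins[RAW_LEVELS][JPEG_LEVELS])
    {
        memset(bins, 0, RAW_LEVELS * JPEG_LEVELS * sizeof(unsigned long));
        for (size_t i = 0; i < npixels; i++)
            bins[raw[i] & 0x0FFF][jpeg[i]]++;  /* mask guards 12-bit range */
    }

The ridge of maximal bin counts in this array traces out the camera response function, as discussed next.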
To simplify the computation of the response function, the 12-bit data may be reduced to 8-bit data by dividing by 16 and rounding; this reduction retains the linearity of the data. The comparagram procedure may then be repeated (this time with a 256 x 256 array) to produce a simplified version of the response function. A very good approximation to the response function may be found by composing a discrete function from the maximum bin counts across the rows of the comparagram. This simplified comparagram, along with the resulting discrete function, is shown in Fig. 8.

Fig. 8: A range-reduced comparagram of the raw camera data against the range-compressed data, with the discrete camera response function plotted through the maximal bin counts of the comparagram.

5.1. Confirming the correctness of the camera response function by homogeneity

The first measure described is termed a homogeneity test of the camera response function (regardless of how the function was obtained). The homogeneity test requires two pictures of the same subject matter, f(q) and f(kq), whose exposures differ by a scalar factor k. To conduct the test, the dark image f(q) is lightened and then tested to see how close it is (in the mean squared error sense) to f(kq). The mean squared difference is termed the homogeneity error. To lighten the dark image, it is first converted from imagespace, f, to lightspace, q, by computing f^(-1)(f(q)). The photoquantities q are then multiplied by the constant k. Finally, we convert back to imagespace by applying f. Alternatively, we could apply f^(-1) to both images, multiply the first by k, and compare them in lightspace (as photoquantities).
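A direct implementation of this test follows (ours, for illustration; fhat_inv and fhat are assumed lookup-table/function estimates of f^(-1) and f, as in the earlier sketches):

    /* Homogeneity test: lighten the darker image by the exposure ratio k
     * in lightspace, return it to imagespace, and report the mean squared
     * difference against the lighter image. */
    #include <stddef.h>

    double homogeneity_error(const unsigned char *dark,
                             const unsigned char *light,
                             size_t n, double k,
                             const double fhat_inv[256],
                             unsigned char (*fhat)(double))
    {
        double sse = 0.0;
        for (size_t i = 0; i < n; i++) {
            double q = fhat_inv[dark[i]];     /* imagespace -> lightspace */
            unsigned char lit = fhat(k * q);  /* scale by k, map back     */
            double d = (double)light[i] - (double)lit;
            sse += d * d;
        }
        return sse / (double)n;               /* per-pixel MSE            */
    }

The superposition test of the next subsection can be implemented analogously, by summing the photoquantities recovered from p_a and p_b and comparing the result with that recovered from p_c.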
5.2. Confirming the correctness of the camera response function by superposition

Another test of a camera response function, termed the superposition test, requires three pictures: p_a = f(q_a), p_b = f(q_b), and p_c = f(q_(a+b)). The inverse response function is applied to p_a and p_b, and the resulting photoquantities q_a and q_b are added. This sum is then compared (in either imagespace or lightspace) with p_c (or q_c). The resulting mean squared difference is the superposition error.

5.3. Comparing homogeneity and superposition errors in response functions found by various methods

The homogeneity and superposition errors of response functions found by various methods (including previously published work) are compared in Table 1. As expected, the direct method using the raw data produces the lowest error. Note, however, that the error is not 0, due primarily to the noise imposed by the lossy compression of the JPEG data.

Method used to determine the response function            Superposition Error   Homogeneity Error
Direct from Raw Data                                      7.2018                8.1201
Homogeneity with parametric solution (Previous Work [9])  8.8096                9.9827
Homogeneity, direct solution                              8.6751                9.4011
Superposition, direct solution                            8.5450                9.5361

Table 1: Per-pixel errors observed in using lookup tables arising from several methods of calculating f and f^(-1). The leftmost column denotes the method used to determine the response function. The middle column denotes how well the resulting response function superimposes images, based on testing the candidate response function on pictures of subject matter taken under different lighting positions. The rightmost column denotes how well the resulting response function amplitude-scales images, determined using differently exposed pictures of the same subject matter. The entries in the rightmost two columns are mean squared error divided by the number of pixels in an image.

6. ACKNOWLEDGMENTS

The authors would like to thank Nikon Camera for their various donations of digital cameras, lenses, and funding.

7. CONCLUSION

In this paper we showed how an unknown nonlinear camera response function can be recovered using the homogeneity and/or superposition properties of light. The easiest method to implement, which also gives rise to the lowest error (as evaluated by both the homogeneity and superposition tests), was simply to compute a comparagram between a range-compressed image and its raw datafile counterpart, available on many cameras. Rather than using test charts, or minimizing a sum-of-squares error resulting from the camera's nonlinearity, the method relies on the comparagram, a very simple data structure presented in earlier work, to solve for the function directly. The method may also be used as a baseline for other methods that solve for the response function indirectly when raw data is also available.

8. REFERENCES

[1] C. Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, 1996.
[2] S. Mann, Intelligent Image Processing, John Wiley & Sons, November 2001, ISBN 0-471-40637-6.
[3] T. G. Stockham, Jr., "Image processing in the context of a visual model," Proc. IEEE, vol. 60, no. 7, pp. 828–842, July 1972.
[4] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, NJ, 1998.
[5] F. M. Candocia, "A least squares approach for the joint domain and range registration of images," IEEE ICASSP, vol. IV, pp. 3237–3240, May 13-17, 2002, available at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.
[6] S. Mann and R. Mann, "Quantigraphic imaging: Estimating the camera response and exposures from differently exposed images," CVPR, pp. 842–849, December 11-13, 2001.
[7] S. Mann, "Compositing multiple pictures of the same scene," in Proceedings of the 46th Annual IS&T Conference, Cambridge, Massachusetts, May 9-14, 1993, The Society of Imaging Science and Technology, pp. 50–52, ISBN 0-89208-171-6.
[8] C. Aimone, C. Manders, and S. Mann, "Camera response function recovery from different illuminations of identical subject matter," IEEE ICIP 2004, to appear, 2004.
[9] S. Mann, "Comparametric equations with practical applications in quantigraphic image processing," IEEE Trans. Image Proc., vol. 9, no. 8, pp. 1389–1406, August 2000, ISSN 1057-7149.