Elston Tochip, Robert Prakash, Phillip Lachman
Tuesday, March 20, 2007
EE362/Psych 221

Camera Phone Color Appearance Utility
Finding a Way to Identify Camera Phone Picture Color

Today, in the 21st century, phones have become, and will continue to be, the portable digital platform for a variety of imaging applications. From pictures to video to personal organizers, they have become the personal computer on the go: don't leave home without it. With this technological advancement, we saw an opportunity to take the camera phone one step further and use it to help vision-impaired individuals identify colors.

Approximately 10 million blind people live within the U.S. today, including 55,200 legally blind children and 5.5 million elderly people. Color-blind people, consisting of 8% of males and up to 2% of females in a population of 300+ million, account for over 30 million within the U.S. These people have a right to see what many of us take for granted on a daily basis, the right to experience life to its fullest. To make a small push in that direction, our goal for this project was to develop a software application able to accomplish the following:

1) Receive a phone-camera-quality image
2) Identify the predominant color region(s) within the image
3) Estimate the color name for the predominant region
4) Audibly transmit the predominant color to the user

Our software takes the incoming image, in which the target is framed by a white card fixed 6-8 inches away from the camera phone, and runs an edge detection algorithm to locate the region of interest. From there the code, using HSV as the color coordinate system, identifies the color of each pixel within that region and tallies the colors before announcing the predominant color. The issues we ran into and our resulting solutions are explained below.

Edge Detection

The focus of this task was to identify the target card. After poring through various sources on the web and discussing the topic with coworkers, we decided on a method that simply measured steep or sharp changes in intensity across an image. The fundamental idea was that where an edge existed, there would be a change in intensity that was both abrupt and large enough to measure. By computing the gradient around every pixel, we could find these points and mark them along the image. To accomplish this gradient computation, we used the Canny Edge Detection method. This algorithm works in three steps:

1) Gaussian smoothing of the image
2) Computing the gradient of the intensities in the image
3) Thresholding the norm of the gradient image to isolate edge pixels

The resulting output of the Canny algorithm was a binary-like image with high values at pixel locations where an edge was detected, and a low base value for all other surrounding pixels. See the figure below.

Figure 1: Original image (left) of white card on white surface versus Canny thresholded image (right)

However, before diving into more specifics about the algorithm, a discussion of the problems associated with edge detection, particularly for our purposes, must be addressed.

Identifying Edge Location

The Canny Edge Detection algorithm effectively "paints" the pixels that mark the edges of the card.
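To make this step concrete, here is a minimal MATLAB sketch that produces a comparable binary edge map using the Image Processing Toolbox; the file name and variables are illustrative assumptions, not our project code, which exposes the intermediate Gaussian and threshold parameters tuned later in this report.

    img   = imread('cardPhoto.jpg');   % phone-camera photo of the white card (hypothetical file name)
    gray  = rgb2gray(img);             % work on intensities only
    edges = edge(gray, 'canny');       % Gaussian smoothing, gradient, and thresholding in one call
    imshow(edges);                     % binary-like map: true at detected edge pixels, false elsewhere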
Now, the question becomes how to use these pixels to do the following:

1) Find the location of the OUTER edge of the white card in the picture
2) Find the location of the INNER edge of the target hole in the white card
3) Distinguish stripes or other patterned lines from the actual card edges

The task is a lot more difficult than visual perception might suggest. Looking at the thresholded image, it appears easy to locate the lines of the card, the lines of the pattern if they exist, as well as the random dots of background noise. The problem is to mathematically associate these high pixels with their perceived edges in the image.

Problems with Identifying Edges

To simplify the problem during first-pass analysis, I used pictures of the white card standing vertically upright, as shown in the figure below.

Figure 2: Simplified picture used for the first analysis (white card with target hole)

As shown, the sides of the card match up perfectly with the sides of the photograph. If this were always the case, one could simply take the thresholded image and locate the edges by finding the maximum values and binning them (by row and then by column) into histograms based on x and y pixel location (x being the horizontal axis, y the vertical). The most common value in each histogram would correspond to an edge of the card or of the inner target hole. The ideal situation described above cannot be assumed to happen EVERY time the camera is used, especially if the user is blind. For example, the card may be slanted, as in the figure below:

Figure 3: Slanted target (white card with target hole)

In this case, the binning of high pixel values no longer works, since the card is at a high angle. One could solve this by finding all the high pixel values as before and computing a linear regression fit of the points to find the edges of the target white card, and similarly of the hole in the card. Another possible solution would be to compute and adjust the slope between points recursively. Switching from one side of the card to another could be detected by a change in sign of the slope of a set of points. For example, in the figure above, the bottom of the card has a negative slope, whereas the right side has a positive slope. Computationally, this would take a long time and be very complex, but it would work. Still, if there were a way to guarantee an upright, vertical card, that would greatly simplify the algorithm.

Impact of Background Images

During our experiments with various background targets, several situations occurred where detecting the outline of the card with the Canny algorithm proved to be very difficult. These were grouped into two categories: 1) lighting and 2) the surrounding background of the target. In the first case, lighting can prove to be a very important factor in detecting an object. If the light source is dim, not enough photons are reflected off the white card to sufficiently separate it from its background. The result is that the human eye can visually distinguish the difference, but the image does not reflect (numerically) a high enough difference between the background and the edge of the card. For example, with an off-white background, say beige, the lack of light captured in the image may produce poor intensity gradients in the Canny algorithm.

Figure 4: White card on an off-white background in ambient lighting. Even moderate lighting affects the thresholding ability of the Canny algorithm.
An extension of this problem surfaces if you have perfect lighting and a target with a color matching EXACTLY that of the white card. You would not be able to see the edges at all. There needs to be a solution to this problem. Another instance where background color could affect the edge detection is a striped or checkered target. The result is perfectly formed lines that pass the threshold test of the Canny algorithm, causing a series of lines to appear in addition to the edges of the card. See the example below.

Figure 5: Image of a white background with stripes. Note that the edges and stripes are all thresholded. Which one is the correct edge?

User Complications/Problems

In addition to the math, user complications presented challenging problems as well. As defined by the project statement, we wanted to develop an algorithm that could be used by color-blind and BLIND individuals through the use of the camera phone. Color-blind individuals can visually aim the camera phone well enough to ensure the white card is completely in the field of view of the lens. How about a blind person? We know that putting the camera right up to the target hole in the white card is futile, since the lack of lighting will probably leave us a dark, blackish-looking image. This would be similar to taking a picture at night without a flash. Additionally, it nullifies the use of the white card as a baseline for estimating luminance. The problem thus becomes how to reduce camera aiming error for someone who cannot see.

If you think that aiming is simple even if you cannot see, think again. Having the sense of vision, we as designers may OVERSIMPLIFY this problem dramatically. Here is a test you can do to prove this to yourself:

1) Take any camera you have and a white index card (4" x 6")
2) Close your eyes
3) Try to take a centered picture of the card at about a 1-foot distance

You can probably do this fairly easily. NOW, try it again, only this time using a different order of steps:

1) Close your eyes first for 5 minutes
2) Pick up the camera and white card
3) Try to take a picture at the same distance as before with the card centered

Look at the pictures: there should be a marked difference in the centering of the white card between the two images (unless you peeked!). And this is all true for people who can see to begin with. How much more difficult would this be for someone who has NEVER been able to see? Another example is trying to take a picture of yourself. People always try it and always miss a few times (cutting off part of the face, head, or other people in the picture) before coming close to an acceptable one. A blind individual would not want to be adjusting the card repeatedly. More importantly, how would they know that the target was in the center, or whether the picture they took included the white card at all? If we believed that we could simply line up the camera and the white card target, I think we have shown that to be a false assumption. How do we solve this?

Our Problems Solved

As discussed, a variety of complications arise that need to be considered before the algorithm can even be written. These range from the underlying math to the user interface required to use the algorithm properly. The following paragraphs address the concerns that appeared over the course of our analysis and design of the edge detection algorithm, covering both the mathematical computation and user feasibility.
The Contraption: Flying Blind

Of all the stated challenges regarding the use of this algorithm, we saw user feasibility as the most important. It is akin to asking a skilled marksman to hit a target without aiming. Similarly, the algorithm, no matter how perfect, is useless when the target white card cannot be acquired. To resolve this situation, and in agreement with Bob Dougherty, we decided that a contraption that could be connected to the phone and used to offset aiming error was a necessity.

The Design

A bulky contraption would not be ideal to carry around. The goal was to devise something small and compact enough to fit in a pocket or small pouch and be carried without discomfort. We decided upon a device that was collapsible to the size of a 4 inch by 6 inch white card. On one end, the card would be attached and allowed to flip out and stand upright. On the other end, a mounting device could be used to attach the setup to the camera phone. The mount could include dials to allow fine adjustments of the phone position. A distance of about 6-8 inches between the camera phone and the white card was set to keep the design small and yet allow ample lighting to gauge the color of the target. See the figure below.

Figure 6: Diagram of the assembly viewed from the side (camera mount, base board to separate the camera from the card, white card, and hinge assembly to allow folding)

Here are a few sample pictures taken before and after using the device.

Figure 7: Photos without and with the card-holding device. The first is an attempt to photograph an orange shoebox; the second is an attempt to photograph a purple gorilla. Notice how, without the device, the card is not completely in the picture and sometimes does not even include the target object!

The white card holder served purposes other than just helping the user aim. It also predetermined the white card orientation. We forced the white card to be positioned vertically such that the edges of the photo and the edges of the card were parallel. This simplified our algorithm computationally. First, we could improve the efficiency of the algorithm by eliminating strange orientations, such as the slanted card described earlier. Second, the card holder guaranteed the card always dominated the field of view of the camera, minimizing the problem of background clutter, which could only complicate the edge detection.

White Card Modifications

Another issue mentioned earlier concerned low-light-level backgrounds or objects with a color similar to that of the card. To resolve this, I outlined the edges of the white card and the edges of the hole with a thick, black line. The white-to-black transition, I believed, would provide the steepest intensity gradient, regardless of lighting and background.

Proof of Concept

To prove the usefulness of this card-holding apparatus, I used basic materials to construct the primitive but useful device pictured below. Thin plywood boards were used for the base, with a shorter board attached at one end via a mini-hinge. To this shorter board we attached a white card with the black outlines on the outer edges and inner target edges. At the other end of the base, a thick paper clip was attached that allowed a slim phone to be locked into place. Using this device I was able to take several pictures and run the edge detection, and later the color selection algorithm, on them. It worked as well as I had expected.
The white card was always upright and took up a large area of the photographs. Even more impressive was the ability of the Canny algorithm to "see" edges against very white and dimly lit backgrounds. See the figures below.

Figure 8: Original picture of blue material taken using the card-holding device

Figure 9: Canny edge-detected thresholds

Now that we have specified the nature of the problem, we can proceed to a discussion of the development of our edge detection algorithm using the Canny Edge Detector.

The Algorithm

Step 1: Blurring and Sharpening Edges in the Image

Prior to using the Canny algorithm, the photographs are preprocessed to sharpen the edges present. This is done using a Laplacian convolution mask. The Laplacian kernel is simply a 3x3 matrix filled with -1's except at the center, where the value is set to 8. The kernel is an approximation of the second derivative, highlighting changes in intensity. Because of the Laplacian's high sensitivity to noise, Gaussian smoothing is done beforehand to blur and eliminate noisy pixels in the photograph. The smoothed photo is then added to the result of the Laplacian convolution to obtain a new image with sharpened edges for improved detection by the Canny algorithm. See the figures below.

(Figure panels: initial image and image after blurring and sharpening, in color and in grayscale, plus close-ups of the top inner edge before and after; the blurred/sharpened edge is less noisy in the center and smoother. Note: the outer-edge spikes are from the photo border.)

The first image has more noisy pixels at points around the edges of the card (see the top inner edge). The second image has reduced these spurious points through smoothing and the Laplacian convolution. The result is cleaner lines on the outside of the card and around the target hole.

Step 2: Using the Canny Algorithm

As stated earlier, the Canny Edge Detection algorithm provided a way to find the pixels that outline the points of largest intensity change across the image. To reiterate, this process has three distinct phases:

1) Gaussian smoothing of the image
2) Computing the gradient of the intensities in the image
3) Thresholding the norm of the gradient image to isolate edge pixels

Before processing, the image was first converted to grayscale. This made the lighting gradients between the white card and the background more visible.

Computing the Gradient

The Gaussian smoothing of the image is done by convolving it iteratively with a Gaussian mask. The derivative of the smoothed image is then computed to obtain the gradients that identify the edges, if they exist. In our case, we had to perform these operations in two directions, vertically and then horizontally, on the image. The MATLAB code we obtained completes both the first and second steps using one matrix. As users, we were able to tune the Gaussian mask by setting its size and standard deviation. Extensive testing showed that the larger the mask, the more smeared (and consequently thicker) the edges became. A similar result occurred when the standard deviation was increased to a large value. Ideally, we wanted to finely isolate all edges, so the smaller the standard deviation, the better it worked. However, to avoid detecting small, gradual changes in intensity, the standard deviation could not be set too low. We eventually settled on a mask size of 20 with a standard deviation of 5.
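As a rough sketch of this preprocessing and gradient stage (the small blur kernel values are assumptions, only the 20/5 Canny mask matches the parameters we settled on, and the MATLAB code we actually used is organized differently):

    grayImg = double(rgb2gray(imread('cardPhoto.jpg')));

    % Step 1: small Gaussian blur to suppress noise, then Laplacian sharpening.
    gBlur     = fspecial('gaussian', [5 5], 1);           % assumed blur size and sigma
    smoothed  = imfilter(grayImg, gBlur, 'replicate');
    lapKernel = [-1 -1 -1; -1 8 -1; -1 -1 -1];            % 3x3 Laplacian approximation described above
    sharpened = smoothed + imfilter(smoothed, lapKernel, 'replicate');

    % Step 2: Canny-style gradient using the larger Gaussian (mask size 20, standard deviation 5).
    gCanny   = fspecial('gaussian', [20 20], 5);
    [gx, gy] = gradient(imfilter(sharpened, gCanny, 'replicate'));
    gradMag  = sqrt(gx.^2 + gy.^2);                       % norm of the gradient, thresholded next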
One important note is that regardless of the mask size or standard deviation, initial testing showed sensitivity to lighting conditions and background color. This led to our decision, stated earlier, to use the black outlines on the white card. The result was clearer detection of the white card outline every time, regardless of the Gaussian mask parameters.

Thresholding

The last step of the Canny algorithm performs a type of binary thresholding by setting all non-edge values to a single base number while leaving all edge pixels with a high number. This low "zero" is identified by taking a percentage (alpha) of the difference between the maximum and minimum intensity values of the norm of the gradient of the image. The result is an image that has high values only where edges exist, while the rest of the image is set to one base value (analogous to a binary image of ones and zeros). The problem here is that if alpha is set too low, every random bright spot that was detected appears in the thresholded image. Conversely, by setting alpha too high, we remove these bright spots but at the cost of filtering out some of the edge pixels as well. After initial Monte Carlo testing over a few sample images, I found that the best value of alpha ranged from 0.05 to 0.15 in decent lighting conditions, regardless of the mask size or standard deviation used. The weaker the lighting, the more problematic this became, whereas excessive light had minimal impact on the edge detection. This was a residual effect of the gradient computation and its inability to highlight the white card edges in dim light. As expected, the lighting effect was diminished when the black outlines were applied to the white card. In the end, I decided to fix alpha at 0.10.

Figure 9: Histogram of threshold values over various Gaussian masks

Step 3: Processing the Thresholded Image

At this stage of the algorithm, we have a thresholded image with the outlines of the white card and target hole identified. The next step is to find the target hole in the thresholded image and extract that matrix of pixels from the original RGB image for processing by the color detection part of our algorithm. Our idea was very simple, given that we had reduced the problem of card orientation to that of a vertically standing white card. The key phases are as follows:

1) Detect pixels with high thresholded values on the left, right, top, and bottom of the photograph, searching from the outside in
2) Bin the values for each side to estimate the location of the white card's outer edge
3) Crop the original thresholded image down to the computed edges and repeat the process to find the inner target hole

Phase 1: First, high-valued pixels in the thresholded image are detected to identify the sides of the card. Code was written to search row by row from the left side of the image until it found values exceeding the threshold for 6 consecutive pixels. This is done in the idOuterEdgeOutsideIn.m MATLAB routine. The pixel location was stored, and the search continued only up to the first quarter of the photograph's width. Similarly, another search was applied from the right side of the image throughout the last quarter of the photograph. This was done row by row throughout the image. We could limit the search to the first and last quarters of the photograph because of the assumed positioning of the card in the field of view using the card holder. Note that we only searched horizontally, not vertically; this will be explained shortly.
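A simplified sketch of the left-side portion of this search follows; variable names are hypothetical and the actual idOuterEdgeOutsideIn.m differs in its details.

    [nRows, nCols] = size(edges);        % 'edges' is the thresholded image
    quarter  = floor(nCols / 4);
    leftHits = zeros(nRows, 1);          % column where the first edge run starts in each row

    for r = 1:nRows
        run = 0;
        for c = 1:quarter                % search only the left fourth of the image
            if edges(r, c) > 0
                run = run + 1;
                if run == 6              % six consecutive high pixels: candidate edge found
                    leftHits(r) = c - 5; % record where the run began, then move to the next row
                    break;
                end
            else
                run = 0;
            end
        end
    end
    % A mirror-image loop over columns nCols-quarter+1 : nCols gives the right-side hits.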
Phase 2: Taking the distribution of the stored pixel locations from Phase 1 above, we simply sorted the values into bins associated with their positions along the horizontal axis of the image. This axis was broken down into pixels, with the 0 value at the far left, bottom corner of the image. By examining the number of pixels in each bin, we could determine an estimate for the left and right outer edges of the card. To see this more clearly, consider the values accumulated from searching the left fourth of the image. See the diagram below.

Figure 10: Diagram of the search procedure (card with target hole, left fourth and right fourth marked). First, a search is done row by row to collect thresholded pixels on the left fourth of the image, followed by a similar procedure on the right fourth of the image.

If we look at the sizes of the bins, we expect a large count associated with a pixel location along which there was a vertical line in the thresholded image. In finding the edge on the left side, we start with the highest bin number (which should be closest to the left edge of the card, assuming pixel 0 of the horizontal axis is in the lower left corner of the image) and walk backwards towards the lowest bin. At each bin, the algorithm checks the total number of values it contains. If the bin contains more than 10% of the sum total of all binned pixels, then we believe we have an edge. Why is this true? We know the left edge of the card is visible due to the black outline AND we know the left and right edges run the total height of the photo (or at least a majority of it). Since we only crop and search the left fourth of the image, we are guaranteed to find at least that one edge of the card, and possibly more edges if the background has stripes. However, we know with certainty that the edge of the card is the one closest to the center within that section of the photo, since the white card has effectively blocked out all other edges. Even if there is a gap between the top of the card and the top of the photo, the dominant bin will be the bin closest to the center, constituting a sizable percentage (a minimum of 10-15%) of the total number of binned pixels. An equivalent procedure is applied to the right fourth of the image. The computeSpread.m MATLAB routine assesses these bins, their contents, and the percentage of high pixel locations associated with each bin. Again, by finding the largest bins closest to the center of the image in each of the searched regions, we can find the outer edges of the card!
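A simplified sketch of this bin-and-vote idea is shown below; the bin width is an assumption, and the real computeSpread.m also performs the bin merging discussed later under Things Learned.

    hits     = leftHits(leftHits > 0);          % column positions collected by the left-side search
    binWidth = 5;                               % assumed bin width in pixels
    binPos   = 0:binWidth:quarter;              % bin boundaries along the horizontal axis
    counts   = histc(hits, binPos);             % how many rows voted for each column bin

    leftEdge = NaN;
    for b = numel(counts):-1:1                  % walk from the bin nearest the image center outward
        if counts(b) > 0.10 * sum(counts)       % a bin holding over 10% of all votes marks the edge
            leftEdge = binPos(b) + binWidth/2;  % use the bin center as the edge estimate
            break;
        end
    end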
To identify the top and bottom sides of the card, we simply use the information gathered from the left and right searches. The top side begins at the first few rows where we start receiving high pixel data in the left and right searches. Since we know the left and right edges from the binning of the values, we merely take an average over the first 10 rows of these searches where high thresholded values appear. However, in case the top of the card is NOT cropped off, we do not count rows in the estimate that do not contain a value for the left and right edges within a spread of 50 pixels. The top of the card thus corresponds to the first few rows at which we start accumulating high pixels within a range of +/-25 pixels of the determined left and right edges. This is done in the findTopEdge.m routine. Similarly, the bottom corresponds to the last few rows of high pixel values. However, we assume the bottom of the card is cut off due to the use of the card-holding device, so we simply estimate the edge from the rows where the last few left and right searches still collect high values.

Phase 3: Once the outer edge pixel locations are identified, another search, similar to the one described in Phase 2, is done to isolate the inner target hole via the idInnerEdgeOutsideIn.m MATLAB routine. This search also uses a binning method to identify edges, except that we simply search to one edge and stop, on both the left and right sides. We can afford to do this since we know the card is white and we are just looking for the black outline of the target center. There should be only one dominant set of bins on each side of the target. The top and bottom of the target hole are simply computed as the first few rows (or last few rows) where the left and right searches locate high thresholded values. See the figure below.

Figure 11: Diagram of the inner target search (card with target hole, left side and right side marked). Notice that the outer edge has been cropped off. All that is left is a picture in which the edges are all inside the white card. The only stripe of high thresholded values should be the inner edge of the target hole; all else is washed out.

Step 4: Color Detection Input

After identifying the inner and outer edges of the white card, the exact location of the target hole is determined by adding or subtracting the appropriate edges in the function computeEdgeLocation.m. The original image is then cropped down to these edges to effectively zoom in on the color patch in the target. This 3-D matrix is then handed over to the color processing code.

Things Learned

As simple as the search sounds, many things were learned and incorporated into the algorithm to make sure it functioned correctly.

Edge Location Facts

One interesting discovery was that the outer border of the photographs always had bright spots associated with it in the thresholded image. As a result, initial tests always identified at least one edge of the photograph as the edge of the white card. To resolve this, each image was always cropped by 15 pixels on each edge to exclude these noisy components before searching was initiated. A second problem that repeatedly occurred during edge detection was the variability of the pixels identifying the edges. This was discussed briefly in the binning of Phase 2 above. If the card was not perfectly parallel to the camera lens, the detected lines would look slightly slanted. (This is similar to looking down a flat road towards the horizon: it looks as if it converges to a point in the distance.) Given that we had a mounting device that would keep the card at a reasonably steady angle to the camera, we assumed, after extensive testing, that the difference in pixel locations for any left or right edge should not exceed 50 pixels. This led to the method implemented in computeSpread.m, where the algorithm checks whether bins are separated by 50 or more pixels. If the separation is less than 50 pixels, an average of the two bins is used to get a better estimate. Ideally, a well-designed card-holding device would eliminate this problem, keeping the card rigid and standing perfectly perpendicular to the boresight of the camera.
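Before turning to the alternatives we tried, here is a minimal sketch of the cropping described in Step 4; the index names are hypothetical, and computeEdgeLocation.m itself may combine the edges differently.

    margin   = 5;   % small assumed margin so the crop stays inside the black outline of the hole
    patchRGB = origImg(innerTop + margin : innerBottom - margin, ...
                       innerLeft + margin : innerRight - margin, :);
    % patchRGB is the rows x columns x RGB matrix handed to the color detection code.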
Alternate Algorithm Work

During the process of finding the edge locations, one could envision multiple solutions to identifying the thresholded pixels. Our initial implementation was similar to our final outside-to-inside search procedure, but it searched for only the 4 maximum pixel locations in each row and column. The two pixels farthest from the center would identify the outer edges, the other two the inner target. It was our belief that the highest values would mark the edges every time. The first problem arose with the black-outlined white card. The transition from white to black and from black to the background often caused 8 edges to appear on flat-colored backgrounds. This was easy to solve by using buffers to eliminate the duplicate edges every time. However, another problem arose once multicolored or patterned backgrounds/targets were photographed. If there were stripes or changes in the background, say a table edge, those transitions would appear as high-valued edges in the Canny output. Similarly, if the target had very bright lines, we might detect high pixels inside the target hole. The issue then became one of knowing how many high-valued pixels to store. If we limited it to four, corresponding to the edges of the card and target, we might miss them due to outside noise or the background pattern. Even if we buffered, we could not be sure that the four maximum pixel intensities always belonged to the white card. Conversely, if we stored ALL high-valued pixels, it required a long, iterative process that was very inefficient. This led us to the outside-in search procedure. See the idOuterEdgeLocationMax.m MATLAB file.

The outside-in search procedure works well because of the white card holder. Initially, we implemented the outside-in search without the holder. This proved to be as tedious as storing all the high-valued pixels. Many of the problems we encountered were similar to those described above for searching for the maximum values in each row/column. There were too many thresholded values at times, depending on the photograph. If we simply searched until a set of high pixels was located, we might be identifying a stripe in the background, or just a random noisy set of pixels. This led us to the idea of a card holder to simplify the edge detection problem. See idOuterEdgeLocationOutsideInOriginal.m.

As discussed, the holder serves to orient the card vertically and maintain a regular distance to the card. This ensures the card is the dominant object in the photograph. In many of our test photos, one can see that the bottom and top edges of the card are cut off due to the nearness of the camera. Similarly, the background visible on the sides is limited to a height not much larger than that of the card. This greatly aids the outside-to-inside search method we implemented. By computing the maximum pixel value locations only in each outer quarter of the image, we could be sure we were NOT getting maximum values inside the target hole. This gave us more confidence in our estimates of the outer edge. Secondly, because the top and bottom were cropped off, in the worst case, bins from background stripes in the outside margins that passed the Canny threshold could be of equal size to the bins for the left and right card edges. Even in that case, since our card was designed to show maximum gradient via the black outline, the most probable edge still remained in the largest bin closest to the center of the image. The only issue with the method eventually chosen is that the card must be centered in the field of view of the camera on the device.
This may appear difficult for a blind person, but I believe we can accomplish it using algorithms similar to the ones already written. This leads us to the next point: baselining the aim of the camera on the device.

Baselining the Card-Holding Device

A brief discussion of how the device would be guaranteed to center the white card and camera is important here. It is safe to say that many would question how well a blind individual could center the camera lens on the white card. Similarly, how could one center the camera such that the base (and top) of the white card is cropped? This could be done by giving the user a centering algorithm with audio feedback and an adjustable mount on the card-holding device. The algorithm would work from a calibration white card with its center still intact but outlined in black, and a white background sheet, both of which could be included in the package when purchased by the user. Very simply, the user would attach this card, take photos against the white background, and the algorithm would compute the edges. With the white background, there is no line content other than the borders of the card, so the algorithm could supply feedback to the user to adjust the mount until the edges matched as described above. The target card with the hole could then be inserted in place of the test card and would function just as well.

Color Detection

Once the color target is acquired by the edge detection algorithm, the next step is color detection. Color detection consists of two basic steps: normalization of the image pixels and then deciding the color. As stated earlier, the input to the color detection is the 3-D matrix containing the RGB values of the color target taken from the original image. Under perfect lighting conditions the image would contain the exact RGB values for the target and color detection would be trivial. However, since in our case the image is taken with a low-quality camera and the lighting is undetermined, the color in the image may not be the actual color a person would see under ideal conditions. To correct for this, we needed to take these RGB values, run a normalization algorithm, and then decide on the color in the target. We divided this problem into two distinct parts: normalization (color correction) and deciding the color.

Normalization

The image taken by a camera is a rendering of the light that impinges on the lens; the ambient lighting, the reflectivity of the object being photographed, and its motion all affect the image. Hence, a photo of a red shirt under fluorescent lighting might appear slightly pink, whereas the same shirt under tungsten light will appear more orange. The process of normalizing, or white balancing, is used to correct for such effects. There are many algorithms that can be used for white balancing. Most cameras have some process built in to do just this, but cell-phone cameras are low end, and the minimal processing that is done is not enough. Using class notes and research done online, we found various methods to achieve the desired result. However, since the program must run on a cell phone with limited resources, we decided to keep the algorithms as simple as possible, so that no matrix inversion or other complex operations were required.

The "Gray World" Assumption

The first strategy investigated was the "gray world" assumption whereby, as the name suggests, the world is considered to be gray on average.
The average R, G, and B values for the image are normalized to 128, and this value is used to normalize all the pixels. This is a well-known technique which returned usable results with the images that were tested, under most lighting conditions. The white balancing was correct even in low-light conditions, when the brightest point had RGB values near gray. This algorithm could be used in darker conditions and returned better results there, although, as discussed in the color recognition section, if illumination is below a minimum level the color recognition is unreliable. However, as illustrated in Figure 12 below, under bright conditions with only uniform colors in view, the gray-world algorithm fails.

Figure 12: Illustration of normalization using "gray world," on a bright white surface under fluorescent lighting

Normalizing Using the White Point

The gray-world assumption holds for most lighting and is a useful way of white balancing if no other information is available from the picture. But we have the advantage of having a white card in the image, which can be used as the "white point," a reference from which we can calculate the normalization coefficients for the RGB values. To achieve this, we find the maximum RGB point in the card image and normalize it to (255, 255, 255). This method brings up the RGB values for the target image, since the card is known to be pure white. The results achieved were comparable to those using the gray-world assumption, and in most cases with bright lighting the results were better, as illustrated in the figure below. Under poor lighting conditions, however, this algorithm performs marginally worse. After extensive testing with different lighting conditions and images, it was found that the overall performance was better than "gray world," and hence this algorithm was chosen.

Figure 13: Illustration of normalization using the white point, on a bright white surface under fluorescent lighting

Under very low lighting, with maximum RGB values below 100, both normalization algorithms are rendered useless, since the target image looks essentially black and the white card appears dark gray; the normalization just creates artifact pixels and does not bring out the real color. To counter this problem, a more extensive and complicated white balance algorithm must be implemented, or a flash must be required on the device.
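A simplified sketch of the chosen white-point scaling, with the gray-world variant as a comment, is shown below; rgbPatch (the cropped target) and cardRegion (pixels known to lie on the white card) are hypothetical names, and the project code may differ.

    cardDbl  = double(cardRegion);
    whiteRef = [max(max(cardDbl(:,:,1))), max(max(cardDbl(:,:,2))), max(max(cardDbl(:,:,3)))];
    balanced = double(rgbPatch);
    for ch = 1:3
        % scale each channel so the brightest card pixel maps to 255, clipping at pure white
        balanced(:,:,ch) = min(balanced(:,:,ch) * (255 / max(whiteRef(ch), 1)), 255);
        % gray-world alternative: scale so the channel mean becomes 128 instead, e.g.
        %   balanced(:,:,ch) = balanced(:,:,ch) * (128 / mean(mean(double(rgbPatch(:,:,ch)))));
    end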
After normalization of the pixels, the 3-D matrix is sent to the color recognition algorithm, where the color of the target image is decided.

Color Recognition

Initially, after the normalization was finished, the color recognition part seemed trivial. The original plan was to bin each pixel depending on its RGB values. This is the simplest method, with no conversion required. However, on initial testing we recognized that making bins with upper and lower bounds was complicated, since a slight change in any one of the three parameters can cause the color to vary. Setting even upper and lower bounds is almost impossible, since some colors have wider ranges than others. Also, there was no set pattern or grouping that would allow nested if statements in the code. We spent over a week attempting to come up with methods to group the RGB values. Initially, we tried just sorting the color lists by R, then G, then B. This method did bring similar colors together, but there were still some random colors scattered around; we did notice, though, that R seemed to be dominant. Next, we attempted to find the maximum of R, G, and B for each pixel and normalize that pixel: if gold is (255, 215, 0), then the maximum of RGB for that pixel is 255 and the normalized pixel will be (1, 0.84, 0). On sorting, we found all the colors where R is the maximum have a reddish hue, and all with maximum B are bluish. But after working through all the colors, we again had the problem of overlap and unclear bounds.

At this point, we started to look beyond RGB and into more complicated color schemes that are translations of RGB into a different domain. The CIELab color system seemed to be the answer; it was discussed quite extensively in class and could potentially provide a representation of the colors that could be easily divided and subdivided. However, to use CIELab the illumination must be known, since the transformation requires constants that depend on the ambient lighting in which the picture was taken. Once this illumination is known, the image represented in CIELab becomes independent of the device taking the picture, which is also one of the advantages of this scheme. However, while researching CIELab, we found the HSV color scheme. HSV is a simple transformation from RGB and stands for the Hue, Saturation, Value model. Hue is the color type, like red or blue; Saturation is the vibrancy of the color (i.e., how faded or sharp it is); and Value is the brightness of the color. Figure 14 shows the cylindrical representation of this standard.

Figure 14: Cylindrical representation of the HSV model

The Hue represents the color, changing as one moves around the circumference. The Saturation represents the sharpness of the color, going from zero at the center (completely faded) to 1 at the circumference (fully vibrant). The Value also goes from zero to 1 and represents the brightness of the color, zero being dark and 1 being fully bright. The HSV model suits the needs of this project perfectly; the conversion formulas could be easily coded and are not computationally intensive.

Figure 15: Equations used to convert R, G, and B values to HSV

Once the above calculation is run, the H value can be used to determine the color, and the S and V values the lightness or darkness. The Hue value is calculated from 0 to 360, like an angle, and grouping colors becomes a trivial task.

Figure 16: Hue (H) values in HSV and their corresponding colors

After deciding on HSV, the code was written to incorporate the major VIBGYOR colors, with each color having a light, true, and dark version, plus white, black, and gray. This scheme provides a robust system, since upper and lower bounds can be set easily and lightness and darkness are also clearly defined. As future work, more colors could be added simply by making the bounds smaller and defining colors with higher precision. The following figures show some test pictures and the results after running them through the MATLAB routine.

Figure 17a: Pure white card under fluorescent lighting. Result: WHITE
Figure 17b: Light green and white striped shirt under sunlight. Result: Light Green
Figure 17c: Pure red backpack under tungsten lighting. Result: Crimson
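To summarize the recognition step in code, the sketch below converts a normalized patch (the balanced matrix from the earlier normalization sketch) to HSV with MATLAB's rgb2hsv and applies illustrative hue boundaries; the actual bounds, color list, and light/dark thresholds in our routine differ.

    hsvPatch = rgb2hsv(balanced / 255);          % MATLAB returns H, S, and V each in [0, 1]
    H = mean(mean(hsvPatch(:,:,1))) * 360;       % average hue in degrees (ignores hue wrap-around; fine for a sketch)
    S = mean(mean(hsvPatch(:,:,2)));
    V = mean(mean(hsvPatch(:,:,3)));

    if V < 0.15
        name = 'black';
    elseif S < 0.10                              % nearly colorless: white or gray depending on brightness
        if V > 0.85
            name = 'white';
        else
            name = 'gray';
        end
    elseif H < 15 || H >= 330
        name = 'red';
    elseif H < 45
        name = 'orange';
    elseif H < 70
        name = 'yellow';
    elseif H < 165
        name = 'green';
    elseif H < 255
        name = 'blue';
    else
        name = 'violet';
    end

    if V < 0.4
        name = ['dark ' name];                   % crude lightness/darkness modifiers
    elseif S < 0.5
        name = ['light ' name];
    end
    disp(name)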
Future Work

From picture framing to edge detection to color identification, our algorithm did what it was intended to do, but of course there is plenty of room for improvement. To continue this work, the first area that needs exploring is implementing the algorithm on a camera phone. All of our processing was done on a laptop, where memory and processing power are not a problem; the same kind of computing power cannot be expected from a phone. The second front that needs exploring is decreasing the processing time needed to audibly deliver the color to the user. Both of these fronts would require streamlining the algorithm to run in a clean and efficient manner for maximum practicality and usability. Another area for possible improvement would be expanding the color library; our algorithm was able to recognize and deliver only 24 different colors. Finally, the algorithm's sophistication could be increased so that it can detect color patterns, such as stripes and patches, rather than only the predominant color. The last two fronts, expanding the color library and increasing the algorithm's sophistication, would increase the processing time and memory requirements, but with camera phones continuing to increase in computing power and capability, we believe this could be achieved.

All people have a right to experience life to its fullest, which includes engaging the world through all five senses. If nature does not provide an individual with the full use of vision, we hope our project has taken a small step toward rectifying that.

Appendix I

All of our MATLAB scripts and source code are attached to our project website, which can be found at http://stanford.edu/~denes/Psych221/Psych_221Final_Project.htm

Appendix II

Group Project Roles & Responsibilities:
Elston Tochip: Development of the edge detection algorithm.
Robert Prakash: Development of the color identification algorithm.
Phillip Lachman: Development and selection of the picture images, color schemes, and audio output files.