
Elston Tochip,
Robert Prakash,
Phillip Lachman
Tuesday, March 20, 2007
EE362/Psych 221
Camera Phone Color Appearance Utility
Finding a Way to Identify Camera Phone Picture Color
Today, in the 21st century, phones have become, and will continue to be, the portable digital platform for a variety of imaging applications. From pictures to video to personal organizers, they have become the personal computer on the go: don't leave home without it. With this technological advancement, we saw an opportunity to take the camera phone one step further and use it to help vision-impaired individuals identify the color of objects in images.
Approximately 10 million blind and visually impaired people live in the U.S. today, including 55,200 legally blind children and 5.5 million elderly people. Color-blind people, roughly 8% of males and up to 2% of females in a population of more than 300 million, account for many millions more within the U.S. These people have a right to see what many of us take for granted on a daily basis, the right to experience life to its fullest.
To help make a small push in that direction, our goal for this project was to develop a software application able to accomplish the following:
1) Receive a phone camera quality image
2) Identify the predominant color region(s) within the image
3) Estimate the color name for the predominant region
4) Audibly transmit the predominant color to the user
Our software takes the incoming image, in which the target is surrounded by a white frame fixed 6-8 inches away from the camera phone, and runs an edge detection algorithm to identify the background of interest. From there the code, using HSV as its coordinate system, identifies the color of each pixel within the background and tallies the colors before announcing the predominant color within the region. The issues we ran into and our resulting solutions are explained below.
Edge Detection
The focus of this task was to identify the target card. After poring through
various sources on the web, and discussing the topic with coworkers, we decided on a
method that simply looked for steep or sharp changes in intensity across an image. The fundamental idea was that where an edge existed, there would be a rapid change in intensity that was still large enough to measure. By computing the gradient around every pixel, we could find these points and mark them in the image.
To accomplish this gradient computation, we used the Canny Edge Detection
method. This algorithm works in three steps.
1) Gaussian smoothing of the image
2) Computing the gradient of the intensities in the image
3) Thresholding the norm of the gradient image to isolate edge pixels
The resulting output of the Canny algorithm was a binary-like image with high values at
pixel locations where an edge was detected, and a low base value for all other
surrounding pixels. See figure below.
Figure 1: Original image of white card on white surface (left) versus Canny thresholded image (right)
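As a point of reference, MATLAB's Image Processing Toolbox provides a built-in Canny detector that carries out these same three stages. A one-line call such as the following produces a comparable binary edge map (our own implementation, described later, was used instead so that the smoothing mask and threshold could be tuned; the variable name img is only illustrative):

    grayImg = rgb2gray(img);         % convert the RGB photo to grayscale
    BW = edge(grayImg, 'canny');     % binary edge map via the built-in Canny detector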
However, before diving into more specifics about the algorithm, the problems associated with edge detection, particularly for our purposes, must be discussed at some length.
Identifying Edge Location
The Canny Edge Detection algorithm effectively “paints” the pixels that mark the edges
of the card. Now, the question becomes how to use these pixels to do the following:
1) Find the location of the OUTER edge of the white card in the picture
2) Find the location of the INNER edge of the target hole in the white card
3) Distinguish stripes or other patterned lines from the actual card edges.
The task is a lot more difficult than visual perception might suggest. Looking at the thresholded image, it appears easy to locate the lines of the card, the lines of the pattern if they exist, as well as the random dots of background noise. The problem is mathematically associating these high-valued pixels with their perceived edges in the image.
Problems with Identifying Edges
To simplify the problem during the first-pass analysis, I used pictures of the white card standing vertically upright, as shown in the figure below.
Figure 2: Simplified picture used for the first-pass analysis (labels: card, target hole)
As shown, the sides of the card match up perfectly with the sides of the photograph. If
this is always the case, one could simply take the thresholded image and locate the edges
by computing the maximum values and binning these values (by row and then by
column) into a histogram based on x and y pixel location value (x being horizontal axis, y
vertical). The most common values in each histogram would then correspond to the edges of the card and of the inner target hole.
The ideal situation as described above cannot be assumed to happen EVERY time
the camera is utilized, especially if the user is blind. For example, the card may be
slanted as in the below figure:
Figure 3: Slanted target (labels: card, target hole)
In this case, the binning of high pixel values no longer works, since the card is at a high
angle. One could solve this by finding all the high pixel values as before, and computing
a linear regression fit of the points to find the edges of the target white card, and similarly
of the hole in the card. Another possible solution would be to compute and adjust the
slope between points recursively. Switching from one side of the card to another side
could be determined by computing a change in sign of the slope of a set of points. For
example, in the above figure, the bottom of the card has a negative slope, whereas the
right side has a positive slope. Computationally, this would take a long time and be very
complex, but it would work. Still, if there were a way to guarantee an upright, vertical card, that would greatly simplify the algorithm.
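As an illustrative sketch of the regression idea (which we did not ultimately implement), one could fit a line to the high-valued pixels found on one side of the thresholded image; here edgeMap is an assumed binary edge map, and only the left quarter of the image is considered:

    cols = size(edgeMap, 2);
    [r, c] = find(edgeMap(:, 1:round(cols/4)));    % high pixels in the left quarter
    p = polyfit(r, c, 1);                          % edge column as a linear function of row
    edgeCol = polyval(p, (1:size(edgeMap,1))');    % estimated edge location for every row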
Impact of Background Images
During our experiments with various background targets, several situations
occurred where detecting the outline of the card with the Canny algorithm proved to be
very difficult. These were grouped into two categories based on 1) lighting and 2)
surrounding background of the target.
In the first case, lighting can prove to be a very important factor in detecting an
object. If the light source is dim, not enough photons are reflected off the white card to
sufficiently separate it from its background. The result is that visually the human eye can
distinguish the difference, but the image does not reflect (numerically) a high enough
difference between the background and the edge of the card. For example, in the case of
an off-white background, say beige, the lack of light captured in the image may produce weak intensity gradients in the Canny algorithm.
Figure 4: White card on an off-white surface in ambient lighting. Even moderate lighting affects the thresholding ability of the Canny algorithm!
An extension of this problem surfaces if you have perfect lighting and a target
with a color matching EXACTLY that of the white card. You would not be able to see
the edges at all. There needs to be a solution to this problem. Another instance where
background color could affect the edge detection is if there is a striped or checkered
target. The result is perfectly formed lines that will pass the threshold test of the Canny
algorithm, causing a series of lines to appear in addition to the edges of the card. See
the example below.
Figure 5: Image of a white background with stripes. Note that the card edges and the stripes are all thresholded. Which one is the correct edge?
User Complications/Problems
In addition to the math, user complications presented a challenging task to resolve
as well. As defined by the project statement, we wanted to develop an algorithm that
could be used by color-blind and BLIND individuals through the use of the camera
phone. Color-blind individuals can visually aim the camera phone well enough to ensure
the white card is completely in the field of view of the lens. How about a blind person?
We know that putting the camera right up to the target hole in the white card is futile
since the lack of lighting will probably leave us a dark-blackish looking image. This
would be similar to taking a picture at night without a flash. Additionally, this simply
nullifies the use of the white card as a baseline for estimating luminance. The
problem thus becomes how to reduce camera aiming error for someone who cannot see.
If you think that aiming is simple even if you cannot see, think again. Having the sense
of vision, we as designers may OVERSIMPLIFY this problem dramatically. Here is a
test you can do to prove this to yourself:
1) Take any camera you have, a white index card (4” x 6”)
2) Close your eyes
3) Then try to take a centered picture of the card at a distance of about 1 foot.
You can probably do this fairly easily. NOW, try it again, only this time using a different
order of steps:
1) Close your eyes first for 5 minutes,
2) Pick up the camera and white card
3) Try to take a picture at the same distance as before with the card centered.
Look at the pictures: there should be a marked difference in the centering of the white card in the two images (unless you peeked!). This is all true even though we can see to begin with. How much more difficult would this be for someone who has NEVER been able to see? Another example is trying to take a picture of yourself. People always try it and always miss a few times (cutting off part of the face, head, or other people in the picture) before coming close to an acceptable one. A blind individual would not want to be continuously adjusting the card. More importantly, how would they know that the target was in the center, or even that the picture they took included the white card at all? If we were tempted to believe that a user could simply line up the camera and the white card target, we believe we have now shown that to be a false assumption. How do we solve this?
Our Problems Solved
As discussed, a variety of complications arise that need to be considered before the algorithm can even be written. These range from the mathematics involved to the user interface required to use the algorithm properly. The following paragraphs address the concerns that appeared over the course of our analysis and design of the edge detection algorithm, for both mathematical computation and user feasibility.
The Contraption: Flying Blind
Of all the stated challenges regarding the use of this algorithm, we saw user feasibility as the most important. It is akin to asking a skilled marksman to hit a target without aiming. Similarly, the algorithm, no matter how perfect, is useless when the target white card cannot be acquired. To resolve this situation, and in agreement with Bob Dougherty, we decided that a contraption that could be connected to the phone and used to offset aiming error was a necessity.
The Design
A bulky contraption would not be ideal to carry around. The goal was to devise
something small and compact enough to fit in a pocket or small pouch that could be
carried without discomfort. We decided upon a device that was collapsible to the size of
a 4 inch by 6 inch white card. On one end, the card would be attached and allowed to flip
out and stand upright. On the other side, a mounting device could be used to attach the
setup to the camera phone. The mount could include dials to allow fine adjustments of
the phone position. A distance of about 6-8 inches between the camera phone and the
white card was set to keep the design small and yet allow ample lighting to gauge the
color of the target. See the figure below:
Figure 6: Diagram of the assembly from a horizontal view (labels: camera mount, white card, base board to separate camera from card, hinge assembly to allow for folding)
Here are a few sample pictures before using the device and after using the device.
Figure 7: Photos without and with the cardholding device
The first is an attempt to photograph an orange shoebox; the second, a purple gorilla. Notice how, without the device, the card is not completely in the picture and sometimes does not even include the target object!
The white card holder served multiple purposes other than just helping the user
aim. It also could predetermine the white card orientation. We forced the white card to
be positioned vertically such that the edges of the photo and the edges of the card were
parallel. This simplified our algorithm computationally. First, we could improve the
efficiency of the algorithm by eliminating strange orientations, such as the slanted card
described earlier. Second, the card holder guaranteed the card was always dominating
the Field of View of the camera, minimizing the problem of background clutter which
could only complicate the edge detection.
White Card Modifications
Another issue mentioned earlier concerned low light levels and backgrounds or objects with a color similar to that of the card. To resolve this, I outlined the outer edges of the white card and the edges of the hole with a thick, black line. The white-to-black transition, I
believed, would provide the steepest gradient for intensity, regardless of lighting and
background.
Proof of Concept
To demonstrate the usefulness of this card-holding apparatus, I used basic materials to construct the primitive but useful device pictured below. Thin plywood boards were used for the base, with a shorter board attached at one end via a mini-hinge. To this shorter board we attached a white card with the black outlines on the outer edges and inner target edges. At the other end of the base, a simple thick paper clip was attached that allowed a slim phone to be locked into place.
Using this device I was able to take several pictures and run the edge-detection
and later the color selecting algorithm on them. It worked as well as I had expected. The
white card was always upright, and took up a large area of the photographs. Even more
impressive was the ability to use the Canny algorithm to “see” edges against very white
and dimly lit backgrounds. See figures below:
Figure 8: Original Picture of Blue Material Using Card Holding Device
Figure 9: Canny Edge Detected Thresholds
Now that we have specified the nature of the problem, we can proceed to a
discussion on the development of our edge detection algorithm using the Canny Edge
Detector.
The Algorithm
Step 1: Blurring and Sharpening Edges in the Image
Prior to using the Canny algorithm, the photographs are initially preprocessed to
sharpen the edges present. This is done by using a Laplacian convolution mask. The
Laplacian kernel is simply a 3x3 matrix filled with -1’s except at the center, where the
value is set to 8. The kernel is an approximation of the second derivative, highlighting
changes in intensity. Due to the Laplacian’s high sensitivity to noise, Gaussian
smoothing is done beforehand to blur and eliminate noisy pixels in the photograph. The
smoothed photo is then added to the result of the Laplacian convolution to obtain a new
image that has sharpened all edges for improved detection by the Canny Algorithm. See
the figures below.
Figure: Initial image versus image after blurring and sharpening (in color and grayscale), with close-ups of the top inner edge. The blurred/sharpened close-up is less noisy in the center and smoother; the outer-edge spikes are from the photo itself.
The first image has more noisy pixels at points around the edges of the card (see top inner
edge). The second image has reduced these spurious points by smoothing and the
Laplacian convolution. The result is cleaner lines on the outside of the card and around
the target hole.
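As a minimal MATLAB sketch of this preprocessing step (the blur parameters here are illustrative assumptions, not necessarily the ones we used; grayImg stands in for the grayscale photo as a double matrix):

    lapKernel = [-1 -1 -1; -1 8 -1; -1 -1 -1];                        % 3x3 Laplacian approximation
    smoothed  = conv2(grayImg, fspecial('gaussian', 5, 1), 'same');   % Gaussian blur to suppress noise
    sharpened = smoothed + conv2(smoothed, lapKernel, 'same');        % add the Laplacian back to sharpen edges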
Step 2: Using the Canny Algorithm
As stated earlier, the Canny Edge Detection Algorithm provided a way to find the
pixels that outlined the points of largest intensity change across the image. To reiterate,
this process has three distinct phases:
1) Gaussian smoothing of the image
2) Computing the gradient of the intensities in the image
3) Thresholding the norm of the gradient image to isolate edge pixels
Before processing, the image was first converted to grayscale. This made the lighting
gradients more visible for the white card against the background.
Computing the Gradient
The Gaussian smoothing of the image is done by convolving it iteratively with a
Gaussian mask. The derivative of the smoothed image is then computed to identify the
gradients that mark the edges where they exist. In our case, we had to perform these operations in two directions, vertically and then horizontally, on the image.
The matlab code we obtained completes both the first and second steps using one
matrix. As users, we were able to tune the Gaussian mask by setting its size and standard
deviation. Extensive testing showed that the larger the mask, the more smeared (and consequently thicker) the edges became. A similar result occurred from increasing the standard
deviation to a large value. Ideally, we wanted to finely isolate all edges, so the smaller
the standard deviation, the better it worked. However, to avoid detecting small gradual
changes in intensity, the standard deviation could not be set too low. We eventually
settled on a mask size of 20, with a standard deviation of 5.
One important note is that regardless of the mask size or standard deviation, initial
testing showed variability to lighting conditions and background color. This led to our
decision to use the black outlines on the white card, as stated earlier. The result
was clearer detection of the white card outline every time, regardless of the Gaussian
mask parameters.
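A rough sketch of this smoothing-and-derivative stage with the mask size and standard deviation we settled on (an illustrative reconstruction, not the code we obtained; grayImg again stands for the grayscale photo):

    maskSize = 20;  sigma = 5;                                % our final Gaussian mask settings
    g  = fspecial('gaussian', [maskSize 1], sigma);           % 1-D Gaussian column vector
    dg = gradient(g);                                         % derivative of the Gaussian
    gradX = conv2(conv2(grayImg, g,  'same'), dg', 'same');   % smooth vertically, differentiate horizontally
    gradY = conv2(conv2(grayImg, g', 'same'), dg,  'same');   % smooth horizontally, differentiate vertically
    gradNorm = sqrt(gradX.^2 + gradY.^2);                     % norm of the gradient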
Thresholding
The last step of the Canny algorithm performs a type of binary thresholding by
setting all non-edge values to a single base number while leaving all edge pixels with a
high number. This low “zero” is identified by taking a percentage (alpha) of the
difference between the maximum and minimum intensity values from the norm of the
gradient of the image. The result is an image that has high values only where edges exist,
while the rest of the image is set to one base value (analogous to a binary image of ones
and zeroes). The problem here is that if alpha is set too low, every random bright spot
that was detected appears in the thresholded image. Conversely, by setting alpha too
high, we definitely remove these bright spots but at the cost of filtering out some of the
edge pixels as well.
After initial Monte Carlo style testing over a few sample images, I was able to determine that the best value of alpha ranged from 0.05 to 0.15 in decent lighting conditions, regardless of the mask size or standard deviation used. The weaker the lighting, the more problematic this became, whereas excessive light had minimal impact on the edge detection. This was a residual effect of the gradient computation and its inability to highlight the white card edges in dim light. As expected, the lighting effect was diminished when the black outlines were applied to the white card. In the end, I decided to fix alpha at a level of 0.10.
Figure 9: Histogram of Threshold Values over Various Gaussian Masks
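A minimal sketch of the thresholding rule, under one plausible reading of the description above (gradNorm is the norm-of-gradient image from the previous step):

    alpha = 0.10;                                   % our fixed setting
    lo = min(gradNorm(:));  hi = max(gradNorm(:));
    cutoff = lo + alpha * (hi - lo);                % "low zero" level taken from the gradient range
    edgeMap = gradNorm;
    edgeMap(gradNorm < cutoff) = lo;                % non-edge pixels forced to the base value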
Step 3: Processing the Thresholded Image
At this stage of the algorithm, we have a thresholded image with outlines of the
white card and target hole identified. The next step is to find the target hole in the
thresholded image and extract that matrix of pixels from the original RGB image for
processing by the color detection part of our algorithm. Our idea was very simple given
we had simplified the problem of card orientation to that of a vertically standing white
card. The key phases are as follows:
1) Detect pixels that have high thresholded values on the left, right, top and bottom of the photograph, searching from outside to inside
2) Bin the values for each side to estimate the white card outer edge location
3) Crop the original thresholded image based on the computed edges and repeat to find the inner target hole
Phase 1:
First, a detection of high valued pixels in the thresholded image is done to identify
the sides of the card. Code was written to do a recursive search from the left side of the
image until it found values exceeding threshold for 6 consecutive pixels. This was done
using the idOuterEdgeOutsideIn.m matlab routine. The pixel location was stored, and the
search continued up until the first quarter of the photograph size. Similarly, another
recursive search was applied from the right side of the image throughout the last quarter
of the photograph. This was done row by row throughout the image. We could limit the
search to the first and last quarters of the photograph because of the assumed positioning
of the card in the field of view using the card holder. Note we only searched
horizontally, not vertically. This will be explained shortly.
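A simplified, loop-based sketch of the left-side search (the actual routine, idOuterEdgeOutsideIn.m, is written recursively; edgeMap and cutoff are the thresholded image and threshold from the previous step):

    [nRows, nCols] = size(edgeMap);
    leftHits = zeros(nRows, 1);                 % first qualifying column for each row (0 = none found)
    for r = 1:nRows
        run = 0;
        for c = 1:round(nCols/4)                % search only the left quarter of the photo
            if edgeMap(r, c) > cutoff
                run = run + 1;
                if run == 6                     % six consecutive high pixels
                    leftHits(r) = c - 5;        % store where the run began
                    break
                end
            else
                run = 0;
            end
        end
    end

A mirror-image search over the right quarter of the image produces the corresponding right-side hits.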
Phase 2:
Taking the distribution of the stored pixel locations from Phase 1 above, we
simply sorted the values into bins associated with their appropriate positions along the
horizontal axis in the image. This axis was broken down into pixels, with the 0 value
being on the far left, bottom corner of the image. By examining the number of pixels in
each particular bin number, we could determine an estimate for the left and right outer
edges of the card. To see this more clearly, let us take the values accumulated from
searching the left fourth of the image. See the diagram below:
Figure 10: Diagram of the search procedure. First, a search is done row by row to obtain thresholded pixels in the left fourth of the image, followed by a similar procedure on the right fourth of the image (labels: card, target hole, left fourth, right fourth).
If we look at the size of the bins, we expect a high value associated with a pixel
location along which there was a vertical line in the thresholded image. In finding the
edge for the left side, we start with the highest bin number (which should be closest to the
left edge of the card, assuming pixel 0 of the horizontal axis is in the lower left corner of
the image) and walk backwards towards the lowest bin. At each bin, the algorithm
checks the total number of values it contains. If the bin contains a number greater than
10% of the sum total of all pixels, then we believe we have an edge. Why is this true?
We know the left edge of the card is visible due to the black outline AND we know the
left and right edges run the total height of the photo (OR definitely a majority of it).
Since we only crop and search on the left fourth of the image, we are guaranteed to find
at least that one edge of the card, and possibly more edges if the background has stripes.
However, we know with certainty that the edge of the card is the one closest to the center within that section of the photo, since we have effectively cut out all other edges with the white
card. Even if there is a gap between the top of the card and the top of the photo, the
dominant bin will be the bin closest to the center and constituting a sizable percentage (a
minimum of 10-15%) of the total number of binned pixels. An equivalent procedure is
applied to the right fourth of the image. The computeSpread.m matlab routine assesses
these bins, their contents and the percentage of high pixel locations associated with each
bin. Again, by finding the largest bins closest to the center of the image in each of the
searched regions, we can find the outer edges of the card!
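The binning idea behind computeSpread.m can be sketched as follows; this is a simplified illustration for the left side, using the leftHits array from the Phase 1 sketch above:

    cols   = leftHits(leftHits > 0);                       % stored column locations of high pixels
    counts = accumarray(cols, 1, [round(nCols/4) 1]);      % one bin per column position
    total  = sum(counts);
    leftEdge = 0;
    for b = numel(counts):-1:1                             % walk from the bin nearest the center outward
        if counts(b) > 0.10 * total                        % bin holds more than 10% of all hits
            leftEdge = b;                                  % treat it as the card's left edge
            break
        end
    end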
To identify the top and bottom sides of the card, we simply use the information
gathered from the left and right searches. The top side begins at the first few rows where
we start receiving high pixel data for the left and right searches. Since we know the left
and right edges from the binning of the values, we merely take an average of the first 10
rows of these searches where high thresholded values appear. However, in case the top
of the card is NOT cropped off, we do not count rows in the estimate that do not contain a
value for the left and right edges within a spread of 50 pixels. The top part of the card
corresponds to the first few rows at which we start accumulating high pixels at points
within a range of +/-25 pixels of the determined left and right edge. This is done in the
findTopEdge.m routine. Similarly, the bottom corresponds to the last few rows of high
pixel values. However, we assume the bottom of the card is cut off by the use of the card-holding device. Thus, we simply estimate the bottom edge from the last few rows in which the left and right searches still collect high values.
Phase 3:
Once the outer edge pixel locations were identified, another search, similar to the
one described in Phase 2, is done to isolate the inner target hole via the
idInnerEdgeOutsideIn.m matlab routine. This search also uses a binning method to
identify edges, except we simply search to one edge and stop on both the left and right
sides. We can afford to do this since we know the card is white and we are just looking
for the black outline of the target center. There should be only one dominant set of bins
on each side of the target. The top and bottom of the target hole are simply computed to be the first few rows (or last few rows) where the left and right searches locate high thresholded values. See the figure below.
Figure 11: Diagram of the inner target search. Notice that the outer edge has been cropped off; all that is left is a picture where the edges are all inside the white card. The only stripe or set of high thresholded values should be the inner edge of the target hole; everything else is washed out (labels: card, target hole, left side, right side).
Step 4: Color Detection Input
After identifying the inner and outer edges of the white card, the exact location of the
target hole is identified by adding or subtracting the appropriate edges together in the
function computeEdgeLocation.m. The original image is then cropped down to these
edges to effectively zoom in on the color patch in the target. This 3-D matrix is then
handed over to the color processing code.
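The hand-off itself amounts to a crop of the original image; in sketch form (top, bottom, left and right stand for the computed inner target-hole edge locations, and originalRGB for the unprocessed photo; the names are illustrative):

    targetPatch = originalRGB(top:bottom, left:right, :);   % 3-D matrix of RGB values for the color code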
Things Learned
As simple as the search sounds, many things were learned and incorporated into
the algorithm to make sure it functioned correctly.
Edge Location Facts
One interesting discovery was that the outer border of the photographs always had bright spots associated with it in the thresholded image. As a result, initial tests
always identified at least one edge of the photograph as the edge of the white card. To
resolve this, each image was always cropped by 15 pixels on each edge to exclude these
noisy components before searching was initiated.
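In sketch form, this amounts to a single trimming operation applied before any searching:

    edgeMap = edgeMap(16:end-15, 16:end-15);   % drop a 15-pixel noisy border on every side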
A second problem during the edge detection process that repeatedly occurred was
the variability of pixels identifying the edges. This was discussed briefly in the binning
of Phase 2 above. If the card was not perfectly parallel to the camera lens, the lines
detected would look slightly slanted. (This is similar to looking at a flat road towards the
horizon- it looks as if it converges to a point in the distance.) Given we had a mounting
device that would keep the card at a reasonably steady angle to the camera, we assumed,
after extensive testing, that the difference in pixel locations for any left or right edge
should not exceed 50 pixels. This led to the method implemented above in
computeSpread.m where the algorithm checks to see if bins are separated by 50 or more
pixels. If the separation is less than 50 pixels, an average of the two bins is used to get a
better estimate. Ideally, a well-designed card-holding device would eliminate this problem, keeping the card rigid and standing perfectly perpendicular to the boresight of the camera.
Alternate Algorithm Work
During the process of finding the edge locations, one could envision multiple
solutions to identifying the thresholded pixels. Our initial implementation was similar to
our final outside to inside search procedure, but searched for only the 4 maximum pixel
locations for each row and column. The two pixels farthest from the center would
identify the outer edges, the other two the inner target. It was our belief that the highest
values would mark the edges every time. The first problem arose with the black-outlined white card. The transition from white to black and from black to the background often caused 8 edges to appear on flat-colored backgrounds. This was easy to solve by using buffers to eliminate the extra edges every time. However, another problem arose once multicolored or patterned backgrounds/targets were being photographed. If there were stripes or changes in the background, say a table edge, those transitions would appear as high-valued edges in the Canny algorithm. Similarly, if the target had very bright lines we might detect high pixels inside the target hole. The issue then became one of knowing how many high-valued pixels to store. If we limit it to four, corresponding to the
edges of the card and target, we may miss them due to outside noise or the background
pattern. Even if we buffered, we could not be sure that the four maximum pixel
intensities were always that of the white card. Similarly, if we stored ALL high valued
pixels, it required a long and iterative process that was very inefficient. This led us to use
the outside in search procedure. See the idOuterEdgeLocationMax.m matlab file.
The outside-in search procedure works well because of the use of the white card
holder. Initially we implemented this outside-in search procedure without the holder.
This proved to be as tedious a method as storing all the high valued pixels. Many of the
problems we encountered were similar to those expressed above in searching for the
maximum values in each row/column. There were too many thresholded values at times,
depending on the photograph. If we simply searched until a set of high pixels was
located, we might be identifying a stripe in the background, or just a random noisy set of
pixels. This led us to the idea of a card holder to simplify the edge detection problem.
See idOuterEdgeLocationOutsideInOriginal.m
As discussed, the holder serves to vertically orient the card and maintain a regular
distance to the card. This ensures the card is the dominant image in the photograph. In
many of our test photos, one can notice the bottom and top edges are removed from the
photo due to the nearness of the camera. Similarly, the visible background on the sides is limited, vertically, to a region not much taller than the card. This greatly aids us
in the outside to inside search method we implemented.
By computing the max pixel value locations for each outer quarter of the image,
we could be sure we were NOT getting max values inside the target hole. This gave us
more confidence in our estimates of the outer edge. Secondly, because the top and bottom were cropped off, in the worst case our bins for the left and right edges would be of equal size if there were stripes on the outside margins that might pass threshold in the
Canny algorithm. In that case, since our card was designed to show maximum gradient
via the black outline, the most probable edge still remained in the maximum bin closest to
the center of the image.
The only issue with the method eventually decided upon is that the card must be
centered in the field of view of the camera on the device. This may appear difficult for a
blind person, but using algorithms similar to the ones already written, I believe we can accomplish it. This leads us to the next point, baselining the aim of the camera on the device.
Baselining the Card-Holding Device
A brief discussion of how the device would be guaranteed to center the white card
and camera is of importance here. It is safe to say that many could question how well a
blind individual could center their camera lens and the white card. Similarly, how could
one center the camera such that the base (and top) of the white card is cropped? This
could be done by giving the user a centering algorithm with audio feedback and an
adjustable mount on the card-holding device. The calibration would be based on a white card with its center still intact but outlined in black, plus a white background sheet, both of which could be included in the package when purchased by the user. Very simply, the user would attach this card, take photos against the white background, and the algorithm would compute the edges. Against the white background there is no additional line content except the borders of the card, so the algorithm could supply feedback to the user to accurately adjust the mount until the edges matched as described above. Then the target card with the hole could be inserted in place of the test card and would function just as well.
Color Detection
Once the color target is acquired by the edge detection algorithm, the next step is
color detection. Color detection consists of two basic steps, the normalization of the
image pixels and then deciding the color. As stated earlier, the input to the color detection
is the 3D matrix which contains the RGB values of the color target taken from the
original images. Under perfect lighting conditions the image would contain the exact
RGB values for the target and the color detection would be trivial. However, since in our
case the image is taken by a low-quality camera and the lighting is undetermined, the color in the image may not be the actual color a person would see under ideal conditions. To correct for this we needed to take these RGB values, run a normalization algorithm,
and then decide on the color in the target. We divided this problem into two distinct parts,
normalization or color correction and deciding the color.
Normalization
The image taken by a camera is a rendering of the light that impinges on the lens; the ambient lighting, the reflectivity of the object being photographed, and its motion all
affect the image. Hence, a photo of a red shirt under fluorescent lighting might appear
slightly pink, whereas the same shirt under tungsten will appear more orange. The
process of normalizing or white balancing is used to correct for such effects. There are
many algorithms that can be followed to do white balancing. Most cameras have some
process already built in to do just this, but cell-phone cameras are low end, and the
minimal processing that is done is not enough.
Using class notes and research done online, we found various methods to achieve
the desired result. However, since the program must run on a cell phone with limited
resources, we decided to keep the algorithms as simple as possible, so that no matrix
inversion or complex operations were required.
“Gray World” assumption
The first strategy investigated was the “gray world” assumption whereby as the
name suggests, the world is considered to be gray on average. So the average R, G and B
values for the image are normalized to 128, and this value is used to normalize all the
pixels. This is a well-known technique that returned usable results with the images that were tested, under most lighting conditions. The white balancing was correct even in low-light conditions, when the brightest point had RGB values near gray. This algorithm could be used in darker conditions and returned better results there, though as discussed in the color recognition section, if illumination is below a minimum level the color recognition is unreliable. However, as illustrated in Figure 12 below, under bright conditions and with only uniform colors in view, the gray world algorithm fails.
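A minimal sketch of the gray world normalization (im here is assumed to be the RGB image as a double matrix with values in the 0-255 range):

    R = im(:,:,1);  G = im(:,:,2);  B = im(:,:,3);
    im(:,:,1) = R * (128 / mean(R(:)));          % scale each channel so its average becomes 128
    im(:,:,2) = G * (128 / mean(G(:)));
    im(:,:,3) = B * (128 / mean(B(:)));
    im = min(max(im, 0), 255);                   % clip back to the valid intensity range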
Figure 12: Illustration of normalization using “Gray World”, on a bright white surface
under fluorescent lighting.
Normalizing using White Point
The gray world assumption holds valid for most lighting and is a useful way of
white balancing if no other information is available from the picture. But we have the
advantage of having a white card in the image, which can be used as the “white point”, a
reference from which we can calculate the normalization coefficients for the RGB value.
To achieve this, we find the max RGB point in the card image and normalize that to (255,
255,255). This method brings up the RGBs for the target image, since the card is known
to be pure white. The results achieved were comparable to those using the gray world
assumption, and in most cases with bright lighting, the results were better, as illustrated
in the figures below. However, under poor lighting conditions, this algorithm performs
marginally worse. But, after extensive testing with different lighting conditions and
images, it was found that the overall performance was better than “gray world” and
hence, this algorithm was chosen.
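A corresponding sketch of the white-point normalization (im is assumed to be an image region containing the white card, so that its brightest pixel in each channel maps to 255):

    R = im(:,:,1);  G = im(:,:,2);  B = im(:,:,3);
    im(:,:,1) = R * (255 / max(R(:)));           % brightest (white card) pixel becomes (255, 255, 255)
    im(:,:,2) = G * (255 / max(G(:)));
    im(:,:,3) = B * (255 / max(B(:)));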
Figure 13: Illustration of normalization using white point, on a bright white surface
under fluorescent lighting.
Under very low lighting, with max RGB below 100, both normalization algorithms are rendered useless, since the target image looks essentially black and the white card appears dark gray; the normalization just creates artifact pixels and does not bring out the real color. To counter this problem, a more extensive and complicated white-balance algorithm must be implemented, or a flash must be required on the device.
After normalization of the pixels, the 3D matrix is then sent to the color
recognition algorithm where the color of the target image is decided.
Color Recognition
Initially, after the normalization was finished, the color recognition part seemed
trivial. The original plan was to bin each pixel depending on the RGB values. This is the
simplest method with no conversion required. However, on initial testing we recognized
that making bins with upper and lower bounds was complicated, since a slight change in any one of the three parameters can cause the perceived color to vary. Setting evenly spaced upper and lower bounds is almost impossible, since some colors have wider ranges than others. Also, there was no set pattern or grouping that would allow nested if statements in the code.
We spent over a week attempting to come up with methods to group the RGB values. Initially, we tried just sorting the color lists by R, then G, then B. This method did bring similar colors together, but there were still some random colors scattered around; we did notice, though, that R seems to be dominant. Next, we attempted to find the max of R, G and B for each pixel and normalize that pixel by it; hence, if gold is (255, 215, 0), then the max of RGB for that pixel is 255 and the normalized pixel will be (1, .84, 0). On sorting, we found that all the colors where R is the maximum have a reddish hue, and all with max B are bluish. But, after working through all the colors, we again had the problem of overlap and unclear bounds.
At this point, we started to look beyond RGB and into more complicated color schemes that are translations of RGB into a different domain. The CIELAB color system seemed to be the answer: it was discussed quite extensively in class and could potentially provide a representation of the colors that could be easily divided and subdivided. However, to be able to use CIELAB the illumination must be known, since the transformation requires constants that are in turn dependent on the ambient lighting in which the picture was taken. Once this illumination is known, the image represented in CIELAB becomes independent of the device taking the picture, which is also one of the advantages of this scheme.
However, while researching CIELAB, we found the HSV color scheme. HSV is a simple transformation from RGB and stands for the Hue, Saturation, Value model. The Hue
is the color type like red, blue; Saturation is the vibrancy of the color (i.e. how faded or
sharp a color is); and Value is the brightness of the color. Figure 14 shows the cylindrical
representation of this standard.
Figure 14: Cylindrical representation of HSV model.
The Hue represents the color, changing as one moves around the circumference of
the cylinder. The Saturation represents the sharpness of the color, being zero at the center
(completely faded) to 1 at the circumference for full vibrancy. The Value goes from zero
to 1 also, and represents the brightness of the color, zero being dark and 1 being fully
bright.
The HSV model suits the needs of this project perfectly; the following formulas
could be easily coded and are not computationally intensive.
Figure 15: Equations used to convert R, G and B values to HSV
Once the above calculation is run, the H value can be used to determine the color, and the S and V values the lightness or darkness. The Hue value ranges from 0 to 360, like an angle, and grouping colors is now a trivial task.
Figure 16: Hue (H) values in HSV and its corresponding color
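In sketch form, MATLAB's built-in rgb2hsv performs this conversion; it returns values in [0, 1], so the hue is rescaled here to the 0-360 degree range used above (targetPatch is the cropped, normalized RGB patch):

    hsvImg = rgb2hsv(double(targetPatch) / 255);   % rgb2hsv expects doubles in [0, 1]
    H = hsvImg(:,:,1) * 360;                       % hue in degrees
    S = hsvImg(:,:,2);                             % saturation: 0 (faded) to 1 (vivid)
    V = hsvImg(:,:,3);                             % value/brightness: 0 (dark) to 1 (bright)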
After deciding on HSV, the code was written to incorporate the major VIBGYOR
colors with each color having a light, true and dark version, plus white, black and gray.
This scheme provides a robust system since upper and lower bounds can be set easily and lightness and darkness are also clearly defined. As future work, more
colors could be added simply by making the bounds smaller and defining colors in higher
precision.
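As an illustration of the kind of nested bounds this enables (the hue ranges and light/dark cutoffs below are hypothetical, not the exact values used in our code):

    h = mean(H(:));  s = mean(S(:));  v = mean(V(:));     % predominant values for the patch
    if v < 0.15
        name = 'black';
    elseif s < 0.10
        if v > 0.85, name = 'white'; else, name = 'gray'; end
    elseif h < 20 || h >= 340
        name = 'red';
    elseif h < 45
        name = 'orange';
    elseif h < 70
        name = 'yellow';
    elseif h < 160
        name = 'green';
    elseif h < 260
        name = 'blue';
    else
        name = 'violet';
    end
    if ~any(strcmp(name, {'black', 'white', 'gray'}))
        if v < 0.4
            name = ['dark ' name];                        % dark version of the hue
        elseif s < 0.5
            name = ['light ' name];                       % light (washed-out) version
        end
    end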
The following figures show some test pictures and the results after running them
through the matlab routine.
Figure 17a: Pure white card under fluorescent lighting Result: WHITE
Figure 17b: Light Green and white striped shirt under sunlight Result: Light
Green
Figure 17c: Pure red backpack under tungsten lighting Result: Crimson
Future Work
From picture framing to edge detection to color identification, our algorithm did
what it was intended to do, but of course there is plenty of room for improvement. To
continue with this work, the first area that needs exploring is implementing the algorithm
on a camera phone. All of our processing was done on a laptop, where memory and RAM are not a problem. Obviously the same kind of computing power cannot be expected from
a phone. The second front that needs exploring is decreasing the processing time to
audibly deliver the color to the user. Both of these fronts would require streamlining the algorithm to run in a clean and efficient manner for maximum practicality and usability. Another area for possible improvement would be increasing the color library; our algorithm was able to recognize and deliver only 24 different colors. Finally, the algorithm's sophistication could be increased so that it detects color patterns such as stripes and patches rather than only the predominant color. The last two fronts, expanding the
color library and increasing the algorithm’s sophistication would increase the processing
time and memory requirements, but with the camera phone continuing to increase in
computing power and capability, we believe this could be achieved.
All people have a right to experience life to its fullest which includes engaging the
world through all five senses. If nature does not provide an individual with the full use of
vision, we hope our project has taken a small step in rectifying that.
Appendix I
All of our Matlab scripts and source code are attached to our project website,
which can be found at http://stanford.edu/~denes/Psych221/Psych_221Final_Project.htm
Appendix II
Group Project Roles & Responsibilities:
Elston Tochip: Development of the edge detection algorithm.
Robert Prakash: Development of the color identification algorithm.
Phillip Lachman: Development and selection of the picture images, color schemes and
audio output files.