Learning-based Shadow Recognition and
Removal from Monochromatic Natural Images
Mingliang Xu, Member, IEEE, Jiejie Zhu, Member, IEEE, Pei Lv, Member, IEEE,
Bing Zhou, Member, IEEE, Marshall F. Tappen, Member, IEEE and Rongrong Ji, Senior Member, IEEE
Abstract—This paper addresses the problem of recognizing and removing shadows from monochromatic natural images from a learning-based perspective. Without chromatic information, shadow recognition and removal are extremely challenging, mainly due to the absence of invariant color cues. Natural scenes make this problem even harder due to complex illumination conditions and the ambiguity caused by many near-black objects. In this paper, a learning-based shadow recognition and removal scheme is proposed to tackle these challenges. First, we propose to use both shadow-variant and shadow-invariant cues from illumination, texture and odd-order derivative characteristics to recognize shadows. Such features are used to train a classifier via a boosted decision tree, which is integrated into a Conditional Random Field that enforces local consistency over pixel labels. Second, a Gaussian model is introduced to remove the recognized shadows from monochromatic natural scenes. The proposed scheme is evaluated with both qualitative and quantitative results on a novel database of hand-labeled shadows, with comparisons to existing state-of-the-art schemes. We show that the shadowed areas of a monochromatic image can be accurately identified using the proposed scheme, and that high-quality shadow-free images can be precisely recovered after shadow removal.
Index Terms—Shadow Recognition, Shadow Removal, Monochromatic Image, Natural Scene, Decision Tree, Conditional Random Field, Gaussian Model.
I. INTRODUCTION
Shadows are among the most noticeable effects influencing the understanding of the structure and semantics of a natural scene. While shadows can provide useful cues regarding scene properties such as object size, shape, and movement [16], their existence also complicates various image analysis tasks for natural images, ranging from object recognition to scene parsing [4], [35], [32], [29] and shadow removal [9], [2], [30], [3].
While promising progress has been made in the literature, most existing works still rely on chromatic information for shadow recognition and removal. For instance,
Finlayson et al. [9] located and removed the shadows
using an invariant color model. Shor and Lischinski [30]
propagated the shadows from pixels to a region using a
color-based region growing method. Salvador et al. [28]
adopted invariant color features to segment cast shadows.
Levine et al. [17] and Arévalo et al. [3] both studied the
color ratios across boundaries to assist shadow recognition.
Note that a series of schemes have also been proposed to detect shadows in videos by using color-related motion cues [24], [23], [34], [19], [14].
Fig. 1. Ambiguity of shadow recognition in the monochromatic domain: (a) diffusion, (b) specularity, (c) self-shading. The diffuse object (swing) in (a) and the specular object (car) in (b) both have a dark albedo. The trees in the aerial image (c) appear black because of self-shading. Such objects are very difficult to separate from shadows, which are also relatively dark.
One fundamental assumption for chromatic-based shadow recognition and removal is that the chromatic appearance of image regions does not change across shadow
boundaries, while the intensity component of a pixel does.
Such approaches work well given color images where
object surfaces are still discernible inside the shadow.
However, such schemes cannot be deployed on monochromatic images, where shadows also prevalently exist. To the best of our knowledge, shadow detection and removal for monochromatic images has been left largely unexplored. In particular, there are two specific challenges:
• Objects in monochromatic images are more likely to
appear black or near black, as shown in Figure 1. Such
black objects exhibit similar intensity statistics to the
shadow regions to be removed, which increases the
difficulty in recognizing and removing shadows.
• Shadow recognition highly relies on separating
changes in albedo from illumination change. Therefore, chromatic information is the most important cue,
which is however missing in monochromatic images.
Despite the above difficulties, detecting and removing shadows in monochromatic images is in high demand
in real-world applications. On one hand, color is not always
available in all types of sensors, such as sensors tuned to
specific spectra, or designed for specific usages like aerial,
satellite, and celestial images, as well as sensors to capture
images from very high dynamic scenes. On the other hand,
it is widely known that humans are still able to identify
shadows in monochromatic images [11], which inspires us
to exploit the underlying principles in shadow modeling
from monochromatic images.
In this paper, we present a learning-based approach for
detecting and removing shadows from a single monochromatic image. The shadow recognition approach is based
on boosted decision tree classifiers and integrated into
a Conditional Random Field (CRF). Finally, we further
present a Gaussian model based method for removing shadows, which uses fewer parameters than previous work. Different from existing works, we aim to capture the unique statistics of shadows within monochromatic images by exploiting both shadow-variant and shadow-invariant cues in a data-driven, learning-based manner. Given sufficient training data, our approach is more general than previous works that exploited grayscale cues to remove general illumination effects [4], [35], [32], as quantitatively evaluated on a wide variety of natural images ranging from highways and parking lots to indoor office scenes. Compared with the most related work [25], the training set used there for separating shading and reflectance changes is still limited to small objects, while our work focuses on shadows in large-scale outdoor scenes.
The remainder of this paper is organized as follows. Section II introduces the features we extract for shadow recognition. Section III formulates the proposed learning model. Section IV discusses two practical issues: feature boosting for accuracy improvement and parallel training for acceleration. Section V presents the experimental results with comparisons to alternative schemes. Section VI describes our shadow removal approach. Finally, Section VII concludes this paper. Along with this paper, we have also released the dataset and all the code, which are available from http://sites.google.com/site/jiejiezhu1/Home/project shadow.
II. MONOCHROMATIC-BASED FEATURE EXTRACTION
A. Dataset
We have built a database consisting of 355 images.¹ For each image, the shadows have been hand-labeled at the pixel level and further cross-validated by a group of volunteers to ensure label precision. In this database, approximately 50% of the images were collected from various outdoor environments such as campus and downtown areas. To ensure a wide variety of lighting conditions, we also collected images at different times throughout the day. The dataset additionally includes 74 aerial images with a size of 257 × 257 from the Overhead Imagery Research Dataset (OIRDS) [31] and another 54 images with a size of 640 × 480 from the LabelMe dataset [27].² Figure 2 shows several examples.
Compared to the database used in our previous work [13], the shadows in the newly added 110 images are extremely challenging to recognize (see the last row of Figure 2). Our motivation is to check how well the learning methods introduced in this paper perform on these challenging scenes. Nevertheless, results from the previous work can also be used as a baseline for comparison purposes.
¹The full dataset used in this paper can be downloaded from http://sites.google.com/site/jiejiezhu1/Home/project shadow
²The camera we used to capture part of the images in the database is a Canon Digital Rebel XTi. The camera models used for the shadow images from the LabelMe database and from the OIRDS database are not reported.
Fig. 2. Examples of shadows in the dataset. The images captured by the authors are linearly transformed from raw format to TIF format. The images from LabelMe are in JPG format. The images from OIRDS are in TIF format. All the color images are converted to gray-scale images. The last row shows some challenging examples.
B. Feature Extraction
Given a single monochromatic natural image, our goal
is to identify those pixels that are associated with shadows.
Without chromatic information, the features should be able
to identify illumination, texture and odd order derivative
characteristics. Rather than using pixel values alone, we
also include the features from homogeneous regions generated by over-segmentation using intensity values [20].
To reliably capture the cues across shadow boundaries,
following [6], we collect statistics of neighboring pairs of
shadow/non-shadow segments from all individual images in
the dataset. These statistics are represented as histograms
from shadow and non-shadow segments given equal bin
centers.
We propose three types of features: shadow-variant features, which exhibit different characteristics in shadows and in non-shadows; shadow-invariant features, which exhibit similar behaviors across shadow boundaries; and near-black features, which distinguish shadows from near-black pixels. These complementary cues are integrated to jointly provide strong predictions of shadows. For example, if a segment has invariant texture (compared with its neighbor) but variant intensities, it is more likely that this segment belongs to a shadow. Furthermore, each type of feature is characterized by scalar values to provide information relevant to shadow properties. Below we introduce the above features in detail.
C. Shadow-Variant Features
Intensity: Since shadows are expected to be relatively dark, we gather statistics (Figure 3 (a)) about the intensity of pixel values, based on which a histogram is calculated and further augmented with the average intensity value and the standard deviation inside a segment.
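As a concrete illustration, a minimal NumPy sketch of these per-segment intensity statistics is given below; the array names, bin count, and intensity range are illustrative assumptions, not the released implementation.

```python
import numpy as np

def intensity_features(gray, segments, seg_id, n_bins=10):
    """Histogram of pixel intensities inside one segment, augmented with
    the segment's average intensity and standard deviation."""
    pixels = gray[segments == seg_id]          # intensities of this segment
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0.0, 1.0), density=True)
    return np.concatenate([hist, [pixels.mean(), pixels.std()]])

# gray: float image in [0, 1]; segments: integer label map from over-segmentation
# feats = intensity_features(gray, segments, seg_id=7)
```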
Local Max: In a local patch, shadows have very low intensity values, so their local max value is expected to be small. On the contrary, non-shadows often have high intensity values, and their local max value is expected to be large (Figure 3 (b)). We capture this cue by a local max computed at each segment.
Fig. 3. Mean histograms of (a) log illumination and (b) local max. The histograms are generated from neighboring pairs of shadow/non-shadow segments using equal bin centers, with the number of bins set to 150. By gathering the histograms from all individual images in the dataset, we plot their mean histograms in this figure.

Fig. 4. Two cues used to distinguish shadows from near-black objects: (a) entropy and (b) segment-based edge response. The distributions in (a) are obtained from 40 images in our dataset that contain near-black objects along with shadows. In (b), the edge map is shown in white and the segmentation boundaries are shown in red.
Smoothness: Shadows are often smooth among neighboring pixels. This is because shadows tend to suppress local variations on the underlying surfaces. We use the method proposed by Forsyth and Fleck [21] to capture this cue, which subtracts a smoothed version from the original image. To measure the smoothness, we also use the standard deviation of the smoothed version within each segment.
Skewness: We gathered several statistics (standard deviation, skewness and kurtosis) and empirically found a mean skewness of 1.77 for shadows and -0.77 for non-shadows. This indicates that the asymmetries of the intensity distributions in shadows and in non-shadows are different, which helps to locate shadows. This odd-order statistic has also been found useful in extracting reflectance and gloss from natural scenes [12], [1].
D. Shadow-Invariant Features
Gradient Similarity: We have observed that transforming the image with a pixel-wise log transformation makes
the shadow being an additive offset to the pixel values in the
scene. Therefore, the distribution of image gradient values
is expected to be invariant across shadow boundaries. To
capture this cue, we measure the similarity between the
distributions of several first order derivatives of Gaussian
filters.
Texture Similarity: We have also observed that the textural properties of surfaces change little across shadow boundaries. We measure the textural properties of an image region using the method introduced in [20]. The method filters a database of images with a bank of Gaussian derivative filters at 8 orientations and 3 scales, and then applies clustering to form 128 discrete centers. Given a new image, the texton feature is the histogram binned at these discrete centers.
As shown in Figure 5 (f) where the color indicates
the texton index selected at each point, the distributions
of textons inside and outside shadows are similar. The
primary difference is that the distortion artifacts in the
darker portions of the image lead to a slight increase in
the number of lower-index textons, as indicated by more
blue pixels.
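The texton computation described above could be sketched as follows. This is an illustrative approximation: the filter bank is a steered first-order Gaussian-derivative bank at 3 scales and 8 orientations, and, unlike [20], the 128 cluster centers are learned here on a single image rather than over a database.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def filter_bank_responses(gray, scales=(1.0, 2.0, 4.0), n_orient=8):
    """Stack of first-order Gaussian-derivative responses at several scales,
    steered to n_orient orientations from the x/y derivative pair."""
    responses = []
    for s in scales:
        dx = ndimage.gaussian_filter(gray, s, order=(0, 1))   # d/dx
        dy = ndimage.gaussian_filter(gray, s, order=(1, 0))   # d/dy
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            responses.append(np.cos(theta) * dx + np.sin(theta) * dy)
    return np.stack(responses, axis=-1)        # H x W x (scales * orientations)

def texton_histograms(gray, segments, n_textons=128):
    """Per-segment histograms of texton (cluster) indices."""
    resp = filter_bank_responses(gray)
    flat = resp.reshape(-1, resp.shape[-1])
    textons = KMeans(n_clusters=n_textons, n_init=4).fit_predict(flat)
    textons = textons.reshape(gray.shape)
    hists = {}
    for seg_id in np.unique(segments):
        h = np.bincount(textons[segments == seg_id], minlength=n_textons)
        hists[seg_id] = h / h.sum()            # normalized texton histogram
    return hists
```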
E. Near-Black Features
As shown in Figure 3 (a), the pixel intensity is a good
indicator of the shadow, which is usually dark. Unfortunately, this heuristic alone cannot be used to reliably
identify shadows as it fails when facing dark objects. We
have found that objects with a dark albedo are some of
the most difficult image regions to separate from shadows.
Similarly, tree regions are also difficult because of the
complex self-shading caused by the leaves. Due to the
complexity of hand-labeling, the self-shading within a tree
is not considered to be a shadow. We refer to these
objects as near-black objects (see Figure 1). To correctly
distinguish shadows from near-black objects, we introduce
two additional features:
Discrete Entropy: We have observed that the entropy of patches inside shadows differs from that of patches inside objects. The entropy is computed for each segment as

E = Σ_{i∈ω} −p_i · log2(p_i),    (1)

where ω denotes all the pixels inside the segment and p_i is the probability of the histogram count at pixel i.
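As a minimal sketch of Equation (1), assuming an 8-bit grayscale image and a boolean mask for the segment (both names are illustrative):

```python
import numpy as np

def segment_entropy(gray_u8, mask, n_bins=256):
    """Discrete entropy of the intensity histogram inside one segment,
    E = -sum_i p_i * log2(p_i), following Equation (1)."""
    counts, _ = np.histogram(gray_u8[mask], bins=n_bins, range=(0, 256))
    p = counts / counts.sum()
    p = p[p > 0]                   # drop empty bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))
```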
Figure 4(a) shows the distributions of the entropy values in patches from shadows and from near-black objects, respectively. These histograms show that the entropy of diffuse objects with dark albedos is relatively small, as most black objects are textureless in natural scenes. Figure 4(a) also shows histograms of the entropy of patches from specular objects (shown in blue) and of image patches from trees (shown in green). There is a small mode at lower entropy values in the histogram from tree patches; the explanation is that some parts of the trees are over-exposed in the sun, leading to saturated areas with low entropy.
Edge Response: Another feature we found useful is the edge response, which is often small in shadows. Figure 4(b) shows an example where segments in shadows have a near-zero edge response, while the specular object (the body of the car) has a strong edge response. We quantize this cue by summing up the edge responses (computed using the Canny edge detector with a threshold of 0.01) inside a segment. Figure 5 shows a typical scene used to visualize all eight proposed features.

Fig. 5. Feature visualization: (a) Intensity, (b) Smoothness, (c) Local Max, (d) Skewness, (e) Gradient, (f) Texton, (g) Entropy, (h) Edge Map. In color figures, pixels in blue have small values, while pixels in red have large values.
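A small sketch of the per-segment edge-response feature is given below, using scikit-image's Canny detector as a stand-in for the edge detector mentioned above; the threshold values and helper name are illustrative, not the exact released implementation.

```python
import numpy as np
from skimage.feature import canny

def segment_edge_response(gray, segments):
    """Sum of Canny edge responses inside each segment; shadow segments
    tend to give near-zero sums, specular objects strong ones."""
    edges = canny(gray, sigma=1.0, low_threshold=0.01, high_threshold=0.02)
    return {seg_id: int(edges[segments == seg_id].sum())
            for seg_id in np.unique(segments)}
```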
III. LEARNING TO RECOGNIZE SHADOWS
From the dataset introduced in Section II-A, we randomly selected 125 images as training data, and used
the remaining 120 images as test data. We formulate the
shadow detection problem as a per-pixel binary labeling
problem, where every pixel is classified as either being
inside shadow or not. Formally speaking, our goal is to
assign yi ∈ {−1, +1} for all pixels i. Here, a Logistic
Random Field (LRF) model is adopted, whose parameters
are trained in the learning process to specify the decision
boundary.
A. Logistic Random Field

The Logistic Random Field is essentially a logistic regression model generalized to a conditional random field [33]. If we regard each pixel as a sample, the probability that a pixel should be labeled as shadow (+1) given its observation x is

p(+1|x) = σ(w^T f),    (2)

where w is a weighting vector that defines a decision boundary in the feature space and f is the feature vector of the pixel. The function σ(·) is the logistic function

σ(a) = 1 / (1 + exp(−a)).    (3)

Logistic regression can be viewed as converting the linear function w^T f, which ranges from −∞ to +∞, into a probability, which ranges from 0 to 1.

To generalize logistic regression to a random field model, we introduce a response image r*, a linear response map computed by w_i^T f. We express the probability of a pixel belonging to shadow as l_i, which is calculated from the output of the logistic function, l_i = σ(r_i*).

We use a pairwise CRF to model the labeling problem, which provides an elegant means to enforce local consistency and smoothness. The conditional distribution over labels l, given an input image o, has the form
P(l|o) = (1/Z) ∏_i ϕ_i(l_i | o_i) ∏_{<i,j>} ψ_{i,j}(l_i, l_j | o_i),    (4)

where ∏_{<i,j>} denotes the product over all pairs of neighboring pixels, both horizontal and vertical, and the constant Z is the normalizing factor of the distribution.

The LRF model discriminatively estimates the marginal distribution over each pixel's label. Essentially, a Gaussian CRF is used to estimate the response image r*, which is then passed pixel-wise through the logistic function σ(·). Formally, the likelihood of pixel i taking the label +1 (denoted as l_i) is expressed as

l_i = σ(r_i*),  where  r* = arg min_r C(r; o),    (5)

where C(r; o) is a quadratic cost function that captures the same types of propagation behavior desired from the CRF model. A similar model was also proposed in [7]. We introduce how to formulate C(r; o) below.

B. Learning LRF to Recognize Shadows

The cost function C(r; o) is based upon interactions between the expected response r and the current observations o. It jointly expresses the relationship between responses in a neighborhood to recognize shadows. We define C(r; o) as

C(r; o) = Σ_i [ w(o; θ_1)(r_i − 10)² + w(o; θ_2)(r_i + 10)² ] + w(o; θ_3) Σ_{<i,j>} (r_i − r_j)².    (6)
Each term r_i refers to the entry for pixel i in the response image r. The first two terms pull each pixel toward either −10 or +10 in the response image r*. While the response image could technically vary from −∞ to +∞, restricting the values to between −10 and +10 is sufficient in practice; in particular, setting a pixel's response to +10 makes the probability returned by the logistic function σ(·) equal to 1 − (4 × 10⁻⁵).
The weight w(o; θ_k), k ∈ {1, 2, 3}, at pixel i is given by

w(o_i; θ_k) = exp( Σ_{j∈N_f} θ_kj f_ij ),    (7)

where θ_kj is the parameter associated with the j-th feature f_ij for term k, and N_f is the total number of features.
By concatenating θ_1, θ_2, θ_3 into a vector θ, we show that θ can be found by minimizing the negative log-likelihood of the training dataset. We define the negative log-likelihood of the model response over a training image, L(θ), as

L(θ) = Σ_i log(1 + exp(−t_i r_i*)) + λ Σ_{j∈N⊗} θ_j²,    (8)

where t_i is the ground-truth probability of pixel i being labeled as shadow, and the second term is a quadratic regularization term to avoid overfitting; λ is manually set to 10⁻⁴. We use a standard gradient descent method to
iteratively update the parameters θ which are all initialized
at zero. Note that L(θ) depends on θ via r∗ . The regularizer
penalizes the model parameters uniformly, corresponding
to imposing a uniform variance onto θ. This motivates normalizing each type of feature into [−1, 1].
We show the details of the gradient computation of
Equation (8) and the computation of r∗ using a matrix
representation in the following subsection.
C. Gradient Computation in Matrix Formulation
We rewrite the cost function C(r; o) in matrix form as

C(r; o) = (Ar′ − b)^T W(o, θ) (Ar′ − b),    (9)
where A is a set of matrices A_1, ..., A_{N⊗} that perform the same linear operations as convolving an image r with a filter f. For the first two terms in C(r; o), we derive two filters f_1, f_2 by setting a single pixel in a 3 × 3 window to 1 while the others remain zero. For the last term, we loop through all possible [−1, 1] pairs in the 3 × 3 window and choose each of them as a convolution filter. Finally, A is pre-computed by convolving the filters with a virtual image of the same size as the images in the dataset. If the total number of pixels in image k is n, then A_k is an n × n matrix.³ The vector r′ is created by unwrapping the response image r, and applying A_1 r′ is identical to computing f_1 ⊗ r. We build b by concatenating −10 and +10 entries for the first two terms in C(r; o), and simply set b = 0 for the third term.

³Readers are encouraged to refer to the filt2mat.m file in the online code for details on how to construct A_k.
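The construction of the matrices A_k — the role played by filt2mat.m in the released Matlab code — can be sketched as follows. This NumPy/SciPy version is an illustrative re-implementation of the idea, not the released routine.

```python
import numpy as np
import scipy.sparse as sp

def filter_to_matrix(kernel, height, width):
    """Sparse matrix A_k such that A_k @ r.ravel() applies the small filter
    `kernel` (as zero-padded correlation) to the vectorized image r."""
    kh, kw = kernel.shape
    oy, ox = kh // 2, kw // 2
    A = sp.lil_matrix((height * width, height * width))
    for y in range(height):
        for x in range(width):
            row = y * width + x
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - oy, x + dx - ox
                    if 0 <= yy < height and 0 <= xx < width:
                        A[row, yy * width + xx] += kernel[dy, dx]
    return A.tocsr()

# e.g. one smoothness filter: a horizontal [-1, 1] pair inside a 3x3 window
f_pair = np.array([[0, 0, 0], [0, -1, 1], [0, 0, 0]], dtype=float)
# A_pair = filter_to_matrix(f_pair, height=480, width=640)
```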
If the number of unknowns in Equation (9) is n, the number of constraints is m = N⊗ · n, where N⊗ is the total number of filters, so the problem is over-determined. In addition, Equation (9) is a quadratic function, therefore its minimum can be computed using the pseudo-inverse

r* = (A^T A)⁻¹ A^T W(o, θ) b.    (10)
Now we can derive the gradient of the loss function L(θ) with respect to θ as

∂L(θ)/∂θ = (∂L(θ)/∂r*) · (∂r*/∂θ),

∂L(θ)/∂r* = −t exp(−t r*) / (1 + exp(−t r*)) + 2λ Σ_{j∈N⊗} θ_j,

∂r*/∂θ = (A^T A)⁻¹ A^T (∂W(o, θ)/∂θ) b.
Fig. 6. Examples of conditional probability: (a) reference, (b) entropy map, (c) mask. In the entropy maps (b), red denotes high values and blue denotes low values. The masks (c) are computed by a soft approach using an exponential function based on two entropy centers clustered from each entropy map; we chose the larger center in the first case and the smaller center in the second case. The brightness of a pixel denotes the probability of being a shadow pixel. Part of the segmentation boundaries are shown in red.
IV. PRACTICAL ISSUES

In this section, we introduce two practical issues in our model. In Feature Boosting, we show that the recognition accuracy of the LRF can be further improved by integrating a Boosted Decision Tree instead of the raw features. In Parallel Training, we introduce a low-cost acceleration method (without the use of cluster servers) to update the gradient.

A. Feature Boosting
The LRF model works well if the marginal distribution of a pixel belonging to shadows can be defined using the proposed features. Our results show that the linear classifier trained from the LRF achieves acceptable results (75.2%), but it also misclassifies many dark pixels, such as near-black objects (as shown in Figure 8). One reason is that distinguishing shadows from near-black objects is a very complex case: Figure 6 shows an example where the likelihood that a pixel is shadow according to the entropy cue varies across different images.
To tackle this issue, a Boosted Decision Tree (BDT) [5] is adopted, which builds a number of hierarchical trees by increasing the weights of the misclassified training samples. We sample each segment from the training dataset and learn the BDT estimators. For each pixel in the image, the BDT estimator returns a probability distribution over the shadow labels. We use 40 weak classifiers and ensemble
their outputs as new features in the LRF model. In the
experiment section, we show that by integrating BDT with
LRF, we achieve the best shadow recognition rate. Note
that, combining the labeling distributions from decision
trees and random fields has also been studied in [36], where
improved segmentation results are observed by enforcing
the layout consistency.
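A hedged sketch of this boosting step with scikit-learn is shown below; gradient-boosted trees stand in for the boosted decision tree of [5], the feature arrays are assumed inputs, and for brevity a single probability map (rather than the outputs of all 40 weak classifiers) is returned as the extra LRF feature.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_bdt(X_seg, y_seg, n_weak=40):
    """X_seg: (n_segments, n_features) per-segment features sampled from the
    training set; y_seg: 0/1 shadow labels. Returns a boosted-tree classifier."""
    bdt = GradientBoostingClassifier(n_estimators=n_weak, max_depth=3)
    return bdt.fit(X_seg, y_seg)

def bdt_probability_map(bdt, X_pixels, image_shape):
    """Shadow probability for every pixel, used as an additional LRF feature."""
    p_shadow = bdt.predict_proba(X_pixels)[:, 1]
    return p_shadow.reshape(image_shape)
```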
B. Parallel Training
We further implement a parallel version of learning
parameters from the LRF model with two considerations:
• the memory requirement for loading all the training
data exceeds a commodity PC memory.
• gradient updating requires tremendous computing resources.
Traditional methods to solve the first problem are to load a subset of the training data or to compute the features on the fly. Online training approaches, such as Stochastic Gradient Descent, can also be introduced to accelerate convergence by updating the parameters from each training
sample. In this paper, we handle the efficiency challenge at the architecture level, i.e., parallel training without the special requirement of a cluster.

For each pixel, our LRF model requires 1,290 or even more features to be used in classification. In the boosting step, there are 40 outputs for each pixel. Together with them, a horizontal derivative, a vertical derivative, and one bias feature are added as new features in the LRF model, which gives 43 features in total. As in Section III-C, at the pixel level we include a 3 × 3 neighboring window to construct the filters. In this 3 × 3 window, for the data term, we also concatenate the features of the 8 neighbors for both the negative (−10) and the positive (+10) cases. For the smoothness term, the number of filters ([−1, 1] pairs) is 12. The total number of filters used is therefore 9 + 9 + 12 = 30. Multiplying by the 43 features, we have 43 × 30 = 1290 features in total. Training on 125 images requires around 9 GB of memory, without considering additional matrix operations.
Based upon the above analysis, it is obvious that computing the gradient ∂L(θ)/∂θ involves a large amount of matrix computation, which is the most time-consuming task in updating the parameters θ. Given 125 training images, we update the gradient after summing the gradients over all the training images. This is different from online gradient descent, such as stochastic approaches, where the gradient is updated for each training image. The summation provides an elegant way to distribute the gradient computation across multiple processors by assigning a subset of training images to each processor.
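A minimal sketch of this batch-parallel gradient accumulation using Python's multiprocessing is given below as a stand-in for the MatlabMPI setup described next; per_image_gradient is an assumed, user-supplied helper (it must be a top-level picklable function) that computes the gradient of Equation (8) for a single training image.

```python
from multiprocessing import Pool
import numpy as np

def batch_gradient(theta, training_images, per_image_gradient, n_workers=10):
    """Sum per-image gradients of L(theta) over the whole training set,
    distributing images across worker processes (one batch update, as in
    Section IV-B, rather than per-image stochastic updates)."""
    with Pool(processes=n_workers) as pool:
        grads = pool.starmap(per_image_gradient,
                             [(theta, img) for img in training_images])
    return np.sum(grads, axis=0)

# Illustrative gradient descent loop (step size and iteration count assumed):
# for it in range(500):
#     theta -= 0.01 * batch_gradient(theta, images, per_image_gradient)
```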
We used MatLabMPI [15] for model updating, which is
a set of Matlab scripts that implement a subset of Message
Passing Interface (MPI) and allow Matlab programs to be
run on multiple processors. One advantage of MatlabMPI
is that it does not require a cluster to execute the parallel
commands. We configured three individual PCs including
10 processors and allowed them to share 20 GB of memory. It takes around 10 hours to learn the LRF parameters.
V. EVALUATION ON RECOGNITION RESULTS
To evaluate how well the classifiers can locate the shadows, we select a subset of 126 images containing high-quality shadows (i.e., shadows that can be easily labeled by people) as the training dataset. We then predict the shadow label at every pixel for another 124 images with ground-truth shadow labels.

The pixels identified as shadows are then compared with the shadow mask associated with each image. True positives (TP) are the detected pixels that fall inside the mask, false positives (FP) are the detected pixels that fall outside the mask, and false negatives (FN) are the shadow pixels falsely labeled as lying outside the mask.
Our results are divided into two main groups. The first
group includes three comparisons using monochromatic
based features:
• different types of features
• different classification models
• different levels of over-segmentations
In the second group, we compared the performance between
monochromatic and chromatic based features.
For all the comparisons, we use the accuracy computed as TP / (TP + FP + FN), where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives. Dropping the true-negative term helps us to understand the essential performance difference between classifiers focused on the shadows.⁴

⁴Note that in our previous work [13] we reported the accuracy as (TP + TN) / (TP + FP + FN + TN). We found this to be an unfair comparison because the number of non-shadow pixels is several times the number of shadow pixels.
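A small sketch of this shadow-focused accuracy, assuming boolean prediction and ground-truth masks (names are illustrative):

```python
import numpy as np

def shadow_accuracy(pred_mask, gt_mask):
    """Accuracy = TP / (TP + FP + FN), ignoring true negatives so that the
    abundant non-shadow pixels do not dominate the score."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    return tp / float(tp + fp + fn)
```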
Overall, our results show that features expressing illumination, texture and odd-order derivative characteristics can successfully identify shadows. Our results show that BDT integrated with LRF using two combined levels of segments achieves the highest recognition rate at 38.1%, which is 6.9% higher than LRF alone, 3.7% higher than the Support Vector Machine (SVM), and 2.2% higher than BDT. Our results also show that the chromatic features are superior to the monochromatic features, and that their combination contributes a further 1.8% accuracy improvement.

It is also interesting to see that several combinations of the features, such as all variant features with Edge, and all variant features with Entropy and Edge, achieve the best accuracy on our dataset. We would like to point out that although our dataset includes varied illumination and varied surfaces onto which shadows are cast, it is not complete or perfect. This biases the training results towards some features such as the edge and the entropy. With more training data available, we expect results from combining all variant and invariant features to outperform all the other feature combinations.
A. Quantitative Results from Monochromatic Features
Comparisons between different types of features. In this experiment, we test all possible combinations of variant and invariant features. For the variant features, we add one feature at a time. Given all the variant features, we then add one invariant feature at a time. This gives 30 feature combinations in total.

For brevity, we train the classifiers at the segment level with 500 segments, using both the per-pixel features and the segment features. Per-pixel features are the feature values computed for each pixel using the methods introduced in Section II-B. To generate the segment features, we histogram the per-pixel features inside a segment into 10 bins and augment them with the segment's mean and standard deviation. We compare the results in Figure 7 and Table I.
Comparisons between different classification models
In this comparison, we compared results with a Support Vector Machine (SVM) using a radial basis kernel function, a Boosted Decision Tree (BDT), a Logistic Random Field (LRF) and a combined approach (BDT+LRF). We obtained
around 30,000 training samples (segments) for SVM and
BDT, and we trained the classifiers using all the features
with their histogram values. We set the number of nodes in
the decision tree to 40, and the initial gain step in the LRF model is set to 0.01. It takes around 15 minutes to train the BDT and around 10 hours to train the LRF for 500 iterations. We set the iteration number to 500 because we found that the loss in Equation (8) decreases very slowly afterwards.

We display the resulting curves in Figure 8 (g). Our results show that LRF alone performs worst, with only 26.9% accuracy. We believe this is because the linear classifier trained from the LRF performs poorly on complex conditional probabilities, such as distinguishing shadows from near-black objects. We can also see that SVM, as a single classification method, only achieves 30.1% accuracy. BDT with over-segmentation performs well with 31.6% accuracy. By enforcing local consistency, BDT+LRF achieves the highest accuracy at 33.8%.
Comparisons between different levels. In this comparison, we ran experiments training the BDT using different numbers of over-segmentations per image. We tested 200 segments, 500 segments, and their combination. In the combined test, 83 features in total are used in the LRF, and the number of parameters reaches 2490. It takes around 24 hours to train the classifier using our parallel version.

From the numerical results, we can see that by combining the two levels of over-segmentation, the accuracy reaches 38.1%, which is the highest accuracy among all the experiments. We believe the single-level approaches perform worse than the combined approach because the over-segmentation is not perfect at either level. By incorporating multiple levels of segment information, our approach achieves the best results at the cost of extra computing resources.
B. Experiments with color-based features
In this section, we compare results from classifiers using
monochromatic and chromatic based features, respectively.
While preparing the monochromatic dataset, we keep a
local copy of a color version which is also available from
the project website. Our motivation for experimenting with color features is to answer two questions:
1) How well does the chromatic classifier compare with the monochromatic classifier?
2) Does their combination perform better than either the chromatic or the monochromatic classifier alone?
The chromatic-based features we selected are from an illumination-invariant model proposed by Finlayson et al. [9] for linear images. Essentially, [9] observed that, for a given camera, illumination changes in a specific 2D space tend to fall on parallel straight lines. Their results show that this 2D space is a 2D band-ratio chromaticity color space, (log(G/R), log(B/R)). The straight lines are observed as the invariant directions as the lighting changes. The illumination-invariant space is therefore defined as the direction orthogonal to all the parallel lines. A gray-scale shadow-free image can be obtained by projecting the 2D band-ratio points onto this orthogonal direction.

In our experiment, we choose three types of log chromatic features:
TABLE I
NUMERICAL COMPARISON OF DIFFERENT METHODS

Group          Features                     BDT     BDT+LRF
Variant        Intensity                    25.1    30.3%
               Smoothness                   23.2    13.9%
               Local max                    24.9    18.4%
               Skewness                     10.8    18.2%
2 Variants     Intensity, Smoothness        25.3    28.6%
               Intensity, Local max         26.0    30.0%
               Intensity, Skewness          32.7    34.9%
               Smoothness, Local max        28.2    31.8%
               Smoothness, Skewness         21.5    20.0%
               Local max, Skewness          33.2    33.1%
3 Variants     Int, Sm, Lo                  28.1    28.6%
               Int, Sm, Sk                  31.3    30.0%
               Int, Lo, Sk                  33.0    34.9%
               Sm, Lo, Sk                   32.5    34.9%
               All variant                  32.4    20.0%
Invariant      Variant, Texton              27.8    35.3%
               Variant, Gradients           32.4    36.7%
               Variant, Entropy             32.0    37.0%
               Variant, Edge                36.1    40.2%
2 Invariant    Variant, Tex, Gra            29.3    35.3%
               Variant, Tex, Ent            27.8    30.0%
               Variant, Tex, Edg            32.9    37.0%
               Variant, Gra, Ent            32.3    36.3%
               Variant, Gra, Edg            31.4    33.9%
               Variant, Ent, Edg            35.5    40.1%
3 Invariant    Variant, Tex, Gra, Ent       29.0    32.8%
               Variant, Tex, Gra, Edg       30.3    33.0%
               Variant, Tex, Ent, Edg       30.9    33.4%
               Variant, Gra, Ent, Edg       31.3    37.0%
               Variant, Invariant           31.9    33.8%
TABLE II
NUMERICAL COMPARISON OF DIFFERENT METHODS. NOTE THAT THE DATASET USED IN THE COLOR TEST IS SIMPLER THAN IN THE MONOCHROMATIC TEST.

Group    Method                           Accuracy
Models   Linear SVM                       30.1%
         Boosted Decision Tree            31.6%
         Logistic Random Fields           26.9%
         BDT+LRF                          33.8%
Levels   200 Segments                     28.7%
         500 Segments                     33.8%
         Combined 200 and 500 Segments    38.1%
Color    Monochromatic                    45.6%
         Chromatic                        49.6%
         Chromatic+Monochromatic          52.4%
• log values of the R, G, B channels;
• log values of the band ratios, log(G/R) and log(B/R);
• a number of sampled directions that generate the gray-scale shadow-free image.
The first two types of features are shadow-variant features. The third type includes both shadow-invariant and shadow-variant features. The shadow-invariant features are the true (or almost true) directions orthogonal to the parallel straight lines that define the illumination-invariant space; we expect to see consistent properties across the shadow boundary from these features. The shadow-variant features are obtained from the false directions that are not actually orthogonal to the parallel straight lines; we expect to see inconsistent properties across the shadow boundary from these features.
[Figure 7 plots: recognition accuracy of BDT and BDT+LRF for each feature-combination group — 1 Variant, 2 Variants, 3 Variants, Variant+1 Invariant, Variant+2 Invariant, Variant+3 Invariant. The per-group mean accuracies reported in the panels are: BDT=21, BDT+LRF=20.2; BDT=27.85, BDT+LRF=29.73; BDT=31.46, BDT+LRF=29.3; BDT=31.53, BDT+LRF=35.43; BDT=32.075, BDT+LRF=36.275; BDT=30.68, BDT+LRF=33.92.]
Fig. 7. Comparison of different feature combinations. It is interesting to see that the LRF works poorly for single variant features, such as smoothness and local max. The results also show that by combining the variant features with two invariant features, both BDT and BDT+LRF achieve their best accuracy. The highest score achieved is 40.2%, using all variant features and the edge feature. A combination of all variant and invariant features achieves 31.9% and 33.8% using BDT and BDT integrated with LRF, respectively.
Because the camera responses of different digital cameras are not the same, the orthogonal direction for each camera is also different [9]. Therefore, we only select the images captured by the Canon Digital Rebel XTi from our dataset to make a fair comparison.
We train the LRF model without including the neighbors' data term in a 3 × 3 neighborhood, as we did in the monochromatic feature comparison. We believe that reducing the number of features and focusing on the LRF model (without the BDT outputs) helps us concentrate on the central question: the essential difference between the monochromatic and chromatic classifiers.
In the chromatic test, we sampled 36 directions (angles from 0 to 180 degrees in 5-degree intervals) and trained 2 × 39 + 12 × 39 = 546 features in total. In the monochromatic test, we trained 2 × 8 + 12 × 8 = 112 features. In addition, we also trained a system by concatenating all the features, which gives 2 × 47 + 12 × 47 = 658 features.
We plot their ROC curves in Figure 8 (h).
We can see from the results that the chromatic classifier performs better than the monochromatic classifier, with an improvement in recognition accuracy of around 4%. By combining them, we obtain another 3.8% accuracy improvement. Note that the accuracy in this test is higher than in the previous two tests. This is because the shadows in our captured dataset (taken with a DSLR camera) are more easily observed, which leads to better recognition accuracy in both training and testing.
C. Qualitative Results of Monochromatic Classifier
We present part of the qualitative results from our
monochromatic tests in Figure 8. Note that to avoid presenting the duplicated visualized results, we show complimentary results to our previous work [13] in this Figure. Similar
results presented in our previous work are also obtained in
the new dataset. We set the threshold to 0.5 when obtaining
the binary estimates. For brevity, we only present the results
using combined levels of segments for BDT+LRF.
The first case shows a simple scene with shadows cast on grass. The probability map from BDT is, unsurprisingly, inconsistent because BDT treats each segment individually. The result from the LRF model alone is not perfect either, as we can see some misclassified pixels on the grass and near object boundaries.
The second and third cases show two examples of objects
with dark albedo. We can see the probabilities of such
objects being shadows from LRF alone are falsely as
high as real shadows. Results from BDT are also not
acceptable as thresholding the correct shadows from all the
probabilities is very difficult.
The fourth and fifth cases show two complex examples. We can see that the LRF model cannot distinguish shadows from trees. BDT does a good job of separating shadows from trees, but its labeling is inconsistent.
From all these test examples, we can see that our combined approach performs better, in terms of both accuracy and consistency, than either the BDT or the LRF model alone. Nevertheless, we found that water is the object most easily misclassified as shadow. We show two examples of this very challenging case in the last row of Figure 8.
[Figure 8 panels: (a) Reference, (b) LRF, (c) SVM, (d) BDT, (e) BDT+LRF, (f) Ground Truth, (g) ROC curves for the learning models (BDT, LRF, BDT+LRF, SVM), (h) ROC curves for chromatic and monochromatic features (Color, Color+Gray, Gray).]
Fig. 8. Qualitative and quantitative results. The visualized results ((a) to (f)) compare the LRF model, a linear Support Vector Machine (SVM), a Boosted Decision Tree (BDT) and the combined approach (BDT+LRF); the learned classifiers all use monochromatic features. The last row shows the result for a very challenging case: water flowing over cement, where the shadows are the darkest parts cast by the cement into the water. Quantitative comparisons are given for different types of features and different types of learning models. In (g), we compare results from different learning models using the variant+invariant+bin monochromatic features. In (h), we compare results from chromatic-based and monochromatic-based features. To present our work more broadly, the results in this paper are complementary to our previous work [13].
In this test, the shadows in the scene are cast on water, which is a transparent medium and also appears black. In the second overhead scene, all the methods introduced in this paper tend to falsely label the water as shadow. We conclude that the optical properties needed to separate shadows from water are not captured well by our proposed features. We believe this is a direction for further research.
Fig. 9. Example of line segment sampling from the shadow probability map: (a) reference, (b) probability map, (c) sampled line segments. The line segments are sampled both horizontally (in red) and vertically (in blue) in the log derivative of (a).
VI. SHADOW REMOVAL IN MONOCHROMATIC NATURAL IMAGES
In this section, we introduce our approach to remove the recognized shadows in monochromatic natural images. Our method is based on the observation that the illumination change caused by a shadow varies smoothly across the shadow boundary. We find that, in the gradient domain, this leads to a Gaussian approximation for canceling the shadows. We also find that assuming a Gaussian distribution for the underlying texture across the shadow boundary helps to obtain high-quality shadow-free images.
A. Linear Shadow Model
We assume that all the images for shadow removal
are taken in the raw format using DSLR cameras. We
introduce two assumptions that make shadows separable
from underlying surfaces in a linear model.
We make our first assumption that the input shadow
image, I(x, y), can be expressed as the product of a
reflectance image, R(x, y), and the shadow image, S(x, y)
I(x, y) = R(x, y) × S(x, y).    (11)
The reflectance of a scene describes the material of the surfaces in the scene, and the shadow of a scene refers to the illumination that is occluded by objects in the scene. Considering the images in the log domain, the derivatives of the input image are the sum of the derivatives of the reflectance and of the illumination. We find that significant shadow boundaries and reflectance edges are unlikely to occur at the same point. Although this assumption does not hold for occlusion shadows, where an edge indicates both a shadow and a material edge [8], we find such cases are actually very rare in natural scenes.
Consequently, we make the second assumption that every
image derivative is either caused by shadow or reflectance.
A similar assumption of shading and reflectance is reported
in [32]. This reduces the problem to assuming that the image's x and y derivative changes on the shadow boundary come mainly from illumination changes, in our case, the shadows.
B. Parametric Illumination Change Model
We parameterize the linear shadow model for each pixel across the shadow boundary by sampling horizontal and vertical line segments to model the illumination change ΔI(x, y). Line segments provide an easily parameterizable illumination profile, and are also used in [18]. Figure 9 shows an example of the sampled line segments.

The difference between our approach and [18] is that we model ΔI(x, y) in the log-derivative domain instead of the RGB domain, as we observed that a simple Gaussian model, with fewer parameters, can describe the shadow canceling process well by favoring smoothness inside a neighborhood. Therefore, our approach actually models ∇(ΔI(x, y)), the derivative of the illumination change model.
To sample the line segments, we first use the Canny edge detector to locate the shadow edge from a binary version of the shadow probability map obtained by the recognition process. We then dilate the Canny edge map by p pixels inward and outward to obtain a band that sufficiently spans the shadow boundary. The dilation parameter p is chosen based on three factors: the position of the occluding object, the direction of the illumination, and the image size. In our experiments, we set p = 10 for all the tests.⁵
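A sketch of this boundary-band extraction is shown below; the probability threshold and dilation width follow the text and Algorithm 1, while the specific functions (scikit-image Canny, SciPy binary dilation) are illustrative substitutes for the released implementation.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import canny

def shadow_boundary_band(prob_map, thresh=0.1, p=10):
    """Binary band of roughly 2p pixels straddling the detected shadow edge."""
    binary = prob_map > thresh                       # binarize the probability map
    edges = canny(binary.astype(float), sigma=1.0)   # locate the shadow edge
    band = ndimage.binary_dilation(edges, iterations=p)
    return band
```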
C. Monochromatic Shadow Removal
To cancel the monochromatic shadow effect in the gradient domain, we first fit a local Gaussian function, for
each line segment, to simulate illumination changes. To
further preserve the underlying texture across the shadow
boundary, we formulate a Markov Random Field (MRF)
model to globally optimize the parameters of the Gaussian
function. Although other advanced techniques, such as Belief Propagation, Graph Cuts and variational models, can also be used to optimize the parameters, we find that standard gradient descent also works well.
Shadow Canceling
For each line segment, the Gaussian function f is fitted to the gradients' difference from the mean:

f = A · exp( −(x − u)² / (2σ²) ),    (12)

where u is the center of the derivative values, computed as u = (1/n) Σ_i (y_i · x_i). Here y_i and x_i form a local coordinate system, with x_i denoting the line segment's horizontal coordinates and y_i its vertical coordinates, and n is the total number of pixels in the line segment. σ is the standard deviation of the distribution and A is a constant scaling the whole distribution. We initialize A = 0.1 and σ = 0.1. We show two examples of the fitted Gaussian in Figure 10.
⁵We also experimented with other techniques, such as setting p automatically by applying a matting approach to the shadow boundary, but we found the automatic results comparable to those with a tuned parameter.
[Figure 10 plots: fitted Gaussian profiles of the gradients along sampled line segments; (a) hard boundary, (b) soft boundary.]
Fig. 10. Examples of fitting Gaussians to sampled line segments across the shadow boundary. (a) shows Gaussian fitting on a hard (strong) shadow boundary; (b) shows Gaussian fitting on a soft shadow boundary. Soft shadows are often shadows cast on richly textured regions, such as grass.
Note that the signs of the two Gaussian peaks are different because the sign of the gradients across the shadow boundary, going from shadow to non-shadow, differs: they are either positive or negative.
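Fitting Equation (12) to the gradient profile of one sampled line segment could be sketched as below, using SciPy's curve_fit and the initialization A = 0.1, σ = 0.1 from the text; the input profile array is an assumed, illustrative input.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, A, u, sigma):
    return A * np.exp(-(x - u) ** 2 / (2.0 * sigma ** 2))

def fit_segment_gaussian(grad_profile):
    """Fit A, u, sigma of Equation (12) to the log-domain gradient values
    sampled along one line segment across the shadow boundary."""
    grad_profile = np.asarray(grad_profile, dtype=float)
    x = np.arange(len(grad_profile), dtype=float)
    u0 = np.sum(grad_profile * x) / len(grad_profile)   # center estimate, as in the text
    (A, u, sigma), _ = curve_fit(gauss, x, grad_profile, p0=[0.1, u0, 0.1])
    return A, u, sigma
```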
Texture Preserving
To preserve the texture across the shadow boundary after canceling the shadows, we assume the texture distribution in the gradient domain is Gaussian. We formulate a Markov Random Field (MRF) to optimize the parameters A, u, σ of the Gaussian function f by forcing the texture distribution to be Gaussian. In the MRF, each line segment is viewed as a node, and edges connect each line segment to its two nearest line segments. The energy function is formulated as

E = Σ_i ϕ(y, f) + λ Σ_{<i,j>} φ(A, u, σ),    (13)
where y is the observed derivative values. The data term ϕ
is designed as a function to measure the texture distribution
of the canceled derivatives y − f .
For natural images, this texture distribution can be well
expressed as a Gaussian distribution with mean u0 and
standard deviation σ0 .
ϕ(y, f) = exp( −(y − f − u0)² / (2σ0²) ).    (14)
Fig. 11. Comparison of shadow removal results using local Gaussian fitting and global MRF optimization: (a) local result and enlarged detail, (b) global result and enlarged detail. Visual artifacts of saturation can be seen in some regions of the local result.
We summarize all the steps in Algorithm 1.
Algorithm 1 Gaussian-based Shadow Removal
1) Threshold the shadow probability map returned by the recognition process. The threshold is set to 0.1.
2) Detect the edges from the binary map.
3) Dilate and erode the edge map to sufficiently include the shadow boundary. The number of dilation and erosion pixels is set to 10.
4) For each pixel on the edge map, sample one horizontal line segment and one vertical line segment from the boundary map.
5) In the log domain, fit a Gaussian function to the gradients of each line segment.
6) Formulate an MRF model to globally refine the underlying texture by forcing its gradients toward a Gaussian distribution and penalizing inconsistency in a neighboring area.
7) Finally, recover the shadow-free image by solving a Laplace equation given the shadow-canceled derivatives.
In our approach, u0 and σ0 are calculated using an additional 6 pixels obtained by extending the line segment on both sides. The smoothness term φ is simply a sum of L2 penalties that enforce consistency of each parameter A, u, σ within a neighboring area. The weight of the smoothness term, λ, is set to 0.1 in all the tests.
We minimize Equation (13) by a standard gradient descent method. Although this method does not guarantee a global optimum, we find that high-quality results can be obtained within 500 iterations of gradient updates.
In all our tests, to recover the shadow-free image, we re-integrate the canceled derivatives y − f (note that we cancel the horizontal and vertical derivatives separately) by iteratively solving a Laplace equation given the horizontal and vertical derivatives for 5000 iterations.
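A sketch of this re-integration step is given below: a plain Jacobi iteration on the Poisson equation whose right-hand side is the divergence of the shadow-canceled derivative field. The boundary handling is simplified (periodic, via np.roll), so this is an illustrative solver rather than the released one.

```python
import numpy as np

def integrate_gradients(dx, dy, n_iter=5000):
    """Recover a log-image whose finite differences best match the canceled
    derivatives (dx, dy) by Jacobi iterations on the Poisson equation."""
    h, w = dx.shape
    img = np.zeros((h, w))
    # divergence of the target gradient field (backward differences)
    div = np.zeros((h, w))
    div[:, 1:] += dx[:, 1:] - dx[:, :-1]
    div[1:, :] += dy[1:, :] - dy[:-1, :]
    for _ in range(n_iter):
        nb = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
              np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
        img = (nb - div) / 4.0      # Jacobi update of the Poisson equation
    return img
```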
Figure 11 shows a comparison of the shadow-free images recovered using the local Gaussian fitting method and the global texture-preserving refinement. We can see that the global approach generates a more natural and visually pleasing result by preserving the textures correctly.
D. Results
We show part of the shadow removal results in Figure 14. The first four rows show results for images taken by a Canon Digital Rebel XTi camera with a 50 mm lens, and the last two rows show results from the overhead image library. Overall, we can see that the local method works well, while the global method performs better than the local method. We also see that saturation artifacts exist in the local results (in the 1st and 2nd rows). We believe the artifacts near the right corner of the 1st row are caused by imperfect line sampling due to the acute angle.
It is known that typical digital photos have more noise in shadow (dark) areas. This artifact can be clearly seen in the two overhead images shown in Figure 14. By successfully removing the shadows, we show that the recovered shadow-free images are more semantically useful – missing context, such as the cars, is enhanced.
Fig. 12. Results of shadow removal: (a) shadow image, (b) probability map, (c) recovered image, (d) raw dx, (e) boundary, (f) recovered dx. The first row shows an example of removing shadows in the presence of both shadow and material edges. The second row shows a failure case where the recognition system falsely gives low scores to the stripe inside the shadow; this causes the system to falsely cancel the derivatives of the white stripe, which wrongly appears black in the recovered result.
One interesting result is shown in the first row of Figure 12, where the shadow and reflectance edges are very complex. Although this may violate our assumption that significant shadow boundaries and reflectance edges are unlikely to occur at the same point, the recovered result is still acceptable. Although our shadow removal approach works well for most of the scenes, it still relies on the shadow recognition results, especially the correctness of the shadow boundary. We show a failure case in the second row of Figure 12.
E. Discussion

Most earlier shadow removal algorithms zero the derivatives of the pixels and add a constant scale to the intensities enclosed within the shadow boundary [10]. These approaches produce good results when the shadow boundary is sharp, but they often falsely nullify textures on curved and textured surfaces in complex natural scenes.

A number of recent shadow removal algorithms have been proposed to preserve the textures using color [37], [30], gradient [22], [18], second-order smoothness [2] or intrinsic images [9]. These approaches work well and decent results have been demonstrated.

Losing chromatic information makes shadow removal more challenging, because approaches that rely on colors to find the affinity between shadows and non-shadows, or that seek an illumination-invariant space in which an intrinsic image can be recovered, cannot be used. However, monochromatic shadows do have some special properties that are useful for shadow removal. One observation we find is that the gradient
changes caused by shadows in some channels of a color image are more severe than those in a monochromatic image, which is a linear combination of all the channels. We show a comparison of the horizontal (x) derivatives of a textured scene from its color and monochromatic versions against ground-truth data.

To obtain the ground-truth data, we fixed the camera and quickly captured the scene with and without shadows using a remote controller; the shadows cast on the grass come from a planar board occluding the sunlight. We choose the derivatives inside the shadows and histogram them into 50 bins. The distance between the histograms from the shadows and from the ground truth is measured using the Earth Mover's Distance (EMD) [26]. We show the x derivative results in Figure 13; the results for the y derivatives are very similar.

[Figure 13 panels: (a) color reference, (b) color shadows, (c) gray reference, (d) gray shadows, and histograms of log-channel gradients for (e) red (EMD ≈ 1.07e−6), (f) green (EMD ≈ 5.09e−6), (g) blue (EMD ≈ 1.55e−5), (h) gray (EMD ≈ 3.26e−6).]
Fig. 13. Comparison of texture distortion from color shadows and monochromatic shadows. The horizontal values are the horizontal gradients of the log channels; the vertical values are the normalized counts in each bin. The texture distortion in the blue channel is more severe because the images are taken in an outdoor environment under the sun.

We can see that the derivative distortion inside shadows from monochromatic images is actually smaller than that from the color version. We believe this is what allows our shadow removal algorithm to succeed by focusing on the shadow boundary while leaving the shadow interior untouched.
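The EMD measurement described above could be sketched as follows, using SciPy's one-dimensional Wasserstein distance as the Earth Mover's Distance [26]; the 50-bin histogramming follows the text, while the array names are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd_between_derivatives(shadow_grads, ground_truth_grads, n_bins=50):
    """EMD between the distributions of log-derivative values measured
    inside shadows and in the shadow-free ground truth."""
    lo = min(shadow_grads.min(), ground_truth_grads.min())
    hi = max(shadow_grads.max(), ground_truth_grads.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (bins[:-1] + bins[1:])
    h_s, _ = np.histogram(shadow_grads, bins=bins)
    h_g, _ = np.histogram(ground_truth_grads, bins=bins)
    return wasserstein_distance(centers, centers, u_weights=h_s, v_weights=h_g)
```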
VII. CONCLUSIONS

We have proposed a system for recognizing and removing shadows from monochromatic natural images. Using the cues presented in this paper, our method can successfully identify shadows in real-world data where color is unavailable. We found that while single-pixel classification strategies work well, a Boosted Decision Tree integrated into a CRF-based model achieves the best results. We also found that a simple assumption of Gaussian illumination changes in the gradient domain across the shadow boundary can help to remove the shadows.

Acknowledgement

The authors thank Edward H. Adelson, Graham Finlayson, Dani Lischinski and Srinivasa Narasimhan for helpful discussions. M.F.T. was supported by a grant from the NGA (HM1582-08-1-0021).
REFERENCES

[1] E. Adelson. Image statistics and surface perception. In Human Vision and Electronic Imaging XIII, Proceedings of the SPIE, volume 6806, pages 680602–680609, 2008.
[2] E. Arbel and H. Hel-Or. Texture-preserving shadow removal in color images containing curved surfaces. In CVPR, 2007.
[3] V. Arévalo, J. González, and G. Ambrosio. Shadow detection in colour high-resolution satellite images. Int. J. Remote Sens., 29(7):1945–1963, 2008.
[4] M. Bell and W. Freeman. Learning local evidence for shading and reflection. In ICCV, 2001.
[5] M. Collins, R. E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In Machine Learning, pages 158–169, 2000.
[6] R. Dror, A. Willsky, and E. Adelson. Statistical characterization of real-world illumination. Journal of Vision, 4(9):821–837, 2004.
[7] M. A. T. Figueiredo. Bayesian image segmentation using Gaussian field priors. In Energy Minimization Methods in Computer Vision and Pattern Recognition, 2005.
[8] G. Finlayson, C. Fredembach, and M. S. Drew. Detecting illumination in images. In ICCV, 2007.
[9] G. D. Finlayson, S. D. Hordley, C. Lu, and M. S. Drew. On the removal of shadows from images. IEEE Trans. Pattern Anal. Mach. Intell., 28(1):59–68, 2006.
[10] C. Fredembach and G. Finlayson. Hamiltonian path based shadow removal. In BMVC, pages 970–980, 2005.
[11] F. A. A. Kingdom, C. Beauce, and L. Hunter. Colour vision brings clarity to shadows. Perception, 33(8):907–914, 2004.
[12] I. Motoyoshi, S. Nishida, L. Sharan, and E. H. Adelson. Image statistics and the perception of surface qualities. Nature, 447:206–209, 2007.
[13] J. Zhu, K. G. G. Samuel, S. Z. Masood, and M. F. Tappen. Learning-based shadow recognition from monochromatic natural images. In CVPR, 2010.
[14] A. Joshi and N. Papanikolopoulos. Learning to detect moving shadows in dynamic environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):2055–2063, 2008.
[15] J. Kepner. MatlabMPI. J. Parallel Distrib. Comput., 64(8):997–1005, 2004.
[16] D. Kersten, D. Knill, P. Mamassian, and I. Bülthoff. Illusory motion from shadows. Nature, 379:31, 1996.
[17] M. D. Levine and J. Bhattacharyya. Removing shadows. Pattern Recogn. Lett., 26(3):251–265, 2005.
[18] F. Liu and M. Gleicher. Texture-consistent shadow removal. In ECCV, pages 437–450, Berlin, Heidelberg, 2008. Springer-Verlag.
[19] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1133–1146, 2007.
[20] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):530–549, 2004.
[21] K. McHenry, J. Ponce, and D. Forsyth. Finding glass. In CVPR, pages 973–979, Washington, DC, USA, 2005. IEEE Computer Society.
[22] A. Mohan, J. Tumblin, and P. Choudhury. Editing soft shadows in a digital photograph. IEEE Comput. Graph. Appl., 27(2):23–31, 2007.
[23] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1079–1087, 2004.
[24] J. Rittscher, J. Kato, S. Joga, and A. Blake. A probabilistic background model for tracking. In ECCV, pages 336–350, 2000.
[25] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman. Ground truth dataset and baseline evaluations for intrinsic image algorithms. In ICCV, pages 2335–2342, Washington, DC, USA, 2009. IEEE Computer Society.
[26] Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In ICCV '98: Proceedings of the Sixth International Conference on Computer Vision, page 59, Washington, DC, USA, 1998. IEEE Computer Society.
[27] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vision, 77(1-3):157–173, 2008.
[28] E. Salvador, A. Cavallaro, and T. Ebrahimi. Cast shadow segmentation using invariant color features. Comput. Vis. Image Underst., 95(2):238–259, 2004.
[29] L. Shen, P. Tan, and S. Lin. Intrinsic image decomposition with non-local texture cues. In CVPR, 2008.
[30] Y. Shor and D. Lischinski. The shadow meets the mask: Pyramid-based shadow removal. Comput. Graph. Forum, 27(2):577–586, 2008.
[31] F. Tanner, B. Colder, C. Pullen, D. Heagy, C. Oertel, and P. Sallee. Overhead imagery research data set (OIRDS) – an annotated data library and tools to aid in the development of computer vision algorithms, 2009.
[32] M. Tappen, W. Freeman, and E. Adelson. Recovering intrinsic images from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1459–1472, 2005.
[33] M. Tappen, K. Samuel, C. Dean, and D. Lyle. The logistic random field – a convenient graphical model for learning parameters for MRF-based labeling. In CVPR, 2008.
[34] Y. Wang, K. Loe, and J. Wu. A dynamic conditional random field model for foreground and shadow segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2):279–289, 2006.
[35] Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, pages 68–75, 2001.
[36] J. Winn and J. Shotton. The layout consistent random field for recognizing and segmenting partially occluded objects. In CVPR, pages 37–44, 2006.
[37] T. P. Wu, C. K. Tang, M. S. Brown, and H.-Y. Shum. Natural shadow matting. ACM Trans. Graph., 26(2):8, 2007.
Fig. 14. Results of shadow removal from monochromatic images: (a) shadow images, (b) probability maps, (c) local method, (d) global refinement. We can see that while the local method works well, the global method achieves better results, as can be observed by comparing the results in the first two rows. It is also interesting to see that more information becomes visible after removing the shadows in the last two images from the overhead database.