Are We Able to Distinguish Color Attributes?†
M. Melgosa,1* M. J. Rivas,1 E. Hita,1 F. Viénot2
1 Departamento de Óptica, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
2 Muséum National d’Histoire Naturelle, Laboratoire de Photobiologie, 75005 Paris, France
Received 20 March 1999; revised 29 January 2000; accepted 25 March 2000
Abstract: The perception and understanding of the three
color attributes have been analyzed from two experiments
using pairs of Munsell samples, where only one of the three
color attributes was changed/unchanged (Experiment I/II)
at a time. In each experiment, 36 pairs with color differences of 3 different sizes (average values of 15.8 and 21.7
CIELAB units for Experiments I and II, respectively) were
assessed under standardized conditions by 40 normal observers, 20 of them with previous knowledge and experience
in colorimetry. At a 95% confidence level, the results from
the two experiments were not significantly different, indicating that color attributes were not easily distinguished:
for example, for experienced observers, the percentage of
correct answers for identifying the color attribute responsible for a color difference was only 72.4%, the random
probability being 33.3%. There were no significant differences between the results found by men and women. The
worst distinguished attribute was Chroma, that is, the least
frequent confusion was between Hue and Value or vice
versa. Value differences were more easily detected for achromatic than for chromatic pairs, both for experienced and
inexperienced observers. With respect to the size of the
color differences, we observed that large hue differences
were more easily identifiable than smaller ones, and a
constant Hue was more identifiable when the entire color
difference was small. © 2000 John Wiley & Sons, Inc. Col Res Appl,
25, 356–367, 2000
* Correspondence to: Dr. Manuel Melgosa, Departamento de Óptica,
Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
(e-mail: [email protected]) Contract grant sponsor: Ministerio de
Educación y Ciencia (España); contract grant number: PB96-1454.
† The preliminary results of this work were presented at the 1998 Annual
Meeting of the Optical Society of America (Proc. 1998 OSA Annual
Meeting, Paper MDD7, p. 73).
Key words: colorimetry; color vision; color differences;
Munsell; CIELAB
INTRODUCTION
The three basic perceptual attributes of color, usually designated as lightness, hue, and saturation, are generally understood and differentiated through their definitions in classical textbooks on colorimetry,1-4 as well as through the
examples provided by several color-order systems, where
samples vary along scales for each of these attributes (e.g.,
the Munsell Book of Color5). These perceptual attributes are
generally presented as being independent.
As is well known, when presented with a pair of color
stimuli, our visual system can accurately judge equality or
inequality within the limits of color-discrimination thresholds. In the case of equal stimuli, the visual system does not
achieve the distinction of the diverse spectral distributions
leading to the same color (metamerism). In the case of
unequal stimuli, the color difference can be assumed a
priori to be the result of disparities in one or more of the
aforementioned attributes, but it seems that it is not easy to
identify which attribute(s) might be involved, and in what
proportions. In short, it is usually said that the human visual
system acts as a zero instrument (detection of equality/
inequality),6 but it does not have analytical properties,
among which should probably be included an accurate
discerning of the different attributes responsible for a given
color difference.
COLOR research and application
In any case, it might be asked to what extent, as a result of a process that is both perceptive and cognitive, we can distinguish whether a given color difference is due to lightness, hue, or saturation. It is reasonable to assume that at least observers who are experienced in color discrimination tasks (or normal observers with a reliable degree of training) can distinguish each of the three color attributes in simple experiments such as those described here, where only one of the three attributes is varied (or held constant). It has been reported in the literature7 that variations in lightness can be caused by changes in hue or saturation (Helmholtz–Kohlrausch effect), and variations in hue can be caused by changes in lightness (Bezold–Brücke effect) or in colorimetric purity (Aubert–Abney effect). The present experiments have not been designed to study such effects, which were taken into consideration during the renotation of the Munsell system, but we cannot rule out the influence of these effects in the possible confusion of the three color attributes by our observers.
In the following section, we examine the design of two
experiments carried out in order to answer the question
addressed in the title of this work: are we able to distinguish
color attributes? This is a question that could be considered
interesting within the fundamentals of colorimetry.
MATERIALS AND METHODS
Two experiments were performed, in each of which 36 pairs
of glossy samples from the Munsell atlas5 were observed.
This atlas was chosen, because it is one of the most widely
used color-order systems,8-10 in which the three color attributes are illustrated by different scales with constant
perceptual steps. In addition, the samples in this atlas are
rigorously controlled for quality over the different scales.
Following the designation of the scales in the Munsell
system, henceforth the three color attributes (lightness, hue,
and saturation) will be designated as Value, Hue, and
Chroma, respectively.
In our first experiment (Experiment I), samples of each
pair differed in only one of the color attributes, while in the
second experiment (Experiment II) samples of each pair had
only one of the color attributes identical. The 36 pairs of
samples of each experiment were distributed into 4 groups
around the Hues 5Y, 10G, 5PB, and 5RP, with 9 sample
pairs per group. In Experiment I, in each group, 3 pairs
varied only in Value, 3 pairs varied only in Chroma, and 3
pairs varied only in Hue, the variation differing in magnitude within each series. Experiment II followed an analogous design, having also 4 groups of 9 sample pairs, 3
having the same Value, 3 the same Chroma, and 3 the same
Hue (the color difference in each case being of a different
magnitude due to the two attributes that varied at the same
time, and along the same proportion in Munsell scales, in
each pair). Figure 1 illustrates the procedure followed for
the selection of pairs for Experiment I, indicating, as an
example, the 9 pairs selected for the group around the Hue
10G (a similar procedure was used in the selection of
samples for Experiment II). The fact that we had only one
Munsell atlas for our experiments implied certain limitations in the selection of samples, including that, chronologically, Experiment II had to be performed after Experiment
I was finished.
Volume 25, Number 5, October 2000
FIG. 1. Schematic of the criteria used to select the sample pairs analyzed in Experiment I (Hue 10G). The pairs present differences in only one of the three color attributes: (upper left) Value, (upper right) Chroma, or (bottom) Hue. In each case, the color differences are of three different magnitudes (pairs designated 1, 2, and 3).
The 9 pairs of samples from each group were randomly positioned over gray cardboard backgrounds (10.0 cm width × 14.3 cm height), five pairs on the left column and four pairs on the right column, allowing the cardboard to be held by the observer using the free lower right corner. The two samples of each pair were juxtaposed (without separation) subtending a visual angle of 4.6° × 7.6° for an observer at a distance of 25 cm, the minimum separation between pairs being 0.7 cm. The sample pairs were viewed simultaneously inside a Verivide cabinet CAC120 with a D65 luminous source, and their colors were measured by a SpectraScan PR-704 spectroradiometer of Photo Research, under identical conditions to those used for visual judgments, assuming the CIE 1964 Standard Observer. The CIELAB coordinates of the gray cardboard background on which the sample pairs were presented were a* = −1.3; b* = −3.0; L* = 62.2. The reference white used to obtain the CIELAB coordinates was a pressed barium-sulphate powder plaque supplied with the spectroradiometer and measured under the same experimental conditions as the samples.
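The viewing geometry quoted above can be checked with the standard relation size = 2·d·tan(θ/2). The following is a minimal Python sketch; the function names are illustrative and do not come from the original study, and only the 25 cm distance and the 4.6° × 7.6° angles are taken from the text.

```python
import math

def size_from_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def visual_angle_from_size(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# Values quoted in the text: 4.6 deg x 7.6 deg at 25 cm.
width_cm = size_from_visual_angle(4.6, 25.0)
height_cm = size_from_visual_angle(7.6, 25.0)
print(f"{width_cm:.2f} cm x {height_cm:.2f} cm")
```

Under these assumptions the stimulus measures roughly 2.0 cm by 3.3 cm, which is a plausible size for a pair of juxtaposed atlas chips.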
For each sample pair, the observer was simply asked
which color attribute was different in Experiment I, and
which color attribute was identical in Experiment II. The
observers were not informed concerning the experimental
design (e.g., they did not know that in the 9 sample pairs of
each group there were 3 pairs in which each one of the three
color attributes was changed/unchanged at a time). Observers judged the 9 pairs in a given group in their own order
and without a time limit, the responses being recorded by
the researcher. Although only one pair was judged at a time,
the 9 pairs were displayed simultaneously to provide the
observer an overall idea of the experiment and facilitate the
proposed task. The order of presentation of the 4 groups to
the different observers was random.
Experiment I was conducted with 40 observers having
normal color vision according to the Ishihara test,11 of
which 20 could be considered experienced and 20 inexperienced. The observers designated as experienced had extensive knowledge of the fundamentals of colorimetry (13
of them had earned a Ph.D. working on different aspects of
colorimetry), and had previously carried out research in this
field (7 observers had measured their own color-matching
functions at 10 nm intervals using a visual colorimeter, and
13 had measured color discrimination thresholds), being
professors in our Optics Department. The observers considered as inexperienced, students in the same Department,
were instructed individually by the researcher before conducting each experiment, with the help of the definitions of
the color attributes1-4 and with the observation of the complete Munsell atlas,5 until the observer appeared to fully
understand the objective of the experiment.
In Experiment I, 26 males and 14 females participated
(the subgroup of inexperienced observers contained 10
males and 10 females). To analyze the consistency of the
results from Experiment I, it was repeated by one subgroup
of the observers (8 experienced and 8 inexperienced). From
the results of this replicated experiment, we decided that for
Experiment II it should suffice to use only 20 observers (12
males and 8 females) with normal color vision, half experienced and half inexperienced. To address further questions,
the following three complementary experiments were also performed: (1) Experiment I was replicated by 20 observers (10
experienced and 10 inexperienced), in order to analyze the
usefulness of an anchor pair to increase the percentage of
correct answers; (2) both Experiments I and II were replicated by two of the authors of this article, who could be
considered as highly trained for the proposed task; (3) the
detection of the attribute Value for color pairs with low and
high Chroma was specifically tested for a group of 20
observers (half experienced and half inexperienced). All
together, in this work, a total of 4248 visual judgments were
made.
TABLE I. Munsell notation for each of the 36 sample pairs analyzed in Experiment I. For each pair, the color difference is indicated in CIELAB units (ΔE*ab), as well as the color attribute that varies in each pair (and its contribution, in percentage, to the total color difference). The last column shows the percentage of correct responses for the 40 observers participating in this experiment.

EXPERIMENT I
Sample Pairs          ΔE*ab  Variable Attribute (% to ΔE*ab)  % Correct Answers
5Y 5/2-5Y 6/2         9.7    Value (99.8%)                    75.0
5Y 5/4-5Y 7/4         19.0   Value (99.7%)                    57.5
5Y 5/6-5Y 8/6         28.5   Value (98.8%)                    35.0
2.5Y 5/6-2.5Y 5/8     9.6    Chroma (99.4%)                   60.0
2.5Y 6/6-2.5Y 6/10    23.2   Chroma (99.9%)                   42.5
2.5Y 7/6-2.5Y 7/12    39.7   Chroma (99.9%)                   20.0
5Y 5/8-7.5Y 5/8       6.1    Hue (93.8%)                      42.5
5Y 6/8-10Y 6/8        9.4    Hue (97.4%)                      55.0
5Y 7/8-2.5GY 7/8      15.2   Hue (99.8%)                      87.5

10G 4/2-10G 5/2       9.0    Value (98.3%)                    77.5
10G 4/4-10G 6/4       19.1   Value (99.3%)                    57.5
10G 4/6-10G 7/6       27.3   Value (99.8%)                    52.5
7.5G 4/4-7.5G 4/6     11.7   Chroma (99.9%)                   52.5
7.5G 5/4-7.5G 5/8     18.7   Chroma (99.9%)                   62.5
7.5G 6/4-7.5G 6/10    27.8   Chroma (99.9%)                   55.0
10G 5/8-2.5BG 5/8     4.6    Hue (98.6%)                      27.5
10G 6/8-5BG 6/8       9.7    Hue (99.7%)                      60.0
10G 7/8-7.5BG 7/8     12.6   Hue (95.6%)                      85.0

5PB 5/2-5PB 6/2       9.8    Value (99.1%)                    85.0
5PB 5/4-5PB 7/4       18.5   Value (99.3%)                    72.5
5PB 5/6-5PB 8/6       27.6   Value (98.4%)                    65.0
2.5PB 3/6-2.5PB 3/8   7.3    Chroma (99.9%)                   57.5
2.5PB 4/6-2.5PB 4/10  15.3   Chroma (99.4%)                   50.0
2.5PB 5/6-2.5PB 5/12  20.6   Chroma (99.3%)                   67.5
5PB 5/8-7.5PB 5/8     7.9    Hue (95.1%)                      62.5
5PB 6/8-10PB 6/8      14.1   Hue (99.9%)                      82.5
5PB 7/8-2.5P 7/8      17.6   Hue (99.0%)                      95.0

5RP 5/2-5RP 6/2       9.5    Value (99.6%)                    72.5
5RP 5/4-5RP 7/4       19.6   Value (98.9%)                    55.0
5RP 5/6-5RP 8/6       28.2   Value (98.3%)                    50.0
2.5RP 4/6-2.5RP 4/8   6.5    Chroma (99.6%)                   47.5
2.5RP 5/6-2.5RP 5/10  13.1   Chroma (99.5%)                   62.5
2.5RP 6/6-2.5RP 6/12  21.0   Chroma (98.6%)                   60.0
5RP 5/8-7.5RP 5/8     5.5    Hue (95.7%)                      30.0
5RP 6/8-10RP 6/8      10.1   Hue (85.1%)                      70.0
5RP 7/8-2.5R 7/8      14.5   Hue (84.8%)                      77.5

As we can see, both the layout of the two experiments and
the tasks performed by the observers were made intentionally simple in comparison with other works.12 In any case,
it should be emphasized that these tasks are not merely
perceptive, but both perceptive and cognitive, this last aspect probably playing a key role in our experiments. Because of the large size of the color differences involved in
our current experiments (15.8 and 21.7 CIELAB units on
the average, for Experiments I and II, respectively), it
should be expected that, in addition to perceptive factors,
the cognitive factors and the familiarity with global aspects
of color space could play an important role in the results
found here. Perception should be defined as consciousness
of a sensorial impression via the nerve centers, while cognition signifies a complex hypothetical model based on
experiences, which enable or facilitate knowledge and organization of all manifestations of the outside world.13 The
limits between sensation and perception are imprecise, as
between perception and cognition, which signifies a more
complex interpretative process involving the acquisition,
storage, recovery, and use of knowledge.14
Experiments I and II cannot be considered as complementary or mutually exclusive, but as two different tasks
with an increasing complexity. Thus, in Experiment I only
one of the three color attributes changes, while in Experiment II two color attributes change (i.e., the third color
attribute remains unchanged). The more usual situation in
everyday life (not considered here) would be sample
pairs in which all three color attributes change simultaneously.
The statistical analyses in this research were made using
the Statgraphics-Plus software. Specifically, in the analysis
of results, the F-test was used for the comparison of the
variance, the t-test for the mean, and the Mann–Whitney
(Wilcoxon) for the medians (when the normality hypothesis
was not fulfilled), in all cases, at a 95% confidence level.
RESULTS
Tables I and II list the 36 Munsell sample pairs used in
Experiments I and II, respectively, distinguishing the 4
groups according to their Hue. For each sample pair, these
tables give the total color difference in CIELAB units
(ΔE*ab), the attribute that varied (Table I) or that remained
unchanged (Table II), indicating the proportion in which
this attribute contributed to the total color difference, and
the percentage of correct answers per total of observers
participating in the experiment.
In Table I, the components of the CIELAB color differences (ΔL*, ΔC*, and ΔH*) revealed that, in Experiment I, the color attribute that varied contributed 84.8% to the total difference in the worst case, and nearly 100% in many cases. This indicates the fastidious control in the manufacture of the Munsell atlas.5 In Experiment II, although the selection of the pairs followed a symmetrical criterion in the Munsell atlas (cf. Table II, first column), it should be pointed out that the two attributes that varied in each pair contributed in different proportions to the total color difference measured in CIELAB units, due to the different magnitude of the three scales (Value, Hue, and Chroma) of this atlas in CIELAB units. Thus, usually ΔC* is much greater than ΔH*, and ΔL* is also substantially greater than ΔH*, while ΔC* is only slightly greater than ΔL*. Table II again reflects the painstaking colorimetric control in the Munsell samples (in the worst case the attribute that remains unchanged contributes only 5.92% to the total color difference, this contribution being lower than 1% in most cases).
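The decomposition of a CIELAB color difference into lightness, chroma, and hue components can be sketched as follows. The relations ΔE*² = ΔL*² + ΔC*² + ΔH*² and C* = (a*² + b*²)^(1/2) are the standard CIELAB definitions. The percentage contribution is computed here as the squared component divided by ΔE*², which is an assumption: the article does not state how its percentages were obtained. The function names and the example coordinates are hypothetical and are not measured data from the study.

```python
import math

def cielab_difference(lab1, lab2):
    """Return (dE, dL, dC, dH) for two CIELAB colors given as (L*, a*, b*)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L2 - L1
    dE = math.sqrt(dL ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)
    dC = math.hypot(a2, b2) - math.hypot(a1, b1)
    # dH* follows from dE^2 = dL^2 + dC^2 + dH^2; clamp rounding noise at zero.
    # Only its magnitude is needed here, so the sign of dH* is not computed.
    dH = math.sqrt(max(dE ** 2 - dL ** 2 - dC ** 2, 0.0))
    return dE, dL, dC, dH

def percent_contributions(lab1, lab2):
    """Share of dE^2 due to each attribute (assumed definition, see lead-in)."""
    dE, dL, dC, dH = cielab_difference(lab1, lab2)
    if dE == 0.0:
        raise ValueError("identical colors: no difference to decompose")
    return {
        "Value": 100.0 * dL ** 2 / dE ** 2,
        "Chroma": 100.0 * dC ** 2 / dE ** 2,
        "Hue": 100.0 * dH ** 2 / dE ** 2,
    }

# Hypothetical pair differing mainly in lightness (illustrative values only).
example = percent_contributions((50.0, 20.0, 30.0), (60.0, 21.0, 31.0))
for attribute, share in example.items():
    print(f"{attribute}: {share:.1f}%")
```

With this definition the three shares always sum to 100%, which matches the way Tables I and II report a single dominant (or negligible) attribute per pair.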
TABLE II. Munsell notation for each of the 36 sample pairs analyzed in Experiment II. For each pair, the color difference is indicated in CIELAB units (ΔE*ab), as well as the color attribute that is constant in the pair (and its contribution, in percentage, to the total color difference). The last column shows the percentage of correct responses for the 20 observers participating in this experiment.

EXPERIMENT II
Sample Pairs          ΔE*ab  Constant Attribute (% to ΔE*ab)  % Correct Answers
5Y 5/2-2.5Y 5/4       12.4   Value (0.01%)                    50
5Y 6/2-10YR 6/6       22.7   Value (0.09%)                    50
5Y 7/2-7.5YR 7/8      32.8   Value (0.03%)                    80
5Y 4/2-7.5Y 5/2       9.9    Chroma (0.34%)                   10
5Y 4/4-10Y 6/4        20.4   Chroma (1.18%)                   30
5Y 4/6-2.5GY 7/6      31.3   Chroma (5.92%)                   30
5Y 5/8-5Y 6/10        15.4   Hue (0.41%)                      70
5Y 5/6-5Y 7/10        32.8   Hue (0.28%)                      65
5Y 5/4-5Y 8/10        47.8   Hue (0.11%)                      50

10G 5/2-7.5G 5/4      9.7    Value (0.05%)                    55
10G 6/2-5G 6/6        19.2   Value (0.0001%)                  60
10G 7/2-2.5G 7/8      31.1   Value (0.003%)                   80
10G 3/4-2.5BG 4/4     8.6    Chroma (5.3%)                    10
10G 3/6-5BG 5/6       18.6   Chroma (2.3%)                    15
10G 3/8-7.5BG 6/8     31.7   Chroma (7.0%)                    75
10G 4/6-10G 5/8       12.6   Hue (0.20%)                      65
10G 4/4-10G 6/8       27.8   Hue (0.0001%)                    60
10G 4/2-10G 7/8       40.9   Hue (0.03%)                      30

5PB 5/2-2.5PB 5/4     8.4    Value (0.05%)                    55
5PB 6/2-10B 6/6       16.6   Value (0.03%)                    55
5PB 7/2-7.5B 7/8      28.1   Value (0.02%)                    55
5PB 3/4-7.5PB 4/4     9.7    Chroma (0.04%)                   30
5PB 3/6-10PB 5/6      19.8   Chroma (0.09%)                   50
5PB 3/8-2.5P 6/8      32.1   Chroma (0.23%)                   60
5PB 4/6-5PB 5/8       11.1   Hue (0.68%)                      55
5PB 4/4-5PB 6/8       22.4   Hue (0.67%)                      60
5PB 4/2-5PB 7/8       32.8   Hue (0.37%)                      45

5RP 5/2-2.5RP 5/4     7.2    Value (0.03%)                    65
5RP 6/2-10P 6/6       13.1   Value (0.05%)                    60
5RP 7/2-7.5P 7/8      21.2   Value (0.04%)                    80
5RP 3/4-7.5RP 4/4     9.3    Chroma (0.16%)                   15
5RP 3/6-10RP 5/6      20.0   Chroma (0.09%)                   45
5RP 3/8-2.5R 6/8      33.3   Chroma (1.41%)                   50
5RP 4/6-5RP 5/8       12.6   Hue (0.59%)                      80
5RP 4/4-5RP 6/8       22.1   Hue (0.02%)                      35
5RP 4/2-5RP 7/8       34.1   Hue (0.26%)                      40

Table III summarizes the results of Experiments I and II, for all the observers and by group: experienced/inexperienced, males/females. The same information for both experiments is also provided by Table IV, but now distinguishing the results achieved for each of the four groups of
sample pairs considered around each Hue (5Y, 10G, 5PB,
and 5RP). Tables III and IV indicate the percentages of
correct and incorrect answers, specifying for the latter the
percentage of each type of wrong answer. Specifically, there
were 6 possibilities for confusion, designated as VC, VH,
CH, CV, HV, and HC in Tables III and IV. In this designation, the first letter indicates the correct attribute
(changed/unchanged in Experiments I/II, respectively) and
the second letter the wrong attribute selected by the observers. Thus, for example, if the pair 5Y 5/2 – 2.5Y 5/4 in
Experiment II was judged as a pair of samples with the same
Chroma, this was an incorrect answer (the correct answer
should be Value), which should be scored within the group
VC. Note that, in the designation of an incorrect answer or a percentage of incorrect answers, the order of the letters is important (for example, VC and CV indicate two different types of confusion). To obtain the total percentage of confusions for a given attribute, the results of two different columns in Table III or IV should be added (for example, adding VC and VH, we obtain the total percentage of wrong judgments concerning the attribute Value).

TABLE III. Summary of the percentage of correct and incorrect responses in Experiments I and II, distinguishing the different groups of observers (experienced, inexperienced, males, females), and for all the observers pooled (total). The different types of incorrect responses are distinguished (for example, when the attribute Value is judged by the observers as Chroma, this error is noted as VC).

EXPERIMENT I/EXPERIMENT II (columns VC to HC: % incorrect, by type of confusion)
Observers      % Correct  VC         VH         CH         CV         HV         HC
Experienced    72.4/56.6  4.7/6.5    1.3/3.4    6.3/9.5    5.1/7.3    2.8/2.5    7.4/14.2
Inexperienced  48.0/44.6  11.3/8.6   6.8/6.7    10.0/10.6  13.0/13.1  3.3/2.2    7.6/14.2
Males          64.8/51.5  7.3/7.0    3.0/5.2    7.2/9.3    7.3/9.8    3.5/2.1    6.9/15.1
Females        51.5/49.2  9.5/8.3    5.8/4.9    9.8/11.1   12.5/10.8  2.6/2.8    8.3/12.9
Total          60.2/50.6  7.9/7.6    4.0/5.0    8.2/10.1   9.0/10.1   3.2/2.4    7.5/14.2
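The column-adding rule described above can be illustrated with the Total row of Table III. The sketch below is only a worked example: the dictionary layout and the function name are illustrative choices, while the percentages are those reported in the table (Experiment I, Experiment II).

```python
# Total row of Table III: confusion code -> (% in Experiment I, % in Experiment II).
# First letter = correct attribute, second letter = attribute wrongly chosen.
TOTAL_ROW = {
    "VC": (7.9, 7.6),
    "VH": (4.0, 5.0),
    "CH": (8.2, 10.1),
    "CV": (9.0, 10.1),
    "HV": (3.2, 2.4),
    "HC": (7.5, 14.2),
}

def errors_by_attribute(row):
    """Sum the two confusion columns that share the same correct attribute."""
    totals = {"V": [0.0, 0.0], "C": [0.0, 0.0], "H": [0.0, 0.0]}
    for code, (exp1, exp2) in row.items():
        correct_attribute = code[0]
        totals[correct_attribute][0] += exp1
        totals[correct_attribute][1] += exp2
    # Round to one decimal, the precision used in the table.
    return {k: (round(v[0], 1), round(v[1], 1)) for k, v in totals.items()}

errors = errors_by_attribute(TOTAL_ROW)
for attribute, (exp1, exp2) in errors.items():
    print(f"{attribute}: {exp1}% (Experiment I), {exp2}% (Experiment II)")
```

For Chroma this gives 17.2% and 20.2% wrong answers in Experiments I and II, the largest totals of the three attributes in both experiments.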
ANALYSIS AND DISCUSSION
Overall Results
Figure 2 shows the proportion of correct and incorrect
answers given by the entire group of observers in the two
experiments, distinguishing among the wrong answers
whether Value, Hue, or Chroma was not identified. The last
row of Table III shows the percentages at which each of the
attributes is identified or confused. In general, it was found
that on the average the ability to distinguish color attributes
was low: the observers were able to identify the attribute
that differed for only 60.2% of the pairs (Experiment I), and
they could detect the shared attribute for only 50.6% of the
pairs (Experiment II).
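To put these percentages in perspective against the 33.3% expected from random guessing, one can run a one-sample proportion z-test. The sketch below is not the analysis performed by the authors; it assumes, as a simplification, that the 1440 judgments of Experiment I (40 observers × 36 pairs) and the 720 judgments of Experiment II (20 × 36) are independent trials, which is not strictly true because each observer contributes 36 judgments. The function name is illustrative.

```python
import math

def z_test_vs_chance(p_hat: float, n: int, p0: float = 1.0 / 3.0):
    """One-sided z-test of an observed proportion against a chance level p0.

    Returns (z, p_value) using the normal approximation to the binomial.
    """
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return z, p_value

# Overall proportions of correct answers reported in the text.
z1, p1 = z_test_vs_chance(0.602, 40 * 36)   # Experiment I
z2, p2 = z_test_vs_chance(0.506, 20 * 36)   # Experiment II
print(f"Experiment I:  z = {z1:.1f}")
print(f"Experiment II: z = {z2:.1f}")
```

Under these assumptions both results lie far above chance, so "low" in the text should be read as low relative to perfect identification, not as indistinguishable from guessing.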
TABLE IV. Summary of the percentage of correct and incorrect responses found in Experiments I and II for each of the four Hue zones considered (5Y, 10G, 5PB and 5RP). Results for different groups of observers (experienced, inexperienced, males, females), and for all the observers pooled (total) are shown. The different types of incorrect responses are distinguished (for example, when the true attribute is Value but it is judged by the observers as Chroma, this error is noted as VC).

EXPERIMENT I/EXPERIMENT II (columns VC to HC: % incorrect, by type of confusion)
Hue Zone  Groups         % Correct  VC         VH         CH         CV         HV         HC
5Y        Experienced    64.4/57.8  3.9/5.5    2.2/2.2    5.6/8.9    5.6/6.7    5.6/1.1    12.7/17.8
5Y        Inexperienced  41.1/38.9  10.1/8.9   9.5/4.4    9.0/10.0   11.8/16.7  6.2/2.2    12.3/18.9
5Y        Males          56.4/50.9  5.9/5.6    4.9/4.6    6.5/8.3    6.5/10.2   6.5/1.0    13.3/19.4
5Y        Females        46.0/44.4  8.8/9.7    7.9/1.4    8.8/11.1   12.7/13.9  4.7/2.8    11.1/16.7
5Y        Total          52.8/48.3  6.9/7.2    5.9/3.3    7.3/9.5    8.5/11.7   6.0/1.7    12.6/18.3

10G       Experienced    71.1/52.2  2.2/6.7    1.7/2.2    7.2/12.2   6.1/5.6    1.7/4.4    10.0/16.7
10G       Inexperienced  46.7/47.8  9.5/6.7    10.1/10.0  11.2/7.7   15.0/12.2  1.3/1.1    6.2/14.5
10G       Males          64.5/53.7  4.8/7.3    4.8/4.8    7.8/9.2    9.0/8.3    1.3/2.8    7.8/13.9
10G       Females        48.4/44.4  8.0/5.6    7.1/8.3    11.9/11.1  13.4/9.7   2.4/2.8    8.8/18.1
10G       Total          58.9/50.0  5.9/6.7    5.9/6.1    9.2/10.0   10.3/8.9   1.7/2.8    8.1/15.5

5PB       Experienced    82.8/59.5  6.1/6.8    0.0/5.6    3.3/9.0    2.2/7.9    1.7/4.4    3.9/6.8
5PB       Inexperienced  58.9/43.9  12.1/11.2  1.0/6.8    8.8/10.2   11.6/13.4  2.1/3.3    5.5/11.2
5PB       Males          72.6/48.6  9.8/8.4    0.3/6.6    5.7/10.3   5.7/11.2   2.6/3.7    3.3/11.2
5PB       Females        67.4/56.4  7.9/9.8    0.9/5.7    7.1/8.4    8.8/9.8    0.8/4.2    7.1/5.7
5PB       Total          70.8/51.7  9.0/9.0    0.5/6.1    6.2/9.6    6.8/10.6   1.9/4.0    4.8/9.0

5RP       Experienced    71.4/56.7  6.8/6.9    1.1/3.5    8.8/8.0    6.8/9.1    2.2/0.1    2.9/15.7
5RP       Inexperienced  45.2/47.8  13.6/7.8   6.3/5.5    10.9/14.5  14.2/10.0  3.5/2.2    6.3/12.2
5RP       Males          65.9/52.8  8.5/6.7    1.9/4.8    9.1/9.4    7.8/9.4    3.4/1.0    3.4/15.9
5RP       Females        44.1/51.4  13.6/8.3   7.2/4.2    11.3/13.9  15.1/9.7   2.3/1.4    6.4/11.1
5RP       Total          58.3/52.2  10.1/7.3   3.7/4.5    9.9/11.3   10.4/9.5   3.2/1.3    4.4/13.9

FIG. 2. Distribution of correct/incorrect responses corresponding to all observers that participated in Experiments (left) I and (right) II. White: percentage of correct responses; black: percentage of incorrect responses in the detection of the attribute Value; dark gray: percentage of incorrect responses in the detection of Chroma; light gray: percentage of incorrect responses in the detection of Hue.

The percentage of correct answers in Experiment I was slightly greater than in Experiment II, attributable, according to the opinion of most of the observers, to the greater
complexity of the latter experiment. In fact, the average time needed by the observers to complete Experiment II was somewhat greater than for Experiment I, as shown by Table V. This reflects the contention that the second experiment was more difficult, especially taking into account that after Experiment I the observers should have been better prepared for Experiment II. On the other hand, despite the fact that the CIELAB color differences in the two experiments were of similar size (Tables I and II), in Experiment II two attributes differed simultaneously, and thus these pairs would have been perceived as more different than those of Experiment I (in which only one attribute was varied). It could be concluded from these results that our visual system is somewhat better at identifying a differing attribute (a basically perceptive process) than a shared attribute (a process that in addition to being perceptive can have a major cognitive or intellectual component concerning each color attribute). In any case, it should be said that, at a confidence level of 95%, the difference between our two experiments in the percentage of correct responses was not significant (nor were the differences in means or standard deviations).

TABLE V. Time (min) spent by the observers to complete Experiments I and II (mean and standard deviation), distinguishing the different groups of observers (experienced, inexperienced, males, females) and all the observers pooled (total).

               Experiment I          Experiment II
Observers      Average   Std. Dev.   Average   Std. Dev.
Experienced    10.2      4.6         10.8      3.9
Inexperienced  11.1      7.1         12.0      3.0
Males          11.4      5.6         11.8      3.4
Females        9.4       6.5         10.8      3.6
Total          10.7      6.0         11.4      3.4
Table III shows that of the three color attributes the least
successfully identified by the observers was Chroma (17.2%
and 20.2% wrong answers in Experiments I and II, respectively), and the most easily identified was Hue in Experiment I and Value in Experiment II. In this sense, we should
remember that low percentages of incorrect answers in
Table III (or IV) mean better identification, bearing in mind
that every attribute was equally possible in our experiments.
In addition, it should be noted from Table III that the least
frequent confusion in both experiments was between Hue
and Value (HV), and vice versa (VH), while the most
frequent confusion was between Chroma and Value (CV) in
Experiment I, and Hue and Chroma (HC) in Experiment II.
It was also appreciable in Experiment II that Hue was
confused with Chroma (HC) considerably more than in
Experiment I.
The results for each of the 4 groups or Hues considered in
our experiments indicate (see Table IV) that, in Experiment
I, the highest proportion of correct answers (70.8%) was for
the 5PB zone (purple-blue), and the lowest (52.8%) was for
5Y (yellow), while the Hues 10G (green) and 5RP (red-purple) were intermediate (58.9% and 58.3% correct responses, respectively). A similar result, although with smaller
differences between the different Hues, was found in Experiment II: the highest proportion of correct answers was
52.2% for the 5RP zone, followed by 51.7% in the 5PB
zone, 50.0% in the 10G zone, the lowest proportion being in
the 5Y zone (48.3%), as in Experiment I. The only noteworthy result from the analysis of the different zones was
that the highest percentage of errors concerning the attribute
Hue was in zone 5Y (18.6% and 20.0% in Experiments I
and II, respectively), and that these percentages were substantially greater than in other zones (e.g., 6.7% and 13.0%
in zone 5PB for Experiments I and II, respectively). This
peculiar trend in zone 5Y (yellow) might be explained by
the experimental results of Purdy,15 who showed a maximum value for the so-called Bezold–Brücke effect towards
520 nm; in addition, Jacobs,16 studying the influence of the
wavelength on saturation, similarly found a maximum value
of around 570 nm. It should be also noted that the Munsell
5Y yellow chips present some problems, such as the greenish-brown hue attributable to the dark yellow region (Value lower
than 7/), which most observers would probably hardly call “yellow.”
The low percentage of correct answers
for the pairs 5Y 5/4 – 5Y 7/4 (57.5%) and 5Y 5/6 – 5Y 8/6
(35.0%) in our Experiment I could be attributed to this
problem.
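The per-zone percentages quoted in this paragraph for Experiment I can be recovered directly from the last column of Table I. The sketch below simply averages the nine pair-level percentages in each Hue zone; the variable names are illustrative, and the numbers are copied from Table I.

```python
from statistics import mean

# % correct answers per sample pair in Experiment I (Table I), grouped by Hue zone.
PERCENT_CORRECT_EXP1 = {
    "5Y":  [75.0, 57.5, 35.0, 60.0, 42.5, 20.0, 42.5, 55.0, 87.5],
    "10G": [77.5, 57.5, 52.5, 52.5, 62.5, 55.0, 27.5, 60.0, 85.0],
    "5PB": [85.0, 72.5, 65.0, 57.5, 50.0, 67.5, 62.5, 82.5, 95.0],
    "5RP": [72.5, 55.0, 50.0, 47.5, 62.5, 60.0, 30.0, 70.0, 77.5],
}

# Mean per zone, rounded to the one-decimal precision used in Table IV.
zone_means = {zone: round(mean(values), 1)
              for zone, values in PERCENT_CORRECT_EXP1.items()}

# Overall mean across all 36 pairs.
all_values = [v for values in PERCENT_CORRECT_EXP1.values() for v in values]
overall_mean = round(mean(all_values), 1)

print(zone_means)
print(overall_mean)
```

The zone means agree with the Total rows of Table IV (52.8, 58.9, 70.8, and 58.3), and the overall mean reproduces the 60.2% reported for Experiment I.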
Mention should also be made of the influence of the
choice of other Munsell pairs in our experimental results.
Thus, experiments on multidimensional scaling17,18 have shown that the hue spacing in the Munsell region from 5PB
through 5P is unusually large: 62°, in contrast to the
36° implied by the Munsell notation. If the cooler blues are shifted towards
5BG and the warmer blues towards 5RP, somewhere between the two areas the hue steps become too large. It
should be considered that this large hue change takes place
around 7.5PB, and this may explain the high percentage of
correct answers for the pairs 5PB 7/8 – 2.5P 7/8 (95%) and
5PB 6/8 – 10PB 6/8 (82.5%), in our Experiment I.
A noteworthy aspect in our experimental results concerns
consistency or reproducibility, which was tested at the end
of Experiment I by replicating it with 16 of the 40 original
observers (again, half experienced and half inexperienced
observers). The percentage of correct answers given by the
16 observers was 59.4% (against 60.2% of the entire Experiment I), giving a distribution of correct answers that was
statistically identical to that of Experiment I, in terms of
mean, median, and standard deviation at a confidence level
of 95%. With regard to the type of wrong responses, the
variations between Experiment I and its repetition were in
the worst cases 2.3% (specifically in the confusion of
Chroma with Value, designated as CV). Thus, the resulting
reproducibility was quite acceptable, reinforcing the credibility, for example, of the singular results in the 5Y zone, as
discussed above. This also suggested to us that we could
reduce the number of observers in Experiment II to 20 (half
experienced and half inexperienced), as indicated above,
without a significant lack of accuracy.
TABLE VI. Summary of the percentage of correct and incorrect responses in the replication of Experiments I and II by two of the authors of this article, who could be considered as highly experienced observers for the proposed task.

EXPERIMENT I/EXPERIMENT II (columns VC to HC: % incorrect, by type of confusion)
% Correct  VC         VH         CH         CV         HV         HC
95.8/80.5  0/0        0/5.6      1.4/9.7    0/0        0/0        2.8/4.2

We recognize the surprisingly low percentage of correct answers found in the two experiments, in particular for experienced observers. Color difference literature shows that the square of the correlation coefficients relating measured data to perceptual data is only about 70%,19 although recent studies20 report higher correlation from the use of a gray anchor-pair. Although in our experiments the visual
tasks are more complex than the perception of a color
difference, we have checked the potential improvement
achieved in Experiment I using an anchor-pair. Experiment
I was repeated by 20 observers (half experienced and half
inexperienced) using 3 different anchor-pairs (with Value,
Hue, and Chroma difference of intermediate size, respectively) for each one of the four Hue groups. The same task
as before was proposed to the observers, and the total
percentage of correct answers was 68.5%, which, as expected, was
slightly higher than the value previously obtained for this
experiment (60.2%). It should also be noted that, although
this increase in the percentage of correct answers occurred
for both experienced and inexperienced observers, it was
mainly due to the inexperienced observers (62.4% of correct
responses against 48.0% obtained without anchor pair). In
addition, the results of this replicated experiment confirm
that the worst and best detected color attributes were
Chroma and Hue, respectively.
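The percentages quoted above allow a rough check of how the anchor-pair gain splits between the two groups. The following is a minimal sketch, not part of the original analysis: it assumes that the experienced and inexperienced groups were equally sized and judged the same pairs, so that the overall percentage is the mean of the two group percentages. The value for experienced observers with anchor pairs is therefore derived here, not reported in the text.

```python
# Percentages of correct answers in Experiment I, as reported in the text.
overall = {"no_anchor": 60.2, "anchor": 68.5}
inexperienced = {"no_anchor": 48.0, "anchor": 62.4}
experienced_no_anchor = 72.4

# Assumption: equally sized groups judging the same pairs, so the overall
# percentage is the mean of the two group percentages.
experienced_anchor = 2 * overall["anchor"] - inexperienced["anchor"]

# Gains in percentage points when anchor pairs are available.
gain_inexperienced = inexperienced["anchor"] - inexperienced["no_anchor"]
gain_experienced = experienced_anchor - experienced_no_anchor

print(f"Experienced with anchor pairs (derived): {experienced_anchor:.1f}%")
print(f"Gain, inexperienced: {gain_inexperienced:.1f} points")
print(f"Gain, experienced (derived): {gain_experienced:.1f} points")
```

Under that assumption, the experienced group would reach about 74.6%, a gain of roughly 2 percentage points against roughly 14 for the inexperienced group, which is consistent with the statement that the improvement was mainly due to the inexperienced observers.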
From the low percentage of correct answers in our experiments, it might be argued that our observers need more
extensive training, and that the level of our experienced
observers has been overestimated, because academic
knowledge of colorimetry does not necessarily translate into
experience in making color judgments. Certainly, it seems
reasonable to assume that people making color judgments
for a living, or people with a higher training, could render
better results than those reported here. As an example,
during the revision of this article two of the present authors
repeated the Experiments I and II, attaining the results
shown in Table VI. To keep the authors' knowledge of
the experimental designs from influencing their answers, the color pairs of the 4 Hue zones
were pooled for these last replicated experiments.
In our opinion, our current results represent the ability to
distinguish the color attributes for a medium-high trained
population, supporting the idea that this is not an easy or
innate ability, which should be trained or tested, if necessary, using appropriate commercial tools.21 On the other
extreme, it should also be considered that our group of
experienced observers should not be underestimated, given
their academic and laboratory experience described above.
In particular, we feel that, for the current experiments where
familiarity with the global aspects of the color space is
important because of the large size of the color differences
FIG. 3. Distribution of correct/incorrect responses by groups of observers (experienced, inexperienced, males, females) in
the Experiments (left) I and (right) II. White: percentage of correct responses; black: percentage of incorrect responses in the
detection of the attribute Value; dark gray: percentage of incorrect responses in the detection of Chroma; light gray:
percentage of incorrect responses in the detection of Hue.
FIG. 4. Percentage of correct responses of all observers in
Experiment I, according to the magnitude of the differences
of the attribute (upper) Value ΔL*, (middle) Chroma ΔC*ab,
and (bottom) Hue ΔH*ab. All measurements are given in
CIELAB units. Different symbols indicate the results of each
of the four Hue zones studied: (△) 5Y, (○) 10G, (▲) 5PB, (◆)
5RP.
involved, the predominantly academic experience of most
of our observers is highly valuable.
Groups of Observers
Based on the data of Table III, Fig. 3 shows the percentages of correct and incorrect responses of experienced/
inexperienced and male/female observers in the two experiments. As might be expected, the percentage of correct
responses was higher for the experienced than for the inexperienced observers: specifically, it was 24.4 percentage points higher for
Experiment I and 12.0 percentage points higher for Experiment II. At a
confidence level of 95%, the differences in the percentage of
correct answers for experienced and inexperienced observers were statistically significant in Experiment I but not in
Experiment II. Thus, the percentage of correct answers fell
considerably for experienced observers on going from Experiment I to Experiment II, but this decline did not occur
among the inexperienced. This could be interpreted as saying that experienced observers are more familiar and trained
for the task proposed in Experiment I, or also as an indication that experienced observers found Experiment II more
difficult than Experiment I, as indicated by many of them.
However, for inexperienced people, both experiments presented the same complexity and, therefore, the percentage
of correct answers was similar: slightly lower than 50%,
which in any case is a higher percentage than random
probability (33.3%), as would be expected on the basis of
the knowledge that the inexperienced observers had acquired before and through their participation in the experiments.
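To illustrate how far a score of about 48% lies from the 33.3% chance level, the following sketch computes a one-sided binomial tail probability. It is only an illustration under stated assumptions, not the statistical treatment used in the article: it takes the 20 inexperienced observers of Experiment I with 36 pairs each (720 judgments), treats all judgments as independent, which repeated judgments by the same observer are not strictly, and uses the rounded rate of 48.0%. The function name `binomial_upper_tail` is hypothetical.

```python
from math import exp, lgamma, log, sqrt

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total

chance = 1 / 3             # three possible answers per pair
n_judgments = 20 * 36      # assumption: 20 observers x 36 pairs, independent
observed_rate = 0.48       # rounded percentage from the text
k_correct = round(observed_rate * n_judgments)

# Probability of at least k_correct correct answers by pure guessing.
p_value = binomial_upper_tail(k_correct, n_judgments, chance)

# Normal-approximation z-score for the same comparison.
z = (observed_rate - chance) / sqrt(chance * (1 - chance) / n_judgments)

print(f"correct judgments: {k_correct} of {n_judgments}")
print(f"one-sided p-value against chance: {p_value:.3g}")
print(f"z-score: {z:.2f}")
```

Under these assumptions the observed rate lies far above what guessing alone would produce, in line with the statement that the inexperienced observers performed better than random probability.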
It is also noteworthy that, as shown in Table III and Fig.
3, the experienced observers appeared to comprehend and
distinguish Value differences considerably better than did
the inexperienced (the experienced observers averaged
6.0% mistakes against 18.1% for the inexperienced in Experiment I, and 9.9% against 15.3% in Experiment II). Juxtaposing samples, as was done here, increases the perception
of differences in all three color attributes; however, in the
case of Value, a harder edge is seen between the colors,
while Hue and Chroma differences are more subtle. That
experienced observers have probably learned to distinguish
hard or soft edges between colors is a potential explanation
of the above result. In turn, the distinction of the attribute
Chroma was the worst in both groups, as discussed above
for all the observers, but more patently for the inexperienced.
The sex of the observer did not appear to be an influential factor in the results, as shown in Table III and Fig.
3. The lower percentage of correct answers among females was not statistically significant at a confidence
level of 95%, even in Experiment I. The percentage of
mistakes in the discernment of each of the three color
attributes was also quite similar for males and females in
our experiments.
Magnitude of Color Differences
The response of the human visual system is not the same
when the size of the color differences changes from approximately threshold differences to large color differences.
Thus, the latest CIE recommendation for industrial color-difference evaluation22 has been suggested for moderate
color differences between 0 and 5 CIELAB units, and changes
TABLE VII. Percentage of correct answers by 20 observers, who were asked to identify the attribute Value in achromatic (/2) and chromatic (/8) color pairs. Half of the observers were experienced and half inexperienced (results in parentheses).

                            % Correct
Value step                  Chroma /2      Chroma /8
5/ to 6/ (ΔValue = 1)       97.5 (77.5)    85.0 (65.0)
5/ to 7/ (ΔValue = 2)       100 (90.0)     97.5 (90.0)
5/ to 8/ (ΔValue = 3)       100 (95.0)     97.5 (95.0)
between threshold and suprathreshold color difference perception have also been reported.23 On this basis, it might
also be asked whether the distinction of the color attributes
varies with the magnitude of the color difference, and this
was considered in the design of our two experiments, as
explained before. In our case, given that we based our
experiments on samples from the Munsell atlas,5 the color
differences of the pairs in all cases were far greater than
thresholds, as indicated by the second column of Tables I
and II.
Figure 4 shows the percentage of correct answers against
the CIELAB differences in lightness ΔL* (Value), chroma
ΔC*ab, and hue ΔH*ab, for all observers in Experiment I.
Following the design of this experiment, each plot in Fig. 4
presents 12 sample pairs (3 for each Hue studied) with
noticeable differences in each color attribute. These 12 pairs
are distributed in three groups, which correspond to the
three magnitudes of color differences selected, as clearly
shown in particular for the attribute Value (Fig. 4, upper
plot), where we can see, for each Hue, three lightness differences ΔL* of approximately 10, 18, and 27 units. The
remaining 24 sample pairs having negligible ΔL* can be
disregarded for the current analysis.
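For readers less familiar with the quantities plotted in Fig. 4, the following sketch shows how ΔL*, ΔC*ab, ΔH*ab, and the total difference ΔE*ab are obtained from the CIELAB coordinates of two samples, using the standard CIELAB definitions. The function name and the sample coordinates are hypothetical and do not correspond to any pair used in the experiments. ΔH*ab is returned here as a magnitude only, without the sign that indicates the direction of the hue shift.

```python
from math import hypot, sqrt

def cielab_differences(lab1, lab2):
    """Return (dL, dC, dH, dE) between two CIELAB colors (L*, a*, b*)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L2 - L1                               # lightness difference
    dC = hypot(a2, b2) - hypot(a1, b1)         # chroma difference, C* = sqrt(a*^2 + b*^2)
    dE = sqrt(dL**2 + (a2 - a1)**2 + (b2 - b1)**2)   # total color difference
    # Hue difference: the part of dE not explained by lightness or chroma.
    # max(..., 0.0) guards against tiny negative values from rounding.
    dH = sqrt(max(dE**2 - dL**2 - dC**2, 0.0))
    return dL, dC, dH, dE

# Hypothetical pair differing mainly in lightness.
sample_a = (50.0, 10.0, 20.0)
sample_b = (60.0, 10.5, 20.5)
dL, dC, dH, dE = cielab_differences(sample_a, sample_b)
print(f"dL* = {dL:.2f}, dC*ab = {dC:.2f}, dH*ab = {dH:.2f}, dE*ab = {dE:.2f}")
```

A pair designed to differ in a single attribute would show one of the three component differences dominating the total, as in this lightness-only illustration.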
It is interesting to observe in Fig. 4 that the number of
correct responses tended to decrease when the lightness difference ΔL* increased, though the trend was the opposite
for the pairs that differed in Hue, and no definite trend was
appreciable for the pairs that differed in Chroma. That is,
small differences in Value are most easily identified,
whereas, for Hue, large differences are easiest to identify.
The fact that the percentage of correct responses decreases
as ΔL* increases (Fig. 4, upper plot) was unexpected. We
have checked that this counterintuitive trend holds for both
experienced and inexperienced observers. In this sense, we
should bear in mind that in Experiment I the pairs with the
smallest Value differences had a low Chroma of /2 (see
Table I), and this probably reduced the perception of both
Hue and Chroma, making it easier to appreciate the Value
difference. To test this hypothesis (influence of achromatic/
chromatic pairs in the detection of Value differences), we
performed a complementary experiment with 20 observers
(10 experienced and 10 inexperienced), who judged 36 new
color pairs under the same experimental conditions used for
the previous Experiment I. These 36 color pairs were evenly distributed among the 4 Hue groups considered before,
and 24 of the pairs had Value differences (12 with Chroma
/2 and 12 with Chroma /8) of three different sizes
(ΔValue = 1, 2, 3). The task proposed to the observers was to
identify which color pairs had a Value difference. The
results (Table VII) indicate that Value differences are more
easily detected for achromatic than for chromatic pairs, both
for experienced and inexperienced observers. In addition,
Table VII shows that the larger Value differences were
better detected than the smaller ones, as reasonably expected. In summary, according to these new results, the
counterintuitive effect shown in the upper plot of Fig. 4 does
not hold, and must be due to the use in Experiment I of
FIG. 5. Percentage of correct responses of all the observers in Experiment II according to the magnitude of the color
differences ΔE*ab in CIELAB units. The 12 pairs in which the
common attribute is (upper) Value, (middle) Chroma, and
(bottom) Hue are distinguished, as well as the results at each
of the four Hue zones studied: (△) 5Y, (○) 10G, (▲) 5PB, (◆)
5RP.
achromatic pairs, and not to the small size of the Value
differences of these pairs.
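The two readings of Table VII stated above can be checked directly from its six cells. The sketch below is a simple tabulation, not an analysis performed in the article: averaging over the three Value steps is a summary introduced here for illustration, and the names `table_vii` and `mean_by_chroma` are hypothetical.

```python
# Table VII: percentage of correct answers, stored as
# {Value step: {Chroma: (experienced, inexperienced)}}.
table_vii = {
    1: {"/2": (97.5, 77.5), "/8": (85.0, 65.0)},
    2: {"/2": (100.0, 90.0), "/8": (97.5, 90.0)},
    3: {"/2": (100.0, 95.0), "/8": (97.5, 95.0)},
}
groups = ("experienced", "inexperienced")

def mean_by_chroma(chroma, group_index):
    """Average percentage over the three Value steps (illustrative summary)."""
    values = [row[chroma][group_index] for row in table_vii.values()]
    return sum(values) / len(values)

for idx, group in enumerate(groups):
    achromatic = mean_by_chroma("/2", idx)
    chromatic = mean_by_chroma("/8", idx)
    print(f"{group}: Chroma /2 = {achromatic:.1f}%, Chroma /8 = {chromatic:.1f}%")

# Larger Value steps are never detected worse than smaller ones.
for chroma in ("/2", "/8"):
    for idx, group in enumerate(groups):
        series = [table_vii[step][chroma][idx] for step in (1, 2, 3)]
        print(f"{group}, Chroma {chroma}: {series}")
```

In every cell the achromatic (/2) percentage is at least as high as the chromatic (/8) one, and within each column the percentage does not decrease as the Value step grows.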
Figure 5 presents the relationship between the percentage
of correct answers and the total color difference of the pairs
of Experiment II in CIELAB units, again considering the
total observations. In this figure, the 36 sample pairs are
presented in 3 graphs according to the attribute that remained constant. In this case, the trends observed for Value
and Hue are not so clear as those given in Fig. 4 (it might
even be stated that they are contrary to those in this figure),
and moreover, in Fig. 5, the correct answers for Chroma
increased with the magnitude of the color differences. Thus,
it might be thought that large color differences in a pair
make it easier for the observer to perceive that two samples
have the same lightness or Value (and also the same
Chroma), since small color differences raise doubt as to
which attribute is held constant, and therefore responses
often err. In turn, small color differences in pairs of the
same Hue prove easy to detect (implying two samples near
each other on the same page in the Munsell atlas), but, with
large differences, the pair is so different that the response
becomes complex (samples farther apart also on the same
page of the Munsell atlas).
CONCLUSION
Although the three psychological attributes of color
(designated here as Value, Hue, and Chroma) are relatively easily understood from their definitions and the
examples provided by color-order systems, discernment
of these attributes in color sample pairs proves quite
difficult. Thus, only 60.2% of the pairs were judged
correctly with regard to the attribute that differed (Experiment I), and only 50.6% with regard to the attribute
that was shared (Experiment II), the random probability
for correct answers being 33.3% in both experiments.
Even for observers who should be considered as having medium-high experience, these percentages are not much higher:
72.4% and 56.6% in Experiments I and II, respectively. If
experienced observers do not achieve a 100% correct
score, it may be hypothesized that the Munsell classification is perhaps not the best perceptual or cognitive
classification, and that the attributes Value, Chroma, and
Hue are intermediate attributes between sensation and
cognition. The observer’s sex does not appear to be a
significant variable in our experiments. The attribute that
was usually the least accurately distinguished was
Chroma, and the best distinguished were Value (particularly by the experienced observers) and Hue. The errors
in the identification of Hue were especially frequent in
the yellow zone (5Y). Rarely were Value and Hue confused.
Large Hue differences were more easily identified than
smaller ones, and a constant Hue was better identified
when the entire color difference was small. Initially, it
seemed that small Value differences were easily detected
(see Fig. 4, upper plot), but this was a consequence of the
use of achromatic pairs for the pairs with small Value
differences. Figures 4 and 5 are interesting, because they
show a link between the ability of our visual system to
distinguish the color attributes in a sample pair with large
color differences, and the physical measurements of these
color differences. Of course, the answer provided by the
visual system starts from the perception of such a color
difference, but in our experiments it is not entirely justified from perception.
In our opinion, the previous results do not contradict the
usefulness of considering three color attributes, as is traditional in colorimetry. They simply indicate that the human
visual system does not possess adequate analytical faculties
to discern such attributes when confronted with only one
pair of samples, even for simple experiments where such
samples differ in only one of the three attributes or have
only one of three attributes in common. It can be assumed
that the distinction of color attributes could be improved by
appropriate training, as shown by Table VI, or using broader
scales of color samples (e.g., complete pages from the
Munsell atlas). The lack of accuracy of our visual system in
detecting the color attributes, shown by the current experiments, is not surprising, and could be compared with the
difficulties in high complexity tasks such as, for example,
those carried out in color naming or color memory experiments.
ACKNOWLEDGMENTS
We thank all the observers who generously agreed to collaborate in these experiments. We are indebted to Dr.
Miguel Soriano (Universidad de Almería, Spain) for his
help with the statistical treatments, and to Mr. David Nesbitt
for translating the original Spanish text into English. We are
also very grateful to Dr. Danny Rich (Sun Chemical Corporation Research Center) for his interest in our work and
helpful suggestions. The helpful suggestions from the anonymous referees are also greatly appreciated.
1. Wyszecki G, Stiles WS. Color science: concepts and methods, quantitative data and formulae. 2nd Ed. New York: Wiley; 1987. p 487.
2. Billmeyer FW Jr, Saltzman M. Principles of color technology. 2nd Ed. New York: Wiley; 1981. p 17–19.
3. Hunt RWG. Measuring colour. London: Ellis Horwood; 1989. p 25–28.
4. Lozano RD. El color y su medición. America Lee; 1978. p 188–189.
5. Munsell book of color, glossy finish collection. Baltimore, MD: Macbeth Kollmorgen; 1976.
6. LeGrand Y. Optique physiologique. Vol. 2, Chap. 11–12. Paris: Masson et Cie; 1972.
7. Gómez J, Díez MA, de Fez MD, Capilla P. Menu for checking a model of viewing non-related colours. Ver y Oir 1998;123:119–131.
8. Smith NS, Billmeyer FW Jr. Interrelation of the Swiss colour atlas (SCA-2541) and the Munsell Color Order System. Col Res Appl 1997;22:111–120.
9. Martínez-Verdú FM, Arnau V, Malo J, Felipe A, Artigas JM. A new chromatic encoding for machine vision invariant to the change of illuminant. J Optics 1996;27:171–181.
10. Wyble DR, Fairchild MD. Prediction of Munsell appearance scales using various color-appearance models. Col Res Appl 2000;25:132–144.
11. Ishihara S. Test for color blindness. 38th Ed. Tokyo, Japan: Kanehara Shuppan; 1979.
12. Burns B, Bryan ES. Dimensional interaction and the structure of psychological space: the representation of hue, saturation, and brightness. Percept Psychophys 1988;43:494–507.
13. Dorsch F. Diccionario de Psicología. 2nd Ed. Barcelona: Herder; 1977.
14. Matlin MW, Foley HJ. Sensación y percepción. 3rd Ed. Mexico: Prentice Hall; 1996.
15. Purdy D McL. Spectral hue as a function of intensity. Am J Psych 1931;43:541.
16. Jacobs GH. Saturation estimates and chromatic adaptation. Percept Psychophys 1967;2:271–274.
17. Indow T, Aoki N. Multidimensional mapping of 178 Munsell colors. Col Res Appl 1983;8:145–152.
18. Indow T. Multidimensional studies of Munsell color solid. Psychological Review 1988;95:456–470.
19. Rich RM, Billmeyer FW Jr, Howe WG. Method for deriving color-difference-perceptibility ellipses for surface-color samples. J Opt Soc Am 1975;65:956–959.
20. Alman DH, Berns RS, Snyder GD, Larsen WA. Performance testing of color-difference metrics using a color tolerance dataset. Col Res Appl 1989;14:139–151.
21. Colorcurve® HVC color vision skill test. Colwel/General, Inc. (www.colorcurve.com). For general information, see ISCC News #382, p. 13.
22. Industrial colour-difference evaluation. CIE Publ. 116-1995. Vienna: CIE Central Bureau; 1995.
23. Kuehni RG. Threshold color differences compared to supra-threshold color differences. Col Res Appl 2000;25:226–229.