Are We Able to Distinguish Color Attributes?†

M. Melgosa,¹* M. J. Rivas,¹ E. Hita,¹ F. Viénot²

¹ Departamento de Óptica, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
² Muséum National d'Histoire Naturelle, Laboratoire de Photobiologie, 75005 Paris, France

Received 20 March 1999; revised 29 January 2000; accepted 25 March 2000

Abstract: The perception and understanding of the three color attributes have been analyzed from two experiments using pairs of Munsell samples, where only one of the three color attributes was changed (Experiment I) or left unchanged (Experiment II) at a time. In each experiment, 36 pairs with color differences of 3 different sizes (average values of 15.8 and 21.7 CIELAB units for Experiments I and II, respectively) were assessed under standardized conditions by 40 normal observers, 20 of them with previous knowledge and experience in colorimetry. At a 95% confidence level, the results from the two experiments were not significantly different, indicating that color attributes were not easily distinguished: for example, for experienced observers, the percentage of correct answers for identifying the color attribute responsible for a color difference was only 72.4%, the random probability being 33.3%. There were no significant differences between the results found by men and women. The worst-distinguished attribute was Chroma; that is, the least frequent confusion was between Hue and Value or vice versa. Value differences were more easily detected for achromatic than for chromatic pairs, both for experienced and inexperienced observers. With respect to the size of the color differences, we observed that large hue differences were more easily identifiable than smaller ones, and a constant Hue was more identifiable when the entire color difference was small. © 2000 John Wiley & Sons, Inc. Col Res Appl, 25, 356–367, 2000

Key words: colorimetry; color vision; color differences; Munsell; CIELAB

* Correspondence to: Dr. Manuel Melgosa, Departamento de Óptica, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain (e-mail: [email protected])
Contract grant sponsor: Ministerio de Educación y Ciencia (España); contract grant number: PB96-1454.
† The preliminary results of this work were presented at the 1998 Annual Meeting of the Optical Society of America (Proc. 1998 OSA Annual Meeting, Paper MDD7, p. 73).

INTRODUCTION

The three basic perceptual attributes of color, usually designated as lightness, hue, and saturation, are generally understood and differentiated through their definitions in classical textbooks on colorimetry [1-4], as well as through the examples provided by several color-order systems, where samples vary along scales for each of these attributes (e.g., the Munsell Book of Color [5]). These perceptual attributes are generally presented as being independent. As is well known, when presented with a pair of color stimuli, our visual system can accurately judge equality or inequality within the limits of color-discrimination thresholds. In the case of equal stimuli, the visual system cannot distinguish the diverse spectral distributions leading to the same color (metamerism). In the case of unequal stimuli, the color difference can be assumed a priori to be the result of disparities in one or more of the aforementioned attributes, but it seems that it is not easy to identify which attribute(s) might be involved, and in what proportions. In short, it is usually said that the human visual system acts as a zero instrument (detection of equality/inequality) [6], but it does not have analytical properties, among which should probably be included an accurate discernment of the different attributes responsible for a given color difference.
In any case, it might be asked to what extent, as a result of a process that is both perceptive and cognitive, we can distinguish whether a given color difference is due to lightness, hue, or saturation. It is reasonable to assume that at least observers who are experienced in color discrimination tasks (or normal observers with a reliable degree of training) can distinguish each of the three color attributes in simple experiments such as those described here, where only one of the three attributes is varied (or held constant). It has been reported in the literature [7] that variations in lightness can be caused by changes in hue or saturation (Helmholtz–Kohlrausch effect), and variations in hue can be caused by changes in lightness (Bezold–Brücke effect) or in colorimetric purity (Aubert–Abney effect). The present experiments were not designed to study such effects, which were taken into consideration during the renotation of the Munsell system, but we cannot rule out their influence on the possible confusion of the three color attributes by our observers. In the following section, we examine the design of two experiments carried out in order to answer the question addressed in the title of this work: are we able to distinguish color attributes? This question can be considered of interest within the fundamentals of colorimetry.

COLOR research and application

MATERIALS AND METHODS

Two experiments were performed, in each of which 36 pairs of glossy samples from the Munsell atlas [5] were observed. This atlas was chosen because it is one of the most widely used color-order systems [8-10], in which the three color attributes are illustrated by different scales with constant perceptual steps. In addition, the samples in this atlas are rigorously controlled for quality over the different scales.
Following the designation of the scales in the Munsell system, henceforth the three color attributes (lightness, hue, and saturation) will be designated as Value, Hue, and Chroma, respectively. In our first experiment (Experiment I), samples of each pair differed in only one of the color attributes, while in the second experiment (Experiment II) samples of each pair had only one of the color attributes identical. The 36 pairs of samples of each experiment were distributed into 4 groups around the Hues 5Y, 10G, 5PB, and 5RP, with 9 sample pairs per group. In Experiment I, in each group, 3 pairs varied only in Value, 3 pairs varied only in Chroma, and 3 pairs varied only in Hue, the variation differing in magnitude within each series. Experiment II followed an analogous design, having also 4 groups of 9 sample pairs, 3 having the same Value, 3 the same Chroma, and 3 the same Hue (the color difference in each case being of a different magnitude due to the two attributes that varied at the same time, and along the same proportion in Munsell scales, in each pair). Figure 1 illustrates the procedure followed for the selection of pairs for Experiment I, indicating, as an example, the 9 pairs selected for the group around the Hue 10G (a similar procedure was used in the selection of samples for Experiment II). The fact that we had only one Munsell atlas for our experiments implied certain limitations in the selection of samples, including that, chronologically, Experiment II had to be performed after Experiment I was finished.

Volume 25, Number 5, October 2000

FIG. 1. Schematic of the criteria used to select the sample pairs analyzed in Experiment I (Hue 10G). The pairs present differences in only one of the three color attributes: (upper left) Value, (upper right) Chroma, or (bottom) Hue. In each case, the color differences are of three different magnitudes (pairs designated 1, 2, and 3).

The 9 pairs of samples from each group were randomly
positioned over gray cardboard backgrounds (10.0 cm width × 14.3 cm height), five pairs in the left column and four pairs in the right column, allowing the cardboard to be held by the observer using the free lower right corner. The two samples of each pair were juxtaposed (without separation), subtending a visual angle of 4.6° × 7.6° for an observer at a distance of 25 cm, the minimum separation between pairs being 0.7 cm. The sample pairs were viewed simultaneously inside a Verivide CAC120 cabinet with a D65 luminous source, and their colors were measured with a Photo Research SpectraScan PR-704 spectroradiometer, under conditions identical to those used for the visual judgments, assuming the CIE 1964 Standard Observer. The CIELAB coordinates of the gray cardboard background on which the sample pairs were presented were a* = −1.3; b* = −3.0; L* = 62.2. The reference white used to obtain the CIELAB coordinates was a pressed barium-sulphate-powder plaque supplied with the spectroradiometer and measured under the same experimental conditions as the samples. For each sample pair, the observer was simply asked which color attribute was different in Experiment I, and which color attribute was identical in Experiment II. The observers were not informed about the experimental design (e.g., they did not know that in the 9 sample pairs of each group there were 3 pairs in which each one of the three color attributes was changed/unchanged at a time). Observers judged the 9 pairs in a given group in their own order and without a time limit, the responses being recorded by the researcher. Although only one pair was judged at a time, the 9 pairs were displayed simultaneously to give the observer an overall idea of the experiment and to facilitate the proposed task. The order of presentation of the 4 groups to the different observers was random.
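Because the sample colors were measured in CIELAB, the color difference within a pair can be split into lightness, chroma, and hue components. The following is a minimal sketch of that decomposition using the standard CIELAB definitions. The function names and the two sample colors are hypothetical, and the percentage convention (each squared component as a share of the squared total) is an assumption, since the source does not state how the contributions were computed.

```python
import math

def lab_difference_components(lab_1, lab_2):
    """Return (dE, dL, dC, dH) for two CIELAB colors given as (L*, a*, b*)."""
    L1, a1, b1 = lab_1
    L2, a2, b2 = lab_2
    dL = L2 - L1
    dE = math.sqrt(dL ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)
    # Chroma is the distance from the neutral axis in the a*b* plane.
    dC = math.hypot(a2, b2) - math.hypot(a1, b1)
    # The CIELAB hue difference is what remains of the total difference
    # once the lightness and chroma parts are removed (magnitude only).
    dH = math.sqrt(max(dE ** 2 - dL ** 2 - dC ** 2, 0.0))
    return dE, dL, dC, dH

def percent_contributions(dE, dL, dC, dH):
    """Assumed convention: share of each squared component in dE squared."""
    if dE == 0:
        raise ValueError("identical colors have no difference to decompose")
    return {
        "Value": 100.0 * dL ** 2 / dE ** 2,
        "Chroma": 100.0 * dC ** 2 / dE ** 2,
        "Hue": 100.0 * dH ** 2 / dE ** 2,
    }

# Hypothetical measurements, not taken from the paper.
sample_a = (50.0, 10.0, 20.0)
sample_b = (60.0, 11.0, 21.0)
dE, dL, dC, dH = lab_difference_components(sample_a, sample_b)
shares = percent_contributions(dE, dL, dC, dH)
print(round(dE, 2), {name: round(value, 1) for name, value in shares.items()})
```

Under this convention the three shares always add up to 100%, so a pair that differs almost purely in one attribute shows a share close to 100% for that attribute.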
Experiment I was conducted with 40 observers having normal color vision according to the Ishihara test [11], of whom 20 could be considered experienced and 20 inexperienced. The observers designated as experienced had extensive knowledge of the fundamentals of colorimetry (13 of them had earned a Ph.D. working on different aspects of colorimetry), and had previously carried out research in this field (7 observers had measured their own color-matching functions at 10 nm intervals using a visual colorimeter, and 13 had measured color discrimination thresholds); all were professors in our Optics Department. The inexperienced observers, students in the same Department, were instructed individually by the researcher before conducting each experiment, with the help of the definitions of the color attributes [1-4] and with the observation of the complete Munsell atlas [5], until the observer appeared to fully understand the objective of the experiment. In Experiment I, 26 males and 14 females participated (the subgroup of inexperienced observers contained 10 males and 10 females). To analyze the consistency of the results from Experiment I, it was repeated by one subgroup of the observers (8 experienced and 8 inexperienced). From the results of this replicated experiment, we decided that for Experiment II it should suffice to use only 20 observers (12 males and 8 females) with normal color vision, half experienced and half inexperienced.
To test different questions, the following three complementary experiments were also performed: (1) Experiment I was replicated by 20 observers (10 experienced and 10 inexperienced), in order to analyze the usefulness of an anchor pair to increase the percentage of correct answers; (2) both Experiments I and II were replicated by two of the authors of this article, who could be considered as highly trained for the proposed task; (3) the detection of the attribute Value for color pairs with low and high Chroma was specifically tested for a group of 20 observers (half experienced and half inexperienced). Altogether, a total of 4248 visual judgments were made in this work. As we can see, both the layout of the experiments and the tasks performed by the observers were made intentionally simple in comparison with other works [12]. In any case, it should be emphasized that these tasks are not merely perceptive, but both perceptive and cognitive, this last aspect probably playing a key role in our experiments.

TABLE I. Munsell notation for each of the 36 sample pairs analyzed in Experiment I. For each pair, the color difference is indicated in CIELAB units (ΔE*ab), as well as the color attribute that varies in each pair (and its contribution, in percentage, to the total color difference). The last column shows the percentage of correct responses for the 40 observers participating in this experiment.

EXPERIMENT I
Sample Pairs                  ΔE*ab   Variable Attribute (% to ΔE*ab)   % Correct Answers
5Y 5/2 – 5Y 6/2                9.7    Value (99.8%)                     75.0
5Y 5/4 – 5Y 7/4               19.0    Value (99.7%)                     57.5
5Y 5/6 – 5Y 8/6               28.5    Value (98.8%)                     35.0
2.5Y 5/6 – 2.5Y 5/8            9.6    Chroma (99.4%)                    60.0
2.5Y 6/6 – 2.5Y 6/10          23.2    Chroma (99.9%)                    42.5
2.5Y 7/6 – 2.5Y 7/12          39.7    Chroma (99.9%)                    20.0
5Y 5/8 – 7.5Y 5/8              6.1    Hue (93.8%)                       42.5
5Y 6/8 – 10Y 6/8               9.4    Hue (97.4%)                       55.0
5Y 7/8 – 2.5GY 7/8            15.2    Hue (99.8%)                       87.5

10G 4/2 – 10G 5/2              9.0    Value (98.3%)                     77.5
10G 4/4 – 10G 6/4             19.1    Value (99.3%)                     57.5
10G 4/6 – 10G 7/6             27.3    Value (99.8%)                     52.5
7.5G 4/4 – 7.5G 4/6           11.7    Chroma (99.9%)                    52.5
7.5G 5/4 – 7.5G 5/8           18.7    Chroma (99.9%)                    62.5
7.5G 6/4 – 7.5G 6/10          27.8    Chroma (99.9%)                    55.0
10G 5/8 – 2.5BG 5/8            4.6    Hue (98.6%)                       27.5
10G 6/8 – 5BG 6/8              9.7    Hue (99.7%)                       60.0
10G 7/8 – 7.5BG 7/8           12.6    Hue (95.6%)                       85.0

5PB 5/2 – 5PB 6/2              9.8    Value (99.1%)                     85.0
5PB 5/4 – 5PB 7/4             18.5    Value (99.3%)                     72.5
5PB 5/6 – 5PB 8/6             27.6    Value (98.4%)                     65.0
2.5PB 3/6 – 2.5PB 3/8          7.3    Chroma (99.9%)                    57.5
2.5PB 4/6 – 2.5PB 4/10        15.3    Chroma (99.4%)                    50.0
2.5PB 5/6 – 2.5PB 5/12        20.6    Chroma (99.3%)                    67.5
5PB 5/8 – 7.5PB 5/8            7.9    Hue (95.1%)                       62.5
5PB 6/8 – 10PB 6/8            14.1    Hue (99.9%)                       82.5
5PB 7/8 – 2.5P 7/8            17.6    Hue (99.0%)                       95.0

5RP 5/2 – 5RP 6/2              9.5    Value (99.6%)                     72.5
5RP 5/4 – 5RP 7/4             19.6    Value (98.9%)                     55.0
5RP 5/6 – 5RP 8/6             28.2    Value (98.3%)                     50.0
2.5RP 4/6 – 2.5RP 4/8          6.5    Chroma (99.6%)                    47.5
2.5RP 5/6 – 2.5RP 5/10        13.1    Chroma (99.5%)                    62.5
2.5RP 6/6 – 2.5RP 6/12        21.0    Chroma (98.6%)                    60.0
5RP 5/8 – 7.5RP 5/8            5.5    Hue (95.7%)                       30.0
5RP 6/8 – 10RP 6/8            10.1    Hue (85.1%)                       70.0
5RP 7/8 – 2.5R 7/8            14.5    Hue (84.8%)                       77.5
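The per-pair values of Table I can be cross-checked against the summary figures quoted elsewhere in the text (an average color difference of 15.8 CIELAB units and 60.2% correct answers overall). The sketch below only averages the tabulated numbers; the variable names are illustrative and are not part of the original analysis, which was carried out in Statgraphics-Plus.

```python
from statistics import mean

# (ΔE*ab, % correct answers) for the nine pairs of each Hue zone, from Table I.
TABLE_I = {
    "5Y": [(9.7, 75.0), (19.0, 57.5), (28.5, 35.0), (9.6, 60.0), (23.2, 42.5),
           (39.7, 20.0), (6.1, 42.5), (9.4, 55.0), (15.2, 87.5)],
    "10G": [(9.0, 77.5), (19.1, 57.5), (27.3, 52.5), (11.7, 52.5), (18.7, 62.5),
            (27.8, 55.0), (4.6, 27.5), (9.7, 60.0), (12.6, 85.0)],
    "5PB": [(9.8, 85.0), (18.5, 72.5), (27.6, 65.0), (7.3, 57.5), (15.3, 50.0),
            (20.6, 67.5), (7.9, 62.5), (14.1, 82.5), (17.6, 95.0)],
    "5RP": [(9.5, 72.5), (19.6, 55.0), (28.2, 50.0), (6.5, 47.5), (13.1, 62.5),
            (21.0, 60.0), (5.5, 30.0), (10.1, 70.0), (14.5, 77.5)],
}

all_pairs = [pair for pairs in TABLE_I.values() for pair in pairs]

# Averages over the 36 pairs, rounded to one decimal as in the paper.
mean_delta_e = round(mean(d for d, _ in all_pairs), 1)
mean_correct = round(mean(p for _, p in all_pairs), 1)

# Average percentage of correct answers within each Hue zone.
correct_by_zone = {
    zone: round(mean(p for _, p in pairs), 1) for zone, pairs in TABLE_I.items()
}

print(mean_delta_e)     # 15.8
print(mean_correct)     # 60.2
print(correct_by_zone)  # {'5Y': 52.8, '10G': 58.9, '5PB': 70.8, '5RP': 58.3}
```

The zone averages agree with the Experiment I totals reported for each Hue zone in Table IV.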
Because of the large size of the color differences involved in our current experiments (15.8 and 21.7 CIELAB units on the average, for Experiments I and II, respectively), it should be expected that, in addition to perceptive factors, cognitive factors and familiarity with global aspects of color space could play an important role in the results found here. Perception can be defined as consciousness of a sensorial impression via the nerve centers, while cognition signifies a complex hypothetical model based on experiences, which enable or facilitate knowledge and organization of all manifestations of the outside world [13]. The limits between sensation and perception are imprecise, as are those between perception and cognition, which signifies a more complex interpretative process involving the acquisition, storage, recovery, and use of knowledge [14]. Experiments I and II should not be considered as complementary or mutually exclusive, but as two different tasks of increasing complexity. Thus, in Experiment I only one of the three color attributes changes, while in Experiment II two color attributes change (i.e., the third color attribute remains unchanged). The more usual situation in everyday life (not considered here) would be sample pairs where the three color attributes change simultaneously. The statistical analyses in this research were made using the Statgraphics-Plus software. Specifically, in the analysis of results, the F-test was used for the comparison of variances, the t-test for means, and the Mann–Whitney (Wilcoxon) test for medians (when the normality hypothesis was not fulfilled), in all cases at a 95% confidence level.

RESULTS

Tables I and II list the 36 Munsell sample pairs used in Experiments I and II, respectively, distinguishing the 4 groups according to their Hue.
For each sample pair, these tables give the total color difference in CIELAB units (ΔE*ab), the attribute that varied (Table I) or that remained unchanged (Table II), indicating the proportion in which this attribute contributed to the total color difference, and the percentage of correct answers over all the observers participating in the experiment. In Table I, the components of the CIELAB color differences (ΔL*, ΔC*, and ΔH*) revealed that, in Experiment I, the color attribute that varied contributed 84.8% to the total difference in the worst of the cases, and nearly 100% in many cases. This indicates the meticulous control in the manufacture of the Munsell atlas [5]. In Experiment II, although the selection of the pairs followed a symmetrical criterion in the Munsell atlas (cf. Table II, first column), it should be pointed out that the two attributes that varied in each pair contributed in different proportions to the total color difference measured in CIELAB units, due to the different magnitude of the three scales (Value, Hue, and Chroma) of this atlas in CIELAB units. Thus, usually ΔC* is much greater than ΔH*, and ΔL* is also substantially greater than ΔH*, while ΔC* is only slightly greater than ΔL*. Table II again reflects the painstaking colorimetric control in the Munsell samples (in the worst case the attribute that remains unchanged contributes only 5.92% to the total color difference, this contribution being lower than 1% in most cases).

TABLE II. Munsell notation for each of the 36 sample pairs analyzed in Experiment II. For each pair, the color difference is indicated in CIELAB units (ΔE*ab), as well as the color attribute that is constant in the pair (and its contribution, in percentage, to the total color difference). The last column shows the percentage of correct responses for the 20 observers participating in this experiment.

EXPERIMENT II
Sample Pairs                  ΔE*ab   Constant Attribute (% to ΔE*ab)   % Correct Answers
5Y 5/2 – 2.5Y 5/4             12.4    Value (0.01%)                     50
5Y 6/2 – 10YR 6/6             22.7    Value (0.09%)                     50
5Y 7/2 – 7.5YR 7/8            32.8    Value (0.03%)                     80
5Y 4/2 – 7.5Y 5/2              9.9    Chroma (0.34%)                    10
5Y 4/4 – 10Y 6/4              20.4    Chroma (1.18%)                    30
5Y 4/6 – 2.5GY 7/6            31.3    Chroma (5.92%)                    30
5Y 5/8 – 5Y 6/10              15.4    Hue (0.41%)                       70
5Y 5/6 – 5Y 7/10              32.8    Hue (0.28%)                       65
5Y 5/4 – 5Y 8/10              47.8    Hue (0.11%)                       50

10G 5/2 – 7.5G 5/4             9.7    Value (0.05%)                     55
10G 6/2 – 5G 6/6              19.2    Value (0.0001%)                   60
10G 7/2 – 2.5G 7/8            31.1    Value (0.003%)                    80
10G 3/4 – 2.5BG 4/4            8.6    Chroma (5.3%)                     10
10G 3/6 – 5BG 5/6             18.6    Chroma (2.3%)                     15
10G 3/8 – 7.5BG 6/8           31.7    Chroma (7.0%)                     75
10G 4/6 – 10G 5/8             12.6    Hue (0.20%)                       65
10G 4/4 – 10G 6/8             27.8    Hue (0.0001%)                     60
10G 4/2 – 10G 7/8             40.9    Hue (0.03%)                       30

5PB 5/2 – 2.5PB 5/4            8.4    Value (0.05%)                     55
5PB 6/2 – 10B 6/6             16.6    Value (0.03%)                     55
5PB 7/2 – 7.5B 7/8            28.1    Value (0.02%)                     55
5PB 3/4 – 7.5PB 4/4            9.7    Chroma (0.04%)                    30
5PB 3/6 – 10PB 5/6            19.8    Chroma (0.09%)                    50
5PB 3/8 – 2.5P 6/8            32.1    Chroma (0.23%)                    60
5PB 4/6 – 5PB 5/8             11.1    Hue (0.68%)                       55
5PB 4/4 – 5PB 6/8             22.4    Hue (0.67%)                       60
5PB 4/2 – 5PB 7/8             32.8    Hue (0.37%)                       45

5RP 5/2 – 2.5RP 5/4            7.2    Value (0.03%)                     65
5RP 6/2 – 10P 6/6             13.1    Value (0.05%)                     60
5RP 7/2 – 7.5P 7/8            21.2    Value (0.04%)                     80
5RP 3/4 – 7.5RP 4/4            9.3    Chroma (0.16%)                    15
5RP 3/6 – 10RP 5/6            20.0    Chroma (0.09%)                    45
5RP 3/8 – 2.5R 6/8            33.3    Chroma (1.41%)                    50
5RP 4/6 – 5RP 5/8             12.6    Hue (0.59%)                       80
5RP 4/4 – 5RP 6/8             22.1    Hue (0.02%)                       35
5RP 4/2 – 5RP 7/8             34.1    Hue (0.26%)                       40

Table III summarizes the results of Experiments I and II, for all the observers and per groups: experienced/inexperienced, males/females. The same information for both experiments is also provided by Table IV, but now distinguishing the results achieved for each of the four groups of sample pairs considered around each Hue (5Y, 10G, 5PB, and 5RP). Tables III and IV indicate the percentages of correct and incorrect answers, specifying for the latter the percentage of each type of wrong answer.
Specifically, there were 6 possibilities for confusion, designated as VC, VH, CH, CV, HV, and HC in Tables III and IV. In this designation, the first letter indicates the correct attribute (changed/unchanged in Experiments I/II, respectively) and the second letter the wrong attribute selected by the observers. Thus, for example, if the pair 5Y 5/2 – 2.5Y 5/4 in Experiment II was judged as a pair of samples with the same Chroma, this was an incorrect answer (the correct answer should be Value), which should be scored within the group VC. Note that, in the designation of an incorrect answer or a percentage of incorrect answers, the order of the letters is important (for example, VC and CV indicate two different types of confusion). To obtain the total percentage of confusions for a given attribute, the results of two different columns in Table III or IV should be added (for example, adding VC and VH, we obtain the total percentage of wrong judgments concerning the attribute Value).

TABLE III. Summary of the percentage of correct and incorrect responses in Experiments I and II, distinguishing the different groups of observers (experienced, inexperienced, males, females), and for all the observers pooled (total). The different types of incorrect responses are distinguished (for example, when the attribute Value is judged by the observers as Chroma, this error is noted as VC).

EXPERIMENT I/EXPERIMENT II
                                % Incorrect
                 % Correct      VC          VH         CH          CV          HV         HC
Experienced      72.4/56.6      4.7/6.5     1.3/3.4    6.3/9.5     5.1/7.3     2.8/2.5    7.4/14.2
Inexperienced    48.0/44.6      11.3/8.6    6.8/6.7    10.0/10.6   13.0/13.1   3.3/2.2    7.6/14.2
Males            64.8/51.5      7.3/7.0     3.0/5.2    7.2/9.3     7.3/9.8     3.5/2.1    6.9/15.1
Females          51.5/49.2      9.5/8.3     5.8/4.9    9.8/11.1    12.5/10.8   2.6/2.8    8.3/12.9
Total            60.2/50.6      7.9/7.6     4.0/5.0    8.2/10.1    9.0/10.1    3.2/2.4    7.5/14.2
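The two-letter scoring scheme and the column additions described above can be written out in a few lines. This is only an illustrative sketch: the function and variable names are hypothetical, and the numbers are the pooled ("Total") values of Table III.

```python
def confusion_code(correct_attribute, chosen_attribute):
    """Two-letter code: correct attribute first, attribute chosen by the observer second."""
    if correct_attribute == chosen_attribute:
        return "correct"
    return correct_attribute[0] + chosen_attribute[0]

# Pooled ("Total") row of Table III, in percent.
TABLE_III_TOTAL = {
    "Experiment I": {"correct": 60.2, "VC": 7.9, "VH": 4.0, "CH": 8.2,
                     "CV": 9.0, "HV": 3.2, "HC": 7.5},
    "Experiment II": {"correct": 50.6, "VC": 7.6, "VH": 5.0, "CH": 10.1,
                      "CV": 10.1, "HV": 2.4, "HC": 14.2},
}

def errors_by_attribute(row):
    """Add the two confusion columns that share the same correct attribute."""
    totals = {"V": 0.0, "C": 0.0, "H": 0.0}
    for code, percent in row.items():
        if code != "correct":
            totals[code[0]] += percent
    return {attribute: round(total, 1) for attribute, total in totals.items()}

# The Experiment II example from the text: correct answer Value, observer chose Chroma.
print(confusion_code("Value", "Chroma"))                        # VC
print(errors_by_attribute(TABLE_III_TOTAL["Experiment I"]))    # {'V': 11.9, 'C': 17.2, 'H': 10.7}
print(errors_by_attribute(TABLE_III_TOTAL["Experiment II"]))   # {'V': 12.6, 'C': 20.2, 'H': 16.6}
```

In each experiment the correct percentage and the three per-attribute error totals add up to 100%, which is a convenient check when reading Tables III and IV.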
ANALYSIS AND DISCUSSION

Overall Results

Figure 2 shows the proportion of correct and incorrect answers given by the entire group of observers in the two experiments, distinguishing among the wrong answers whether Value, Hue, or Chroma was not identified. The last row of Table III shows the percentages at which each of the attributes is identified or confused. In general, it was found that on the average the ability to distinguish color attributes was low: the observers were able to identify the attribute that differed for only 60.2% of the pairs (Experiment I), and they could detect the shared attribute for only 50.6% of the pairs (Experiment II).

FIG. 2. Distribution of correct/incorrect responses corresponding to all observers that participated in Experiments (left) I and (right) II. White: percentage of correct responses; black: percentage of incorrect responses in the detection of the attribute Value; dark gray: percentage of incorrect responses in the detection of Chroma; light gray: percentage of incorrect responses in the detection of Hue.

TABLE IV. Summary of the percentage of correct and incorrect responses found in Experiments I and II for each of the four Hue zones considered (5Y, 10G, 5PB and 5RP). Results for different groups of observers (experienced, inexperienced, males, females), and for all the observers pooled (total) are shown. The different types of incorrect responses are distinguished (for example, when the true attribute is Value but it is judged by the observers as Chroma, this error is noted as VC).

EXPERIMENT I/EXPERIMENT II
                                            % Incorrect
Hue Zone   Groups           % Correct      VC          VH          CH          CV          HV         HC
5Y         Experienced      64.4/57.8      3.9/5.5     2.2/2.2     5.6/8.9     5.6/6.7     5.6/1.1    12.7/17.8
           Inexperienced    41.1/38.9      10.1/8.9    9.5/4.4     9.0/10.0    11.8/16.7   6.2/2.2    12.3/18.9
           Males            56.4/50.9      5.9/5.6     4.9/4.6     6.5/8.3     6.5/10.2    6.5/1.0    13.3/19.4
           Females          46.0/44.4      8.8/9.7     7.9/1.4     8.8/11.1    12.7/13.9   4.7/2.8    11.1/16.7
           Total            52.8/48.3      6.9/7.2     5.9/3.3     7.3/9.5     8.5/11.7    6.0/1.7    12.6/18.3

10G        Experienced      71.1/52.2      2.2/6.7     1.7/2.2     7.2/12.2    6.1/5.6     1.7/4.4    10.0/16.7
           Inexperienced    46.7/47.8      9.5/6.7     10.1/10.0   11.2/7.7    15.0/12.2   1.3/1.1    6.2/14.5
           Males            64.5/53.7      4.8/7.3     4.8/4.8     7.8/9.2     9.0/8.3     1.3/2.8    7.8/13.9
           Females          48.4/44.4      8.0/5.6     7.1/8.3     11.9/11.1   13.4/9.7    2.4/2.8    8.8/18.1
           Total            58.9/50.0      5.9/6.7     5.9/6.1     9.2/10.0    10.3/8.9    1.7/2.8    8.1/15.5

5PB        Experienced      82.8/59.5      6.1/6.8     0.0/5.6     3.3/9.0     2.2/7.9     1.7/4.4    3.9/6.8
           Inexperienced    58.9/43.9      12.1/11.2   1.0/6.8     8.8/10.2    11.6/13.4   2.1/3.3    5.5/11.2
           Males            72.6/48.6      9.8/8.4     0.3/6.6     5.7/10.3    5.7/11.2    2.6/3.7    3.3/11.2
           Females          67.4/56.4      7.9/9.8     0.9/5.7     7.1/8.4     8.8/9.8     0.8/4.2    7.1/5.7
           Total            70.8/51.7      9.0/9.0     0.5/6.1     6.2/9.6     6.8/10.6    1.9/4.0    4.8/9.0

5RP        Experienced      71.4/56.7      6.8/6.9     1.1/3.5     8.8/8.0     6.8/9.1     2.2/0.1    2.9/15.7
           Inexperienced    45.2/47.8      13.6/7.8    6.3/5.5     10.9/14.5   14.2/10.0   3.5/2.2    6.3/12.2
           Males            65.9/52.8      8.5/6.7     1.9/4.8     9.1/9.4     7.8/9.4     3.4/1.0    3.4/15.9
           Females          44.1/51.4      13.6/8.3    7.2/4.2     11.3/13.9   15.1/9.7    2.3/1.4    6.4/11.1
           Total            58.3/52.2      10.1/7.3    3.7/4.5     9.9/11.3    10.4/9.5    3.2/1.3    4.4/13.9

The percentage of correct answers in Experiment I was slightly greater than in Experiment II, attributable, according to the opinion of most of the observers, to the greater complexity of the latter experiment.
In fact, the average time needed by the observers to complete Experiment II was somewhat greater than for Experiment I, as shown by Table V. This reflects the contention that the second experiment was more difficult, especially taking into account that after Experiment I the observers should have been better prepared for Experiment II. On the other hand, despite the fact that the CIELAB color differences in the two experiments were of similar size (Tables I and II), in Experiment II two attributes differed simultaneously, and thus these pairs would have been perceived as more different than those of Experiment I (in which only one attribute was varied). It could be concluded from these results that our visual system is somewhat better at identifying a differing attribute (a basically perceptive process) than a shared attribute (a process that in addition to being perceptive can have a major cognitive or intellectual component concerning each color attribute). In any case, it should be said that, at a confidence level of 95%, the difference between our two experiments in the percentage of correct responses was not significant (nor were the differences in means or standard deviations).

TABLE V. Time (min) spent by the observers to complete Experiments I and II (mean and standard deviation), distinguishing the different groups of observers (experienced, inexperienced, males, females) and all the observers pooled (total).

                 Experiment I            Experiment II
                 Average   Std. Dev.     Average   Std. Dev.
Experienced      10.2      4.6           10.8      3.9
Inexperienced    11.1      7.1           12.0      3.0
Males            11.4      5.6           11.8      3.4
Females           9.4      6.5           10.8      3.6
Total            10.7      6.0           11.4      3.4

Table III shows that of the three color attributes the least successfully identified by the observers was Chroma (17.2% and 20.2% wrong answers in Experiments I and II, respectively), and the most easily identified was Hue in Experiment I and Value in Experiment II.
In this sense, we should remember that low percentages of incorrect answers in Table III (or IV) mean better identification, bearing in mind that every attribute was equally possible in our experiments. In addition, it should be noted from Table III that the least frequent confusion in both experiments was between Hue and Value (HV), and vice versa (VH), while the most frequent confusion was between Chroma and Value (CV) in Experiment I, and Hue and Chroma (HC) in Experiment II. It was also noticeable in Experiment II that Hue was confused with Chroma (HC) considerably more than in Experiment I. The results for each of the 4 groups or Hues considered in our experiments indicate (see Table IV) that, in Experiment I, the highest proportion of correct answers (70.8%) was for the 5PB zone (purple-blue), and the lowest (52.8%) was for 5Y (yellow), while the Hues 10G (green) and 5RP (red-purple) were intermediate (58.9% and 58.3% correct responses, respectively). A similar result, although with smaller differences between the different Hues, was found in Experiment II: the highest proportion of correct answers was 52.2% for the 5RP zone, followed by 51.7% in the 5PB zone, 50.0% in the 10G zone, the lowest proportion being in the 5Y zone (48.3%), as in Experiment I. The only noteworthy result from the analysis of the different zones was that the highest percentage of errors concerning the attribute Hue was in zone 5Y (18.6% and 20.0% in Experiments I and II, respectively), and that these percentages were substantially greater than in other zones (e.g., 6.7% and 13.0% in zone 5PB for Experiments I and II, respectively). This peculiar trend in zone 5Y (yellow) might be explained by the experimental results of Purdy [15], who showed a maximum value for the so-called Bezold–Brücke effect towards 520 nm; in addition, Jacobs [16], studying the influence of the wavelength on saturation, similarly found a maximum at around 570 nm.
It should also be noted that the Munsell 5Y yellow chips present some problems, such as the greenish-brown hue attributable to the dark yellow region (lower than 7/), which would probably hardly be called "yellow" by most observers. The low percentage of correct answers for the pairs 5Y 5/4 – 5Y 7/4 (57.5%) and 5Y 5/6 – 5Y 8/6 (35.0%) in our Experiment I could be attributed to this problem. Mention should also be made of the influence of the choice of other Munsell pairs on our experimental results. Thus, experiments on multidimensional scaling [17,18] have shown the hue spacing in the Munsell region from 5PB through 5P to be unusually large: 62°, in contrast to 36° in Munsell notation. If the cooler blues are shifted towards 5BG and the warmer blues towards 5RP, somewhere between the two areas the hue steps become too large. It should be considered that this large hue change takes place around 7.5PB, and this may explain the high percentage of correct answers for the pairs 5PB 7/8 – 2.5P 7/8 (95%) and 5PB 6/8 – 10PB 6/8 (82.5%) in our Experiment I. A noteworthy aspect of our experimental results concerns consistency or reproducibility, which was tested at the end of Experiment I by replicating it with 16 of the 40 original observers (again, half experienced and half inexperienced). The percentage of correct answers given by the 16 observers was 59.4% (against 60.2% for the entire Experiment I), giving a distribution of correct answers that was statistically identical to that of Experiment I in terms of mean, median, and standard deviation at a confidence level of 95%. With regard to the type of wrong responses, the variations between Experiment I and its repetition were in the worst case 2.3% (specifically in the confusion of Chroma with Value, designated as CV). Thus, the resulting reproducibility was quite acceptable, reinforcing the credibility, for example, of the singular results in the 5Y zone, as discussed above.
This also suggested to us that we could reduce the number of observers in Experiment II to 20 (half experienced and half inexperienced), as indicated above, without a significant loss of accuracy. We recognize the surprisingly low percentage of correct answers found in the two experiments, in particular for experienced observers. The color-difference literature shows that the square of the correlation coefficients relating measured data to perceptual data is only about 70% [19], although recent studies [20] report higher correlation from the use of a gray anchor-pair. Although in our experiments the visual tasks are more complex than the perception of a color difference, we have checked the potential improvement achieved in Experiment I using an anchor-pair. Experiment I was repeated by 20 observers (half experienced and half inexperienced) using 3 different anchor-pairs (with Value, Hue, and Chroma difference of intermediate size, respectively) for each one of the four Hue groups. The same task as before was proposed to the observers, and the total percentage of correct answers was 68.5%, which, as expected, was slightly higher than the one previously obtained for this experiment (60.2%). It should also be noticed that, although this increase in the percentage of correct answers occurred for both experienced and inexperienced observers, it was mainly due to the inexperienced observers (62.4% of correct responses against 48.0% obtained without an anchor pair). In addition, the results of this replicated experiment confirm that the worst and best detected color attributes were Chroma and Hue, respectively.

TABLE VI. Summary of the percentage of correct and incorrect responses in the replication of Experiments I and II by two of the authors of this article, who could be considered as highly experienced observers for the proposed task.

EXPERIMENT I/EXPERIMENT II
               % Incorrect
% Correct      VC      VH       CH        CV      HV      HC
95.8/80.5      0/0     0/5.6    1.4/9.7   0/0     0/0     2.8/4.2
From the low percentage of correct answers in our experiments, it might be argued that our observers need more extensive training, and that the level of our experienced observers has been overestimated, because academic knowledge of colorimetry does not necessarily translate into experience in making color judgments. Certainly, it seems reasonable to assume that people making color judgments for a living, or people with more advanced training, could achieve better results than those reported here. As an example, during the revision of this article two of the present authors repeated Experiments I and II, attaining the results shown in Table VI. To limit the influence of the authors’ knowledge of the experimental designs, the color pairs of the 4 Hue zones were pooled for these last replicated experiments. In our opinion, our current results represent the ability to distinguish the color attributes for a medium-high trained population, supporting the idea that this is not an easy or innate ability, and that it should be trained or tested, if necessary, using appropriate commercial tools.21 At the other extreme, it should also be considered that our group of experienced observers should not be underestimated, given their academic and laboratory experience described above. In particular, we feel that, for the current experiments, where familiarity with the global aspects of the color space is important because of the large size of the color differences involved, the predominantly academic experience of most of our observers is highly valuable.

Groups of Observers

Based on the data of Table III, Fig. 3 shows the percentages of correct and incorrect responses of experienced/inexperienced and male/female observers in the two experiments. As might be expected, the percentage of correct responses was higher for the experienced than for the inexperienced observers: specifically, it was 24.4 percentage points higher for Experiment I and 12.0 percentage points higher for Experiment II. At a confidence level of 95%, the differences in the percentage of correct answers between experienced and inexperienced observers were statistically significant in Experiment I but not in Experiment II. Thus, the percentage of correct answers fell considerably for experienced observers on going from Experiment I to Experiment II, but this decline did not occur among the inexperienced. This could be interpreted as meaning that experienced observers are more familiar with, and better trained for, the task proposed in Experiment I, or as an indication that experienced observers found Experiment II more difficult than Experiment I, as many of them indicated. For inexperienced observers, however, both experiments presented the same complexity and, therefore, the percentage of correct answers was similar: slightly lower than 50%, which in any case is higher than the random probability (33.3%), as would be expected on the basis of the knowledge that the inexperienced observers had acquired before and through their participation in the experiments.

FIG. 3. Distribution of correct/incorrect responses by groups of observers (experienced, inexperienced, males, females) in Experiments I (left) and II (right). White: percentage of correct responses; black: percentage of incorrect responses in the detection of the attribute Value; dark gray: percentage of incorrect responses in the detection of Chroma; light gray: percentage of incorrect responses in the detection of Hue.

It is also noteworthy that, as shown in Table III and Fig. 3, the experienced observers appeared to comprehend and distinguish Value differences considerably better than did the inexperienced (the experienced registered an average of 6.0% mistakes against 18.1% for the inexperienced in Experiment I, and 9.9% against 15.3% in Experiment II). Juxtaposing samples, as was done here, increases the perception of differences in all three color attributes; however, in the case of Value, a harder edge is seen between the colors, while Hue and Chroma differences are more subtle. That experienced observers have probably learned to distinguish hard or soft edges between colors is a potential explanation of the above result. In turn, the distinction of the attribute Chroma was the worst in both groups, as discussed above for all the observers, but more markedly for the inexperienced.

The sex of the observer did not appear to be an influential factor in the results, as shown in Table III and Fig. 3. The lower percentage of correct answers among females was not statistically significant at a confidence level of 95%, even in Experiment I. The percentage of mistakes in the discernment of each of the three color attributes was also quite similar for males and females in our experiments.

Magnitude of Color Differences

The response of the human visual system is not the same when the size of the color differences changes from approximately threshold differences to large color differences. Thus, the last CIE recommendation for industrial color-difference evaluation22 has been suggested for moderate color differences between 0–5 CIELAB units, and changes between threshold and suprathreshold color-difference perception have also been reported.23 On this basis, it might also be asked whether the distinction of the color attributes varies with the magnitude of the color difference, and this was considered in the design of our two experiments, as explained before. In our case, given that we based our experiments on samples from the Munsell atlas,5 the color differences of the pairs were in all cases far greater than threshold, as indicated by the second column of Tables I and II.

Figure 4 shows the percentage of correct answers against the CIELAB differences in lightness ΔL* (Value), chroma ΔC*ab, and hue ΔH*ab, for all observers in Experiment I. Following the design of this experiment, each plot in Fig. 4 presents 12 sample pairs (3 for each Hue studied) with noticeable differences in each color attribute. These 12 pairs are distributed in three groups, which correspond to the three magnitudes of color differences selected, as clearly shown in particular for the attribute Value (Fig. 4, upper plot), where we can see, for each Hue, three lightness differences ΔL* of approximately 10, 18, and 27 units. The remaining 24 sample pairs, having negligible ΔL*, can be disregarded for the current analysis.

FIG. 4. Percentage of correct responses of all observers in Experiment I, according to the magnitude of the differences of the attribute (upper) Value ΔL*, (middle) Chroma ΔC*ab, and (bottom) Hue ΔH*ab. All measurements are given in CIELAB units. Different symbols indicate the results of each of the four Hue zones studied: 5Y, 10G, 5PB, and 5RP.

TABLE VII. Percentage of correct answers by 20 observers, who were asked to identify the attribute Value in achromatic (/2) and chromatic (/8) color pairs. Half of the observers were experienced and half inexperienced (results in parentheses).

                            % Correct
Value step                  Chroma /2      Chroma /8
5/ to 6/ (Δ Value = 1)      97.5 (77.5)    85.0 (65.0)
5/ to 7/ (Δ Value = 2)      100 (90.0)     97.5 (90.0)
5/ to 8/ (Δ Value = 3)      100 (95.0)     97.5 (95.0)

It is interesting to observe in Fig. 4 that the number of correct responses tended to decrease when the lightness difference ΔL* increased, though the trend was the opposite for the pairs that differed in Hue, and no definite trend was appreciable for the pairs that differed in Chroma. That is, small differences in Value are most easily identified, whereas, for Hue, large differences are easiest to identify. The fact that the percentage of correct responses decreases as ΔL* increases (Fig. 4, upper plot) was unexpected.
We have checked that this counterintuitive trend holds for both experienced and inexperienced observers. In this regard, we should bear in mind that in Experiment I the pairs with the smallest Value differences had a low Chroma of /2 (see Table I), and this probably reduced the perception of both Hue and Chroma, making it easier to appreciate the Value difference. To test this hypothesis (influence of achromatic/chromatic pairs on the detection of Value differences), we performed a complementary experiment with 20 observers (10 experienced and 10 inexperienced), who judged 36 new color pairs under the same experimental conditions used for the previous Experiment I. These 36 color pairs were evenly distributed among the 4 Hue groups considered before, and 24 of the pairs had Value differences (12 with Chroma /2 and 12 with Chroma /8) of three different sizes (Δ Value = 1, 2, 3). The task proposed to the observers was to identify which color pairs had a Value difference. The results (Table VII) indicate that Value differences are more easily detected for achromatic than for chromatic pairs, both for experienced and inexperienced observers. In addition, Table VII shows that the larger Value differences were better detected than the smaller ones, as reasonably expected. In summary, according to these new results, the counterintuitive effect shown in the upper plot of Fig. 4 does not hold, and must be due to the use in Experiment I of achromatic pairs, and not to the small size of the Value differences of these pairs.

FIG. 5. Percentage of correct responses of all the observers in Experiment II according to the magnitude of the color differences ΔE*ab in CIELAB units. The 12 pairs in which the common attribute is (upper) Value, (middle) Chroma, and (bottom) Hue are distinguished, as well as the results at each of the four Hue zones studied: 5Y, 10G, 5PB, and 5RP.
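The quantities plotted in Figs. 4 and 5 (ΔL*, ΔC*ab, ΔH*ab, and ΔE*ab) follow the standard CIE 1976 definitions, which can be sketched as below. The function name `cielab_differences` and the two sample coordinates are illustrative assumptions, not measurements of the Munsell chips used in the experiments, and the hue difference is returned as an unsigned magnitude.

```python
import math


def cielab_differences(lab_ref, lab_sample):
    """Return (dL, dC, dH, dE) between two CIELAB colors (CIE 1976).

    Each color is a tuple (L*, a*, b*). dH is the unsigned hue difference,
    computed as the part of dE not explained by lightness and chroma.
    """
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample

    dL = L2 - L1                                   # lightness difference
    dC = math.hypot(a2, b2) - math.hypot(a1, b1)   # chroma difference
    dE = math.sqrt(dL ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)

    # Guard against tiny negative values caused by floating-point rounding.
    dH = math.sqrt(max(dE ** 2 - dL ** 2 - dC ** 2, 0.0))
    return dL, dC, dH, dE


# Hypothetical pair that differs in lightness only.
reference = (50.0, 10.0, 0.0)
sample = (60.0, 10.0, 0.0)
print(cielab_differences(reference, sample))
```

For a pair such as this one, which differs only in L*, the chroma and hue components vanish and ΔE*ab equals |ΔL*|. This is the idealized situation that the Value pairs of Experiment I approximate, where the remaining components are described as negligible.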
Figure 5 presents the relationship between the percentage of correct answers and the total color difference of the pairs of Experiment II in CIELAB units, again considering all the observations. In this figure, the 36 sample pairs are presented in 3 graphs according to the attribute that remained constant. In this case, the trends observed for Value and Hue are not as clear as those in Fig. 4 (it might even be stated that they are contrary to those in that figure), and moreover, in Fig. 5, the correct answers for Chroma increased with the magnitude of the color differences. Thus, it might be thought that large color differences in a pair make it easier for the observer to perceive that two samples have the same lightness or Value (and also the same Chroma), since small color differences raise doubt as to which attribute is held constant, and therefore responses often err. In turn, small color differences in pairs of the same Hue prove easy to detect (implying two samples near each other on the same page in the Munsell atlas), but, with large differences, the pair is so different that the response becomes complex (samples farther apart, though still on the same page of the Munsell atlas).

CONCLUSION

Although the three psychological attributes of color, designated here as Value, Hue, and Chroma, are relatively easily understood from their definitions and the examples provided by color-order systems, discernment of these attributes in color sample pairs proves quite difficult. Thus, only 60.2% of the pairs were judged correctly with regard to the attribute that differed (Experiment I), and only 50.6% with regard to the attribute that was shared (Experiment II), the random probability for correct answers being 33.3% in both experiments. Even for observers who should be considered as medium-high experienced, these percentages are not much higher: 72.4% and 56.6% in Experiments I and II, respectively.
If experienced observers do not achieve a 100% correct score, it could be hypothesized that the Munsell classification is perhaps not the best perceptual or cognitive classification, and that the attributes Value, Chroma, and Hue are intermediate attributes between sensation and cognition.

The observer’s sex does not appear to be a significant variable in our experiments. The attribute that was usually the least accurately distinguished was Chroma, and the best distinguished were Value (particularly by the experienced observers) and Hue. The errors in the identification of Hue were especially frequent in the yellow zone (5Y). Value and Hue were rarely confused. Large Hue differences were more easily identified than smaller ones, and a constant Hue was better identified when the entire color difference was small. Initially, it seemed that small Value differences were easily detected (see Fig. 4, upper plot), but this was a consequence of the use of achromatic pairs for the pairs with small Value differences.

Figures 4 and 5 are interesting because they show a link between the ability of our visual system to distinguish the color attributes in a sample pair with large color differences and the physical measurements of these color differences. Of course, the answer provided by the visual system starts from the perception of such a color difference, but in our experiments it is not entirely explained by perception.

In our opinion, the previous results do not contradict the usefulness of considering three color attributes, as is traditional in colorimetry. They simply indicate that the human visual system does not possess adequate analytical faculties to discern such attributes when confronted with only one pair of samples, even for simple experiments where such samples differ in only one of the three attributes or have only one of the three attributes in common.
It can be assumed that the distinction of color attributes could be improved by appropriate training, as shown by Table VI, or by using broader scales of color samples (e.g., complete pages from the Munsell atlas). The lack of accuracy of our visual system in detecting the color attributes, shown by the current experiments, is not surprising, and could be compared with the difficulties in high-complexity tasks such as those carried out in color naming or color memory experiments.

ACKNOWLEDGMENTS

We thank all the observers who generously agreed to collaborate in these experiments. We are indebted to Dr. Miguel Soriano (Universidad de Almería, Spain) for his help with the statistical treatments, and to Mr. David Nesbitt for translating the original Spanish text into English. We are also very grateful to Dr. Danny Rich (Sun Chemical Corporation Research Center) for his interest in our work and helpful suggestions. The helpful suggestions from the anonymous referees are also greatly appreciated.

1. Wyszecki G, Stiles WS. Color science: concepts and methods, quantitative data and formulae. 2nd Ed. New York: Wiley; 1987. p 487.
2. Billmeyer FW Jr, Saltzman M. Principles of color technology. 2nd Ed. New York: Wiley; 1981. p 17–19.
3. Hunt RWG. Measuring colour. London: Ellis Horwood; 1989. p 25–28.
4. Lozano RD. El color y su medición. America Lee; 1978. p 188–189.
5. Munsell book of color, glossy finish collection. Baltimore, MD: Macbeth Kollmorgen; 1976.
6. LeGrand Y. Optique physiologique. Vol. 2, Chap. 11–12. Paris: Masson et Cie; 1972.
7. Gómez J, Díez MA, de Fez MD, Capilla P. Menu for checking a model of viewing non-related colours. Ver y Oir 1998;123:119–131.
8. Smith NS, Billmeyer FW Jr. Interrelation of the Swiss colour atlas (SCA-2541) and the Munsell Color Order System. Col Res Appl 1997;22:111–120.
9. Martínez-Verdú FM, Arnau V, Malo J, Felipe A, Artigas JM. A new chromatic encoding for machine vision invariant to the change of illuminant. J Optics 1996;27:171–181.
10. Wyble DR, Fairchild MD. Prediction of Munsell appearance scales using various color-appearance models. Col Res Appl 2000;25:132–144.
11. Ishihara S. Test for color blindness. 38th Ed. Tokyo, Japan: Kanehara Shuppan; 1979.
12. Burns B, Bryan ES. Dimensional interaction and the structure of psychological space: the representation of hue, saturation, and brightness. Percept Psychophys 1988;43:494–507.
13. Dorsch F. Diccionario de Psicología. 2nd Ed. Barcelona: Herder; 1977.
14. Matlin MW, Foley HJ. Sensación y percepción. 3rd Ed. Mexico: Prentice Hall; 1996.
15. Purdy D McL. Spectral hue as a function of intensity. Am J Psych 1931;43:541.
16. Jacobs GH. Saturation estimates and chromatic adaptation. Percept Psychophys 1967;2:271–274.
17. Indow T, Aoki N. Multidimensional mapping of 178 Munsell colors. Col Res Appl 1983;8:145–152.
18. Indow T. Multidimensional studies of Munsell color solid. Psychological Review 1988;95:456–470.
19. Rich RM, Billmeyer FW Jr, Howe WG. Method for deriving color-difference-perceptibility ellipses for surface-color samples. J Opt Soc Am 1975;65:956–959.
20. Alman DH, Berns RS, Snyder GD, Larsen WA. Performance testing of color-difference metrics using a color tolerance dataset. Col Res Appl 1989;14:139–151.
21. Colorcurve® HVC color vision skill test. Colwel/General, Inc. (www.colorcurve.com). For general information, see ISCC News #382, p. 13.
22. Industrial colour-difference evaluation. CIE Publ. 116-1995. Vienna: CIE Central Bureau; 1995.
23. Kuehni RG. Threshold color differences compared to supra-threshold color differences. Col Res Appl 2000;25:226–229.