DEGREE OF DIFFERENCE TESTING
An Alternative to Traditional Approaches
Overview
Difference testing can seem like a straightforward type of sensory testing, but this is not always true.
Often, companies simply want to know if there is a difference between two samples. In this case,
traditional difference testing methods, such as triangle or tetrad tests, may be appropriate. Difficulties
arise when a test indicates a difference between the samples that does not actually exist (a statistical
Type I error), or when a test fails to find a difference that does exist (a Type II error). Finding a
non-existent difference by statistical chance, or missing a real one, is the bane of researchers.
Beyond the statistical concern of “Is this a real difference or a false positive?”, more questions arise: “Is this
test or panel too sensitive?” “Not sensitive enough?” “If there was some inherent variability in the sample,
did that cause a difference?” Even when all goes well, and the existence of a difference is established, the
question immediately follows: “How big is the difference?” or “What is the nature of the difference?”
Degree of Difference (DOD) testing is an extremely useful alternative to traditional approaches. DOD
testing can determine if there is a difference, assess how big the difference is and—when used in
conjunction with descriptive analysis—determine the nature of the difference. It also provides a way to
accommodate product variability and compare multiple products at once.
Applications
DOD testing can be useful in most situations where difference testing is desired, but it may be particularly
applicable in the following situations:
▶ There is a need to compare multiple test samples to a single reference product
▶ There is a need to understand the nature of the differences
▶ There is a desire to avoid false positives
▶ There is batch-to-batch variability in the reference product
▶ Samples are non-homogenous
▶ Finding the smallest difference possible is not the primary objective
You need a descriptive panel to conduct DOD studies. Our sensory panelists are trained to use scales rating
both the degree of difference between two samples in a pair and the intensities of sensory attributes in a
single sample. Typically, ten trained panelists evaluate each pair of samples twice, and the resulting data
set is statistically analyzed.
Example I: DOD Testing
Scenario
A production plant would like to reduce the cost of its product. Its team has developed two potential alternative
cost-reduction processes (Prototypes 1 and 2). Also, the company knows that there is some batch-to-batch
variability in the current process that it considers acceptable. The production plant would like to determine if
either of the two cost-reduction prototypes is close enough to the current product to warrant the change.
Method
Panelists are served samples in pairs containing a reference product (marked ‘R’) and a test product (labeled with
a three-digit random number). Pairs are presented to each panelist in a randomized order. Within each product
pair, each panelist rates how different the test sample is from the reference. In this example, Batch A of the current
production serves as a reference. Panelists compare this reference to both cost-reduction prototype samples to
see if either is close enough to the current production to make a switch. Batch B of the current production is also
compared to Batch A to see if there is a difference between the two production runs, and Batch A is compared to
itself to get a baseline DOD score (Figure 1).
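The serving design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_serving_plan, the use of two replicates per panelist, and the way codes are drawn are hypothetical choices, not the procedure used in the study.

```python
import random

def build_serving_plan(test_samples, panelists, reps=2, seed=0):
    """Assign blind three-digit codes and a randomized pair order per panelist.

    Every pair is the reference (labeled 'R') plus one blind-coded test sample.
    """
    rng = random.Random(seed)
    # Draw one distinct three-digit code per test sample (assumed scheme).
    codes = rng.sample(range(100, 1000), k=len(test_samples))
    code_for = dict(zip(test_samples, codes))

    plan = []
    for panelist in panelists:
        for rep in range(1, reps + 1):
            order = list(test_samples)
            rng.shuffle(order)  # randomized presentation order of the pairs
            for position, sample in enumerate(order, start=1):
                plan.append({
                    "panelist": panelist,
                    "rep": rep,
                    "position": position,
                    "reference_label": "R",
                    "test_code": code_for[sample],
                    "test_identity": sample,  # not shown to panelists
                })
    return code_for, plan

samples = ["Blind Reference (Batch A)", "Batch B", "Prototype 1", "Prototype 2"]
codes, plan = build_serving_plan(samples, panelists=range(1, 11))
print(len(plan))  # 10 panelists x 2 reps x 4 pairs = 80 servings
```

Including the blind reference as an ordinary test sample is what produces the baseline (R vs. R) score alongside the other comparisons.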
Figure 1. Sample pairs served to each panelist. Every pair contains the reference (R, Batch A) and one
blind-coded test sample; the codes shown are 853, 167, 428 and 975. The sample identities are not seen
by panelists. The four comparisons are R vs. Prototype 1, R vs. Prototype 2, R vs. Batch B (current
batch-to-batch variability) and R vs. blind Reference Batch A (baseline).
For most studies, ten panelists evaluate each product pair two times, resulting in 20 DOD ratings per sample. This data
is statistically analyzed to show how different each sample is from the reference. A typical output is shown in Figure 2.
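As a rough illustration of what such an analysis might start from, the sketch below computes the mean DOD rating per sample and a paired t statistic against the baseline (R vs. R) pair. The source does not state which statistical method is used; a full analysis would more likely rely on an analysis of variance with a multiple-comparison procedure to produce letter groupings. The function name summarize_dod is hypothetical, and the ratings are simulated purely for illustration; they are not data from the study.

```python
import math
import random
import statistics

def summarize_dod(ratings, baseline):
    """Mean DOD per sample plus a paired t statistic against the baseline pair.

    ratings: {sample_name: [one DOD rating per panelist-by-replicate evaluation]},
             with every list in the same evaluation order.
    """
    base = ratings[baseline]
    summary = {}
    for sample, values in ratings.items():
        row = {"n": len(values), "mean": statistics.mean(values)}
        if sample != baseline:
            diffs = [v - b for v, b in zip(values, base)]
            sd = statistics.stdev(diffs)
            row["mean_diff_vs_baseline"] = statistics.mean(diffs)
            # Paired t statistic (n - 1 degrees of freedom); undefined if sd is 0.
            row["t_vs_baseline"] = (
                statistics.mean(diffs) / (sd / math.sqrt(len(diffs))) if sd > 0 else None
            )
        summary[sample] = row
    return summary

rng = random.Random(1)

def simulate(center, n=20):
    # Made-up ratings for illustration only; not data from the study.
    return [max(0.0, rng.gauss(center, 0.8)) for _ in range(n)]

ratings = {
    "Blind Reference (Batch A)": simulate(1.0),
    "Batch B": simulate(2.0),
    "Prototype 1": simulate(3.5),
    "Prototype 2": simulate(2.1),
}
for sample, row in summarize_dod(ratings, "Blind Reference (Batch A)").items():
    print(sample, round(row["mean"], 2), row.get("t_vs_baseline"))
```

Treating the 20 ratings as paired observations ignores the fact that each panelist contributes two replicates; that simplification keeps the sketch short and would need revisiting in a real analysis.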
Results
As expected, the reference sample (Batch A of current
production) shows the lowest degree of difference; it is being
compared to itself, and its DOD score provides a baseline
for comparing the remaining samples. All the other samples
tested show statistically higher DOD scores than the reference
vs. itself. Prototype 1 is the most different from the reference
sample. Batch B and Prototype 2 show similar DOD scores,
demonstrating that Prototype 2 is no more different from the
reference than the batch-to-batch variability that already exists
in the current product. Based on these results, Prototype 2 would
be a reasonable substitution for the current product.
If this study had been run as a triangle test, both of the prototype
samples would have been different from the reference, and the
process change would not have been implemented.
Figure 2. DOD score vs. the reference (axis 0.0 to 5.0) for Reference (Batch A), Batch B, Prototype 1
and Prototype 2, with letter groupings. Products that share a letter are not significantly different at
the 95% confidence level.
Example II: DOD Testing with Attribute Intensity Assessment
Scenario
A yogurt company wants to evaluate three alternative strawberry flavor suppliers to determine which is closest
to its current supplier.
Method
Trained sensory panelists participate in an orientation session to taste the samples, develop a ballot
of key sensory attributes and anchor the scales. An Overall Degree of Difference scale is included on
the ballot, in addition to the sensory attribute intensity scales (Figure 3).
Testing proceeds as described in Example I: Samples are served in pairs that include the reference
sample (labeled ‘R’) and one test product (labeled with a three-digit random number). Panelists first
rate the Overall Degree of Difference of the test sample vs. the reference. Then, they rate the intensity
of each of the sensory attributes for the test sample labeled with the three-digit number.
Alternatively, in a traditional triangle test, a sample from each of the three new suppliers would be
tested versus the current supplier’s sample to determine if each is different from the current product.
Figure 3. Example ballot. Overall Degree of Difference (DOD vs. the Reference) is rated on a scale from
0 (None) to 15 (Extreme). Under FLAVOR, the attributes Total Flavor, Total Strawberry, Candy Strawberry,
Jammy Strawberry, Sweet and Sour are each rated from 0 (None) to 15 (Extreme). Scale anchors shown
include Strawberry Jam, Strawberry Nesquik, Jam Nesquik, Sweet 2, Sweet 5, Sweet 10, Sour 2, Sour 5
and Sour 10.
Results
Degree of Difference Results are shown in Figure 4. Supplier 2 is most similar to the current supplier reference
product; it is as close to the reference as the blind reference is to itself. Suppliers 1 and 3 are both different from
the current supplier, but Supplier 3 is most different.
Based on these results, Supplier 2’s product could be used as a
substitute for the current supplier. Additionally, sensory feedback
can be provided to the other suppliers to aid in the reformulation
process.
If a series of triangle tests had been conducted, Supplier 2 would
have passed, and Suppliers 1 and 3 would have failed versus the
current supplier. No information on the nature of the differences
would have been available.
Figure 4. Overall DOD vs. Reference (axis 0.0 to 5.0) for Reference (Current Supplier), Supplier 1,
Supplier 2 and Supplier 3, with letter groupings. Products that share a letter are not significantly
different at the 95% confidence level.
In addition to the Degree of Difference results, descriptive analysis of key sensory attributes provides
additional information on the nature of the differences among the products (Figure 5). Supplier 2 is
similar to the current supplier in all attributes tested. Suppliers 1 and 3 were both higher than the
reference in Total Flavor and Total Strawberry, but Supplier 1 was higher in Candy Strawberry and Sweet
while Supplier 3 was higher in Jammy Strawberry and Sour.
Figure 5. Attribute intensities (axis 0 to 10) for Total Flavor, Total Strawberry, Candy Strawberry,
Jammy Strawberry, Sour and Sweet, plotted for Reference (Current Supplier), Supplier 1, Supplier 2 and
Supplier 3.
Choosing a Difference Test
There are many reasonable approaches to sensory difference testing. If maximum discrimination sensitivity is not your
primary objective, any type of difference test may be acceptable. When you are trying to find very small differences, or
when there is a business need to be quite certain that a difference cannot be found, triangle or tetrad tests are statistically
powerful. However, these methods only show if a difference exists between the reference and each of the test samples. They
do not provide a measure of the magnitude of the difference or the nature of the difference.
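To make the yes/no nature of a triangle test concrete, the sketch below computes the one-sided exact binomial p-value for a given number of correct identifications, using the 1/3 chance of picking the odd sample by guessing. The function names and the panel size in the usage lines are hypothetical; the outcome is only a decision about whether a difference was detected, with no measure of its size or nature.

```python
from math import comb

def binomial_tail(correct, n, p_guess=1 / 3):
    """P(X >= correct) for X ~ Binomial(n, p_guess): the one-sided p-value."""
    return sum(
        comb(n, k) * p_guess**k * (1 - p_guess) ** (n - k)
        for k in range(correct, n + 1)
    )

def min_correct_for_significance(n, alpha=0.05, p_guess=1 / 3):
    """Smallest number of correct answers that is significant at level alpha."""
    for correct in range(n + 1):
        if binomial_tail(correct, n, p_guess) <= alpha:
            return correct
    return None  # no count reaches significance for this n

# Hypothetical panel of 12 assessors, each completing one triangle.
print(min_correct_for_significance(12))  # 8
print(round(binomial_tail(8, 12), 4))    # 0.0188
```

With 12 assessors, 8 or more correct answers would be declared a difference at the 5% level and 7 or fewer would not, whatever the size of the underlying difference.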
In many cases, DOD testing provides benefits over traditional difference tests. It allows a small amount of sample
variability to be accepted as part of normal variation, and it can test multiple variants at once to assess whether
some samples are further from the reference than others. You can also combine DOD testing with sensory
attribute testing to determine the magnitude of specific sensory differences, providing an understanding of the nature of
those differences.
The Covance Partnership
Achieve new levels of market success. We can add depth to your consumer research with tailored testing that delivers
actionable insights. Covance offers a wide breadth and depth of services and is committed to your success. Make us your
partner for integrated consulting, development, and testing solutions.
Learn more about our food solutions at www.covance.com/foodsolutions
Covance Inc., headquartered in Princeton, NJ, USA is the drug development business of Laboratory
Corporation of America Holdings (LabCorp). COVANCE is a registered trademark and the
marketing name for Covance Inc. and its subsidiaries around the world.
US + 001.800.675.8375
UK + 44.1423.848864 Singapore + 65.6568.6759
© Copyright 2016 Covance Inc.
WPNCFS005-0716