DEGREE OF DIFFERENCE TESTING
An Alternative to Traditional Approaches

Overview

Difference testing can seem like a straightforward type of sensory testing, but this is not always true. Often, companies simply want to know if there is a difference between two samples. In this case, traditional difference testing methods, such as triangle or tetrad tests, may be appropriate. Difficulties arise when testing indicates a difference between the samples although none exists (a statistical Type I error), or when there is a belief that the test failed to find a difference that does exist (a Type II error). Missing a real difference, or finding a non-existent one by statistical chance, is the bane of researchers.

Beyond the statistical concern of “Is this a real difference or a false positive?”, more questions arise: “Is this test or panel too sensitive?” “Not sensitive enough?” “If there was some inherent variability in the sample, did that cause a difference?” Even when all goes well and the existence of a difference is established, the question immediately follows: “How big is the difference?” or “What is the nature of the difference?”

Degree of Difference (DOD) testing is a useful alternative to traditional approaches. DOD testing can determine if there is a difference, assess how big the difference is and, when used in conjunction with descriptive analysis, determine the nature of the difference. It also provides a way to accommodate product variability and compare multiple products at once.
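To make the false-positive concern in the Overview concrete, here is a minimal Python sketch of the exact binomial calculation commonly applied to a triangle test. It is an illustration rather than part of the method described in this paper. It assumes that an assessor who perceives no difference picks the odd sample by guessing, with probability 1/3. The function names are hypothetical.

```python
from math import comb


def triangle_p_value(n_assessors, n_correct, p_chance=1 / 3):
    """Chance of seeing at least n_correct correct picks if everyone guesses.

    Assumption: under "no perceivable difference", each assessor independently
    identifies the odd sample with probability p_chance (1/3 for a triangle test).
    """
    return sum(
        comb(n_assessors, k) * p_chance**k * (1 - p_chance) ** (n_assessors - k)
        for k in range(n_correct, n_assessors + 1)
    )


def min_correct_for_significance(n_assessors, alpha=0.05, p_chance=1 / 3):
    """Smallest number of correct picks whose p-value is at or below alpha."""
    for k in range(n_assessors + 1):
        if triangle_p_value(n_assessors, k, p_chance) <= alpha:
            return k
    return None  # no count reaches significance for this panel size
```

With 12 assessors, for example, this calculation requires at least 8 correct identifications before a difference is declared at the 5% level. A result exactly at that threshold would still arise from guessing alone a little under 2% of the time, which is the kind of false positive the Overview describes.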
Applications

DOD testing can be useful in most situations where difference testing is desired, but it may be particularly applicable in the following situations:

▶ There is a need to compare multiple test samples to a single reference product
▶ There is a need to understand the nature of the differences
▶ There is a desire to avoid false positives
▶ There is batch-to-batch variability in the reference product
▶ Samples are non-homogenous
▶ Finding the smallest difference possible is not the primary objective

You need a descriptive panel to conduct DOD studies. Our sensory panelists are trained to use scales rating both the degree of difference between two samples in a pair and the intensities of sensory attributes in a single sample. Typically, ten trained panelists evaluate each pair of samples twice, and the resulting data set is statistically analyzed.

Example I: DOD Testing

Scenario

A production plant would like to reduce the cost of its product. Its team has developed two potential alternative cost-reduction processes (Prototypes 1 and 2). Also, the company knows that there is some batch-to-batch variability in the current process that it considers acceptable. The production plant would like to determine if either of the two cost-reduction prototypes is close enough to the current product to warrant the change.

Method

Panelists are served samples in pairs containing a reference product (marked ‘R’) and a test product (labeled with a three-digit random number). Pairs are presented to each panelist in a randomized order. Within each product pair, each panelist rates how different the test sample is from the reference. In this example, Batch A of the current production serves as a reference. Panelists compare this reference to both cost-reduction prototype samples to see if either is close enough to the current production to make a switch.
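The presentation scheme described here (a reference marked ‘R’ paired with a blind-coded test sample, pairs served in randomized order, each pair evaluated twice) can be sketched in a few lines of Python. This is an illustrative sketch only. The function name, the data layout, and the choice to draw fresh codes for every panelist and replicate are assumptions, not details given in this paper.

```python
import random


def build_serving_plan(panelists, test_samples, replicates=2, seed=None):
    """Return {panelist: [serving, ...]} for a paired DOD session.

    Each serving pairs the labeled reference 'R' with one blind-coded test
    sample. Assumptions (not stated in the source): codes are redrawn per
    panelist and per replicate, and pair order is reshuffled per replicate.
    """
    rng = random.Random(seed)
    plan = {}
    for panelist in panelists:
        # Draw distinct three-digit codes for all of this panelist's servings.
        codes = rng.sample(range(100, 1000), len(test_samples) * replicates)
        servings = []
        for rep in range(replicates):
            order = list(test_samples)
            rng.shuffle(order)  # randomized presentation order of the pairs
            for sample in order:
                servings.append(
                    {
                        "replicate": rep + 1,
                        "reference": "R",
                        "code": codes.pop(),
                        "sample": sample,  # identity hidden from the panelist
                    }
                )
        plan[panelist] = servings
    return plan


# Sample names follow Example I; the blind reference is Batch A served under a code.
plan = build_serving_plan(
    panelists=range(1, 11),
    test_samples=["Prototype 1", "Prototype 2", "Batch B", "Blind Reference (Batch A)"],
    seed=1,
)
```

Keeping the sample identity in the plan but off the serving label is what lets the blind reference act as a baseline: panelists rate it against ‘R’ without knowing the two are the same product.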
Batch B of the current production is also compared to Batch A to see if there is a difference between the two production runs, and Batch A is compared to itself to get a baseline DOD score (Figure 1).

Figure 1

Pair     Panelists see    Sample identity (not seen)                        Comparison
Pair 1   R and 853        Reference Batch A and Prototype 2                 R vs. Prototype 2
Pair 2   R and 167        Reference Batch A and Batch B                     Current batch-to-batch variability
Pair 3   R and 428        Reference Batch A and Blind Reference Batch A     Baseline (R vs. R)
Pair 4   R and 975        Reference Batch A and Prototype 1                 R vs. Prototype 1

For most studies, ten panelists evaluate each product pair two times, resulting in 20 DOD ratings per sample. These data are statistically analyzed to show how different each sample is from the reference. A typical output is shown in Figure 2.

Results

As expected, the reference sample (Batch A of current production) shows the lowest degree of difference; it is being compared to itself, and its DOD score provides a baseline for comparing the remaining samples. All the other samples tested show statistically higher DOD scores than the reference vs. itself. Prototype 1 is the most different from the reference sample. Batch B and Prototype 2 show similar DOD scores, demonstrating that Prototype 2 is no more different from the reference than the batch-to-batch variability that already exists in the current product. Based on these results, Prototype 2 would be a reasonable substitution for the current product. If this study had been run as a triangle test, both of the prototype samples would have been found different from the reference, and the process change would not have been implemented.

Figure 2: DOD score by product (vertical axis 0.0 to 5.0) for Reference (Batch A), Batch B, Prototype 1 and Prototype 2. Products that share a letter are not significantly different at the 95% confidence level.

Example II: DOD Testing with Attribute Intensity Assessment

Scenario

A yogurt company wants to evaluate three alternative strawberry flavor suppliers to determine which is closest to its current supplier.

Method

Trained sensory panelists participate in an orientation session to taste the samples, develop a ballot of key sensory attributes and anchor the scales. An Overall Degree of Difference scale is included on the ballot, in addition to the sensory attribute intensity scales (Figure 3).

Figure 3: Example ballot. Overall Degree of Difference (DOD vs. the Reference) and each flavor attribute (Total Flavor, Total Strawberry, Candy Strawberry, Jammy Strawberry, Sweet, Sour) are rated on a 0 to 15 scale running from None (0) to Extreme (15), with marks at 5 and 10. Reference anchors shown on the ballot include Strawberry Jam, Strawberry Nesquik, Sweet 2, Sweet 5, Sweet 10, Sour 2, Sour 5 and Sour 10.

Testing proceeds as described in Example I: Samples are served in pairs that include the reference sample (labeled ‘R’) and one test product (labeled with a three-digit random number). Panelists first rate the Overall Degree of Difference of the test sample vs. the reference. Then, they rate the intensity of each of the sensory attributes for the test sample labeled with the three-digit number. Alternatively, in a traditional triangle test, a sample from each of the three new suppliers would be tested versus the current supplier’s sample to determine if each is different from the current product.

Results

Degree of Difference results are shown in Figure 4. Supplier 2 is most similar to the current supplier reference product; it is as close to the reference as the blind reference is to itself. Suppliers 1 and 3 are both different from the current supplier, but Supplier 3 is most different. Based on these results, Supplier 2’s product could be used as a substitute for the current supplier’s product.
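This paper does not name the statistical procedure behind the letter groupings. Panel data of this kind are often analyzed with analysis of variance followed by a multiple-comparison test. As a simpler stand-in, the Python sketch below compares one test sample's DOD ratings against the blind-reference baseline with a paired sign-flip permutation test. The function name, the pairing of ratings by panelist and replicate, and the one-sided hypothesis are assumptions made for illustration only.

```python
import random


def sign_flip_p_value(test_scores, baseline_scores, n_resamples=10000, seed=0):
    """One-sided paired permutation test: is the test sample rated as more
    different from 'R' than the blind reference is?

    Assumption: test_scores[i] and baseline_scores[i] come from the same
    panelist and replicate, so each difference can be sign-flipped under the
    null hypothesis that both pairs are equally different from the reference.
    """
    if len(test_scores) != len(baseline_scores) or not test_scores:
        raise ValueError("need two non-empty, equal-length lists of paired ratings")

    diffs = [t - b for t, b in zip(test_scores, baseline_scores)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each paired difference and recompute the mean.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if resampled >= observed:
            hits += 1

    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

Called with the 20 paired ratings for a test sample and for the blind reference, the function returns an approximate p-value. A small value suggests the test sample is rated as more different from the reference than the reference is from itself. Comparing several samples this way would also call for a multiple-comparison adjustment, which this sketch omits.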
Additionally, sensory feedback can be provided to the other suppliers to aid in the reformulation process. If a series of triangle tests had been conducted, Supplier 2 would have passed, and Suppliers 1 and 3 would have failed versus the current supplier. No information on the nature of the differences would have been available.

Figure 4: Overall DOD vs. Reference (vertical axis 0.0 to 5.0) for Reference (Current Supplier), Supplier 1, Supplier 2 and Supplier 3. Products that share a letter are not significantly different at the 95% confidence level.

In addition to the Degree of Difference results, descriptive analysis of key sensory attributes provides information on the nature of the differences among the products (Figure 5). Supplier 2 is similar to the current supplier in all attributes tested. Suppliers 1 and 3 were both higher than the reference in Total Flavor and Total Strawberry, but Supplier 1 was higher in Candy Strawberry and Sweetness while Supplier 3 was higher in Jammy Strawberry and Sourness.

Figure 5: Attribute intensity profiles (radial axis 0 to 10) for Reference (Current Supplier), Supplier 1, Supplier 2 and Supplier 3 across Total Flavor, Total Strawberry, Candy Strawberry, Jammy Strawberry, Sour and Sweet.

Choosing a Difference Test

There are many reasonable approaches to sensory difference testing. If maximum discrimination sensitivity is not your primary objective, any type of difference test may be acceptable. When you are trying to find very small differences, or when there is a business need to be quite certain that a difference cannot be found, triangle or tetrad tests are statistically powerful. However, these methods only show whether a difference exists between the reference and each of the test samples. They do not provide a measure of the magnitude of the difference or the nature of the difference. In many cases, DOD testing provides benefits over traditional difference tests.
In addition to allowing for a small variability in samples to be acceptable as part of normal variation, DOD testing includes the ability to test multiple variants at once and thus assess if some samples are further from the reference than others. You can also combine DOD testing with sensory attribute testing to determine the magnitude of specific sensory differences, providing an understanding of the nature of those differences.

The Covance Partnership

Achieve new levels of market success. We can add depth to your consumer research with tailored testing that delivers actionable insights. Covance offers a wide breadth and depth of services and is committed to your success. Make us your partner for integrated consulting, development, and testing solutions.

Learn more about our food solutions at www.covance.com/foodsolutions

Covance Inc., headquartered in Princeton, NJ, USA, is the drug development business of Laboratory Corporation of America Holdings (LabCorp). COVANCE is a registered trademark and the marketing name for Covance Inc. and its subsidiaries around the world.

US + 001.800.675.8375
UK + 44.1423.848864
Singapore + 65.6568.6759

© Copyright 2016 Covance Inc. WPNCFS005-0716