Round Robin Test

Copyright © by Stochastikon GmbH (http://encyclopedia.stochastikon.com)
Name and Name Modifications:
Proficiency Test (PT)
Round robin tests or proficiency tests are an essential element of quality
assurance for laboratories. The objective of proficiency testing is to ensure
the quality and comparability of the laboratories' measurement results.
Comparability is no longer guaranteed if the laboratories get very different
results in the analysis of identical samples. To prevent such a situation,
proficiency tests are performed periodically to ensure the ability of
laboratories to provide sufficiently accurate results.
A round robin test is based on identical samples that are sent to the
participating laboratories, which analyze them using agreed methods. The samples
are provided by an institution that conducts the trial and invites the laboratories
to participate. This institution is also responsible for the evaluation. For
the various types of laboratories there are various professional organizations
that are responsible for carrying out the proficiency testing required by laws
and regulations. The implementation of proficiency testing itself takes place
in accordance with international and national standards.
For example, the Federal Institute for Materials Research and Testing (BAM)
offers round robin tests in the following areas¹:
• Organic and inorganic parameters in contaminated sites.
• Dangerous goods and hazardous substances.
• Chemical emissions from materials and products in the air.
• Porous and disperse materials.
• Particle sizes of fine powders.
The BAM also provides the English-language proficiency testing information
system EPTIS, which offers detailed information about round robin test
programs worldwide.
¹ See http://www.bam.de/de/fachthemen/ringversuche/index.htm, February 2013.

Statutory Basis
Accredited laboratories must participate periodically in proficiency testing
to prove their quality. The studies to be performed are legally defined,
e.g. in the area of contaminated sites in Germany by the Federal
Soil Protection and Contaminated Sites Ordinance (BBodSchV) of 12 July
1999, and the “Requirements for sampling, sample preparation and chemical
investigation methods on federal property” (updated version of October
2008). Laboratories which are required to participate must have their methods
of measurement reviewed within the scope (parameters) of their existing or
applied-for accreditation.
Proficiency Tests as External Quality Assurance
Proficiency tests are considered a tool of external quality assurance. Depending
on the objective, there are round robin tests to standardize procedures,
to prove laboratory capability, to determine certain characteristics of
reference materials and to test the practical application of analytical methods.
The anonymous comparison of the results is considered a major advantage
for the participating laboratories, because it shows each of them where it
stands relative to the others.
Proficiency tests are used in addition to individual proficiency testing and
laboratory comparisons to determine the performance of a laboratory by
comparison tests. A round robin test provides an independent assessment, unlike
individual proficiency testing, in which no comparison with other laboratories is
possible, or laboratory comparisons, which may not be neutral.
Conduct and Interpretation of Laboratory Data as Part of a Round
Robin Test
There are no standard guidelines for data evaluation in a round robin test.
The different relevant standards suggest many apparently conflicting
procedures. Therefore, the implementation of a round robin test and the analysis
of the data are illustrated here by the example of a proficiency test
organized by the AQS Baden-Württemberg for the operational analysis of
wastewater treatment plants. Such round robin tests are carried out on
behalf of the Ministry of Environment, Nature Conservation and Transport
Baden-Württemberg².
² See http://www.iswa.uni-stuttgart.de/ch/aqs/rv/rv_allgemein.html, February 2013.
• Organization:
The participating laboratories receive three identical samples in which
the various components are contained in different concentrations. The
AQS BW determines, possibly after preliminary investigations and in
consultation with the other proficiency testing agencies, the necessary
stabilization measures, container materials and concentration levels.
The lowest concentration is chosen so that it can still be detected
reliably with at least one of the methods described in the relevant
standards. The concentration range is selected so that the monitoring limit
value lies in its lower region, and the highest concentrations
correspond to what is found in real control samples.
The round robin test is then announced to all registered laboratories,
stating the methods, several weeks prior to the scheduled shipment
date. A fixed deadline is set for registration for this round robin test.
At the Institut für Siedlungswasserbau, control stock solutions for all
parameters and all concentration levels are produced with considerable effort.
The samples are prepared by means of a fixed system of dilutions of
these stock solutions.
The amount of sample is chosen so that it is sufficient for multiple
determinations, yet small enough that the laboratories cannot perform
excessively large test series.
The samples are packaged with polyurethane moldings in appropriate
boxes and usually dispatched with an express package service. The
samples thus usually arrive in the laboratory the next day, so that the
investigation can begin in the same week. In some cases, if continuous
cooling of the samples must be ensured, the samples are brought by
refrigerated vehicles to remote distribution points and handed
over directly to the laboratories there.
The packages contain a cover letter accompanied by a leaflet, which
draws attention to common mistakes. This leaflet should serve as a
checklist before the results are delivered.
The results may be delivered electronically via the Internet or by completing
and sending the enclosed result sheets. During data entry, via the Internet
or in the result sheets, the laboratory must enter its address (if not
already registered), the methods used, the sample numbers and the
resulting values in the required unit. Specifying the method used
makes a method-specific analysis possible. The result sheets or
the printed Internet reports must be signed and returned in time
to the Institut für Siedlungswasserbau.
For the organization of such a round robin test it is very important that
all deadlines are met by the participants. This is particularly necessary
so that samples can be provided in sufficient numbers. For organizational reasons,
samples are sometimes left over for individual latecomers. However, this cannot,
for obvious reasons, be guaranteed. Moreover, the time available for
analysis is very short, so that the AQS BW must insist on a timely
submission of results.
Following the evaluation, the organizer delivers the evaluation sheets created
for all laboratories together with the evaluation documentation.
These evaluation sheets include the values submitted by the laboratory
together with the target values. Values which lie within the tolerance
limits are marked as successful with a “+”.
• Evaluation:
The proficiency tests of AQS Baden-Württemberg are evaluated
according to DIN 38402 - A45. The evaluation is based
on the deviation from a “conventional correct value” using the so-called
zU-scores³.
As a first step, the standard deviation sR is calculated for each
concentration level. This is done using the Q-method⁴, a method of so-called
“robust statistics”. This value is required, among other things, for the
calculation of the “robust mean” using the Hampel estimator⁵. This
“robust mean” is usually taken as the “conventional true value”,
i.e. as the target value. In justified cases, the test investigator may select
the “conventional true value” differently, for example the sample weight
value or other reference values.

³ Transformed values are often called scores. The transformation leading to the
aforementioned zU-scores aims to take into account a possible skewness of the
distribution function.

⁴ A method of the same name, Q methodology (a type of factor analysis), was
developed by the British psychologist William Stephenson (1902-1989) and is used
in psychology and the social sciences to study the “subjectivity” of test
persons from their own perspective. It should not be confused with the Q-method
of robust statistics meant here, which estimates a standard deviation from the
pairwise differences between the laboratories' results.

⁵ The Hampel estimator is recommended when it is expected that the sample contains
so-called outliers. The Hampel estimator belongs to the scale-invariant M-estimators. The
estimator is determined by the setting of three constants, which govern the weighting
of the sample elements. The estimate can be calculated only by an iterative algorithm.
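The iterative character of the Hampel estimator can be illustrated with a minimal sketch. It is not the procedure of DIN 38402 - A45: the three constants (1.5, 3.0, 4.5), the start value (the median) and the helper `mad_scale` are assumptions made for illustration, and the scaled median absolute deviation merely stands in for the standard deviation sR that the standard obtains with the Q-method.

```python
from statistics import median

def mad_scale(values):
    """Stand-in scale estimate (NOT the Q-method): scaled median absolute deviation."""
    m = median(values)
    return 1.4826 * median([abs(x - m) for x in values])

def hampel_weight(u, a=1.5, b=3.0, c=4.5):
    """Weight psi(u)/u of Hampel's three-part redescending function.
    The constants a, b, c are assumed values, not taken from the text."""
    u = abs(u)
    if u <= a:
        return 1.0                           # full weight near the center
    if u <= b:
        return a / u                         # weight decreases
    if u <= c:
        return a * (c - u) / ((c - b) * u)   # weight falls to zero
    return 0.0                               # gross outliers are ignored

def hampel_mean(values, scale, tol=1e-9, max_iter=100):
    """Robust mean by iteratively reweighted averaging, for a given scale."""
    mu = median(values)                      # assumed start value
    for _ in range(max_iter):
        weights = [hampel_weight((x - mu) / scale) for x in values]
        total = sum(weights)
        if total == 0:
            return mu
        new_mu = sum(w * x for w, x in zip(weights, values)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

results = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]  # hypothetical laboratory results
robust_mean = hampel_mean(results, mad_scale(results))
```

In this hypothetical data set the value 25.0 receives weight zero, so the robust mean is determined by the remaining five results alone.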
For each measured value, a z-score is first calculated according to
the following formula:
$$ z\text{-score} = \frac{\text{measured value} - \text{target value}}{\text{target standard deviation}} $$
The target standard deviation can differ from the calculated comparison
(reproducibility) standard deviation in the following cases:
a) There are upper and lower limits for the comparison standard
deviation, which are fixed in advance. If the upper limit is exceeded,
the target standard deviation is set to the upper limit; if the lower
limit is undercut, it is set to the lower limit.
b) In the case of an evaluation across concentration levels by means
of the variance function described in DIN 38402 - A45,
the target standard deviation is the value of the
variance function for the corresponding concentration value.
A combination of the two cases may also occur.
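The z-score and case a) above can be sketched as follows. The function names and the numerical limits are hypothetical. Case b) is not shown, because the variance function of DIN 38402 - A45 is not reproduced in this text.

```python
def target_sd(comparison_sd, lower=None, upper=None):
    """Case a): replace the comparison standard deviation by a limit fixed in
    advance when it falls below the lower or exceeds the upper limit."""
    if lower is not None and comparison_sd < lower:
        return lower
    if upper is not None and comparison_sd > upper:
        return upper
    return comparison_sd

def z_score(measured, target, sd):
    """z-score = (measured value - target value) / target standard deviation."""
    if sd <= 0:
        raise ValueError("target standard deviation must be positive")
    return (measured - target) / sd

# Hypothetical numbers: comparison SD 0.8, allowed range 0.2 to 0.5.
sd = target_sd(0.8, lower=0.2, upper=0.5)   # limited to the upper bound 0.5
z = z_score(11.0, 10.0, sd)                 # (11.0 - 10.0) / 0.5
```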
To compensate for the unfairness caused by the skewed
distribution of the data at low concentrations, the zU-scores are calculated
from the z-scores as described in DIN 38402 - A45.
Each measured value is assessed by its zU-score. Values between -2
and +2 are considered by definition to be acceptable. Values that deviate
more are evaluated as “false”. In the round robin tests of the PT-WFD
network, values in the range 2 < |z-score| < 3 are considered
“questionable”.
The assessment of each individual value is marked on the evaluation
sheet with a “+” if the value is acceptable and with a “-” otherwise.
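The assessment rule can be written down as a small sketch. It takes a score that has already been computed. The conversion from z-score to zU-score is not reproduced here. Treating the limits -2 and +2 as inclusive, and treating scores of magnitude 3 or more as “false” in the PT-WFD variant, are assumptions, since the text does not state them explicitly. The function names are hypothetical.

```python
def assess(score, questionable_band=False):
    """Classify a score; questionable_band=True mimics the PT-WFD variant."""
    magnitude = abs(score)
    if magnitude <= 2.0:                 # assumed inclusive limits
        return "acceptable"
    if questionable_band and magnitude < 3.0:
        return "questionable"
    return "false"

def sheet_mark(score):
    """Mark used on the evaluation sheet: '+' if acceptable, '-' otherwise."""
    return "+" if assess(score) == "acceptable" else "-"
```

For example, a score of 2.5 is “false” under the default rule but “questionable” when the intermediate band is enabled.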
Further parameter-specific and overall ratings in round robin tests for
wastewater and drinking water systems are regulated differently;
specific guidance is therefore given for each of the relevant round
robin tests.
Fixing the Proficiency Test Parameters
For the choice of the number M of concentration levels, the number L of
participating laboratories and the number K of parallel determinations,
the standard DIN 38402 part 41 gives the following recommendations:
• The number L of laboratories should depend in a certain way on the
number M of levels. It is recommended that the number of laboratories
is not less than L = 8 and that a larger number of laboratories is
preferable (for example, L = 15 or more) if there is only a single level
of interest.
• K = 4 parallel determinations are recommended, unless it is usual to
perform a larger number of parallel determinations. When the spread of
analytical results between laboratories is very high, a smaller number of
parallel determinations can sometimes also be appropriate, but never less
than 2. It must be ensured that the product of the
number L of laboratories and the number K of parallel determinations is
not less than 24.
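These recommendations can be collected in a small check. This is a sketch; the function name and the wording of the returned messages are hypothetical. The argument names L, K and M follow the symbols used in the text.

```python
def check_design(L, K, M):
    """Return the recommendations that a planned round robin test misses.

    L: number of laboratories, K: parallel determinations per laboratory,
    M: number of concentration levels.
    """
    issues = []
    if L < 8:
        issues.append("L < 8")
    if M == 1 and L < 15:
        issues.append("prefer L >= 15 when M == 1")
    if K < 2:
        issues.append("K < 2")
    if L * K < 24:
        issues.append("L * K < 24")
    return issues
```

For example, `check_design(8, 2, 3)` reports only the product rule, because 8 · 2 = 16 is below 24.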
Note: The standard gives no reasons for the recommended numbers. The
main aim of a round robin test is to evaluate the measurement capability of
laboratories. Why the number L of participating laboratories should depend
on the number M of prescribed levels remains unclear. The recommendation
to reduce the number K of parallel determinations in the case of greater
variability seems to be at least strange. Finally, the requirement L · K ≥ 24
is very mysterious.
The Conventional True Value
The so-called “conventional true value” is a particular problem for many
proficiency tests. The standard DIN 38402 part 41 indicates three ways
to set this value:
• For chemically well defined substances in synthetic samples, the conventionally correct value is identical to the “true” content of the substance
in the sample.
• If a theoretical substance-related definition of the “true” value
is not possible, or the “true” value is not known because, exceptionally,
no synthetic sample was used in the round robin test, the conventionally
correct value is often determined by a recognized reference method. The
value determined by this method is then agreed to be the conventionally
correct value.
• If this way is not possible either, for example because the round robin test
is used to test a specific method for its usefulness as a reference method,
or because the production of synthetic samples with “true” values fails,
or because there is no other way to get a reference value, then as a last
resort the common mean of all values, after exclusion of so-called outliers,
defines the conventionally true value. This conventionally true value is
not necessarily identical with the “true” value.
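The order of preference among the three options can be sketched as below. The function name, the source labels and the outlier rule `is_outlier` are hypothetical. The text does not say how outliers are identified, so the rule is passed in by the caller.

```python
from statistics import mean

def conventional_true_value(formulated=None, reference=None,
                            lab_values=None, is_outlier=None):
    """Pick the conventionally true value in the order given in the text.

    Returns a pair (value, source).
    """
    if formulated is not None:            # option 1: known content of a synthetic sample
        return formulated, "synthetic"
    if reference is not None:             # option 2: recognized reference method
        return reference, "reference"
    if not lab_values:
        raise ValueError("no information available to set a value")
    # Option 3: common mean of all values after excluding outliers.
    kept = [x for x in lab_values if is_outlier is None or not is_outlier(x)]
    if not kept:
        raise ValueError("all values were excluded as outliers")
    return mean(kept), "consensus"

# Hypothetical data and a hypothetical outlier rule.
value, source = conventional_true_value(
    lab_values=[4.9, 5.0, 5.1, 9.0], is_outlier=lambda x: x > 8.0)
```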
Note: First of all, it should be clear that the “true” value can never be
known, regardless of whether it is a synthetic sample or not. While the first
two methods to determine a “conventional true value” describe possibilities
to get a useful approximation of the true value, the third option resembles a
lottery. In the first two cases, the aim of the round robin test is to verify the
ability of laboratories, while in the third case, the goal may also be to test a
so-called reference measurement method. The two goals are fundamentally
different and therefore cannot be achieved using the same method (round
robin test).
Critique
The evaluation of the results of a round robin test consists essentially of the
determination of so-called tolerance intervals for the different measurement
methods used in the laboratories. This is accomplished by the determination
of the comparison and the target standard deviation, or in other cases, of
the measurement uncertainty, where in general the evaluation is based on the
normal distribution, and only in special cases possible skewness (asymmetry)
of the distribution is considered.
Whether a measurement method or a laboratory provides useful measurements
depends only on the uncertainty of the measurement process. Each
measurement method is based on a correlation between the quantity of
interest and an observed quantity (the result of the measurement process).
Measurement uncertainty is thereby completely determined by the probability
distribution of the observed quantity, which depends on the unknown value
of the quantity of interest and the prevailing conditions in the laboratory. If
the values of the observed quantity vary strongly over repeated measurements,
then the measurement uncertainty is large; if the values vary little, then
it is small. To evaluate the capability of a measurement
procedure in a given laboratory, it would therefore be necessary to determine
the corresponding measurement uncertainty and then compare it with the
uncertainty of the measurement method under specified conditions.
Instead, standard deviations are calculated with complicated statistical
methods, on the basis of fairly arbitrary distributional assumptions and of
the data obtained by the round robin test, and 2- or 3-sigma intervals and
the required tolerance intervals are determined. This practice is briefly
evaluated below:
• Normal Distribution: In many cases, it is claimed to be proven that the
observed quantity has a normal distribution. Such a proof cannot be given
because there exist in this world no normally distributed variables.
If the above statement is made nonetheless, it reveals a lack of
understanding of stochastics.
• In the evaluation, only the (symmetric) normal or skewed (asymmetric)
distributions are considered. However, when the values of all laboratories
are merged, what matters is not symmetry or asymmetry but the fact that
the considered random variable is no longer unimodal.
• Since one can assume that the laboratory staff participating in a round
robin test perform the measurements very carefully, the legitimacy
of the use of robust methods and of the identification and exclusion
of so-called outliers is at least questionable.
• About the so-called “conventional true value” it is only known that the
true value is unknown. How wrong it actually is cannot be determined
with the calculated standard deviations.
• Accounting for the uncertainty realistically would require stochastic
methods of measurement. These do not provide a single measured value
of which it is only known that it is wrong, but a measurement interval
that contains the true but unknown value of the quantity of interest
with a fixed reliability of the procedure. If such stochastic measurement
methods were applied, it would be possible to interpret and compare the
measurement results of the laboratories.
Relevant Standards and Guidelines
For the implementation and evaluation of proficiency testing, there are
several requirements in standards and legislation (e.g. the Medical Devices Act). The
key standards are given in the following list:
• ISO/IEC 17011: Conformity assessment – General requirements for
accreditation bodies accrediting conformity assessment bodies.
• DIN EN ISO/IEC 17025: General requirements for the competence
of testing and calibration laboratories.
• DIN EN ISO/IEC 17043: Conformity assessment – General requirements
for proficiency testing.
• DIN ISO 13528: Statistical methods for use in proficiency testing by
interlaboratory comparisons.
• DIN ISO 5725 (series of standards): Accuracy (trueness and precision)
of measurement methods and results.
• ISO/IEC Guide 43-1: Proficiency testing by interlaboratory
comparisons – Part 1: Development and operation of proficiency testing schemes.
• ISO/IEC Guide 43-2: Proficiency testing by interlaboratory
comparisons – Part 2: Selection and use of proficiency testing schemes by
laboratory accreditation bodies.
• DIN-Taschenbuch 355
• DIN 38402⁶
• DIN 38402 Part 41 (1984): German standard methods for the examination
of water, waste water and sludge; General information (group
A); Interlaboratory tests, planning and organization.
• DIN 38402 Part 42 (1984): German standard methods for the examination
of water, waste water and sludge; General information (group
A); Interlaboratory tests, evaluation.
• DIN 38402 Part 45 (2003-09): German standard methods for the examination
of water, waste water and sludge; General information (group
A); Interlaboratory tests for the external quality control of laboratories.
⁶ DIN 38402 comprises several standards that deal with analysis and
proficiency testing in the context of water, waste water and sludge. For example, DIN
38402 Part 71 treats the topic of “equivalence of two methods of analysis based on the
comparison of the test results on the same sample (same matrix).”
References and Literature:
• ASTM E1301-95: Standard Guide for Proficiency Testing by Interlaboratory Comparison.
• GUM (1995): Guide to the Expression of Uncertainty in Measurement.
BIPM.
• W. Horwitz (1982): Evaluation of Analytical Methods Used for Regulations of Food and Drugs. Anal. Chem. 54, 67A-76A.
• St. Kromidas (ed.) (2011): Handbuch Validierung in der Analytik. (2nd
edition), Wiley-VCH, Weinheim, Germany.
• P.J. Lowthian and M. Thompson (2002): Bump-hunting for the proficiency
tester – searching for multimodality. Analyst 127, 1359-1364.
• S. Riedel (2004): Erprobung neuentwickelter Schwingungsmodelle des
sitzenden Menschen mittels Round-Robin-Test. Schriftenreihe der Bundesanstalt für Arbeitsschutz und Arbeitsmedizin. Forschung Fb 1029.
• M. Thompson (2000): Recent trends in inter-laboratory precision at ppb
and sub-ppb concentrations in relation to fitness for purpose criteria in proficiency testing. Analyst 125, 385-386.
• M. Thompson, S.L.R. Ellison and R. Wood (2006): The International
Harmonized Protocol for the Proficiency Testing of Analytical Chemistry
Laboratories. (IUPAC Techn. Rep.), Pure and Applied Chemistry 78, 145-196.
Author(s) of this contribution:
Karl Baur
Version: 1.00