Variation in the determination of follicular diameter: an inter

Human Reproduction vol.12 no. 11 pp.2465-2468, 1997
Variation in the determination of follicular diameter:
an inter-unit pilot study using an ultrasonic phantom
G.L.DriscoII1'3, J.P.P.Tyler1 and D.Carpenter 2
^ityWest IVF, 12 Caroline St, Westmead, NSW 2145, Australia
and 2CSIRO Division of Telecommunication and Industrial Physics,
Lindfield NSW 2070, Australia
3
To whom correspondence should be addressed
Ultrasound operators in assisted reproductive technology
units in New South Wales, Australia cooperated in a study
that was conducted to assess their ability to measure fixed
objects embedded in an ultrasound phantom. The results
have demonstrated a large variation ranging between 10
and 25% coefficient of variation on distances between 10
and 32 mm. As ultrasonic imaging is the only component
of assisted reproductive technology not currently controlled
by an external quality assurance scheme, the authors
suggest there may be a need to establish a programme.
Key words: assisted reproductive technology/follicular diameter/quality control/ultrasound
Introduction
The monitoring of ovarian follicular development by transvesical, or more commonly, transvaginal ultrasound is a
standard procedure in assisted reproductive technology treatment. Furthermore with the increasing use of 'down-regulation'
with gonadotrophin releasing hormone (GnRH) analogues (e.g.
lucrin, buserelin, syneral) the endocrine component of follicular
tracking has been reduced and clinical decisions have become
more dependent on the results of the ultrasound scans.
The optimum follicle size from which to induce ovulation
is not precisely known and the decision to give human chorionic
gonadotrophin (HCG) depends on a number of factors including
the total number of follicles recruited, their echogenic appearance by ultrasound (Fukuda et al., 1995; Gore et al., 1995),
the sizes of the leading group and the perceived potential for
hyperstimulation. Thus different clinics have their own criteria.
This ranges between two or more follicles of > 17 mm diameter
to those >21 mm (personal communication with all Australian
assisted reproductive technology units). Considerable debate
continues as to which is the 'correct' approach and about
whether one is demonstrably better than another. However,
according to statistics compiled by the Fertility Society of
Australia (Lancaster et al., 1997) most Australian units appear
to be achieving similar results with respect to pregnancy
success, perhaps suggesting a wide window for inducing
ovulation pre-assisted reproductive technology. Another reason
for this wide latitude may be the calibration and/or operator
variation in the ultrasound assessment of follicles.
European Society for Human Reproduction and Embryology
There have been few studies examining the performance of
ultrasonic follicular measurements in assisted reproductive
technology and those that have been done have examined
observer reproducibility using a single scanner (Forman et al.,
1991) rather than a range of machines likely to be found in
different units. Studies in other disciplines (e.g. obstetrics)
have, however, demonstrated the need for experienced ultrasonographers (Bahmaie and Edmonds, 1996) and the necessity
of instrument calibration (Clark, 1988; Fish, 1990; Dudley and
Griffith, 1996). Similarly there is increasing discussion about
quality assurance in diagnostic medicine (Arger, 1995;
Henderson, 1996; Weber, 1996) as the number of ultrasound
services per 1000 head of population in Australia has increased
from 34.9 to 97.9 over a 4 year period (Hailey, 1996). Similarly
statements from professional bodies allied to ultrasound in the
UK suggest that training and performance review should be
routine and a Consortium for the Accreditation of Sonographic
Education (CASE) has been established to provide an overview
of the education of sonographers (BMUS Bulletin, 1996).
Thus the object of this preliminary project was to assess the
variation in ultrasonic determination of objects approximating
follicular sizes using a standard ultrasound phantom distributed
between seven assisted reproductive technology units and to
make recommendations on the need — or otherwise — of
establishing an external quality assurance (EQA) programme
for ultrasound operators in assisted reproductive technology
units.
Materials and methods
Experimental design
For this preliminary study a standard 'phantom' used for the calibration
of ultrasonic instruments (ATS Multipurpose Phantom Model 539:
Gamasonics Institute for Research and Calibration, PO Box 411,
Mittagong, NSW 2575, Australia) was used. Measurements were not
made of the pre-set patterns within the phantom which would have
alerted the operator to the probable 'correct' value required.
The ultrasound phantom
The phantom is a solid, ultrasound-permeable block of rubberized
material in which are embedded wires and cylinders which, when
scanned by ultrasound, reflect the appearance of either white spots
or black filled circles of different diameters and consistency against
a background echo scattering designed to represent liver tissue. They
are situated at different levels and therefore can be used as both far
and near objects (relative to the scanning probe). The technique of
scanning a phantom is quite critical and the transducer needs to be
aligned correctly to give the clearest picture. Similarly maximizing
adjustments to the machine (e.g. total gain settings) will give clearer
imaging. These, of course, are factors which would also affect the
scanning and visualization of human ovaries.
2465
G.L.Driscoll, J.P.P.Tyler and D.Carpenter
Participants
The phantom was taken to each unit in Sydney and one in Canberra (see
Acknowledgements for a list of the participating assisted reproductive
technology units) and every operator who performed ultrasound
measurements for that programme given the opportunity to take part
in the study. The type of instrument used, the probe frequency
and the status of the operator (trained ultrasonographer, medical
practitioner, nurse specialist etc.) was also recorded. Each was shown
the layout of the phantom before measurements were made.
Measurements
Ten values were requested (reproduced in Figure 1 and indexed as
measurements 1-10 in the results). These were distances between
different-sized 'circles', so had no exact distance by which a 'wanted'
value could be guessed. Within the 10 determinations three sets of
paired measurements were made and the final value (measurement
10) was a calibrated distance of 2 cm. Thus an assessment of an
instrument's calibration and inter-observer variation in assessing
distance could be made. Each operator performed the measurements
without knowledge of the values obtained by the others.
8mm
6mm
4mm 3mm
2mm
echogenic
Figure 1. A diagram (not to scale) of the layout of the phantom.
Participants were requested to measure the distances denned by the
numbered 'lines' (i.e. outer circumference to outer circumference
for measurements 3, 6 and 8; inner circumference to inner
circumference for measurements 2, 5 and 9; diameter for
measurement 4, etc.).
Results
Twenty-four operators from seven units completed the measurements. Five were trained ultrasonographers, 10 were experienced clinical nurse specialists, seven were experienced
medical practitioners and two were scientists with no experience in ultrasound methods. The instruments used were:
Ausonics Opus 1 with 7.5 MHz probe (10 operators), Ausonics
Micro with 7.5 MHz probe (three operators), Siemens with
5.0 MHz probe (two operators), Acuson 128XP4 with 5.0 MHz
probe (one operator), RT 3600 with 5.0 MHz probe (one
operator), GERT with 5.0 MHz probe (one operator), Toshiba
140 with 7.5 MHz probe (one operator), ATL50 with 5.0 MHz
probe (one operator) and Sonoace 1500 with 7.5 MHz probe
(four operators).
A summary of the results for each requested measurement
is given in Table I. Mean values were normally distributed
(mean approximated median value) and the measurements
(mean range 10-32 mm) approximated the clinical situation
in determining follicular diameter. However, the coefficient of
variation was large and ranged between 10.1 and 29.9%.
Furthermore it appeared unrelated to the size of the measurement (correlation of linear regression, r = 0.38) or to the
distance of the object from the probe.
For those measurements that were replicated (3 vs 8; 3 vs
6; 6 vs 8; 2 vs 5) no statistically significant difference could
be found (P > 0.05) using Student's paired Mest. While the
numbers were not large it was also possible to suggest that
probe type (5.0 or 7.5 MHz) and operator experience also did
not markedly affect measurement performance. However some
experienced nurse specialists did have problems in clearly
defining the 'deeper' and smaller measurements (numbers 1,
2, 6 and especially 5) suggesting a lack of knowledge of the
ability to optimize their machine settings. Of interest was that
measurements generated by the two novice participants always
fell within the interquartile range.
The distribution of values for each measurement is diagrammatically represented as box plots in Figure 2 where 'outliers'
can be identified and the notched box represents the interquartile range (25th to 75th percentiles). Thus even with a
defined distance (measurement 10 which was factory set at
20 mm) the range was between 18.0 and 29.4 mm. A line plot
Table I. Summary statistics (in centimetres) for each measurement requested in the pilot external quality
assurance study
Measurement no.a
3
No. of operators'
Mean distance
SD
Coefficient of
variation (%)
Median value
Minimum value
Interquartile range
Maximum value
a
1
2
3
4
5
6
7
8
9
10
22
1.76
0.34
19.0
22
1.06
0.25
23.4
24
3.21
0.58
18.2
24
1.56
0.47
29.9
20
1.04
0.19
18.6
22
3.24
0.52
16.1
24
2.13
0.22
10.1
24
3.16
0.58
18.5
24
1.73
0.32
18.6
24
2.18
0.29
13.3
1.70
1.06
0.30
2.40
1.04
0.70
0.16
1.94
3.18
1.39
0.47
4.30
1.40
1.17
0.30
3.40
1.04
0.64
0.16
1.64
3.20
1.90
0.30
4.87
2.10
1.29
0.12
2.50
3.12
1.86
0.33
5.11
1.80
1.10
0.54
2.30
2.10
1.80
0.27
2.94
Refer to Figure 1.
Where n does not equal 24 some operators were unable to assess a measurement because of technical
difficulties.
b
2466
Ultrasound variation in assisted reproductive techniques
4
5
Measurement
3
4
5
6
7
Measurement number
Figure 2. Notched box plots of all data for each measurement
(refer to Figure 1) showing median values (notch), the interquartile
range (box), 10th and 90th percentiles (whiskers) and outliers.
of the data (Figure 3) further suggests that no one particular
observer or instrument consistently affected the variation seen
in these data.
Discussion
All aspects of pathology laboratories in Australia are required
to be accredited by the National Agency of Testing Authorities
(NATA). An integral component of this registration is demonstration of competence in an external quality assurance scheme.
Similarly the Australian assisted reproductive technology
licensing body [the Reproductive Technology Accreditation
Committee (RTAC)] also requires careful monitoring of invitro fertilization culture laboratories, treatment consent forms,
patient satisfaction with information brochures and counselling
practice etc. Ultrasound measurement of ovarian follicles
remains the only 'uncontrolled' aspect of assisted reproductive
technology treatment.
Notwithstanding the question of suitability of tissue-mimicking phantoms in assessing equipment calibration (Letter to the
Editor, 1985), and the importance of the observer as a significant
source of variability in clinical data assessment (Corson et al.,
1995), this pilot survey has demonstrated a large variation in
the ability of the participants to determine distance in an
ultrasound phantom. It could be argued that this is of little
consequence for assisted reproductive technology units since
pregnancy rates across Australia appear similar (Lancaster
et al., 1995) and an 'eyeball' approach to follicular size and
number may be all that is necessary in clinical practice,
particularly to limit the potential for ovarian hyper stimulation.
However, the corollary of this is that the phantom is made
using stable rigid bodies and thus should not be as difficult to
measure as the ovary where it might be argued that variation
could be even larger.
Thus an error of 18% would mean that a follicle actually
measuring 18 mm could be anywhere between 14.7 and
21.2 mm. While the exact relationship between follicle volume
6
7
number
Figure 3. A line plot for each measurement for each observer
demonstrating that no one instrument or person consistently 'read'
high or low.
and oocyte competence is still not precisely known, it is widely
accepted that an oocyte collected from a smaller follicle
(<16 mm) would generally be less competent than that from
a larger one (>18 mm). Similarly, experienced ultrasonographers are capable of distinguishing healthy from atretic
follicles but admit that it is 'impractical to observe and assess
more than four or five follicles in one ovary by ultrasound', a
scenario often encountered in women stimulated for assisted
reproductive technology (Fukuda et al., 1995).
Furthermore, intracytoplasmic sperm injection (ICSI) is
demonstrating that there is as much variation in oocyte quality
as there is in sperm morphology and motility. Should a woman
decide not to use ovulating drugs for her assisted reproductive
technology attempt then the current approach (i.e. as long as
there is a reasonable group of follicles 'some' will be competent) ceases to be clinically applicable and the need to
maximize oocyte quality based on biochemical and follicular
maturity becomes paramount.
Finally, in this survey the lowest coefficient of variation
was 10%, a value generally considered the upper limit of
allowable error in many pathology tests. The resolution of
ultrasound determination is obviously limited by many factors
(machine settings, patient compliance etc.) and since ovarian
follicles are not usually spherical the interpretation of their
volume will remain subjective. In conclusion this pilot study
suggests the need for an EQA programme for ultrasound
in assisted reproductive technology programmes. Individual
clinics could consider making their own 'perishable' phantoms
(Gibson and Gibson, 1995) for in-house assessment or develop
within their geographic areas an external programme such as
has been established for mammography (Royal Australian
College of Radiology, 1996).
Acknowledgements
The authors thank Dr A.Ekangaki, of Macquarie University, for
statistical assistance, the Ultrasonics Laboratory, CSIRO, for the loan
of the phantom for this study and the following Medical Directors
for allowing their staff to participate in this survey: Dr Geoffrey
Driscoll, CityWest IVF, 12 Caroline St, Westmead. NSW 2145; Dr
Robert Jansen, Sydney IVF, 4 O'Connell St, Sydney, NSW 2000; Dr
Trevor Johnson, Royal Hospital for Women Fertility Group, 253
2467
G.L.Driscoll, J.P.P.Tyler and D.Carpenter
Oxford St, Bondi Junction, NSW 2022; Dr David Macourt, St George
Fertility Centre, 172-174 Forrest Rd, Hurstville, NSW 2220; Dr Ric
Porter, North Shore ART, Royal North Shore Hospital, St Leonards,
NSW 2065; Dr Howard Smith, Human Reproduction Unit, Westmead
Hospital, Westmead, NSW 2145; Dr Martyn Stafford-Bell, Canberra
Fertility Centre, John James Memorial Hospital, Strickland Cres,
Deakin, ACT 2605.
References
Arger, P.H. (1995) Ultrasound Practice Accreditation. American Institute of
Ultrasound in Medicine Reporter, 11, Issue 8, September.
Bahmaie, A. and Edmonds, D.K. (1996) Booking ultrasound examination
performed by obstetric senior house officers — Time to reevaluate? Aust.
NZJ. Obstet. GynaecoL, 36, 389-391.
British Medical Ultrasound Society (1996) Bulletin, August.
Clark, P.D. (1988) Performance checks. In Lerski, R.A. (ed.), Practical
Ultrasound. IRL Press, Oxford, pp. 74-76.
Corson, S.L., Batzer, F.R., Gocial, B. et al. (1995) Intra-observer and interobserver variability in scoring laparoscopic diagnosis of pelvic adhesions.
Hum. Reprod., 10, 161-164.
Dudley, N.J. and Griffith, K. (1996) The importance of rigorous testing of
circumference measuring callipers. Ultrasound Med. Biol., 22, 1117-1119.
Fish, P. (1990) Physics and Instrumentation of Diagnostic Medical Ultrasound.
Wiley, Chichester.
Forman., R.G., Robinson, J., Yudkin, P. et al. (1991) What is the true follicular
diameter: an assessment of the reproducibility of transvaginal ultrasound
monitoring in stimulated cycles. Fertil. Steril., 56, 989-992.
Fukuda, M, Fukuda, Y., Yding Andersen, C. and Byskov, A.G. (1995) Healthy
and atritic follicles: vaginosonographic detection and follicular fluid hormone
profiles. Hum. Reprod., 10, 1633-1637.
Gibson, R.N. and Gibson, K.I. (1995) A home-made phantom for learning
ultrasound-guided invasive techniques. Aust. Radiol., 39, 356-357.
Gore, M.A., Nayudu, PL., Vlaisavljevic, V. and Thomas, N. (1995) Prediction
of ovarian cycle outcome by follicular characteristics, stage 1. Hum. Reprod.,
10, 2313-2319.
Hailey, D.M. (1995) Ultrasound makes rapid gains in Australia. Diagnostic
Imaging Asia Pacific, June, 25-27.
Henderson, K. (1996) Quality assurance and the maintenance of professional
standards. Australian Society Ultrasound Medicine Newsletter, June, 4.
Lancaster, P., Shafir, E., Hurst, T. and Huang, J. (1997) Assisted Conception
Australia and New Zealand 1994 and 1995. Fertility Society of Australia,
4-6.
Perkins, A.C. (1985) The suitability of tissue mimicking phantoms for
equipment calibration. [Letter.] Ultrasound Med. Biol., 11, 193-194.
Royal Australian College of Radiology (1996) Mammography quality
assurance programme. Newsletter 48, 22-23.
Weber, A. (1996) Voluntary accreditation aims to deter regulation. Diagnostic
Imaging Asia Pacific, January, pp. 45-47.
Received on February 4, 1997; accepted on July 8, 1997
2468