Med Bio Eng Comput (2008) 46:109–120
DOI 10.1007/s11517-007-0299-2

ORIGINAL ARTICLE

An informative probability model enhancing real time echobiometry to improve fetal weight estimation accuracy

G. Cevenini · F. M. Severi · C. Bocchi · F. Petraglia · P. Barbini

Received: 4 May 2007 / Accepted: 28 November 2007 / Published online: 10 January 2008
International Federation for Medical and Biological Engineering 2007

G. Cevenini (corresponding author) · P. Barbini
Department of Surgery and Bioengineering, University of Siena, Viale Mario Bracci 16, 53100 Siena, Italy

F. M. Severi · C. Bocchi · F. Petraglia
Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena, Viale Mario Bracci 16, 53100 Siena, Italy

Abstract A multinormal probability model is proposed to correct human errors in fetal echobiometry and improve the estimation of fetal weight (EFW). Model parameters were designed to depend on major pregnancy data and were estimated through feed-forward artificial neural networks (ANNs). Data from 4075 women in labour were used for training and testing ANNs. The model was implemented numerically to provide EFW together with probabilities of congruence among measured echobiometric parameters. It enabled ultrasound measurement errors to be checked in real time and corrected interactively. The software was useful for training medical staff and standardizing measurement procedures. It provided multiple statistical data on fetal morphometry and aid for clinical decisions. A clinical protocol for testing the system's ability to detect measurement errors was conducted with 61 women in the last week of pregnancy. It led to decisive improvements in EFW accuracy.

Keywords Probability model · Neural networks · Ultrasound · Echobiometry · Fetal weight estimation

1 Introduction

Many decisions in obstetrics depend on gestational age (GA) and fetal weight (FW). Accurate ultrasound examination performed before 20 weeks of gestation enables true GA to be estimated [42].
On the other hand, estimation of FW (EFW) using standard biometric parameters, usually related to geometric dimensions of the fetal head, abdomen and long bones of the extremities, is still problematic [18]. Monitoring of fetal growth is fundamental in modern perinatology because it is strictly related to fetal/neonatal wellbeing [43]. Moreover, identification of abnormal intrauterine growth patterns enables better pregnancy management [10, 21, 43]. In the last 30 years, many methods have been developed to improve EFW accuracy, most based on formulae derived by regression analysis [3, 16, 22, 23, 25, 27, 33, 35, 36, 38, 41, 44], or on physical models [2, 14, 17, 29]. Artificial neural networks (ANNs) and volumetric methods based on three-dimensional (3D) ultrasonography were also recently proposed [11, 20, 40]. Clinical use of these mathematical models led to the introduction of EFW in ultrasound reports. Although the models were effective in the original papers, ultrasound operators know that every estimation model loses efficacy when applied in clinical practice [9, 17]. The differences between accuracies in the literature and those obtained in local clinical institutions are due to many factors, the main ones being significant statistical dissimilarity between original and local populations and samples, diversities in echobiometric measurement procedures and lack of model generalization. Little attention has usually been paid to generalization, which refers to a model's ability to provide the same accuracy on data not used for model identification [5]. Specifically, empirical formulae do not guarantee a good compromise between model flexibility to fit all useful information and robustness to filter useless data variability. Too many model parameters have been estimated from few ultrasound cases near delivery. Sometimes fetuses with non-homogeneous weight or GA intervals not representative of the whole population are used.
In other cases the clinical condition of women in labour is neglected or incorrectly reported. Although attempts have been made to reduce statistical sample errors and lack of generalization power by selecting the most accurate and representative models, a percentage mean absolute error less than 7–8% of the true birth weight (BW) has never been achieved in current clinical practice, with 25% (or more) of estimates having an absolute error over 10% [29]. Unfortunately, since most obstetricians take 10% as a critical error threshold above which EFW cannot guarantee correct clinical management, the method cannot yet be considered reliable for clinical decision-making [7, 17]. Though many attempts have been made to reduce estimation errors by means of models specialized in particular ranges of FW or GA [16, 23, 36], or derived from sophisticated 3D and ANN methods [11, 40], it has not proven possible to significantly reduce the error, because it is presumably due to many different unpredictable factors (human, environmental, instrumental, technological, etc.) associated with digital processing of echobiometric values [17]. Since the 10% error limit for all populations of fetuses is not so far away, there is great interest in finding solutions that could improve EFW accuracy enough to reach the goal. At present, the only way to enhance fetal weight prediction accuracy seems to be reduction of operator measurement error. Indeed, readings made by operators with long experience in fetal ultrasound have significantly lower errors, though still not low enough. This paper describes a computerized information system to help ultrasound operators in the control and interactive correction of measurement errors in two-dimensional fetal biometry. It is based on a Gaussian multivariate (multinormal) probability model, the parameters of which are identified by ANNs trained with sample data representing a wide fetal population.
Therefore, it properly belongs to machine learning methods, which are widely used in computing applications to support clinical decision making. The effective level of real-time improvement in the accuracy of EFW was tested clinically in a small sample of pregnant women.

2 Methods

2.1 Population and samples

To design the model we used data of 4,075 fetuses in the last week before birth, recorded in our clinics over the last 10 years. Only fetuses with evident malformations were excluded from the database, which was divided into three samples equally representative of the fetal population: a training set and a validation set of the same size from the first 3,200 fetuses, the former by odd positions and the latter by even positions of the chronologically ordered list; the last 875 cases constituted a testing set. The training and validation sets were used for model training, whereas the testing set was used to check that model performance remained statistically equivalent with new data (generalization ability). Finally, the system was applied in clinical practice to 61 pregnant women in the last week before delivery to verify its effective capacity to support interactive correction of real-time ultrasound measurements and to improve EFW accuracy.

2.2 Measurement variables

Fetal echobiometric data, including biparietal diameter (BPD), head and abdominal circumferences (HC, AC), and femur length (FL), were measured by transabdominal ultrasound scan with a Siemens Sonoline Elegra Millenium Edition ultrasound system or a MYLAB Family instrument (ESAOTE spa, Genova, Italy). Gestational age (GA) in weeks was established by accurate menstrual history confirmed by ultrasound examination before the 20th week of gestation. True FW was determined by measuring birth weight (BW) with a precision balance soon after delivery. BW was the dependent variable used to train our model to estimate FW from ultrasound scans just before delivery.
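The chronological split of Sect. 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are counted from 1, the function name `split_chronological` is hypothetical, and integer identifiers stand in for the real case records.

```python
def split_chronological(cases, n_design=3200):
    """Split a chronologically ordered list of cases into three samples.

    Positions are counted from 1, so the 1st, 3rd, ... of the first
    n_design cases go to training and the 2nd, 4th, ... to validation.
    All remaining cases form the testing set.
    """
    design = cases[:n_design]
    training = design[0::2]    # odd positions (1st, 3rd, ...)
    validation = design[1::2]  # even positions (2nd, 4th, ...)
    testing = cases[n_design:]
    return training, validation, testing


# Placeholder identifiers 1..4075 stand in for the real fetal records.
cases = list(range(1, 4076))
training, validation, testing = split_chronological(cases)
print(len(training), len(validation), len(testing))  # 1600 1600 875
```

Interleaving by position keeps both design sets spread over the same recording period, which is one plausible reading of "equally representative of the fetal population".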
Essential pregnancy data, namely amniotic fluid volume (AF), number of fetuses (FN) and number of days between last ultrasound examination and delivery (US-D), were also entered in the training process. AF was conceived as a binary-coded qualitative variable with four categories: normal, absent, reduced and augmented volume. US-D ranged from 0 (i.e. ultrasound examination and delivery on the same day) to 6 (i.e. ultrasound examination 6 days before delivery).

2.3 Multinormal probability model

To describe the probability space of the ultrasound measurements we used the multivariate Gaussian density function:

\[ p(x/w) = \frac{1}{(2\pi)^{d/2}\,|\Sigma(w)|^{1/2}} \exp\left\{ -\frac{1}{2}\,[x-\mu(w)]^{T}\,\Sigma^{-1}(w)\,[x-\mu(w)] \right\} \tag{1} \]

where T is the vector transposition operator, d = 5 the parameter space dimension, x = [BPD HC AC FL GA] the vector of current echobiometric parameters, w = [BW AF FN US-D] an information vector conditioning density function (1), and $\mu(w)$ and $\Sigma(w)$ the mean vector and covariance matrix, respectively, of parameters which depend on w and have to be estimated to completely define the probability model (1).

2.4 Artificial neural networks

Three feed-forward ANNs were designed to estimate the parameters $\mu(w)$ and $\Sigma(w)$ of the multivariate normal model. They were made sufficiently flexible (sufficient number of hidden neurons and appropriate functions of neuron activation) to encompass all deterministic data patterns. Proceeding by trial and error, we selected an ANN architecture having ten neurons in a single hidden layer. It offered a good compromise between simplicity and generalization ability through error minimisation. Hidden neurons were equipped with biased tansig activation functions. The output neurons had linear activation for estimating model parameters. The input data were standardized before presentation to the network, so as to have zero mean and unit standard deviation. Standardization has been shown to increase the efficiency of ANN training [6].
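As a rough illustration of the two ingredients just described, the Python sketch below standardizes one input variable and runs a forward pass through a network with one hidden tanh layer and linear outputs (MATLAB's tansig is mathematically the hyperbolic tangent). It is not the authors' implementation: the function names, the use of the population standard deviation, the toy weights and the sample birth weights are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values], m, s


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh (tansig) layer followed by a linear output layer."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Hypothetical birth weights in grams, used only to show the rescaling.
bw = [2900, 3100, 3300, 3500, 3700]
bw_std, bw_mean, bw_sd = standardize(bw)

# Untrained toy weights: 1 input, 10 hidden neurons, 5 linear outputs.
n_hidden, n_out = 10, 5
w_hidden = [[0.1 * (j + 1)] for j in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[0.0] * n_hidden for _ in range(n_out)]  # zero weights: output = bias
b_out = [1.0, 2.0, 3.0, 4.0, 5.0]
outputs = forward([bw_std[0]], w_hidden, b_hidden, w_out, b_out)
```

The stored mean and standard deviation would have to be reused to rescale any new input before prediction, so that training and prediction see data on the same scale.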
The first ANN, ANN1, was designed to estimate the model mean vector, $\mu(w)$, for each combination of pregnancy information w, considered as input data. A block diagram of ANN1 is shown in Fig. 1, where the training (T) and prediction (P) phases are in the upper and lower left sides, respectively. Specifically, ANN1 is trained to recognize the set of echobiometric measurements x, i.e. BPD, HC, AC, FL and GA, from input data w, i.e. BW, AF, FN and US-D. Once trained, ANN1 predicts the corresponding most likely (expected) parameter values of x, i.e. BPD, HC, AC, FL and GA, for any given set of pregnancy information. These expected values are assumed as a reliable estimation of the mean parameter vector $\mu(w)$. The ANN1 prediction phase is reported in Fig. 1 because it is necessary to obtain parameter deviations, $[x_i - \mu_i(w)]$ (i = 1, 2, …, 5), namely the differences between an echobiometric measurement, $x_i$, and its corresponding mean value, $\mu_i$, estimated by ANN1 as a function of input data w. In the centre of Fig. 1 the calculation of deviations is illustrated, together with their squared values, i.e. deviances $\delta_i^2 = [x_i - \mu_i(w)]^2$, and all their paired products, i.e. codeviances $\delta_i\delta_j = [x_i - \mu_i(w)][x_j - \mu_j(w)]$ (i ≠ j; i, j = 1, 2, …, 5). The two remaining ANNs, ANN2 (upper right side of Fig. 1) and ANN3 (lower right side of Fig. 1), were then trained to recognize deviances and codeviances, respectively. Once trained, ANN2 and ANN3 could therefore estimate the expected values of deviances and codeviances, $E\{[x_i - \mu_i(w)]^2\}$ and $E\{[x_i - \mu_i(w)][x_j - \mu_j(w)]\}$, respectively, which were taken as suitable estimations of variances $\sigma_i^2$ and covariances $\sigma_i\sigma_j$ of model covariance matrix $\Sigma(w)$. Of all the pregnancy information, only BW was assumed to affect the model covariance matrix.

Fig. 1 Block diagram of the feed-forward ANN training process
It is well known that the inferential process benefits from a reduction of data dimensions, especially when a large number of parameters (matrix elements) have to be estimated [6]. Significantly improved accuracy of estimates largely compensates for the lack of other pregnancy information. ANN2 and ANN3 were therefore equipped with a single BW input (see right of Fig. 1). Their prediction phase is not reported in Fig. 1, to avoid unnecessary detail. All the ANNs were trained using a batch training method which updates synaptic weights and neuron biases only after all inputs and targets have been presented, i.e. once per iteration. An iterative training algorithm using gradient descent with momentum and adaptive learning rate was used to minimise the mean squared error between real and predicted outputs. To limit the influence of training algorithm initialization on the solution, we performed 99 training sessions starting from 99 different randomly-selected initial values of ANN parameters (i.e. synaptic weights and biases), and chose the session giving the median error value (50th sorted value). The early-stopping method was applied directly during the training process to control ANN generalization power and avoid the problem of overfitting [6, 24]. At each iteration, training and validation errors were calculated from data used to train the ANN (training set) and to validate generalization (validation set), respectively. Training was stopped when the validation error did not decrease for ten consecutive iterations. Testing data were then used to confirm generalization on a third set of cases that had not been used during training.
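The two safeguards described above, early stopping and selection of the median session, can be sketched as follows. This is a minimal Python illustration, not the original Matlab code: the function names are hypothetical, and "did not decrease" is interpreted here as "did not fall below the best validation error seen so far", which is an assumption.

```python
def early_stopping_iteration(validation_errors, patience=10):
    """Return the 1-based iteration at which training stops, or None.

    Training stops once the validation error has failed to improve on
    its best value for `patience` consecutive iterations.
    """
    best = float("inf")
    stale = 0
    for iteration, err in enumerate(validation_errors, start=1):
        if err < best:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return iteration
    return None  # the error kept improving; no early stop


def median_session(final_errors):
    """Index of the session whose final error is the median.

    With 99 sessions this is the 50th value in sorted order.
    """
    order = sorted(range(len(final_errors)), key=lambda i: final_errors[i])
    return order[len(order) // 2]


# Validation error improves for 3 iterations, then stalls for 10.
print(early_stopping_iteration([5, 4, 3] + [3.5] * 10))  # 13
```

Choosing the median rather than the best of the 99 sessions avoids rewarding a lucky initialization, at the cost of a slightly higher error on the design data.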
2.5 Fetal weight estimation

The principal aim of this study was to predict FW, which was strictly related to BW for training ANNs. BW is the first component of pregnancy information vector w and cannot be known for an unborn fetus. In the case of a fetus, whose mathematical expressions will be denoted with a tilde (~) above the symbol, knowledge of the other three components of vector $\tilde{w}$, that is AF, FN and US-D, and of its measured echobiometric parameters, $\tilde{x}$, allows ANN1 to identify the vector of expected parameters, $\tilde{\mu}(BW)$, as a function of unknown BW. It identifies five monotonic curves on which five expected values of BW can be found corresponding to actual measurements $\tilde{x}$; they are expressed by the five-dimensional vector BWexp. The most probable value of BW, BWmp, corresponding to $\tilde{x}$, can be derived from model (1) by calculating the volume of the confidence region in parameter space, as follows. Once the available pregnancy data of information vector $\tilde{w}$ are known, the volume depends only on its first, unknown component, BW, and describes the cumulative conditional probability of $\tilde{x}$, representing the strength of association between true fetal weight and its just-measured ultrasound parameters. The higher the volume, the more measurements are expected to be mutually congruent and accurately related to the associated weight. The confidence region can be described mathematically by considering the scalar quantity in the exponential term of model equation (1):

\[ Q = \delta^{T}\,\Sigma^{-1}\,\delta \tag{2} \]

where $\delta = x - \mu$ represents the vector of generic parameter deviations. Q is a quadratic form which was demonstrated to be distributed as $\frac{d(n^2-1)}{n(n-d)}$ times a Fisher density function, F, with d and (n − d) degrees of freedom [28].
In our application, the number of fetuses n, used for model designing, was much greater than the parameter space dimension d, so that the valid approximations $(n^2 - 1) \approx n^2$ and $(n - d) \approx n$, and therefore $\frac{d(n^2-1)}{n(n-d)} \cong d$, were used for simplifying. Thus, the confidence region at probability level $\alpha$ can be defined as the locus of parameter deviations, $\delta$, which satisfy the following inequality:

\[ Q \le d\,F_c^{-1}(d, n, \alpha) \tag{3} \]

where $F_c^{-1}$ is the inverse of the cumulative F distribution, $F_c$, with d and n degrees of freedom, evaluated at the probability level $\alpha$. Equation (3) describes a five-dimensional hyperellipsoidal region. The probability, $\tilde{\alpha}$, defines the volume of the hyperellipsoid on whose surface the current measurements, $\tilde{x}$, lie. It can be derived by inverting Eq. (3):

\[ \tilde{\alpha} = F_c(d, n, \tilde{Q}/d) \tag{4} \]

where $F_c$ is evaluated at the value $\tilde{Q}/d$ and $\tilde{Q}$ is calculated from formula (2) using $\tilde{\delta} = \tilde{x} - \tilde{\mu}$. The quadratic form of (3) implies a unique maximum, $\tilde{\alpha}_{max}$, for $\tilde{\alpha}$. It corresponds to a value of BW necessarily located in the interval between the minimum and the maximum value of vector BWexp. Though $\tilde{\alpha}_{max}$ could theoretically be evaluated analytically, for practical reasons we did a numerical search among all $\tilde{\alpha}$ values corresponding to the same number, N, of BW sampling values, spaced at steps, $\Delta BW$, of 10 g, that is

\[ \tilde{\alpha}_{max} = \max_i \{\tilde{\alpha}(BW_i)\}_{1}^{N}, \quad BW_1 = \min BW_{exp}, \quad BW_N = \max BW_{exp}, \quad BW_{i+1} = BW_i + \Delta BW, \quad \Delta BW = 10\ \mathrm{g} \tag{5} \]

BWmp was chosen to correspond with the region of maximum probability volume, $\tilde{\alpha}_{max}$, and was assumed as the current EFW, even long before birth. It represents the most plausible value of FW associated with the available pregnancy information and the current echobiometric measurements, taken together.
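The numerical search of Eq. (5) amounts to scanning a 10 g grid between the smallest and largest univariate estimates in BWexp and keeping the weight with the highest probability. The Python sketch below shows only that search. The function name `most_probable_bw` is hypothetical, and `alpha` is a stand-in callable: in the paper the probability comes from Eq. (4) with ANN-estimated mean vector and covariance matrix, which are not reproduced here. The toy probability curve and the BWexp values are invented for the example.

```python
def most_probable_bw(bw_exp, alpha, step=10):
    """Grid search for the birth weight that maximises alpha(BW).

    bw_exp : the univariate expected birth weights (grams)
    alpha  : callable returning the probability for a given weight
    step   : grid spacing in grams (10 g in Eq. 5)
    """
    lo, hi = min(bw_exp), max(bw_exp)
    best_bw, best_alpha = lo, alpha(lo)
    bw = lo + step
    while bw <= hi:
        a = alpha(bw)
        if a > best_alpha:  # strict: the first maximum is kept on ties
            best_bw, best_alpha = bw, a
        bw += step
    return best_bw, best_alpha


# Toy single-peaked curve with its maximum at 3,330 g (illustrative only).
def toy_alpha(bw):
    return 1.0 / (1.0 + ((bw - 3330) / 200) ** 2)


bw_exp = [3100, 3250, 3300, 3400, 3600]  # hypothetical univariate estimates
bw_mp, alpha_max = most_probable_bw(bw_exp, toy_alpha)
print(bw_mp, alpha_max)  # 3330 1.0
```

If the span of BWexp is not a multiple of the step, the last grid point in this sketch falls just below the maximum of BWexp; at 10 g resolution this has no practical effect on the estimate.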
The vector, $\tilde{\mu} = \mu(BW_{mp})$, of expected parameter values evaluated at BWmp, provides model deviations, $\tilde{\delta}_m = \tilde{x} - \tilde{\mu}$, from actual measurements, and their probabilities, $\tilde{\alpha}_m$, which account for measurement errors and morphological characteristics of fetal physiopathology. $\tilde{\alpha}_m$ can be derived by projecting the multivariate normal model (1) along any generic parameter axis, $x_k$ (k = 1, 2, …, 5), as follows:

\[ p(x_k/w) = \frac{1}{\sqrt{2\pi\tilde{\sigma}_k^2}} \exp\left\{ -\frac{1}{2}\,\frac{(x_k - \tilde{\mu}_k)^2}{\tilde{\sigma}_k^2} \right\} \tag{6} \]

where $\tilde{\mu}_k$ is of course the kth component of $\tilde{\mu}$ and $\tilde{\sigma}_k^2$ is the corresponding variance from the principal diagonal of covariance matrix $\tilde{\Sigma} = \Sigma(BW_{mp})$. Any component $\tilde{\alpha}_{m,k}$ of vector $\tilde{\alpha}_m$ can therefore be calculated from (6):

\[ \tilde{\alpha}_{m,k} = \begin{cases} 1 - 2\displaystyle\int_{-\infty}^{\tilde{x}_k} p(x_k/w)\,dx_k & \text{if } \tilde{x}_k \le \tilde{\mu}_k \\[2ex] 1 - 2\displaystyle\int_{\tilde{x}_k}^{+\infty} p(x_k/w)\,dx_k & \text{if } \tilde{x}_k > \tilde{\mu}_k \end{cases} \tag{7} \]

Accuracy of EFW was evaluated by computing the mean absolute percentage error, MAE%:

\[ MAE\% = \sum_{i=1}^{N} \frac{AE_i}{N}\,100, \qquad AE_i = \frac{|EFW_i - BW_i|}{BW_i} \tag{8} \]

where $AE_i$ is the relative absolute error of the model in predicting the i-th fetal weight.

2.6 Clinical evaluation of model performance

Our method for real-time control of fetal echobiometry was then tested for its effective ability to detect and correct measurement errors and therefore improve accuracy in EFW. Ultrasound parameters of 61 fetuses were evaluated within 5 days of delivery in the Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena, by real-time interaction with our multinormal model, implemented numerically by software developed in Matlab language [19]. To investigate whether the system was able to appropriately correct measurement errors that are difficult to detect and to significantly improve the accuracy of EFW, an obstetrician with good experience in ultrasound (at least 2 years' experience) was chosen to perform fetal biometry. Ultrasound data were entered in the model to evaluate the probability of agreement among measured fetal biometric parameters and actual EFW. On the basis of clinical evidence, the model-estimated maximum probability, $\tilde{\alpha}_{max}$, corresponding to the most probable EFW (i.e. BWmp), and the congruence probabilities of the parameters, $\tilde{\alpha}_m$, the operator decided autonomously whether or not to correct the first set of measurements and to proceed with further refined measurements. Specifically, for each set, $\tilde{x}$, of measured echobiometric parameters, the operator was advised to consider possible measurement errors when at least one of the $\tilde{\alpha}_m$ parameter probabilities was less than 50% or when the EFW probability, $\tilde{\alpha}_{max}$, was less than 50%. In this case, the operator decided to make new ultrasound measurements or to keep the current measurements, depending on his/her clinical experience and on case-specific clinical information.

Improvements of accuracy in EFW were assessed by applying our interactive method on-line to the 61 above-mentioned pregnant women in the last week before delivery. We calculated mean and maximum AE% (MAE% and AEmax%) and the percentage of FW having AE% greater than 10% (AEgt10%). The effectiveness of measurement error correction was also evaluated using some mathematical models from the literature [3, 14, 22, 25, 33, 35, 44] proven to give performance equivalent to our model by error comparison using the non-parametric statistical test of Wilcoxon [1].

3 Results

3.1 Model estimation of fetal weight

Model performance was statistically equivalent for the training, validation and testing data sets (Wilcoxon test, p > 0.05). We therefore report the results for the entire data set used for model design. Figure 2 shows the distribution of percentage error in relation to birth weight for the multinormal probability model and the seven models which gave statistically equivalent performance on the 61 data items used for evaluating our model in real-time clinical practice.
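The three accuracy measures used throughout the results (MAE% from Eq. (8), AEmax% and AEgt10%) can be computed as in the Python sketch below. The function name `efw_error_metrics` is hypothetical. The first pair of weights is the case quoted in the clinical evaluation (true weight 3,640 g, EFW 3,250 g); the other two pairs are invented purely for illustration.

```python
def efw_error_metrics(efw, bw, threshold=10.0):
    """Return (MAE%, AEmax%, AEgt10%) for estimated vs. true weights.

    efw, bw   : estimated fetal weights and birth weights (grams)
    threshold : percentage error regarded as clinically critical
    """
    ae = [100.0 * abs(e - b) / b for e, b in zip(efw, bw)]
    mae = sum(ae) / len(ae)                       # mean absolute % error
    ae_max = max(ae)                              # worst single case
    gt = 100.0 * sum(a > threshold for a in ae) / len(ae)
    return mae, ae_max, gt


efw = [3250, 3000, 3500]  # first value from the paper, others hypothetical
bw = [3640, 3000, 3400]
mae, ae_max, ae_gt10 = efw_error_metrics(efw, bw)
print(round(ae_max, 1))  # 10.7
```

The first case reproduces the 10.7% maximum error reported for the multinormal model after correction, which is a quick consistency check on the definition of AE.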
Table 1 gives the MAE% and the percentage of cases with AE% greater than 10% (AEgt10%) for each model. As Fig. 2 shows, only our proposed multinormal model, by virtue of its probabilistic nature, has uniform non-biased behaviour over the whole range of BW. By contrast, all the other models based on regression techniques have an error distribution strongly influenced by training data density in BW space, the only exception being the Hadlock model, which has moderate bias because it was trained on a data set having a quite uniform BW distribution [22]. Table 1 shows that this model had errors very similar (MAE% = 7.81, AEgt10% = 30.8%) to our model (MAE% = 7.86, AEgt10% = 31.3%). In particular, Fig. 2 shows that the Ott [33], Combs [14], Woo [44] and Robson [35] models overestimate low BWs and underestimate high BWs, whereas the Hill [25] and Benson [3] models have different biases, underestimating low and high BWs and overestimating intermediate BWs. The lowest performances in Table 1 are shown by models particularly biased at high BWs. Cases with high errors generally also had low probabilities associated with our model EFW, presumably due to ultrasound measurement errors. Probability region boundaries with low probability values are therefore an inspection area in which measurement errors should be checked and where the accuracy of EFW could improve.

Fig. 2 Distribution of percentage error in relation to birth weight in our multinormal model and the other seven models selected to give statistically equivalent performance with our clinical data
Table 1 Model performance evaluated on the whole set of data (training, validation and testing sets) used to design the multinormal model

Model         MAE%    AEgt10%
Multinormal   7.86    31.3
Ott           7.45    27.2
Combs         8.43    33.1
Hill          8.00    29.7
Woo           7.53    28.7
Benson        8.43    32.6
Hadlock       7.81    30.8
Robson        7.74    30.0

Mean absolute percentage error MAE%; percentage of fetuses estimated to have an AE% greater than 10% AEgt10%

A prototypical numerical implementation of our model is shown in Fig. 3, which reports a screenshot of the graphical user interface of the underlying software. On the right side of Fig. 3 we have gestational information ($\tilde{w}$), actual measurements ($\tilde{x}$), probabilities of congruence among them ($\tilde{\alpha}_m$) and their model-estimated expected values ($\tilde{\mu}$). The lower the probability of parameter congruence, the more suspect that parameter has to be considered. High deviation ($\tilde{\delta}_m$) from expected values may be due to measurement errors. Excessively low probability values or low values of more than one parameter suggest that the ultrasound session should be repeated. Figure 3 (left side) shows the five plot windows of most probable parameter values (black lines) and standard deviations (light blue lines) in relation to BW, as estimated from ANNs. Dots around curves represent training data. At the top of the graphic windows are the EFW (BWmp) and its multivariate probability ($\tilde{\alpha}_{max}$). Again, the lower this probability, the more likely it is that large measurement errors, unusual body conformation, or both are present. When $\tilde{\alpha}_{max}$ is particularly low, at least one of the congruence probabilities $\tilde{\alpha}_m$ is low as well. Dashed blue lines mark both EFW (BWmp, vertical lines) and its corresponding model-estimated expected parameter values ($\tilde{\mu}$, horizontal lines). At the bottom of each plotting area, the univariate expected EFWs (BWexp, vertical dashed red lines) are reported with the measured parameter values ($\tilde{x}$, horizontal dashed red lines).
The multivariate most probable EFW, BWmp, is of course between the minimum and maximum of the five univariate BWexp values. Figure 3 shows an example of EFW by our system. It concerned a fetus at 40 weeks. The system indicates that measured head circumference (HC = 350 mm) has a low probability (10%) of being congruent with respect to other fetal biometric parameters and an EFW of 3,331 g (probability 13%). This could mean: (1) that the HC measurement is incorrect and that it needs to be measured again; (2) that fetal HC is correct but is bigger than expected because of hereditary predisposition; (3) that HC is bigger for pathological reasons. Only the operator's experience, if necessary with other clinical information, can answer this question.

Fig. 3 Graphic user-interface of interactive software for fetal echobiometry control and correction, to improve EFW accuracy

3.2 Clinical evaluation of model performance

In 16 out of 61 cases (26.3%) fetal biometry was measured once and in 45 cases it was repeated two or more times, to a total of 153 measurements. System performance was assessed by comparing its 61 initial FW estimates with those obtained without (16 cases) or with one (3 cases) or more (42 cases) re-measurements of ultrasound parameters associated with low (less than 50%) congruence probabilities. For comparison we used EFW derived from 182 formulas (from 59 published papers) [17]. Considering the 61 initial estimates, seven formulas [3, 14, 22, 25, 33, 35, 44] showed a performance statistically equivalent to our system (Wilcoxon test, P > 0.05). All other formulas gave significantly higher errors. Table 2 shows the performances of all models. It is evident that correction of detected errors yielded statistically significant improvements not only in our model EFW (MAE% from 6.5% to 2.6%) but also when the new biometry was tested by the seven best models (e.g. Hadlock formula MAE% from 6.7% to 3.5%), thus confirming that the system is able to correct measurement errors that affect model performance, worsening their accuracy. In particular, although the Hadlock model showed the second best decrease in MAE% after our system, we found a drastic reduction in error variability, with a maximum error of 9.0% (in the same fetus), lower than that made by our system (maximum error of 10.7%). Nevertheless, this maximum error of 10.7% is acceptable, because it concerns a normal weight fetus (real weight 3,640 g) that was underestimated by the system (EFW equal to 3,250 g). Other models also showed very good performance with few errors above 10%. In the cases we analyzed, MAE% was low at initial estimations because the measurements were made by an experienced operator. After correction (excepting two models), the percentage of cases with an error above 10% reduced to zero, as shown in Table 2. Maximum error was lower or just a little higher than 10%.

4 Discussion

Accurate prediction of BW by ultrasonographic measurement of classical fetal morphometric parameters plus other related pregnancy data, such as gestational age, amniotic fluid volume and number of fetuses, is of considerable interest in obstetrics, enabling clinicians to more accurately predict infant morbidity and mortality [17]. Moreover, EFW in utero is of great clinical interest for monitoring fetal growth [31, 34] and may have a central role in major medical decisions in critical conditions of preterm delivery and fetal macrosomia [15, 20, 35, 36]. Although many sophisticated mathematical formulas and models have been developed in the last 30 years [3, 11, 14–17, 20, 22, 23, 25, 27, 29, 33, 35, 36, 38, 41, 44], estimates still typically have too high an error variance, preventing reliable clinical use [13, 15, 17, 29].
Even operators with proven ability in ultrasound examination provide remarkably high percentages (15–25%) of fetuses whose BW is estimated with an AE% greater than 10%. This problem seems difficult to overcome because the many errors of fetal ultrasound evaluation are presumably due to technological and environmental factors, intra- and interobserver variability in fetal measurement, and so forth [17, 29]. There are currently unlikely to be major revolutions in technology, ultrasonographic practice and other methods that could significantly improve accuracy of measurements and/or their ability to predict BW more reliably. At the moment, it is not at all easy to quantify errors, and particularly to discriminate errors due to intra- and interobserver variability in ultrasound measurements. Efforts must be made to minimise this variability if EFW is to be considered clinically useful [17].

Table 2 Model performance evaluated in 61 pregnancies before (initial measurements) and after (ultimate measurements) zero (16 cases), one (3 cases), or more corrections (42 cases) of the initial ultrasound measurements

              Initial measurements          Ultimate measurements
Model         MAE%   AEmax%   AEgt10%       MAE%   AEmax%   AEgt10%
Multinormal   6.5    19.3     13.1          2.6    10.7     1.6
Ott           5.7    16.9     9.8           4.3    9.6      0.0
Combs         5.8    19.1     11.5          4.2    12.2     1.6
Hill          6.2    18.6     13.1          4.6    10.2     1.6
Woo           6.6    20.1     18.0          4.6    9.8      0.0
Benson        6.7    19.1     16.4          4.9    13.6     3.3
Hadlock       6.7    18.1     16.4          3.5    9.0      0.0
Robson        6.7    16.8     16.4          5.4    14.7     9.8

Corrections were decided autonomously by the operator using an interactive system based on the proposed multinormal model for fetal weight estimation: absolute percentage error AE%; mean absolute percentage error MAE%; maximum absolute percentage error AEmax%; percentage of fetuses estimated to have AE% greater than 10% AEgt10%

Many recent attempts have been made to reduce the estimation error on lower and higher FWs, where the clinical interest is of course focused.
In general, clinicians distinguish these two critical intervals of weight from an intermediate one that typically ranges from 2,500 to 4,000 g [16, 20, 23]. Almost all models for EFW exhibit a worsening of accuracy in critical weight classes (below 2,500 g and above 4,000 g) where lower/higher weights are usually over/under-estimated [13, 16, 29]. Most mathematical models are derived from statistical regressions and account nonlinearly for ultrasound measurements by fitting experimental data. They are therefore most accurate for intermediate weights, where experimental data have higher density, and produce increasing biases going from median to lower or higher FWs where data density progressively decreases. It is important to underline that it is precisely in the critical weight classes that weight estimation becomes fundamental from a clinical point of view. A dangerous increase in the rate of false normal weights arises. In other words, such biased models tend to give excessive reassurance about a normal FW, correctly identifying only very critical conditions that can be detected by simple qualitative investigations. Models specialized in critical weight ranges have also been constructed and tested: they are sometimes much more accurate in the range where they have been fitted and dramatically less accurate elsewhere, as would be expected [15, 17, 23, 29, 35, 36, 38, 41]. The use of these specialized models therefore requires prior knowledge about the weight range in which to classify the fetus, leading to dangerous amplification of errors in borderline areas which are of critical clinical interest. This also has legal implications for ultrasonographers, who may make gross errors with severe consequences for maternal and fetal health. Moreover, there have been several studies to evaluate the efficacy of mathematical models related to specific GA intervals [32, 41].
Although GA intervals are better defined than weight intervals, they are nevertheless affected by the precision of gestational age estimation, which becomes less accurate as pregnancy goes on, and GA is only partially related to microsomic and macrosomic fetuses. In our opinion, the use of mathematical models specialized for specific FW and/or GA ranges can therefore be dangerous, of little clinical interest and not significantly better than those applicable to the entire fetal population. In other words, they are of no help. All other efforts to decrease AE% by introducing correction factors in the algorithms and new information, such as amniotic fluid volume, number of fetuses and maternal pathologies, or non-routine echobiometric parameters, have failed to bring effective improvements [8]. Moreover, more recent mathematical models, besides the above-mentioned limits, are sometimes based on echobiometric parameters that are difficult to obtain, particularly by unskilled operators [8, 37, 40]. Specifically, three-dimensional (3D) ultrasound enables volumetric parameters such as fetal thigh, upper arm and abdomen to be measured for EFW. Although preliminary studies seem to indicate improvements [40], doubts remain about the utility of 3D for a substantial improvement in the accuracy of EFW [17]. Moreover, 3D ultrasound systems are expensive, not as widespread as 2D systems, and unfamiliar to operators doing routine fetal biometry. In any case, if the superiority of 3D ultrasound systems were established, our model could be easily extended to volumetric measurements. Today, about ten models are considered to give the best, not significantly different performances and none gives a MAE% below 7–8% [15, 17, 29]. We chose to tackle the problem of reducing human error in the use of ultrasound devices for fetal biometry, with the aim of significantly improving the accuracy of EFW.
An interesting attempt to control ultrasound measurement errors by enhancing the fetal border and reducing noise was recently proposed for evaluation of nuchal translucency thickness [30]. Its impact on fetal echobiometry for improving the accuracy of EFW should be investigated. We designed a weight-dependent Gaussian probability model [1, 28] over the whole range of BWs, which avoids the above-mentioned biases and provides detailed information about the reliability of measurements through interactive software, allowing measurements to be repeated and corrected in real time. Model parameters were estimated from a large database of 3,000 fetuses, collected by ultrasound operators of proven experience, though presumably containing measurement errors. Our hypothesis was that by correcting or limiting these errors, we could obtain an EFW of acceptable accuracy to protect fetal and maternal health and reduce wrong medical decisions, which sometimes also have legal implications. In line with Dudley [17], we consider that insufficient accuracy in EFW depends on excessive intra- and interobserver variability of measurements. The great advantage of using a multivariate Gaussian model is that it assigns probability values to the different ultrasound measurements and to EFW. The model was designed and trained on ultrasound data measured by experienced ultrasound operators who carefully followed the standardised protocols for correct echobiometry [4]. It can therefore guide operators towards its reliable statistical representation by suggesting that divergent readings be repeated, thus reducing errors. We assumed that human errors occur more frequently in the regions of the ultrasound measurement space where the model indicates lower probabilities of congruence among biometric parameters. However, low probabilities can also arise from pathology or peculiar morphology, associated for example with maternal diabetes, unusual parental build or abnormal fetal growth.
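The idea of attaching a probability of congruence to a set of measurements can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: congruence is taken here to be the chi-square tail probability of the squared Mahalanobis distance from the expected measurement vector, the mean vector and covariance matrix are fixed hypothetical numbers (in the paper they depend on pregnancy data and FW), and only two measurements are used so that a closed-form tail probability applies.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean)."""
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []  # forward substitution solves low * z = diff, so d^2 = z . z
    for i, d in enumerate(diff):
        z.append((d - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return sum(v * v for v in z)


def chi2_tail_even(d2, dof):
    """P(chi-square with an even number of degrees of freedom exceeds d2)."""
    if dof % 2 != 0:
        raise ValueError("this closed form needs an even number of measurements")
    half = d2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical expected values (mm) and covariance for two measurements.
mean = [90.0, 330.0]
cov = [[16.0, 24.0], [24.0, 100.0]]

p_congruent = chi2_tail_even(mahalanobis_sq([92.0, 335.0], mean, cov), 2)
p_divergent = chi2_tail_even(mahalanobis_sq([92.0, 300.0], mean, cov), 2)
print(round(p_congruent, 3), p_divergent < 0.001)
```

In this sketch the first reading lies close to the expected values and receives a high probability, whereas the second combines a normal first measurement with an improbably small second one and would be flagged for repetition.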
Though these two causes of low probability may not be distinguishable by ultrasound examination alone, both are of great clinical interest. Thus, when operators encounter low model probabilities, they are alerted to investigate more thoroughly than usual and to repeat the suggested biometric measurements. Two distinct outcomes are possible. The new measurements can be: (1) the same as before and/or still associated with low probabilities; or (2) substantially different and closer to the values expected by the model, increasing the probability of congruence with the other fetal parameters. In the first case, there may be abnormalities suggesting the need to review other clinical data, such as maternal/paternal build and pathologies. In the second case, measurement errors may be detected and corrected. In both situations, at least a third session of measurements is recommended for confirmation. If any disagreement still remains between measurement sessions, operators should decide on the basis of other clinical information and/or experience. Since our method incorporated certain clinical information about pregnancy, it was convenient to use an ANN approach [24] to estimate the multinormal model parameters (i.e. mean vectors and covariance matrices), which were made to depend on pregnancy data and FW. The dependence of the model on pregnancy information gives a more accurate probability but makes it impractical to estimate its parameters from sample data with common statistical methods such as multivariate regression, which would be inaccurate. For example, means of the parameter vector could be estimated by entering pregnancy variables in multivariate linear regression models where echobiometric measurements are assumed as dependent variables. Unfortunately, all regression techniques are very sensitive to empty regions in observation space and to outliers [1, 5, 6, 12, 28], and are most accurate where observations are densest.
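The two re-measurement outcomes described above can be written as a small decision rule. The sketch below is only illustrative: the function name, the returned messages and the 0.05 probability threshold are assumptions made for the example and are not taken from the paper.

```python
def triage_repeat(p_first, p_repeat, low_probability=0.05):
    """Classify a repeated measurement session by its congruence probability.

    p_first and p_repeat are the model probabilities of the first and the
    repeated readings; low_probability is a hypothetical alert threshold.
    """
    if p_first >= low_probability:
        return "congruent: no repetition suggested"
    if p_repeat < low_probability:
        # Case 1: readings unchanged and/or still improbable under the model.
        return "still incongruent: review other clinical data, confirm with a third session"
    # Case 2: the repeat moved towards the values expected by the model.
    return "likely measurement error corrected: confirm with a third session"


print(triage_repeat(0.001, 0.60))
```

Both low-probability branches recommend a third session, in line with the protocol described in the text; the final decision remains with the operator.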
Since in clinical application there is great interest in regions with low data density, e.g. macrosomic and microsomic fetuses, we chose an ANN approach to overcome the many limits of regression techniques [6, 24, 26]. ANNs are sophisticated machine learning methods which make it possible to express the knowledge contained in experimental data with great flexibility and precision, and provide a uniform description, without discontinuities, of the input-output relationship. They can therefore determine expected output values with satisfactory accuracy, by interpolating missing data even in multivariate space with few sparse observations [6]. Other important advantages of ANNs with respect to statistical regression models are that it is not necessary to specify model structure, hypotheses about statistical data distribution are unnecessary, they are able to describe nonlinearities, they naturally take correlation of input variables into account and they can be trained by example, much as humans are [24, 26, 39]. ANNs have recently been successfully applied in many fields of medicine. All that is required is a sufficiently large, representative set of training examples. The main difficulty with ANNs is their training, which must be done with care to avoid overfitting, a tendency of ANNs to learn even training data variability which cannot be generalized to the whole phenomenon. There are many methods of ensuring ANN generalization power, for example regularization techniques, growing and pruning algorithms, genetic algorithms and early-stopping (ES) procedures [6, 26]. We applied ES, which is widely used to train ANNs because it is computationally fast [6, 24]. It divides the available data into training and validation sets. Generalization is ensured by stopping the training process at the iteration when the ANN begins to overfit, that is when the error computed on the validation set starts to increase.
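The early-stopping procedure just described can be sketched generically. This is a minimal illustration, not the training code used in the study: the `patience` parameter (how many non-improving epochs to tolerate before stopping) is a common variant introduced here as an assumption, and the one-parameter "model" and its error functions are hypothetical stand-ins for an ANN.

```python
import copy


def train_with_early_stopping(model, train_step, validation_error,
                              max_epochs=1000, patience=3):
    """Train until the validation error stops improving; return the best snapshot."""
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(model)                  # one pass over the training set
        error = validation_error(model)    # error on the held-out validation set
        if error < best_error:
            best_error, best_epoch = error, epoch
            best_model = copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                      # validation error has started to rise
    return best_model, best_epoch, best_error


# Toy stand-in for an ANN: a single weight pulled towards the training optimum (2.0),
# while the validation data are best fitted by a different value (1.5).
def toy_train_step(m):
    m["w"] += 0.5 * (2.0 - m["w"])


def toy_validation_error(m):
    return (m["w"] - 1.5) ** 2


model = {"w": 0.0}
best, best_epoch, best_err = train_with_early_stopping(
    model, toy_train_step, toy_validation_error)
print(best["w"], best_epoch, model["w"])  # 1.5 1 1.9375
```

In the toy run the weight keeps moving towards the training optimum, but the snapshot kept is the one taken where the validation error was lowest.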
However, since the validation set is involved in the training process in any case, it must not be used for estimating the generalization error. We therefore tested the ANNs with a third set of data (testing set) which had not been used during training [6]. When we tested our model in clinical practice to correct operator measurement errors in real time, we obtained very encouraging results. Fetal biometric measurements were performed by an experienced operator because we wanted to understand whether, under optimum conditions, it was possible to obtain errors below 10%. We were successful in this endeavour. The fact that we obtained a significant lowering of MAE% when we entered the corrected parameters in the best estimation models of the literature confirms that our system can in fact help operators to correct measurement errors. The system also promises to be useful for training less experienced sonographers and could be used as a quality control system for fetal biometry. By reducing human error, it enhances EFW and clinical obstetric management.

5 Conclusions

A multinormal probability model for the estimation of fetal weight was implemented numerically to provide clinical indications about the type and size of measurement errors in real-time fetal echobiometry. The model compared actual measures with expected values and associated probability values with EFW, indicating the reliability of EFW in terms of congruence with ultrasound measurements. Low probabilities suggest more accurate repetition of suspect measurements and help ultrasound operators to interpret fetal morphology by distinguishing between measurement errors and real pathophysiological conditions. Compared to other EFW models of equivalent accuracy, probability models also have the major clinical advantage of avoiding over- and under-estimation of micro- and macrosomic fetal weights.
Clinical testing of the model on a sample of 61 fetuses revealed its good performance in correcting measurement errors and showed a remarkable improvement in accuracy of EFW, confirmed by other mathematical models of proven accuracy. Our proposed interactive software therefore offers valid support for training operators in fetal echobiometry. Although system capacity clearly needs to be tested on a wider scale, its clinical utility and simplicity, as well as the sharp improvement in accuracy of EFW, suggest that it could be used as a reliable auxiliary for clinical decision making in pregnancy. This is also an advance in the direction of standardization of measuring procedures, which are often a severe limiting factor in ultrasonographic practice.

Acknowledgments This work was financed by the Italian Ministry of Education, University and Research (MIUR). Special thanks to ESAOTE S.p.A., Genoa, Italy, for its precious and prompt technical support.

References

1. Armitage P, Berry G (1987) Statistical methods in medical research. Blackwell, Oxford
2. Ben-Haroush A, Yogev Y, Hod M (2004) Fetal weight estimation in diabetic pregnancies and suspected fetal macrosomia. J Perinat Med 32(2):113–121
3. Benson CB, Doubilet PM, Saltzman DH (1987) Sonographic determination of fetal weights in diabetic pregnancies. Am J Obstet Gynecol 156(2):441–444
4. Bettelheim D, Deutinger J, Bernaschek (1997) Fetal sonographic biometry: a guide to normal and abnormal measurements. The Parthenon Publishing Group
5. Biagioli B, Scolletta S, Cevenini G, Barbini E, Giomarelli P, Barbini P (2006) A multivariate Bayesian model for assessing morbidity after coronary artery surgery. Crit Care 10(3):R94. doi: 10.1186/cc4951
6. Bishop CM (1995) Neural networks for pattern recognition. Clarendon, Oxford
7. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Kenney SP, Devoe LD (1998) Limitations of clinical and sonographic estimates of birth weight: experience with 1034 parturients. Obstet Gynecol 91(1):72–77
8. Chauhan SP, West DJ, Scardo JA, Boyd JM, Joiner J, Hendrix NW (2000) Antepartum detection of macrosomic fetus: clinical versus sonographic, including soft-tissue measurements. Obstet Gynecol 95(5):639–642
9. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Scardo JA, Berghella V (2005) A review of sonographic estimate of fetal weight: vagaries of accuracy. J Matern Fetal Neonatal Med 18(4):211–220
10. Chauhan SP, Cole J, Sanderson M, Magann EF, Scardo JA (2006) Suspicion of intrauterine growth restriction: use of abdominal circumference alone or estimated fetal weight below 10%. J Matern Fetal Neonatal Med 19(9):557–562
11. Chuang L, Hwang JY, Chang CH, Yu CH, Chang FM (2002) Ultrasound estimation of fetal weight with the use of computerized artificial neural network model. Ultrasound Med Biol 28(8):991–996
12. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple regression: correlation analysis for the behavioral sciences. Erlbaum, London
13. Colman A, Maharaj D, Hutton J, Tuohy J (2006) Reliability of ultrasound estimation of fetal weight in term singleton pregnancies. New Zeal Med J 119(1241):U2146
14. Combs CA, Jaekle RK, Rosenn B, Pope M, Miodovnik M, Siddiqi TA (1993) Sonographic estimation of fetal weight based on a model of fetal volume. Obstet Gynecol 82(3):365–370
15. Coomarasamy A, Connock M, Thornton J, Khan KS (2005) Accuracy of ultrasound biometry in the prediction of macrosomia: a systematic quantitative review. Brit J Obstet Gynaec 112(11):1461–1466
16. Dudley NJ (1995) Selection of appropriate ultrasound methods for the estimation of fetal weight. Brit J Radiol 68:385–388
17. Dudley NJ (2005) A systematic review of the ultrasound estimation of fetal weight. Ultrasound Obstet Gynecol 25(1):80–89
18. Edwards A, Goff J, Baker L (2001) Accuracy and modifying factors of the sonographic estimation of fetal weight in a high-risk population. Aust NZ J Obstet Gyn 41(2):187–190
19. Etter DM, Kuncicky DC, Moore H (2005) Introduction to MATLAB 7. Prentice Hall, Englewood Cliffs
20. Farmer RM, Medearis AL, Hirata GI, Platt LD (1992) The use of a neural network for the ultrasonographic estimation of fetal weight in the macrosomic fetus. Am J Obstet Gynecol 166(5):1467–1472
21. Goldberg JD (2004) Routine screening for fetal anomalies: expectations. Obstet Gynecol Clin North Am 31(1):35–50
22. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK (1985) Estimation of fetal weight with the use of head, body, and femur measurements - a prospective study. Am J Obstet Gynecol 151:333–7
23. Hadlock FP (1990) Sonographic estimation of fetal age and weight. Fetal Ultrasound 28(1):39–51
24. Haykin S (1994) Neural networks: a comprehensive foundation. Maxwell Macmillan, Canada
25. Hill LM, Breckle R, Gehrking WC, O’Brien PC (1985) Use of femur length in estimation of fetal weight. Am J Obstet Gynecol 152:847–852
26. Jamshidi M (2003) Tools for intelligent control: fuzzy controllers, neural networks and genetic algorithms. Philos Transact A Math Phys Eng Sci 361(1809):1781–1808
27. Jordaan HV (1983) Estimation of fetal weight by ultrasound. J Clin Ultrasound 11(2):59–66
28. Krzanowski WJ (1988) Principles of multivariate analysis: a user’s perspective. Clarendon, Oxford
29. Kurmanavicius J, Burkhardt T, Wisser J, Huch R (2004) Ultrasonographic fetal weight estimation: accuracy of formulas and accuracy of examiners by birth weight from 500 to 5000 g. J Perinat Med 32(2):155–161
30. Lee YB, Kim MJ, Kim MH (2007) Robust border enhancement and detection for measurement of fetal nuchal translucency in ultrasound images. Med Biol Eng Comput (Spec issue). doi: 10.1007/s11517-007-0225-7
31. Lockwood CJ, Weiner S (1986) Assessment of fetal growth. Clin Perinatol 13(1):3–35
32. Mongelli M, Biswas A (2002) Menstrual age-dependent systematic error in sonographic fetal weight estimation: a mathematical model. J Clin Ultrasound 30(3):139–44
33. Ott WJ, Doyle S, Flamm S, Wittman J (1986) Accurate ultrasonic estimation of fetal weight. Prospective analysis of a new ultrasonic formula. Am J Perinatol 3(4):307–10
34. Ott WJ (2006) Sonographic diagnosis of fetal growth restriction. Clin Obstet Gynecol 49(2):295–307
35. Robson SC, Gallivan S, Walkinshaw SA, Vaughan J, Rodeck CH (1993) Ultrasonic estimation of fetal weight: use of targeted formulas in small for gestational age fetuses. Obstet Gynecol 82(3):359–364
36. Rosati P, Exacoustos C, Caruso A, Mancuso S (1992) Ultrasound diagnosis of fetal macrosomia. Ultrasound Obstet Gynecol 2(1):23–29
37. Rotmensch S, Celentano C, Liberati M, Malinger G, Sadan O, Bellati U, Glezerman M (1999) Screening efficacy of the subcutaneous tissue width/femur length ratio for fetal macrosomia in the non-diabetic pregnancy. Ultrasound Obstet Gynecol 13(5):340–344
38. Sabbagha RE, Minogue J, Tamura RK, Hungerford SA (1989) Estimation of birth weight by use of ultrasonographic formulas targeted to LGA, AGA, and SGA fetuses. Am J Obstet Gynecol 160:854–862
39. Sargent DJ (2001) Comparison of artificial neural networks with other statistical approaches: results from medical data sets. Cancer 91(S8):1636–1642
40. Schild RL, Fimmers R, Hansmann M (2000) Fetal weight estimation by three-dimensional ultrasound. Ultrasound Obstet Gynecol 16(5):445–452
41. Secher NJ, Djursing H, Hansen PK, Lenstrup C, Sindberg-Eriksen P, Thomsen BL, Keiding N (1987) Estimation of fetal weight in the third trimester by ultrasound. Eur J Obstet Gynecol Reprod Biol 24:1–11
42. Sladkevicius P, Saltvedt S, Almstrom H, Kublickas M, Grunewald C, Valentin L (2005) Ultrasound dating at 12–14 weeks of gestation. A prospective cross-validation of established dating formulae in in vitro fertilized pregnancies. Ultrasound Obstet Gynecol 26(5):504–511
43. Thornton JG, Hornbuckle J, Vail A, Spiegelhalter DJ, Levene M, GRIT study group (2004) Infant wellbeing at 2 years of age in the growth restriction intervention trial (GRIT): multicentred randomised controlled trial. Lancet 364(9433):513–520
44. Woo JS, Wan MC (1986) An evaluation of fetal weight prediction using a simple equation containing the fetal femur length. J Ultrasound Med 5(8):453–457