1.3 MODEL FITTING

Once the phenomenon under study has been decomposed into its fundamental mechanisms and a mathematical expression accounting for all of them has been derived (the first two steps of model building), the next step of model building, the fitting of the model to experimental data, is required. For this purpose, some definitions of statistical quantities are necessary. When an experimental data set shows a sufficiently strong tendency to centralise, that is, to group around a particular value, it can be useful to characterise it by a few numbers called its moments [19]. In particular, we are interested in the mean x̄, defined as:

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{1.11}
\]

where x_i represents the ith experimental datum and N the number of data considered. A further characterisation consists in measuring how the data are dispersed around the mean; the most common quantities used for this purpose are the variance va and its square root, the standard deviation σ:

\[
va = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 \tag{1.12}
\]

\[
\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{1.13}
\]

Very often it is necessary to establish whether two distributions are characterised by different means, that is, to decide whether the difference between the two means statistically differs from zero. The first step is to decide whether the two distributions are characterised by different variances, since on the basis of this result a different strategy must be undertaken to compare the means. For this purpose the F-test [19] is used. This test consists in evaluating the experimental F value, defined as the ratio between the higher weighted variance va_wh and the lower weighted variance va_wl:

\[
F = \frac{va_{wh}}{va_{wl}}, \qquad va_{wh} = \frac{va_h}{n_h - 1}, \qquad va_{wl} = \frac{va_l}{n_l - 1} \tag{1.14}
\]

where va_h and va_l are, respectively, the higher and the lower variances (defined by eq. (1.12); va_h > va_l), referring to two different data sets containing n_h and n_l elements, respectively. Fixing a value of the probability p = 1 − α of being right (usually 0.95 or 0.99), if the F value calculated according to eq. (1.14) (with (n_h − 1) degrees of freedom for the numerator and (n_l − 1) degrees of freedom for the denominator) is bigger than the tabulated one (see tables 1.1a and 1.1b), va_h and va_l are statistically different; otherwise they are considered equal. If the two variances are equal, the following form of Student's t test enables deciding whether the corresponding means x̄_h and x̄_l are equal or not. For this purpose, the pooled variance s_D needs to be estimated:

\[
s_D = \sqrt{\frac{\sum_{i=1}^{n_h}(x_i - \bar{x}_h)^2 + \sum_{i=1}^{n_l}(x_i - \bar{x}_l)^2}{n_h + n_l - 2}\left(\frac{1}{n_h} + \frac{1}{n_l}\right)} \tag{1.15}
\]

Accordingly, the experimental t value reads:

\[
t = \frac{\lvert \bar{x}_h - \bar{x}_l \rvert}{s_D} \tag{1.16}
\]

where |x̄_h − x̄_l| indicates the absolute value of the difference x̄_h − x̄_l. Again, fixing a value of the probability p = 1 − α of being right (usually 0.95 or 0.99), if the calculated t value (with (n_h + n_l − 2) degrees of freedom) is bigger than the tabulated one (see table 1.2), x̄_h and x̄_l are statistically different; otherwise they are considered equal. If, conversely, va_h and va_l are statistically different, the following form of Student's t test enables deciding whether the corresponding means x̄_h and x̄_l are equal or not:

\[
t = \frac{\lvert \bar{x}_h - \bar{x}_l \rvert}{\sqrt{\dfrac{va_h}{n_h} + \dfrac{va_l}{n_l}}} \tag{1.17}
\]

Fixing a value of the probability p = 1 − α of being right (usually 0.95 or 0.99), if the calculated t value, with the following number of degrees of freedom dof:

\[
dof = \mathrm{RTNI}\left[\frac{\left(\dfrac{va_h}{n_h} + \dfrac{va_l}{n_l}\right)^2}{\dfrac{(va_h/n_h)^2}{n_h - 1} + \dfrac{(va_l/n_l)^2}{n_l - 1}}\right] \tag{1.18}
\]

where RTNI means "round to the nearest integer", is bigger than the tabulated one (see table 1.2), x̄_h and x̄_l are statistically different; otherwise they are considered equal.
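By way of illustration, the following sketch implements this decision scheme with numpy and scipy.stats (the ppf calls play the role of tables 1.1 and 1.2). It is a minimal sketch, not part of the original treatment: the function name means_differ is hypothetical, the F value is computed for simplicity as the plain variance ratio va_h/va_l of the classical F-test rather than the weighted ratio of eq. (1.14), and the tabulated t value is assumed to be two-sided.

```python
import numpy as np
from scipy import stats

def means_differ(a, b, p=0.95):
    """Decide whether the means of samples a and b differ statistically:
    F-test on the variances first, then the pooled (eqs. 1.15-1.16) or
    unequal-variance (eqs. 1.17-1.18) form of Student's t test."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1), b.var(ddof=1)                  # eq. (1.12)

    # F-test: larger variance over the smaller one (plain-ratio form)
    (vh, nh), (vl, nl) = sorted([(va, na), (vb, nb)], reverse=True)
    variances_differ = vh / vl > stats.f.ppf(p, nh - 1, nl - 1)

    if not variances_differ:                               # pooled t-test
        sD = np.sqrt((((a - a.mean())**2).sum() + ((b - b.mean())**2).sum())
                     / (na + nb - 2) * (1.0/na + 1.0/nb))  # eq. (1.15)
        t = abs(a.mean() - b.mean()) / sD                  # eq. (1.16)
        dof = na + nb - 2
    else:                                                  # unequal variances
        t = abs(a.mean() - b.mean()) / np.sqrt(va/na + vb/nb)    # eq. (1.17)
        dof = round((va/na + vb/nb)**2 /
                    ((va/na)**2/(na - 1) + (vb/nb)**2/(nb - 1)))  # eq. (1.18)
    return t > stats.t.ppf(1 - (1 - p)/2, dof)             # table 1.2
```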
1.3.1 Chi-Square Fitting: straight line

Let us suppose we have a mathematical model f, characterised by M adjustable parameters a_j (j = 1 to M), to be fitted to N data points (x_i, y_i) (i = 1 to N), where x and y are the independent and the dependent variable, respectively. It can be seen [20] that the most probable set of model parameters is the one minimising the chi-square χ²:

\[
\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i, a_1, \dots, a_M)}{\sigma_i} \right)^2 \tag{1.19}
\]

where σ_i is the standard deviation of the ith datum. Although eq. (1.19) strictly holds only if the measurement errors are normally distributed, it is useful even when this hypothesis does not hold. It is now interesting to see how the fitting procedure develops according to eq. (1.19) in the simplest case, where f is a straight line (linear regression):

\[
f(x, a_1, \dots, a_M) = mx + q \tag{1.20}
\]

x being the independent variable, while m and q are the two model parameters representing, respectively, the slope and the intercept of the straight line. In this case eq. (1.19) becomes:

\[
\chi^2(m, q) = \sum_{i=1}^{N} \left( \frac{y_i - m x_i - q}{\sigma_i} \right)^2 \tag{1.21}
\]

It is well known that the conditions required to minimise χ² are:

\[
\frac{\partial \chi^2(m,q)}{\partial m} = -2 \sum_{i=1}^{N} \frac{(y_i - m x_i - q)\, x_i}{\sigma_i^2} = 0 \tag{1.22}
\]

\[
\frac{\partial \chi^2(m,q)}{\partial q} = -2 \sum_{i=1}^{N} \frac{y_i - m x_i - q}{\sigma_i^2} = 0 \tag{1.23}
\]

The solution of these equations allows the calculation of the two unknowns m and q:

\[
m = \frac{\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2} - \sum_{i=1}^{N}\frac{x_i}{\sigma_i^2} \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} - \left(\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\right)^2}, \qquad
q = \frac{\sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} \sum_{i=1}^{N}\frac{y_i}{\sigma_i^2} - \sum_{i=1}^{N}\frac{x_i}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} - \left(\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\right)^2} \tag{1.24}
\]

Remembering the error propagation law:

\[
\sigma_g^2 = \left(\frac{\partial g(x,y,z,\dots)}{\partial x}\right)_{y,z,\dots}^2 \sigma_x^2 + \left(\frac{\partial g(x,y,z,\dots)}{\partial y}\right)_{x,z,\dots}^2 \sigma_y^2 + \left(\frac{\partial g(x,y,z,\dots)}{\partial z}\right)_{x,y,\dots}^2 \sigma_z^2 + \dots \tag{1.25}
\]

where g is a generic function of the independent variables x, y, z and so on, and σ²_x, σ²_y, σ²_z and so on are the respective variances, it is possible to estimate the variances σ²_m and σ²_q of m and q:

\[
\sigma_m^2 = \sum_{i=1}^{N} \left(\frac{\partial m(x_i, y_i)}{\partial y_i}\right)^2 \sigma_i^2 = \frac{\sum_{i=1}^{N}\frac{1}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} - \left(\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\right)^2} \tag{1.26}
\]

\[
\sigma_q^2 = \sum_{i=1}^{N} \left(\frac{\partial q(x_i, y_i)}{\partial y_i}\right)^2 \sigma_i^2 = \frac{\sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2} \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} - \left(\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\right)^2} \tag{1.27}
\]

It is worth mentioning that, as we assume the x_i to be error-free variables (σ²_{x_i} = 0 for all i), only the partial derivatives with respect to y_i appear in eqs. (1.26) and (1.27). As the evaluation of the probable uncertainties δm and δq associated with the fitting parameters m and q is not an easy topic, and it would imply further discussion [20] that is beyond the aim of this chapter, we limit ourselves to showing the approximate solution adopted by a common data-fitting software package [21]. In particular, it is proposed:

\[
\delta m = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{y_i - m x_i - q}{\sigma_i}\right)^2}{N - M}\; \sigma_m^2} \tag{1.28}
\]

\[
\delta q = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{y_i - m x_i - q}{\sigma_i}\right)^2}{N - M}\; \sigma_q^2} \tag{1.29}
\]

where N and M are, respectively, the number of experimental points and the number of model fitting parameters (for our straight line, M = 2). In order to be sure that the parameters m and q are really meaningful, the goodness of the fit must be evaluated. For this purpose we can resort to the F-test [21, 22] to evaluate whether two particular variances, the mean square regression MSR and the mean square error MSE, differ or not:

\[
MSR = \frac{\sum_{i=1}^{N}\left(\frac{y_i - \bar{y}}{\sigma_i}\right)^2 - \sum_{i=1}^{N}\left(\frac{y_i - y_i^{\,p}}{\sigma_i}\right)^2}{M - 1}, \qquad
\bar{y} = \frac{\sum_{i=1}^{N}\frac{y_i}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2}} \tag{1.30}
\]

\[
MSE = \frac{\sum_{i=1}^{N}\left(\frac{y_i - y_i^{\,p}}{\sigma_i}\right)^2}{N - M} \tag{1.31}
\]

where y_i^p is the model prediction at x_i, the degrees of freedom associated with MSR are ν₁ = M − 1, and those associated with MSE are ν₂ = N − M. Accordingly, the calculated F value is:

\[
F = \frac{MSR}{MSE} \tag{1.32}
\]

It is evident that in the presence of a good fit F will be very high, as MSE tends to zero while MSR does not.
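The following sketch collects eqs. (1.24) and (1.26)–(1.32) into a single weighted straight-line fit. It is a minimal sketch under stated assumptions: numpy and scipy.stats are available, sigma holds the standard deviations of the y_i, and the function name fit_line is hypothetical.

```python
import numpy as np
from scipy import stats

def fit_line(x, y, sigma, p=0.95):
    """Weighted straight-line fit y = m*x + q with parameter uncertainties
    (eqs. 1.24, 1.26-1.29) and F-test for goodness of fit (eqs. 1.30-1.32)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s2 = np.asarray(sigma, float)**2
    S, Sx, Sy = (1/s2).sum(), (x/s2).sum(), (y/s2).sum()
    Sxx, Sxy = (x*x/s2).sum(), (x*y/s2).sum()
    D = S*Sxx - Sx**2
    m = (S*Sxy - Sx*Sy) / D                      # eq. (1.24)
    q = (Sxx*Sy - Sx*Sxy) / D                    # eq. (1.24)
    var_m, var_q = S/D, Sxx/D                    # eqs. (1.26)-(1.27)

    N, M = len(x), 2
    chi2 = (((y - m*x - q)**2) / s2).sum()       # eq. (1.21)
    dm = np.sqrt(chi2/(N - M) * var_m)           # eq. (1.28)
    dq = np.sqrt(chi2/(N - M) * var_q)           # eq. (1.29)

    # F-test for goodness of fit, eqs. (1.30)-(1.32)
    ybar = Sy / S                                # weighted mean of the y_i
    SST = (((y - ybar)**2) / s2).sum()
    MSR = (SST - chi2) / (M - 1)                 # eq. (1.30)
    MSE = chi2 / (N - M)                         # eq. (1.31)
    F = MSR / MSE                                # eq. (1.32)
    fit_ok = F > stats.f.ppf(p, M - 1, N - M)    # table 1.1
    return m, q, dm, dq, F, fit_ok
```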
If the F value calculated according to eq. (1.32) (with ν₁ degrees of freedom for the numerator and ν₂ degrees of freedom for the denominator) is bigger than the tabulated one corresponding to a fixed probability p = 1 − α of being right (see table 1.1), MSR is statistically bigger than MSE and, accordingly, the fit is statistically acceptable.

1.3.2 Chi-Square Fitting: general case

In order to generalise the previous discussion to a generic function f(x, a_1, …, a_M), a_j being its adjustable parameters and x the independent variable, it is necessary to define the so-called design matrix A, composed of N rows and M columns, whose elements are [20]:

\[
A_{ij} = \frac{1}{\sigma_i}\frac{\partial f(x_i)}{\partial a_j}, \qquad
A = \begin{pmatrix}
\frac{1}{\sigma_1}\frac{\partial f(x_1)}{\partial a_1} & \cdots & \frac{1}{\sigma_1}\frac{\partial f(x_1)}{\partial a_M} \\
\vdots & \ddots & \vdots \\
\frac{1}{\sigma_N}\frac{\partial f(x_N)}{\partial a_1} & \cdots & \frac{1}{\sigma_N}\frac{\partial f(x_N)}{\partial a_M}
\end{pmatrix} \tag{1.33}
\]

x_i and σ_i being, respectively, the mean value and the associated standard deviation of the ith experimental datum. Now, the condition of minimum χ²:

\[
\chi^2(a_1, \dots, a_M) = \sum_{i=1}^{N}\left(\frac{y_i - f(x_i, a_1, \dots, a_M)}{\sigma_i}\right)^2 \tag{1.21'}
\]

requires that all the partial derivatives of χ² with respect to the M parameters are equal to zero:

\[
\frac{\partial \chi^2}{\partial a_j} = -2\sum_{i=1}^{N} \frac{y_i - f(x_i, a_1, \dots, a_M)}{\sigma_i^2}\, \frac{\partial f(x_i, a_1, \dots, a_M)}{\partial a_j} = 0 \tag{1.34}
\]

Defining the matrix α as follows:

\[
\alpha_{jk} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2}\, \frac{\partial f(x_i, a_1, \dots, a_M)}{\partial a_j}\, \frac{\partial f(x_i, a_1, \dots, a_M)}{\partial a_k} \tag{1.35}
\]

it is easy to demonstrate that α = Aᵀ·A, where Aᵀ is the transpose of the design matrix, obtained from A by simply exchanging rows and columns. Finally, defining the matrix C as the inverse of α (C = α⁻¹, which means α·C = I, the identity matrix), it is possible, in the same spirit leading to eqs. (1.28) and (1.29), to estimate the probable uncertainty δa_j associated with the fitting parameters a_j [21]:

\[
\delta a_j = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{y_i - f(x_i, a_1, \dots, a_M)}{\sigma_i}\right)^2}{N - M}\; C_{jj}} \tag{1.36}
\]

where, obviously, σ²_{a_j} = C_jj. Also in this general case, the goodness of the fit can be determined according to the F-test proposed for the linear case (eq. (1.32)).

1.3.3 Robust Fitting

Although less common, it can happen that both experimental coordinates x and y are affected by errors. This means that the generic experimental point (x_i, y_i) is associated with two standard deviations, σ_{x_i} and σ_{y_i}. For the sake of clarity and simplicity, this topic will be treated in the simple case of straight-line fitting, as its extension to the more general case (fitting function f(x, a_1, …, a_M)) implies additional difficulties that could distract the reader from the main concept of robust fitting. Although different strategies can be adopted [20], a possible definition of the χ² function in the case of robust fitting is:

\[
\chi^2(m, q) = \sum_{i=1}^{N} \frac{d_i^2}{\sigma_{x_i}^2 + \sigma_{y_i}^2} \tag{1.37}
\]

where d_i and σ_i² = σ²_{x_i} + σ²_{y_i} are, respectively, the distance between the experimental point (x_i, y_i) and the straight line y = mx + q, and the variance of the experimental point.
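Before moving on, a minimal sketch of eqs. (1.33)–(1.36) is given below: it estimates the uncertainties of the already-fitted parameters of a generic model. It is an illustration under stated assumptions, not the text's own procedure: the function name param_uncertainties is hypothetical, and the partial derivatives entering the design matrix are approximated by forward finite differences, a numerical shortcut not discussed in the text.

```python
import numpy as np

def param_uncertainties(f, x, y, sigma, a, eps=1e-6):
    """Uncertainties of the already-fitted parameters a of a generic model
    f(x, *a), after eqs. (1.33)-(1.36): design matrix A, alpha = A^T A,
    C = alpha^{-1}, delta_a_j = sqrt(chi2/(N - M) * C_jj)."""
    x, y, s = (np.asarray(v, float) for v in (x, y, sigma))
    N, M = len(x), len(a)
    f0 = f(x, *a)
    A = np.empty((N, M))
    for j in range(M):                   # eq. (1.33), column by column, with
        da = np.array(a, float)          # forward-difference derivatives in
        h = eps * max(abs(da[j]), 1.0)   # place of the exact partials
        da[j] += h
        A[:, j] = (f(x, *da) - f0) / h / s
    C = np.linalg.inv(A.T @ A)           # alpha = A^T A (eq. 1.35), C = alpha^-1
    chi2 = (((y - f0) / s)**2).sum()     # eq. (1.21')
    return np.sqrt(chi2 / (N - M) * np.diag(C))   # eq. (1.36)

# Hypothetical usage with an exponential decay model:
# f = lambda x, a, b: a * np.exp(-b * x)
# delta_a = param_uncertainties(f, x, y, sigma, [2.0, 0.5])
```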
Remembering that d_i is nothing more than the distance between (x_i, y_i) and the intersection of the straight line passing through (x_i, y_i) and perpendicular to y = mx + q (its slope is, accordingly, −1/m), its expression reads:

\[
d_i^2 = \frac{(y_i - m x_i - q)^2}{m^2 + 1} \tag{1.38}
\]

The conditions required to minimise χ² are, obviously, those zeroing its partial derivatives:

\[
\frac{\partial \chi^2(m,q)}{\partial m} = -2\sum_{i=1}^{N} \frac{(y_i - m x_i - q)\left[x_i (m^2 + 1) + m (y_i - m x_i - q)\right]}{\sigma_i^2 (m^2 + 1)^2} = 0 \tag{1.39}
\]

\[
\frac{\partial \chi^2(m,q)}{\partial q} = -\frac{2}{m^2 + 1}\sum_{i=1}^{N} \frac{y_i - m x_i - q}{\sigma_i^2} = 0 \tag{1.40}
\]

The simultaneous solution of eqs. (1.39) and (1.40) leads to a quadratic equation in m, whose solutions are:

\[
m_{1,2} = \frac{B \pm \sqrt{B^2 + 4}}{2}, \qquad
q_{1,2} = \frac{\sum_{i=1}^{N}\frac{y_i}{\sigma_i^2} - m_{1,2}\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2}} \tag{1.41}
\]

\[
B = \frac{\sum_{i=1}^{N}\frac{y_i^2}{\sigma_i^2} - \sum_{i=1}^{N}\frac{x_i^2}{\sigma_i^2} - \frac{1}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2}}\left[\left(\sum_{i=1}^{N}\frac{y_i}{\sigma_i^2}\right)^2 - \left(\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\right)^2\right]}{\sum_{i=1}^{N}\frac{x_i y_i}{\sigma_i^2} - \frac{1}{\sum_{i=1}^{N}\frac{1}{\sigma_i^2}}\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}\sum_{i=1}^{N}\frac{y_i}{\sigma_i^2}} \tag{1.42}
\]

It is clear that the solution we are interested in is the one minimising χ². Accordingly, the solution will be (m₁, q₁) if χ²(m₁, q₁) ≤ χ²(m₂, q₂), and (m₂, q₂) in the opposite case. Remembering the error propagation law (eq. (1.25)), the m and q variances are:

\[
\sigma_m^2 = \sum_{i=1}^{N}\left[\left(\frac{\partial m(x_i, y_i)}{\partial y_i}\right)^2 \sigma_{y_i}^2 + \left(\frac{\partial m(x_i, y_i)}{\partial x_i}\right)^2 \sigma_{x_i}^2\right] \tag{1.43}
\]

\[
\sigma_q^2 = \sum_{i=1}^{N}\left[\left(\frac{\partial q(x_i, y_i)}{\partial y_i}\right)^2 \sigma_{y_i}^2 + \left(\frac{\partial q(x_i, y_i)}{\partial x_i}\right)^2 \sigma_{x_i}^2\right] \tag{1.44}
\]

Consequently, the probable uncertainties δm and δq associated with the fitting parameters m and q can be evaluated in the light of eqs. (1.28) and (1.29):

\[
\delta m = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{d_i}{\sigma_i}\right)^2}{N - M}\; \sigma_m^2} \tag{1.45}
\]

\[
\delta q = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{d_i}{\sigma_i}\right)^2}{N - M}\; \sigma_q^2} \tag{1.46}
\]

The goodness of the fit can be estimated by means of the F-test (eq. (1.32)), provided that the following expressions for the mean square regression MSR and the mean square error MSE are considered:

\[
MSR = \frac{\sum_{i=1}^{N}\left(\frac{x_i - \bar{x}}{\sigma_{x_i}}\right)^2 + \sum_{i=1}^{N}\left(\frac{y_i - \bar{y}}{\sigma_{y_i}}\right)^2 - \sum_{i=1}^{N}\left(\frac{d_i}{\sigma_i}\right)^2}{M - 1} \tag{1.47}
\]

\[
\bar{x} = \frac{\sum_{i=1}^{N}\frac{x_i}{\sigma_{x_i}^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_{x_i}^2}}, \qquad
\bar{y} = \frac{\sum_{i=1}^{N}\frac{y_i}{\sigma_{y_i}^2}}{\sum_{i=1}^{N}\frac{1}{\sigma_{y_i}^2}} \tag{1.47'}
\]

\[
MSE = \frac{\sum_{i=1}^{N}\left(\frac{d_i}{\sigma_i}\right)^2}{N - M} \tag{1.48}
\]

It is clear that this strategy (robust fitting) can also be applied to a general model f(x, a_1, …, a_M). Nevertheless, this task is not so straightforward, owing to the increased difficulty of evaluating d_i. Indeed, d_i can be evaluated by remembering that the generic straight line passing through (x_i, y_i) must satisfy the condition y_i = m̄ x_i + q̄, from which we have, for example, m̄ = (y_i − q̄)/x_i. Thus, the coordinates (x̃_i, ỹ_i) of the intersection point between f(x, a_1, …, a_M) and the generic straight line passing through (x_i, y_i) are given by the solution of the following system of equations:

\[
\tilde{y}_i = \frac{y_i - \bar{q}}{x_i}\,\tilde{x}_i + \bar{q}, \qquad \tilde{y}_i = f(\tilde{x}_i, a_1, \dots, a_M) \tag{1.49}
\]

The generic distance d_i^g between (x_i, y_i) and (x̃_i, ỹ_i) is then given by:

\[
d_i^g = \sqrt{(\tilde{x}_i - x_i)^2 + (\tilde{y}_i - y_i)^2} \tag{1.50}
\]

Obviously, among the infinite distances eq. (1.50) can give, the one we are interested in, d_i, is the smallest. Thus, q̄ can be evaluated by searching for the minimum of d_i^g. If system (1.49) does not yield an analytical solution, an iterative numerical procedure is needed to obtain q̄. Once d_i is known for all the experimental points, χ² can be evaluated, and a numerical technique is required for its minimisation.

1.4 MODELS COMPARISON

In the previous paragraphs, the attention was focussed on how to fit a model to experimental data and on how to judge the goodness of the fit. Basically, this task was accomplished by resorting to mathematical/statistical methods. Accordingly, if the F value is high enough and the uncertainties of the model parameters are small, we can say that the model is suitable. Obviously, this is not enough to ensure model reliability, as the fitting parameters must assume physically reasonable values and the model must be able to yield reasonable, if not true, predictions for a different set of initial conditions.
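A minimal sketch of the straight-line robust fit is given below. It assumes σ_i² = σ²_{x_i} + σ²_{y_i} as in eq. (1.37); the two slope candidates come from eqs. (1.41)–(1.42), rewritten here in terms of weighted central sums, and the root retained is the one minimising χ², as prescribed in the text. The function name robust_line_fit is hypothetical.

```python
import numpy as np

def robust_line_fit(x, y, sx, sy):
    """Straight-line fit with errors on both coordinates (section 1.3.3):
    perpendicular distances (eq. 1.38), effective variances
    sigma_i^2 = sx_i^2 + sy_i^2 (eq. 1.37), and the closed-form slope
    candidates of eqs. (1.41)-(1.42)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = 1.0 / (np.asarray(sx, float)**2 + np.asarray(sy, float)**2)

    Sw, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Syy, Sxy = (w * x * x).sum(), (w * y * y).sum(), (w * x * y).sum()
    # eq. (1.42), expressed through weighted central sums
    B = ((Syy - Sy**2 / Sw) - (Sxx - Sx**2 / Sw)) / (Sxy - Sx * Sy / Sw)

    def chi2(m, q):                              # eqs. (1.37)-(1.38)
        return (w * (y - m * x - q)**2 / (m**2 + 1)).sum()

    roots = [(B + np.sqrt(B**2 + 4)) / 2,        # eq. (1.41)
             (B - np.sqrt(B**2 + 4)) / 2]
    fits = [(m, (Sy - m * Sx) / Sw) for m in roots]   # q from eq. (1.41)
    return min(fits, key=lambda mq: chi2(*mq))   # keep the chi2-minimising root
```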
Another problem that can arise in mathematical modelling is discerning the best model among a group of models all yielding a good fit. Although different approaches can be followed for this purpose, for its generality and simplicity we present Akaike's method [23], which is based on likelihood theory, information theory and the concept of information entropy. Despite the complexity of its theoretical background, this method is very simple to use. Assuming the usual condition of a Gaussian distribution of the data scatter around the model, the Akaike number AIC (here in its corrected, small-sample form) is defined as follows:

\[
AIC = N \ln\!\left(\frac{\chi^2}{N}\right) + \frac{2 (M + 1) N}{N - M - 2} \tag{1.51}
\]

The model showing the smallest AIC is the one to be preferred. In order to estimate how much more likely the model showing the smallest AIC is, it is sufficient to define the following probability p_AIC:

\[
p_{AIC} = \frac{e^{-0.5\,\Delta}}{1 + e^{-0.5\,\Delta}}, \qquad \Delta = AIC_{om} - AIC_{smallest} \tag{1.52}
\]

where AIC_smallest is the Akaike number of the more likely model, while AIC_om is the Akaike number of the other model (AIC_om ≥ AIC_smallest). If Δ = 2, for example, there is a probability 1 − p_AIC = 0.73 that the smallest-AIC model is correct and a probability p_AIC = 0.27 that the other model is correct.
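Eqs. (1.51) and (1.52) translate directly into code. The sketch below is a minimal illustration; the helper names aic and p_other are hypothetical, and chi2 is assumed to be the chi-square of an already-fitted model.

```python
import numpy as np

def aic(chi2, N, M):
    """Corrected Akaike number of eq. (1.51) for a model with M adjustable
    parameters fitted to N data points with chi-square chi2."""
    return N * np.log(chi2 / N) + 2 * (M + 1) * N / (N - M - 2)

def p_other(aic_a, aic_b):
    """Probability of eq. (1.52) that the model with the *larger* AIC
    is nevertheless the correct one."""
    delta = abs(aic_a - aic_b)
    return np.exp(-0.5 * delta) / (1 + np.exp(-0.5 * delta))

# Hypothetical comparison of two already-fitted models:
# p = p_other(aic(chi2_1, N, 2), aic(chi2_2, N, 3))
# For a difference delta = 2, p_other gives 0.269, i.e. the 0.73 vs 0.27
# split quoted in the text.
```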
1.6 REFERENCES

19. Press, W. H. et al., "Statistical Description of Data," in Numerical Recipes in FORTRAN, Cambridge University Press, Cambridge, 1992, chap. 14.
20. Press, W. H. et al., "Modeling of Data," in Numerical Recipes in FORTRAN, Cambridge University Press, Cambridge, 1992, chap. 15.
21. TableCurve 2D, SPSS Inc. (http://www.spss.com), 1997.
22. Draper, N. and Smith, H., Applied Regression Analysis, John Wiley & Sons, New York, 1966.
23. Burnham, K. P. and Anderson, D. R., Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, New York, 2002.