CHAPTER 6

1.3 MODEL FITTING
Once the phenomenon we are dealing with has been decomposed into its fundamental
mechanisms and a mathematical expression accounting for all of them has been obtained
(first two steps of model building), the third step of model building, fitting the model to
experimental data, is required. For this purpose, some definitions of statistical quantities are
necessary. When an experimental data set shows a sufficiently strong tendency
to centralise, that is, to group around a particular value, it can be useful to characterise this
data set by a few numbers called its moments [19]. In particular, we are interested in the mean
x̄, defined as:
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (1.11)
where xi represents the ith experimental datum and N the number of data considered. A
further characterisation consists in measuring how these data are dispersed around the mean;
the most common quantities considered for this purpose are the variance va or its square
root, the standard deviation σ:
va = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2    (1.12)

\sigma = \sqrt{ \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 }    (1.13)
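As a purely illustrative sketch (Python with NumPy is assumed here and in the following examples; the data values are arbitrary), eqs.(1.11)-(1.13) translate directly into:

import numpy as np

x = np.array([4.1, 3.9, 4.3, 4.0, 4.2])     # arbitrary illustrative data
N = len(x)
mean = x.sum() / N                           # mean, eq.(1.11)
va = ((x - mean)**2).sum() / (N - 1)         # variance, eq.(1.12)
sigma = np.sqrt(va)                          # standard deviation, eq.(1.13)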
Very often, it is necessary to establish whether two distributions are characterised by different
means, that is, to decide whether the difference between the two means statistically differs
from zero. The first step is to decide whether the two distributions are characterised by different
variances or not. Indeed, on the basis of this result, a different strategy needs to be undertaken
to decide whether the two means are statistically different. For this purpose, the F-test [19] is
used. This test consists in evaluating the experimental F value, defined as the ratio
between the weighted higher (vawh) and lower (vawl) variances:
F = \frac{va_{wh}}{va_{wl}}, \qquad va_{wh} = \frac{va_h}{n_h - 1}, \qquad va_{wl} = \frac{va_l}{n_l - 1}    (1.14)
where vah and val are, respectively, the higher and the lower variance (defined by eq.(1.12);
vah > val), referring to two different data sets containing nh and nl elements, respectively.
Fixing a probability p = 1 − α of being right (usually 0.95 or 0.99), if the F value calculated
according to eq.(1.14) (with (nh − 1) degrees of freedom for the numerator and (nl − 1)
degrees of freedom for the denominator) is bigger than the tabulated one (see tables 1.1a and
1.1b), vah and val are statistically different; otherwise, they can be considered equal. If the
two variances are equal, the following form of Student's t test enables deciding whether the
corresponding means ( x̄h and x̄l ) are equal or not. For this purpose, the pooled standard
deviation sD needs to be estimated:
s_D = \sqrt{ \frac{ \sum_{i=1}^{n_h} \left( x_i - \bar{x}_h \right)^2 + \sum_{i=1}^{n_l} \left( x_i - \bar{x}_l \right)^2 }{ n_h + n_l - 2 } \left( \frac{1}{n_h} + \frac{1}{n_l} \right) }    (1.15)
Accordingly, the experimental t value reads:
t = \frac{ \left| \bar{x}_h - \bar{x}_l \right| }{ s_D }    (1.16)
where |x̄h − x̄l| indicates the absolute value of the difference x̄h − x̄l. Again, fixing a
probability p = 1 − α of being right (usually 0.95 or 0.99), if the calculated t value
(with (nh + nl − 2) degrees of freedom) is bigger than the tabulated one (see table 1.2), x̄h and x̄l
are statistically different; otherwise, they can be considered equal.
If, conversely, vah and val are statistically different, the following form of Student's t test
enables deciding whether the corresponding means ( x̄h and x̄l ) are equal or not:
t = \frac{ \left| \bar{x}_h - \bar{x}_l \right| }{ \sqrt{ \dfrac{va_h}{n_h} + \dfrac{va_l}{n_l} } }    (1.17)
Fixing a probability p = 1 − α of being right (usually 0.95 or 0.99), if the
calculated t value, with the following number of degrees of freedom dof:
dof = RTNI\left[ \frac{ \left( \dfrac{va_h}{n_h} + \dfrac{va_l}{n_l} \right)^2 }{ \dfrac{ \left( va_h / n_h \right)^2 }{ n_h - 1 } + \dfrac{ \left( va_l / n_l \right)^2 }{ n_l - 1 } } \right]    (1.18)
where RTNI means "round to the nearest integer", is bigger than the tabulated one (see table
1.2), x̄h and x̄l are statistically different; otherwise, they can be considered equal.
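The whole decision procedure (eqs.(1.12), (1.14)-(1.18)) can be sketched numerically as follows; this is only an illustrative implementation, with SciPy's f.ppf and t.ppf playing the role of tables 1.1 and 1.2, the function name compare_means being an arbitrary choice, and the usual two-tailed critical t value being assumed:

import numpy as np
from scipy import stats

def compare_means(a, b, alpha=0.05):
    """Decide whether the means of two data sets differ, at probability p = 1 - alpha."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # label the data set with the higher variance 'h' and the other one 'l'
    if a.var(ddof=1) >= b.var(ddof=1):
        h, l = a, b
    else:
        h, l = b, a
    nh, nl = len(h), len(l)
    vah, val = h.var(ddof=1), l.var(ddof=1)                   # eq.(1.12)
    F = (vah / (nh - 1)) / (val / (nl - 1))                   # eq.(1.14)
    equal_var = F <= stats.f.ppf(1 - alpha, nh - 1, nl - 1)   # tabulated F value
    if equal_var:
        # pooled standard deviation and t value, eqs.(1.15)-(1.16)
        sD = np.sqrt((((h - h.mean())**2).sum() + ((l - l.mean())**2).sum())
                     / (nh + nl - 2) * (1.0/nh + 1.0/nl))
        t, dof = abs(h.mean() - l.mean()) / sD, nh + nl - 2
    else:
        # unequal-variance form, eqs.(1.17)-(1.18)
        t = abs(h.mean() - l.mean()) / np.sqrt(vah/nh + val/nl)
        dof = round((vah/nh + val/nl)**2
                    / ((vah/nh)**2/(nh - 1) + (val/nl)**2/(nl - 1)))
    different = t > stats.t.ppf(1 - alpha/2, dof)             # tabulated (two-tailed) t value
    return different, t, dof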
1.3.1 Chi-Square Fitting: straight line
Let's suppose we have a mathematical model f, characterised by M adjustable parameters (aj,
j = 1 to M), to be fitted to N data points (xi, yi) (i = 1 to N), where x and y are the independent
and the dependent variable, respectively. It can be shown [20] that the most probable set of
model parameters is the one minimizing the chi-square χ²:
\chi^2 = \sum_{i=1}^{N} \left( \frac{ y_i - f\left( x_i, a_1 \ldots a_M \right) }{ \sigma_i } \right)^2    (1.19)
where σi is the standard deviation of the ith datum. Although eq.(1.19) strictly holds only if
measurement errors are normally distributed, it remains useful even when this hypothesis does not hold.
It is now interesting to see how the fitting procedure develops according to eq.(1.19) in the
simplest case, where f is a straight line (linear regression):
f\left( x, a_1 \ldots a_M \right) = m x + q    (1.20)
where x is the independent variable and m and q are the two model parameters representing,
respectively, the slope and the intercept of the straight line. In this case eq.(1.19) becomes:

\chi^2\left( m, q \right) = \sum_{i=1}^{N} \left( \frac{ y_i - m x_i - q }{ \sigma_i } \right)^2    (1.21)
It is well known that the conditions required to render χ² minimum are:
\frac{ \partial \chi^2\left( m, q \right) }{ \partial m } = -2 \sum_{i=1}^{N} \frac{ \left( y_i - m x_i - q \right) x_i }{ \sigma_i^2 } = 0    (1.22)

\frac{ \partial \chi^2\left( m, q \right) }{ \partial q } = -2 \sum_{i=1}^{N} \frac{ y_i - m x_i - q }{ \sigma_i^2 } = 0    (1.23)
The solution of these equations allows the calculation of the two unknowns m and q:
m = \frac{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i y_i}{\sigma_i^2} - \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \sum_{i=1}^{N} \frac{y_i}{\sigma_i^2} }{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \left( \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \right)^2 }, \qquad
q = \frac{ \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} \sum_{i=1}^{N} \frac{y_i}{\sigma_i^2} - \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i y_i}{\sigma_i^2} }{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \left( \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \right)^2 }    (1.24)
Remembering the error propagation law:
\sigma_g^2 = \sigma_x^2 \left( \frac{ \partial g\left( x, y, z, \ldots \right) }{ \partial x } \right)_{y, z, \ldots}^2 + \sigma_y^2 \left( \frac{ \partial g\left( x, y, z, \ldots \right) }{ \partial y } \right)_{x, z, \ldots}^2 + \sigma_z^2 \left( \frac{ \partial g\left( x, y, z, \ldots \right) }{ \partial z } \right)_{x, y, \ldots}^2 + \ldots    (1.25)
where g is a generic function of the independent variables x, y, z and so on, and σx², σy², σz²
and so on are the respective variances, it is possible to estimate the variances σm² and σq² of
m and q:
\sigma_m^2 = \sum_{i=1}^{N} \sigma_i^2 \left( \frac{ \partial m\left( x_i, y_i \right) }{ \partial y_i } \right)^2 = \frac{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} }{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \left( \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \right)^2 }    (1.26)
\sigma_q^2 = \sum_{i=1}^{N} \sigma_i^2 \left( \frac{ \partial q\left( x_i, y_i \right) }{ \partial y_i } \right)^2 = \frac{ \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} }{ \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \left( \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \right)^2 }    (1.27)
It is worth mentioning that, since we assume that the xi are error-free variables (σxi² = 0 for
every i), only the partial derivatives with respect to yi appear in eqs.(1.26) and (1.27). As the
evaluation of the probable uncertainty associated with m (δm) and q (δq) (the fitting
parameters) is not an easy topic and would require further discussion [20] that is out of the
scope of this chapter, we limit ourselves to showing an approximate solution adopted by a
common data-fitting software package [21]. In particular, it is proposed:

\delta m = \sqrt{ \frac{ \sum_{i=1}^{N} \left( \frac{ y_i - m x_i - q }{ \sigma_i } \right)^2 }{ N - M } \, \sigma_m^2 }    (1.28)

\delta q = \sqrt{ \frac{ \sum_{i=1}^{N} \left( \frac{ y_i - m x_i - q }{ \sigma_i } \right)^2 }{ N - M } \, \sigma_q^2 }    (1.29)
where N and M are, respectively, the number of experimental points and the number of model
fitting parameters (for our straight line M = 2).
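A minimal numerical sketch of the whole straight-line procedure (eqs.(1.24), (1.26)-(1.29)) could look as follows; the function name and variable names are arbitrary, and NumPy is assumed:

import numpy as np

def fit_line_weighted(x, y, sigma):
    """Weighted least-squares straight line y = m*x + q, eqs.(1.24)-(1.29)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = 1.0 / np.asarray(sigma, float)**2                # weights 1/sigma_i^2
    S, Sx, Sy = w.sum(), (w*x).sum(), (w*y).sum()
    Sxx, Sxy = (w*x*x).sum(), (w*x*y).sum()
    delta = S*Sxx - Sx**2                                # common denominator of eq.(1.24)
    m = (S*Sxy - Sx*Sy) / delta                          # slope, eq.(1.24)
    q = (Sxx*Sy - Sx*Sxy) / delta                        # intercept, eq.(1.24)
    var_m, var_q = S/delta, Sxx/delta                    # eqs.(1.26)-(1.27)
    chi2 = (w*(y - m*x - q)**2).sum()                    # eq.(1.21)
    red = chi2 / (len(x) - 2)                            # chi2/(N - M), with M = 2
    dm, dq = np.sqrt(red*var_m), np.sqrt(red*var_q)      # eqs.(1.28)-(1.29)
    return m, q, dm, dq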
In order to be sure that the parameters m and q are really meaningful, the goodness of the fit must
be evaluated. For this purpose we can resort to the F-test [21, 22] to evaluate whether two
particular variances, the mean square regression MSR and the mean square error MSE, differ
or not:

MSR = \frac{ \sum_{i=1}^{N} \left( \frac{ y_i - \bar{y} }{ \sigma_i } \right)^2 - \sum_{i=1}^{N} \left( \frac{ y_i - y_{pi} }{ \sigma_i } \right)^2 }{ M - 1 }, \qquad
\bar{y} = \frac{ \sum_{i=1}^{N} \frac{ y_i }{ \sigma_i^2 } }{ \sum_{i=1}^{N} \frac{ 1 }{ \sigma_i^2 } }    (1.30)

MSE = \frac{ \sum_{i=1}^{N} \left( \frac{ y_i - y_{pi} }{ \sigma_i } \right)^2 }{ N - M }    (1.31)
where y_pi is the model prediction for the ith datum, the degrees of freedom associated with
MSR are ν1 = M − 1, and those associated with MSE are ν2 = N − M. Accordingly, the
calculated F value is:
F = \frac{ MSR }{ MSE }    (1.32)
It is evident that, in the presence of a good data fit, F will be very high, as MSE tends to zero
while MSR does not. If the F value calculated according to eq.(1.32) (with ν1 degrees of
freedom for the numerator and ν2 degrees of freedom for the denominator) is bigger than the
tabulated one corresponding to a fixed probability p = 1 − α of being right (see table 1.1),
MSR is statistically bigger than MSE. Accordingly, the data fit is statistically acceptable.
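The goodness-of-fit test of eqs.(1.30)-(1.32) can be sketched in the same spirit; here scipy.stats.f.ppf replaces the table lookup and the function name is an arbitrary choice:

import numpy as np
from scipy import stats

def fit_is_acceptable(y, y_pred, sigma, M, alpha=0.05):
    """F-test on MSR/MSE, eqs.(1.30)-(1.32); y_pred are the model predictions."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    w = 1.0 / np.asarray(sigma, float)**2
    N = len(y)
    y_bar = (w*y).sum() / w.sum()                   # weighted mean of the data
    sst = (w*(y - y_bar)**2).sum()                  # total weighted variation
    sse = (w*(y - y_pred)**2).sum()                 # residual weighted variation
    MSR = (sst - sse) / (M - 1)                     # eq.(1.30)
    MSE = sse / (N - M)                             # eq.(1.31)
    F = MSR / MSE                                   # eq.(1.32)
    return F > stats.f.ppf(1 - alpha, M - 1, N - M), F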
1.3.2 Chi-Square Fitting: general case
In order to generalise the previous discussion to a generic function f(x, a1...aM), where aj are
its adjustable parameters and x is the independent variable, it is necessary to define the so-called
design matrix A, composed of N rows and M columns, whose elements are [20]:
A_{ij} = \frac{1}{\sigma_i} \frac{ \partial f\left( x_i \right) }{ \partial a_j }, \qquad
A = \begin{pmatrix}
\frac{1}{\sigma_1} \frac{ \partial f\left( x_1 \right) }{ \partial a_1 } & \cdots & \frac{1}{\sigma_1} \frac{ \partial f\left( x_1 \right) }{ \partial a_M } \\
\cdots & \cdots & \cdots \\
\frac{1}{\sigma_N} \frac{ \partial f\left( x_N \right) }{ \partial a_1 } & \cdots & \frac{1}{\sigma_N} \frac{ \partial f\left( x_N \right) }{ \partial a_M }
\end{pmatrix}    (1.33)
where xi and σi are, respectively, the mean value and the standard deviation associated with
the ith experimental datum. Now, the condition of minimum χ²:

\chi^2\left( a_1, \ldots, a_M \right) = \sum_{i=1}^{N} \left( \frac{ y_i - f\left( x_i, a_1, \ldots, a_M \right) }{ \sigma_i } \right)^2    (1.21')
requires that all the partial derivatives of χ² with respect to the M parameters are equal to zero:
\frac{ \partial \chi^2 }{ \partial a_j } = -2 \sum_{i=1}^{N} \frac{ y_i - f\left( x_i, a_1, \ldots, a_M \right) }{ \sigma_i^2 } \, \frac{ \partial f\left( x_i, a_1, \ldots, a_M \right) }{ \partial a_j } = 0    (1.34)
Defining the matrix α as follows:
\alpha_{jk} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left( \frac{ \partial f\left( x_i, a_1, \ldots, a_M \right) }{ \partial a_j } \, \frac{ \partial f\left( x_i, a_1, \ldots, a_M \right) }{ \partial a_k } \right)    (1.35)
it is easy to demonstrate that α = Aᵀ·A, where Aᵀ is the transpose of the design matrix,
obtained from A by simply exchanging rows and columns. Finally, defining the matrix C as
the inverse of α (C = α⁻¹, which means α·C = I, the identity matrix), it is possible, in the same
spirit leading to eqs.(1.28) and (1.29), to estimate the probable uncertainty δaj associated with
the fitting parameters aj [21]:

\delta a_j = \sqrt{ \frac{ \sum_{i=1}^{N} \left( \frac{ y_i - f\left( x_i, a_1, \ldots, a_M \right) }{ \sigma_i } \right)^2 }{ N - M } \, C_{jj} }    (1.36)
where, obviously, σ²aj = Cjj.
Also in this general case, the goodness of the fit can be determined according to the F-test
proposed for the linear case (eq.(1.32)).
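Once a minimisation routine has returned the best-fit parameters, the uncertainties of eq.(1.36) can be estimated numerically; the sketch below builds the design matrix of eq.(1.33) with central finite-difference derivatives (the function name, the step h and the use of finite differences instead of analytical derivatives are all illustrative choices):

import numpy as np

def param_uncertainties(f, x, y, sigma, a_best, h=1e-6):
    """delta(a_j) from eqs.(1.33)-(1.36) for a model f(x, *a) already fitted to (x, y)."""
    x, y, sigma = np.asarray(x, float), np.asarray(y, float), np.asarray(sigma, float)
    a = np.asarray(a_best, float)
    N, M = len(x), len(a)
    A = np.empty((N, M))
    for j in range(M):                               # design matrix, eq.(1.33)
        da = np.zeros(M)
        da[j] = h * max(1.0, abs(a[j]))
        A[:, j] = (f(x, *(a + da)) - f(x, *(a - da))) / (2*da[j]) / sigma
    C = np.linalg.inv(A.T @ A)                       # C = alpha^-1 with alpha = A^T A
    chi2 = (((y - f(x, *a)) / sigma)**2).sum()       # eq.(1.21')
    return np.sqrt(chi2 / (N - M) * np.diag(C))      # eq.(1.36)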
1.3.3 Robust Fitting
Although it is not very common, it can sometimes happen that both experimental coordinates x and
y are affected by errors. This means that the generic experimental point (xi, yi) is associated with
two standard deviations, σxi and σyi. For the sake of clarity and simplicity, this topic will be
treated in the simple case of straight-line data fitting, as its extension to the more general
case (fitting function f(x, a1,..aM)) implies additional difficulties that could distract the reader
from the main concept of robust fitting. Although different strategies can be adopted [20], a
possible definition of the χ² function in the case of robust fitting is:
\chi^2\left( m, q \right) = \sum_{i=1}^{N} \frac{ d_i^2 }{ \sigma_{xi}^2 + \sigma_{yi}^2 }    (1.37)
where di is the distance between the experimental point (xi, yi) and the straight line
y = mx + q, and σi² = σxi² + σyi² is the experimental point variance. Remembering that di
is nothing more than the distance between (xi, yi) and the intersection with y = mx + q of the
straight line passing through (xi, yi) and perpendicular to it (its slope is accordingly −1/m), its
expression reads:
d_i^2 = \frac{ \left( y_i - m x_i - q \right)^2 }{ m^2 + 1 }    (1.38)
The conditions required to minimise χ² are, obviously, those requiring its partial derivatives
to vanish:
\frac{ \partial \chi^2\left( m, q \right) }{ \partial m } = -\frac{2}{\left( m^2 + 1 \right)^2} \sum_{i=1}^{N} \frac{ \left( y_i - m x_i - q \right) \left[ x_i \left( m^2 + 1 \right) + m \left( y_i - m x_i - q \right) \right] }{ \sigma_i^2 } = 0    (1.39)

\frac{ \partial \chi^2\left( m, q \right) }{ \partial q } = -\frac{2}{ m^2 + 1 } \sum_{i=1}^{N} \frac{ y_i - m x_i - q }{ \sigma_i^2 } = 0    (1.40)
The simultaneous solution of eqs.(1.39) and (1.40) leads to a quadratic equation in m whose
solutions are:
m_{1,2} = \frac{ B \pm \sqrt{ B^2 + 4 } }{ 2 }, \qquad
q_{1,2} = \frac{ \sum_{i=1}^{N} \frac{ y_i }{ \sigma_i^2 } - m_{1,2} \sum_{i=1}^{N} \frac{ x_i }{ \sigma_i^2 } }{ \sum_{i=1}^{N} \frac{ 1 }{ \sigma_i^2 } }    (1.41)

B = \frac{ \left( \sum_{i=1}^{N} \frac{ y_i }{ \sigma_i^2 } \right)^2 - \left( \sum_{i=1}^{N} \frac{ x_i }{ \sigma_i^2 } \right)^2 + \sum_{i=1}^{N} \frac{ 1 }{ \sigma_i^2 } \left( \sum_{i=1}^{N} \frac{ x_i^2 }{ \sigma_i^2 } - \sum_{i=1}^{N} \frac{ y_i^2 }{ \sigma_i^2 } \right) }{ \sum_{i=1}^{N} \frac{ x_i }{ \sigma_i^2 } \sum_{i=1}^{N} \frac{ y_i }{ \sigma_i^2 } - \sum_{i=1}^{N} \frac{ 1 }{ \sigma_i^2 } \sum_{i=1}^{N} \frac{ x_i y_i }{ \sigma_i^2 } }    (1.42)
It is clear that the solution we are interested in is the one minimizing χ². Accordingly, the
solution will be m1, q1 if χ²(m1, q1) ≤ χ²(m2, q2), and m2, q2 in the opposite case.
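A compact sketch of eqs.(1.37)-(1.42), with the per-point variances combined as in eq.(1.37) and an arbitrary function name, is:

import numpy as np

def robust_line(x, y, sx, sy):
    """Straight line fitted when both x and y carry errors, eqs.(1.37)-(1.42)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = 1.0 / (np.asarray(sx, float)**2 + np.asarray(sy, float)**2)   # 1/sigma_i^2
    S, Sx, Sy = w.sum(), (w*x).sum(), (w*y).sum()
    Sxx, Syy, Sxy = (w*x*x).sum(), (w*y*y).sum(), (w*x*y).sum()
    B = (Sy**2 - Sx**2 + S*(Sxx - Syy)) / (Sx*Sy - S*Sxy)             # eq.(1.42)
    best = None
    for m in ((B + np.sqrt(B**2 + 4))/2, (B - np.sqrt(B**2 + 4))/2):  # slopes, eq.(1.41)
        q = (Sy - m*Sx) / S                                           # intercept, eq.(1.41)
        chi2 = (w*(y - m*x - q)**2 / (m**2 + 1)).sum()                # eqs.(1.37)-(1.38)
        if best is None or chi2 < best[0]:
            best = (chi2, m, q)                                       # keep the minimising root
    chi2, m, q = best
    return m, q, chi2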
Remembering the error propagation law (eq.(1.25)), the variances of m and q are:
\sigma_m^2 = \sum_{i=1}^{N} \left[ \sigma_{xi}^2 \left( \frac{ \partial m\left( x_i, y_i \right) }{ \partial x_i } \right)^2 + \sigma_{yi}^2 \left( \frac{ \partial m\left( x_i, y_i \right) }{ \partial y_i } \right)^2 \right]    (1.43)

\sigma_q^2 = \sum_{i=1}^{N} \left[ \sigma_{xi}^2 \left( \frac{ \partial q\left( x_i, y_i \right) }{ \partial x_i } \right)^2 + \sigma_{yi}^2 \left( \frac{ \partial q\left( x_i, y_i \right) }{ \partial y_i } \right)^2 \right]    (1.44)
Consequently, the probable uncertainty associated with m (δm) and q (δq) (the fitting parameters)
can be evaluated in the light of eqs.(1.28) and (1.29):
\delta m = \sqrt{ \frac{ \sum_{i=1}^{N} \left( \frac{ d_i }{ \sigma_i } \right)^2 }{ N - M } \, \sigma_m^2 }    (1.45)

\delta q = \sqrt{ \frac{ \sum_{i=1}^{N} \left( \frac{ d_i }{ \sigma_i } \right)^2 }{ N - M } \, \sigma_q^2 }    (1.46)
The goodness of the fit can be estimated by means of the F-test (eq.(1.32)) provided that the
following expressions for the mean square regression MSR and mean square error MSE are
considered:

MSR = \frac{ \sum_{i=1}^{N} \left( \frac{ x_i - \bar{x} }{ \sigma_{xi} } \right)^2 + \sum_{i=1}^{N} \left( \frac{ y_i - \bar{y} }{ \sigma_{yi} } \right)^2 - \sum_{i=1}^{N} \left( \frac{ d_i }{ \sigma_i } \right)^2 }{ M - 1 }    (1.47)

\bar{x} = \frac{ \sum_{i=1}^{N} \frac{ x_i }{ \sigma_{xi}^2 } }{ \sum_{i=1}^{N} \frac{ 1 }{ \sigma_{xi}^2 } }, \qquad
\bar{y} = \frac{ \sum_{i=1}^{N} \frac{ y_i }{ \sigma_{yi}^2 } }{ \sum_{i=1}^{N} \frac{ 1 }{ \sigma_{yi}^2 } }    (1.47')

MSE = \frac{ \sum_{i=1}^{N} \left( \frac{ d_i }{ \sigma_i } \right)^2 }{ N - M }    (1.48)
It is clear that this strategy (robust fitting) can also be applied to a general model
f(x, a1...aM). Nevertheless, this task is not so straightforward because of the increased difficulty
associated with the evaluation of di. Indeed, di can be evaluated remembering that the generic
straight line passing through (xi, yi) must satisfy the condition yi = mxi + q, from which we have,
for example, m = (yi − q)/xi. Thus, the coordinates (x̃i, ỹi) of the intersection point
between f(x, a1...aM) and the generic straight line passing through (xi, yi) are given by the
solution of the following system of equations:

\begin{cases} y = \dfrac{ y_i - q }{ x_i } x + q \\[2mm] y = f\left( x, a_1, \ldots, a_M \right) \end{cases}    (1.49)
The generic distance dig between (xi, yi) and (x̃i, ỹi) is then given by:
d_{ig} = \sqrt{ \left( x_i - \tilde{x}_i \right)^2 + \left( y_i - \tilde{y}_i \right)^2 }    (1.50)
Obviously, among the infinitely many distances eq.(1.50) can give, the one we are interested in,
di, is the smallest. Thus, q can be evaluated by searching for the minimum of dig. If system
(1.49) does not yield an analytical solution, an iterative numerical procedure is needed to get q.
Once di is known for all the experimental points, χ² can be evaluated, and a numerical
technique is required for its minimisation.
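As a numerical illustration, the smallest distance between an experimental point and a general curve can also be found by minimising eq.(1.50) directly over the curve abscissa, which is equivalent to the search over q described above; scipy.optimize.minimize_scalar is assumed available and the search interval span is an arbitrary choice:

import numpy as np
from scipy.optimize import minimize_scalar

def perpendicular_distance(f, a, xi, yi, span=10.0):
    """Smallest distance d_i between (xi, yi) and the curve y = f(x, *a), cf. eq.(1.50)."""
    dist2 = lambda x: (x - xi)**2 + (f(x, *a) - yi)**2     # squared distance to a curve point
    res = minimize_scalar(dist2, bounds=(xi - span, xi + span), method='bounded')
    return np.sqrt(res.fun)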
1.4 MODEL COMPARISON
In the previous paragraphs, the attention was focussed on how to fit a model to experimental
data and on how to assess the goodness of fit. Basically, this task was accomplished by resorting to
mathematical/statistical methods. Accordingly, if the F value is high enough and the
uncertainties of the model parameters are small, we can say that the model is suitable. Obviously, this is not
enough to ensure model reliability, as the model fitting parameters must assume reasonable
values (from the physical point of view) and the model must be able to yield reasonable, if not
true, predictions for a different set of initial conditions.
Another problem that can arise in mathematical modelling is to discern the best model among
a group of models all yielding a good fit.
Although different approaches can be followed for this purpose, for its generality and
simplicity we would like to present Akaike's method [23], which is based on likelihood theory,
information theory and the concept of information entropy. Despite the complexity of its
theoretical background, this method is very simple to use. Assuming the usual condition
of a Gaussian distribution of the points' scatter around the model, the Akaike number AIC is
defined as follows:

AIC = N \ln\left( \frac{ \chi^2 }{ N } \right) + \frac{ 2 \left( M + 1 \right) N }{ N - M - 2 }    (1.51)
The model showing the smallest AIC is the one to be preferred. In order to estimate how much
more likely the model with the smallest AIC is, it is sufficient to define the following
probability pAIC:
p_{AIC} = \frac{ e^{-0.5 \Delta} }{ 1 + e^{-0.5 \Delta} }, \qquad \Delta = AIC_{om} - AIC_{smallest}    (1.52)
where AICsmallest is the Akaike number of the more likely model while AICom is the
Akaike number of the other model (AICom ≥ AICsmallest). If Δ = 2, for example, there is
a probability of 0.73 that the model with the smallest AIC is correct and a probability
pAIC = 0.27 that the other model is correct.
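A minimal sketch of eqs.(1.51)-(1.52), with arbitrary function and argument names, is:

import numpy as np

def aic(chi2, N, M):
    """Corrected Akaike number, eq.(1.51)."""
    return N*np.log(chi2/N) + 2.0*(M + 1)*N/(N - M - 2)

def compare_models(chi2_1, M1, chi2_2, M2, N):
    """Akaike comparison of two models fitted to the same N data points, eq.(1.52)."""
    a1, a2 = aic(chi2_1, N, M1), aic(chi2_2, N, M2)
    delta = abs(a1 - a2)
    p_other = np.exp(-0.5*delta) / (1.0 + np.exp(-0.5*delta))   # eq.(1.52)
    best = 1 if a1 <= a2 else 2
    return best, 1.0 - p_other, p_other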
1.6 REFERENCES
19. Press, W. H. and others, Statistical description of data, in Numerical Recipes in FORTRAN, Cambridge University Press, 1992, chap. 14.
20. Press, W. H. and others, Modelling of data, in Numerical Recipes in FORTRAN, Cambridge University Press, 1992, chap. 15.
21. TableCurve 2D, SPSS Inc. (http://www.spss.com), 1997.
22. Draper, N. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966.
23. Burnham, K. P. and Anderson, D. R., Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, New York, 2002.