System Identification
Ali Karimpour
Assistant Professor
Ferdowsi University of Mashhad
Lecture 7
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Guiding Principles Behind Parameter Estimation Method
Parameter Estimation Method
Suppose that we have selected a certain model structure $\mathcal{M}$. The set of models is defined as:

$$\mathcal{M} = \{\mathcal{M}(\theta) \mid \theta \in D_{\mathcal{M}}\}$$

Suppose the system is:

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

For each $\theta$, the model represents a way of predicting future outputs. The predictor is a linear filter:

$$\mathcal{M}(\theta):\quad \hat{y}(t \mid \theta) = W_y(q,\theta)\,y(t) + W_u(q,\theta)\,u(t)$$

where

$$W_y(q,\theta) = 1 - H^{-1}(q,\theta), \qquad W_u(q,\theta) = H^{-1}(q,\theta)\,G(q,\theta)$$
Guiding Principles Behind Parameter Estimation Method
Suppose that we collect a set of data from the system:

$$Z^N = \{y(1), u(1), y(2), u(2), \ldots, y(N), u(N)\}$$

Formally, we are going to find a map from the data $Z^N$ to the set $D_{\mathcal{M}}$:

$$Z^N \;\longrightarrow\; \hat{\theta}_N \in D_{\mathcal{M}}$$
Such a mapping is a parameter estimation method.
Guiding Principles Behind Parameter Estimation Method
Evaluating the candidate model
Let us define the prediction error as:

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t \mid \theta)$$

When the data set $Z^N$ is known, these errors can be computed for $t = 1, 2, \ldots, N$.

A guiding principle for parameter estimation is:

Based on $Z^t$ we can compute the prediction error $\varepsilon(t,\theta)$. Select $\hat{\theta}_N$ so that the prediction errors $\varepsilon(t,\hat{\theta}_N)$, $t = 1, 2, \ldots, N$, become as small as possible.

We describe two approaches:
• Form a scalar-valued criterion function that measures the size of $\varepsilon$.
• Make $\varepsilon(t,\hat{\theta}_N)$ uncorrelated with a given data sequence.
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Minimizing Prediction Error
Clearly, the size of the prediction error

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t \mid \theta)$$

can be evaluated once the data $Z^N$ are known.

Let us filter the prediction error through a stable linear filter $L(q)$:

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Then use the following norm:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

where $l(\cdot)$ is a scalar-valued positive function.

The estimate $\hat{\theta}_N$ is then defined by:

$$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$
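As a concrete illustration (a minimal numerical sketch, not from the lecture: the first-order ARX system, its parameter values and the data generation below are assumptions), the criterion $V_N$ with $L(q) = 1$ and $l(\varepsilon) = \varepsilon^2/2$ can be evaluated as:

```python
import numpy as np

# Sketch: evaluating the PEM criterion V_N for an assumed ARX(1,1) system
#   y(t) = -a0*y(t-1) + b0*u(t-1) + e(t),
# with L(q) = 1 and the quadratic norm l(eps) = eps^2 / 2.
rng = np.random.default_rng(0)
N, a0, b0 = 200, -0.7, 1.5
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + 0.1 * rng.standard_normal()

def V_N(theta, y, u):
    """Quadratic prediction-error criterion for the ARX(1,1) predictor
    yhat(t|theta) = -a*y(t-1) + b*u(t-1)."""
    a, b = theta
    yhat = -a * y[:-1] + b * u[:-1]   # one-step-ahead predictions
    eps = y[1:] - yhat                # prediction errors eps(t, theta)
    return np.mean(0.5 * eps**2)

print(V_N((a0, b0), y, u))    # near the noise floor 0.5 * 0.1**2
print(V_N((0.0, 0.0), y, u))  # a poor model gives a larger criterion
```

Evaluating $V_N$ at the true parameters gives roughly the noise floor, while a poor model gives a visibly larger value.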
Minimizing Prediction Error
$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta), \qquad V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big), \qquad \hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$
Generally, the term prediction-error identification methods (PEM) is used for this family of approaches.

Particular methods with specific names are used according to:
• Choice of $l(\cdot)$
• Choice of $L(\cdot)$
• Choice of model structure
• Method by which the minimization is realized
Minimizing Prediction Error
$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta), \qquad V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big), \qquad \hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$
Choice of L
The effect of L is best understood in a frequency-domain interpretation.
Thus L acts like frequency weighting.
See also Section 14.4 (Prefiltering).
Exercise 1: Consider the following system:

$$y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t)$$

Show that the effect of prefiltering by $L$ is identical to changing the noise model from $H(q,\theta)$ to $L^{-1}(q)\,H(q,\theta)$.
Minimizing Prediction Error
$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta), \qquad V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big), \qquad \hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$
Choice of l
A standard choice, which is convenient both for computation and analysis, is:

$$l(\varepsilon) = \frac{1}{2}\,\varepsilon^2$$
See also Section 15.2 (Choice of norms: robustness against bad data).
One can also parameterize the norm independently of the model parameterization.
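As an illustration of such robustness against bad data (a sketch; the Huber function and the threshold delta below are standard choices assumed here, not taken from the lecture):

```python
import numpy as np

# Sketch: besides l(eps) = eps^2/2, a robust norm such as the Huber
# function limits the influence of outliers ("bad data").
def huber(eps, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    small = np.abs(eps) <= delta
    return np.where(small, 0.5 * eps**2, delta * (np.abs(eps) - 0.5 * delta))

def V_N(eps):
    return np.mean(huber(eps))

eps = np.array([0.1, -0.2, 0.05, 8.0])  # one outlier
print(V_N(eps))  # the outlier contributes linearly, not quadratically
```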
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Linear Regressions and the Least-Squares Method
We introduced linear regressions before as:

$$\hat{y}(t \mid \theta) = \varphi^T(t)\,\theta + \mu(t)$$

$\varphi(t)$ is the regression vector; for the ARX structure it is

$$\varphi(t) = \left[-y(t-1)\ \cdots\ -y(t-n_a)\quad u(t-1)\ \cdots\ u(t-n_b)\right]^T$$

$\mu(t)$ is a known, data-dependent vector. For simplicity, let it be zero in the remainder of this section.

Least-squares criterion

The prediction error is:

$$\varepsilon(t,\theta) = y(t) - \varphi^T(t)\,\theta$$

Now let $L(q) = 1$ and $l(\varepsilon) = \varepsilon^2/2$; then

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big) = \frac{1}{N}\sum_{t=1}^{N} \frac{1}{2}\big(y(t) - \varphi^T(t)\,\theta\big)^2$$

This is the least-squares criterion for the linear regression.
Linear Regressions and the Least-Squares Method
Least-squares criterion
$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big) = \frac{1}{N}\sum_{t=1}^{N} \frac{1}{2}\big(y(t) - \varphi^T(t)\,\theta\big)^2$$

The least-squares estimate (LSE) is:

$$\hat{\theta}_N^{LS} = \arg\min_{\theta} V_N(\theta, Z^N) = \left[\frac{1}{N}\sum_{t=1}^{N} \varphi(t)\,\varphi^T(t)\right]^{-1} \frac{1}{N}\sum_{t=1}^{N} \varphi(t)\,y(t)$$

With $R(N) = \frac{1}{N}\sum_{t=1}^{N} \varphi(t)\,\varphi^T(t)$ and $f(N) = \frac{1}{N}\sum_{t=1}^{N} \varphi(t)\,y(t)$, this reads:

$$\hat{\theta}_N^{LS} = R^{-1}(N)\,f(N)$$
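In code, the LSE is a single linear solve (a minimal sketch; the ARX(1,1) system and data generation are illustrative assumptions):

```python
import numpy as np

# Sketch: the least-squares estimate theta_LS = R(N)^{-1} f(N) for the
# ARX(1,1) regression phi(t) = [-y(t-1), u(t-1)]^T.
rng = np.random.default_rng(1)
N, a0, b0 = 500, -0.7, 1.5
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + e[t]

Phi = np.column_stack([-y[:-1], u[:-1]])  # rows are phi(t)^T
Y = y[1:]

R = Phi.T @ Phi / N               # R(N) = (1/N) sum phi(t) phi(t)^T
f = Phi.T @ Y / N                 # f(N) = (1/N) sum phi(t) y(t)
theta_ls = np.linalg.solve(R, f)  # theta_LS = R(N)^{-1} f(N)
print(theta_ls)                   # close to (a0, b0) = (-0.7, 1.5)
```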
Linear Regressions and the Least-Squares Method
Properties of LSE
The least-squares method is a special case of the PEM (prediction-error method), so its properties follow from the general PEM framework. [The detailed expressions on this slide were images and are not recoverable.]
Linear Regressions and the Least-Squares Method
Weighted Least Squares
Different measurements could be assigned different weights:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} \alpha_t\,\big(y(t) - \varphi^T(t)\,\theta\big)^2$$

or

$$V_N(\theta, Z^N) = \sum_{t=1}^{N} \beta(N,t)\,\big(y(t) - \varphi^T(t)\,\theta\big)^2$$

The resulting estimate has the same form as before:

$$\hat{\theta}_N^{LS} = \left[\sum_{t=1}^{N} \beta(N,t)\,\varphi(t)\,\varphi^T(t)\right]^{-1} \sum_{t=1}^{N} \beta(N,t)\,\varphi(t)\,y(t)$$
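A short sketch of the weighted estimate (the exponential-forgetting weights $\beta(N,t) = \lambda^{N-t}$ are an assumed illustrative choice, not prescribed by the lecture):

```python
import numpy as np

# Sketch: weighted least squares with weights beta(N, t) = lam**(N - t),
# which down-weights old samples (exponential forgetting).
def weighted_ls(Phi, Y, lam=0.98):
    N = len(Y)
    beta = lam ** np.arange(N - 1, -1, -1)  # beta(N, t) = lam^(N - t)
    R = (Phi * beta[:, None]).T @ Phi       # sum beta(N,t) phi(t) phi(t)^T
    f = (Phi * beta[:, None]).T @ Y         # sum beta(N,t) phi(t) y(t)
    return np.linalg.solve(R, f)
```

With `lam = 1` this reduces to the ordinary least-squares estimate.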
Linear Regressions and the Least-Squares Method
Colored Equation-error Noise
We show that in a difference equation

$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t)$$

if the disturbance $v(t)$ is not white noise, then the LSE will not converge to the true values $a_i$ and $b_i$.

To deal with this problem, we may incorporate further modeling of the equation error $v(t)$, as discussed in Chapter 4; let us say

$$v(t) = k(q)\,e(t)$$

Now $e(t)$ is white noise, but the new model takes us out of the LS environment, except in two cases:
• Known noise properties
• High-order models
Linear Regressions and the Least-Squares Method
Colored Equation-error Noise
• Known noise properties
$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t), \qquad v(t) = k(q)\,e(t)$$

Suppose the values of $a_i$ and $b_i$ are unknown, but $k$ is a known filter (not too realistic a situation). Filtering both sides through $k^{-1}(q)$ gives

$$y_F(t) + a_1 y_F(t-1) + \cdots + a_{n_a} y_F(t-n_a) = b_1 u_F(t-1) + \cdots + b_{n_b} u_F(t-n_b) + e(t)$$

where

$$y_F(t) = k^{-1}(q)\,y(t), \qquad u_F(t) = k^{-1}(q)\,u(t)$$

Since $e(t)$ is white, the LS method can be applied without problems. Notice that this is equivalent to applying the prefilter $L(q) = k^{-1}(q)$.
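A sketch of this prefiltering step (the particular filter $k(q)$ below is an assumed example, chosen so that $k^{-1}(q)$ is a simple FIR filter):

```python
import numpy as np
from scipy.signal import lfilter

# Sketch: prefiltering data through k^{-1}(q) before ordinary LS.
# Illustrative choice: k(q) = 1 / (1 - 0.9 q^{-1}), so
# k^{-1}(q) = 1 - 0.9 q^{-1}, applied to both y and u.
def prefilter(x, b=(1.0, -0.9), a=(1.0,)):
    """Apply the filter with numerator b and denominator a."""
    return lfilter(b, a, x)

# y_f = prefilter(y); u_f = prefilter(u)
# ...then build the regression from (y_f, u_f) and solve LS as before.
```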
Linear Regressions and the Least-Squares Method
Colored Equation-error Noise
$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t), \qquad v(t) = k(q)\,e(t)$$

• High-order models

Suppose that the noise $v$ can be well described by $k(q) = 1/D(q)$, where $D(q)$ is a polynomial of order $r$. So we have

$$A(q)\,y(t) = B(q)\,u(t) + \frac{1}{D(q)}\,e(t)$$

or

$$A(q)D(q)\,y(t) = B(q)D(q)\,u(t) + e(t)$$

Now we can apply the LS method. Note that $n_A = n_a + r$ and $n_B = n_b + r$.
Linear Regressions and the Least-Squares Method
Consider a state-space model:

$$x(t+1) = A\,x(t) + B\,u(t) + w(t)$$
$$y(t) = C\,x(t) + D\,u(t) + v(t)$$

To derive the system we can either:
1. parameterize $A, B, C, D$ as in Section 4.3; or
2. assume no insight into the particular structure and look for any suitable matrices $A, B, C, D$.

Note: Since there are an infinite number of such matrices that describe the same system (related by similarity transformations), we will have to fix the coordinate basis of the state-space realization.
Linear Regressions and the Least-Squares Method
Consider a state-space model:

$$x(t+1) = A\,x(t) + B\,u(t) + w(t)$$
$$y(t) = C\,x(t) + D\,u(t) + v(t)$$

Note: Since there are an infinite number of such matrices that describe the same system (related by similarity transformations), we will have to fix the coordinate basis of the state-space realization.

Let us assume for a moment that not only $y$ and $u$ but also the states $x$ are measured. This would, by the way, fix the coordinate basis of the state-space realization. Now, with known $y$, $u$, $x$, the model becomes a linear regression. Define

$$Y(t) = \begin{bmatrix} x(t+1) \\ y(t) \end{bmatrix}, \qquad \Theta = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad \varphi(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \qquad E(t) = \begin{bmatrix} w(t) \\ v(t) \end{bmatrix}$$

Then the state-space model becomes:

$$Y(t) = \Theta\,\varphi(t) + E(t)$$

But there is a problem: the states are not available to measure!
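Still, the regression form is worth seeing in code (a sketch under the unrealistic assumption that the states are measured; the system matrices and noise levels below are illustrative):

```python
import numpy as np

# Sketch: with Y(t) = [x(t+1); y(t)] and phi(t) = [x(t); u(t)], the block
# matrix Theta = [A B; C D] follows from one least-squares solve.
rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

N, nx, nu = 300, 2, 1
u = rng.standard_normal((N, nu))
x = np.zeros((N + 1, nx))
y = np.zeros((N, 1))
for t in range(N):
    w = 0.01 * rng.standard_normal(nx)          # process noise w(t)
    x[t + 1] = A @ x[t] + B @ u[t] + w
    y[t] = C @ x[t] + D @ u[t] + 0.01 * rng.standard_normal()

Y = np.hstack([x[1:], y])      # rows: [x(t+1)^T, y(t)^T]
Phi = np.hstack([x[:-1], u])   # rows: [x(t)^T, u(t)^T]
Theta = np.linalg.lstsq(Phi, Y, rcond=None)[0].T  # Theta = [A B; C D]
print(np.round(Theta, 2))
```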
Linear Regressions and the Least-Squares Method
Estimating State Space Models Using Least Squares Techniques
(Subspace Methods)
$$Y(t) = \begin{bmatrix} x(t+1) \\ y(t) \end{bmatrix}, \qquad \Theta = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad \varphi(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \qquad E(t) = \begin{bmatrix} w(t) \\ v(t) \end{bmatrix}$$

Using a subspace algorithm, $x(t+1)$ can be derived from the observations (see Chapter 10).
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Estimation and the Principle of Maximum Likelihood
The area of statistical inference deals with the problem of extracting information from observations that themselves could be unreliable.

Suppose that the observation $y^N = (y(1), y(2), \ldots, y(N))$ has the following probability density function (PDF):

$$f(\theta; x_1, x_2, \ldots, x_N) = f_y(\theta; x^N)$$

That is:

$$P(y^N \in A) = \int_{x^N \in A} f_y(\theta; x^N)\,dx^N$$

$\theta$ is a $d$-dimensional parameter vector. The purpose of the observation is in fact to estimate the vector $\theta$ using $y^N$:

$$\hat{\theta}(y^N): \mathbb{R}^N \to \mathbb{R}^d$$

Suppose the observed value of $y^N$ is $y_*^N$; then

$$\hat{\theta}_* = \hat{\theta}(y_*^N)$$
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Estimation and the Principle of Maximum Likelihood
$$\hat{\theta}(y^N): \mathbb{R}^N \to \mathbb{R}^d$$

Many such estimator functions are possible. A particular one is the maximum likelihood estimator (MLE).
The probability that the realization (= observation) indeed should take the value $y_*^N$ is proportional to

$$f_y(\theta, y_*^N)$$

This is a deterministic function of $\theta$ once the numerical value $y_*^N$ is inserted, and it is called the likelihood function.
A reasonable estimator of $\theta$ could then be

$$\hat{\theta}_{ML}(y_*^N) = \arg\max_{\theta} f_y(\theta, y_*^N)$$

where the maximization is performed for fixed $y_*^N$. This function is known as the maximum likelihood estimator (MLE).
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Example: Let

$$y(i), \quad i = 1, \ldots, N$$

be independent random variables, normally distributed with unknown mean $\theta_0$ and known variances $\lambda_i$:

$$y(i) \sim N(\theta_0, \lambda_i)$$

A common estimator is the sample mean:

$$\hat{\theta}_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

To calculate the MLE, we start by determining the joint PDF of the observations. The PDF of $y(i)$ is:

$$\frac{1}{\sqrt{2\pi\lambda_i}}\,\exp\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

The joint PDF of the observations is (since the $y(i)$ are independent):

$$f_y(\theta; x^N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda_i}}\,\exp\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Example (continued): Let $y(i)$, $i = 1, \ldots, N$, be independent random variables, normally distributed with unknown mean $\theta_0$ and known variances $\lambda_i$. A common estimator is the sample mean:

$$\hat{\theta}_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

The joint PDF of the observations is (since the $y(i)$ are independent):

$$f_y(\theta; x^N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda_i}}\,\exp\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

So the likelihood function is $f_y(\theta; y^N)$.

Maximizing the likelihood function is the same as maximizing its logarithm. So

$$\hat{\theta}_{ML}(y^N) = \arg\max_{\theta} \log f_y(\theta; y^N) = \arg\max_{\theta}\left\{-\frac{1}{2}\sum_{i=1}^{N}\log 2\pi\lambda_i - \frac{1}{2}\sum_{i=1}^{N}\frac{\big(y(i) - \theta\big)^2}{\lambda_i}\right\}$$

which gives

$$\hat{\theta}_{ML}(y^N) = \frac{\sum_{i=1}^{N} y(i)/\lambda_i}{\sum_{i=1}^{N} 1/\lambda_i}$$
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Example (continued): Let $y(i) \sim N(\theta_0, \lambda_i)$, $i = 1, \ldots, N$, be independent random variables with unknown mean $\theta_0$ and known variances $\lambda_i$. The two estimators are:

$$\hat{\theta}_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i), \qquad \hat{\theta}_{ML}(y^N) = \frac{\sum_{i=1}^{N} y(i)/\lambda_i}{\sum_{i=1}^{N} 1/\lambda_i}$$
Different estimators

Suppose $N = 15$ and the $y(i)$ are drawn from normal distributions with mean 10 and variances:

10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15
[Figure: estimated means $\hat{\theta}_{SM}(y^N)$ and $\hat{\theta}_{ML}(y^N)$ for 10 different experiments; vertical axis from -20 to 40, horizontal axis: experiment number.]
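This comparison is easy to reproduce (a minimal simulation sketch using the values from the slide; the random seed is arbitrary):

```python
import numpy as np

# Sketch: compare the sample mean and the variance-weighted ML estimator
# over 10 repeated experiments (true mean 10, the 15 known variances).
rng = np.random.default_rng(3)
theta0 = 10.0
lam = np.array([10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15])

for k in range(10):
    y = theta0 + np.sqrt(lam) * rng.standard_normal(lam.size)
    theta_sm = y.mean()                             # sample mean
    theta_ml = np.sum(y / lam) / np.sum(1.0 / lam)  # weighted mean
    print(f"experiment {k}: SM = {theta_sm:6.2f}, ML = {theta_ml:6.2f}")
```

The ML estimates cluster much more tightly around 10, since they down-weight the high-variance observations.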
Exercise 2: Do the same procedure for other experiments and draw the corresponding figure.
Exercise 3: Do the same procedure for other experiments and draw the corresponding figure, supposing all variances are 10.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Maximum likelihood estimator (MLE):

$$\hat{\theta}_{ML}(y_*^N) = \arg\max_{\theta} f_y(\theta, y_*^N)$$

Relationship to the Maximum A Posteriori (MAP) Estimate

The Bayesian approach is used to derive another parameter estimate. In the Bayesian approach the parameter itself is thought of as a random variable. Let the prior PDF for $\theta$ be $g(z)$.

After some manipulation (Bayes' rule):

$$P(\theta \mid y^N) \propto f_y(\theta; y^N)\,g(\theta)$$

The maximum a posteriori (MAP) estimate is:

$$\hat{\theta}_{MAP}(y^N) = \arg\max_{\theta} f_y(\theta, y^N)\,g(\theta)$$

Note that with a flat (constant) prior $g$, the MAP estimate coincides with the MLE.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Cramér-Rao Inequality

The quality of an estimator can be assessed by its mean-square error matrix:

$$P = E\left[\big(\hat{\theta}(y^N) - \theta_0\big)\big(\hat{\theta}(y^N) - \theta_0\big)^T\right]$$

where $\theta_0$ is the true value of $\theta$. We may be interested in selecting estimators that make $P$ small. The Cramér-Rao inequality gives a lower bound for $P$.

Let $E\,\hat{\theta}(y^N) = \theta_0$ (an unbiased estimator). Then

$$P = E\left[\big(\hat{\theta}(y^N) - \theta_0\big)\big(\hat{\theta}(y^N) - \theta_0\big)^T\right] \geq M^{-1}$$

where $M$ is the Fisher information matrix.
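For reference (the definition did not survive extraction in this copy; this is the standard form), the Fisher information matrix is

$$M = E\left[\left(\frac{d}{d\theta}\log f_y(\theta; y^N)\right)\left(\frac{d}{d\theta}\log f_y(\theta; y^N)\right)^T\right]_{\theta = \theta_0}$$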
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Asymptotic Properties of the MLE
Calculation of

$$P = E\left[\big(\hat{\theta}(y^N) - \theta_0\big)\big(\hat{\theta}(y^N) - \theta_0\big)^T\right]$$

is not an easy task. Therefore, limiting properties as the sample size tends to infinity are calculated instead.

For the MLE in the case of independent observations, Wald and Cramér obtained the following. Suppose that the random variables $\{y(i)\}$ are independent and identically distributed, so that

$$f_y(\theta; x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} f_{y(i)}(\theta; x_i)$$

Suppose also that the distribution of $y^N$ is given by $f_y(\theta_0; x^N)$ for some value $\theta_0$. Then $\hat{\theta}_{ML}(y^N)$ tends to $\theta_0$ with probability 1 as $N$ tends to infinity, and

$$\sqrt{N}\left(\hat{\theta}_{ML}(y^N) - \theta_0\right)$$

converges in distribution to the normal distribution with zero mean and covariance matrix given by the Cramér-Rao lower bound $M^{-1}$.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Probabilistic Models of Dynamical Systems
Suppose

$$\mathcal{M}(\theta):\quad \hat{y}(t \mid \theta) = g(t, Z^{t-1}; \theta)$$

where $\varepsilon(t,\theta) = y(t) - \hat{y}(t \mid \theta)$ is independent and has the PDF $f_e(x, t; \theta)$. We call this kind of model a complete probabilistic model.

Likelihood function for probabilistic models of dynamical systems

We note that the output is:

$$y(t) = \hat{y}(t \mid \theta) + \varepsilon(t,\theta), \quad \text{where } \varepsilon(t,\theta) \text{ has the PDF } f_e(x, t; \theta)$$

Now we must determine the likelihood function $f_y(\theta; y^N)$.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
$$f_y(\theta; y^N) = \;?$$

Lemma: Suppose $u^t$ is given as a deterministic sequence, and assume that the generation of $y^t$ is described by the model

$$y(t) = g(t, Z^{t-1}) + \varepsilon(t), \quad \text{where the conditional PDF of } \varepsilon(t) \text{ is } f_e(x, t)$$

Then the joint probability density function of $y^t$, given $u^t$, is:

$$f_m(t, y^t \mid u^t) = \prod_{k=1}^{t} f_e\big(y(k) - g(k, Z^{k-1}), k\big) \qquad (I)$$

Proof: The conditional PDF of $y(t)$, given $Z^{t-1}$, is

$$p(x_t \mid Z^{t-1}) = f_e\big(x_t - g(t, Z^{t-1}), t\big)$$

Using Bayes' rule, the joint conditional PDF of $y(t)$ and $y(t-1)$, given $Z^{t-2}$, can be expressed as:

$$p(x_t, x_{t-1} \mid Z^{t-2}) = p\big(x_t \mid y(t-1) = x_{t-1}, Z^{t-2}\big)\,p\big(x_{t-1} \mid Z^{t-2}\big) = f_e\big(x_t - g(t, Z^{t-1}), t\big)\,f_e\big(x_{t-1} - g(t-1, Z^{t-2}), t-1\big)$$

Iterating similarly, we derive (I).
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Probabilistic Models of Dynamical Systems
Suppose

$$\mathcal{M}(\theta):\quad \hat{y}(t \mid \theta) = g(t, Z^{t-1}; \theta)$$

where $\varepsilon(t,\theta) = y(t) - \hat{y}(t \mid \theta)$ is independent and has the PDF $f_e(x, t; \theta)$.

Now we must determine the likelihood function. By the previous lemma:

$$f_y(\theta; y^N) = \prod_{t=1}^{N} f_e\big(y(t) - g(t, Z^{t-1}; \theta), t; \theta\big) = \prod_{t=1}^{N} f_e\big(\varepsilon(t,\theta), t; \theta\big)$$

Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta; y^N) = \frac{1}{N}\sum_{t=1}^{N} \log f_e\big(\varepsilon(t,\theta), t; \theta\big)$$

If we define

$$l(\varepsilon, \theta, t) = -\log f_e(\varepsilon, t; \theta)$$
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Probabilistic Models of Dynamical Systems
Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta; y^N) = \frac{1}{N}\sum_{t=1}^{N} \log f_e\big(\varepsilon(t,\theta), t; \theta\big)$$

If we define

$$l(\varepsilon, \theta, t) = -\log f_e(\varepsilon, t; \theta)$$

we may write

$$\hat{\theta}_{ML}(y^N) = \arg\min_{\theta} \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon(t,\theta), t; \theta\big)$$
The ML method can thus be seen as a special case of the PEM.
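For instance (a standard special case, spelled out here for concreteness): if $f_e$ is Gaussian with zero mean and known variance $\lambda$, then

$$l(\varepsilon, \theta, t) = -\log f_e(\varepsilon, t; \theta) = \frac{1}{2}\log 2\pi\lambda + \frac{\varepsilon^2}{2\lambda}$$

so, up to a $\theta$-independent constant, the ML criterion is the quadratic PEM criterion with $l(\varepsilon) = \varepsilon^2/2$.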
Exercise 4: Find the Fisher information matrix for this system.
Exercise 5: Derive a lower bound for $\mathrm{Cov}\,\hat{\theta}_N$.
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Correlating Prediction Errors with Past Data
Ideally, the prediction error $\varepsilon(t,\theta)$ for a good model should be independent of the past data $Z^{t-1}$.

If $\varepsilon(t,\theta)$ is correlated with $Z^{t-1}$, then there was more information available in $Z^{t-1}$ about $y(t)$ than was picked up by $\hat{y}(t \mid \theta)$.

To test whether $\varepsilon(t,\theta)$ is independent of the data set $Z^{t-1}$, we would have to check that all transformations of $\varepsilon(t,\theta)$ are uncorrelated with all possible functions of $Z^{t-1}$. This is of course not feasible in practice.

Instead, we may select a certain finite-dimensional vector sequence $\{\zeta(t)\}$ derived from $Z^{t-1}$ and require a certain transformation of $\{\varepsilon(t,\theta)\}$ to be uncorrelated with this sequence. This would give:

$$\frac{1}{N}\sum_{t=1}^{N} \zeta(t)\,\varepsilon(t,\theta) = 0$$

The $\theta$ that solves this would be the best estimate based on the observed data.
Correlating Prediction Errors with Past Data
Choose a linear filter $L(q)$ and let

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Choose a sequence of correlation vectors

$$\zeta(t,\theta) = \zeta(t, Z^{t-1}, \theta)$$

Choose a function $\alpha(\varepsilon)$ and define

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} \zeta(t,\theta)\,\alpha\big(\varepsilon_F(t,\theta)\big)$$

Then calculate

$$\hat{\theta}_N = \mathrm{sol}_{\theta \in D_{\mathcal{M}}}\big\{f_N(\theta, Z^N) = 0\big\}$$

The instrumental-variable method (next section) is the best-known representative of this family.
Correlating Prediction Errors with Past Data
$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} \zeta(t,\theta)\,\alpha\big(\varepsilon_F(t,\theta)\big), \qquad \hat{\theta}_N = \mathrm{sol}_{\theta \in D_{\mathcal{M}}}\big\{f_N(\theta, Z^N) = 0\big\}$$

Normally, the dimension of $\zeta$ would be chosen so that $f_N$ is a $d$-dimensional vector. Then there are as many equations as unknowns. Sometimes one uses a $\zeta$ of dimension higher than $d$, so there is an overdetermined set of equations, typically without solution. Then:

$$\hat{\theta}_N = \arg\min_{\theta \in D_{\mathcal{M}}} \big\|f_N(\theta, Z^N)\big\|$$

Exercise 6: Show that the prediction-error estimate obtained from

$$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$

can also be seen as a correlation estimate for a particular choice of $L$, $\zeta$ and $\alpha$.
Correlating Prediction Errors with Past Data
Pseudolinear Regressions
We saw in Chapter 4 that a number of common prediction models can be written as:

$$\hat{y}(t \mid \theta) = \varphi^T(t,\theta)\,\theta$$

Since the pseudo-regression vector $\varphi(t,\theta)$ contains relevant past data, it is reasonable to require the resulting prediction errors to be uncorrelated with $\varphi(t,\theta)$. With $\zeta(t,\theta) = \varphi(t,\theta)$ and $\alpha(\varepsilon) = \varepsilon$:

$$\hat{\theta}_N^{PLR} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \varphi(t,\theta)\big[y(t) - \varphi^T(t,\theta)\,\theta\big] = 0\right\}$$

which we term the PLR (pseudolinear regression) estimate.
Parameter Estimation Method
Topics to be covered include:
Guiding Principles Behind Parameter Estimation Method.
Minimizing Prediction Error.
Linear Regressions and the Least-Squares Method.
A Statistical Framework for Parameter Estimation and the
Maximum Likelihood Method.
Correlating Prediction Errors with Past Data.
Instrumental Variable Methods.
Instrumental Variable Methods
Consider a linear regression:

$$\hat{y}(t \mid \theta) = \varphi^T(t)\,\theta$$

The least-squares estimate of $\theta$ is given by

$$\hat{\theta}_N^{LS} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \varphi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

So it is a kind of PEM, and equally a correlation estimate with $L(q) = 1$, $\alpha(\varepsilon) = \varepsilon$ and $\zeta(t,\theta) = \varphi(t)$.

Now suppose that the data are actually described by

$$y(t) = \varphi^T(t)\,\theta_0 + v_0(t)$$

We found in Section 7.3 that the LSE $\hat{\theta}_N$ will not tend to $\theta_0$ in typical cases.
Instrumental Variable Methods
$$y(t) = \varphi^T(t)\,\theta_0 + v_0(t), \qquad \hat{\theta}_N^{LS} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \varphi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

We found in Section 7.3 that the LSE $\hat{\theta}_N$ will not tend to $\theta_0$ in typical cases. Consider instead

$$\hat{\theta}_N^{IV} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \xi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

Such an application to a linear regression is called the instrumental-variable (IV) method. The elements of $\xi$ are then called instruments or instrumental variables. The estimate is:

$$\hat{\theta}_N^{IV} = \left[\frac{1}{N}\sum_{t=1}^{N} \xi(t)\,\varphi^T(t)\right]^{-1} \frac{1}{N}\sum_{t=1}^{N} \xi(t)\,y(t)$$
Instrumental Variable Methods
$$\hat{\theta}_N^{LS} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \varphi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

We found in Section 7.3 that the LSE $\hat{\theta}_N$ will not tend to $\theta_0$ in typical cases.

$$\hat{\theta}_N^{IV} = \mathrm{sol}\left\{\frac{1}{N}\sum_{t=1}^{N} \xi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\} = \left[\frac{1}{N}\sum_{t=1}^{N} \xi(t)\,\varphi^T(t)\right]^{-1} \frac{1}{N}\sum_{t=1}^{N} \xi(t)\,y(t)$$

Does $\hat{\theta}_N \to \theta_0$ as $N \to \infty$ in the IV method?

Exercise 7: Show that $\hat{\theta}_N^{IV}$ exists and tends to $\theta_0$ if the following conditions hold:

$$E\,\xi(t)\,\varphi^T(t) \text{ is nonsingular}, \qquad E\,\xi(t)\,v_0(t) = 0$$
Instrumental Variable Methods
So we need:

$$E\,\xi(t)\,\varphi^T(t) \text{ nonsingular} \quad (\mathrm{I}), \qquad E\,\xi(t)\,v_0(t) = 0 \quad (\mathrm{II})$$

Consider an ARX model:

$$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + v(t)$$

A natural idea is to generate the instruments so as to secure (II) while also considering (I):

$$\xi(t) = K(q)\left[-x(t-1)\ \cdots\ -x(t-n_a)\quad u(t-1)\ \cdots\ u(t-n_b)\right]^T$$

where $K$ is a linear filter and $x(t)$ is generated from the input through a linear system:

$$N(q)\,x(t) = M(q)\,u(t)$$

Most instruments used in practice are generated in this way. Why are (I) and (II) then satisfied?
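A sketch of the full procedure (continuing the ARX(1,1) example under assumptions: the colored-noise filter, the instrument model $N(q)x = M(q)u$, and $K(q) = 1$ are all illustrative choices, not prescribed by the lecture):

```python
import numpy as np
from scipy.signal import lfilter

# Sketch: the noise v is colored, so LS is biased, while IV with
# instruments built from a noise-free simulated signal x stays consistent.
rng = np.random.default_rng(4)
Nsamp, a0, b0 = 2000, -0.7, 1.5
u = rng.standard_normal(Nsamp)
v = lfilter([1.0, 0.8], [1.0], 0.3 * rng.standard_normal(Nsamp))  # colored noise
y = lfilter([0.0, b0], [1.0, a0], u) + v  # y(t) + a0*y(t-1) = b0*u(t-1) + v(t)

x = lfilter([0.0, 1.0], [1.0, 0.5], u)    # instrument model: N(q)x = M(q)u

Phi = np.column_stack([-y[:-1], u[:-1]])  # phi(t) = [-y(t-1), u(t-1)]
Xi = np.column_stack([-x[:-1], u[:-1]])   # xi(t)  = [-x(t-1), u(t-1)]
Y = y[1:]

theta_ls = np.linalg.lstsq(Phi, Y, rcond=None)[0]  # biased under colored v
theta_iv = np.linalg.solve(Xi.T @ Phi, Xi.T @ Y)   # IV estimate
print("LS:", np.round(theta_ls, 3), " IV:", np.round(theta_iv, 3))
```

Since $x$ depends only on $u$, which is independent of the noise, $E\,\xi(t)v_0(t) = 0$ holds, and the IV estimate lands near $(a_0, b_0)$ while LS is visibly biased.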