Computational Statistics & Data Analysis 41 (2002) 211 – 229
www.elsevier.com/locate/csda
Seemingly unrelated regression model with
unequal size observations:
computational aspects
Paolo Foschi∗ , Erricos J. Kontoghiorghes
Institut d'informatique, Université de Neuchâtel, Emile-Argand 11, Case postale 2, CH-2007 Neuchâtel, Switzerland
Received 1 December 2001; received in revised form 1 March 2002
Abstract
The computational solution of the seemingly unrelated regression model with unequal size
observations is considered. Two algorithms to solve the model when treated as a generalized
linear least-squares problem are proposed. The algorithms have as a basic tool the generalized
QR decomposition (GQRD) and efficiently exploit the block-sparse structure of the matrices.
One of the algorithms reduces the computational burden of the estimation procedure by not
computing explicitly the RQ factorization of the GQRD. The maximum likelihood estimation of
the model when the covariance matrix is unknown is also considered.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: SUR model; Least squares; QR decomposition; Maximum likelihood
1. Seemingly unrelated regression with unequal size observations
The seemingly unrelated regression (SUR) model is defined by the set of regressions

$$y_i = X_i\beta_i + u_i, \qquad i = 1,\ldots,G,$$
This work was supported in part by the Swiss National Foundation Grants 1214-056900.99/1 and 2000-061875.00/1. Part of the work of the second author was done while he was visiting INRIA-IRISA, Rennes, France, under the support of the host institution and the Swiss National Foundation Grant 83R-065887.
∗ Corresponding author. Fax: 327182701.
E-mail addresses: [email protected] (P. Foschi), [email protected]
(E.J. Kontoghiorghes).
0167-9473/02/$ - see front matter. PII: S0167-9473(02)00146-9
where $X_i \in \mathbb{R}^{t\times k_i}$, $y_i \in \mathbb{R}^{t}$ and the disturbance vector $u_i \in \mathbb{R}^{t}$ has zero mean and variance–covariance matrix $\sigma_{i,i} I_t$. Furthermore, the disturbances are contemporaneously correlated across the equations, i.e. $E(u_i u_j^T) = \sigma_{i,j} I_t$. In compact form the SUR model can be written as

$$\begin{pmatrix} y_1 \\ \vdots \\ y_G \end{pmatrix} = \begin{pmatrix} X_1 & & \\ & \ddots & \\ & & X_G \end{pmatrix}\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_G \end{pmatrix} + \begin{pmatrix} u_1 \\ \vdots \\ u_G \end{pmatrix}$$

or

$$\mathrm{vec}(Y) = \Bigl(\bigoplus_{i=1}^{G} X_i\Bigr)\,\mathrm{vec}(\{\beta_i\}_G) + \mathrm{vec}(U), \qquad (1)$$

where $Y = (y_1 \cdots y_G)$, $U = (u_1 \cdots u_G)$, the direct sum of matrices $\bigoplus_{i=1}^{G} X_i \equiv \bigoplus_i X_i \equiv \mathrm{diag}(X_1,\ldots,X_G)$, $\{\beta_i\}_G$ (abbreviated to $\{\beta_i\}$) denotes the set of vectors $\beta_1,\ldots,\beta_G$ and $\mathrm{vec}(\cdot)$ is the column-stack operator with $\mathrm{vec}(\{\beta_i\}) = (\beta_1^T,\ldots,\beta_G^T)^T$. The disturbance term $\mathrm{vec}(U)$ has zero mean and dispersion matrix $\Sigma \otimes I_t$, where $\Sigma = [\sigma_{i,j}] \in \mathbb{R}^{G\times G}$ is symmetric and positive semidefinite (Srivastava and Giles, 1987).
Computationally efficient methods for solving SUR models have been proposed (Foschi et al., forthcoming; Foschi and Kontoghiorghes, 2002; Kontoghiorghes, 2000a,b). These methods formulate the SUR model as a generalized linear least squares problem (GLLSP) and use the generalized QR decomposition (GQRD) to solve it (Kourouklis and Paige, 1981; Paige, 1978). Often it is assumed that each regression equation has the same number of observations, but this might not always be the case (Srivastava and Giles, 1987). The solution of SUR models with unequal size observations (abbreviated to SUR-USO) has been considered previously (Schmidt, 1977; Sharma, 1993; Srivastava and Zaatar, 1973), with emphasis given to the statistical properties of the estimators. The SUR-USO model assumes that the observations for the ith (i > 1) regression match in time with those for the (i − 1)th regression. Here, computational strategies for solving SUR-USO models are provided.

Firstly, recent methods for solving SUR models are extended to the numerical solution of the SUR-USO model when this is considered as a GLLSP. A method based on the GQRD is proposed for solving the GLLSP by exploiting the block-sparse and recursive structures of the exogenous matrix and Cholesky factor, respectively. A recursive strategy to reduce the computational burden of this method is presented. Finally, maximum likelihood expressions that can be used in the iterative solution of the SUR-USO model are derived.
2. Numerical solution of the SUR-USO model
In the SUR-USO model each regression has a different number of observations. That is, $y_i, u_i \in \mathbb{R}^{t_i}$, $X_i \in \mathbb{R}^{t_i\times k_i}$ and the covariance matrices, for $i < j$, are given by

$$E(u_i u_j^T) = \sigma_{ij}\,(I_{t_i}\;\; 0_{t_i\times(t_j - t_i)}), \qquad (2)$$

where it has been assumed that $t_i \le t_{i+1}$. The compact form of the SUR-USO model is given by

$$\mathrm{vec}(\{y_i\}) = (\oplus_i X_i)\,\mathrm{vec}(\{\beta_i\}) + \mathrm{vec}(\{u_i\}). \qquad (3)$$

The dispersion of $\mathrm{vec}(\{u_i\})$ has a block-matrix structure, where the $(i,j)$th block is given by (2).
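To make the block structure of (2) concrete, the dispersion matrix of $\mathrm{vec}(\{u_i\})$ can be assembled for a toy example (all names and dimensions below are illustrative, not from the paper). The result is symmetric positive semidefinite because it is the covariance of nested subvectors of a common $G$-variate process:

```python
import numpy as np

G = 3
t = [2, 3, 5]                      # unequal sample sizes, t_i <= t_{i+1}
rng = np.random.default_rng(0)
A = rng.standard_normal((G, G))
Sigma = A @ A.T                    # symmetric positive definite G x G

T = sum(t)
Omega = np.zeros((T, T))           # dispersion of vec({u_i})
offs = np.cumsum([0] + t)
for i in range(G):
    for j in range(G):
        block = np.zeros((t[i], t[j]))
        m = min(t[i], t[j])
        block[:m, :m] = Sigma[i, j] * np.eye(m)   # sigma_ij (I_{t_i}  0), as in (2)
        Omega[offs[i]:offs[i+1], offs[j]:offs[j+1]] = block

assert np.allclose(Omega, Omega.T)                      # symmetric
assert np.linalg.eigvalsh(Omega).min() > -1e-10         # positive semidefinite
```

The positive semidefiniteness follows because `Omega` is a principal submatrix of $\Sigma \otimes I_{t_G}$ obtained by keeping the first $t_i$ observations of the $i$th equation.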
Consider partitioning and reordering the observations of each regression as

$$y_i = \begin{pmatrix} y_{1,i} \\ y_{2,i} \\ \vdots \\ y_{i,i} \end{pmatrix}\begin{matrix} h_1 \\ h_2 \\ \vdots \\ h_i \end{matrix}\,, \qquad X_i = \begin{pmatrix} X_{1,i} \\ X_{2,i} \\ \vdots \\ X_{i,i} \end{pmatrix}\begin{matrix} h_1 \\ h_2 \\ \vdots \\ h_i \end{matrix} \qquad\text{and}\qquad u_i = \begin{pmatrix} u_{1,i} \\ u_{2,i} \\ \vdots \\ u_{i,i} \end{pmatrix}\begin{matrix} h_1 \\ h_2 \\ \vdots \\ h_i \end{matrix}\,, \qquad (4)$$

where $h_1 = t_1$ and $h_i = t_i - t_{i-1}$ for $i = 2,3,\ldots,G$. The SUR-USO model can be formulated as the set of regression equations

$$y_{i,j} = X_{i,j}\beta_j + u_{i,j} \qquad\text{for } i,j = 1,2,\ldots,G \text{ and } i \le j, \qquad (5)$$

where $u_{i,j}$ has zero mean and dispersion matrix $\sigma_{j,j} I_{h_i}$. Furthermore, the cross-equation covariances are given by

$$E(u_{k,i}u_{l,j}^T) = \begin{cases} \sigma_{i,j} I_{h_k} & \text{for } l = k, \\ 0_{h_k\times h_l} & \text{for } l \ne k, \end{cases}$$

where $k \le i$ and $l \le j$. Regressions (5) are also equivalent to the general linear model
(GLM)

$$\begin{pmatrix} \bar y_1 \\ \bar y_2 \\ \vdots \\ \bar y_G \end{pmatrix} = \begin{pmatrix} \bar X_1 \\ \bar X_2 \\ \vdots \\ \bar X_G \end{pmatrix}\mathrm{vec}(\{\beta_i\}) + \begin{pmatrix} \bar u_1 \\ \bar u_2 \\ \vdots \\ \bar u_G \end{pmatrix}, \qquad (6)$$

which, after appropriate substitutions, can be written as

$$\bar y = \bar X\beta + \bar u,$$

where $\bar y_i^T = (y_{i,i}^T\; y_{i,i+1}^T \cdots y_{i,G}^T) \in \mathbb{R}^{\tau_i}$, $\bar u_i^T = (u_{i,i}^T\; u_{i,i+1}^T \cdots u_{i,G}^T) \in \mathbb{R}^{\tau_i}$, $\beta = \mathrm{vec}(\{\beta_i\})$,

$$\bar X_i = \Bigl(\,0_{\tau_i\times(k_1+\cdots+k_{i-1})} \;\; \bigoplus_{j=i}^{G} X_{i,j}\Bigr) \qquad (7)$$

and $\tau_i = (G-i+1)h_i$. The disturbance vector $\bar u_i$ has zero mean and covariance matrix $\Sigma^{(i)} \otimes I_{h_i}$, where $\Sigma^{(i)} \equiv \Sigma_{i:,i:}$ denotes the $(G-i+1)\times(G-i+1)$ submatrix of $\Sigma$ starting at position $(i,i)$ (Golub and Van Loan, 1996). Furthermore, the vectors $\bar u_i$ and $\bar u_j$ are uncorrelated for $i \ne j$. Thus, the covariance matrix of $\bar u$ is given by $\bar P = \oplus_i(\Sigma^{(i)} \otimes I_{h_i}) \in \mathbb{R}^{T\times T}$, where $T = \sum_i t_i = \sum_i \tau_i$. Without loss of generality it is assumed that $\Sigma^{(i)}$ is non-singular and $t_1 \ge k_i$ for $i = 1,\ldots,G$.
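A minimal sketch of the padded block structure of $\bar X_i$ in (7), assuming each $X_{i,j} \in \mathbb{R}^{h_i\times k_j}$ for $j \ge i$ (dimensions and data are illustrative):

```python
import numpy as np
from scipy.linalg import block_diag

G = 3
h = [2, 2, 2]                      # block heights h_1, ..., h_G
k = [1, 2, 1]                      # regressor counts k_1, ..., k_G
rng = np.random.default_rng(1)

# X[i, j] ~ R^{h_i x k_j}, defined for j >= i (0-indexed here)
X = {(i, j): rng.standard_normal((h[i], k[j]))
     for i in range(G) for j in range(i, G)}

def X_bar(i):
    """X_bar_i = ( 0_{tau_i x (k_1+...+k_{i-1})} | X_{i,i} (+) ... (+) X_{i,G} )."""
    tau_i = (G - i) * h[i]                      # (G - i + 1) h_i in 1-based indexing
    zeros = np.zeros((tau_i, sum(k[:i])))
    return np.hstack([zeros, block_diag(*[X[i, j] for j in range(i, G)])])

Xb = np.vstack([X_bar(i) for i in range(G)])    # exogenous matrix of the GLM (6)
assert Xb.shape == (sum((G - i) * h[i] for i in range(G)), sum(k))
```

The leading zero columns correspond to the coefficients $\beta_1,\ldots,\beta_{i-1}$ that do not enter the $i$th block of regressions.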
Fig. 1. Examples of models (3), (6) and (8) for G = 3.
As in the case of the SUR model, the best linear unbiased estimator (BLUE) of the SUR-USO model derives from the solution of the generalized linear least squares problem (GLLSP)

$$\mathop{\mathrm{argmin}}_{\bar v,\,\beta}\;\|\bar v\|^2 \quad\text{subject to}\quad \bar y = \bar X\beta + \bar C\bar v, \qquad (8)$$

where $\bar u = \bar C\bar v$, $\bar P = \bar C\bar C^T$ and the upper triangular matrix $\bar C$ has full rank. Thus, the random vector $\bar v$ has zero mean and dispersion matrix $I_T$. Note that the matrix $\bar C$ is block diagonal with the $i$th ($i = 1,\ldots,G$) block given by $\bar C_{i,i} = C_{i:,i:} \otimes I_{h_i}$, where $\Sigma = CC^T$ and $C$ is upper triangular. Fig. 1 shows the structure of the SUR-USO model (3), GLM (6) and GLLSP (8) for $G = 3$.

For the solution of (8) consider the GQRD:

$$\bar Q^T\bar X = \begin{pmatrix} \bar R \\ 0 \end{pmatrix}\begin{matrix} K \\ T-K \end{matrix} \qquad (9a)$$
and

$$\bar Q^T\bar C\bar P = \begin{pmatrix} W_{1,1} & W_{1,2} \\ 0 & W_{2,2} \end{pmatrix}\begin{matrix} K \\ T-K \end{matrix}, \qquad (9b)$$

where $K = \sum_i k_i$, $\bar R$ and $W_{2,2}$ are upper triangular, and $\bar Q, \bar P \in \mathbb{R}^{T\times T}$ are orthogonal. Using (9) the GLLSP (8) can be written as

$$\mathop{\mathrm{argmin}}_{\bar v_A,\bar v_B,\beta}\;\|\bar v_A\|^2 + \|\bar v_B\|^2 \quad\text{subject to}\quad \begin{pmatrix} \bar y_A \\ \bar y_B \end{pmatrix} = \begin{pmatrix} \bar R \\ 0 \end{pmatrix}\beta + \begin{pmatrix} W_{1,1} & W_{1,2} \\ 0 & W_{2,2} \end{pmatrix}\begin{pmatrix} \bar v_A \\ \bar v_B \end{pmatrix},$$

where

$$\bar Q^T\bar y = \begin{pmatrix} \bar y_A \\ \bar y_B \end{pmatrix} \quad\text{and}\quad \bar P^T\bar v = \begin{pmatrix} \bar v_A \\ \bar v_B \end{pmatrix}.$$

It follows that $\bar v_B = W_{2,2}^{-1}\bar y_B$ and $\bar v_A = 0$. Thus, the solution of the SUR-USO model comes from solving the triangular system

$$\begin{pmatrix} \bar R & W_{1,2} \\ 0 & W_{2,2} \end{pmatrix}\begin{pmatrix} \beta \\ \bar v_B \end{pmatrix} = \begin{pmatrix} \bar y_A \\ \bar y_B \end{pmatrix}. \qquad (10)$$

The main operations in solving the SUR-USO model are the computation of the GQRD (9) and, to some extent, the solution of the triangular system (10). Clearly, the computational burden of solving the SUR-USO model will be reduced if the GQRD (9) is computed efficiently. Furthermore, the efficient computation of (9) will have a greater impact on the overall computational complexity if the iterative feasible estimator of the SUR-USO model is required (Srivastava and Giles, 1987). In such a case, at each iteration an estimator is used in place of the unknown $\Sigma$. Thus, the QRD (9a) is computed once, while (9b), and consequently (10), need to be solved at each iteration for a different $\bar C$.
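Ignoring all block sparsity, the GQRD-based solution (9)-(10) of the GLLSP (8) can be sketched with dense factorizations (`scipy.linalg.rq` computes the triangularization from the right; variable names are illustrative). The computed $\beta$ is checked against the textbook GLS estimator based on $\bar P = \bar C\bar C^T$:

```python
import numpy as np
from scipy.linalg import qr, rq, solve_triangular

rng = np.random.default_rng(2)
T, K = 12, 4
Xb = rng.standard_normal((T, K))                            # \bar X
Cb = np.triu(rng.standard_normal((T, T))) + 5 * np.eye(T)   # full-rank upper triangular \bar C
yb = rng.standard_normal(T)                                 # \bar y

# (9a): QRD of \bar X
Q, R = qr(Xb)                    # Q: T x T orthogonal, R[:K] upper triangular
# (9b): RQD of Q^T \bar C, giving W = [[W11, W12], [0, W22]] upper triangular
W, P = rq(Q.T @ Cb)

# Transformed data Q^T y = (yA; yB); then vA = 0 and vB = W22^{-1} yB
yA, yB = (Q.T @ yb)[:K], (Q.T @ yb)[K:]
vB = solve_triangular(W[K:, K:], yB)
# (10): R beta = yA - W12 vB
beta = solve_triangular(R[:K], yA - W[:K, K:] @ vB)

# Compare with the direct GLS estimator using Pbar = Cb Cb^T
Pb = Cb @ Cb.T
beta_gls = np.linalg.solve(Xb.T @ np.linalg.solve(Pb, Xb),
                           Xb.T @ np.linalg.solve(Pb, yb))
assert np.allclose(beta, beta_gls)
```

The paper's algorithms obtain the same quantities while exploiting the block structure of $\bar X$ and $\bar C$ instead of factorizing dense $T\times T$ matrices.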
3. Efficient solution of the GLLSP

For the efficient solution of the GLLSP (8) using the GQRD (9) the block-sparse structure of the matrices needs to be exploited. Consider first the GQRD

$$Q_0^T\bar X_1 = \begin{pmatrix} \bar R^{(0)} \\ 0 \end{pmatrix} \qquad (11a)$$

and

$$Q_0^T\bar C_{1,1}P_0 = \begin{pmatrix} \bar C^{(0)}_{1,1} & \hat W_{1,1} \\ 0 & \tilde W_{1,1} \end{pmatrix}\begin{matrix} K \\ \tau_1-K \end{matrix}, \qquad (11b)$$

where $\bar C^{(0)}_{1,1}$ and $\tilde W_{1,1}$ are upper triangular and $P_0$ is orthogonal. Furthermore, $\bar R^{(0)} = \oplus_i R_i^{(0)}$ and

$$Q_0 = (\oplus_i\hat Q_{0,i} \;\; \oplus_i\tilde Q_{0,i}) \equiv \begin{pmatrix} \hat Q_{0,1} & & & \tilde Q_{0,1} & & \\ & \ddots & & & \ddots & \\ & & \hat Q_{0,G} & & & \tilde Q_{0,G} \end{pmatrix},$$

where

$$X_{1,i} = (\hat Q_{0,i}\;\;\tilde Q_{0,i})\begin{pmatrix} R_i^{(0)} \\ 0 \end{pmatrix} = \hat Q_{0,i}R_i^{(0)}$$

is the QRD of $X_{1,i}$ for $i = 1,2,\ldots,G$. Using (11), the GLLSP (8) can be equivalently written as
$$\mathop{\mathrm{argmin}}_{\beta,\hat v_1,\tilde v_1,\bar v_2,\ldots,\bar v_G}\;\|\hat v_1\|^2 + \|\tilde v_1\|^2 + \sum_{j=2}^{G}\|\bar v_j\|^2 \quad\text{subject to}$$

$$\begin{pmatrix} \hat y_1 \\ \bar y_2 \\ \vdots \\ \bar y_G \\ \tilde y_1 \end{pmatrix} = \begin{pmatrix} \bar R^{(0)} \\ \bar X_2 \\ \vdots \\ \bar X_G \\ 0 \end{pmatrix}\beta + \begin{pmatrix} \bar C^{(0)}_{1,1} & 0 & \cdots & 0 & \hat W_{1,1} \\ 0 & \bar C_{2,2} & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \bar C_{G,G} & 0 \\ 0 & 0 & \cdots & 0 & \tilde W_{1,1} \end{pmatrix}\begin{pmatrix} \hat v_1 \\ \bar v_2 \\ \vdots \\ \bar v_G \\ \tilde v_1 \end{pmatrix}, \qquad (12)$$

where $\hat y_1 = \hat Q_0^T\bar y_1$, $\tilde y_1 = \tilde Q_0^T\bar y_1$ and $P_0^T\bar v_1$ is conformably partitioned as $P_0^T\bar v_1 = (\hat v_1^T\;\tilde v_1^T)^T$.
Here, $\tilde v_1$ can be computed as $\tilde v_1 = \tilde W_{1,1}^{-1}\tilde y_1$ and thus, (12) can be reduced to

$$\mathop{\mathrm{argmin}}_{\beta,\hat v_1,\bar v_2,\ldots,\bar v_G}\;\|\hat v_1\|^2 + \sum_{j=2}^{G}\|\bar v_j\|^2 \quad\text{subject to}$$

$$\begin{pmatrix} \bar y_1^{(0)} \\ \bar y_2 \\ \vdots \\ \bar y_G \end{pmatrix} = \begin{pmatrix} \bar R^{(0)} \\ \bar X_2 \\ \vdots \\ \bar X_G \end{pmatrix}\beta + \begin{pmatrix} \bar C^{(0)}_{1,1} & 0 & \cdots & 0 \\ 0 & \bar C^{(0)}_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \bar C^{(0)}_{G,G} \end{pmatrix}\begin{pmatrix} \hat v_1 \\ \bar v_2 \\ \vdots \\ \bar v_G \end{pmatrix}, \qquad (13)$$

where

$$\bar y_1^{(0)} = \hat y_1 - \hat W_{1,1}\tilde v_1 = \hat y_1 - \hat W_{1,1}\tilde W_{1,1}^{-1}\tilde y_1$$

and $\bar C^{(0)}_{i,i} \equiv \bar C_{i,i}$ for $i = 2,\ldots,G$.
The blocks $\bar X_i$ ($i = 2,\ldots,G$) can be annihilated by using a block generalization of a Givens sequence. Starting from the bottom to the top, $\bar R^{(0)}$ is used as a pivot in order to annihilate $\bar X_G,\ldots,\bar X_2$ one at a time. That is, for $i = G, G-1,\ldots,2$, the QRDs

$$Q_i^T\begin{pmatrix} \bar R^{(G-i)} \\ \bar X_i \end{pmatrix} = \begin{pmatrix} \bar R^{(G-i+1)} \\ 0 \end{pmatrix}\begin{matrix} K \\ \tau_i \end{matrix}, \qquad (14a)$$

$$Q_i^T\begin{pmatrix} \bar y_1^{(G-i)} \\ \bar y_i \end{pmatrix} = \begin{pmatrix} \bar y_1^{(G-i+1)} \\ \bar y_i^{(G-i+1)} \end{pmatrix}\begin{matrix} K \\ \tau_i \end{matrix} \qquad (14b)$$

and

$$Q_i^T\begin{pmatrix} \bar C^{(G-i)}_{1,1} & 0 & \bar C^{(G-i)}_{1,i+1} & \cdots & \bar C^{(G-i)}_{1,G} \\ 0 & \bar C^{(0)}_{i,i} & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} \bar C^{(G-i+1)}_{1,1} & \bar C^{(G-i+1)}_{1,i} & \cdots & \bar C^{(G-i+1)}_{1,G} \\ \bar C^{(G-i+1)}_{i,1} & \bar C^{(G-i+1)}_{i,i} & \cdots & \bar C^{(G-i+1)}_{i,G} \end{pmatrix}\begin{matrix} K \\ \tau_i \end{matrix} \qquad (14c)$$

are computed, where the block-columns in (14c) have sizes $K, \tau_i, \tau_{i+1},\ldots,\tau_G$, $Q_i \in \mathbb{R}^{(K+\tau_i)\times(K+\tau_i)}$ is orthogonal, $\bar R^{(i)} = \oplus_{j=1}^{G}R_j^{(i)}$ and $R_j^{(i)} \in \mathbb{R}^{k_j\times k_j}$ is upper triangular. Notice that at each step $\bar C^{(G-i+1)}_{1,i}$, $\bar C^{(G-i+1)}_{i,1}$ and $\bar C^{(G-i+1)}_{i,i+1},\ldots,\bar C^{(G-i+1)}_{i,G}$ are filled in. This results in filling the block-superdiagonals and first block-column of $\oplus_i\bar C^{(0)}_{i,i}$.
Let $W$ denote the modified $\oplus_i\bar C^{(0)}_{i,i}$, that is,

$$W \equiv \begin{pmatrix} W^{(0)}_{1,1} & W^{(0)}_{1,2} & W^{(0)}_{1,3} & \cdots & W^{(0)}_{1,G} \\ W^{(0)}_{2,1} & W^{(0)}_{2,2} & W^{(0)}_{2,3} & \cdots & W^{(0)}_{2,G} \\ W^{(0)}_{3,1} & 0 & W^{(0)}_{3,3} & \cdots & W^{(0)}_{3,G} \\ \vdots & \vdots & & \ddots & \vdots \\ W^{(0)}_{G,1} & 0 & 0 & \cdots & W^{(0)}_{G,G} \end{pmatrix} = \begin{pmatrix} \bar C^{(G-1)}_{1,1} & \bar C^{(G-1)}_{1,2} & \bar C^{(G-1)}_{1,3} & \cdots & \bar C^{(G-1)}_{1,G} \\ \bar C^{(G-1)}_{2,1} & \bar C^{(G-1)}_{2,2} & \bar C^{(G-1)}_{2,3} & \cdots & \bar C^{(G-1)}_{2,G} \\ \bar C^{(G-2)}_{3,1} & 0 & \bar C^{(G-2)}_{3,3} & \cdots & \bar C^{(G-2)}_{3,G} \\ \vdots & \vdots & & \ddots & \vdots \\ \bar C^{(1)}_{G,1} & 0 & 0 & \cdots & \bar C^{(1)}_{G,G} \end{pmatrix}, \qquad (15)$$

where the block-rows and block-columns have sizes $K, \tau_2, \tau_3,\ldots,\tau_G$.
The RQD of $W$ can be derived by a sequence of $G-1$ orthogonal factorizations which annihilate, from the bottom to the top, the submatrices $W^{(0)}_{2,1},\ldots,W^{(0)}_{G,1}$. The $i$th ($i = 1,\ldots,G-1$) factorization computes

$$\begin{pmatrix} W^{(i-1)}_{1,1} & W^{(0)}_{1,G-i+1} \\ \vdots & \vdots \\ W^{(i-1)}_{G-i,1} & W^{(0)}_{G-i,G-i+1} \\ W^{(i-1)}_{G-i+1,1} & W^{(0)}_{G-i+1,G-i+1} \end{pmatrix}P_i = \begin{pmatrix} W^{(i)}_{1,1} & W^{(i)}_{1,G-i+1} \\ \vdots & \vdots \\ W^{(i)}_{G-i,1} & W^{(i)}_{G-i,G-i+1} \\ 0 & W^{(i)}_{G-i+1,G-i+1} \end{pmatrix}\begin{matrix} K \\ \vdots \\ \tau_{G-i} \\ \tau_{G-i+1} \end{matrix}, \qquad (16)$$

with block-columns of sizes $K$ and $\tau_{G-i+1}$, where $P_i \in \mathbb{R}^{(K+\tau_{G-i+1})\times(K+\tau_{G-i+1})}$ is orthogonal and $W^{(i)}_{G-i+1,G-i+1}$ is upper triangular.
Thus, the upper triangular factor in the RQD of $W$ is given by

$$\begin{pmatrix} W^*_{1,1} & W^*_{1,2} \\ 0 & W^*_{2,2} \end{pmatrix}\begin{matrix} K \\ l^* \end{matrix} = \begin{pmatrix} W^{(G-1)}_{1,1} & W^{(G-1)}_{1,2} & \cdots & W^{(1)}_{1,G} \\ 0 & W^{(G-1)}_{2,2} & \cdots & W^{(1)}_{2,G} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & W^{(1)}_{G,G} \end{pmatrix}\begin{matrix} K \\ \tau_2 \\ \vdots \\ \tau_G \end{matrix}, \qquad (17)$$

where $l^* = \sum_{i=2}^{G}\tau_i$. Fig. 2 shows, at each step of the procedure, the annihilation and fill-in of $\bar X_2,\ldots,\bar X_G$ and $\oplus_i\bar C^{(0)}_{i,i}$, respectively, and the retriangularization of $W$. From (14a), (14b) and (17), it follows that the solution of the GLLSP (13) is given by the solution of the triangular system

$$\begin{pmatrix} \bar R^{(G-1)} & W^*_{1,2} \\ 0 & W^*_{2,2} \end{pmatrix}\begin{pmatrix} \beta \\ \bar v_B^* \end{pmatrix} = \begin{pmatrix} \bar y_1^{(G-1)} \\ \bar y^* \end{pmatrix},$$

where

$$\bar y^* = \begin{pmatrix} \bar y_2^{(G-1)} \\ \vdots \\ \bar y_G^{(1)} \end{pmatrix}.$$

Fig. 2. Annihilation of $\bar X_i$, fill-in of $\oplus_i\bar C^{(0)}_{i,i}$ and retriangularization of $W$, where $G = 5$.
The matrices in (14a), (14c) and (16) have block-sparse structures which can facilitate the development of fast factorization algorithms. From (7) and the block-diagonal structure of $\bar R^{(G-i)}$ it follows that the QRD (14a) can be derived by computing the $(G-i+1)$ updating QRDs (UQRDs)

$$Q_{i,j}^T\begin{pmatrix} R_j^{(G-i)} \\ X_{i,j} \end{pmatrix}\begin{matrix} k_j \\ h_i \end{matrix} = \begin{pmatrix} R_j^{(G-i+1)} \\ 0 \end{pmatrix}\begin{matrix} k_j \\ h_i \end{matrix} \qquad (j = i,\ldots,G). \qquad (18)$$

Thus, in (14a) $R_s^{(G-i)} \equiv R_s^{(G-i+1)}$ for $s = 1,\ldots,i-1$ and

$$Q_i = \begin{pmatrix} I_{\kappa_i} & 0 & 0 \\ 0 & \oplus_{j=i}^{G}Q_{i,j}^{(1,1)} & \oplus_{j=i}^{G}Q_{i,j}^{(1,2)} \\ 0 & \oplus_{j=i}^{G}Q_{i,j}^{(2,1)} & \oplus_{j=i}^{G}Q_{i,j}^{(2,2)} \end{pmatrix}, \qquad (19)$$

where $\kappa_i = \sum_{j=1}^{i-1}k_j$ and

$$Q_{i,j} = \begin{pmatrix} Q_{i,j}^{(1,1)} & Q_{i,j}^{(1,2)} \\ Q_{i,j}^{(2,1)} & Q_{i,j}^{(2,2)} \end{pmatrix}\begin{matrix} k_j \\ h_i \end{matrix}.$$

Note that when $Q_i^T$ in (19) is used to compute (14c), then in (15)

$$W^{(0)}_{i,1} = (\,0 \;\; \tilde W^{(0)}_{i,1}\,), \qquad (20a)$$

with block-columns of sizes $\kappa_i$ and $K-\kappa_i$,

$$W^{(0)}_{1,j} = \begin{pmatrix} 0 \\ \tilde W^{(0)}_{1,j} \end{pmatrix}\begin{matrix} \kappa_j \\ K-\kappa_j \end{matrix} \qquad (20b)$$

and

$$W^{(0)}_{i,j} = \begin{pmatrix} 0 \\ \tilde W^{(0)}_{i,j} \end{pmatrix}\begin{matrix} (j-i)h_i \\ (G-j+1)h_i \end{matrix} \qquad\text{if } i < j, \qquad (20c)$$

where $W^{(0)}_{i,i}$, $\tilde W^{(0)}_{i,1}$, $\tilde W^{(0)}_{1,i}$ and $\tilde W^{(0)}_{i,j}$ are block upper triangular ($i,j = 2,\ldots,G$). Fig. 3 shows the structure of $W$ after the UQRDs (18) have been computed, where $G = 5$. A numeral $i$ in $\bar X_{i,j}$ and $W$ denotes, respectively, the annihilated and filled-in submatrices which resulted from the UQRDs at step $i$ ($i = 1,\ldots,G$).

Fig. 3. Annihilation of $\bar X_i$ and fill-in of $\oplus_i\bar C^{(0)}_{i,i}$, where $G = 5$ and $i = 1,\ldots,G$.
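A single updating QRD of the form (18) amounts to a QR factorization of a stacked matrix. A dense sketch follows (a structure-exploiting implementation would instead use Givens rotations that respect the triangularity of $R_j^{(G-i)}$; dimensions are illustrative):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
kj, hi = 3, 4
R_old = np.triu(rng.standard_normal((kj, kj)))   # R_j^{(G-i)}, upper triangular
X_ij = rng.standard_normal((hi, kj))             # incoming observation block X_{i,j}

# Q_{i,j}^T [R_old; X_ij] = [R_new; 0]
Q, R = qr(np.vstack([R_old, X_ij]))
R_new = R[:kj]                                   # updated upper triangular factor

# The update preserves the cross-product matrix:
# R_new^T R_new = R_old^T R_old + X_ij^T X_ij
assert np.allclose(R_new.T @ R_new, R_old.T @ R_old + X_ij.T @ X_ij)
```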
Fig. 4. Computing (16), where $i = 3$ and $G = 5$.

The RQD of $W$ using (16) needs to take into account the sparse structure of the submatrices. Sequential and parallel strategies for computing similar factorizations have been proposed (Kontoghiorghes, 1999, 2000a,b). The block-diagonals of $\tilde W^{(i-1)}_{G-i+1,1}$ in (16) can be annihilated one at a time with a series of factorizations which preserve the sparse and triangular structure of $\tilde W^{(i-1)}_{1,1},\ldots,\tilde W^{(i-1)}_{G-i,1}$. The orthogonal matrix $P_i$ is defined as $P_i = \tilde P_{i,1}\cdots\tilde P_{i,i}$, where $\tilde P_{i,j} = \hat P^{(1)}_{i,j}\cdots\hat P^{(i-j+1)}_{i,j}$ and $\hat P^{(s)}_{i,j}$ annihilates the $s$th block of the $(G-i+j)$th block-diagonal of $\tilde W^{(i-1)}_{G-i+1,1}$ ($i = 1,\ldots,G-1$; $j = 1,\ldots,i$ and $s = 1,\ldots,i-j+1$). Fig. 4 illustrates this strategy for computing (16), where $i = 3$ and $G = 5$. Arcs connecting the blocks and block-columns indicate those affected by the orthogonal factorization $\hat P^{(s)}_{i,j}$.
4. A recursive strategy for solving the SUR-USO model

In the GQRD (9) the computations of the QRD (9a) and RQD (9b) can be interleaved. The orthogonal matrix $Q_i^T$ in (14a), when applied from the left of $(\bar X\;\;\bar C)$ to annihilate $\bar X_i$, will fill in a block in the lower part of $\bar C$. This fill-in is eliminated by the application of an orthogonal transformation from the right of the modified $\bar C$. That is, following (14a) and (14b), the factorization

$$Q_i^T\begin{pmatrix} \bar C^{(G-i)}_{1,1} & 0 & \bar C^{(G-i)}_{1,i+1} & \cdots & \bar C^{(G-i)}_{1,G} \\ 0 & \bar C^{(0)}_{i,i} & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} \hat C^{(G-i)}_{1,1} & \hat C^{(G-i)}_{1,i} & \bar C^{(G-i+1)}_{1,i+1} & \cdots & \bar C^{(G-i+1)}_{1,G} \\ \hat C^{(G-i)}_{i,1} & \hat C^{(G-i)}_{i,i} & \bar C^{(G-i+1)}_{i,i+1} & \cdots & \bar C^{(G-i+1)}_{i,G} \end{pmatrix}\begin{matrix} K \\ \tau_i \end{matrix} \qquad (21)$$

and the RQD

$$\begin{pmatrix} \hat C^{(G-i)}_{1,1} & \hat C^{(G-i)}_{1,i} \\ \hat C^{(G-i)}_{i,1} & \hat C^{(G-i)}_{i,i} \end{pmatrix}P_i = \begin{pmatrix} \bar C^{(G-i+1)}_{1,1} & \bar C^{(G-i+1)}_{1,i} \\ 0 & \bar C^{(G-i+1)}_{i,i} \end{pmatrix}\begin{matrix} K \\ \tau_i \end{matrix} \qquad (22)$$

are computed, where $\bar C^{(G-i+1)}_{1,1}$ and $\bar C^{(G-i+1)}_{i,i}$ are upper triangular, $P_i \in \mathbb{R}^{(K+\tau_i)\times(K+\tau_i)}$ is orthogonal, $\hat C^{(G-i)}_{i,1}$ and $\hat C^{(G-i)}_{1,i}$ have, respectively, the same structure as $W^{(0)}_{i,1}$ and $W^{(0)}_{1,i}$ in (20), and $i = G, G-1,\ldots,2$. The orthogonal matrices $\bar Q$ and $\bar P$ in (9) are defined as the products of the left and right transformations, respectively. Furthermore, note that (22) involves only 4 blocks of $\bar C$, instead of the $2i$ blocks of its corresponding factorization (16). This results in an algorithm which has lower computational complexity and lower memory usage. The annihilations and fill-ins occurring at each step of this procedure are shown in Fig. 5, where $G = 5$.

Fig. 5. Annihilation of $\bar X_2,\ldots,\bar X_G$ and retriangularization of $\oplus_i\bar C^{(0)}_{i,i}$, where $G = 5$.
Note that after the $(i+1)$th ($i = G-1, G-2,\ldots,1$) step of the above strategy the GLLSP (13) can be written as

$$\mathop{\mathrm{argmin}}_{\bar v_1^{(G-i)},\bar v_*^{(G-i)},\beta,\bar v_2,\ldots,\bar v_i}\;\|\bar v_1^{(G-i)}\|^2 + \|\bar v_*^{(G-i)}\|^2 + \sum_{j=2}^{i}\|\bar v_j\|^2 \quad\text{subject to}$$

$$\begin{pmatrix} \bar y_1^{(G-i)} \\ \bar y_2 \\ \vdots \\ \bar y_i \\ \bar y_*^{(G-i)} \end{pmatrix} = \begin{pmatrix} \bar R^{(G-i)} \\ \bar X_2 \\ \vdots \\ \bar X_i \\ 0 \end{pmatrix}\beta + \begin{pmatrix} \bar C^{(G-i)}_{1,1} & 0 & \cdots & 0 & \bar C^{(G-i)}_{1,i+1:G} \\ 0 & \bar C^{(0)}_{2,2} & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \bar C^{(0)}_{i,i} & 0 \\ 0 & 0 & \cdots & 0 & \bar C^{(G-i)}_{*} \end{pmatrix}\begin{pmatrix} \bar v_1^{(G-i)} \\ \bar v_2 \\ \vdots \\ \bar v_i \\ \bar v_*^{(G-i)} \end{pmatrix}, \qquad (23)$$

where $\bar C^{(G-i)}_{*}$ is $(\sum_{j=i+1}^{G}\tau_j)\times(\sum_{j=i+1}^{G}\tau_j)$ upper triangular and non-singular, $\bar y_*^{(G-i)} = \mathrm{vec}(\{\bar y_{i+1}^{(G-i)}, \bar y_{i+2}^{(G-i-1)},\ldots,\bar y_G^{(1)}\})$ and $\bar v_*^{(G-i)} = \mathrm{vec}(\{\bar v_{i+1},\ldots,\bar v_G\})$. Thus, (23) is equivalent to
$$\mathop{\mathrm{argmin}}_{\beta,\hat v_1^{(G-i)},\bar v_2,\ldots,\bar v_i}\;\|\hat v_1^{(G-i)}\|^2 + \sum_{j=2}^{i}\|\bar v_j\|^2 \quad\text{subject to}$$

$$\begin{pmatrix} \hat y_1^{(G-i)} \\ \bar y_2 \\ \vdots \\ \bar y_i \end{pmatrix} = \begin{pmatrix} \bar R^{(G-i)} \\ \bar X_2 \\ \vdots \\ \bar X_i \end{pmatrix}\beta + \begin{pmatrix} \bar C^{(G-i)}_{1,1} & 0 & \cdots & 0 \\ 0 & \bar C^{(0)}_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \bar C^{(0)}_{i,i} \end{pmatrix}\begin{pmatrix} \hat v_1^{(G-i)} \\ \bar v_2 \\ \vdots \\ \bar v_i \end{pmatrix}, \qquad (24)$$

where

$$\hat y_1^{(G-i)} = \bar y_1^{(G-i)} - \bar C^{(G-i)}_{1,i+1:G}\,\bar v_*^{(G-i)} \quad\text{and}\quad \bar v_*^{(G-i)} = (\bar C^{(G-i)}_{*})^{-1}\bar y_*^{(G-i)}.$$
The latter suggests a recursive strategy which solves a sequence of smaller GLLSPs and requires less computational effort for computing the RQD. At the $i$th ($i = G, G-1,\ldots,2$) step the recursive algorithm solves the GLLSP (24) by computing the QRD in (14a),

$$Q_i^T\begin{pmatrix} \hat y_1^{(G-i)} \\ \bar y_i \end{pmatrix} = \begin{pmatrix} \tilde y_1^{(G-i)} \\ \hat y_i \end{pmatrix}, \qquad (25)$$

$$Q_i^T\begin{pmatrix} \bar C^{(G-i)}_{1,1} & 0 \\ 0 & \bar C^{(0)}_{i,i} \end{pmatrix} = \begin{pmatrix} \hat C^{(G-i)}_{1,1} & \hat C^{(G-i)}_{1,i} \\ \hat C^{(G-i)}_{i,1} & \hat C^{(G-i)}_{i,i} \end{pmatrix} \qquad (26)$$

and the RQD in (22). As in the case of (13), the GLLSP (24) is reduced to

$$\mathop{\mathrm{argmin}}_{\beta,\hat v_1^{(G-i+1)},\bar v_2,\ldots,\bar v_{i-1}}\;\|\hat v_1^{(G-i+1)}\|^2 + \sum_{j=2}^{i-1}\|\bar v_j\|^2 \quad\text{subject to}$$
$$\begin{pmatrix} \hat y_1^{(G-i+1)} \\ \bar y_2 \\ \vdots \\ \bar y_{i-1} \end{pmatrix} = \begin{pmatrix} \bar R^{(G-i+1)} \\ \bar X_2 \\ \vdots \\ \bar X_{i-1} \end{pmatrix}\beta + \begin{pmatrix} \bar C^{(G-i+1)}_{1,1} & 0 & \cdots & 0 \\ 0 & \bar C^{(0)}_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \bar C^{(0)}_{i-1,i-1} \end{pmatrix}\begin{pmatrix} \hat v_1^{(G-i+1)} \\ \bar v_2 \\ \vdots \\ \bar v_{i-1} \end{pmatrix}, \qquad (27)$$

where

$$\hat y_1^{(G-i+1)} = \tilde y_1^{(G-i)} - \hat C^{(G-i)}_{1,i}(\hat C^{(G-i)}_{i,i})^{-1}\hat y_i. \qquad (28)$$

Fig. 6. Annihilation of $\bar X_2,\ldots,\bar X_G$ and retriangularization of $\oplus_i\bar C^{(0)}_{i,i}$, where $G = 5$.
The structure of (27) is the same as that of (24), but smaller in size. The GLLSP (13) is equivalent to (24) when $i = G$ and $\hat y_1^{(G-i)} \equiv \bar y_1^{(0)}$. Thus, this process can be applied recursively to solve (13) and derive the BLUE of the SUR-USO model. Algorithm 1 summarizes the steps of this recursive procedure and Fig. 6 illustrates the factorization steps for $G = 5$.
Algorithm 1 Recursive estimation of the SUR-USO model.
1: Compute the GQRD (11), $\hat y_1 = \hat Q_0^T\bar y_1$ and $\tilde y_1 = \tilde Q_0^T\bar y_1$.
2: Solve the upper triangular system $\tilde W_{1,1}\tilde v_1 = \tilde y_1$.
3: Compute $\bar y_1^{(0)} = \hat y_1 - \hat W_{1,1}\tilde v_1$.
4: for $i = G, G-1,\ldots,2$ do
5:   Compute the UQRD (14a).
6:   Compute (25) and (26).
7:   Compute the RQD (22).
8:   Compute (28).
9: end for
10: Solve the upper triangular system $\bar R^{(G-1)}\beta = \hat y_1^{(G-1)}$.
5. Maximum likelihood estimation

Under normality assumptions, the maximum likelihood (ML) estimators for $\beta_i$ and $\Sigma$ derive from the solution of the non-linear equations

$$\frac{\partial L}{\partial\beta} = 0 \qquad (29a)$$

and

$$\frac{\partial L}{\partial\Sigma} = 0, \qquad (29b)$$

where $L$ is the log-likelihood function for the SUR-USO model (3). The non-linear equations (29) are solved by using the EM algorithm. An initial estimator for $\Sigma$ is chosen in order to obtain an estimator for $\beta_i$ from (29a), which in turn is used to provide a new estimator for $\Sigma$. This process is repeated until convergence (Dempster et al., 1977).

The solution of (29a) is equivalent to the GLS estimator of (3) and can be computed using the previously derived methods. Thus, only the numerical solution of (29b) will be considered. Note that, when the disturbances are not normally distributed, this approach can be considered as a quasi-maximum likelihood estimation procedure.
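The alternation between (29a) and (29b) is a feasible, EM-style iteration. A naive dense sketch for the simpler equal-size SUR case follows (G regressions with a common sample size t; an illustrative stand-in for the factorization-based solvers of Sections 3 and 4, with all names invented here):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(4)
G, t, k = 3, 40, 2
Xs = [rng.standard_normal((t, k)) for _ in range(G)]
betas_true = [rng.standard_normal(k) for _ in range(G)]
A = rng.standard_normal((G, G))
Sig_true = A @ A.T / G + np.eye(G)
U = rng.multivariate_normal(np.zeros(G), Sig_true, size=t)  # rows ~ N(0, Sigma)
ys = [Xs[i] @ betas_true[i] + U[:, i] for i in range(G)]

Xb, yb = block_diag(*Xs), np.concatenate(ys)
Sigma = np.eye(G)                                # initial estimator for Sigma
for _ in range(20):
    Om_inv = np.kron(np.linalg.inv(Sigma), np.eye(t))       # (Sigma (x) I_t)^{-1}
    beta = np.linalg.solve(Xb.T @ Om_inv @ Xb,
                           Xb.T @ Om_inv @ yb)              # GLS step, cf. (29a)
    E = (yb - Xb @ beta).reshape(G, t)           # residuals, one row per equation
    Sigma = E @ E.T / t                          # covariance update, cf. (29b)
```

In the unequal-size setting the covariance update must instead account for the nested observation pattern, which is what the closed-form expressions (30)-(36) below provide.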
The SUR-USO model (3) is equivalent to the set of equations

$$\begin{aligned} Y_1 &= (X_{1,1}\beta_1\;\; X_{1,2}\beta_2 \cdots X_{1,G}\beta_G) + U_1,\\ Y_2 &= (X_{2,2}\beta_2 \cdots X_{2,G}\beta_G) + U_2,\\ &\;\;\vdots\\ Y_G &= X_{G,G}\beta_G + U_G, \end{aligned} \qquad (30)$$

where $Y_i = (y_{i,i}\,\ldots\,y_{i,G}) \in \mathbb{R}^{h_i\times(G-i+1)}$ and $U_i = (u_{i,i}\,\ldots\,u_{i,G}) \in \mathbb{R}^{h_i\times(G-i+1)}$; each row of $U_i$ has a multivariate distribution with zero mean and covariance matrix $\Sigma^{(i)}$. That is, $\mathrm{vec}(U_i)$ has zero mean and covariance matrix $\Sigma^{(i)} \otimes I_{h_i}$. Furthermore, the elements of $U_i$ and $U_j$ are uncorrelated for $i \ne j$.
The log-likelihood functions of the $i$th equation in (30) and of the whole set are given by

$$L_i = -\tfrac{1}{2}\bigl(\tau_i + h_i\log(\det(\Sigma^{(i)})) + \mathrm{tr}(U_i^T U_i(\Sigma^{(i)})^{-1})\bigr)$$

and $L = \sum_{i=1}^{G}L_i$, respectively. Now, from $\Sigma^{(i)} = C_{(i)}C_{(i)}^T$ and $C_{(i)} = C_{i:,i:}$, it follows that

$$\frac{\partial L_i}{\partial C_{(i)}^{-1}} = h_i C_{(i)}^T - C_{(i)}^{-1}U_i^T U_i. \qquad (31)$$

Furthermore, since $C_{(i)}^{-1}$ is a submatrix of $C^{-1}$, the derivative of the log-likelihood function of the SUR-USO model (30) with respect to $C^{-1}$ is given by

$$\frac{\partial L}{\partial C^{-1}} = \sum_{i=1}^{G}\begin{pmatrix} 0_{(i-1)\times(i-1)} & 0_{(i-1)\times(G-i+1)} \\ 0_{(G-i+1)\times(i-1)} & \partial L_i/\partial C_{(i)}^{-1} \end{pmatrix}. \qquad (32)$$
Substituting (31) into (32) and considering only the non-zero elements of $C^{-1}$ (the elements in its upper triangle) gives

$$\frac{\partial L}{\partial\,\mathrm{vech}(C^{-T})} = \mathrm{vech}\Bigl(D_T D_C - \sum_{i=1}^{G}\begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i C_{(i)}^{-T} \end{pmatrix}\Bigr) = \mathrm{vech}\Bigl(\Bigl(D_T - \sum_{i=1}^{G}\begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i C_{(i)}^{-T} \end{pmatrix}D_C^{-1}\Bigr)D_C\Bigr), \qquad (33)$$

where $D_T = \mathrm{diag}(t_1, t_2,\ldots,t_G)$, $D_C = \mathrm{diag}(C_{1,1}, C_{2,2},\ldots,C_{G,G})$ and $\mathrm{vech}$ is the half-vectorization operator which stacks the columns of its matrix argument from the principal diagonal downwards (Lütkepohl, 1996). That is, if $A = [a_{i,j}] \in \mathbb{R}^{n\times n}$, then $\mathrm{vech}(A) = (a_{1:n,1}^T\; a_{2:n,2}^T \cdots a_{n,n})^T$.
From

$$\begin{pmatrix} 0 & 0 \\ 0 & C_{(i)}^{-1} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & I_{G-i+1} \end{pmatrix}C^{-1}\begin{pmatrix} 0 & 0 \\ 0 & I_{G-i+1} \end{pmatrix},$$

with blocks of sizes $i-1$ and $G-i+1$, it follows that

$$\begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i C_{(i)}^{-T} \end{pmatrix}D_C^{-1} = \begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i \end{pmatrix}\tilde C^{-T}\begin{pmatrix} 0 & 0 \\ 0 & I_{G-i+1} \end{pmatrix},$$

where $\tilde C = CD_C$. Thus,

$$\mathrm{vech}\Bigl(\sum_{i=1}^{G}\begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i C_{(i)}^{-T} \end{pmatrix}D_C^{-1}\Bigr) = \bar A\,\mathrm{vech}(\tilde C^{-T}), \qquad (34)$$

where

$$\bar A = L_G\Bigl(\sum_{i=1}^{G}\begin{pmatrix} 0 & 0 \\ 0 & I_{G-i+1} \end{pmatrix}\otimes\begin{pmatrix} 0 & 0 \\ 0 & U_i^T U_i \end{pmatrix}\Bigr)L_G^T \qquad (35)$$

and the $G(G+1)/2\times G^2$ elimination matrix $L_G$ is defined by $\mathrm{vech}(X) = L_G\,\mathrm{vec}(X)$, for any matrix $X \in \mathbb{R}^{G\times G}$ (Lütkepohl, 1996).
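The elimination matrix $L_G$ can be built directly from its definition $\mathrm{vech}(X) = L_G\,\mathrm{vec}(X)$ with column-major vec; a small sketch:

```python
import numpy as np

def elimination_matrix(n):
    """L_n with vech(X) = L_n @ vec(X): vec is column-major and vech stacks
    each column of X from the principal diagonal downwards."""
    rows = n * (n + 1) // 2
    L = np.zeros((rows, n * n))
    r = 0
    for j in range(n):              # column j of X
        for i in range(j, n):       # entries on or below the diagonal
            L[r, j * n + i] = 1.0   # vec index of X[i, j] (column-major)
            r += 1
    return L

G = 3
X = np.arange(1.0, 10.0).reshape(G, G)
vec = X.flatten(order="F")                          # column-major vec
vech = np.concatenate([X[j:, j] for j in range(G)])
assert np.allclose(elimination_matrix(G) @ vec, vech)
```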
Now, if $\bar U_i = (0_{h_i\times(i-1)}\;\;U_i) \in \mathbb{R}^{h_i\times G}$, then $\bar A$ in (35) can be written as

$$\bar A = L_G\Bigl(\sum_{i=1}^{G}\begin{pmatrix} 0 & 0 \\ 0 & I_{G-i+1} \end{pmatrix}\otimes\bar U_i^T\bar U_i\Bigr)L_G^T = \bigoplus_{j=1}^{G}\bar A_j,$$

where $\bar A_j \in \mathbb{R}^{(G-j+1)\times(G-j+1)}$ is given by

$$\bar A_j = (0_{(G-j+1)\times(j-1)}\;\;I_{G-j+1})\Bigl(\sum_{i=1}^{j}\bar U_i^T\bar U_i\Bigr)\begin{pmatrix} 0_{(j-1)\times(G-j+1)} \\ I_{G-j+1} \end{pmatrix} = \begin{pmatrix} u_{1,j} & \cdots & u_{1,G} \\ \vdots & & \vdots \\ u_{j,j} & \cdots & u_{j,G} \end{pmatrix}^{T}\begin{pmatrix} u_{1,j} & \cdots & u_{1,G} \\ \vdots & & \vdots \\ u_{j,j} & \cdots & u_{j,G} \end{pmatrix}.$$

Note that if $G > t_1$, then $\bar A_1$, and thus $\bar A$, is semidefinite.
From (33) and (34) it follows that the solution of the non-linear equation (29b) derives from the solution of the symmetric linear system

$$\bar A\,\mathrm{vech}(M) = \mathrm{vech}(D_T),$$

or, equivalently, from solving the set of symmetric linear systems

$$\bar A_i(M_{i,i:G})^T = t_i e_1 \qquad (i = 1,2,\ldots,G), \qquad (36)$$

where $M \equiv \tilde C^{-1}$ and $e_1$ denotes the first column of the identity matrix. Once $\tilde C = M^{-1}$ is computed, it follows from the definition of $\tilde C$ that the elements of $C$ are given by

$$C_{i,j} = \begin{cases} \sqrt{\tilde C_{i,i}} & \text{for } i = j, \\ \tilde C_{i,j}/C_{j,j} & \text{for } j = i+1,\ldots,G, \end{cases}$$

where it has been assumed that $\bar A$ is positive definite and thus $\tilde C_{i,i} > 0$. Note that when $t_1 < G$, $\bar A_1$, and thus $\bar A$, is positive semidefinite, which implies that (36) may not have a solution.
6. Conclusions
Computationally efficient methods to solve the SUR model with unequal size observations (SUR-USO), treated as a GLLSP, have been proposed. The algorithms use the GQRD to solve the GLLSP by exploiting the block-sparse structure of the matrices. The first algorithm initially computes the QRD of the exogenous matrix by annihilating, from the bottom to the top, blocks of observations which consist of a non-zero block-superdiagonal. The annihilation of the blocks is obtained by orthogonal transformations which do not create any fill-in. These transformations are also applied from the left of the Cholesky factor and then a sequence of orthogonal factorizations is applied to retriangularize it from the right. The second, recursive algorithm interleaves the QRD and RQD of the exogenous and modified Cholesky factors, respectively. This avoids the explicit computation of the RQD and thus reduces the computational burden of the estimation procedure.

The algorithms presented here assumed for simplicity that $t_1 \ge k_i$ ($i = 1,\ldots,G$). This implies that $\bar R^{(G-i)}$ in (14a) is upper triangular and not trapezoidal. Generally, this assumption should be relaxed and the algorithms modified to deal with cases where the QRD (14a) yields a trapezoidal factor. This generalization will allow the investigation of alternative block generalizations of Givens sequences to compute the QRD (9a) without imposing additional assumptions so that $\bar R^{(G-i)}$ is triangular.

For the case of normally distributed disturbances the maximum likelihood estimation has been considered. A closed-form solution for the Cholesky factor of the covariance matrix has been derived by solving the first-order conditions (29). This results in an iterative procedure to estimate the SUR-USO model when the variance–covariance matrix is unknown. Furthermore, this procedure never yields a non-definite estimator for $\Sigma$.

The extension of the proposed methods to solve SUR models with missing observations will be investigated (Hocking and Smith, 1968). Currently, the adaptation and (parallel) implementation of the recursive algorithm to solve the standard SUR model (with equal size observations) is being considered.
Acknowledgements
The authors are grateful to Jesse Barlow and the anonymous referees for their constructive comments and suggestions.
References
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1–38.
Foschi, P., Kontoghiorghes, E.J., 2002. Estimation of VAR(p) models: computational aspects. Computat.
Econom., to appear, 2003.
Foschi, P., Belsley, D.A., Kontoghiorghes, E.J. A comparative study of algorithms for solving seemingly
unrelated regressions models. Computat. Statist. Data Anal., to appear, 2003.
Golub, G.H., Van Loan, C.F., 1996. Matrix Computations, 3rd Edition. Johns Hopkins University Press,
Baltimore, MD.
Hocking, R.R., Smith, W.B., 1968. Estimation of parameters in the multivariate normal distribution with
missing observations. J. Amer. Statist. Assoc. 63, 159–173.
Kontoghiorghes, E.J., 1999. Parallel strategies for computing the orthogonal factorizations used in the
estimation of econometric models. Algorithmica 25, 58–74.
Kontoghiorghes, E.J., 2000a. Parallel Algorithms for Linear Models: Numerical Methods and Estimation
Problems, Advances in Computational Economics, Vol. 15. Kluwer Academic Publishers, Boston, MA.
Kontoghiorghes, E.J., 2000b. Parallel strategies for solving SURE models with variance inequalities and
positivity of correlations constraints. Computat. Econom. 15 (1–2), 89–106.
Kourouklis, S., Paige, C.C., 1981. A constrained least squares approach to the general Gauss–Markov linear
model. J. Amer. Statist. Assoc. 76 (375), 620–625.
Lütkepohl, H., 1996. Handbook of Matrices. Wiley, New York.
Paige, C.C., 1978. Numerically stable computations for general univariate linear models. Comm. Statist.
Simulation Comput. 7 (5), 437–453.
Schmidt, P., 1977. Estimation of seemingly unrelated regressions with unequal numbers of observations. J.
Econometrics 5, 365–377.
Sharma, V.K., 1993. Estimation of seemingly unrelated regressions with unequal numbers of observation.
Sankhya: Indian J. Statist. 55, 135–138.
Srivastava, V.K., Giles, D.E.A., 1987. Seemingly Unrelated Regression Equations Models: Estimation and
Inference (Statistics: Textbooks and Monographs), Vol. 80. Marcel Dekker, Inc., New York.
Srivastava, J.N., Zaatar, M.K., 1973. A Monte Carlo comparison of four estimators of the dispersion matrix of a bivariate normal population, using incomplete data. J. Amer. Statist. Assoc. 68 (341), 180–183.