Computational Statistics & Data Analysis 44 (2003) 3–35
www.elsevier.com/locate/csda

A comparative study of algorithms for solving seemingly unrelated regressions models

Paolo Foschi^a,*, David A. Belsley^b, Erricos J. Kontoghiorghes^a

^a Institut d'informatique, Université de Neuchâtel, Rue Emile Argand 11, Case Postale 2, CH-2007 Neuchâtel, Switzerland
^b Department of Economics, Boston College, Chestnut Hill, MA 02467, USA

Received 1 February 2002; received in revised form 9 January 2003; accepted 9 January 2003

Abstract

The computational efficiency of various algorithms for solving seemingly unrelated regressions (SUR) models is investigated. Some of the algorithms adapt known methods; others are new. The first transforms the SUR model to an ordinary linear model and uses the QR decomposition to solve it. Three others employ the generalized QR decomposition to solve the SUR model formulated as a generalized linear least-squares problem. Strategies to exploit the structure of the matrices involved are developed. The algorithms are reconsidered for solving the SUR model after it has been transformed to one of smaller dimensions.
© 2003 Elsevier B.V. All rights reserved.

Keywords: SUR models; Least-squares; QR decomposition

This work is in part supported by the Swiss National Foundation Grants 1214-056900.99/1 and 2000-061875.00/1.
* Corresponding author. Department of Mathematics, University of Bologna, Via Sacchi 3, 47023 Cesena, Italy. Tel.: +39-0547-642-806; fax: +39-0547-610-054.
E-mail addresses: [email protected] (P. Foschi), [email protected] (D.A. Belsley), [email protected] (E.J. Kontoghiorghes).
0167-9473/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0167-9473(03)00028-8

1. Introduction

The seemingly unrelated regressions (SUR) model is defined by the set of regressions
$$ y_i = X_i \beta_i + u_i, \qquad i = 1, \ldots, G, $$
where $X_i \in \mathbb{R}^{M \times k_i}$ has full column rank, $y_i \in \mathbb{R}^M$, and the $M$-element disturbance vector $u_i \sim (0, \sigma_{ii} I_M)$ is contemporaneously correlated across the equations, so that $E(u_i u_j^T) = \sigma_{ij} I_M$ (Srivastava and Dwivedi, 1979; Srivastava and Giles, 1987; Telser, 1964; Zellner, 1962, 1963; Zellner and Theil, 1962). Compactly, the SUR model is written
$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{pmatrix}
=
\begin{pmatrix} X_1 & & & \\ & X_2 & & \\ & & \ddots & \\ & & & X_G \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_G \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_G \end{pmatrix}
$$
or
$$ \operatorname{vec}(Y) = \Bigl( \oplus_{i=1}^{G} X_i \Bigr) \operatorname{vec}(\{\beta_i\}_G) + \operatorname{vec}(U), \qquad (1) $$
where $Y = (y_1 \cdots y_G)$, $U = (u_1 \cdots u_G)$, $\oplus_{i=1}^{G} X_i \equiv \oplus_i X_i \equiv \operatorname{diag}(X_1, \ldots, X_G)$ denotes the direct sum of matrices, $\{\beta_i\}_G$ denotes the set of vectors $\beta_1, \ldots, \beta_G$, and $\operatorname{vec}(\cdot)$ is the column-stack operator. The disturbance term $\operatorname{vec}(U) \sim (0, \Sigma \otimes I_M)$, where $\Sigma = [\sigma_{ij}] \in \mathbb{R}^{G \times G}$ is symmetric and non-negative definite and $\otimes$ denotes the Kronecker product. In this treatment the following properties of the Kronecker product will be used: $(A \otimes B)(C \otimes D) = AC \otimes BD$, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ and $\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}(B)$. For notational convenience, the subscript $G$ in the set operator $\{\cdot\}$ is omitted and $\oplus_{i=1}^{G}$ is abbreviated to $\oplus_i$. Also, $\operatorname{vec}(\{\beta_i\})$ is denoted simply by $\beta$, so $\beta \equiv (\beta_1^T \cdots \beta_G^T)^T$ (Regalia and Mitra, 1989). The notation is consistent with that employed in Foschi and Kontoghiorghes (2002). The standard colon notation will be used to denote subvectors and submatrices (Golub and Van Loan, 1996).
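To make the stacked notation concrete, the following minimal sketch (Python with NumPy/SciPy; all variable names and the randomly generated data are illustrative assumptions, not taken from the paper) builds a small SUR data set, forms the direct sum $\oplus_i X_i$, and verifies the stacked form (1) together with one of the vec–Kronecker identities quoted above.

```python
# A minimal sketch of the SUR model (1) and its stacked (direct-sum / vec) form.
# All names and the randomly generated data are illustrative, not from the paper.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
M, k, G = 20, 3, 4                                   # observations, regressors per equation, equations
X = [rng.uniform(size=(M, k)) for _ in range(G)]     # X_1, ..., X_G
beta = [rng.uniform(size=k) for _ in range(G)]       # beta_1, ..., beta_G

A = rng.uniform(size=(G, G))
Sigma = A @ A.T                                      # symmetric non-negative definite Sigma = [sigma_ij]
C = np.linalg.cholesky(Sigma)                        # a factor with Sigma = C C^T (lower triangular here;
                                                     # the paper's C is upper triangular, either factor works)
V = rng.standard_normal((M, G))                      # vec(V) ~ (0, I_GM)
U = V @ C.T                                          # columns u_i satisfy E(u_i u_j^T) = sigma_ij I_M
Y = np.column_stack([X[i] @ beta[i] for i in range(G)]) + U

# Stacked form (1):  vec(Y) = (⊕_i X_i) vec({beta_i}) + vec(U)
X_oplus = block_diag(*X)                             # GM x K with K = sum k_i
assert np.allclose(np.ravel(Y, order='F'),
                   X_oplus @ np.concatenate(beta) + np.ravel(U, order='F'))

# One of the Kronecker identities used throughout: vec(ABC) = (C^T ⊗ A) vec(B)
B = rng.uniform(size=(G, G))
assert np.allclose(np.ravel(np.eye(M) @ U @ B, order='F'),
                   np.kron(B.T, np.eye(M)) @ np.ravel(U, order='F'))
```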
When $\Sigma$ is non-singular, a best linear unbiased estimator (BLUE) of $\beta$ results from solving the generalized (linear) least-squares (GLS) problem
$$ \operatorname*{argmin}_{\beta_1,\ldots,\beta_G} \bigl\| \operatorname{vec}(Y) - \operatorname{vec}(\{X_i \beta_i\}) \bigr\|_{\Sigma^{-1} \otimes I_M}, \qquad (2) $$
which can be obtained from the normal equations
$$ \bigl( \oplus_i X_i^T \bigr) (\Sigma^{-1} \otimes I_M) \bigl( \oplus_i X_i \bigr) \hat\beta = \bigl( \oplus_i X_i^T \bigr) \operatorname{vec}(Y \Sigma^{-1}). \qquad (3) $$
This solution, however, can be unstable when the matrices are ill-conditioned and explicit matrix inversions are used (Björck, 1996; Lawson and Hanson, 1974). Alternatively, multiplying (1) by $(C^{-1} \otimes I_M)$ gives the ordinary linear model (OLM)
$$ \operatorname{vec}(Y C^{-T}) = (C^{-1} \otimes I_M) \bigl( \oplus_i X_i \bigr) \beta + \operatorname{vec}(U C^{-T}), \qquad (4) $$
where $C \in \mathbb{R}^{G \times G}$ is a Cholesky factor of $\Sigma \equiv C C^T$ and is upper triangular. Computing the least-squares estimator of (4) derives the BLUE of the SUR model (1) (Pollock, 1979). The SUR model can also be formulated as a generalized linear least-squares problem (GLLSP)
$$ \operatorname*{argmin}_{\beta, V} \|V\|_F^2 \quad \text{subject to} \quad \operatorname{vec}(Y) = \bigl( \oplus_i X_i \bigr) \beta + (C \otimes I_M) \operatorname{vec}(V), \qquad (5) $$
where $V C^T = U$, $\operatorname{vec}(V) \sim (0, I_{GM})$ and $\|\cdot\|_F$ denotes the Frobenius norm (Kontoghiorghes, 2000a; Kontoghiorghes and Clarke, 1995). This approach allows the derivation of algorithms that are numerically more stable than those based on (4). Furthermore, the GLLSP allows derivation of the BLUE of (1) even when $C$ is not of full rank, that is, when $\Sigma$ is singular (Kourouklis and Paige, 1981; Paige, 1978, 1979b; Söderkvist, 1996).

Often $\Sigma$ is unknown and an iterative procedure is used to obtain the feasible GLS estimator (Telser, 1964). Given a consistent estimator of $\Sigma$, the solution of (2) provides an estimator of $\beta$. From the residuals associated with the estimated coefficients, another estimator of $\Sigma$ is obtained. This procedure is repeated until convergence. Thus, the GLS problem (2), or the corresponding GLLSP (5), is solved a number of times for different $\Sigma$. Here, the computational cost of deriving the estimator during a single iteration is considered. The particular properties of the SUR model that affect the convergence of the iterative estimation procedure are not investigated.

In this work, the computational efficiencies of various methods for computing the BLUE of the SUR model are considered. Some of the algorithms are well known while others are new. All of the algorithms are based on an orthogonal factorization obtained through the QR decomposition. In the next section the solution of the SUR model using the QR and generalized QR decompositions is considered. Recursive estimation algorithms are presented in Section 3. Size reduction of large-scale SUR models is shown in Section 4. The computational results are discussed in Section 5. Section 6 provides summary comments.

2. Numerical estimation of the SUR model

2.1. Estimating the OLM using the QR decomposition

The OLM (4) can be written as
$$ \bar y = \bar X \beta + \bar u, \qquad (6) $$
where $\bar y = \operatorname{vec}(Y C^{-T})$, $\bar X = (C^{-1} \otimes I_M)(\oplus_i X_i)$, and $\bar u = \operatorname{vec}(U C^{-T})$. Let the QR decomposition (QRD) of $\bar X$ be given by
$$ \bar Q^T \bar X = \begin{pmatrix} \bar R \\ 0 \end{pmatrix} \begin{matrix} K \\ GM - K \end{matrix} \quad \text{and} \quad \bar Q^T \bar y = \begin{pmatrix} \bar y_A \\ \bar y_B \end{pmatrix} \begin{matrix} K \\ GM - K \end{matrix}, \qquad (7) $$
where $\bar R$ is upper triangular, $\bar Q \in \mathbb{R}^{GM \times GM}$ is orthogonal, and $K = \sum_i k_i$ (Björck, 1996; Golub and Van Loan, 1996). The least-squares estimator of $\beta$ is given by solving the triangular system
$$ \bar R \beta = \bar y_A. \qquad (8) $$
This straightforward solution of (4) is computationally inefficient since it computes $\bar X$ explicitly and ignores its sparsity.
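The following sketch makes the preceding formulations concrete by solving the whitened OLM (4)/(6) densely, i.e. it forms $\bar X$ explicitly and factorizes it, which is precisely the inefficiency noted in the last sentence. It is a baseline illustration under stated assumptions (NumPy/SciPy, a lower-triangular Cholesky factor rather than the paper's upper-triangular one, illustrative toy data), not the paper's structured Algorithm 1.

```python
# Dense solution of the whitened OLM (4)/(6): a baseline that ignores all structure.
import numpy as np
from scipy.linalg import block_diag, qr, solve_triangular

rng = np.random.default_rng(0)
M, k, G = 20, 3, 4
X = [rng.uniform(size=(M, k)) for _ in range(G)]          # exogenous matrices
beta_true = rng.uniform(size=G * k)
A = rng.uniform(size=(G, G))
Sigma = A @ A.T                                           # contemporaneous covariance
C = np.linalg.cholesky(Sigma)                             # Sigma = C C^T (lower triangular here)

X_oplus = block_diag(*X)                                  # ⊕_i X_i
u = np.kron(C, np.eye(M)) @ rng.standard_normal(G * M)    # vec(U) ~ (0, Sigma ⊗ I_M)
y = X_oplus @ beta_true + u                               # vec(Y), model (1)

# Whitening: multiply the model by (C^{-1} ⊗ I_M), cf. (4)
C_inv = solve_triangular(C, np.eye(G), lower=True)
X_bar = np.kron(C_inv, np.eye(M)) @ X_oplus
y_bar = np.kron(C_inv, np.eye(M)) @ y

Q, R = qr(X_bar, mode='economic')                         # QRD (7)
beta_hat = solve_triangular(R, Q.T @ y_bar)               # triangular system (8)

# Sanity check against the GLS normal equations (3); fine on this well-conditioned toy problem
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(M))
beta_gls = np.linalg.solve(X_oplus.T @ Omega_inv @ X_oplus, X_oplus.T @ Omega_inv @ y)
assert np.allclose(beta_hat, beta_gls)
```

The structured approach developed next obtains the same estimator while avoiding the explicit formation of $\bar X$.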
/ Computational Statistics & Data Analysis 44 (2003) 3 – 35 To solve (4) e1ciently, consider the QRD of Xi : T Q̃i Ri ki T T ; with Qi = ; Qi Xi = T 0 Q̂i M −ki (9) where Qi ∈ RM ×M is orthogonal and Ri ∈ Rki ×ki is upper triangular. From (9) it follows that the QRD of ⊕i Xi is given by R K ⊕ i i ; (10) QT (⊕i Xi ) = GM −K 0 where Q = (⊕i Q̃i ⊕i Q̂i ). Premultiplying (4) by QT gives QT vec(YC −T ) = QT (C −1 ⊗ IM ) (⊕i Xi ) + QT vec(UC −T ); or vec({yR̃ i }) vec({yR̂ i }) = WR̃ WR̂ + vec(VR̃ ) vec(VR̂ ) ; (11) where WR̃ = WR̂ = k1 ··· WR̃ 1; 1 ··· .. . ··· k1 ··· WR̂ 1; 1 ··· .. . WR̃ G; G kG WR̂ 1; G .. . ··· T i; j Q̃i Xj ; WR̃ i; j = i; i Ri ; 0; WR̂ i; j = WR̃ 1; G .. . WR̃ G; 1 WR̂ G; 1 kG WR̂ G; G k1 . ; . . kG M −k1 .. . ; (12a) M −kG if i ¡ j; if i = j; (12b) if i ¿ j; i; j Q̂Ti Xj ; if i ¡ j; 0; if i ¿ j; (12c) P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 7 and i; j is the (i; j)th element of C −1 . Notice that WR̃ and WR̂ are block upper-triangular and strictly block upper-triangular matrices, respectively. Now, compute a row-updating QRD (hereafter abbreviated to UQRD) WR̃ RR K QR T = (13a) 0 GM −K WR̂ and QR T vec({yR̃ i }) vec({yR̂ i }) = yR yR ∗ K GM −K : (13b) It follows that the least-squares solution of (11), and thus the BLUE of , is given by (8). Algorithm 1 summarizes these steps for solving (4). Two block strategies, the column- and diagonally-based methods, that can be used to compute (13)—step 8—are described in Appendix A. Algorithm 1. Ordinary least squares estimation of the OLM (4). 1: 2: 3: 4: 5: 6: 7: 8: 9: Compute = CC T Compute C −1 = [i; j ] and YR = (yR 1 · · · yR G ) = YC −T for i = 1; : : : ; G do Compute the QRD (9) Compute yR̃ i = Q̃Ti yR i and yR̂ i = Q̂Ti yR i end for Compute WR̃ and WR̂ as in (12a) Compute the UQRD (13a) and (13b) Solve the triangular system (8) for 2.2. The GLLSP and generalized QRD The solution of the GLLSP (5) can be obtained by computing the generalized QRD (GQRD) of ⊕i Xi and (C ⊗ IM ) (BjNorck, 1996; Kontoghiorghes, 2000; Kontoghiorghes and Dinenis, 1997; Kourouklis and Paige, 1981; Paige, 1990), that is, by computing the QRD (10) and the RQ decomposition QT (C ⊗ IM )P = W ≡ K GM −K WAA WAB 0 WBB K ; (14) GM −K where K = i ki , P ∈ RGM ×GM is orthogonal, and WBB is upper triangular. Premultiplying the constraints in (5) by QT and using vec(V ) ≡ PP T vec(V ), the GLLSP can 8 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 be written as argmin G ;{v˜i };{vˆi } i=1 (ṽi 2 + v̂i 2 ) vec({ỹ i }) = vec({ŷ i }) ⊕i Ri subject to + 0 WAA WAB 0 WBB vec({ṽi }) vec({v̂i }) ; (15) where Q̃Ti yi = ỹ i , Q̂Ti yi = ŷ i , P T vec(V ) = (vec({ṽi })T vec({v̂i })T )T , ỹ i ; ṽi ∈ Rki and ŷ i ; v̂i ∈ RM −ki . The solution of (15) is given by vec({ṽi }) = 0 and ⊕i Ri WAB 0 WBB = vec({v̂i }) vec({ỹ i }) vec({ŷ i }) : (16) The RQD (14) derives in two stages. The 4rst computes the permutation T Q (C ⊗ IT )# = K GM −K W̃AA W̃AB W̃BA W̃BB K ; (17) GM −K where # = (⊕i (Iki 0)T ⊕i (0 IM −ki )T ). This results in W̃AA , W̃AB , W̃BA and W̃BB being block upper-triangular. The second stage computes the RQD ( W̃BA W̃BB )P̃ = ( 0 WBB ) ( W̃AA W̃AB )P̃ = ( WAA (18a) and WAB ); (18b) where P̃ ∈ RGM ×GM is orthogonal. Notice that P in (14) is given by #P̃ and that (18) does not compute the RQD of the whole matrix in (17). The leading submatrix WAA , which is not used in the solution of the GLLSP, is not triangularized. 
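A dense illustration of this two-stage generalized QR solution of the GLLSP is sketched below. It ignores the block and Kronecker structure entirely, so it corresponds to the general scheme (14)–(16) rather than to the structured annihilation strategies of Appendix A, and it relies on SciPy's qr and rq routines; the data and names are illustrative assumptions.

```python
# A dense sketch of the GLLSP solution (5) via the generalized QR decomposition (14)-(16).
import numpy as np
from scipy.linalg import block_diag, qr, rq, solve_triangular

rng = np.random.default_rng(1)
M, k, G = 15, 3, 4
K = G * k
X = [rng.uniform(size=(M, k)) for _ in range(G)]
A = rng.uniform(size=(G, G))
C = np.linalg.cholesky(A @ A.T)                     # Sigma = C C^T
X_oplus = block_diag(*X)
y = X_oplus @ rng.uniform(size=K) + np.kron(C, np.eye(M)) @ rng.standard_normal(G * M)

# Stage 1: QRD of the regressor matrix, Q^T (⊕_i X_i) = [R; 0], cf. (10)
Q, R_full = qr(X_oplus)                             # full QR, Q is GM x GM
R = R_full[:K, :]

# Stage 2: RQ decomposition of Q^T (C ⊗ I_M), cf. (14); only the bottom block is triangularized
Bmat = Q.T @ np.kron(C, np.eye(M))
W_bottom, P = rq(Bmat[K:, :])                       # Bmat[K:] = (0  W_BB) P, with W_BB upper triangular
W_BB = W_bottom[:, K:]
W_AB = (Bmat[:K, :] @ P.T)[:, K:]

# Back-substitution (16): v_tilde = 0, W_BB v_hat = y_hat, R beta = y_tilde - W_AB v_hat
y_rot = Q.T @ y
v_hat = solve_triangular(W_BB, y_rot[K:])
beta_hat = solve_triangular(R, y_rot[:K] - W_AB @ v_hat)

# Cross-check: the GLLSP solution coincides with the BLUE from the whitened OLM (4)
Whiten = np.kron(np.linalg.inv(C), np.eye(M))
assert np.allclose(beta_hat, np.linalg.lstsq(Whiten @ X_oplus, Whiten @ y, rcond=None)[0])
```

Algorithms 2–4 of the paper compute the same estimator while exploiting the block-diagonal and Kronecker structure, so the expensive dense factorizations above are never formed.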
Furthermore, the RQD (18a) is equivalent to the QL decomposition P̃ T T W̃BA T W̃BB = 0 T WBB : This indicates that (18a) can be computed using adaptations of the diagonally-based and column-based strategies (see Appendix A) that are used for the computation of P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 W = QT (C ⊗ IM )Π Zero Block 9 T W (0) = Q (C ⊗ IM )Q Diagonal block Non-zero block Fig. 1. Structure of matrices W̃ and W̃ (0) , where G = 5. Step 1 Step 2 A Step 3 A A Step 4 A A A A A A A Zero Block Non-zero block Filled-in block A Annihilated block Fig. 2. Structure of matrices W̃ and W̃ (0) , where G = 5. (13a) (Kontoghiorghes, 1999, 2000b). Furthermore, these strategies produce WAA in (18b) that are block upper-triangular. The 4rst step of the diagonally-based strategy annihilates the main block-diagonal of W̃BA . However, the permutation in (17) along with this step is equivalent to applying Q to the right of QT (C ⊗ IM ); that is QT (C ⊗ IM )Q = W̃ (0) ≡ K GM −K (0) W̃AA (0) W̃AB (0) W̃BA (0) W̃BB K ; (19) GM −K (0) where W̃ (0) AA and W̃ BB are block upper-triangular with the ith block of their main diagonals given by Ci; i Iki and Ci; i IM −ki , respectively. Furthermore the matrices W̃ (0) AB and (0) W̃ BA are strictly block upper-triangular. Fig. 1 shows the structure of the matrices W̃ and W̃ (0) , where G = 5. Thus the remaining steps of the diagonally-based method annihilate the strictly (0) block upper-triangular matrix W̃ (0) BA by preserving the block-triangular structure of W̃ AA and W̃ (0) BB . This annihilation strategy is illustrated in Fig. 2, where an arc denotes an 10 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 updating RQD (URQD). Algorithm 2 summarizes the steps of this estimation procedure. Algorithm 2. Solution of the GLLSP (5) using the GQRD. 1: 2: 3: 4: 5: 6: 7: 8: 9: Compute = CC T for i = 1; : : : ; G do Compute the QRD (9) Compute ỹ i = Q̃Ti yi and ŷ i = Q̂Ti yi end for Compute QT (C ⊗ IT )Q as in (19) (0) (0) Compute the URQD (W̃ (0) = (0 WBB ) BA W̃ BB )P̃ (0) (0) (0) Compute (W̃ AA W̃ AB )P̃ = (WAA WAB ) Solve the triangular system (16) for and vec({v̂i }) 2.3. An interleaving approach to solving the GLLSP The RQD (14) is the most expensive operation in computing the GQRD of ⊕i Xi and C ⊗ IM (see Appendix B). An iterative procedure that does not compute (14) can be employed (Paige, 1979b). At each iteration a smaller problem is solved. Let U (0) =C ⊗IM , yU (0) =vec(Y ), and vU(0) =vec(V ). The sth (s=0; : : : ; G −1) XU (0) =⊕G i=1 Xi , W iteration deals with the GLLSP argminvU(s) vU(s) ; subject to yU (s) = XU (s) + WU (s) vU(s) ; (20) computing the factorizations QU Ts XU (s) = K XU (s+1) 0 (21a) %s+1 M −kG−s and (QU Ts WU (s) QU s )PU s = %s+1 M −kG−s WU (s+1) (s) W̃AB 0 (s) W̃BB %s+1 M −kG−s ; (21b) P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 where QU s ; PU s ∈ R%s ×%s are orthogonal, %s = (G − s)M + &G−s+1 , &i = XU (s+1) = k1 ··· X1 ··· .. . .. 0 0 ··· 0 0 ··· .. . .. . ··· XG−s−1 0 ··· 0 ··· 0 RG−s ··· 0 .. . .. . .. .. . 0 0 ··· . .. . 0 ··· (G−s−1)M &G−s (s+1) WU AA (s+1) WU AB 0 (s+1) WU BB WU (s+1) = kG−s kG−s−1 (G−s−1)M G j=i kj , kG 0 .. . . 11 RG M .. . ; M (22) kG−s .. . kG ; (23) &G−s (s+1) (s+1) RG−s+i and WU BB are upper-triangular, and WU AA = C1:G−s−1; 1:G−s−1 ⊗ IM . 
Furthermore, 0 0 0 I(G−s−1)M (24) QU s = 0 Q̃G−s 0 Q̂G−s 0 0 I&G−s+1 0 and T (s) QU s WU QU s = (G−s−1)M kG−s &G−s+1 M −kG−s (s+1) WU AA (s) Ŵ AB (s) Ŵ AC (s) Ŵ AD 0 CG−s; G−s IkG−s (s) Ŵ BC 0 0 0 (s) Ŵ CC 0 0 0 (s) Ŵ DC CG−s; G−s IM −kG−s (G−s−1)M ; kG−s &G−s+1 M −kG−s (s) (s) (s) (s) U (0) where Ŵ AC = (I(G−s−1)M 0)WU AB , Ŵ CC = WU BB and WU (0) BB , and consequently W CC , has zero dimension. Note that (21a) computes the QRD of Xs , while the RQD (21b) is equivalent to the URQD (s) (Ŵ DC CG−s; G−s IM −kG−s )PR s = (0 (s) W̃BB ) (25a) 12 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 and (s) Ŵ AC (s) Ŵ AD (s) Ŵ BC (s) W̃ AC (s) W̃AD (s) W̃BD ; (s) R 0 P s = W̃ BC (s) Ŵ CC (s) W̃ CC 0 (25b) (s) W̃CD where PU s = I(%s+1 −&G−s+1 ) 0 0 PR s (s+1) WU BB = (s+1) (s) WU AB = ( Ŵ AB ; CG−s; G−s IkG−s (s) W̃BC 0 (s) W̃CC (s) ); W̃AC (s) W̃AD (s) (s) W̃AB = W̃BD : and (s) W̃CD Let QU Ts yU (s) = yU (s) A yU (s) B %s+1 and M −kG−s PU Ts QU Ts vU(s) = vU(s+1) vUB(s) %s+1 M −kG−s : Premultiplying the constraints in (20) by QU Ts and using (21), it follows that the GLLSP is equivalent to argmin vUA(s) 2 + vUB(s) 2 subject to vUA(s) ;vUB(s) ; yU (s) A yU (s) B = XU (s+1) 0 + WU (s+1) (s) W̃AB 0 (s) W̃BB vU(s+1) vUB(s) ; or, again, the smaller GLLSP argmin vU(s+1) vU(s+1) ; subject to yU (s+1) = XU (s+1) + WU (s+1) vU(s+1) ; (26) (s) −1 (s) (s) (s) where vUB(s) = (W̃BB ) yU B and yU (s+1) = yU A(s) − W̃AB vUB . The solution to (26) can be obtained iteratively by employing the method used for the GLLSP (20). At the end of iteration (G − 1) the GLLSP becomes argmin vU(G) vU(G) ; subject to yU (G) = (⊕i Ri ) + WU (G) vU(G) ; (G) which has solution vU(G) = 0 and = (⊕i R−1 . i )yU P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Step 1 A Step 2 Step 3 A Zero Block Step 4 A Non-zero block 13 A Filled-in block A Annihilated block Fig. 3. Computation of (25) at step s = 4. Fig. 4. Computation of (21b), where G = 5 and s = 2. Now consider the computation of (25a). Let kG−s+1 (s) (Ŵ DC CG−s; G−s IM −kG−s )= A1 ··· ··· kG As M −kG−s A(0) s+1 . (27) The submatrices A1 ; : : : ; As are annihilated one at a time by computing the URQDs (Ai (i−1) As+1 )Pi = (0 (i) As+1 ); i = 1; : : : ; s; (28) (i) (s) (s) where As+1 is upper triangular and Pi is orthogonal. Thus, in (25a) W̃BB = As+1 . This (s) produces W̃CC with a block upper triangular structure. Fig. 3 shows the steps for annihi(s) (s) (s) and the 4ll-ins induced in W̃CC and W̃CD in (25), where lating Ŵ DC s = 4. Algorithm 3 summarizes the steps of the interleaving procedure for solving the GLLSP (5). Fig. 4 illustrates the computations of the second step (s = 2), where G = 5. 14 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Algorithm 3. Solution of the GLLSP (5) using the interleaving approach. 1: Compute = CC T U (0) = C ⊗ IM and yU (0) = vec(Y ) 2: Let XU (0) = ⊕G i=1 Xi , W 3: for s = 0; 1; : : : ; G − 1 do 4: Compute the QRD (9), where i = G − s and let QU s be given by (24) (s) yU A 5: Compute = QU Ts yU (s) yU (s) B 6: Compute QU Ts WU (s) QU s 7: Compute the URQD (25a) and (25b) (s) (s) (s) 8: Solve the triangular system W̃BB vUB = yU (s) B for vUB (s) (s) 9: Compute yU (s+1) = yU (s) − W̃AB vUB 10: end for 11: Solve the triangular system (⊕i Ri ) = yU (G) for 3. A recursive algorithm for the estimation of the SUR model The BLUE of the SUR model can be computed recursively (Bolstad, 1987; Kontoghiorghes, 2003). Consider the partitioning (1) (1) (1) Xi Y U M1 M1 M1 .. . 
.. . .. . Xi = . .. ; Y = . .. ; U = . .. Xi(p) and Y (p) Mp (1) Mp U (p) Mp (29) V M1 V = ... ... V (p) Mp for i = 1; : : : ; G. The SUR model (1) and the GLLSP (5) can be respectively expressed equivalently as ⊕i Xi(1) vec(U (1) ) vec(Y (1) ) vec(U (2) ) vec(Y (2) ) ⊕i X (2) i + = .. .. .. . . . vec(Y (p) ) ⊕i Xi(p) vec(U (p) ) P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 15 and argmin p ;V (1) ;:::;V (p) j=1 vec(Y (1) ) V (j) 2F ⊕i Xi(1) subject to vec(Y (2) ) ⊕i X (2) i + = .. . .. . vec(Y (p) ) C ⊗ IM1 0 0 ⊕i Xi(p) ··· 0 C ⊗ IM2 · · · 0 vec(V (2) ) : .. . .. . .. . .. .. . 0 0 · · · C ⊗ IM p . vec(V (1) ) vec(V (p) ) (30) Assume that M1 ¿ max(k1 ; : : : ; kG ), and let the GQRD of given by T Q(1) ⊕i Xi(1) = K ⊕i R(1) i 0 ⊕i Xi(1) and C ⊗ IM1 be (31a) K GM1 −K and T Q(1) (C ⊗ IM1 )P(1) = W (1) ≡ K GM1 −K (1) WAA (1) WAB 0 (1) WBB K ; (31b) GM1 −K where R(1) and W(1) are upper-triangular. Furthermore, let i (1) (1) ỹ ṽ K T (1) T (1) Q(1) vec(Y ) = and P(1) vec(V ) = (1) ŷ v̂(1) GM1 −K K GM1 −K : (32) Using (31) and (32) it follows that the GLLSP (30) can be written as p argmin ṽ (1) 2 + v̂ (1) 2 + V ( j) 2F subject to ;v˜ (1) ;vˆ (1) ; V (2) ;:::;V (p) j=2 (1) WAA 0 vec(Y (2) ) ⊕i X (2) 0 C ⊗ I M2 i . . .. .. = .. + .. . . (p) vec(Y ) ⊕ X (p) 0 0 i i ỹ (1) ŷ (1) ⊕i R(1) i 0 0 0 ··· 0 ··· 0 .. .. . . · · · C ⊗ IM p ··· 0 (1) WAB ṽ (1) (2) 0 vec(V ) .. .. : . . vec(V (p) ) 0 (1) WBB v̂(1) (33) 16 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 This is equivalent to argmin ṽ(1) 2 + ;v˜ (1) V (2) ;:::;V (p) V (j) 2F subject to j=2 y(1) p ⊕i R(1) i (1) WAA vec(Y (2) ) ⊕i X (2) i + = . .. .. . vec(Y (p) ) 0 0 ⊕i Xi(p) ··· 0 C ⊗ IM2 · · · 0 vec(V (2) ) ; .. . .. . .. . .. .. . 0 0 · · · C ⊗ IM p . ṽ (1) vec(V (p) ) (34) (1) −1 (1) (WBB ) ŷ (1) (1) WAB v̂ . where v̂(1) = and y(1) = ỹ (1) − The solution to the GLLSP (34) can be obtained in (p − 1) iterations. The sth (s = 2; : : : ; p) iteration solves the GLLSP p argmin ṽ(s−1) 2 + V (j) 2F subject to ;v˜ (s−1) ; V (s) ;:::;V (p) j=s y(s−1) ⊕i Ri(s−1) vec(Y (s) ) ⊕i X (s) i = .. . .. . vec(Y (p) ) (s−1) WAA 0 ··· 0 0 C ⊗ IMs ··· 0 .. . .. . .. .. . 0 0 ··· + ⊕i Xi(p) . ṽ (s−1) vec(V (s) ) ; .. . C ⊗ IM p vec(V (p) ) (35) (s−1) WAA Ri(s−1) where y(s−1) ; ṽ(s−1) ∈ RK , and ∈ RK×K and ∈ Rki ×ki are upper triangular. For the solution of (35) consider the updating GQRD (UGQRD) T Q(s) ⊕i Ri(s−1) ⊕i Xi(s) = K ⊕i Ri(s) 0 (36a) K GMs and T Q(s) (s−1) WAA 0 0 C ⊗ I Ms P(s) = W (s) ≡ K GMs (s) WAA (s) WAB 0 (s) WBB K GMs ; (36b) where Ri(s) and W (s) are upper triangular and, Q(s) and P(s) are orthogonal. Let (s−1) ỹ (s) y K T Q(s) (37a) = (s) (s) ˆ vec(Y ) vec(Y ) GMs P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 and T P(s) ṽ (s−1) vec(V (s) ) = ṽ(s) vec(V̂ (s) ) K GMs 17 : (37b) Strategies for computing the UGQRD (36) have been discussed in the context of updating the SUR model (Kontoghiorghes, 2003). Using (36) and (37), the GLLSP (35) becomes the smaller GLLSP argmin ṽ(s) 2 + ;v˜ (s) ; V (s+1) ;:::;V (p) y(s) p V (j) 2F ⊕i Ri(s) (s) WAA vec(Y (s+1) ) ⊕i X (s+1) i + = .. . .. . vec(Y (p) ) subject to j=s+1 ⊕i Xi(p) 0 0 ··· 0 C ⊗ IMs+1 · · · 0 .. . .. . .. 0 0 · · · C ⊗ IM p . .. . ṽ(s) vec(V (s+1) ) ; .. . 
vec(V (p) ) (38) where (s) WBB vec(V̂ (s) ) = vec(Yˆ (s) ) (39) (s) y(s) = ỹ (s) − WAB vec(V̂ (s) ): (40) and At the last iteration, when s = p, the GLLSP reduces to (p) (p) argmin ṽ(p) 2 subject to y(p) = ⊕i R(p) + WAA ṽ ; i ;v˜(p) (p) which has solution ṽ(p) = 0 and ⊕i R(p) . Algorithm 4 summarizes the steps of i =y this recursive estimation procedure for computing the BLUE of the SUR model. Note that at the sth iteration, the matrix retriangularized in (36b) is of order (K + GMs ). This results in less computational cost than does the RQD of the GM × GM matrix in (14). Algorithm 4 also requires less memory to store the smaller dimensioned matrices involved in the factorizations. Algorithm 4. Solution of the GLLSP (5) using the recursive algorithm. 1: 2: 3: 4: Compute = CC T Compute the GQRD (3) and Y (1) from (23) (1) (1) Solve the triangular system WBB v̂ = ŷ (1) (1) (1) (1) (1) Compute y = ỹ − WAB v̂ 18 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 5: for s = 2; : : : ; p do 6: Compute the UGQRD (36) and (37a) 7: Solve the triangular system (39) for vec(V̂ (s) ) 8: Compute (40) 9: end for 10: Solve the triangular system (⊕i Ri ) = y(p) for 4. Size reduction of large scale SUR models When M ¿ k, the SUR model can be transformed to one of smaller dimension (Foschi et al., 2002; Kontoghiorghes, 2000a, b). Solving the transformed model results G in a computationally e1cient algorithm. Let X ∗ = (X1 · · · XG ) ∈ RM ×K , K = i=1 ki , and M ¿ K. Consider the QRD ∗T ∗ QR R ∗ X = ; (41) 0 QN∗T where R∗ = (R∗1 · · · R∗G ) ∈ RM ×M , R∗i ∈ RK×ki , and the matrix (QR∗ QN∗ ) ∈ RM ×M is orthogonal. Now, premultiplying the SUR model (1) by (IG ⊗ QR∗ IG ⊗ QN∗ )T results in the transformed SUR (TSUR) model vec(YR∗ ) vec(UR∗ ) ⊕i R∗i + = ; (42) vec(YN∗ ) vec(UN∗ ) 0 where YR∗ = QR∗T Y , YN∗ = QN∗T Y , UR∗ = QR∗T U , and UN∗ = QN∗T U . Furthermore, vec(UR∗ ) 0 ⊗ IK ; ∼ 0; vec(UN∗ ) 0 ⊗ IM −K (43) and thus the SUR model (1) is equivalent to the smaller TSUR model vec(YR∗ ) = (⊕i R∗i ) + vec(UR∗ ): Note that R∗1; i .. . R∗i = ∗ R i; i 0 k1 . .. ; ki &i+1 (i = 1; : : : ; G); (44) P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 19 G where &i = j=i kj . However, the direct implementation of Algorithms 1–4 to solve this reduced model does not exploit the special structure of the (transformed) exogenous matrices R∗1 ; : : : ; R∗G . Let (41) be replaced with the QRD of X̃ = (XG · · · X1 ) and partition k1 Ỹ 1 Ũ 1 R̃1; i k1 k1 . .. Ỹ 2 k2 Ũ 2 k2 .. . ∗ ∗ QR∗T Xi ≡ R̃i = ; Y = and U = R R .. . .. . : R̃G−i+1; i kG−i+1 . .. . .. & 0 G−i+2 kG kG Ỹ G Ũ G (45) Also let Ṽ i C T = Ũ i , Ỹ i , Ṽ i , and C (i = 1; : : : ; G) be partitioned, respectively, as G−i+1 Ỹ i = Ỹ iA i−1 Ỹ iB , G−i+1 Ṽi = Ṽ iA i−1 Ṽ iB , and C = G−i+1 (i) CAA 0 i−1 (i) CAB G−i+1 : (i) CBB i−1 Then, the GLLSP formulation of the TSUR model (44) can be written as argmin ;V˜ j G Ṽ j 2F subject to j=1 vec(Ỹ i ) = (⊕j R̃j; i ) + vec(Ṽ i C T ); i = 1; : : : ; G; or, equivalently, as argmin G ;V˜ jA ;V˜ jB j=1 (Ṽ jA 2F + Ṽ jB 2F ) subject to (46a) (i) T (i) T ) + Ṽ iB (CAB ) ; Ỹ iA = (R̃i; 1 1 · · · R̃i; G−i+1 G−i+1 ) + Ṽ iA (CAA i = 1; : : : ; G; (46b) (i) T Ỹ iB = Ṽ iB (CBB ) ; i = 1; : : : ; G: (46c) (i) −T ) , and thus, the GLLSP (46) can be written From (46c), it follows that Ṽ iB = Ỹ iB (CBB as argmin ;V˜ jA G Ṽ jA 2F subject to j=1 YR iA = (R̃i; 1 1 ··· R̃i; G−i+1 G−i+1 ) + Ṽ iA (CA(i) )T ; i = 1; : : : ; G; (47) 20 P. Foschi et al. 
/ Computational Statistics & Data Analysis 44 (2003) 3 – 35 (i) (R¯ T1 … R̄TG ) T ⊕i (CAA ⊕ Iki ) (i) Fig. 5. The structure of the transformed exogenous matrix (RRT1 · · · RRTG )T and Cholesky factor ⊕i (CAA ⊗ Iki ) of the GLLSP (48), where G = 4. (i) T where YR iA = Ỹ iA − Ṽ iB (CAB ) . This is equivalent to argmin ;V˜ iA vec(YR 1A ) G Ṽ iA 2F i=1 RR 1 subject to (1) CAA ⊗ Ik 1 vec(YR 2A ) RR 2 = + .. .. . . R R vec(Y GA ) RG 0 0 ··· 0 (2) CAA ⊗ Ik 2 · · · 0 .. . .. . .. 0 0 (G) · · · CAA ⊗ Ik G . .. . vec(Ṽ 1A ) vec(Ṽ 2A ) ; .. . vec(Ṽ GA ) (48) where RR i = and Ki = i KG−i+1 ⊕G−i+1 R̃i; j j=1 j=1 kj . &G−i+2 0 (G−i+1)KG−i+1 (49) Notice that the 4rst block of the constraints (1) vec(YR 1A ) = RR 1 + (CAA ⊗ Ik1 )vec(Ṽ 1A ) is analogous to the constraint of the GLLSP (5). The GLLSP (48) corresponds to a GLLSP formulation of a SUR model with unequal numbers of observations (Foschi and Kontoghiorghes, 2002). Fig. 5 shows the structure (i) of (RR T1 · · · RR TG )T and ⊕i (CAA ⊗Iki ), where G =4. The recursive algorithm in Foschi and Kontoghiorghes (2002) that solves the unequal-numbers-of-observations problem is similar to Algorithm 4 and can therefore be employed to compute the solution of (48). P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 21 Table 1 Execution times of solving the SUR model, where k1 = · · · = kG = 5 M G OLM algorithms GLLSP algorithms LAPACK Alg. 1 LAPACK Alg. 2 Ratio Alg. 3 Alg. 4 GLLSP/OLM 51 51 51 51 51 5 10 15 20 30 0.0007 0.0070 0.0389 0.1274 0.3471 0.0012 0.0070 0.0180 0.0382 0.1111 0.0968 0.4430 1.4195 3.4158 11.4602 0.0159 0.0994 0.2993 0.7061 2.4468 0.0116 0.0776 0.2412 0.6279 2.2541 0.0133 0.0547 0.1404 0.2801 0.7788 16.57 7.81 7.80 7.33 7.01 100 100 100 100 100 5 10 15 20 30 0.0014 0.0213 0.1135 0.3039 0.7402 0.0022 0.0119 0.0326 0.0687 0.2122 0.5553 2.8841 10.4153 25.5424 81.9765 0.0507 0.3215 1.0539 2.6647 9.4783 0.0409 0.2906 0.9871 2.4102 8.9455 0.0270 0.1091 0.2853 0.5697 1.5567 19.28 9.17 8.75 8.29 7.34 400 400 400 400 400 5 10 15 20 30 0.0162 0.1675 0.5750 1.3203 n/a 0.0093 0.0498 0.1598 0.4207 1.5331 25.6388 183.3260 586.3929 n/a n/a 1.2633 8.2393 28.0438 n/a n/a 0.7686 6.0076 22.2265 n/a n/a 0.1092 0.4480 1.1694 2.4639 6.6609 11.74 9.00 7.32 5.86 4.34 5. Computational comparison The algorithms are implemented in double precision on a PC with a single 1:7 GHz Intel Pentium IV processor and 512 Mb of RAM. The matrix factorizations have been computed using LAPACK subroutines (Anderson et al., 1992). The diagonally-based method (see Appendix A) has been used to compute the factorizations (13), (18) and (36b) (Foschi et al., 2002). Furthermore, in the case of the recursive Algorithm 4, the block sizes used are M1 = max(k1 ; : : : ; kG ) and M2 = · · · = Mp = 10. This is found experimentally to be the best choice for the speci4c architecture. Tables 1 and 2 show the execution times (in seconds) of the algorithms. Three classes of models (M = 51, M = 100 and M = 400) are reported, where each regression equation is assumed to have the same number of variables and no common regressors. The elements of the exogenous matrices, the response vectors, and the Cholesky factor of the covariance matrix are generated randomly from a uniform distribution. Notice that the computational complexity of the factorization procedures does not depend on the speci4c values of the exogenous and covariance matrices. Thus, the performance of the algorithms is the same for matrices that have been generated using diQerent statistical assumptions. 
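The setup described above can be mimicked with a small test-problem generator. The sketch below is only a toy harness under stated assumptions (NumPy/SciPy, a dense baseline solver standing in for the structured algorithms); it is not the authors' benchmark code and will not reproduce the tables, but it shows how problems of the reported sizes can be generated from uniform draws and timed.

```python
# Toy harness echoing the experimental setup of Section 5 (not the original benchmark code).
import time
import numpy as np
from scipy.linalg import block_diag, qr, solve_triangular

def make_sur_problem(M, G, k, rng):
    """Random SUR data with uniform entries, as described in the text."""
    X = [rng.uniform(size=(M, k)) for _ in range(G)]
    Y = rng.uniform(size=(M, G))
    C = np.triu(rng.uniform(size=(G, G))) + G * np.eye(G)   # upper-triangular Cholesky factor
    return X, Y, C

def dense_olm_solve(X, Y, C):
    """Baseline: form (C^{-1} ⊗ I_M)(⊕_i X_i) explicitly and solve by QR, cf. (6)-(8)."""
    M, G = Y.shape
    C_inv = solve_triangular(C, np.eye(G), lower=False)
    X_bar = np.kron(C_inv, np.eye(M)) @ block_diag(*X)
    y_bar = np.ravel(Y @ C_inv.T, order='F')
    Q, R = qr(X_bar, mode='economic')
    return solve_triangular(R, Q.T @ y_bar)

rng = np.random.default_rng(42)
for M, G, k in [(51, 5, 5), (51, 10, 5), (100, 10, 5)]:
    X, Y, C = make_sur_problem(M, G, k, rng)
    t0 = time.perf_counter()
    beta = dense_olm_solve(X, Y, C)
    print(f"M={M:4d} G={G:3d} k={k:2d}  dense OLM solve: {time.perf_counter() - t0:.4f} s")
```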
Table 1 shows the performance of the algorithms when the number of equations changes, (G = 5; 10; 15; 20), while the number of variables in each equation remains 4xed at 5. Table 2 shows the execution times when G = 10 is constant and the number of variables in each regression is k (k = 5; 10; 15; 20; 40); that is, k1 = · · · = kG = k. 22 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Table 2 Execution times of solving the SUR model, where G = 10 and ki = 5; 10; 15; 20; 30 for i = 1; : : : ; G M ki OLM algorithms GLLSP algorithms LAPACK Alg. 1 LAPACK Alg. 2 Ratio Alg. 3 Alg. 4 GLLSP/OLM 51 51 51 51 51 5 10 15 20 30 0.0060 0.0450 0.0853 0.0942 0.1450 0.0069 0.0187 0.0334 0.0507 0.0898 0.4446 0.5103 0.5746 0.5736 0.7835 0.0989 0.1383 0.1688 0.1887 0.1990 0.0756 0.1013 0.1239 0.1580 0.1803 0.0556 0.0868 0.1327 0.1773 0.2610 9.26 4.64 3.71 3.50 2.91 100 100 100 100 100 5 10 15 20 30 0.0227 0.1208 0.1951 0.1979 0.3181 0.0121 0.0327 0.0726 0.0989 0.2138 2.9766 3.1151 3.4684 3.6348 4.5518 0.3384 0.4663 0.6214 0.7296 1.0656 0.2963 0.4511 0.5865 0.7224 1.0375 0.1122 0.1897 0.3085 0.4416 0.8022 9.27 5.80 4.25 4.46 3.75 400 400 400 400 400 5 10 15 20 30 0.1812 0.6433 1.0295 1.1602 1.8623 0.0499 0.1731 0.3869 0.7879 1.8808 185.3506 191.6706 198.3687 204.5498 200.3455 8.3259 13.1758 18.3041 23.9272 34.8307 6.0854 9.2547 13.0657 17.4742 25.7010 0.4527 0.7791 1.2969 2.0352 3.8604 9.07 4.50 3.35 2.71 2.07 The execution times for solving the OLM (6) using the LAPACK routine DGELS and Algorithm 1 are shown in columns 3 and 4, respectively. Columns 5 –8 give the results for Algorithms 2– 4. Speci4cally, the 5th column refers to the LAPACK routine DGGGLM, which solves the GLLSP (5) without exploiting the sparse structure of the matrices. Columns 6 –8 show the execution times for Algorithms 2– 4, respectively. The best times for solving the OLM (6) and the GLLSP (5) are underlined and are used to calculate the performance ratio in the last column. Computational results for the LAPACK routine DGGGLM and Algorithms 2–3 are not available (n/a) for the largest problems because the algorithms run out of memory. Analogous results for the estimation of the TSUR model (44) are given in Tables 3 and 4, where the execution times include the initial step of transforming the SUR model (1) to the TSUR model (44). The cost of this step is negligible compared to the overall execution time. Table 5 shows the theoretical complexities in terms of Voating point operations (Vops) of Algorithms 1– 4 and that of solving the TSUR model (44), where k = k1 = · · · = kG . A detailed derivation of these complexities can be found in Appendix B. The second column reports the approximate complexity of each algorithm for large values of M , G, and k. The third column shows the same complexities for large scale models, i.e., M k. Finally, the last column gives the number of Vops required by each algorithm to solve the reduced-sized model (44). Computations that have small marginal cost compared to the overall complexities have not been taken into account. Furthermore, the transformation (41) that derives the TSUR model has complexity 2G 2 k 2 (M − Gk=3) and has not been included. P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 23 Table 3 Execution times of solving the TSUR model (44), where k1 = · · · = kG = 5 M G OLM algorithms GLLSP algorithms LAPACK Alg. 1 LAPACK Alg. 2 Alg. 3 Alg. 
4 GLLSP/OLM 100 100 100 100 5 10 15 20 0.0007 0.0068 0.0665 0.2908 0.0012 0.0079 0.0277 0.0729 0.0055 0.3980 4.1706 25.1177 0.0049 0.0834 0.5426 2.3531 0.0034 0.0608 0.5045 2.1953 0.0050 0.0463 0.1887 0.5194 4.86 6.81 6.81 7.21 400 400 400 400 400 5 10 15 20 30 0.0015 0.0088 0.0746 0.3059 1.1623 0.0019 0.0110 0.0396 0.1010 0.4447 0.0059 0.3973 4.3138 26.2893 79.9213 0.0056 0.0866 0.5676 2.5442 23.6311 0.0040 0.0637 0.5066 2.3426 21.8987 0.0059 0.0496 0.1987 0.5566 2.3073 3.93 5.64 5.02 5.51 5.19 Table 4 Execution times of solving the TSUR model (44), where G = 10 and ki = 5; 10; 15; 20; 40 for i = 1; : : : ; G M ki OLM algorithms GLLSP algorithms LAPACK Alg. 1 LAPACK Alg. 2 Ratio Alg. 3 Alg. 4 GLLSP/OLM 100 100 5 10 0.0052 0.1088 0.0081 0.0366 0.4015 3.0925 0.0881 0.4421 0.0601 0.3805 0.0477 0.1672 9.17 4.57 400 400 400 400 400 5 10 15 20 30 0.0093 0.1236 0.3700 0.5482 1.3749 0.0111 0.0556 0.1443 0.3031 1.2734 0.4249 3.1201 10.9318 27.2676 87.4721 0.0873 0.4482 1.2860 3.5127 13.9188 0.0643 0.5142 1.2046 2.9855 11.3183 0.0491 0.1872 0.4675 0.9567 2.7661 5.28 3.37 3.24 3.16 2.17 From the theoretical and computational results a number of conclusions can be drawn: • Theoretical and experimental results con4rm that the OLM algorithms outperform the GLLSP algorithms. In theory the ratio between Algorithm 1 and Algorithms 2–3 is linear with k=M , while that of Algorithms 1 and 4 is constant. In practice, this performance diQerence decreases as the number of regressors or equations increases. • The direct use of the standard LAPACK routine DGGGLM to solve the GLLSP is not feasible for large-scale models. • The discrepancies between the theoretical and experimental results are due to the implementation overheads and memory usage. • The complexities of the OLM algorithm and that of the recursive Algorithm 4 are a linear function of the sample size. It follows that, in practice, the performances of 24 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Table 5 Complexity of Algorithms 1– 4, where k = k1 = · · · = kG Algorithm Complexity Compl. Approx. for M k Compl. for solving the TSUR model OLM Algorithms LAPACK Alg. 1 2G 3 k 2 (M − k=3) G 2 k 2 (M + 2G(M − k + 1)=3) 2G 3 k 2 M 2G 3 k 2 M=3 2G 4 k 3 2G 4 k 3 =3 4G 3 M 3 =3 2G 3 kM 2 =3 4G 6 k 3 =3 2G 5 k 3 =3 G 3 kM 2 =3 4G 3 k 2 M=3 G 5 k 3 =3 4G 4 k 3 =3 GLLSP Algorithms LAPACK G 3 (4M 3 =3 + 4M 2 k − 2k 3 =3) Alg. 2 G 2 kM (M + 2G(M − k + 1)=3) Alg. 3 G 2 kM (M + G(M − k + 2)=3) +G 2 k 2 (M + G(M − k + 1)=3) Alg. 4 G 2 k 3 + 4G 3 k 2 (M − k + 1=2)=3 these algorithms do not deteriorate when the number of observations increases and thus they can solve large-scale problems. • The algorithms for solving the TSUR model (44) outperform the corresponding algorithms for solving the initial SUR model (1). For the largest problems, the cost of transforming the SUR model to one of smaller dimensions and solving it is negligible compared to the cost of solving the original one. 6. Summary Algorithms for solving the seemingly unrelated regressions (SUR) model have been considered. The algorithms use as a basic component the QR decomposition. Initially the SUR model is transformed to an ordinary linear model (OLM). This transformation results in a regressor matrix having a block triangular structure. The best linear unbiased estimator (BLUE) of the SUR model results from the least-squares (LS) solution of the OLM. 
A computationally efficient strategy (Algorithm 1) produces the LS estimator by exploiting the sparse structure of the matrices. This strategy outperforms the LAPACK DGELS subroutine, which treats the matrices as full, when the problem is not very small.

The remaining three algorithms compute the BLUE by formulating the SUR model as a generalized linear least-squares problem (GLLSP). The solution of the GLLSP is obtained using the generalized QR decomposition (GQRD). The first method (Algorithm 2) computes the GQRD of the block-diagonal matrix of exogenous variables and the Kronecker-structured Cholesky factor of the dispersion matrix, exploiting both structures. This method is computationally more efficient than the corresponding LAPACK routine (DGGGLM) that solves the general linear model. The second method (Algorithm 3) solves the GLLSP iteratively, with each iteration solving a smaller GLLSP. The main advantage of this method is that it avoids forming the computationally expensive RQ decomposition (14). This allows Algorithm 3 to outperform Algorithm 2. Finally, a recursive estimation strategy (Algorithm 4) is proposed. This is found to be the most efficient when the model is not very small. Furthermore, this strategy requires less memory and can thus solve larger problems. Algorithms 1, 3 and 4 are new designs, while Algorithm 2 was originally proposed in Kontoghiorghes and Clarke (1995).

The algorithms are reassessed after an initial orthogonal transformation is made to reduce the SUR model to one of smaller size. The matrix of exogenous variables of the transformed (TSUR) model (44) has dimensions $GK \times K$, compared with $GM \times K$ for the original model (1). This transformation is significant for large-scale models, where the number of observations in each equation is much larger than the total number of regressors, i.e., $M \gg K$.

The solution of the SUR, and consequently the TSUR, model when the regressions have common exogenous factors is currently under investigation. In this case, $X_i = X S_i$, where $X \in \mathbb{R}^{M \times K^d}$ is the matrix of the $K^d$ distinct regressors and $S_i \in \mathbb{R}^{K^d \times k_i}$ is the selection matrix comprising the relevant columns of the $K^d \times K^d$ identity matrix. The computation of the QRD of $\tilde X = X(S_G \cdots S_1)$ produces matrices $\tilde R_i$ ($i = 1, \ldots, G$) in (45) that have a sparse structure able to be exploited by the various algorithms (Foschi and Kontoghiorghes, 2003b; Kontoghiorghes, 2000b; Kontoghiorghes and Dinenis, 1996).

Often SUR models exhibit special properties and characteristics (Foschi and Kontoghiorghes, 2003a, c; Kontoghiorghes, 2000b; Orbe et al., 2003). For the efficient solution of these models the proposed algorithms need to be modified: the structures of the matrices and their properties should be exploited. Iterative algorithms for computing the estimators of models with sparse exogenous matrices merit investigation.

Although Algorithm 1 is computationally the most efficient, it is numerically less stable than Algorithms 2–4 (Söderkvist, 1996). Algorithm 1 may provide a poor solution when $C$ is ill-conditioned, and it fails when $\Sigma$ is singular, i.e., when $C$ is not of full rank (Kontoghiorghes, 2000a; Kourouklis and Paige, 1981; Paige, 1978, 1979a). In such cases, the GLLSP approach should be used (Foschi et al., 2002; Kontoghiorghes, 2000b; Kontoghiorghes and Clarke, 1995). The numerical stability of the algorithms needs to be investigated.
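The rank-deficiency issue raised in the last paragraph can be illustrated directly. The sketch below (illustrative data, NumPy only) constructs a singular $\Sigma$, shows that a factor $\Sigma = CC^T$ still exists but cannot be inverted, and notes in comments why the GLLSP formulation (5) remains applicable; it does not implement the Paige-type solution itself.

```python
# Rank-deficient covariance: the whitening step of (4) needs C^{-1} and breaks down,
# whereas the GLLSP (5) never inverts C.  Toy data, illustrative only.
import numpy as np

rng = np.random.default_rng(3)
G = 4
A = rng.uniform(size=(G, G - 1))
Sigma = A @ A.T                                        # rank G-1 by construction: singular

# A factor Sigma = C C^T can still be formed, e.g. from the eigendecomposition,
# but it is rank-deficient and therefore cannot be inverted for the whitening step.
w, Qe = np.linalg.eigh(Sigma)
w = np.where(w > 1e-12 * w.max(), w, 0.0)              # zero out the numerically null eigenvalue
C = Qe @ np.diag(np.sqrt(w))
print(np.allclose(C @ C.T, Sigma))                     # True
print(np.linalg.matrix_rank(C))                        # G - 1 = 3: C is not of full rank

# The GLLSP (5), argmin ||V||_F subject to vec(Y) = (⊕_i X_i) beta + (C ⊗ I_M) vec(V),
# remains well defined for such a C; this is the singular case handled by the
# generalized QR / Paige-type methods cited above.
```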
The algorithms for solving the TSUR model (44) can be adapted to solve simultaneous equations models (SEMs) (Belsley, 1992; Chavas, 1982; Dhrymes, 1994; Kontoghiorghes and Dinenis, 1997; Zellner and Theil, 1962). Similarly to the SUR model (1), the SEM can be expressed as vec(Y ) = (⊕i Wi )vec({i }) + vec(U ); (50) where Wi =(Xi Yi ), i ∈ Rki +gi , and Yi ∈ RM ×gi consists of gi endogenous variables from Y , excluding yi . The endogeneity in the SEM can be eliminated by a transformation identical to that employed to derive the TSUR model (Kontoghiorghes and Dinenis, 1997). The transformed SEM can be written as vec(YR∗ ) = (⊕i Wi∗ )vec({i }) + vec(UR∗ ); (51) where Wi∗ = (Ri Yi∗ ), Yi∗ = QR∗T Yi , and YR∗ , R∗i and UR∗ are de4ned in (44). E1cient algorithms for solving the SEM are currently under investigation. 26 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Acknowledgements The authors are grateful to the four referees for their constructive comments and suggestions. Appendix A. The column- and diagonally-based methods Consider the updating QR decomposition (UQRD) A R T = ; Q B 0 (G) (G) where A; R ∈ Rk ×k are upper-triangular, B ∈ Rq Q is orthogonal of order k (G) + q(G) . Now, let A ≡ A(0) = k1 k2 ··· kG A(0) 1; 1 A(0) 1; 2 ··· A(0) 1; G 0 A(0) 2; 2 ··· A(0) 2; G .. . .. . .. .. . 0 0 ··· A(0) G; G . (G) ×k (G) (A.1) is block upper-triangular and k1 k 2; . . . kG and (0) B ≡ B = k1 k2 ··· kG B1;(0)1 B1;(0)2 ··· B1;(0)G 0 B2;(0)2 ··· B2;(0)G .. . .. . .. .. . 0 0 ··· . (0) BG; G q1 q2 ; . . . qG G G where Ai; i is upper-triangular, k (G) = i=1 ki and q(G) = i=1 qi . Two block strategies can be used to compute the UQRD (A.1). The 4rst, a diagonally-based strategy, annihilates the block-superdiagonals of B one at a time. The second, a column-based strategy, annihilates the non-zero blocks of B column-by-column (Foschi et al., 2002; Kontoghiorghes, 1999). The diagonally-based strategy computes the UQRDs (i−1) (i) Ai+j; A ki+j i+j i+j; i+j = Qi;Tj (A.2) (i−1) 0 qj Bj; i+j P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 27 and Qi;Tj (i−1) Ai+j; i+j+1:G Bj;(i−1) i+j+1:G = (i) Ai+j; i+j+1:G Bj;(i)i+j+1:G ki+j qj ; (A.3) where i = 1; : : : ; G − 1, j = 1; : : : ; G − i and (i) Ai+j; i+j+1:G Bj;(i)i+j+1:G = ki+j+1 ki+j+2 (i) Ai+j; i+j+1 (i) Ai+j; i+j+2 Bj;(i)i+j+1 Bj;(i)i+j+2 Thus, R in (A.1) is given by (0) A1; 1 A(0) ··· 1; 2 A(1) ··· 0 2; 2 R= .. .. .. . . . 0 0 ··· A(0) 1; G ··· kG ··· (i) Ai+j; G ··· Bj;(i)G ki+j : qj A(1) 2; G : .. . A(G−1) G; G Notice that the UQRDs (A.2) and (A.3) can be computed simultaneously for j = 1; : : : ; G − i. The column-based strategy computes the QRDs Ai;(i−2) Ai;(i−1) ki i i T Q̃i = (i−2) q(i−1) 0 B1:i−1; i and Q̃Ti Ai;(i−2) i+1:G (i−2) B1:i−1; i+1:G = Ai;(i−1) i+1:G (i−1) B1:i−1; i+1:G ki q(i−1) ; i−1 where i = 2; : : : ; G and q(i−1) = j=1 qj . Fig. 6 shows the annihilation patterns of the two strategies in the case where G = 4. Now, the computation of (18) and part of (36b) are equivalent to Q T C D = p(G) C̃ D̃ k (G) q(G) ; (A.4) 28 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Fig. 6. Computation of the UQRD (13a) using the diagonally- and column-based strategies, G = 4. where (0) C ≡ C = D ≡ D(0) = p1 p2 ··· pG C1;(0)1 C1;(0)2 ··· C1;(0)G 0 C2;(0)2 ··· C2;(0)G .. . .. . .. .. . 0 0 ··· (0) CG; G p1 p2 ··· pG D1;(0)1 D1;(0)2 ··· D1;(0)G 0 D2;(0)2 ··· D2;(0)G .. . .. . .. .. . 0 0 ··· . . (0) DG; G k1 k 2; . . . kG q1 q2 .. . qG P. Foschi et al. 
/ Computational Statistics & Data Analysis 44 (2003) 3 – 35 29 G and p(G) = i=1 pi . If the diagonally-based strategy is used, then (A.4) is equivalent to computing Qi;Tj (i−1) Ci+j; j (i−1) Ci+j; j+1 ··· (i−1) Ci+j; G Dj;(i−1) j Dj;(i−1) j+1 ··· Dj;(i−1) G = pj pj+1 ··· pG (i) Ci+j; j (i) Ci+j; j+1 ··· (i) Ci+j; G Dj;(i)j Dj;(i)j+1 ··· Dj;(i)G ki+j ; qj (A.5) where i = 1; : : : ; G − 1 and j = 1; : : : ; G − i. Notice that, the upper triangular structure of D(0) is preserved throughout the computation. Now, if a column-based algorithm is used, then (A.4) is computed as Q̃Ti Ci;(i−2) 1 Ci;(i−2) 2 (i−2) D1:i−1; 1 (i−2) D1:i−1; 2 ··· Ci;(i−2) 2 ··· (i−2) D1:i−1; G = p1 p2 Ci;(i−1) 1 Ci;(i−1) 2 (i−1) D1:i−1; 1 (i−1) D1:i−1; 2 ··· pG ··· Ci;(i−1) 2 ··· (i−1) D1:i−1; G ki : q(i−1) Using this strategy, the block upper-triangular structure of D(0) is destroyed. This can (i−2) be avoided if at the ith step the blocks of B1:i−1; j are annihilated one at a time and from bottom to top. Appendix B. Complexity analysis The theoretical complexities of the algorithms in terms of number of Vops (Voating point operations) are derived in line with (Golub and Van Loan, 1996). Initially the computational costs of the main factorizations are calculated. These are then used to determine the complexity of Algorithms 1– 4. For simplicity the complexities are approximated for large values of G, M and k, where it is assumed that k =k1 =· · ·=kG . B.1. Main factorizations The number of Vops required to compute the Cholesky factorization of an n × n symmetric and positive de4nite matrix is given by n3 =3 (Golub and Van Loan, 1996). The complexities of computing the QRD of an m × n matrix using Householder transformations and that of applying the same orthogonal transformation to an m-element 1 2 vector are given by TQR (m; n) = 2n2 (m − n=3) and TQR (m; n) = 2n(2m − n + 1), respectively (Golub and Van Loan, 1996, pp. 224 –225). Analogously, the complexity of computing the UQRD A R n T = ; (B.1) Q B 0 m−n 30 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 1 is TUQR (m; n) = 2n2 (m − n + 1), where Q ∈ Rm×m is orthogonal, R and A are upper triangular of order n and B ∈ R(m−n)×n . The Vops required to apply QT to a vector are 2 (m; n)=4n(m−n+1). Notice that the RQD and URQD have the same complexities TUQR as those of the QRD and UQRD, respectively. Now, consider the computation of the UQRD (A.1) and (A.4) using the diagonallybased strategy. To simplify the analysis, let assume that ki = k, pi = p and qi = q, for i = 1; : : : ; G. Thus, the Vops required to compute the UQRD (A.2), (A.3) and (A.5) are given, respectively, by 1 TUQR (k; p) = 2k 2 (p + 1); 2 ((G − i − j)k + 1)TUQR (k; p) = 4(1 + (G − i − j)k)k(p + 1) and 2 (G − i − j + 1)qTUQR (k; p) = 4(G − i − j + 1)k(p + 1)q: The complexities of the diagonally-based methods are given by Tdiag (G; k; p; q) = G−1 G−i 2 ((G − i − j)k + 1 + (G − i − j + 1)q)TUQR (k; p) i=1 j=1 1 (k; p) + TUQR = G(G + 1)k(p + 1)(2(k + q)(G + 1) − 3k + 6)=3 2G(G + 1)2 k(p + 1)(k + q)=3: Now, if the 4rst block diagonal of B(0) is already zero, then the latter becomes ∗ Tdiag (G; k; p; q) = G−1 G−i 2 ((G − i − j)k + 1 + (G − i − j + 1)q)TUQR (k; p) i=2 j=1 1 (k; p) + TUQR = (G − 1)(G − 2)k(p + 1)(2G(k + q) − 3k + 6)=3 2G(G − 1)(G − 2)k(p + 1)(k + q)=3: B.2. Algorithm 1 The complexity of Algorithm 1 is dominated by that of steps 7 and 8. Speci4cally, the complexity of step 7 is given by G−1 i=1 2 2(G − i)kTQR (M; k) = G(G − 1)k 2 (2M − k + 1) P. Foschi et al. 
/ Computational Statistics & Data Analysis 44 (2003) 3 – 35 31 Table 6 Complexity of each step of Algorithm 1 Step Complexity 1 2 4 5 7 8 9 G 3 =3 MG 2 2k 2 (M − k=3) 2k(2M − k + 1) G(G − 1)k 2 (2M − k + 1) 2(G − 1)(G − 2)(G − 5=2)k 2 (M − k + 1)=3 G2 k 2 and that of step 8, i.e., computing the UQRD (13a) and (13b), by Tdiag (G − 1; k; M − k; 0) = (G − 1)(G − 2)k(M − k + 1)((2G − 5)k + 6)=3 5 k 2 (M − k + 1)=3: 2(G − 1)(G − 2) G − 2 Thus, the number of Vops required by Algorithm 1 is approximately T1 (G; M; k) (G − 1)k 2 (GM + 2(M − k + 1)(G 2 − 3G + 5)=3) G 2 k 2 (M + 2G(M − k + 1)=3): Table 6 reports the complexity of each step of Algorithm 1. B.3. Algorithm 2 The complexity of Algorithm 2 is approximately that of steps 6 –8. The Vops required by step 6 are 2 G(G − 1)MTQR (M; k)=2 = G(G − 1)kM (2M − k + 1): Notice that steps 7 and 8 can be computed using an adaptation of the diagonally-based (0) algorithm. Furthermore, since the diagonal blocks of W̃BA are zero, the Vops of these steps are ∗ Tdiag (G; k; M − k; M − k) = G(G − 1)k(M − k + 1)(2GM − 3k + 6)=3 2G 2 (G − 1)k(M − k + 1)M=3: Therefore, the complexity of Algorithm 1 is T2 (G; M; k) G(G − 1)kM (M + 2(G + 3=2)(M − k + 1)=3) G 2 kM (M + 2G(M − k + 1)=3): 32 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 B.4. Algorithm 3 The complexity of the interleaving Algorithm 3 is determined by that of steps 6, 7 and 9. Step 6 applies QiT from the left of an M × ks matrix and Qi from the right of an M (G − s − 1) × M matrix. The complexity of this step is thus 2 (M; k)(ks + M (G − s − 1)) = 2k(2M − k + 1)(M (G − 1) − s(M − k)): TQR For all the iterations s = 0; : : : ; G − 1 the complexity is evaluated to G−1 2k(2M − k + 1)(M (G − 1) − s(M − k)) = G(G − 1)k(M + k)(2M − k + 1) s=0 G 2 k(M + k)(2M − k + 1): Now, the complexity of step 7 is given by that of the URQDs (28) and of (25b), i.e., s 1 2 (TUQR (M; k) + (k(s − i + 1) + M (G − s + 1))TUQR (M; k)) i=1 = 2k(M − k + 1)(ks(s + 4) + 2Ms(G − s + 1)): Therefore, for all the iterations s = 0; : : : ; G − 1 the complexity becomes G−1 2k(M − k + 1)(ks(s + 4) + 2Ms(G − s + 1)) s=0 = G(G + 1)k(M − k + 1)(G(k + M ) + 13k + 4M )=3 G 3 k(M + k)(M − k + 1)=3: Step 9 consists of multiplying an M (G − s − 1) × sk matrix with a vector using 2M (G − s − 1)sk Vops. For all the iterations s = 0; : : : ; G − 1 the number of Vops required is G−1 2Mks(G − s − 1) = G(G − 1)(G − 2)kM=3 G 3 kM=3: s=0 Thus, the total complexity of steps 6, 7 and 9, and thus, of Algorithm 3, is given by T3 (G; M; k) G 2 kM (M + G(M − k + 2)=3) + G 2 k 2 (M + G(M − k + 1)=3): B.5. Algorithm 4 For the complexity analysis of Algorithm 4 it is assumed that M1 = k and that Ms = 0 = (M − k)=(p − 1), s = 2; : : : ; p. Under these assumptions the complexity of step 2 is approximately T2 (G; k; k) = G 2 k 3 + 2=3G 3 k 2 . Now, the complexities of computing P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 33 the UQRD (36a), the transformation (s−1) 0 WAA T Q(s) 0 C ⊗ I Ms and the RQD (36b) are, respectively, 1 GTUQR (k + 0; k) = 2Gk 2 (0 + 1); 2 G(G − 1)(k + 0)TUQR (k + 0; k)=2 = 2G(G − 1)k(0 + 1)(k + 0) 2G 2 k(0 + 1)(k + 0) and Tdiag (G; k; 0; 0) = G(G + 1)k(0 + 1)(2(k + 0)(G + 1)=3 − k + 2) 2G 3 k(0 + 1)(k + 0)=3: It follows that the complexity of step 6 is dominated by that of computing the RQD (36b), i.e. Tdiag (G; k; 0; 0). Finally the complexities of steps 7 and 8 are, respectively, 03 =3 and 2G 2 k0, which are marginal respect to that of step 6. 
Thus, the complexity of Algorithm 4 is given by T4 (G; M; k; 0) G 2 k 3 + 2G 3 k 2 =3 + 2G 3 k(0 + 1)(k + 0)(M − k)=(30): For 0 = 1, this reduces to T4 (G; M; k; 1) G 2 k 3 + 2G 3 k 2 (2M − 2k + 1)=3: Notice that, if M k, then T4 (G; M; k) = 4G 3 k 2 (k + 1)M=3, which is a linear function of the sample size. References Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorenson, D., 1992. LAPACK Users’ Guide. SIAM, Philadelphia. Belsley, D.A., 1992. Paring 3SLS calculations down to manageable proportions. Computer Science in Economics and Management 5, 157–169. X 1996. Numerical Methods for Least Squares Problems. SIAM, Philadelphia. BjNorck, A., Bolstad, W.M., 1987. An estimation of seemingly unrelated regression-model with contemporaneous covariances based on an e1cient recursive algorithm. Comm. Statist. Simulation Comput. 16 (3), 689– 698. Chavas, J.-P., 1982. Recursive estimation of simultaneous equation models. J. Econometrics 18, 207–217. Dhrymes, P.J., 1994. Topics in Advanced Econometrics. In: Linear and Nonlinear Simultaneous Equations, Vol. 2. Springer, New York. Foschi, P., Kontoghiorghes, E.J., 2002. Estimation of seemingly unrelated regression models with unequal size of observations: computational aspects. Comput. Statist. Data Analysis 41, 211–229. 34 P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 Foschi, P., Kontoghiorghes, E.J., 2003a. Estimating SUR models with orthogonal regressors: computational aspects. Linear Algebra and Application, in press. Foschi, P., Kontoghiorghes, E.J., 2003b. Estimation of VAR models: computational aspects. Comput. Economics 21, 3–22. Foschi, P., Kontoghiorghes, E.J., 2003c. Estimating seemingly unrelated regression models with vector autoregressive disturbances. J. Economic Dynamics Control, in press. Foschi, P., Garin, L., Kontoghiorghes, E.J., 2002. Numerical and computational methods for solving SUR models. In: Kontoghiorghes, E.J., Rustem, B., Siokos, S. (Eds.), Computational Methods in Decision-Making, Economics and Finance, Applied Optimization. Kluwer Academic Publishers, Dordrecht, pp. 405 – 427. Golub, G.H., Van Loan, C.F., 1996. Matrix Computations, 3rd Edition. Johns Hopkins University Press, Baltimore, MD. Kontoghiorghes, E.J., 1999. Parallel strategies for computing the orthogonal factorizations used in the estimation of econometric models. Algorithmica 25, 58–74. Kontoghiorghes, E.J., 2000. Inconsistencies and redundancies in SURE models: computational aspects. Comput. Economics 16 (1+2), 63–70. Kontoghiorghes, E.J., 2000a. Parallel Algorithms for Linear Models: Numerical Methods and Estimation Problems. In: Advances in Computational Economics, Vol. 15. Kluwer Academic Publishers, Boston, MA. Kontoghiorghes, E.J., 2000b. Parallel strategies for solving SURE models with variance inequalities and positivity of correlations constraints. Comput. Economics 15 (1+2), 89–106. Kontoghiorghes, E.J., 2003. Computational methods for modifying seemingly unrelated regressions models. J. Comput. Appl. Math., forthcoming. Kontoghiorghes, E.J., Clarke, M.R.B., 1995. An alternative approach for the numerical solution of seemingly unrelated regression equations models. Comput. Statist. Data Anal. 19 (4), 369–377. Kontoghiorghes, E.J., Dinenis, E., 1996. Solving triangular seemingly unrelated regression equations models on massively parallel systems. In: Gilli, M. 
(Ed.), Computational Economic Systems: Models, Methods & Econometrics, Advances in Computational Economics, Vol. 5. Kluwer Academic Publishers, Dordrecht, pp. 191–201. Kontoghiorghes, E.J., Dinenis, E., 1997. Computing 3SLS solutions of simultaneous equation models with a possible singular variance-covariance matrix. Comput. Economics 10, 231–250. Kourouklis, S., Paige, C.C., 1981. A constrained least squares approach to the general Gauss–Markov linear model. J. Amer. Statist. Assoc. 76 (375), 620–625. Lawson, C.L., Hanson, R.J., 1974. Solving Least Squares Problems. Prentice-Hall, Englewood CliQs, NJ. Orbe, S., Ferreira, E., Rodriguez-Poo, J., 2003. An algorithm to estimate time varying parameter sur models under diQerent type of restrictions. Comput. Statist. Data Anal. 42, 363–383. Paige, C.C., 1978. Numerically stable computations for general univariate linear models. Comm. Statist. Simulation Comput. 7 (5), 437–453. Paige, C.C., 1979a. Computer solution and perturbation analysis of generalized linear least squares problems. Math. Comput. 33 (145), 171–183. Paige, C.C., 1979b. Fast numerically stable computations for generalized linear least squares problems. SIAM J. Numer. Anal. 16 (1), 165–171. Paige, C.C., 1990. Some aspects of generalized QR factorizations. In: Cox, M.G., Hammarling, S.J. (Eds.), Reliable Numerical Computation. Clarendon Press, Oxford, UK, pp. 71–91. Pollock, D.S.G., 1979. The Algebra of Econometrics (Wiley Series in Probability and Mathematical Statistics). Wiley, New York. Regalia, P.A., Mitra, S.K., 1989. Kronecker products, unitary matrices and signal processing applications. SIAM Rev. 31 (4), 586–613. SNoderkvist, I., 1996. On algorithms for generalized least-squares problems with ill-conditioned covariance matrices. Comput. Statist. 11 (3), 303–313. Srivastava, V.K., Dwivedi, T.D., 1979. Estimation of seemingly unrelated regression equations models: a brief survey. J. Econometrics 10, 15–32. Srivastava, V.K., Giles, D.E.A., 1987. Seemingly Unrelated Regression Equations Models: Estimation and Inference (Statistics: Textbooks and Monographs), Vol. 80. Marcel Dekker, New York. P. Foschi et al. / Computational Statistics & Data Analysis 44 (2003) 3 – 35 35 Telser, L.G., 1964. Iterative estimation of a set of linear regression equations. J. Amer. Statist. Assoc. 59, 845–862. Zellner, A., 1962. An e1cient method of estimating seemingly unrelated regression equations and tests for aggregation bias. J. Amer. Statist. Assoc. 57, 348–368. Zellner, A., 1963. Estimators for seemingly unrelated regression equations: some exact 4nite sample results. J. Amer. Statist. Assoc. 58, 977–992. Zellner, A., Theil, H., 1962. Three-stage least squares: simultaneous estimation of simultaneous equations. Econometrica 30 (1), 54–78.