Completing spatially dependent data: The Chow-Lin method for NUTS data
W. Polasek
IHS Wien and UAM

Contents
• Chow and Lin (1971) approach
• Model-dependent missing data:
  – LHS missing (left-hand side)
  – RHS missing (right-hand side)
• Overall solution:
  – LHS: prediction problem
  – RHS: interpolation, parameter estimation

The Problem
• European regions: a cross-section of NUTS 1 data, completely observed as an (n × 1) vector.
• We are interested in the (N × 1) NUTS 2 vector, of length N > n.

Chow-Lin method for time series
• Problem: quarterly data y are needed, but only annual data z are observed. Other quarterly indicators X (T × K) are available.
• Disaggregate model: $y = X\beta + \varepsilon,\ \varepsilon \sim N(0, \sigma^2\Omega)$.
• Define the aggregation matrix $C = I_n \otimes (1,1,1,1)$, so that T = 4n.
• Estimate β in the aggregated model using $y_a = Cy$ (the annual data z) and $X_a = CX$.

The Chow-Lin procedure
1. Set up the disaggregate model.
2. Aggregate with the C matrix.
3. Estimate b.
4. Forecast in the disaggregate model using b and the known indicators X_1, …, X_K.

Completing quarterly data
• GLS estimate: $b = [X_a'(C\Omega C')^{-1}X_a]^{-1}X_a'(C\Omega C')^{-1}y_a$
• The quarterly predictions are BLUE: $\hat y = Xb + \Omega C'(C\Omega C')^{-1}(y_a - X_a b)$.
• The covariance matrix of the prediction errors is $\operatorname{Var}(\hat y - y) = \sigma^2\{P\Omega + PX[X'C'(C\Omega C')^{-1}CX]^{-1}X'P'\}$, with $P = I - \Omega C'(C\Omega C')^{-1}C$.
• Multiplying by C gives aggregation-consistent predictions: $C\hat y = y_a$.

Disaggregated model
• That is, a linear model can be set up for the disaggregated data vector: $y = X\beta + \varepsilon,\ \varepsilon \sim N(0, \sigma^2 I_N)$.
• X is an N × p completely observed regression matrix (known indicators).
• y is an incomplete N × 1 vector.

Reduced Form (RF)
• The disaggregated SAR model is $y = \rho Wy + X\beta + \varepsilon,\ \varepsilon \sim N(0, \sigma^2 I_N)$.
• W is a neighbourhood matrix (see Anselin 1988).
• Equivalently, $Ry = X\beta + \varepsilon$ with $R = I - \rho W$.
• The reduced form is $y = R^{-1}X\beta + u,\ u \sim N(0, \sigma^2(R'R)^{-1})$.

Data generating process
• The stochastic model is $y \sim N(X(\rho)\beta, \sigma^2\Omega(\rho))$ with $X(\rho) = R^{-1}X$ and $\Omega(\rho) = (R'R)^{-1}$.
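The four-step Chow-Lin procedure and the BLUE formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `chow_lin`, the trend indicator and the simulated data are hypothetical, and the usage example assumes spherical quarterly errors (Ω = I).

```python
import numpy as np


def chow_lin(y_a, X, C, Omega):
    """GLS estimate b and BLUE prediction y_hat of the disaggregated series.

    y_a   : (n,)   observed aggregates (e.g. annual totals), y_a = C y
    X     : (T, K) disaggregated indicators (e.g. quarterly)
    C     : (n, T) aggregation matrix
    Omega : (T, T) covariance of the disaggregated errors, up to sigma^2
    """
    X_a = C @ X                                  # aggregated regressors X_a = C X
    V_inv = np.linalg.inv(C @ Omega @ C.T)       # (C Omega C')^{-1}
    b = np.linalg.solve(X_a.T @ V_inv @ X_a, X_a.T @ V_inv @ y_a)
    # Fitted values plus the aggregate residuals distributed by Omega C'(C Omega C')^{-1}
    y_hat = X @ b + Omega @ C.T @ V_inv @ (y_a - X_a @ b)
    return b, y_hat


# Hypothetical example: 6 years of annual totals, constant and linear trend as quarterly indicators
rng = np.random.default_rng(0)
n_years = 6
T = 4 * n_years
C = np.kron(np.eye(n_years), np.ones((1, 4)))    # C = I_n kron (1,1,1,1)
X = np.column_stack([np.ones(T), np.arange(T, dtype=float)])
y_true = X @ np.array([2.0, 0.5]) + rng.normal(scale=0.3, size=T)
y_a = C @ y_true                                 # only the annual totals are "observed"

b, y_hat = chow_lin(y_a, X, C, np.eye(T))
print(np.allclose(C @ y_hat, y_a))               # True: the quarters add up to the annual totals
```

With Ω = I the GLS step reduces to OLS on the aggregated data, because CC' = 4I_n is a scalar multiple of the identity.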
Aggregation matrix C
• C aggregates NUTS 2 to NUTS 1 regions.
• C is an (n × N) matrix of 0's and 1's.
• It is block diagonal in row vectors of ones; since the NUTS 1 regions contain different numbers of NUTS 2 regions, the blocks have unequal lengths:
  $C = \operatorname{diag}(1'_{k_1}, 1'_{k_2}, \dots, 1'_{k_n})$
• k_i: number of NUTS 2 cells that add up to the NUTS 1 cell in row i (i = 1, …, n).

Auxiliary regression
• Multiplying the disaggregate reduced form by the aggregation matrix C gives the aggregate relationship $Cy = CR^{-1}X\beta + Cu,\ u \sim N(0, \sigma^2\Omega(\rho))$,
• where Cu are new disturbances with covariance matrix $\operatorname{Var}(Cu) = \sigma^2 C\Omega(\rho)C'$.

Conditional GLS
• $y_a = Cy$ is the aggregated, completely observed NUTS 1 vector, and
• $X_C(\rho) = CR^{-1}X$ is, conditional on ρ, a completely known (n × p) regressor matrix.
• GLS estimate of the auxiliary model:
  $b_\rho = [X_C(\rho)'(C\Omega(\rho)C')^{-1}X_C(\rho)]^{-1}X_C(\rho)'(C\Omega(\rho)C')^{-1}y_a$

ML estimate of ρ
• Use the 2-step ML method of Anselin (1988). Define the aggregated OLS estimators $b_o = (X_a'X_a)^{-1}X_a'y_a$ and $b_L = (X_a'X_a)^{-1}X_a'Wy_a$.
• Since $b_{ML} = b_o - \rho b_L$, we estimate ρ by minimizing the ML variance $\sigma^2_{ML} = (e_o - \rho e_L)'(e_o - \rho e_L)/n$,
• with $e_o = y_a - X_a b_o$ and $e_L = Wy_a - X_a b_L$.

ML estimates for SAR
  $\hat\beta = (X'X)^{-1}X'(I - \rho W)y$
  $\hat\sigma^2 = y'(I - \rho W)'[I - X(X'X)^{-1}X'](I - \rho W)y/n$
• The ML approach can be extended to systems.

Prediction of incompletes
• Following Chow and Lin (1971) and Goldberger (1962), the best linear unbiased predictor of y using the indicators $X^*(\rho) = R^{-1}X$ is
  $\hat y = X^*(\rho)b_\rho + \Omega(\rho)C'(C\Omega(\rho)C')^{-1}(y_a - CX^*(\rho)b_\rho)$.
• The predictor of y therefore decomposes into two components.

Interpretation
• The first component is the conditional expectation of y given X in the spatial model. The second is a correction that distributes the aggregated residuals via the GLS projection matrix.
• For ρ = 0 the spatial Chow-Lin method reduces to the simple Chow-Lin method with spherical errors.
• Multiplying by C gives back the aggregation-consistent values of the aggregated model, since $C\hat y = y_a$.
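Putting the aggregation matrix, the conditional GLS estimate and the predictor together, a minimal NumPy sketch for a fixed ρ might look as follows. The helper names, the region counts k, the ring-shaped neighbourhood matrix and the simulated data are all hypothetical, and ρ is taken as known rather than estimated.

```python
import numpy as np


def aggregation_matrix(k):
    """C = diag(1'_{k_1}, ..., 1'_{k_n}): row i sums the k_i fine cells of aggregate i."""
    C = np.zeros((len(k), sum(k)))
    start = 0
    for i, k_i in enumerate(k):
        C[i, start:start + k_i] = 1.0
        start += k_i
    return C


def spatial_chow_lin(y_a, X, W, C, rho):
    """Conditional (on rho) GLS estimate b_rho and predictor y_hat in the SAR Chow-Lin model."""
    N = X.shape[0]
    R_inv = np.linalg.inv(np.eye(N) - rho * W)   # R^{-1} with R = I - rho W
    X_star = R_inv @ X                           # X*(rho) = R^{-1} X
    Omega = R_inv @ R_inv.T                      # Omega(rho) = (R'R)^{-1}
    X_C = C @ X_star                             # X_C(rho) = C R^{-1} X
    V_inv = np.linalg.inv(C @ Omega @ C.T)       # (C Omega(rho) C')^{-1}
    b = np.linalg.solve(X_C.T @ V_inv @ X_C, X_C.T @ V_inv @ y_a)
    y_hat = X_star @ b + Omega @ C.T @ V_inv @ (y_a - X_C @ b)
    return b, y_hat


# Hypothetical set-up: 4 aggregate regions made of 3, 2, 4 and 3 fine regions
rng = np.random.default_rng(1)
k = [3, 2, 4, 3]
C = aggregation_matrix(k)
N = C.shape[1]
W = np.zeros((N, N))                             # row-standardised "ring" contiguity
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y_a = C @ rng.normal(loc=5.0, size=N)            # observed aggregates

b, y_hat = spatial_chow_lin(y_a, X, W, C, rho=0.3)
print(np.allclose(C @ y_hat, y_a))               # True: aggregation consistency
```

Setting `rho=0.0` makes Ω(ρ) the identity, so the function then returns the simple Chow-Lin estimate with spherical errors, as stated on the Interpretation slide.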
More spatial dependence
• Consider an extended regressor matrix [X : WX], where WX is interpreted as "potentials" of the variables in X.
• For sectoral data that have to be disaggregated by region, the same approach can be used.
• Suppose there are s = 1, …, S sectors; then
  – y_s is the GDP of sector s in the disaggregated units,
  – Cy_s is the GDP of sector s aggregated across regions.
• "Structural" zeros: we know that a certain sector does not produce in a certain region.

0-restrictions
• We construct a 0-structured regressor matrix $X_s^0$ and then do the prediction simply with these special sectoral indicators:
  $\hat y_s = X_s^0(\rho)b_{s,\rho} + \Omega(\rho)C'(C\Omega(\rho)C')^{-1}(Cy_s - CX_s^0(\rho)b_{s,\rho}),\quad s = 1, \dots, S,$
• inserting $X_s^0(\rho) = R^{-1}X_s^0$ into the prediction.

Example: NUTS 2 → 3 Spanish data
 1 Andalucía             10 C. Valenciana
 2 Aragón                11 Extremadura
 3 Asturias (P. de)      12 Galicia
 4 Balears (Illes)       13 Madrid (C. de)
 5 Canarias              14 Murcia (R. de)
 6 Cantabria             15 Navarra (C.F. de)
 7 Castilla y León       16 País Vasco
 8 Castilla-La Mancha    17 Rioja (La)
 9 Cataluña              18 Ceuta y Melilla

Bayesian SAR estimation (heteroscedastic model)
Variable    Coefficient  Std. dev.  p-level
cons              0.212     0.1746    0.095
log_km           -0.132     0.0847    0.062
log_pop           0.385     0.3712    0.140
log_stock         0.116     0.2994    0.326
log_exp          -0.274     0.1290    0.022
log_imp           0.305     0.1233    0.011
log_access       -0.171     0.1764    0.154
log_trucks       -0.277     0.2275    0.100
log_banks         0.815     0.2858    0.005
rho              -0.339     0.2174    0.062

SAR diagnostics
[Figure: SAR heteroscedastic Gibbs; panels: Actual vs. Predicted, Residuals, Mean of V_i draws, Posterior density for rho]

Figure 1: Comparison of GDP NUTS 2 (blue) vs. GDP Chow-Lin (aggregated, green)

Figure 2: Comparison of Chow-Lin forecast (blue) vs. disaggregated real GDP data (green)

Figure 3: Ratio between Chow-Lin and real GDP data (CL divided by real GDP)

Figure 4: Scatter plot of Chow-Lin vs. real data

Table 1: NUTS 3 forecast comparison: Chow-Lin GDP (CL) vs. real GDP (Ratio = CL / real data)
No.  Name                           CL  Real data  Ratio
  1  Almería                    7.8013    10.6707   0.73
  2  Cadiz                     16.4397    17.6041   0.93
  3  Córdoba                    9.5304    10.3084   0.92
  4  Granada                     12.71    11.6673   1.09
  5  Huelva                    10.0743      7.625   1.32
  6  Jaén                       8.2321     8.5422   0.96
  7  Málaga                    23.6132    21.6097   1.09
  8  Sevilla                   27.2495    27.6232   0.99
  9  Huesca                      4.209     4.2463   0.99
 10  Teruel                     3.2966     2.7713   1.19
 11  Zaragoza                  18.4121    18.9001   0.97
 12  Asturias                  17.9748    17.9748   1.00
 13  Illes Balears             20.9948    20.9948   1.00
 14  Las Palmas                16.9725    18.4606   0.92
 15  Santa Cruz De Tenerife    17.2117    15.7236   1.09
 16  Cantabria                 10.4754    10.4754   1.00
 17  Avila                      2.4686     2.5753   0.96
 18  Burgos                     8.1407     7.8184   1.04
 19  León                       6.1681     8.0467   0.77
 20  Palencia                   1.6984     3.2938   0.52
 21  Salamanca                  6.6952     5.7291   1.17
 22  Segovia                    2.9738     3.0022   0.99
 23  Soria                      1.9487     1.8029   1.08
 24  Valladolid                12.1549    10.2906   1.18
 25  Zamora                     3.3202     3.0096   1.10
 26  Albacete                   4.8839     5.4844   0.89
 27  Ciudad Real                 4.491     7.9319   0.57
 28  Cuenca                     3.0616     3.1154   0.98
 29  Guadalajara                6.1575     3.2948   1.87
 30  Toledo                     9.9525     8.7199   1.14
 31  Barcelona                126.9866   117.2997   1.08
 32  Gerona                     9.8092    14.9833   0.65
 33  Lérida                     6.6029     9.2576   0.71
 34  Tarragona                 14.6223    16.4805   0.89
 35  Alicante                  26.9904    28.2353   0.96
 36  Castellón de la Plana      8.6169    10.6662   0.81
 37  Valencia                  46.2183    42.9242   1.08
 38  Badajoz                      8.51     8.3984   1.01
 39  Cáceres                    5.4319     5.5435   0.98
 40  La Coruña                 19.3657    18.3762   1.05
 41  Lugo                       5.5453     5.1973   1.07
 42  Orense                     4.7117      4.756   0.99
 43  Pontevedra                13.2958    14.5892   0.91
 44  Madrid                    148.688    148.688   1.00
 45  Murcia                    21.2489    21.2489   1.00
 46  Navarra                   14.2451    14.2451   1.00
 47  Álava                      6.1811     7.9923   0.77
 48  Guipúzcoa                 13.9635    17.0123   0.82
 49  Vizcaya                   31.5154    26.6552   1.18
 50  La Rioja                   6.2183     6.2183   1.00
 51  Ceuta (ES)                 0.6823       1.26   0.54
 52  Melilla (ES)               1.7268     1.1492   1.50
     Total                    280.1409   284.1824   0.99

Part 2: Flow Models

Completing cross-sectional flow data
• We now look at the simplest case of flow data that can be modelled at an aggregate level (NUTS 1), from which we want to estimate the flows at a disaggregate level (NUTS 2).
• Consider the 2 × 2 aggregated flow matrix from 2 regions into 2 regions (e.g. at NUTS 1 level):

Disaggregated and aggregated flows
Aggregated: $Y = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$
Disaggregated: $Y = \begin{pmatrix} a_{11} & a_{12} & c_{11} & c_{12} \\ a_{21} & a_{22} & c_{21} & c_{22} \\ b_{11} & b_{12} & d_{11} & d_{12} \\ b_{21} & b_{22} & d_{21} & d_{22} \end{pmatrix} = \begin{pmatrix} A & C \\ B & D \end{pmatrix}$

2 × 2 flow example
• Assume that each region consists of 2 subregions,
• so that we would like to know the flows between 4 disaggregated origin regions and 4 disaggregated destination regions.

Aggregated spatial model
• The aggregated model can be written via the vectorisation of the flow matrix Y, i.e.
  $y = \operatorname{vec} Y = (a, b, c, d)'$, in the same way as before:
  $y = \rho(W_1, W_2)y + X\beta + \varepsilon$.

Spatial lag polynomial for flows
ρ(W1, W2)y is a spatial lag polynomial that is applicable to flow models (see LeSage and Pace 2005):
  $\rho(W_1, W_2)y = \rho_1(W_1 \otimes I_{n_2})y + \rho_2(I_{n_1} \otimes W_2)y + \rho_3(W_1 \otimes W_2)y$.

4 sub-models
• Because all 4 sub-models are again flow models, we can write 4 auxiliary disaggregated regressions:
  $y_a = \operatorname{vec} A = \rho_a(W_{1a}, W_{2a})y_a + X_a\beta_a + \varepsilon_a$,
  $y_b = \operatorname{vec} B = \rho_b(W_{1b}, W_{2b})y_b + X_b\beta_b + \varepsilon_b$,
  $y_c = \operatorname{vec} C = \rho_c(W_{1c}, W_{2c})y_c + X_c\beta_c + \varepsilon_c$,
  $y_d = \operatorname{vec} D = \rho_d(W_{1d}, W_{2d})y_d + X_d\beta_d + \varepsilon_d$.

The aggregated system
• 4 equations:
  $C_a y_a = C_a R_a^{-1}X_a\beta_a + C_a R_a^{-1}\varepsilon_a$,
  $C_b y_b = C_b R_b^{-1}X_b\beta_b + C_b R_b^{-1}\varepsilon_b$,
  $C_c y_c = C_c R_c^{-1}X_c\beta_c + C_c R_c^{-1}\varepsilon_c$,
  $C_d y_d = C_d R_d^{-1}X_d\beta_d + C_d R_d^{-1}\varepsilon_d$.

Pooled estimation
• Simplifying assumption: ρ = ρ_a = ρ_b = ρ_c = ρ_d and β = β_a = β_b = β_c = β_d, with the block-diagonal matrices C = diag(C_a, C_b, C_c, C_d),
• D_X = diag(X_a, X_b, X_c, X_d) and
• D_ε = diag(ε_a, ε_b, ε_c, ε_d).

System estimation
• The auxiliary equation $Cy = CR^{-1}D_X\beta + CR^{-1}D_\varepsilon$, with $R = I - \rho(W_1, W_2)$, can be estimated by GLS with the transformed regressors $X_C(\rho) = CR^{-1}D_X$ and $\Omega(\rho) = (R'R)^{-1}$,
• with $R = \operatorname{diag}(R_a, \dots, R_d)$ and $R_j = I - \rho_j(W_1, W_2)$.

GLS estimate
• Each component is block diagonal, e.g. $\operatorname{diag}(W_{1a} \otimes I_2, \dots, W_{1d} \otimes I_2)$,
• which gives the GLS estimate
  $b_\rho = [X_C(\rho)'(C\Omega(\rho)C')^{-1}X_C(\rho)]^{-1}X_C(\rho)'(C\Omega(\rho)C')^{-1}Cy$.

Minimizer of ρ
• ML estimate by minimizing the 2-step Anselin (1988) criterion for each block separately:
  – $\sigma^2_{ML,i} = (e_{o,i} - \rho e_{L,i})'(e_{o,i} - \rho e_{L,i})/n$, for i = a, b, c, d,
  – with the ordinary and the W-lag residuals e_o and e_L.

Cross-country spillovers
• Extend the neighbourhood matrices across country borders.
• Define these "cross-country" neighbourhood matrices either as a contiguity matrix, indicating neighbours by 0's and 1's,
• or as a distance-based W.
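The spatial lag polynomial for flows is a sum of Kronecker products, so it is easy to build explicitly for small examples. The sketch below rests on our own assumptions: column-major vectorisation of the flow matrix (so that $(A \otimes B)\operatorname{vec}Y = \operatorname{vec}(BYA')$), small hypothetical neighbourhood matrices and arbitrary ρ values; which dimension of Y plays the origin or destination role depends on that vec convention. The function names are hypothetical.

```python
import numpy as np


def flow_lag_polynomial(W1, W2, rho):
    """rho(W1, W2) = rho1 (W1 kron I_n2) + rho2 (I_n1 kron W2) + rho3 (W1 kron W2)."""
    n1, n2 = W1.shape[0], W2.shape[0]
    rho1, rho2, rho3 = rho
    return (rho1 * np.kron(W1, np.eye(n2))
            + rho2 * np.kron(np.eye(n1), W2)
            + rho3 * np.kron(W1, W2))


def spread_matrix(W1, W2, rho):
    """R = I - rho(W1, W2) for the flow SAR model."""
    P = flow_lag_polynomial(W1, W2, rho)
    return np.eye(P.shape[0]) - P


# Hypothetical row-standardised neighbourhood matrices: 3 regions, 2 regions
W1 = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])
W2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
rho = (0.2, 0.3, 0.1)

rng = np.random.default_rng(2)
Y = rng.normal(size=(2, 3))                      # flow matrix with n2 = 2 rows, n1 = 3 columns
vec_Y = Y.flatten(order="F")                     # column-major vec
lag = flow_lag_polynomial(W1, W2, rho) @ vec_Y
# Same lag via matrix products, using (A kron B) vec(Y) = vec(B Y A')
lag_direct = (0.2 * Y @ W1.T + 0.3 * W2 @ Y + 0.1 * W2 @ Y @ W1.T).flatten(order="F")
print(np.allclose(lag, lag_direct))              # True
R = spread_matrix(W1, W2, rho)
```

With row-standardised W1 and W2 each Kronecker term is again row-standardised, so |ρ1| + |ρ2| + |ρ3| < 1 is sufficient for R to be invertible.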
Cross-country = "off-block-diagonal"
• Such matrices have an "off-block-diagonal" structure. For n = 2:
  $W_B = \begin{pmatrix} 0 & W_{12} \\ W_{21} & 0 \end{pmatrix}$

A "perforated" matrix
• In general this matrix is "perforated" on the block diagonal and looks like
  $W_B = \begin{pmatrix} 0 & W_{12} & \cdots & W_{1n} \\ W_{21} & 0 & \cdots & W_{2n} \\ \vdots & & \ddots & W_{n-1,n} \\ W_{n1} & \cdots & W_{n,n-1} & 0 \end{pmatrix}$

Extend with the W_B matrix
• Add another spatial lag:
  $y = \rho(W_1, W_2)y + \rho_4 W_B y + X\beta + \varepsilon$,
• where the new lag captures the spatial effect from the cross-country spillovers.
• Now the spread matrix looks like
  $R^* = I_{n_1 n_2} - \rho_1(W_1 \otimes I_{n_2}) - \rho_2(I_{n_1} \otimes W_2) - \rho_3(W_1 \otimes W_2) - \rho_4 W_B$.

Extension
• The flow system can be generalized to include the cross-country effects (an "extended polynomial" ρ(W1, W2) with 4 ρ components):
  $C_j y_j = C_j R^{*-1}X_j\beta_j + C_j R^{*-1}\varepsilon_j,\quad j = 1, \dots, 4$.

Lindley-Smith (1972) estimates
• A non-informative 3-stage hierarchical model:
  $C_j y_j = C_j R^{-1}X_j\beta_j + C_j R^{-1}\varepsilon_j,\quad j = 1, \dots, 4,$
  $\beta_j \sim N[\mu, \Sigma]$,
• and for the hyperparameters (μ, Σ) we assume a non-informative prior (Σ^{-1} = 0).

1-stage posterior mean
• Using a multivariate regression set-up as in Lindley and Smith, we arrive at the following estimate of the posterior mean:
  $\beta_j^{**} = H_j^{-1}[X_C(\rho)'(C\Omega(\rho)C')^{-1}X_C(\rho)\beta_j^{GLS} + \Sigma^{*-1}\beta^{**}]$
• with $H_j = X_C(\rho)'(C\Omega(\rho)C')^{-1}X_C(\rho) + \Sigma^{*-1}$.

Pooled estimates
• The overall estimate is
  $\beta^{**} = \left(\sum_{j=1}^{4} H_j\right)^{-1}\sum_{j=1}^{4} H_j\beta_j^{GLS}$.
• Σ* can be approximated by $\Sigma^* = \frac{1}{4}\sum_{j=1}^{4}(\beta_j^{GLS} - \beta^{ave})(\beta_j^{GLS} - \beta^{ave})'$,
• where β^ave is the average of the individual GLS estimates β_j^GLS.

Completing flows using partial information
• Now we consider a flow matrix that we would like to disaggregate, but the available information is asymmetric.
• Suppose that there are 2 countries, home and foreign, where the aggregated flows in the 4 cells A, B, C and D are known.

Disaggregated and aggregated flow matrix
• Notation:
  $Y = \begin{pmatrix} a_{11} & a_{12} & c_{11} & c_{12} \\ a_{21} & a_{22} & c_{21} & c_{22} \\ b_{11} & b_{12} & d_{11} & d_{12} \\ b_{21} & b_{22} & d_{21} & d_{22} \end{pmatrix} = \begin{pmatrix} A & C \\ B & D \end{pmatrix}$

The disaggregated model
• The flows in the n2 × m1 matrix Y_12 are vectorized to $y_b = \operatorname{vec} Y_{12}$ and modelled as
  $y_b = \rho(W_1, W_2)y_b + X\beta_b + \varepsilon_b$.

The aggregated model
  $Cy_b = C\rho(W_1, W_2)y_b + CX\beta_b + C\varepsilon_b$,
• with reduced-form disturbances $R^{-1}\varepsilon_b \sim N(0, \sigma^2(R'R)^{-1})$,
• with $C = I_{m_1} \otimes 1'_{n_2} = \operatorname{diag}(1'_{n_2}, \dots, 1'_{n_2})$, where $1'_{n_2}$ is a (1 × n2) vector of ones.
• This is a single equation with regressors $X_C(\rho) = CR^{-1}X$ and $\Omega(\rho) = (R'R)^{-1}$, with spread matrix $R = I_{n_2 m_1} - \rho(W_1, W_2)$, where ρ(W1, W2) is the polynomial given before.

Minimizer of ρ
• ML estimate by minimizing the 2-step Anselin (1988) criterion for each block separately:
  – $\sigma^2_{ML,i} = (e_{o,i} - \rho e_{L,i})'(e_{o,i} - \rho e_{L,i})/n$, for i = a, b, c, d,
  – with the ordinary and the W-lag residuals e_o and e_L.

Flow predictions
• Completing the flows by prediction:
  $\hat y_b = R^{-1}Xb_\rho + \Omega(\rho)C'(C\Omega(\rho)C')^{-1}(Cy_b - X_C(\rho)b_\rho)$,
• where b_ρ is the ML or GLS estimate.

Completing the flow matrix Y_C
• We can use the same set-up as before; we only have to use the transposed flow matrix Y_C', since the marginal distribution reflects the sum over the rows.
• The disaggregated model for the flows in the n2 × m1 matrix, with $\operatorname{vec} Y'_{21} = y_c$, is
  $y_c = \rho(W_1, W_2)y_c + X\beta_c + \varepsilon_c$.

Aggregated model estimation
  $Cy_c = C\rho(W_1, W_2)y_c + CX\beta_c + C\varepsilon_c$,
• with C defined analogously, where $1'_{m_2}$ is a (1 × m2) vector of ones. The flow predictions are
  $\hat y_c = R^{-1}Xb_\rho + \Omega(\rho)C'(C\Omega(\rho)C')^{-1}(Cy_c - X_C(\rho)b_\rho)$,
• where b_ρ is the numerically optimized GLS estimate.

Conclusions
• A new method for completing correlated cross-sectional (spatial) data.
• The method can be generalized to flow data.
• The data completion method depends on the model; further research is needed on combined forecasts (model averaging?).

Minimizer of ρ
Now the error sum of squares has to be minimized with respect to 3 ρ's, i.e. over the cube (−1, 1)³:
  $ESS(\rho) = (Cy - X_C(\rho)b_{GLS}(\rho))'(Cy - X_C(\rho)b_{GLS}(\rho)) = (y - X_\rho b_\rho)'C'C(y - X_\rho b_\rho)$.
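The pooling step on the Lindley-Smith slides can be sketched as a precision-weighted average of the block GLS estimates, followed by shrinkage of each block towards the overall estimate. This is our reading of the slide formulas, not the author's code: the function name is hypothetical, the matrices $A_j = X_C(\rho)'(C\Omega(\rho)C')^{-1}X_C(\rho)$ are taken as given inputs, and the example numbers are made up. With only four blocks, Σ* can be invertible only if the number of coefficients p is at most 3.

```python
import numpy as np


def lindley_smith_pool(b_gls, A):
    """Pooled and 1-stage posterior-mean estimates from block GLS estimates.

    b_gls : (J, p) array, row j is beta_j^GLS
    A     : list of J (p, p) matrices X_C'(C Omega C')^{-1} X_C
    """
    b_gls = np.asarray(b_gls, dtype=float)
    J = b_gls.shape[0]
    dev = b_gls - b_gls.mean(axis=0)                 # beta_j^GLS - beta^ave
    Sigma_star = dev.T @ dev / J                     # Sigma* = (1/J) sum of outer products
    Sigma_inv = np.linalg.inv(Sigma_star)
    H = [A_j + Sigma_inv for A_j in A]               # H_j = A_j + Sigma*^{-1}
    # Overall estimate: (sum_j H_j)^{-1} sum_j H_j beta_j^GLS
    b_pool = np.linalg.solve(sum(H), sum(H_j @ b_j for H_j, b_j in zip(H, b_gls)))
    # Posterior means: each block estimate is shrunk towards the overall estimate
    b_post = np.array([np.linalg.solve(H_j, A_j @ b_j + Sigma_inv @ b_pool)
                       for H_j, A_j, b_j in zip(H, A, b_gls)])
    return b_pool, b_post, Sigma_star


# Hypothetical block estimates (J = 4 blocks, p = 2) with equal data precision
b_gls = np.array([[1.0, 0.5], [1.4, 0.6], [0.8, 0.2], [1.2, 0.3]])
A = [10.0 * np.eye(2)] * 4
b_pool, b_post, Sigma_star = lindley_smith_pool(b_gls, A)
```

When all blocks carry the same data precision, as in this example, the overall estimate is simply the average of the block estimates, and every posterior mean lies closer to it than the corresponding GLS estimate.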
• Since ρ is an unknown scalar, it is easy to minimize the error sum of squares ESS with respect to ρ in the interval (−1, 1):
  $ESS(\rho) = (y_a - CX_\rho\beta_\rho)'(y_a - CX_\rho\beta_\rho) = (y - X_\rho\beta_\rho)'C'C(y - X_\rho\beta_\rho)$,
• with $X_\rho = R^{-1}X$,
• and the estimate at the minimizing value of ρ is denoted by b_ρ.

The conditional GLS estimate
• The conditional GLS estimate is given by (3.5), and the following ESS has to be minimized:
  $ESS(\rho) = (Cy_b - X_C(\rho)b_{GLS}(\rho))'(Cy_b - X_C(\rho)b_{GLS}(\rho)) = (y_b - X_\rho b_\rho)'C'C(y_b - X_\rho b_\rho)$,
• with $X_\rho = R^{-1}X$ and $X_C(\rho) = CX_\rho$; note that $CC' = n_2 I_{m_1}$.
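Because ρ is a scalar here, a plain grid search over (−1, 1) is enough to minimize the ESS. The sketch below is a hypothetical illustration: the function names and the grid (−0.95 to 0.95 in steps of 0.01) are our choices, each candidate ρ uses the conditional GLS estimate, and in the usage example the aggregates are generated without noise from ρ = 0.4 with a made-up ring neighbourhood matrix, so that the minimizer can be checked.

```python
import numpy as np


def ess(rho, y_a, X, W, C):
    """ESS(rho) of the aggregated model at the conditional GLS estimate b_rho."""
    N = X.shape[0]
    R_inv = np.linalg.inv(np.eye(N) - rho * W)       # R^{-1}, R = I - rho W
    X_C = C @ R_inv @ X                              # C X_rho with X_rho = R^{-1} X
    V_inv = np.linalg.inv(C @ R_inv @ R_inv.T @ C.T) # (C Omega(rho) C')^{-1}
    b = np.linalg.solve(X_C.T @ V_inv @ X_C, X_C.T @ V_inv @ y_a)
    e = y_a - X_C @ b
    return float(e @ e), b


def minimize_ess(y_a, X, W, C, grid=None):
    """Grid search for rho in (-1, 1); returns rho_hat and the estimate b at rho_hat."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)         # step 0.01
    values = [ess(r, y_a, X, W, C)[0] for r in grid]
    rho_hat = float(grid[int(np.argmin(values))])
    return rho_hat, ess(rho_hat, y_a, X, W, C)[1]


# Hypothetical noise-free example: 4 aggregates of 3 fine regions each, true rho = 0.4
rng = np.random.default_rng(3)
n, m = 4, 3
N = n * m
C = np.kron(np.eye(n), np.ones((1, m)))              # C = diag(1'_3, ..., 1'_3)
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5      # row-standardised ring
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(N) - 0.4 * W, X @ beta)  # y = R^{-1} X beta
y_a = C @ y

rho_hat, b_hat = minimize_ess(y_a, X, W, C)
print(rho_hat, b_hat)                                # expected: rho_hat close to 0.4, b_hat close to beta
```

For the three-parameter flow case the same idea applies, but the search runs over the cube (−1, 1)³, where a numerical optimizer is usually cheaper than a full grid.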