Re-development of the Cell Suppression Methodology at the US Census Bureau

Philip Steel, James Fagan, Paul Massell, Richard Moore Jr., John Slanta, Bei Wang

Background
• Jewett's network flow program
• Need for new program
• 2012 economic census
• LP (linear programming) methodology
• R&M cell suppression team

Processing Model
• Preprocessing
  – Create table description
  – Determine primaries
  – Unduplicate
• Sequential processing of primaries
• Queue reduction
• Test company protection (aggregate/supercell)
• Sequential processing of supercells

Table relations
• Marginals are the sum of interior cells
• Geographic relationships tend to generate our most complex sets of table relations
  – State is the sum of the metropolitan areas within the state and the balance.
  – State is also the sum of counties
• Relations are of the form A = B + ... + Z, where A, B, ..., Z are each one of the rows, columns, or levels that define some Cartesian integer space (i, j, k)
• Duplicates are recorded as A = B (e.g., a county is also a place)

minimize:

$$Y = \sum_{i=1}^{rows} \sum_{j=1}^{cols} \sum_{\substack{k=1 \\ (i,j,k) \in A}}^{levs} c_{i,j,k} \left( x^{(u)}_{i,j,k} + x^{(l)}_{i,j,k} \right)$$

subject to:

(a)
$$\sum_{\substack{k=2 \\ (i,j,k) \in A}}^{levs} \left( x^{(u)}_{i,j,k} - x^{(l)}_{i,j,k} \right) = x^{(u)}_{i,j,1} - x^{(l)}_{i,j,1}$$
for i = 1, ..., rows, j = 1, ..., cols : levs > 1, ws(i,j,1) = 0

(b)
$$\sum_{\substack{i=1 \\ (i,j,k) \in A}}^{limr(ii)} \left( x^{(u)}_{rowrel(ii,i),j,k} - x^{(l)}_{rowrel(ii,i),j,k} \right) = x^{(u)}_{rowrel(ii,0),j,k} - x^{(l)}_{rowrel(ii,0),j,k}$$
for ii = 1, ..., rr, j = 1, ..., cols, k = 1, ..., levs : limr(ii) ≥ 1, ws(ii,j,k) = 0

(c)
$$\sum_{\substack{j=1 \\ (i,j,k) \in A}}^{limc(jj)} \left( x^{(u)}_{i,colrel(jj,j),k} - x^{(l)}_{i,colrel(jj,j),k} \right) = x^{(u)}_{i,colrel(jj,0),k} - x^{(l)}_{i,colrel(jj,0),k}$$
for i = 1, ..., rows, jj = 1, ..., cc, k = 1, ..., levs : limc(jj) ≥ 1, ws(i,jj,k) = 0

(d)
$$0 \le x^{(u)}_{i,j,k} \le h_{i,j,k}; \qquad 0 \le x^{(l)}_{i,j,k} \le h_{i,j,k}$$
for i = 1, ..., rows, j = 1, ..., cols, k = 1, ..., levs : (i,j,k) ∈ A

(e)
$$x^{(u)}_{prow,pcol,plev} = prot; \qquad x^{(l)}_{prow,pcol,plev} = 0$$

where:

$$c_{i,j,k} = \begin{cases} \max(0, v_{i,j,k}) & \text{when } (i,j,k) \in U \\ 0 & \text{when } (i,j,k) \in P \cup C \end{cases} \qquad h_{i,j,k} = \max(0, v_{i,j,k})$$

Objective Function

$$Y = \sum_{i=1}^{rows} \sum_{j=1}^{cols} \sum_{\substack{k=1 \\ (i,j,k) \in A}}^{levs} c_{i,j,k} \left( x^{(u)}_{i,j,k} + x^{(l)}_{i,j,k} \right)$$

Additivity constraint generator (based on row relations)
Constraint (b), generated for ii = 1, ..., rr, j = 1, ..., cols, k = 1, ..., levs : limr(ii) ≥ 1, ws(ii,j,k) = 0

Bounds
h_{i,j,k} = max(0, v_{i,j,k}) for i = 1, ..., rows, j = 1, ..., cols, k = 1, ..., levs : (i,j,k) ∈ A

For the primary
Constraint (e): the upward flow through the target primary (prow, pcol, plev) is fixed at prot and its downward flow is fixed at 0.

Skip P
• Model changes only on the target primary constraints.
• How can the minimal solution for one target be transformed into a solution for another target?
• By applying a scalar that converts the flow through the second P to the fixed value of the model!
• This can be done when the scalar does not violate the bounding conditions and the complementary flow in the target is 0,
• i.e., when the solution's flow through the second target exceeds its protection requirement.

Empirical confirmation
• In our large sparse tables, we would see a lot of objective 0 results.
• That is, the solver finds a 0 cost pattern to protect the primary: it is already protected!
• Skip P eliminated most objective 0 results and left intact the sequence of positive objectives and their solutions.

Fat solution
• CPLEX is using a dual simplex method to find solutions.
• The solutions have a growing 0 cost component, with many more cells than are required to protect the target P.
• The flow in the 0 cost cells far exceeds what is required to protect the target P (except in very small or dense examples).
• The solution "lights up" the possible flows in the table's current state, giving a "fat" solution.

Skip P and the fat solution

Optimization number   Count of P with flow   Running total of skipped P
1                     3961                   3076
2                     3952                   3243
...                   ...                    ...
587                   11035                  10448
588                   11037                  10449

dg10 sector 44
• Cartesian cells: 367,605 (2d)
• Non-zero cells: 159,849
• Relations: 283 (row and column)
  – 14,000 potential tables, linked
• P: 95,062
• LP problems: 10,604
• Typical LP size
  – Reduced LP has 64826 rows, 156809 columns, and 528838 nonzeros
• Time: 8hr:37min (includes everything)

Comparison between network and LP on one (of hundreds) dataset from 2007

                            Network flow      LP
C                           14,551            11,283
Cvalue                      1,813,213,710     598,886,234
PubValue                    12,348,960,578    13,563,288,054
(@10%) undersuppressions    #                 0
time                        24min             8hrs 37min

Statistics based on unduplicated data with an approximation of a published status flag

Thank you!
[email protected]
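Sketch: Skip P test. The Skip P slide describes rescaling an existing solution so that it protects a second primary. A minimal Python sketch of that test follows, assuming the previous solution is held as two dictionaries of upward and downward flows keyed by cell (i, j, k). The names can_skip and rescale, the dictionary layout, and the tolerance are all hypothetical and are not taken from the production program.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int, int]  # (i, j, k) index into the Cartesian cell space


def can_skip(flow_up: Dict[Cell, float], flow_down: Dict[Cell, float],
             target: Cell, prot: float, tol: float = 1e-9) -> bool:
    """Return True if an earlier solution can be rescaled to protect `target`.

    Hypothetical helper: flow_up / flow_down hold x^(u) and x^(l) from an
    LP that was solved for a different primary.
    """
    up = flow_up.get(target, 0.0)
    down = flow_down.get(target, 0.0)
    # The complementary (downward) flow through the target must be 0.
    if abs(down) > tol:
        return False
    # The upward flow must cover the protection requirement, so the scalar
    # prot / up is at most 1 and cannot push any cell past its bound.
    return up > tol and up >= prot - tol


def rescale(flow_up: Dict[Cell, float], flow_down: Dict[Cell, float],
            target: Cell, prot: float):
    """Scale every flow so the upward flow through `target` equals prot."""
    s = prot / flow_up[target]
    scaled_up = {cell: s * x for cell, x in flow_up.items()}
    scaled_down = {cell: s * x for cell, x in flow_down.items()}
    return scaled_up, scaled_down


# Illustrative flows only; these are not Census Bureau data.
up = {(1, 1, 1): 8.0, (2, 2, 1): 8.0}
down = {(1, 2, 1): 8.0, (2, 1, 1): 8.0}
if can_skip(up, down, target=(2, 2, 1), prot=4.0):
    up2, down2 = rescale(up, down, target=(2, 2, 1), prot=4.0)
```

Because the additivity constraints are homogeneous equalities, multiplying every flow by one scalar keeps them satisfied, and a scalar no larger than 1 keeps every flow inside its bounds. The sketch only checks feasibility for the second target.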
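Sketch: checking a table relation. The additivity constraints require that, for each relation A = B + ... + Z, the net change in the total equals the sum of the net changes in its parts, where the net change of a cell is its upward flow minus its downward flow. The Python sketch below checks one such relation. The function names and the dictionary layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int, int]  # (i, j, k) index into the Cartesian cell space


def net_change(x_up: Dict[Cell, float], x_low: Dict[Cell, float],
               cell: Cell) -> float:
    """Net flow through a cell: upward flow minus downward flow."""
    return x_up.get(cell, 0.0) - x_low.get(cell, 0.0)


def relation_holds(total: Cell, parts: Iterable[Cell],
                   x_up: Dict[Cell, float], x_low: Dict[Cell, float],
                   tol: float = 1e-9) -> bool:
    """Check one additivity constraint for the relation total = sum(parts)."""
    lhs = sum(net_change(x_up, x_low, p) for p in parts)
    rhs = net_change(x_up, x_low, total)
    return abs(lhs - rhs) <= tol


# Illustrative relation: a state row (0) is the sum of two county rows (1, 2).
state = (0, 1, 1)
counties = [(1, 1, 1), (2, 1, 1)]

# One county moves up by 5 and the other down by 5, so the state is unchanged.
balanced = relation_holds(state, counties,
                          x_up={(1, 1, 1): 5.0}, x_low={(2, 1, 1): 5.0})

# Only one county moves, so the state total would no longer add up.
unbalanced = relation_holds(state, counties,
                            x_up={(1, 1, 1): 5.0}, x_low={})
```

A flow pattern that passes this check for every relation leaves all published totals consistent with the perturbed interior cells, which is what lets the LP bound how far a suppressed value could move.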