Re-development of the Cell Suppression Methodology at the US Census Bureau
Philip Steel, James Fagan,
Paul Massell, Richard Moore Jr.,
John Slanta, Bei Wang
Background
• Jewett’s network flow program
• Need for new program
• 2012 economic census
• LP (linear programming) methodology
• R&M cell suppression team
Processing Model
• Preprocessing
– Create table description
– Determine primaries
– Unduplicate
• Sequential processing of primaries
• Queue reduction
• Test company protection (aggregate/supercell)
• Sequential processing of supercells
Table relations
• Marginals are the sum of interior cells
• Geographic relationships tend to generate our
most complex sets of table relations
– State is the sum of metropolitan areas within the state
and the balance.
– State is also the sum of counties
• Each relation has the form A = B + … + Z, where A, B, …, Z are
(one of) rows, columns, or levels that define some
Cartesian integer space (i,j,k)
• Duplicates are recorded as A = B (e.g., a county is
also a place)
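A minimal sketch of how such relations might be stored and checked, assuming a 2-d table held as a dictionary and each relation written as a parent row with a list of child rows. The names `Relation`, `check_row_relations`, and the toy values are hypothetical and are not taken from the Census Bureau program. A duplicate A = B is simply a relation with one child.

```python
from typing import Dict, List, Tuple

# A relation "A = B + ... + Z" along the row axis: (parent row, [child rows]).
Relation = Tuple[int, List[int]]


def check_row_relations(values: Dict[Tuple[int, int], float],
                        relations: List[Relation],
                        n_cols: int,
                        tol: float = 1e-9) -> List[Tuple[int, int]]:
    """Return (parent_row, col) pairs where the parent is not the sum of its children."""
    failures = []
    for parent, children in relations:
        for j in range(n_cols):
            total = sum(values.get((i, j), 0.0) for i in children)
            if abs(values.get((parent, j), 0.0) - total) > tol:
                failures.append((parent, j))
    return failures


# Hypothetical toy table with one column:
# row 0 = state, rows 1-2 = counties, row 3 = metropolitan area, row 4 = balance.
values = {(0, 0): 100.0, (1, 0): 60.0, (2, 0): 40.0, (3, 0): 70.0, (4, 0): 30.0}
relations = [
    (0, [1, 2]),  # state is the sum of counties
    (0, [3, 4]),  # state is also the sum of the metropolitan area and the balance
]
failures = check_row_relations(values, relations, n_cols=1)
```

The same state row appears as the parent in two relations, which is what links the tables together.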
minimize:
$$Y = \sum_{i=1}^{\mathrm{rows}} \sum_{j=1}^{\mathrm{cols}} \sum_{\substack{k=1 \\ (i,j,k)\in A}}^{\mathrm{levs}} c_{i,j,k}\left(x^{(u)}_{i,j,k} + x^{(l)}_{i,j,k}\right)$$
subject to:
(a)
$$\sum_{\substack{k=2 \\ (i,j,k)\in A}}^{\mathrm{levs}} \left(x^{(u)}_{i,j,k} - x^{(l)}_{i,j,k}\right) = x^{(u)}_{i,j,1} - x^{(l)}_{i,j,1}$$
for i = 1, ... , rows, j = 1, ... , cols : levs > 1, ws(i,j,1) = 0
(b)
$$\sum_{\substack{i=1 \\ (i,j,k)\in A}}^{\mathrm{limr}(ii)} \left(x^{(u)}_{\mathrm{rowrel}(ii,i),j,k} - x^{(l)}_{\mathrm{rowrel}(ii,i),j,k}\right) = x^{(u)}_{\mathrm{rowrel}(ii,0),j,k} - x^{(l)}_{\mathrm{rowrel}(ii,0),j,k}$$
for ii = 1, ... , rr, j = 1, ... , cols, k = 1, ... , levs : limr(ii) ≥ 1, ws(ii,j,k) = 0
(c)
$$\sum_{\substack{j=1 \\ (i,j,k)\in A}}^{\mathrm{limc}(jj)} \left(x^{(u)}_{i,\mathrm{colrel}(jj,j),k} - x^{(l)}_{i,\mathrm{colrel}(jj,j),k}\right) = x^{(u)}_{i,\mathrm{colrel}(jj,0),k} - x^{(l)}_{i,\mathrm{colrel}(jj,0),k}$$
for i = 1, ... , rows, jj = 1, ... , cc, k = 1, ... , levs : limc(jj) ≥ 1, ws(i,jj,k) = 0
(d)
$$0 \le x^{(u)}_{i,j,k} \le h_{i,j,k}; \quad 0 \le x^{(l)}_{i,j,k} \le h_{i,j,k}$$
for i = 1, ... , rows, j = 1, ... , cols, k = 1, ... , levs : (i,j,k) ∈ A
(e)
$$x^{(u)}_{prow,pcol,plev} = prot; \quad x^{(l)}_{prow,pcol,plev} = 0$$
where:
$$c_{i,j,k} = \begin{cases} \max(0, v_{i,j,k}) & \text{when } (i,j,k) \in U \\ 0 & \text{when } (i,j,k) \in P \cup C \end{cases}$$
$$h_{i,j,k} = \max(0, v_{i,j,k})$$
Objective Function
$$Y = \sum_{i=1}^{\mathrm{rows}} \sum_{j=1}^{\mathrm{cols}} \sum_{\substack{k=1 \\ (i,j,k)\in A}}^{\mathrm{levs}} c_{i,j,k}\left(x^{(u)}_{i,j,k} + x^{(l)}_{i,j,k}\right)$$
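Additivity constraint generator
(based on row relations)
(b)
$$\sum_{\substack{i=1 \\ (i,j,k)\in A}}^{\mathrm{limr}(ii)} \left(x^{(u)}_{\mathrm{rowrel}(ii,i),j,k} - x^{(l)}_{\mathrm{rowrel}(ii,i),j,k}\right) = x^{(u)}_{\mathrm{rowrel}(ii,0),j,k} - x^{(l)}_{\mathrm{rowrel}(ii,0),j,k}$$
for ii = 1, ... , rr, j = 1, ... , cols, k = 1, ... , levs : limr(ii) ≥ 1, ws(ii,j,k) = 0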
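The generator for (b) can be sketched as a loop over row relations that emits one sparse equality row per (ii, j, k). This is only an illustration under stated assumptions: the function name, the `Var` key layout, and the use of 0-based indices are hypothetical; (b) is assumed to equate the net flow (upper minus lower) summed over the child rows with the net flow in the parent row; and `ws` is treated simply as a given flag array that switches a constraint off when non-zero, since the slides do not define it.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int, int]
Var = Tuple[str, int, int, int]  # ("u" or "l", i, j, k)


def row_additivity_constraints(rowrel: List[List[int]],
                               cols: int,
                               levs: int,
                               A: Set[Cell],
                               ws: Dict[Cell, int]) -> List[Dict[Var, float]]:
    """Build sparse rows of the form sum(coef * var) = 0, one per (ii, j, k).

    rowrel[ii][0] is the parent row and rowrel[ii][1:] are its children,
    so limr(ii) corresponds to len(rowrel[ii]) - 1.
    """
    constraints: List[Dict[Var, float]] = []
    for ii, rel in enumerate(rowrel):
        parent, children = rel[0], rel[1:]
        if len(children) < 1:  # condition limr(ii) >= 1
            continue
        for j in range(cols):
            for k in range(levs):
                if ws.get((ii, j, k), 0) != 0:  # condition ws(ii,j,k) = 0
                    continue
                row: Dict[Var, float] = {}
                for i in children:
                    if (i, j, k) in A:  # only cells in A carry variables
                        row[("u", i, j, k)] = 1.0
                        row[("l", i, j, k)] = -1.0
                if (parent, j, k) in A:
                    row[("u", parent, j, k)] = -1.0
                    row[("l", parent, j, k)] = 1.0
                if row:
                    constraints.append(row)
    return constraints


# Hypothetical example: row 0 is the sum of rows 1 and 2, one column, one level.
A = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
constraints = row_additivity_constraints([[0, 1, 2]], cols=1, levs=1, A=A, ws={})
```

Each returned dictionary is one constraint row with right-hand side 0, ready to be handed to whatever LP interface is in use.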
Bounds
$$0 \le x^{(u)}_{i,j,k} \le h_{i,j,k}; \quad 0 \le x^{(l)}_{i,j,k} \le h_{i,j,k}$$
$$h_{i,j,k} = \max(0, v_{i,j,k})$$
for i = 1, ... , rows, j = 1, ... , cols, k = 1, ... , levs : (i,j,k) ∈ A
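The capacity h and cost c can be written as two small functions, with the objective evaluated for a given flow. This is a sketch only: the status codes "U", "P", and "C" stand for the sets U, P, and C in the model, the function names and toy values are hypothetical, and the objective is assumed to charge c for the sum of upper and lower flow in each cell.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int, int]


def capacity(v: float) -> float:
    """h = max(0, v): the flow through a cell is bounded by its non-negative value."""
    return max(0.0, v)


def cost(v: float, status: str) -> float:
    """c = max(0, v) for cells in U; 0 for cells already in P or C."""
    return max(0.0, v) if status == "U" else 0.0


def objective(values: Dict[Cell, float],
              status: Dict[Cell, str],
              flow: Dict[Cell, Tuple[float, float]]) -> float:
    """Y = sum over cells of c * (x_u + x_l), under the assumption stated above."""
    return sum(cost(values[cell], status[cell]) * (x_u + x_l)
               for cell, (x_u, x_l) in flow.items())


# Hypothetical toy data: one primary (P) and two unsuppressed (U) cells.
values = {(0, 0, 0): 50.0, (1, 0, 0): 30.0, (2, 0, 0): 20.0}
status = {(0, 0, 0): "U", (1, 0, 0): "P", (2, 0, 0): "U"}
flow = {(1, 0, 0): (5.0, 0.0), (2, 0, 0): (0.0, 5.0)}
Y = objective(values, status, flow)
```

Flow through the primary is free, so only the unsuppressed cell contributes to Y here.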
For the primary
The target primary (prow, pcol, plev) enters the model through constraint (e):
$$x^{(u)}_{prow,pcol,plev} = prot; \quad x^{(l)}_{prow,pcol,plev} = 0$$
Skip P
• From one target to the next, the model changes only in the
target primary constraints.
• How can the minimal solution for one target be
transformed into a solution for another target?
• By applying a scalar that converts the flow through the
second P to the fixed value of the model!
• This can be done when the scalar does not violate the
bounding conditions and the complementary flow in
the target is 0.
• That is, when the solution's flow through the second
target exceeds its protection requirement.
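The Skip P test can be sketched as follows, assuming a solution is stored as a mapping from cell to its (upper, lower) flow pair. The function name and data layout are hypothetical. When the flow through the second primary is at least its protection requirement, the scalar is at most 1, so the scaled flows stay inside the bounds 0 ≤ x ≤ h, and the additivity constraints are preserved because they are linear with zero right-hand side.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int, int]
Flow = Dict[Cell, Tuple[float, float]]  # cell -> (x_u, x_l)


def skip_p_rescale(solution: Flow,
                   second: Cell,
                   prot: float,
                   tol: float = 1e-9) -> Optional[Flow]:
    """Rescale a solved flow so the second primary carries exactly `prot`.

    Returns None when the second primary cannot be skipped.
    """
    x_u, x_l = solution.get(second, (0.0, 0.0))
    if abs(x_l) > tol:  # complementary flow in the target must be 0
        return None
    if x_u <= 0.0 or x_u < prot - tol:  # flow must reach the protection requirement
        return None
    s = prot / x_u  # 0 < s <= 1 for prot > 0, so no bound is violated
    return {cell: (s * u, s * l) for cell, (u, l) in solution.items()}


# Hypothetical solved flow for some first target.
solution = {(0, 0, 0): (10.0, 0.0), (1, 0, 0): (4.0, 0.0), (2, 0, 0): (0.0, 6.0)}
rescaled = skip_p_rescale(solution, second=(1, 0, 0), prot=2.0)
not_skippable = skip_p_rescale(solution, second=(1, 0, 0), prot=8.0)
```

A primary that passes this test needs no LP of its own and can be dropped from the queue.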
Empirical confirmation
• In our large sparse tables, we would see a lot
of objective 0 results.
• That is, the solver finds a 0 cost pattern to
protect the primary … it is already protected!
• Skip P eliminated most objective 0 results and
left intact the sequence of positive objectives
and their solutions.
Fat solution
• CPLEX is using a dual simplex method to find
solutions.
• The solutions have a growing 0 cost component,
with many more cells than are required to
protect the target P.
• The flow in the 0 cost cells far exceeds what is
required to protect the target P (except in very
small or dense examples).
• The solution “lights up” the possible flows in the
table’s current state, giving a “fat” solution.
Skip P and the fat solution
Optimization number   Count of P with flow   Running total of skipped P
1                     3961                   3076
2                     3952                   3243
...                   ...                    ...
587                   11035                  10448
588                   11037                  10449
dg10 sector 44
• Cartesian cells: 367,605 (2d)
• Non-zero cells: 159,849
• Relations: 283 (row and column)
– 14,000 potential tables, linked
• P: 95,062
• LP problems: 10,604
• Typical LP size
– Reduced LP has 64,826 rows, 156,809 columns, and
528,838 nonzeros
• Time: 8hr:37min (includes everything)
Comparison between network and LP
on one (of hundreds) dataset from 2007

                    Network flow      LP
C                   14,551            11,283
Cvalue              1,813,213,710     598,886,234
PubValue            12,348,960,578    13,563,288,054 (@10%)
undersuppressions   #                 0
time                24min             8hrs 37min

Statistics based on unduplicated data with an approximation of a published status flag
Thank you!
[email protected]