UN/ECE Work Session On Statistical Data Confidentiality (Geneva, 9-11 November 2005)

WP30: Safety rules in statistical disclosure control for tabular data

Giovanni Merola, Winton Capital Management Ltd
Partially written while at ISTAT and partially supported by EU project CASC.

Plan of the Talk
1. SDC for Magnitude tables;
2. Existing safety rules;
3. Generalised p-rule;
4. Rational estimates;
5. Prior distribution;
6. U-estimates;
7. Comparison on real SBS data;
8. MU-rules;
9. Concluding remarks.

1. SDC for Magnitude Tables
Tables showing the sums of non-negative contributions in each cell.
Example:

Income £K    Young    Old     All Ages
Male         200      600     800
Female       150      450     600
All Sexes    350      1050    1400

Contributions to the cell Old Males (total 600), in non-increasing order:
150, 130, 120, 90, 50, 40, 20
$z_1 \ge z_2 \ge z_3 \ge z_4 \ge \cdots \ge z_n$
The total $T$ is published; $n$ is the number of contributions.

1. SDC for Magnitude Tables cont.d
SDC policy:
1. If the categories are confidential, (likely) identification of respondents is disclosure;
2. else only the contributions of (likely) identifiable respondents cannot be disclosed (too precisely);
3. same rule for all cells, else microdata protection.

2. Existing Safety Rules
Rare respondents are identifiable: threshold rule, $n > m$.
Respondents with large contributions are identifiable: dominance rule, $(z_1 + \cdots + z_m)/T < k$.
The largest contributor is identifiable, hence the second largest must not estimate $z_1$ closely: p-rule, $[(T - z_2) - z_1]/z_1 > p$.

3. Generalised p-rule
Includes the existence of groups of respondents.
[Diagram: the ordered contributions $z_1, z_2, z_3, z_4, \ldots, z_n$ with total $T$; $t_2$ marks the group with the largest sum and $R_{2,2}$ the group with the second largest sum.]
The group with the largest sum is identifiable; the group with the second largest sum must not estimate the largest sum too closely.

3. Generalised p-rule cont.d
Same estimate as the p-rule, the maximum possible value: $\hat{t}_m = T - R_{m,l}$.
Gen. p-rule: $((T - R_{m,l}) - t_m)/t_m > p$.
With $t_1 = z_1$ and $R_{1,1} = z_2$ this is the p-rule.

3. Generalised p-rule cont.d
If zero contributions are known (external intruder): dominance rule with $k = 1/(1+p)$;
if there are no groups: simple p-rule;
if the intruding group is formed of $(m-1)$ respondents: the threshold rule $n > m$ protects against exact estimation ($p = 0$).

Merola, G. M., 2003a. Generalized risk measures for tabular data. Proceedings of the 54th Session of the International Statistical Institute.

4. Rational Estimates
An intruder can compute a lower and an upper bound for the value of $t_m$:
$t_m^- \le t_m \le t_m^+$.
For example, if $z_2 = 40$ and $T = 100$: $40 = z_2 \le z_1 \le T - z_2 = 60$.
The bounds are different for different prior knowledge of the intruder.

4. Rational Estimates cont.d
$t_m$ can be estimated by minimising the Mean Square Error for some distribution $F(t_m)$:
$\min_{\hat{t}_m} \int_{t_m^-}^{t_m^+} (t_m - \hat{t}_m)^2 \, dF(t_m)$.
By a well-known property, the MSE is minimised by the mean: $\hat{t}_m = E(t_m)$.

5. Prior Distribution: Uniform
The ignorance about the distribution of $t_m$ can be modelled with a Uniform distribution:
$t_m \sim U(t_m^-, t_m^+)$.
In this case the mean is simply
$\hat{t}_m = (t_m^- + t_m^+)/2$.
Note: same estimate for any symmetric $F$.

5. Prior Distribution: maximising
The Generalised p-rule can be derived by assuming a prior concentrated on the maximum value:
$F(t_m) = 1$ if $t_m = t_m^+$, and $0$ else.
We refer to the Gen. p-rule as the M-rule, and to the rule derived using the Uniform as the U-rule.

6. U-estimates
The estimate depends on the prior knowledge of the intruder, who
knows $T$ but not $n$: $\hat{t}_m = T/2$ (Dominance);
knows $T$ and $n$: $\hat{t}_m = (m + n)T/(2n)$;
knows $T$ and $L$ contributions: $\hat{t}_m = (T - R_{L,m} + m z_{(m+1)})/2$ (Gen. p-rule*);
knows $T$, $L$ contributions and $n$: either as above or $\hat{t}_m = T - R_{L,m} - (n - m - L) z_{(m+L)}/2$.
* For $m = L = 1$ the uniform p-rule is the same as uniform dominance.

Merola, G., 2003b. Safety rules in statistical disclosure control for tabular data. Contributi Istat 1, Istituto Nazionale di Statistica, Roma.

6. U-estimates cont.d
Example: $C = (970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10)$, $T = 2706$.

Rule       Estimated z1    RelErr
Dom        2706            1.8     (t2/T = 0.5)
p-rule     2330            1.4
U-Dom      1353            0.4
U(1:n)     1465            0.51
U(1;1)     1353            0.4

7. Comparison on real SBS data
We applied different rules to Italian SBS data, turnover by Region and SIC, for the years '94 and '97.
We considered the SIC with 2 and 3 digits.

7. Comparison on real SBS data cont.d
[Figure: mean relative error for $z_1$]

7. Comparison on real SBS data cont.d
[Figure: mean relative error for $t_2$]

8. U-rules
The values for which $|t_m - \hat{t}_m|/t_m \le p$ are intervals.
Knowing only $T$ (Dominance):
$\frac{1}{2(1+p)} \le \frac{t_m}{T} \le \frac{1}{2(1-p)}$.
Knowing $T$ and $L$ contributions (gen. p-rule):
$\frac{1}{2(1+p)} \le \frac{t_m}{T - R_{L,m} + m z_{m+1}} \le \frac{1}{2(1-p)}$.

9. MU-rules
Assuming both estimating approaches we obtain subadditive rules, analogous to the p-rule but with stricter bounds.
[Diagram: safe and unsafe regions for $t_m$ on the axis from $t_m^-$ to $t_m^+$, with $\hat{t}_m = (t_m^+ + t_m^-)/2$. M-rule: safe and unsafe regions separated at $(T - R)/(1+p)$. U-rule: unsafe between $\hat{t}_m/(1+p)$ and $\hat{t}_m/(1-p)$, safe on either side. MU-rule: safe and unsafe regions, with $\hat{t}_m/(1+p)$ and $\hat{t}_m/(1-p)$ marked.]

9. MU-rules cont.d
Safety rule when only $T$ is known (Dominance):
$\frac{t_m}{T} < \frac{1}{2(1+p)}$.
Safety rule when $T$ and $L$ contributions are known (gen. p-rule):
$\frac{t_m}{T - R_{L,m} + m z_{m+1}} < \frac{1}{2(1+p)}$.

10. Conclusions
The assumptions for the existing rules are unrealistic;
using a simple non-informative distribution gives a much smaller relative error of estimation;
the corresponding rules are not subadditive;
joining the assumptions leads to stricter rules;
identifiability of all largest respondents requires these rules;
different priors can be used.
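
The worked example in section 6 can be checked numerically. The following is a minimal Python sketch, not the author's code: the function name `u_estimates`, the dictionary keys, and the reading of $R_{L,m}$ as the sum of the $L$ contributions that follow the $m$ largest are assumptions made for illustration. It computes the maximum-value (M) and Uniform (U) estimates of $t_m$ and their relative errors.

```python
from typing import Dict, Sequence, Tuple


def u_estimates(contributions: Sequence[float], m: int = 1,
                L: int = 1) -> Dict[str, Tuple[float, float]]:
    """Return {label: (estimate of t_m, relative error)} for one cell.

    Assumption (not stated on the slides): the intruding group holds the
    L contributions ranked immediately after the m largest ones.
    """
    z = sorted(contributions, reverse=True)  # z[0] >= z[1] >= ... >= z[n-1]
    n = len(z)
    if m < 1 or L < 1 or m + L > n:
        raise ValueError("need m >= 1, L >= 1 and m + L <= n")

    T = sum(z)             # published cell total
    t_m = sum(z[:m])       # true sum of the m largest (unknown to the intruder)
    R = sum(z[m:m + L])    # sum of the L contributions known to the intruder
    z_next = z[m]          # z_(m+1), the largest known contribution

    estimates = {
        "M, T only (Dom)": T,                          # upper bound, nothing else known
        "M, T and L (p-rule)": T - R,                  # upper bound given R
        "U, T only (U-Dom)": T / 2,                    # midpoint of [0, T]
        "U, T and n": (m + n) * T / (2 * n),           # midpoint of [mT/n, T]
        "U, T and L": (T - R + m * z_next) / 2,        # midpoint of [m z_(m+1), T - R]
    }
    return {label: (est, abs(est - t_m) / t_m) for label, est in estimates.items()}


C = (970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10)

for label, (est, rel_err) in u_estimates(C, m=1, L=1).items():
    print(f"{label:22s} estimate = {est:8.2f}   relative error = {rel_err:.2f}")
```

With $m = L = 1$ the estimates agree with the table in section 6 (the "U, T and n" estimate evaluates to 1465.75, shown there as 1465), and the two Uniform estimates based on $T$ alone and on $T$ with one known contribution coincide at $T/2$, as the footnote in section 6 states.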