Confidentiality Issues with “Small Cell” Data

Michael C. Samuel, DrPH
STD Control Branch
California Department of Public Health

2008 National STD Prevention Conference
Confronting Challenges, Applying Solutions
Chicago, Illinois, March 10-13, 2008

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        1           3           1       20
20-24           20       10          10          15       55
25-34            3       10          10           2       25
35+             12       14           7           2       35
Total           50       35          30          20      135

* NOT the issue of “small cells” with expected value < 5 and their impact on chi-square tests

Confidentiality – Types of Disclosure
• Identity Disclosure
  – Identity of an individual can be determined based on the released data
    • Or …can reasonably be determined…
• Attribute Disclosure
  – Confidential information about an individual is revealed based on the released data
    • Or “sensitive” information; or “embarrassing” information

Extensive Literature
Key Resource: Federal Committee on Statistical Methodology, Office of Management and Budget
http://www.fcsm.gov/working-papers/spwp22.html

Key Concepts
• Release of public health data
  – Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality
• If “significant” risks
  – “Statistical Disclosure Limitation”
• True Risk versus Perception of Risk

Disclosure Limitation with Tabular Data
• If cells are deemed sensitive based on a specified threshold rule
  – Alter the underlying “line-listed” data or “microdata” before the tables are constructed (may be a particularly relevant technique for on-line query systems)
  – Change the table: aggregate rows or columns
  – Suppress cells

Threshold Rules
• Numerator rule
  – e.g. cell size < 3, < 5 (many)
• Population denominator rule
  – e.g. population < 20,000 (HIPAA-based), < 50
• Numerator and population denominator rule
  – numerator > 10 AND denominator > 50 (Oregon cancer registry)
• Population denominator minus numerator rule
  – e.g. population minus cell count < 10 (Missouri)

Cell Suppression
• Simple Cell Suppression
• Random Rounding
• Controlled Rounding
• Controlled Tabular Adjustment

No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        1           3           1       20
20-24           20       10          10          15       55
25-34            3       10          10           2       25
35+             12       14           7           2       35
Total           50       35          30          20      135

Simple Suppression
Numbers from Working Paper 22, from Cox 1986

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20       10          10          15       55
25-34            s       10          10           s       25
35+             12       14           7           s       35
Total           50       35          30          20      135

s – data withheld to limit disclosure

Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20        s           s          15       55
25-34            s       10          10           s       25
35+              s       14           7           s       35
Total           50       35          30          20      135

s – data withheld to limit disclosure

Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20        s           s          15       55
25-34            s       10          10           s       25
35+              s       14           7           s       35
Total           50       35          30          20      135

s – data withheld to limit disclosure
The suppressed 15-19 Asian/PI cell = 1 based on linear combinations of the published cells and totals

Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986

                        Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20       10          10          15       55
25-34            s        s          10           s       25
35+              s       14           s           s       35
Total           50       35          30          20      135

s – data withheld to limit disclosure

Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern

Los Angeles County - 2006

CA STD Control Suppression Rule
• Suppress any cell if
  – numerator ≠ 0 AND
  – 0 < (cell denominator – cell numerator) < 100
• AND, if so
  – Suppress any complementary cells necessary to avoid re-calculation of the suppressed cell
  – OR
  – Suppress all cells in a table if any cell meets the criteria above

Fresno County - 2006

Modoc County - 2006

Alpine County - 2006

Sierra County - 2006

Attribute Disclosure
Solano County - 2004

Recommendations
• Confidentiality Concerns
  – Assess real versus perceived risk
  – If real, determine best rule(s)
  – Proposition: suppress if:
    • Denominator – Numerator < 100 AND Numerator ≠ 0
    • If the denominator is unknown, estimate it reasonably or use a reasonable “numerator only” rule

?

Michael C. Samuel, Dr.P.H.
[email protected]
510.620.3198
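The suppression rule and the complementary suppression tables above can be illustrated with a minimal Python sketch. It is not the method used by CA STD Control or by Working Paper 22; the function names (`needs_suppression`, `recoverable_cells`), the pattern names, and the denominators used in any call are hypothetical and introduced only for illustration. The audit assumes that row and column totals are published, as in the slides, and checks only for exact recovery through linear combinations. It ignores the fact that counts are non-negative integers, which can narrow the plausible range of a suppressed cell further.

```python
from fractions import Fraction

# Counts from the slides (Working Paper 22, from Cox 1986).
# Rows: 15-19, 20-24, 25-34, 35+. Columns: Black, White, Hispanic, Asian/PI.
COUNTS = [
    [15, 1, 3, 1],
    [20, 10, 10, 15],
    [3, 10, 10, 2],
    [12, 14, 7, 2],
]

# Suppression patterns from the slides, as sets of (row, column) indices.
SIMPLE = {(0, 1), (0, 2), (0, 3), (2, 0), (2, 3), (3, 3)}
COMPLEMENTARY = {(0, 1), (0, 2), (0, 3), (1, 1), (1, 2),
                 (2, 0), (2, 3), (3, 0), (3, 3)}
PROTECTED = {(0, 1), (0, 2), (0, 3), (2, 0), (2, 1), (2, 3),
             (3, 0), (3, 2), (3, 3)}


def needs_suppression(numerator, denominator, gap=100):
    """Proposed rule: numerator != 0 AND 0 < (denominator - numerator) < gap."""
    return numerator != 0 and 0 < denominator - numerator < gap


def recoverable_cells(counts, suppressed):
    """Return {(row, col): value} for suppressed cells that the published
    cells and marginal totals determine exactly."""
    cells = sorted(suppressed)
    index = {cell: k for k, cell in enumerate(cells)}
    n = len(cells)
    n_rows, n_cols = len(counts), len(counts[0])

    # Each table row and column gives one equation:
    # sum of its suppressed cells = published total - published cells.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    eqs = []
    for line in lines:
        hidden = [cell for cell in line if cell in index]
        if hidden:
            eq = [Fraction(0)] * (n + 1)
            for cell in hidden:
                eq[index[cell]] = Fraction(1)
            eq[n] = Fraction(sum(counts[r][c] for r, c in hidden))
            eqs.append(eq)

    # Gauss-Jordan elimination to reduced row echelon form.
    pivots, row = [], 0
    for col in range(n):
        pivot = next((i for i in range(row, len(eqs)) if eqs[i][col] != 0), None)
        if pivot is None:
            continue
        eqs[row], eqs[pivot] = eqs[pivot], eqs[row]
        scale = eqs[row][col]
        eqs[row] = [v / scale for v in eqs[row]]
        for i in range(len(eqs)):
            if i != row and eqs[i][col] != 0:
                factor = eqs[i][col]
                eqs[i] = [a - factor * b for a, b in zip(eqs[i], eqs[row])]
        pivots.append((row, col))
        row += 1

    # A cell is exposed when its pivot row involves no other unknown.
    return {cells[col]: eqs[r][n] for r, col in pivots
            if all(eqs[r][k] == 0 for k in range(n) if k != col)}


for name, pattern in [("simple", SIMPLE), ("complementary", COMPLEMENTARY),
                      ("protected", PROTECTED)]:
    print(name, sorted(recoverable_cells(COUNTS, pattern)))
```

Under these assumptions, simple suppression alone leaves all six withheld cells recoverable, the first complementary pattern still exposes the 15-19 Asian/PI cell (value 1), and the “Protected by Suppression” pattern leaves no cell exactly recoverable.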