Confidentiality Issues with “Small Cell” Data
Michael C. Samuel, DrPH
STD Control Branch
California Department of Public Health
2008 National STD Prevention Conference
Confronting Challenges, Applying Solutions
Chicago, Illinois, March 10-13, 2008
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        1           3           1       20
20-24           20       10          10          15       55
25-34            3       10          10           2       25
35+             12       14           7           2       35
Total           50       35          30          20      135
* This is NOT the issue of “small cells” with expected value < 5 and their impact on chi-square tests
Confidentiality – Types of Disclosure
• Identity Disclosure
  – Identity of an individual can be determined based on the released data
    • Or …can reasonably be determined…
• Attribute Disclosure
  – Confidential information about an individual is revealed based on the released data
    • Or “sensitive” information; or “embarrassing” information
Extensive Literature
Key Resource:
Federal Committee on Statistical Methodology
Office of Management and Budget
http://www.fcsm.gov/working-papers/spwp22.html
Key Concepts
• Release of public health data
  – Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality
• If “significant” risks
  – “Statistical Disclosure Limitation”
• True Risk versus Perception of Risk
Disclosure Limitation with Tabular Data
• If cells are deemed sensitive based on a specified threshold rule
  – Alter the underlying “line-listed” data or “microdata” before the tables are constructed; this may be a particularly relevant technique for on-line query systems
  – Change the table: aggregate rows or columns
  – Suppress cells
Threshold Rules
• Numerator rule
  – e.g. cell size < 3, < 5 (many)
• Population denominator rule
  – e.g. population < 20,000 (HIPAA-based), < 50
• Numerator and population denominator rule
  – numerator > 10 AND denominator > 50 (Oregon cancer registry)
• Population denominator minus numerator rule
  – e.g. (population minus cell count) < 10 (Missouri)
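Three of these rule families can be sketched as simple predicates. This is a minimal illustration only, not any agency's actual implementation: the function names and the default thresholds are hypothetical choices taken from the examples on this slide, and each function returns True when a cell would be flagged as sensitive.

```python
def numerator_rule(count, min_count=5):
    """Flag a cell whose count falls below the minimum cell size (assumed 5)."""
    return count < min_count


def denominator_rule(population, min_population=20_000):
    """Flag a cell whose underlying population is too small (assumed 20,000)."""
    return population < min_population


def difference_rule(count, population, min_difference=10):
    """Flag a cell when (population minus cell count) is below the minimum (assumed 10)."""
    return population - count < min_difference


print(numerator_rule(3))          # cell of size 3 under a "< 5" rule
print(denominator_rule(15_000))   # population of 15,000 under a "< 20,000" rule
print(difference_rule(45, 50))    # 50 - 45 = 5 under a "< 10" rule
```

Each predicate looks at one cell in isolation; deciding which additional cells must be withheld so that a flagged cell cannot be recomputed is a separate step.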
Cell Suppression
• Simple Cell Suppression
• Random Rounding
• Controlled Rounding
• Controlled Tabular Adjustment
No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        1           3           1       20
20-24           20       10          10          15       55
25-34            3       10          10           2       25
35+             12       14           7           2       35
Total           50       35          30          20      135
Simple Suppression
Numbers from Working Paper 22, from Cox 1986
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20       10          10          15       55
25-34            s       10          10           s       25
35+             12       14           7           s       35
Total           50       35          30          20      135
s – data withheld to limit disclosure
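The simple suppression step can be sketched in a few lines. This is a hedged illustration: the threshold of 5 is an assumption inferred from which cells are withheld in the table above (every count below 5 is replaced, while 7 is kept), and the names `COUNTS` and `simple_suppress` are hypothetical.

```python
AGE_GROUPS = ["15-19", "20-24", "25-34", "35+"]

# Interior cells of the example table (Black, White, Hispanic, Asian/PI).
COUNTS = [
    [15, 1, 3, 1],
    [20, 10, 10, 15],
    [3, 10, 10, 2],
    [12, 14, 7, 2],
]


def simple_suppress(counts, threshold=5):
    """Replace every count below the threshold with 's'; leave other cells unchanged."""
    return [["s" if value < threshold else value for value in row] for row in counts]


suppressed = simple_suppress(COUNTS)
for age_group, row in zip(AGE_GROUPS, suppressed):
    print(age_group, row)
```

Because the row and column totals are still published, every withheld value in this table can be recovered by subtraction, which is why complementary suppression is considered next.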
Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20        s           s          15       55
25-34            s       10          10           s       25
35+              s       14           7           s       35
Total           50       35          30          20      135
s – data withheld to limit disclosure
Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20        s           s          15       55
25-34            s       10          10           s       25
35+              s       14           7           s       35
Total           50       35          30          20      135
s – data withheld to limit disclosure
The suppressed 15-19 Asian/PI cell = 1 based on linear combinations of the published row and column totals
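The linear combination can be worked through with the published numbers alone. In the sketch below (variable names are illustrative), the 15-19 and 20-24 rows together contain five suppressed cells, four of which are exactly the suppressed cells in the upper part of the White and Hispanic columns; subtracting one residual from the other isolates the 15-19 Asian/PI cell.

```python
# Published totals and published (non-suppressed) cells from the table above.
row_15_19_total, row_15_19_published = 20, [15]
row_20_24_total, row_20_24_published = 55, [20, 15]
white_total, white_published = 35, [10, 14]
hispanic_total, hispanic_published = 30, [10, 7]

# Sum of the five suppressed cells in the 15-19 and 20-24 rows.
rows_residual = (row_15_19_total - sum(row_15_19_published)) + (
    row_20_24_total - sum(row_20_24_published)
)

# Sum of the four suppressed cells in the White and Hispanic columns.
columns_residual = (white_total - sum(white_published)) + (
    hispanic_total - sum(hispanic_published)
)

# The only cell counted in the rows but not in the columns is 15-19 Asian/PI.
asian_pi_15_19 = rows_residual - columns_residual
print(rows_residual, columns_residual, asian_pi_15_19)  # 25 24 1
```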
Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986
                      Race/Ethnicity
Age Group    Black    White    Hispanic    Asian/PI    Total
15-19           15        s           s           s       20
20-24           20       10          10          15       55
25-34            s        s          10           s       25
35+              s       14           s           s       35
Total           50       35          30          20      135
s – data withheld to limit disclosure
Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern
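One way to picture such an audit is to compute, for each withheld cell, the smallest and largest value consistent with everything that is published. The sketch below is a brute-force illustration under stated assumptions, not one of the established methods: it assumes cell values are non-negative integers, it enumerates every consistent filling rather than using an optimization technique such as linear programming, and the function name `audit_bounds` is hypothetical. A cell whose minimum equals its maximum is disclosed exactly.

```python
def audit_bounds(published, row_totals, col_totals):
    """Return {(row, col): (min, max)} for every suppressed (None) cell.

    Enumerates all non-negative integer fillings of the suppressed cells
    that reproduce the published row and column totals.
    """
    n_rows, n_cols = len(published), len(published[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)
             if published[r][c] is None]
    # Amount each row/column total still has to be covered by suppressed cells.
    row_left = [row_totals[r] - sum(v for v in published[r] if v is not None)
                for r in range(n_rows)]
    col_left = [col_totals[c] - sum(published[r][c] for r in range(n_rows)
                                    if published[r][c] is not None)
                for c in range(n_cols)]
    # Index of the last suppressed cell visited in each row and column.
    last_in_row = {r: i for i, (r, _) in enumerate(cells)}
    last_in_col = {c: i for i, (_, c) in enumerate(cells)}
    current, bounds = {}, {}

    def search(i):
        if i == len(cells):  # a complete, consistent filling was found
            for cell, value in current.items():
                low, high = bounds.get(cell, (value, value))
                bounds[cell] = (min(low, value), max(high, value))
            return
        r, c = cells[i]
        for value in range(min(row_left[r], col_left[c]) + 1):
            # The last suppressed cell in a row/column must use up the remainder.
            if last_in_row[r] == i and value != row_left[r]:
                continue
            if last_in_col[c] == i and value != col_left[c]:
                continue
            row_left[r] -= value
            col_left[c] -= value
            current[(r, c)] = value
            search(i + 1)
            row_left[r] += value
            col_left[c] += value

    search(0)
    return bounds


# The "protected by suppression" pattern from the table above (None = s).
published = [
    [15, None, None, None],
    [20, 10, 10, 15],
    [None, None, 10, None],
    [None, 14, None, None],
]
bounds = audit_bounds(published, [20, 55, 25, 35], [50, 35, 30, 20])
for cell, (low, high) in sorted(bounds.items()):
    print(cell, low, high)
```

Under these assumptions every suppressed cell in this pattern has more than one feasible value, so none can be recovered exactly from the totals. Whether a given range is wide enough to count as protection is a policy decision the sketch does not address.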
Los Angeles County - 2006
CA STD Control Suppression Rule
• Suppress any cell if
  – numerator ≠ 0 AND
  – 0 < (cell denominator – cell numerator) < 100
• AND, if so
  – Suppress any complementary cells necessary to avoid re-calculation of the suppressed cell
  – OR
  – Suppress all cells in the table if any cell meets the criteria above
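The cell-level test and the "suppress the whole table" option can be sketched as follows. This is an illustration of the rule as stated on the slide, not the Branch's production code; the function names and the representation of a table as a list of (numerator, denominator) pairs are assumptions.

```python
def needs_suppression(numerator, denominator):
    """Apply the cell-level test: numerator != 0 AND 0 < (denominator - numerator) < 100."""
    return numerator != 0 and 0 < (denominator - numerator) < 100


def suppress_whole_table(cells):
    """Return the published numerators, or 's' for every cell if any cell fails the test.

    `cells` is a list of (numerator, denominator) pairs (a hypothetical layout).
    """
    if any(needs_suppression(n, d) for n, d in cells):
        return ["s"] * len(cells)
    return [n for n, _ in cells]


print(needs_suppression(3, 50))                   # 50 - 3 = 47, nonzero numerator
print(needs_suppression(0, 50))                   # zero numerator is never suppressed
print(suppress_whole_table([(3, 50), (10, 5000)]))
```

The whole-table option avoids having to choose complementary cells, at the cost of withholding cells that would have been safe to release on their own.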
Fresno County - 2006
Modoc County - 2006
Alpine County - 2006
Sierra County - 2006
Attribute Disclosure
Solano County - 2004
Recommendations
• Confidentiality Concerns
  – Assess real versus perceived risk
  – If real, determine best rule(s)
  – Proposition: suppress if:
    • Denominator – Numerator < 100 AND Numerator ≠ 0
    • If denominator unknown, estimate reasonably or use a reasonable “numerator only” rule
Questions?
Michael C. Samuel, Dr.P.H.
510.620.3198