Tragedy of the Data Commons

Tragedy of the
Deidentified Data Commons
An Appeal for Transparency and Access
Jane Bambauer
James E. Rogers College of Law
University of Arizona
The Data Commons
Information collected by the government
tax information, epidemiological data, census surveys,
educational records, home mortgage data
Information collected by private companies
Anonymized and released*
The Anonymization Problem
Paul Ohm, Broken Promises of Privacy
57 UCLA L. REV. 1701
• Research subjects can be reidentified in anonymized
databases “with astonishing ease.”
AOL search log reidentification
Gov. Weld reidentification
Netflix reidentification
• Every privacy law must be rewritten to eliminate
dependence on anonymization and to restrict access to
all data (even deidentified data) without consent
Save the Data Commons
The Data Commons has been used to:
• Detect housing and employment discrimination
• Debunk the myth of the “welfare queen”
• Inform the healthcare and
mortgage lending policy debates
• Correct longstanding
misconceptions about crime
and law enforcement
• Lots more…
Jane Yakowitz, Tragedy of the Data Commons
Hazards of Covert Noise-Adding
Exaggerated Risks of Reidentification
The Gov. Weld Example
Gov. Weld Reidentification
Latanya Sweeney collected Gov. Weld’s voter registration
information and publicly available hospital data
Only one hospital patient matched Gov. Weld’s
DOB, zip, and gender
Conclusion from analysis of US Census data:
87% of the U.S. population can be uniquely identified from DOB, zip, and gender
Golle recalculations:
63% are unique using DOB, zip, and gender
Daniel Barth-Jones, “Reidentification” of Governor William Weld
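The 87% and 63% figures above both measure the same thing: the share of people whose combination of DOB, zip, and gender is shared with no one else in the population. A minimal sketch of that calculation follows. The function name, field names, and toy records are hypothetical illustrations, not the data or method used by Sweeney or Golle.

```python
from collections import Counter

def unique_fraction(records, keys=("dob", "zip", "gender")):
    """Share of records whose quasi-identifier combination appears exactly once."""
    if not records:
        return 0.0
    # Count how many records share each (dob, zip, gender) combination.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical toy population (made-up values, for illustration only).
people = [
    {"dob": "1950-01-01", "zip": "00001", "gender": "M"},
    {"dob": "1950-01-01", "zip": "00001", "gender": "F"},
    {"dob": "1980-06-15", "zip": "00002", "gender": "F"},
    {"dob": "1980-06-15", "zip": "00002", "gender": "F"},
]

print(unique_fraction(people))  # 0.5: the first two records are unique, the last two collide
```

Being unique in a dataset is only a precondition for reidentification. An intruder still needs a second, identified source that covers the same people, which is why the size and completeness of the comparison population matter.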
Sweeney et al. 2013 PGP Study
579 Personal Genome Project participants provided their
DOB, zip code, and gender
Using voter registration records and other commercial data
sources, Sweeney et al. were able to reidentify 28%
(accuracy unclear)
2009 ONC Study
Out of 15,000 HIPAA-compliant records, 2 could be
reidentified
0.013% Chance of Reidentification
For comparison’s sake, chance of dying in an auto
accident this year: 0.017%
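The percentage above follows directly from the ONC study counts (2 reidentified records out of 15,000). A short sketch of the arithmetic, with the auto accident figure taken as given from the slide rather than derived here; the variable names are illustrative:

```python
reidentified = 2
records = 15_000

# Reidentification risk as a fraction of released records.
risk = reidentified / records
print(f"{risk:.4%}")  # 0.0133%

# Comparison figure quoted on the slide (assumed, not computed here).
auto_death_risk = 0.017 / 100
print(risk < auto_death_risk)  # True
```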
Total Number of Known
Malicious Reidentifications
0 or 1*
If I Were a Malicious Intruder…
3,101 reported data breaches in the U.S.
(about half a billion records)
700 reported breaches of health records
If I Were a Malicious Intruder…
Sift through Garbage
Make Inferences from Facebook Profiles
Swab a Coffee Cup
What We Have to Lose
• Fewer Opportunities for Replication
• Fewer Voluntary Research Databases
• Fewer Involuntary Public Databases
• Increased Regulatory Precautions
• More Status Quo Bias
Vioxx “What If” Study
From Richard Platt’s FDA testimony in 2007
Vioxx approved May, 1999
Removed from market September, 2004 (64 months)
With data on 7 million patients, risk detectable in 34 months
With data on 100 million patients, risk detectable in 3 months
88,000-139,000 avoidable heart attacks
27,000-55,000 avoidable deaths