Anti-discrimination and privacy protection in released datasets
Sara Hajian, Josep Domingo-Ferrer

Data mining
• There are negative social perceptions about data mining, among them its potential for:
  • Privacy invasion
  • Discrimination

Discrimination
• Discrimination is the unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit.

Discrimination
• Example: U.S. federal laws prohibit discrimination on the basis of:
  • Race, color, religion, nationality, sex, marital status, age, pregnancy
• In a number of settings:
  • Credit/insurance scoring
  • Sale, rental, and financing of housing
  • Personnel selection and wages
  • Access to public accommodations, education, nursing homes, adoptions, and health care

Discrimination
• Discrimination can be either direct or indirect:
  • Direct discrimination occurs when decisions are made based on sensitive attributes.
  • Indirect discrimination occurs when decisions are made based on non-sensitive attributes that are strongly correlated with biased sensitive ones.

Discrimination in data mining
• Automated data collection and data mining techniques such as classification rule mining have paved the way to automated decision making:
  • Loan granting/denial
  • Insurance premium computation
  • Personnel selection and wages

Discrimination in data mining
• If the training datasets are biased with regard to discriminatory attributes such as gender, race, or religion, discriminatory decisions may ensue.
• Anti-discrimination techniques have been introduced in data mining:
  • Discrimination discovery
  • Discrimination prevention

Discrimination in data mining
• Discrimination discovery
  • Consists of supporting the discovery of discriminatory decisions hidden, either directly or indirectly, in a dataset of historical decision records.

Discrimination discovery
• Different measures of the discriminatory power of the mined decision rules can be defined, according to the provisions of different anti-discrimination regulations:
  • Extended lift (elift)
  • Selection lift (slift)
• (A toy sketch of both measures is given in Appendix A, after the closing slide.)

Discrimination in data mining
• Discrimination prevention
  • Consists of inducing patterns that do not lead to discriminatory decisions even if trained from a dataset containing them.

Discrimination prevention
• How can we train an unbiased classifier when the training data are biased?
• As with privacy, the challenge is to find an optimal trade-off between (measurable) protection against unfair discrimination and (measurable) utility of the data/models for data mining.

Discrimination prevention
• Methods:
  • Transform the source data
  • Modify the data mining algorithms
  • Modify the discriminatory models

The framework
• The framework for discrimination prevention can be described in terms of two phases:
  • Discrimination measurement
  • Data transformation

Data transformation
• The purpose is to transform the original data DB so as to remove direct and/or indirect discriminatory biases, with minimum impact
  • on the data and
  • on legitimate decision rules,
• so that no unfair decision rule can be mined from the transformed data.
• (A minimal transformation sketch is given in Appendix B, after the closing slide.)

Data transformation
• As part of this effort, metrics should be developed that specify
  • which records should be changed,
  • how many records should be changed, and
  • how those records should be changed during data transformation.

Utility measures
• Measuring direct discrimination removal
• Measuring indirect discrimination removal
• Measuring data quality:
  • Misses Cost (MC)
  • Ghost Cost (GC)
• (A sketch of MC and GC is given in Appendix C, after the closing slide.)

Thanks for your attention
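Appendix A: elift and slift on a toy example
• The snippet below is a minimal sketch (not the authors' implementation) of how extended lift and selection lift can be computed for a classification rule A, B -> C, where A is a potentially discriminatory itemset (e.g. gender=female) and B is a non-discriminatory context: elift divides the rule's confidence by the confidence of B -> C, while slift divides it by the confidence of not-A, B -> C. All counts and the threshold alpha are invented for illustration.

```python
def confidence(supp_premise_and_class, supp_premise):
    """conf(X -> C) = supp(X, C) / supp(X)."""
    return supp_premise_and_class / supp_premise

# Toy support counts (invented) for the rule A, B -> C, with
# A = "gender=female", B = "city=NYC", C = "credit=deny"
n_AB, n_ABC = 60, 45            # records matching A,B and A,B,C
n_B, n_BC = 200, 90             # records matching B and B,C
n_notAB, n_notABC = 140, 45     # records matching not-A,B and not-A,B,C

conf_rule = confidence(n_ABC, n_AB)         # conf(A,B -> C)     = 0.75
conf_base = confidence(n_BC, n_B)           # conf(B -> C)       = 0.45
conf_comp = confidence(n_notABC, n_notAB)   # conf(not-A,B -> C) ~ 0.32

elift = conf_rule / conf_base   # extended lift  ~ 1.67
slift = conf_rule / conf_comp   # selection lift ~ 2.33

alpha = 1.2   # discrimination threshold fixed by the analyst/regulation
print(f"elift = {elift:.2f}, slift = {slift:.2f}")
print("alpha-discriminatory w.r.t. elift:", elift >= alpha)
```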
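Appendix B: a minimal direct data transformation sketch
• One simple way (a sketch under assumptions, not necessarily the exact algorithm of the underlying papers) to neutralise a direct alpha-discriminatory rule A, B -> C is to flip the class label of records supporting (A, B, C) until the rule's elift drops below alpha. Each flip also lowers conf(B -> C), since an (A,B,C) record is also a (B,C) record; the loop accounts for that.

```python
def elift(n_abc, n_ab, n_bc, n_b):
    """elift(A,B -> C) = conf(A,B -> C) / conf(B -> C)."""
    conf_rule = n_abc / n_ab
    conf_base = n_bc / n_b
    return conf_rule / conf_base if conf_base > 0 else 0.0

def direct_rule_protection(n_abc, n_ab, n_bc, n_b, alpha):
    """Flip the class of (A,B,C) records one at a time until the rule
    A,B -> C is no longer alpha-discriminatory; return the number of
    flipped records (the data impact) and the resulting elift."""
    flips = 0
    while n_abc > 0 and elift(n_abc, n_ab, n_bc, n_b) >= alpha:
        n_abc -= 1   # the record's class changes from C to not-C ...
        n_bc -= 1    # ... so the (B,C) count drops as well
        flips += 1
    return flips, elift(n_abc, n_ab, n_bc, n_b)

# Same toy counts as in Appendix A, with threshold alpha = 1.2
flips, new_elift = direct_rule_protection(45, 60, 90, 200, alpha=1.2)
print(f"flipped {flips} class labels, new elift = {new_elift:.2f}")
```
• The number of flipped records is one concrete instance of the metrics asked for on the second Data transformation slide: which records to change, how many, and how.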
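Appendix C: Misses Cost and Ghost Cost
• The functions below sketch MC and GC as they are commonly defined in the rule-hiding literature: MC is the share of rules minable from the original data that can no longer be mined from the transformed data, and GC is the share of rules minable from the transformed data that did not exist in the original. The rule sets in the example are hypothetical.

```python
def misses_cost(original_rules, transformed_rules):
    """MC: fraction of legitimate rules lost by the transformation."""
    if not original_rules:
        return 0.0
    lost = original_rules - transformed_rules
    return len(lost) / len(original_rules)

def ghost_cost(original_rules, transformed_rules):
    """GC: fraction of artefact rules created by the transformation."""
    if not transformed_rules:
        return 0.0
    ghosts = transformed_rules - original_rules
    return len(ghosts) / len(transformed_rules)

# Hypothetical frequent classification rules, written as plain strings
R_original = {"B -> C", "A,B -> C", "D -> C", "E -> notC"}
R_transformed = {"B -> C", "D -> C", "E -> notC", "F -> C"}

print(f"MC = {misses_cost(R_original, R_transformed):.2f}")   # 0.25: one rule lost
print(f"GC = {ghost_cost(R_original, R_transformed):.2f}")    # 0.25: one ghost rule
```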