Anonymizing Data with Quasi-Sensitive Attribute Values

Pu Shi¹, Li Xiong¹, Benjamin C. M. Fung²
¹Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
²CIISE, Concordia University, Montreal, QC, Canada

Problem Statement

We study the problem of anonymizing microdata with quasi-sensitive (QS) attributes, which are not sensitive by themselves but can be linked to external knowledge to indirectly reveal sensitive information about an individual.

(a) Original microdata with the quasi-sensitive attribute "symptoms"
(b) External knowledge that maps symptoms to diseases
(c) A generalized table that cannot prevent indirect disclosure of disease through symptoms

Preliminary Results

With Mondrian generalization and our suppression algorithm implemented in C++, we conducted experiments with: 1) a dataset of 3000 tuples augmented from the Adult dataset, with 8 QI attributes and 9 synthesized QS terms per tuple, and 2) an external table of 3000 knowledge labels linked to random QS terms with a Poisson distribution.

Definitions

The external knowledge table E has one pair $(L_i, S_i)$ per row, $i = 1, 2, \dots, |E|$, where $L_i$ is a sensitive label and $S_i$ is the corresponding set of QS values. The sensitive label set of a QI group G with quasi-identifying (QI) vector q is $\bigcup_{i=1}^{d} K(tp_i)$: all sensitive labels that can be linked to the d tuples $tp_1, \dots, tp_d$ in G. The attacker's prior belief $\alpha(q, L)$ and posterior belief $\beta(q, L)$ are the probabilities that a target tp with QI vector q is linked to a label L before and after the data release, respectively.

Definition (QS (c,l)-diversity). A group G satisfies QS (c,l)-diversity if and only if $p_1 \le c\,(p_l + p_{l+1} + \dots + p_{|\bigcup_{i=1}^{d} K(tp_i)|})$, where $p_1, p_2, \dots, p_{|\bigcup_{i=1}^{d} K(tp_i)|}$ are the values of $\beta(q, L_i)$ in decreasing order. A table $D^*$ satisfies QS (c,l)-diversity if every group satisfies QS (c,l)-diversity.

Definition (QS t-closeness). A group G satisfies QS t-closeness if and only if the distance between $\alpha(q, L)$ and $\beta(q, L)$ is no more than a threshold t.
A table $D^*$ satisfies QS t-closeness if every group satisfies QS t-closeness.

Contributions

• Formal notions of QS l-diversity and QS t-closeness that extend l-diversity and t-closeness to prevent indirect attribute disclosure through QS attribute values.
• A two-phase algorithm that combines generalization and value suppression to achieve QS l-diversity and QS t-closeness.
• Greedy search heuristics that dynamically reorder tailsets (the candidate values to be removed in the next step) so that a result is returned quickly.
• Dynamic updates whenever a lower-cost solution is found, so that the result keeps improving within a bounded time period.

Algorithm

Phase 1 (QI generalization). Given D, an intermediate dataset $D_g$ that satisfies k-anonymity is obtained.

Phase 2 (QS suppression). Given $D_g$, a suppression algorithm removes selected QS values (items) until every QI group satisfies QS (c,l)-diversity or QS t-closeness.

Figure 1. Anonymizing data with QS attributes
Figure 2. Disclosure risks with QS attributes
Figure 3. QS suppression search tree and algorithm features
Figure 4. QS suppression for QS (c,l)-diversity: adaptive QS suppression significantly outperforms the baseline DFS search.
Figure 5. Two-phase algorithm for QS t-closeness: the trade-off between stronger privacy and smaller removal cost, and the benefit of the two-phase algorithm over a generalization-only approach.
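The QS (c,l)-diversity condition and the Phase 2 suppression loop can be illustrated with a minimal Python sketch. It rests on several assumptions that the poster does not state: a label $L_i$ is treated as linked to a tuple when its whole set $S_i$ appears among the tuple's QS values, the posterior belief is taken to be the fraction of tuples in the group linked to that label, and every $S_i$ is non-empty. The suppression routine is a naive greedy stand-in that removes one item per step; it is not the adaptive search with tailset reordering described above, and all function names and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

External = Dict[str, FrozenSet[str]]  # hypothetical E: label L_i -> QS value set S_i
Group = List[Set[str]]                # QS values of each tuple in one QI group


def linked_labels(qs_values: Set[str], external: External) -> Set[str]:
    """Assumed K(t): labels whose whole set S_i is contained in the tuple's QS values."""
    return {label for label, s in external.items() if s <= qs_values}


def posterior_beliefs(group: Group, external: External) -> List[float]:
    """Assumed beta(q, L): share of the group's tuples linked to L, in decreasing order."""
    counts: Dict[str, int] = {}
    for qs_values in group:
        for label in linked_labels(qs_values, external):
            counts[label] = counts.get(label, 0) + 1
    return sorted((n / len(group) for n in counts.values()), reverse=True)


def violation(group: Group, external: External, c: float, l: int) -> float:
    """p_1 - c * (p_l + ... + p_m); the group is QS (c,l)-diverse when this is <= 0."""
    p = posterior_beliefs(group, external)
    # With no linkable label nothing can be disclosed, so report no violation (assumption).
    return p[0] - c * sum(p[l - 1:]) if p else 0.0


def satisfies_cl_diversity(group: Group, external: External, c: float, l: int) -> bool:
    return violation(group, external, c, l) <= 0


def without(group: Group, i: int, value: str) -> Group:
    """Copy of the group with one QS value suppressed in tuple i only."""
    return [t - {value} if j == i else set(t) for j, t in enumerate(group)]


def greedy_suppress(group: Group, external: External, c: float, l: int
                    ) -> Tuple[Group, List[Tuple[int, str]]]:
    """Naive greedy stand-in for Phase 2: suppress one (tuple, value) item per step."""
    group = [set(t) for t in group]  # work on a copy
    removed: List[Tuple[int, str]] = []
    while not satisfies_cl_diversity(group, external, c, l):
        # Sorted candidates make tie-breaking deterministic.
        candidates = sorted((i, v) for i, t in enumerate(group) for v in t)
        # Pick the item whose removal leaves the smallest remaining violation.
        i, v = min(candidates,
                   key=lambda iv: violation(without(group, *iv), external, c, l))
        group = without(group, i, v)
        removed.append((i, v))
    return group, removed


# Hypothetical example data, not taken from the poster's figures.
external = {
    "flu": frozenset({"fever", "cough"}),
    "diabetes": frozenset({"thirst", "fatigue"}),
}
group = [{"fever", "cough"}, {"fever", "cough", "thirst", "fatigue"}, {"thirst"}]

print(satisfies_cl_diversity(group, external, c=1, l=2))  # False: beliefs are 2/3 and 1/3
anonymized, removed = greedy_suppress(group, external, c=1, l=2)
print(removed)  # [(0, 'cough')]
```

In this toy group, suppressing a single item from the first tuple equalizes the two beliefs at 1/3, which satisfies the condition for c = 1 and l = 2. The loop always terminates because each step removes one item and a group with no linkable labels counts as satisfying the condition, but the greedy choice gives no guarantee of minimal removal cost.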