Private Learning and
Sanitization: Pure vs.
Approx. Differential Privacy
Uri Stemmer
Ben-Gurion University
Join work with Amos Beimel and Kobbi Nissim
Why Private Learners?
Database
z1
z2
z3
๏
zn-1
zn
Users
Learning
Algorithm
๐
Government,
Businesses,
Researchers
(or)
Malicious
adversary
Often, this algorithmic task can be abstracted as a learning problem:
โข Bank is interested in predicting (based on past customers) whether
new customers are good/bad credit
Approx. Differential Privacy
Dwork, McSherry, Nissim, Smith 2006
Changing one record does not change the output
distribution โtoo muchโ
z1
z2
z3
๏
zn-1
zn
๐
z1
z2
zโ3
๐ ๐บ๐
๏
zn-1
zn
๐
๐ ๐บ๐
Approx. Differential Privacy
Dwork, McSherry, Nissim, Smith 2006
Changing one record does not change the output
distribution โtoo muchโ
A (rand) algorithm ๐ is ๐, ๐น differentially
private if for all neighboring databases ๐บ๐ , ๐บ๐
and for all sets of outputs ๐ญ:
๐๐ซ ๐ ๐บ๐ โ ๐ญ โค โ
๐๐ โ
๐๐ซ ๐ ๐บ๐ โ ๐ญ + ๐น
Approx.
Pure Differential Privacy
Dwork, McSherry, Nissim, Smith 2006
Changing one record does not change the output
distribution โtoo muchโ
A (rand) algorithm ๐ is ๐, ๐น differentially
private if for all neighboring databases ๐บ๐ , ๐บ๐
and for all sets of outputs ๐ญ:
๐๐ซ ๐ ๐บ๐ โ ๐ญ โค ๐๐ โ
๐๐ซ ๐ ๐บ๐ โ ๐ญ + ๐น
Approx. Differential Privacy
Dwork, McSherry, Nissim, Smith 2006
Dwork, Kenthapadi, McSherry, Mironov, Naor 2006
Changing one record does not change the output
distribution โtoo muchโ
A (rand) algorithm ๐ is ๐, ๐น differentially
private if for all neighboring databases ๐บ๐ , ๐บ๐
and for all sets of outputs ๐ญ:
๐๐ซ ๐ ๐บ๐ โ ๐ญ โค ๐๐ โ
๐๐ซ ๐ ๐บ๐ โ ๐ญ + ๐น
Our Results:
โข Sample complexity of Private Learning and
Sanitization can be drastically smaller if we settle
for approximate differential privacy.
โข Label Privacy [Chaudhuri and Hsu 2011]
Learning model with weakened privacy demands.
We settle the question of sample complexity: O(VC).
โ Same as non-private learning.
โ Not is this talk.
โข Natural connection between Private Learning and
Sanitization, leads to lower bounds on Sanitization.
โ Not in this talk.
What is Private Learning?
Kasiviswanathan, Lee, Nissim, Raskhodnikova, Smith 08
Definition:
PAC Learning
+
Differential Privacy
__________________
Private Learning
โPACโ Model
โข
[Valiant 84]
Domain ๐ฟ.
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
โPACโ Model
โข
โข
[Valiant 84]
Domain ๐ฟ.
Set ๐ of boolean functions over ๐ฟ.
- for example: ๐๐๐๐๐๐๐๐๐
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
โPACโ Model
โข
โข
[Valiant 84]
Domain ๐ฟ.
Set ๐ of boolean functions over ๐ฟ.
- for example: ๐๐๐๐๐๐๐๐๐
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
โPACโ Model
[Valiant 84]
โข
โข
Domain ๐ฟ.
Set ๐ of boolean functions over ๐ฟ.
โข
Labeled sample.
- for example: ๐๐๐๐๐๐๐๐๐
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
โPACโ Model
[Valiant 84]
โข
โข
Domain ๐ฟ.
Set ๐ of boolean functions over ๐ฟ.
โข
โข
Labeled sample.
Output a classifier ๐.
- for example: ๐๐๐๐๐๐๐๐๐
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
โPACโ Model
[Valiant 84]
โข
โข
Domain ๐ฟ.
Set ๐ of boolean functions over ๐ฟ.
โข
โข
Labeled sample.
Output a classifier ๐.
- for example: ๐๐๐๐๐๐๐๐๐
๐
๐
๐ฟ
๐ ๐ โฆ
๐๐
Related work in Private
Learning (partial list)
[BDMN 05] First private learning algorithms. SQ based.
[KLNRS 08] Define private learning, and showed:
Every class ๐ can be privately learned using ๐ฅ๐จ๐ ๐ labeled
samples.
[BKN 10] Sample complexity of private learning.
[CH 11] Learning in continuous domain, label privacy.
[CM 08, CMS 11, KST 12] Machine learning.
[BLR 08, DNRRV 09, โฆ] Synthetic Data.
[DRV 10] Private Boosting.
Running Example: INTERVALd
๐๐
1
๐๐ ๐ = ๐ โบ ๐ < ๐
0
1 2 3 โฆ
Facts:
๐
โฆ 2๐
โข non-private proper learner with ๐ถ ๐ samples.
โข ๐-private proper learner: ๐ฏ ๐
samples [BBKN 10].
Running Example: INTERVALd
๐๐
1
๐๐ ๐ = ๐ โบ ๐ < ๐
0
1 2 3 โฆ
๐
โฆ 2๐
Facts:
โข non-private proper learner with ๐ถ ๐ samples.
โข ๐-private proper learner: ๐ฏ ๐
samples [BBKN 10].
We show:
๐, ๐น -private proper learner with ๐๐ถ
๐ฅ๐จ๐ โ ๐
samples.
Privately Learning intervals:
Facts: Ideas and Intuition.
โข non-private proper learner with ๐ถ ๐ samples.
โข ๐-private proper learner: ๐ฏ ๐
samples [BBKN 10].
We show:
๐, ๐น -private proper learner with ๐๐ถ
๐ฅ๐จ๐ โ ๐
samples.
The Goal:
Given a labeled sample, choose
a concept with small error.
๐
๐
๐ฟ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
๐ฎ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
๐ฎ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
๐ฎ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
๐ฎ
4-good interval ๐ฎ
Assume we can (privately) obtain an interval ๐ฎ โ ๐ฟ s.t.
โข
โข
Contains โa lotโ of ones, and โa lotโ of zeroes.
Every interval ๐ฐ โ ๐ฟ of length โค |๐ฎ|/๐ either does not
contain โtoo manyโ ones or does not contain โtoo manyโ
zeroes.
๐
๐
๐ฟ
๐ฎ
4-good interval ๐ฎ โ done!
โข
Divide ๐ฎ into 4 equal intervals, and define 5 โequally spreadโ
concepts in ๐ฎ.
โข
At least one concept has small error.
๐๐
๐๐
๐๐
๐๐
๐๐
๐ฟ
๐ฎ
4-good interval ๐ฎ โ done!
โข
Divide ๐ฎ into 4 equal intervals, and define 5 โequally spreadโ
concepts in ๐ฎ.
โข
At least one concept has small error.
๐๐
๐๐
๐๐
๐๐
๐๐
๐ฟ
๐ฎ
4-good interval ๐ฎ โ done!
โข
Divide ๐ฎ into 4 equal intervals, and define 5 โequally spreadโ
concepts in ๐ฎ.
โข
โข
At least one concept has small error.
Choose one using the Exp. Mechanism [McSherry and Talwar 07]
(requires ๐ถ ๐ samples).
๐๐
๐๐
๐๐
๐๐
๐๐
๐ฟ
๐ฎ
4-good interval ๐ฎ โ done!
โข
Divide ๐ฎ into 4 equal intervals, and define 5 โequally spreadโ
concepts in ๐ฎ.
โข
โข
At least one concept has small error.
Choose one using the Exp. Mechanism [McSherry and Talwar 07]
(requires ๐ถ ๐ samples).
Conclusion: suffices to find a 4-good interval.
๐๐
๐๐
๐๐
๐๐
๐๐
๐ฟ
๐ฎ
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
๐ฟ
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฎ
๐จ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฎ
๐จ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฎ
๐จ๐
๐จ๐
๐จ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฎ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐ฟ
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
โข
โข
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
Say ๐ฎ โ ๐จ๐ . Then ๐จ๐ contains โlotsโ of ones and zeroes.
Every other ๐จ๐ cannot contain both ones and zeroes.
Look for ๐จ๐ with โlotsโ of ones and zeroes.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฎ
๐จ๐
๐จ๐
๐จ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
At least one interval contains ๐ฎ.
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฉ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
โข
At least one interval contains ๐ฎ.
Choose an interval using Adist
๐ฉ๐
๐ฉ๐
[ST 2013]
๐ฉ๐
(requires ๐ถ ๐ samples).
๐ฉ๐
๐ฉ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
โข
โข
At least one interval contains ๐ฎ.
Choose an interval using Adist
[ST 2013]
The chosen interval is of length ๐ ๐ฎ
๐ฉ๐
๐ฉ๐
๐ฉ๐
(requires ๐ถ ๐ samples).
โน
4-good !
๐ฉ๐
๐ฉ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Finding a 4-good interval
Assume we can (privately) obtain a Jโ โ s.t. there exists a 2-good
interval ๐ฎ of length J.
โข
Divide ๐ฟ into intervals ๐จ๐ and ๐ฉ๐ of length 2J, where the
{๐ฉ๐ }โs are right-shifted by J.
โข
โข
โข
At least one interval contains ๐ฎ.
Choose an interval using Adist
[ST 2013]
The chosen interval is of length ๐ ๐ฎ
(requires ๐ถ ๐ samples).
โน
4-good !
Conclusion: suffices to find a length J
๐ฉ๐ of a
๐ฉ๐2-good
๐ฉ๐ interval.
๐ฉ๐
๐ฉ๐
๐ฟ
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
๐จ๐
Computing the length J
Easy solution:
โข Noisy binary search on ๐ โค J โค ๐๐
.
โข ๐
noisy comparisons requires ๐
samples.
Better solution:
โข Noisy binary search on the power ๐ โค ๐ โค ๐ฅ๐จ๐ ๐ฟ of a 2-good
interval of length J= ๐๐ .
โข ๐ฅ๐จ๐ ๐ฅ๐จ๐ ๐ฟ noisy comparisons requires ๐ฅ๐จ๐ ๐ฅ๐จ๐ ๐ฟ samples.
In the paper:
Use recursion on binary search and significantly reduce the
costs.
Computing the length J
Easy solution:
โข Noisy binary search on ๐ โค J โค ๐๐
.
โข ๐
noisy comparisons requires ๐
samples.
Better solution:
โข Noisy binary search on the power ๐ โค ๐ โค ๐ฅ๐จ๐ ๐
of a 2-good
interval of length J= ๐๐ .
โข ๐ฅ๐จ๐ ๐
noisy comparisons requires ๐ฅ๐จ๐ ๐
samples.
In the paper:
Use recursion on binary search and significantly reduce the
costs.
Computing the length J
Easy solution:
โข Noisy binary search on ๐ โค J โค ๐๐
.
โข ๐
noisy comparisons requires ๐
samples.
Better solution:
โข Noisy binary search on the power ๐ โค ๐ โค ๐ฅ๐จ๐ ๐
of a 2-good
interval of length J= ๐๐ .
โข ๐ฅ๐จ๐ ๐
noisy comparisons requires ๐ฅ๐จ๐ ๐
samples.
In the paper:
Use recursion on binary search and significantly reduce the
costs.
Theorem:
There exists an ๐, ๐ฟ -private learner for
โ๐
๐ฅ๐จ๐
๐๐๐๐๐๐๐๐๐
with sample complexity ๐
.
Summary and Open Problems
โข What we saw:
Efficient ๐, ๐น -private learner for ๐๐๐๐๐๐๐๐๐
with low
sample complexity.
โ This separates the sample complexity of ๐, ๐ฟ -private and
๐-private learners.
โข Other results:
โ Efficient ๐, ๐น -private for other concept classes with
even lower sample complexity (independent of the
domain).
โ Similar results for Data Sanitization.
โข Open problem:
Lower bounds on the sample complexity of ๐, ๐น -private
learners?
© Copyright 2026 Paperzz