Some reflections
on the Social Dynamics of Privacy
24 February 2017
ECLT – Ca’ Foscari University
Digital society
• Data collection
• Computations
• Privacy harms
Traditional environments
• Small Communities
• Shared norms
• …
• Statistical correlations
• Publishing aggregate statistics
• Discovering clusters
• Finding particular events/outliers
• Applying machine learning/data mining techniques to update e.g. marketing strategies, etc.
• …
New environments
• Large communities / networks
• Heterogeneous values
• “Huge” distance between the data holder and the data subjects
• “Hidden” forms of collection
• …
Privacy
Limited access to the self/control over personal information:
"nothing is better worthy of legal protection than private life, or, in
other words, the right of every man to keep his affairs to himself,
and to decide for himself to what extent they shall be the subject of
public observation and discussion.”
D. Solove, Conceptualizing Privacy (2002)
Personal data: “any information relating to an identified or
identifiable natural person (‘data subject’); an identifiable person
is one who can be identified, directly or indirectly, in particular by
reference to an identification number or to one or more factors
specific to his physical, physiological, mental, economic, cultural or
social identity.”
EU Directive, art. 2 (1995)
Anonymization
Problem formulation: to allow the release of private data while
preventing the re-identification of the data subjects.
Anonymization is often obtained through data de-identification.
De-identified data: data in which “all explicit identifiers, such as
SSN, name, address, and telephone number, are removed,
generalized, or replaced with made-up alternatives”
L. Sweeney, Weaving technology and policy together to maintain
confidentiality (1997)
A famous story of re-identification
The Massachusetts Group Insurance Commission (GIC) collected patient-specific data
and gave a copy of its database to researchers and industry. The data had been anonymized
by removing the attributes containing patients’ names, addresses, and social security numbers
(SSN). Sweeney purchased the complete voter registration list for Cambridge,
Massachusetts, and, by combining this data with the GIC records, re-identified William Weld,
the Governor of Massachusetts.
L. Sweeney, K-Anonymity: A model for protecting privacy (2002)
Technical solutions
With the increasing availability of large-scale data and large-scale data
processing, several privacy-enhancing technologies have been
developed:
• k-anonymity and its subsequent versions
• Differential privacy
Solutions arising from specific perspectives:
• Statistics
• Network science
• Machine learning
Privacy by design
A methodological approach to engineering systems stating that privacy should be
embedded into the whole lifecycle of IT systems, from the early stages to their
ultimate deployment (analogous to the value sensitive design approach).
Foundational Principles:
1. Proactive not reactive; preventative not remedial
2. Privacy as the default setting
3. Privacy embedded into design
4. Full functionality – positive-sum, not zero-sum
5. End-to-end security – full lifecycle protection
6. Visibility and transparency – keep it open
7. Respect for user privacy – keep it user-centric
A. Cavoukian, Privacy by Design (2009)
The debate on privacy
Individual value/right
• In the standard literature privacy is characterized as an individual
property (privacy as “the right to be let alone”, as access/control).
Emphasis on protecting individuals.
Privacy as a social process
• An individual-centered view of privacy emphasizes the negative
value of privacy, establishes a conflict between the individual and
society, and fails to take into account the importance of large social
and economic organizations. P. Regan, Legislating Privacy:
Technology, Social Values, and Public Policy (1995)
• “privacy is a social construction that we create as we negotiate our
relationship with others on a daily basis.” P. Regan, Privacy and the
Common Good: Revisited (2015)
Privacy and game theory
An emerging literature combining privacy, game theory and
mechanism design
Some research objectives:
- Modeling the costs of privacy losses
- Designing appropriate compensation for privacy loss
- Designing markets and pricing schemes for sensitive data
Privacy as a coordination game
Coordination game: a class of games in which players choose the
same or corresponding strategies (e.g. choosing which side of the
road to drive on)
Reference: Arpita Ghosh and Katrina Ligett, Privacy and Coordination:
Computing on Databases with Endogenous Participation, Proc. 14th
ACM Conference on Electronic Commerce (EC), 2013.
Theoretical assumption
• Privacy is formulated in terms of differential privacy
• Differential privacy is just one possible theoretical privacy
framework. The same idea could also be applied to other
models
Differential privacy
• Main intuition: an analyst accesses a database of individuals’
information to perform a computation; the goal is to optimize the
quality of the computation’s output while guaranteeing a particular
level of (differential) privacy
• With differential privacy we shift the focus from the database to the
computation (“differential privacy provides privacy by process”)
Randomized mechanism
• Inspired by a technique developed in the social sciences to collect
statistical information about embarrassing or illegal behavior.
• Participants report whether or not they have a property P as
follows:
– Flip a coin
– If tails, they respond truthfully
– If heads, then flip a second coin and respond “Yes” if heads and
“No” if tails
• “Privacy” comes from the plausible deniability of any outcome
(the introduction of spurious “yes”/“no” answers); see the sketch below
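As a concrete illustration, here is a minimal Python sketch of this two-coin protocol (my addition, not from the slides); the population size and the assumed 30% true proportion are purely illustrative. It also shows how an analyst can still recover an unbiased estimate of the population proportion from the noisy reports, since the expected “yes” rate is 0.25 + 0.5·p.

```python
import random

def randomized_response(has_property: bool) -> bool:
    """One participant's report under the two-coin protocol."""
    if random.random() < 0.5:          # first coin: tails -> answer truthfully
        return has_property
    return random.random() < 0.5       # first coin: heads -> second coin decides the answer

def estimate_proportion(reports) -> float:
    """Unbiased estimate: E[yes rate] = 0.25 + 0.5 * true proportion."""
    yes_rate = sum(reports) / len(reports)
    return 2 * (yes_rate - 0.25)

if __name__ == "__main__":
    random.seed(0)
    # Assumed, purely illustrative population: 10,000 people, ~30% with property P
    population = [random.random() < 0.3 for _ in range(10_000)]
    reports = [randomized_response(x) for x in population]
    print("true proportion:     ", sum(population) / len(population))
    print("estimated proportion:", estimate_proportion(reports))
```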
“Ingredients” of the model
Randomized algorithm
• Mechanism = an algorithm that takes as input a database and a set
of queries and produces an output
• Given a discrete set B, Δ(B) denotes the probability simplex over B
• A randomized algorithm with domain A and discrete range B is a
mapping M: A → Δ(B)
Neighboring Databases
• Databases as a collection of records
• Neighboring databases D and D′ differ in at most one row
(Hamming distance at most 1)
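As a small illustration of the neighboring-databases notion (not from the slides; the example records are made up), the helper below checks whether two databases, represented as equal-length lists of rows, differ in at most one row.

```python
def are_neighbors(d1, d2) -> bool:
    """True if the two databases differ in at most one row (Hamming distance <= 1)."""
    if len(d1) != len(d2):
        return False
    return sum(1 for r1, r2 in zip(d1, d2) if r1 != r2) <= 1

# Two illustrative databases differing only in Mr. Bean's row
d = [("Alice", "flu"), ("Bob", "cold"), ("Mr. Bean", "measles")]
d_prime = [("Alice", "flu"), ("Bob", "cold"), ("Mr. Bean", "none")]
print(are_neighbors(d, d_prime))  # True
```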
Differential privacy
• A randomized algorithm M is ε-differentially private if, for all neighboring
databases D and D′ and all outputs b, the following holds:

  Pr[M(D) = b] ≤ exp(ε) · Pr[M(D′) = b],   equivalently   Pr[M(D) = b] / Pr[M(D′) = b] ≤ exp(ε)
• Intuitively, differential privacy guarantees that a randomized
algorithm behaves similarly on similar input databases
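To make the definition concrete, the sketch below (my addition) treats a single participant’s two-coin randomized response as the mechanism M and the participant’s true bit as the database; the neighboring database flips that bit, and the worst-case ratio of output probabilities shows the mechanism is (ln 3)-differentially private.

```python
import math

def response_distribution(true_bit: int) -> dict:
    """Output distribution of the two-coin randomized response for one individual."""
    # With prob. 0.5 the truth is reported; with prob. 0.25 a spurious "yes" is reported.
    p_yes = 0.5 * true_bit + 0.25
    return {"yes": p_yes, "no": 1.0 - p_yes}

dist_D = response_distribution(1)    # database D: the individual has property P
dist_Dp = response_distribution(0)   # neighboring database D': the bit is flipped

# Smallest epsilon satisfying Pr[M(D)=b] <= exp(eps) * Pr[M(D')=b] for every output b
eps = max(abs(math.log(dist_D[b] / dist_Dp[b])) for b in ("yes", "no"))
print(f"epsilon = {eps:.4f}  (ln 3 = {math.log(3):.4f})")
```

The log-ratio computed here is exactly the privacy loss discussed a few slides below, so ε = ln 3 also bounds that loss.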
Avoiding privacy harms
To participate or not to participate?

[Slide figure: the same mechanism M is run on a database with Mr. Bean’s data (D) and on a database without Mr. Bean’s data (D′); both produce output a with similar probability]

  Pr[M(D) = b] ≈ Pr[M(D′) = b]
• Looking at the output a, an adversary should not be able to
guess which database it came from
• Intuition: if nothing is learned about an individual then the
individual cannot be harmed by the analysis
Privacy loss
Privacy loss is derived from differential privacy and is bounded by ε:

  Pr[M(D) = b] / Pr[M(D′) = b] ≤ exp(ε)

  Privacy loss = ln( Pr[M(D) = b] / Pr[M(D′) = b] ) ≤ ε
Coordination game scenario
[Slide diagram: each agent asks “Do I participate?”; the computation is run on the contributed data and produces an output]
There is a binary decision at stake: “Yes, I participate” or “No, I don’t participate”
Main intuition
Theoretical intuition
• Privacy comes not from a formal guarantee, but from a sense that
one is likely safer ‘hiding’ in a larger crowd
Practical intuition
• Real-world situations involving the use of personal data typically say
(at best) how an individual’s data might be used, rather than
promise a privacy guarantee
Model’s ingredients
Given:
• Privacy-sensitive population
• Pre-announced noisy computation
• Individuals’ minimum privacy requirement
The privacy an agent receives in this computation depends on the
noise added to the outcome and on the number of agents who choose
to participate in the database.
The decision of whether to participate in a computation emerges as
an equilibrium among the potential participants (coordination
game).
Mathematical model
• N = total number of agents
• r_i = agent i’s privacy requirement, a real number greater than or
equal to zero. It is derived from a utility model with a privacy cost (a
function of the agent’s privacy loss).
Example: if an agent has a value v for the output of the computation
and the privacy cost is linear in the level of differential privacy ε, c(ε)
= cε, then the utility v − cε is nonnegative when ε ≤ v/c = r
• E[ε] = expected privacy guarantee
An agent will participate if the expected privacy level from participation is
no greater than her threshold r_i, i.e. if E[ε] ≤ r_i (see the sketch below)
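A minimal sketch of this participation rule (illustrative numbers, not from the paper): under the linear-cost assumption each agent’s requirement is r = v/c, and she joins when the expected privacy level E[ε] does not exceed it.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    value: float       # v: value the agent places on the computation's output
    unit_cost: float   # c: cost per unit of (differential) privacy loss

    @property
    def requirement(self) -> float:
        """Privacy requirement r = v / c under the linear-cost assumption."""
        return self.value / self.unit_cost

def participates(agent: Agent, expected_epsilon: float) -> bool:
    """Participate iff the expected privacy level is within the agent's requirement."""
    return expected_epsilon <= agent.requirement

agents = [Agent(value=2.0, unit_cost=1.0), Agent(value=0.5, unit_cost=2.0)]
print([participates(a, expected_epsilon=0.5) for a in agents])  # [True, False]
```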
Equilibrium
• Suppose that all other agents use a strategy s(r) to make their
decision
• An agent i makes her decision based on her r_i, the number of
agents N, and her beliefs about the other agents’ participation choices
• An agent participates if the differential privacy guarantee she expects,
which depends on s(r), N and the noise level, is within her threshold r_i
• A threshold strategy equilibrium is one where all agents whose
privacy requirements are above a certain threshold r* participate,
all agents with stricter privacy requirements, i.e., with
requirements below r*, do not participate, and no agent can
benefit by deviating from this strategy (see the sketch below)
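The fixed-point flavor of such an equilibrium can be shown with a toy computation that is entirely my own stylization of the setting, not the paper’s model: assume the privacy level delivered when n agents participate is ε(n) = ε₀/n (more participants, better privacy, i.e. “hiding in a larger crowd”), and search for participation counts that are consistent with the rule “participate iff ε(n) ≤ r_i”.

```python
def consistent_participation_counts(requirements, eps0=2.0):
    """Find self-consistent participation counts under the stylized guarantee eps(n) = eps0 / n.

    A count n is consistent if the n most permissive agents are all satisfied
    (their requirement is at least eps(n)) and the next agent would still be
    unsatisfied even after joining, so nobody benefits from deviating.
    """
    reqs = sorted(requirements, reverse=True)   # most permissive requirements first
    counts = []
    for n in range(len(reqs) + 1):
        participants_ok = (n == 0) or reqs[n - 1] >= eps0 / n
        outsiders_ok = (n == len(reqs)) or reqs[n] < eps0 / (n + 1)
        if participants_ok and outsiders_ok:
            counts.append(n)
    return counts

# With these toy requirements there are two consistent outcomes (0 and 4 participants),
# echoing the multiple-equilibria point mentioned on the next slide.
print(consistent_participation_counts([1.5, 1.2, 0.8, 0.7]))  # [0, 4]
```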
Paper contribution
• Proof of equilibrium existence
• Study of equilibrium behavior (in different settings)
• Some results/properties, such as:
– Existence of multiple equilibria
– Best equilibrium for given values of N
– For any N, there exists an interval of values at which
equilibria exist and the size of this interval grows with
N
…
Some considerations
• Intuitively, the privacy requirement suggests that agents have some
privacy expectation (a threshold)
• Privacy is endogenously determined: it results from agents’
behavior
• Privacy arises from individual preferences and from collective
dynamics
Individual and collective dimensions
• The individual/collective dimensions recall the
local/contextual dimensions in Relaxation Labeling Processes
(RLP)
• RLP was originally developed to deal with ambiguity in vision
systems, but it has had broader implications (theory of dynamical
systems)
Relaxation labeling
B = {b_1, b_2, …, b_n}   set of objects
Λ = {1, 2, …, m}         set of labels

In RLP we exploit two sources of information: local and contextual information.

Local measure: the initial, non-contextual degree of confidence in the hypothesis “b_i has label λ”:

  p_i^0 = (p_i^0(1), …, p_i^0(m)),  with p_i^0(λ) ≥ 0 and Σ_λ p_i^0(λ) = 1,  for i = 1, …, n

Contextual measure: a matrix of compatibility coefficients R = (r_ij(λ, μ)); each coefficient measures the strength of
compatibility between the hypotheses “b_i has label λ” and “b_j has label μ”.
Relaxation labeling
The space of weighted labeling assignments:

  K = { p = (p_1, …, p_n) ∈ R^(n·m) | p_i(λ) ≥ 0 and Σ_{λ=1…m} p_i(λ) = 1, for i = 1, …, n }

A relaxation labeling process takes as input the initial labeling assignment p^0 = (p_1^0, p_2^0, …, p_n^0) and
iteratively updates it.

Support function, which combines the local and contextual information:

  q_i(λ; p) = Σ_{j=1…n} Σ_{μ=1…m} r_ij(λ, μ) p_j(μ)

Updating rule to adjust the labeling p:

  p_i(λ) ← p_i(λ) q_i(λ; p) / Σ_{μ=1…m} p_i(μ) q_i(μ; p)

The adjusting process continues until an equilibrium is reached.
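A compact NumPy sketch of this update scheme (my own illustration; the sizes and compatibility values are arbitrary): the objects’ label distributions are stored in an (n, m) array and the compatibility coefficients in an (n, m, n, m) array.

```python
import numpy as np

def relaxation_labeling(p0, R, iterations=100, tol=1e-8):
    """Iterate the relaxation labeling update until (approximate) equilibrium.

    p0 : (n, m) array, initial confidence p_i^0(lambda) for each object/label pair
    R  : (n, m, n, m) array, compatibility coefficients r_ij(lambda, mu)
    """
    p = p0.copy()
    for _ in range(iterations):
        # support q_i(lambda; p) = sum_j sum_mu r_ij(lambda, mu) * p_j(mu)
        q = np.einsum("iljm,jm->il", R, p)
        new_p = p * q
        new_p /= new_p.sum(axis=1, keepdims=True)   # renormalize each object's label distribution
        if np.abs(new_p - p).max() < tol:
            return new_p
        p = new_p
    return p

# Toy run: 3 objects, 2 labels, compatibilities rewarding agreement between objects
n, m = 3, 2
R = np.zeros((n, m, n, m))
for i in range(n):
    for j in range(n):
        for lab in range(m):
            R[i, lab, j, lab] = 1.0        # "same label" hypotheses support each other
p0 = np.array([[0.6, 0.4], [0.55, 0.45], [0.4, 0.6]])
print(relaxation_labeling(p0, R))
```

With compatibilities that reward agreement, the initially mixed third object is pulled toward the majority label, which is the kind of social reinforcement the next slide maps onto participation decisions.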
The dynamics of privacy
• Set of objects → set of agents
• Set of labels → set of computations
• Local information: it measures the initial confidence in the hypothesis
“agent a_i will participate in computation c_k”
• Contextual information: it measures the strength of compatibility
between “agent a_i will participate in computation c_k” and “agent a_j will
participate in computation c_z”
• Space of “degrees” of participation
• The support function will combine the individual propensity to participate in a
given computation with the social “approval”
• Privacy could be regarded as the result of a social process
Final considerations
• To avoid an economic view of privacy
• To leverage social approval / agreement
• To explore the relationship between expected participation and
“privacy via exposure” (the set of people who are expected to know
a piece of information)