On the Anonymity of Anonymity Systems

Andrei Serjantov
(anonymous)
Outline
• Anonymity informally
– Anonymity Properties
• Anonymity of Existing Implementations
– Analysis
• Probability, Entropy
• Attacks
– Low Latency
– Intersection
• Conclusion
What is Anonymity?
Actually, we assume
humans are tied to
computers
and anonymize those
Anonymity does not hide
the presence of the
individuals/computers
just their identity
Anonymity System
This guy does not even know
he is on the internet(!)
Anonymity Properties I:
Receiver Untraceability
A
B
Senders are observable – i.e.
the attacker knows that
A sent a message to someone
Receivers are not observable – i.e.
the attacker does not know if
B received a message
Anonymity Properties II:
Sender Untraceability
B
A
Senders unobservable….
Anonymity Properties III:
Unlinkability
A
B
Senders and Receivers are observable,
but not clear who is talking to whom
Anonymous from Whom?
(threat model)
• The observer:
– Can compromise (almost) everything but two
users of the system
– Observes and modifies all network traffic
– Observes all network traffic
• Global Passive Adversary
– Observes some network traffic
– Is the service the user is accessing
Properties
• A mix cascade guarantees that a global
active attacker cannot distinguish two
honest users who send one message each
between time t and t’.
– e.g. mixing votes
• DC-net
– (both sender and receiver anonymity)
• Can be expressed formally
Anonymity of Existing
Implementations
Mixes
Mix Systems
Timed Mix
Mix System
R - Receiver
A - Mix
B - Mix
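[Figure: the sender passes the message "M, 0101011" through mixes A and B to receiver R; the address labels R, B, A attached to the message are removed hop by hop until the receiver gets it.]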
Doing Things Anonymously
• Can provide guarantees for those who wish
to send one message < 32K, and suffer the
consequences of it not reaching the receiver
• Real life is not like that
– Anonymous email (Mixmaster, Mixminion)
• Send and receive anonymous emails
– Web Browsing (JAP, TOR, Tarzan, Morphmix)
• Wide file size distribution
• Low latency
Anonymity Analysis of Existing
Systems
• Define a system, and an adversary
• Take inputs into the system
– e.g. web request message stream
– Email interaction
• Compute observation
Hence figure out how vulnerable the
anonymity of a certain activity is to a
particular adversary.
Inputs, Model, Observation
System: mixes M1 and M2 (transition semantics model of the mixes)
Inputs: Sender 1, Sender 2 and Sender 3, sending to receivers R1, R2 and R3
Attacker: Global Passive Adversary
Observation: [Figure: the traffic between Sender 1, Sender 2, Sender 3, M1, M2, R1, R2 and R3 as seen by the attacker]
Mix Network
[Figure: mix network with nodes A, B, Q, C, D and R]
Traditionally
{A,B,C,D}
Timed Mix
[Figure: timed mix with senders A, B, C and D]
{A,B,C,D}
Mix Network
[Figure: mix network with nodes A, B, Q, C, D and R]
Traditionally
{A,B,C,D}
The message arriving at R is much more likely to be
from D than from A
Pool Mix
• M messages stay in the mix at
each round
• Messages to be sent are picked
from both the N and the M
• A message might stay in the mix
for a very long time (but the
probability of this happening is
very small)
[Figure: N messages enter the mix each round, N+M are held, N are sent out and M remain]
• The anonymity set of a message leaving at round i includes the
senders who sent messages processed during previous rounds
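How small "very small" is can be sketched numerically. The snippet below assumes that each round the N outgoing messages are picked uniformly at random from the N+M held in the mix; the function name and the example values N=100, M=10 are illustrative choices, not taken from a specific implementation.

```python
def stay_probability(n_new, pool, rounds):
    """Probability that a message is still in the mix after `rounds` flushes.

    Assumes uniform random selection: each round a held message is kept
    with probability pool / (n_new + pool), independently of its age.
    """
    per_round = pool / (n_new + pool)
    return per_round ** rounds

for k in (1, 2, 5):
    print(k, stay_probability(100, 10, k))
```

Under this assumption the chance of surviving k rounds falls off geometrically, which is why the anonymity set reaches back over earlier rounds but with rapidly shrinking weight.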
Adding Probabilities
• Let us attach to each event the probability of
that event having occurred
• Call this the Anonymity Probability Distribution
• So {A,B,C,D} could become:
– {(A,¼), (B, ¼),(C, ¼),(D, ¼)}
– Or, {(A,0.5), (B,0.1),(C,0.1),(D,0.3)}
• The probability distribution you come up with will
depend on your observation (plus your knowledge,
computational power, …)
Entropy
• Ok, what can we do with the probability
distribution afterwards?
• From information theory, $-\sum_i p_i \log p_i$ is the
information content of a probability distribution
• Can use this for:
– Measuring anonymity
– Expressing new attacks (ones which do not modify the
set, but modify the distribution)
– Comparing effectiveness of attacks
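As a worked example, the two distributions from the "Adding Probabilities" slide can be compared by their entropy. This is a minimal sketch; the logarithm base is assumed to be 2 (so the result is in bits), and the function name is illustrative.

```python
import math

def entropy_bits(dist):
    """Shannon entropy, in bits, of a {sender: probability} mapping."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped: p * log(p) tends to 0 as p tends to 0.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

uniform = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
skewed = {"A": 0.5, "B": 0.1, "C": 0.1, "D": 0.3}

print(entropy_bits(uniform))           # 2.0
print(round(entropy_bits(skewed), 3))  # 1.685
```

Both distributions have the same anonymity set {A,B,C,D}, yet the second scores lower: this is the kind of attack that changes the distribution without changing the set.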
Pool Mix Revisited
• Could not previously compare a pool mix with
other mixes
• Now we can!
• Compute the entropy of the geometric distribution
• Pool mix with 100 inputs and 10 “feedbacks” is
equivalent to a standard mix with 140 inputs(!!!)
• But, average delay of a message going through a
pool mix is greater
• In the above example, 9% chance “of staying for
another round”
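The "140 inputs" figure can be checked with a short calculation. The sketch below rests on several assumptions that are not spelled out on the slide: the mix is in steady state (many earlier rounds), every round brings N distinct senders, outgoing messages are picked uniformly at random, and entropy is measured in bits. Names and the truncation at 200 rounds are illustrative.

```python
import math

def pool_mix_entropy_bits(n_new, pool, max_rounds=200):
    """Entropy (bits) of the sender distribution for one outgoing message."""
    p_leave = n_new / (n_new + pool)  # chosen for sending in a given round
    p_stay = pool / (n_new + pool)    # kept for another round
    h = 0.0
    for age in range(max_rounds):
        # Probability that one specific sender from `age` rounds ago
        # is the origin of the outgoing message (geometric in `age`).
        p_sender = (p_stay ** age) * p_leave / n_new
        if p_sender > 0:
            h -= n_new * p_sender * math.log2(p_sender)
    return h

bits = pool_mix_entropy_bits(100, 10)
equivalent_inputs = 2 ** bits  # size of a standard mix with the same entropy
print(round(equivalent_inputs))                # 140
print(round(10 / (100 + 10), 2))               # 0.09
```

A standard mix with k equally likely inputs has entropy log2(k), so 2 raised to the pool mix's entropy gives the equivalent input count.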
Mix Networks
• Can also compute the anonymity probability
distribution in mix networks
• Model and details in [Ser04]
[Figure: mix network with nodes A, B, Q, C, D and R]
{(A,0.125),
(B,0.125),
(C,0.25),
(D,0.5)}
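One way such a distribution can arise is sketched below. The topology is an assumption made for illustration (the figure is not reproduced here, and the model in [Ser04] is more detailed): A and B feed a first mix, its output and C feed a second, and that output and D feed a third mix in front of R. Each mix is assumed to make its inputs equally likely.

```python
def sender_distribution(node, inputs):
    """Probability that a message leaving `node` originated at each sender.

    `inputs` maps each mix to the nodes feeding it; a node with no entry
    is treated as a sender. Every input of a mix is assumed equally likely.
    """
    if node not in inputs:
        return {node: 1.0}
    dist = {}
    share = 1.0 / len(inputs[node])
    for upstream in inputs[node]:
        for sender, p in sender_distribution(upstream, inputs).items():
            dist[sender] = dist.get(sender, 0.0) + share * p
    return dist

# Hypothetical topology; the mix names are placeholders.
topology = {"mix1": ["A", "B"], "mix2": ["mix1", "C"], "mix3": ["mix2", "D"]}
print(sender_distribution("mix3", topology))
# {'A': 0.125, 'B': 0.125, 'C': 0.25, 'D': 0.5}
```

Senders whose messages pass through more mixes end up with less probability mass, which is why D stands out.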
Impact of Low Latency and
Repeated Communication
-Packet Counting
-Intersection
Connection-based Anonymity
Systems
• A number of nodes
– Nodes do not mix, but do onion encryption
• Packets are forwarded along links
• All packets of a connection are forwarded via the
same sequence of nodes
“Classical” Network
P2P anonymity system
The Packet Counting Attack I
• Connection-based Anonymity Systems split the
data up into many fairly small packets <1K
• All packets of an anonymous connection travel
down the same path
• Thus, counting the packets may reveal which
connections go where
• Only coarse-grained packet counting is required
Packet Counting II
• Observe the mix for
time t and count packets
on each link
• Correlate incoming and
outgoing links
– e.g. the links carrying 1075 and 1076 packets
[Figure: a node with per-link packet counts 3056, 2497, 2748, 2850, 1804, 1353, 1076 and 1075]
• Ok if:
– d (mix delay) << t
– t is much smaller than the interval between new connections
starting
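The correlation step can be sketched as a simple near-match search over link counts. The split of the slide's eight counts into incoming and outgoing links, the link names and the tolerance are all assumptions made for illustration.

```python
def correlate_links(incoming, outgoing, tolerance):
    """Pair incoming and outgoing links whose packet counts nearly match."""
    pairs = []
    for in_link, in_count in incoming.items():
        for out_link, out_count in outgoing.items():
            if abs(in_count - out_count) <= tolerance:
                pairs.append((in_link, out_link))
    return pairs

# Hypothetical assignment of the counts to links of one node.
incoming = {"in1": 3056, "in2": 2748, "in3": 1804, "in4": 1075}
outgoing = {"out1": 2497, "out2": 2850, "out3": 1353, "out4": 1076}

print(correlate_links(incoming, outgoing, tolerance=5))
# [('in4', 'out4')]
```

Only the 1075/1076 pair is close enough to match, which suggests that a single connection accounts for all the traffic on those two links.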
Packet Counting – Key
Observation
• Packet counting works if the whole
connection is lone
– i.e. if it is the only connection on all the links
(from the client to the server) it passes through
This case may be attackable, but we consider it not to be
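The informal definition of a lone connection can be turned into a small check. This is a sketch under assumptions: a connection is represented as an ordered list of nodes from client to server, links are directed node pairs, and all names are hypothetical. The formal definition is in the ESORICS 2003 paper and may differ in detail.

```python
from collections import Counter

def lone_connections(paths):
    """Return the connections that are alone on every link they use.

    `paths` maps a connection name to its ordered list of nodes.
    """
    link_load = Counter()
    for path in paths.values():
        for link in zip(path, path[1:]):
            link_load[link] += 1
    return {
        name
        for name, path in paths.items()
        if all(link_load[link] == 1 for link in zip(path, path[1:]))
    }

paths = {
    "c1": ["u1", "n1", "n2", "s1"],
    "c2": ["u2", "n1", "n2", "s2"],  # shares link n1 -> n2 with c1
    "c3": ["u3", "n3", "s3"],
}
print(lone_connections(paths))  # {'c3'}
```

Here c1 and c2 cover each other on one link, so only c3 is exposed to packet counting along its whole path.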
Packet Counting – Results
• Hence, we need 2 or more connections on as many
links as possible
• In our paper (ESORICS 2003) we define this
formally
• Then simulate, showing that
– E.g. 100 nodes, 100 connections via 2-4 nodes → 92%
of connections are lone (p2p scenario)
– E.g. 20 nodes, 200 connections via 2-4 nodes → 2.5%
of connections lone (classic network)
Repeated Communication
[Figure: Alice and B other senders ("Steves") feed a threshold mix of size B+1; labels: "To M", "To N", "As seen by the attacker"]
The Model
[Figure: simplification introduced by the model (Alice)]
The Results (1000 rounds, B=10)
[Plot: estimate of the probability of Alice sending to r, P(Estimate), against receivers r]
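An estimate of this kind can be sketched with a small simulation. Everything beyond "1000 rounds, B=10" is assumed here: 50 receivers, Alice choosing uniformly between two fixed receivers, the B other senders choosing receivers uniformly, and an estimator that subtracts the expected background from the observed per-round averages. This illustrates the general idea and is not necessarily the model used for the plotted results.

```python
import random

def estimate_alice(rounds, others, n_receivers, friends, seed=0):
    """Estimate, per receiver, the probability that Alice sends to it.

    Each round the attacker sees one threshold-mix batch: one message from
    Alice plus `others` background messages, without knowing which is which.
    """
    rng = random.Random(seed)
    counts = [0] * n_receivers
    for _ in range(rounds):
        counts[rng.choice(friends)] += 1             # Alice's message
        for _ in range(others):                      # the other senders
            counts[rng.randrange(n_receivers)] += 1
    # Average messages per round to each receiver, minus the share
    # expected from uniformly sending background traffic.
    background = others / n_receivers
    return [c / rounds - background for c in counts]

est = estimate_alice(rounds=1000, others=10, n_receivers=50, friends=[3, 17])
top = sorted(range(50), key=lambda r: est[r], reverse=True)[:2]
print(sorted(top))
```

With these parameters the two receivers with the highest estimates should be Alice's, since their estimates sit near 0.5 while all others stay near 0.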
Conclusions
• Anonymity is a security property
– not just privacy
• Analysis of anonymity properties important
– Has been a neglected area
– Uses tools from other fields (graph theory, probability)
• Plenty of applications
– Identity management
– Electronic voting
– Anonymous email (whistle blowing)