Peeping Tom in the Neighborhood
Keystroke Eavesdropping on Multi-User Systems
USENIX Security 2009
Kehuan Zhang, Indiana University, Bloomington
XiaoFeng Wang, Indiana University, Bloomington
Agenda
Overview
Assumption
Implementation
Experiment
Conclusion
2
Overview
Commands such as ps and top need
information about running processes
The virtual file system procfs discloses
such information at /proc/<pid>/stat
Our attack takes advantage of the stack
information of a process to infer keystrokes
• Specifically ESP and EIP
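As a minimal sketch of where these two values sit, the snippet below parses one line in the /proc/<pid>/stat format. Per the proc(5) manual page, field 29 is kstkesp (ESP) and field 30 is kstkeip (EIP). The function name parse_esp_eip is hypothetical, and newer kernels may restrict or zero these fields.

```python
def parse_esp_eip(stat_line):
    """Return (ESP, EIP) from one line in /proc/<pid>/stat format."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3.
    esp = int(fields[29 - 3])  # kstkesp
    eip = int(fields[30 - 3])  # kstkeip
    return esp, eip


def read_esp_eip(pid):
    """Read the live values for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_esp_eip(f.read())
```

On a Linux machine, read_esp_eip(os.getpid()) would read the values for the current process.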
3
Overview (cont.)
Fig. 1: Sketch of keystroke extraction and recognition
4
Assumption
Ability to execute programs on the target system
Multi-core system
Access to the victim’s process information
The attacker can obtain samples of the victim’s
typing as training data
5
Implementation
Pattern extraction
Trace logging
Get inter-timing
Keystroke analysis
Fig. 1: Sketch of keystroke extraction and recognition
6
Implementation
Fig. 2: Steps of keystroke pattern extraction
7
Implementation (cont.)
Fig. 3: Steps of trace logging and getting inter-timing
8
Implementation (cont.)
Fig. 4: Steps of keystroke analysis
9
Pattern extraction
Deterministic programs
• The same input always causes the same output, e.g. vim
• Use strace to record all system call sequences, then
extract the differences
• False positive check
Non-deterministic programs
• The same input can cause different outputs; almost
all GUI programs are non-deterministic
• Apply an instruction-level analysis tool to the function
gtk_main_do_event(event) to capture its events
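One way to picture the "extract the differences" step for a deterministic program is sketched below. It assumes two system call traces have already been recorded (for example with strace), one without and one with a keystroke. The helper name extra_calls and the call names in the usage note are hypothetical.

```python
import difflib


def extra_calls(baseline, with_key):
    """Return the calls that appear only in the trace taken with a keystroke."""
    matcher = difflib.SequenceMatcher(a=baseline, b=with_key, autojunk=False)
    extra = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" mark stretches of with_key that have
        # no counterpart in the baseline trace.
        if tag in ("insert", "replace"):
            extra.extend(with_key[j1:j2])
    return extra
```

For example, comparing ["select", "read", "select"] with ["select", "read", "write", "ioctl", "select"] isolates the calls in the middle as the candidate keystroke pattern.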
10
Trace logging
Fig. 3: Steps of trace logging and getting inter-timing
The attacker’s shadow program keeps monitoring
/proc/<pid>/stat
• That is why we need a multi-core system
• However, the log will not be complete
Avoid detection
• Decrease the sample rate
• Hide CPU usage
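The polling loop of such a shadow program might look roughly like the sketch below. It is an illustration under assumptions, not the authors' tool: read_sample is a hypothetical callable that returns the current (ESP, EIP) pair, for instance by reading /proc/<pid>/stat, and interval stands in for the reduced sample rate.

```python
import time


def log_trace(read_sample, n_samples, interval=0.0,
              clock=time.monotonic, sleep=time.sleep):
    """Poll read_sample() and log (timestamp, sample) on every change."""
    trace = []
    last = None
    for _ in range(n_samples):
        sample = read_sample()
        if sample != last:
            # Only changes are logged; anything that happens between
            # two polls is missed, so the log is never complete.
            trace.append((clock(), sample))
            last = sample
        if interval > 0:
            sleep(interval)  # a larger interval lowers CPU usage
    return trace
```

A larger interval makes the loop less conspicuous but drops more samples, which is the trade-off the slide refers to.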
11
Get inter-timing
Use the Longest Common Subsequence (LCS)
algorithm to compare the log with the pattern
• Cancel out ASLR by normalizing the ESP pattern
Use a time duration threshold to keep only
consecutive keystroke patterns
Fig. 5: Pattern matching
Fig. 6: Using time duration
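The matching step can be sketched as follows. The LCS part is the standard dynamic-programming algorithm. The normalization is only an assumption for illustration: it expresses each ESP value as an offset from the largest value in the same trace, so a constant ASLR shift cancels out. The function names are hypothetical.

```python
def normalize_esp(esps):
    """Express each ESP value as an offset from the largest one in the trace."""
    top = max(esps)
    return [e - top for e in esps]


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Because LCS tolerates gaps, a log that missed some samples can still match most of the pattern. A match could then be declared when the LCS length covers a chosen fraction of the pattern.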
12
Keystroke analysis
Fig. 4: Steps of keystroke analysis
We now have the inter-timing sequences
We use a Hidden Markov Model (HMM) to guess
what the victim typed and list 4500 candidates
• N-Viterbi algorithm: uses conditional probabilities
• Average all probabilities
• M-N-Viterbi algorithm: uses conditional probabilities
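To make the HMM step concrete, here is a minimal sketch of plain (1-best) Viterbi decoding over inter-timings. It is not the n-Viterbi or m-n-Viterbi variant named above, which would keep several candidates instead of one. The Gaussian timing model, the uniform prior over the first key, and all names (viterbi_keys, mean, sigma, log_trans) are assumptions for illustration.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2 * sigma * sigma))


def viterbi_keys(timings, alphabet, mean, sigma, log_trans=None):
    """Most likely key sequence given the timings between consecutive keys.

    mean[(a, b)] is the assumed average latency from key a to key b;
    log_trans[(a, b)] is an optional log transition probability.
    """
    # best[c] = (log score, best sequence ending in key c)
    best = {c: (0.0, c) for c in alphabet}
    for d in timings:
        nxt = {}
        for b in alphabet:
            candidates = []
            for a in alphabet:
                score, seq = best[a]
                trans = log_trans[(a, b)] if log_trans else 0.0
                emit = log_gauss(d, mean[(a, b)], sigma)
                candidates.append((score + trans + emit, seq + b))
            nxt[b] = max(candidates)
        best = nxt
    return max(best.values())[1]
```

With a toy two-key alphabet whose pair latencies are well separated, the decoder recovers the typed sequence directly from the timings. Real timing distributions overlap, which is why a ranked candidate list is needed in practice.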
13
Experiment
Environment
• Intel Core 2 Duo E6700, 3GB RAM
• Red Hat Enterprise Linux 4.0, Debian 4.0, and
Ubuntu 8.04
Evaluation on three public servers
• A Linux workstation in a public machine room
(Server 1)
• A web server of Indiana University that allows SSH
connections from its users (Server 2)
• A server for students’ course projects (Server 3)
• 72 hours of monitoring on these servers, with the
number of users ranging from 1 to 24
14
Experiment (cont.)
Fig. 10: Percentage of keystrokes
detected versus CPU usage
Fig. 11: CPU usage of three real-world
servers over 72 hours
15
Experiment (cont.)
Inferring passwords
• Training: 15 training keys (13 letters and 2 digits),
225 key pairs in total. We collect 45 inter-timings for each of these pairs from a user
• Evaluation: select 3 passwords from the space of
all possible 8-byte sequences formed from the 15
characters. Our HMM outputs 4500 candidates
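The figures above can be checked with a few lines of arithmetic. The variable names are illustrative, and the last line only relates the 4500-candidate list to the full space under the stated setup.

```python
keys = 15                          # 13 letters + 2 digits
pairs = keys ** 2                  # ordered key pairs: 225
timing_samples = pairs * 45        # inter-timings collected: 10125
space = keys ** 8                  # 8-character passwords: 2562890625
candidate_share = 4500 / space     # fraction of the space in the list
```

The candidate list therefore covers roughly 0.0002% of the full password space.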
16
Experiment (cont.)
Fig. 7: Percentage of the space searched before finding the right password
17
Experiment (cont.)
Guessing English words
• Training: use word frequencies from the British
National Corpus to compute transition
probabilities
• Evaluation: randomly draw words from 2103
known words of length 3 to 5, then type them
Fig. 8: Time distribution of letter pairs
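A minimal sketch of how letter-to-letter transition probabilities could be derived from word frequencies is shown below. The function name and the toy frequencies in the usage note are made up for illustration and are not British National Corpus data.

```python
from collections import defaultdict


def transition_probs(word_freq):
    """Estimate P(next letter | current letter) from word frequencies."""
    counts = defaultdict(lambda: defaultdict(float))
    for word, freq in word_freq.items():
        # Every adjacent letter pair is weighted by the word's frequency.
        for a, b in zip(word, word[1:]):
            counts[a][b] += freq
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: n / total for b, n in row.items()}
    return probs
```

With the toy input {"the": 3, "then": 1, "tan": 1}, the letter t is followed by h with probability 0.8 and by a with probability 0.2.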
18
Experiment (cont.)
Fig. 9: Success rate on English words
19
Conclusion
Information leak: one can obtain others’
keystrokes without any special permissions
Trade-off between convenience and security
Contribution: a keystroke detection and
extraction method that works on almost all
Linux distributions
20
Future work
A more precise detection method for non-deterministic programs
A way to detect keystrokes when system calls
are not immediately triggered by keystrokes
A better algorithm to identify English words
Use more information to infer other events,
such as mouse movements
21
The End