
Social Networks and
Surveillance: Evaluating
Suspicion by Association
Ryan P. Layfield
Dr. Bhavani Thuraisingham
Dr. Latifur Khan
Dr. Murat Kantarcioglu
The University of Texas at Dallas
{layfield, bxt043000, lkhan, muratk}@utdallas.edu
Overview
Introduction
►Our Goal
►System Design
►Social Networks
►Threat Detection
►Correlation Analysis
The Experiment
►Setup
►Current Results
►Issues
►Future Work
Introduction
Automated message surveillance is
essential to communication monitoring
►Widespread use of electronic
communication
►Exponential data growth
►Impossible to sift through all ‘by hand’
Going beyond basic surveillance
►Identifying groups rather than individuals
►Monitoring conversations rather than
messages
Our Goal
Design new techniques and apply
existing algorithms to…
►Create a machine-understandable
model of existing social networks
►Identify abnormal conversations and
behavior
►Monitor a given communications
system in real-time
►Continuously learn and adapt to a
dynamic environment
System Design
Three major components:
►Social Network Modeler
►Initial Activity Detector
►Correlated Activity Investigator
Social Networks
Individuals engaged in suspicious or
undesirable behavior rarely act alone
We can infer that those associated with
a person positively identified as
suspicious have a high probability of
being either:
►Accomplices (participants in suspicious
activity)
►Witnesses (observers of suspicious activity)
Making these assumptions, we create a
context of association between users of a
communication network
Social Networks
Within our model:
► Every node is a unique user
► Every message creates or strengthens a link between
nodes
Over time, the network changes
► Frequent communication leads to stronger links
► Intermittent messaging implies weakening social ties
The strength of a link reflects how strong
the association between two individuals is
From this data, we can theoretically identify
► Hubs
► Groups
► Liaisons
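The modeler described above can be sketched as a weighted graph in which each observed message strengthens a link and all links decay over time. This is an illustrative sketch only; the class name and the STRENGTHEN/DECAY constants are assumptions, not values from the paper.

```python
from collections import defaultdict

# Illustrative constants, not taken from the paper
STRENGTHEN = 1.0   # added to a link each time a message travels over it
DECAY = 0.9        # per-time-step decay: intermittent messaging weakens ties

class SocialNetwork:
    def __init__(self):
        # link weights keyed by an unordered pair of users
        self.links = defaultdict(float)

    def observe_message(self, sender, recipient):
        # every message creates or strengthens a link between two nodes
        self.links[frozenset((sender, recipient))] += STRENGTHEN

    def tick(self):
        # called once per time step: all links weaken a little
        for key in self.links:
            self.links[key] *= DECAY

    def strength(self, a, b):
        return self.links[frozenset((a, b))]
```

For example, one message, one time step of decay, then a second message leaves the pair with a link strength of 0.9 + 1.0 = 1.9, while unconnected users stay at 0.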
Threat Detection
Every message sent is scrutinized in the
interest of identifying suspicious
communication
►Keywords analysis
►Prior context (i.e. previous message content)
When a detection algorithm yields a
strong result, a token is created
►The token is created at the origin and passed
to the recipient(s)
►Existing tokens, if any, are cloned instead
The result is a web that potentially
reflects the dissemination of suspicious
information
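The token rules above can be sketched as follows: a strong detection creates a token at the origin and passes it to each recipient, while any tokens the sender already holds are cloned instead. Function and variable names here are assumptions for illustration.

```python
import itertools

# Monotonically increasing token identifiers
_token_ids = itertools.count()

def propagate(tokens_at, sender, recipients):
    """Create or clone suspicion tokens from sender to recipients.

    tokens_at maps each user to the set of token ids they hold.
    """
    existing = tokens_at.setdefault(sender, set())
    if not existing:
        # no prior tokens: create one at the origin
        existing.add(next(_token_ids))
    for r in recipients:
        # existing tokens are cloned onto every recipient
        tokens_at.setdefault(r, set()).update(existing)
    return tokens_at
```

Repeated calls trace out the web of token copies: a token created at one sender reaches anyone downstream of a flagged message.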
Correlation Analysis
Future messages with similar suspicious
topics are not always identifiable with
the same ‘initial’ techniques
►Quick replies
►Pronoun use
►Assumption that recipient is aware of topic
If a token is present at the sender when a
message is sent:
►The message associated with the token
and the new message are analyzed together
►If analysis yields a strong match, the token
is further cloned and passed to recipient
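A minimal sketch of this correlation step, assuming a simple Jaccard word-overlap as the match analysis and an illustrative threshold (the paper's actual analysis and threshold value are not reproduced here):

```python
MATCH_THRESHOLD = 0.3  # illustrative value, not the paper's

def word_overlap(msg_a, msg_b):
    # Jaccard similarity between the word sets of two messages,
    # used here as a stand-in for the match analysis
    a, b = set(msg_a.lower().split()), set(msg_b.lower().split())
    return len(a & b) / max(len(a | b), 1)

def maybe_clone(token_msg, new_msg, recipient_tokens, token_id):
    # if the token's associated message strongly matches the new
    # message, the token is cloned onto the recipient
    if word_overlap(token_msg, new_msg) >= MATCH_THRESHOLD:
        recipient_tokens.add(token_id)
    return recipient_tokens
```

This captures why quick replies and pronoun-heavy messages can still be caught: they need only overlap the token's original message, not trip the initial detector on their own.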
The Experiment
 A rare set of words shared between two or more
messages makes those messages candidates for keyword
analysis, but such words are not always easily sifted from ‘noise’
 Noise within text-based messages comes in a variety of
forms
► Misspelled words
► Unusual word choice
► Incompatible variations of the same language (i.e. British
vs. American English)
► Unexpected language
 However, we do not want to eliminate potential keywords
► Document names
► Terminology specific to a subject
► ‘Buzz’ words
The Experiment
We proposed an experiment that
attempts to eliminate false positives
due to noisy data while
strengthening and expanding our
correlation techniques
Setup
Tools
► Running word ‘rank’ database
► Implementation of word set theory infrastructure
► JAMA Matrix Library
 Singular Value Decomposition
Our Approach
► Apply SVD noise filtering based on 100 messages
► Analyze word frequency correlation between current
message and prior suspicious messages
► Generate a score based on the results
Setup
Construct a matrix from the last
100 messages:
► c_ji = count(w_i, m_j) for each word w_i ∈ W and message m_j
► W = W_M1 ∪ W_M2 ∪ … ∪ W_Mt (the union of the word sets of all messages)
► Rows correspond to messages and columns to words, ordered from more common to less common
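The matrix construction can be sketched in Python (the actual implementation used the JAMA library in Java; names here are illustrative):

```python
from collections import Counter

def build_count_matrix(messages):
    """One row per message, one column per word: c[j][i] = count(w_i, m_j)."""
    # W = union of the word sets of all messages, in a fixed order
    vocab = sorted({w for m in messages for w in m.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for m in messages:
        counts = Counter(m.split())
        row = [0] * len(vocab)
        for w, c in counts.items():
            row[index[w]] = c
        matrix.append(row)
    return vocab, matrix
```

In practice the columns would also be ordered from more common to less common words, as on the slide; sorting alphabetically here keeps the sketch short.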
Setup
Decompose and rebuild:
► A = U Σ V^T (singular value decomposition)
► Eliminate ‘weak’ singular values
► Rebuild A from the remaining singular values
Setup
Score each word w_i shared between messages j and k:
► score(w_i) = count(w_i, m_j) · count(w_i, m_k) / rank(w_i)
► The count(·) values are pulled from messages j and k; rank(w_i) is pulled from the ‘running’ word database
► Only the intersection of the two word sets is counted; the ‘raw’ total score is compared to a predefined fixed threshold τ:
Σ_{w_i ∈ W_j ∩ W_k} score(w_i) ≥ τ
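This scoring rule transcribes directly into code. The dict-based rank table and the default threshold value are assumptions for illustration:

```python
def word_score(word, words_j, words_k, rank):
    # score(w_i) = count(w_i, m_j) * count(w_i, m_k) / rank(w_i)
    return (words_j.count(word) * words_k.count(word)) / rank[word]

def is_correlated(msg_j, msg_k, rank, tau=1.0):
    """Sum score(w_i) over the intersection of the word sets and
    compare against the fixed threshold tau."""
    words_j, words_k = msg_j.split(), msg_k.split()
    shared = set(words_j) & set(words_k)  # only shared words count
    total = sum(word_score(w, words_j, words_k, rank) for w in shared)
    return total >= tau
```

Dividing by rank(w_i) is what downweights common words: a word with rank 100 contributes a hundredth of what a rank-1 word would, given the same counts.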
Current Results
Method is not currently accurate
Large fluctuations
►Correlation easily swayed by a plethora
of common words
►Uncommon words not given enough
weight
Current Results
Accuracy of results over 900 messages:
[Pie chart: proportions of true positives, false positives, true negatives, and false negatives — segments of 59%, 26%, 12%, and 3%]
1000 messages evaluated; the first 100 were used to seed word ranks.
Issues
Word frequencies fluctuate wildly
during beginning of experiment
(0.0 – 10.0+)
High cost of current matrix
construction and computation
methods
Filtering context limited to recent
global history
Affected by large bodies of text
Future Work
Tap potential of existing matrix for
further analysis
Adaptive filtering feedback algorithms
Speed improvements to accommodate
real-time streams
Flexible communication platform
monitoring
Addition of pipe architecture for
modular threat detection and correlation