Social Networks and
Surveillance: Evaluating
Suspicion by Association
Ryan P. Layfield
Dr. Bhavani Thuraisingham
Dr. Latifur Khan
Dr. Murat Kantarcioglu
The University of Texas at Dallas
{layfield, bxt043000, lkhan, muratk}@utdallas.edu
Overview
Introduction
►Our Goal
►System Design
►Social Networks
►Threat Detection
►Correlation Analysis
The Experiment
►Setup
►Current Results
►Issues
►Future Work
Introduction
Automated message surveillance is
essential to communication monitoring
►Widespread use of electronic
communication
►Exponential data growth
►Impossible to sift through all ‘by hand’
Going beyond basic surveillance
►Identifying groups rather than individuals
►Monitoring conversations rather than
messages
Our Goal
Design new techniques and apply
existing algorithms to…
►Create a machine-understandable
model of existing social networks
►Identify abnormal conversations and
behavior
►Monitor a given communications
system in real-time
►Continuously learn and adapt to a
dynamic environment
System Design
Three major components:
►Social Network Modeler
►Initial Activity Detector
►Correlated Activity Investigator
Social Networks
Individuals engaged in suspicious or
undesirable behavior rarely act alone
We can infer that those associated with
a person positively identified as
suspicious have a high probability of
being either:
►Accomplices (participants in suspicious
activity)
►Witnesses (observers of suspicious activity)
Making these assumptions, we create a
context of association between users of a
communication network
Social Networks
Within our model:
► Every node is a unique user
► Every message creates or strengthens a link between
nodes
Over time, the network changes
► Frequent communication leads to stronger links
► Intermittent messaging implies weakening social ties
The strength of the link implies how strong an
association between individuals is
From this data, we can theoretically identify
► Hubs
► Groups
► Liaisons
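As an illustration of the model above, the following is a minimal sketch, not the authors' implementation. The class name `SocialNetwork`, the `boost` and `decay` parameters, and the use of summed link strength as a hub indicator are all assumptions made for the example.

```python
from collections import defaultdict


class SocialNetwork:
    """Toy model: nodes are users; links strengthen with messages and weaken over time."""

    def __init__(self, boost=1.0, decay=0.9):
        self.boost = boost    # hypothetical strength added per message
        self.decay = decay    # hypothetical multiplicative decay per time step
        self.strength = defaultdict(float)  # frozenset({a, b}) -> link strength

    def record_message(self, sender, recipients):
        """Every message creates or strengthens a link between sender and recipient."""
        for r in recipients:
            if r != sender:
                self.strength[frozenset((sender, r))] += self.boost

    def tick(self):
        """Advance one time step: every link weakens unless refreshed by new messages."""
        for link in self.strength:
            self.strength[link] *= self.decay

    def weighted_degree(self, user):
        """Sum of link strengths touching a user; a high value suggests a hub."""
        return sum(s for link, s in self.strength.items() if user in link)

    def hubs(self, top=1):
        """Return the `top` users with the largest weighted degree."""
        users = {u for link in self.strength for u in link}
        return sorted(users, key=self.weighted_degree, reverse=True)[:top]
```

Calling `record_message` for each observed message and `tick` at a fixed interval gives frequent correspondents strong links, while links without new traffic fade.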
Threat Detection
Every message sent is scrutinized in the
interest of identifying suspicious
communication
►Keyword analysis
►Prior context (i.e. previous message content)
When a detection algorithm yields a
strong result, a token is created
►The token is created at the origin and passed
to the recipient(s)
►Existing tokens, if any, are cloned instead
The result is a web that potentially
reflects the dissemination of suspicious
information
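The token mechanism can be sketched as follows. This is an illustrative sketch only: the `Token` fields, the toy `keyword_score` detector, the `held` mapping, and the 0.5 threshold are assumptions, not details given in the slides.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)


@dataclass
class Token:
    topic: str    # label of the detection that produced the token
    origin: str   # user at whom the original token was created
    holder: str   # user currently holding this copy
    token_id: int = field(default_factory=lambda: next(_ids))


def keyword_score(text, keywords):
    """Toy detector: fraction of watch-list keywords present in the message."""
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords) if keywords else 0.0


def process_message(sender, recipients, text, held, keywords, threshold=0.5):
    """On a strong detection, pass token copies to the recipients.

    `held` maps user -> list of Token. If the sender already holds tokens,
    those are cloned; otherwise a new token is created at the sender first.
    Returns the list of tokens handed to recipients.
    """
    if keyword_score(text, keywords) < threshold:
        return []
    existing = held.setdefault(sender, [])
    if not existing:
        existing.append(Token(topic="keyword-hit", origin=sender, holder=sender))
    passed = []
    for r in recipients:
        for tok in list(existing):
            clone = Token(topic=tok.topic, origin=tok.origin, holder=r)
            held.setdefault(r, []).append(clone)
            passed.append(clone)
    return passed
```

Because each clone keeps the `origin` of the token it was copied from, following the copies from user to user traces the web of dissemination back to its source.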
Correlation Analysis
Future messages with similar suspicious
topics are not always identifiable with
the same ‘initial’ techniques
►Quick replies
►Pronoun use
►Assumption that recipient is aware of topic
If a token is present at the sender when a
message is sent:
►The message associated with the token and
the new message are analyzed
►If analysis yields a strong match, the token
is further cloned and passed to recipient
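A minimal sketch of this follow-up check, assuming each token carries the text of the message that triggered it. Jaccard word overlap stands in here for the correlation analysis, which the slides define later; the function names, the `tokens` mapping, and the 0.3 threshold are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two messages (stand-in for the correlation analysis)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def correlate_and_forward(sender, recipients, text, tokens, threshold=0.3):
    """Clone the sender's tokens to the recipients when the new message matches.

    `tokens` maps user -> list of (token_id, source_text) pairs held by that user.
    Returns the ids of the tokens that were cloned and passed on.
    """
    forwarded = []
    # Iterate over a copy so that appending to a recipient's list is safe
    # even if the sender is also a recipient.
    for token_id, source_text in list(tokens.get(sender, [])):
        if jaccard(text, source_text) >= threshold:
            for r in recipients:
                tokens.setdefault(r, []).append((token_id, source_text))
            forwarded.append(token_id)
    return forwarded
```

A short reply that never repeats the original keywords can still be caught this way, since it is compared against the token's source message rather than a keyword list.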
The Experiment
Rare words shared between two or more messages are
candidates for keyword analysis, but they are not always
easily sifted from ‘noise’
Noise within text-based messages comes in a variety of
forms
► Misspelled words
► Unusual word choice
► Incompatible variations of the same language (e.g. British
vs. American English)
► Unexpected language
However, we do not want to eliminate potential keywords
► Document names
► Terminology specific to a subject
► ‘Buzz’ words
The Experiment
We proposed an experiment that
attempts to eliminate false positives
due to noisy data while
strengthening and expanding our
correlation techniques
Setup
Tools
► Running word ‘rank’ database
► Implementation of word set theory infrastructure
► JAMA Matrix Library
Singular Value Decomposition
Our Approach
► Apply SVD noise filtering based on 100 messages
► Analyze word frequency correlation between current
message and prior suspicious messages
► Generate a score based on the results
Setup
Construct a matrix based on the
last 100 messages
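$c_{ji} = \mathrm{count}(w_i, m_j)$
$W = M_1 \cup M_2 \cup \dots \cup M_t$
$w_i \in W$
[Figure: matrix with one axis for messages and one for words; words ordered from more common to less common]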
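The matrix construction can be sketched as below. Whitespace tokenization, lowercasing, and the choice of messages as rows and words as columns are assumptions made for the example; `build_count_matrix` is a hypothetical name.

```python
from collections import Counter


def build_count_matrix(messages, window=100):
    """Return (vocab, matrix) with matrix[j][i] = count(w_i, m_j).

    Only the last `window` messages are used. Columns are ordered from
    more common to less common words, with ties broken alphabetically.
    """
    recent = messages[-window:]
    counts = [Counter(m.lower().split()) for m in recent]
    totals = Counter()
    for c in counts:
        totals.update(c)
    vocab = sorted(totals, key=lambda w: (-totals[w], w))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix
```

For example, `build_count_matrix(["a b a", "b c"])` yields the vocabulary `["a", "b", "c"]` and the rows `[2, 1, 0]` and `[0, 1, 1]`.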
Setup
Decompose and rebuild
$A = U \Sigma V^T$
Eliminate ‘weak’ singular values
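Written out, the decompose-and-rebuild step is a truncated reconstruction. The sketch below assumes the $k$ largest singular values are kept and the rest are zeroed; the slides do not state how the cutoff $k$ is chosen.

```latex
A = U \Sigma V^{T}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_k = U \Sigma_k V^{T}, \qquad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0), \quad k < r
```

The rebuilt matrix $A_k$ keeps the dominant word-usage patterns across the message window and suppresses the low-energy components treated as noise.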
Setup
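‘Raw’ total score for word $w_i$:
$\mathrm{score}(w_i) = \dfrac{\mathrm{count}(w_i, m_j)\,\mathrm{count}(w_i, m_k)}{\mathrm{rank}(w_i)}$
► Counts pulled from messages $j$ and $k$
► Rank pulled from ‘running’ word database
Message score, compared against a predefined fixed threshold:
$\sum_{w_i \in W_j \cap W_k} \mathrm{score}(w_i)$
► Counts only intersection of words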
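The scoring step might look like the sketch below. It assumes the two counts are multiplied and divided by the word's rank, that every rank is positive, that words missing from the rank table default to 1.0, and that a match means the total meets or exceeds the threshold. None of these details is confirmed by the slides, and the function names are hypothetical.

```python
from collections import Counter


def word_score(word, counts_j, counts_k, rank):
    """'Raw' score for one word. Assumption: product of the counts over the rank."""
    return counts_j[word] * counts_k[word] / rank.get(word, 1.0)


def correlation_score(msg_j, msg_k, rank):
    """Sum word scores over the intersection of the two messages' word sets."""
    counts_j = Counter(msg_j.lower().split())
    counts_k = Counter(msg_k.lower().split())
    shared = set(counts_j) & set(counts_k)
    return sum(word_score(w, counts_j, counts_k, rank) for w in shared)


def is_correlated(msg_j, msg_k, rank, threshold):
    """Compare the summed score against a predefined fixed threshold."""
    return correlation_score(msg_j, msg_k, rank) >= threshold
```

With a high rank for a common word and a low rank for a rare one, the rare shared word dominates the sum, which is the intended effect of dividing by rank.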
Current Results
Method is not currently accurate
Large fluctuations
►Correlation easily swayed by plethora
of common words
►Uncommon words not given enough
weight
Current Results
[Pie chart: Accuracy of Results over 900 Messages. Categories: True Positives, False Positives, True Negatives, False Negatives. Slice values (not matched to categories here): 3%, 12%, 26%, 59%.]
1000 messages evaluated, first 100 used to seed word ranks.
Issues
Word frequencies fluctuate wildly
during beginning of experiment
(0.0 – 10.0+)
Extreme cost for current
construction methods and
computation
Filtering context limited to recent
global history
Affected by large bodies of text
Future Work
Tap potential of existing matrix for
further analysis
Adaptive filtering feedback algorithms
Speed improvements to accommodate
real-time streams
Flexible communication platform
monitoring
Addition of pipe architecture for
modular threat detection and correlation