Towards Improved Sensitivity, Specificity, and Timeliness of Syndromic Surveillance Systems
Anna L. Buczak, PhD,
Linda J. Moniz, PhD, Joseph Lombardo, MS
PHIN 2008, Session F7, August 27, 2008
This work was supported by the JHU/APL Internal Research and Development (IR&D) Program
Outline
Motivation
Classical disease outbreak detection methods
Novel machine learning methodology for disease outbreak detection
Results
Conclusions
Future directions
Motivation
Develop a methodology for reliable detection of disease outbreaks
Most existing methods use univariate statistics, i.e., they look at each syndrome/subsyndrome/age/gender (etc.) combination separately
-> proliferation of false alarms
Our goal: develop multivariate models for detecting abnormal relationships between time series
-> reduced false alarms
Approaches to Outbreak Detection
Two broad types of approaches:
Anomaly detection
Detectors flag any anomalous behavior
Statistical methods: CUSUM, C1, C2, C3, EWMA
Machine learning methods: clustering techniques, SVMs
Specific disease outbreak detection
Detectors are geared towards a specific disease
In-depth knowledge of how the given disease manifests is needed
Separate model needed for each disease
Methods: Bayesian networks, Markov Decision Processes
This talk will concentrate on anomaly detection methods
Statistical disease outbreak detection methods
These methods typically determine whether the counts in a given syndrome/subsyndrome time series are unusually high and thus worth investigating.
Statistical detection algorithms: C2, C3, EWMA
C2 & C3: 7-day baseline, 2-day guard band
Individual-day statistic for day j with lag n:
$$S_{j,n} = \max\left\{0,\ \frac{\mathrm{Count}_j - (\mu_n + \sigma_n)}{\sigma_n}\right\}$$
where $\mu_n$ is the 7-day average with an n-day lag (so $\mu_3$ is the mean of the counts on days j-9 through j-3)
$\sigma_n$ = standard deviation of the same 7-day window
C2: the C2 statistic for day k is $S_{k,3}$ (a 3-day lag, which leaves a 2-day guard band)
Alerts when the individual-day statistic exceeds a threshold
C3: the C3 statistic for day k is $S_{k,3} + S_{k-1,3} + S_{k-2,3}$
Alerts when the statistic for day k exceeds a threshold
[Figure: timeline from Day -9 to Day 0 (current count); the C2 and C3 baselines both span days -9 to -3, and days -2 and -1 form the guard band]
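A minimal Python sketch of the C2 and C3 statistics exactly as defined on this slide. It assumes daily counts in a plain list indexed by day and uses the sample standard deviation for $\sigma_n$; `min_sigma` is a hypothetical floor (not from the talk) that avoids a division by zero on a perfectly flat baseline, and the alert threshold is left to the caller. The function names are illustrative, not taken from EARS or ESSENCE.

```python
from statistics import mean, stdev


def individual_day_statistic(counts, j, n=3, window=7, min_sigma=1e-6):
    """S_{j,n} = max{0, (Count_j - (mu_n + sigma_n)) / sigma_n}.

    The baseline is the `window` days ending n days before day j, i.e.
    counts[j-n-window+1 .. j-n]; with n=3 and window=7 that is days j-9..j-3.
    """
    start = j - n - window + 1
    if start < 0:
        raise ValueError("not enough history for the baseline window")
    baseline = counts[start : j - n + 1]
    mu = mean(baseline)
    # Sample standard deviation, floored by the hypothetical min_sigma.
    sigma = max(stdev(baseline), min_sigma)
    return max(0.0, (counts[j] - (mu + sigma)) / sigma)


def c2_statistic(counts, k):
    """C2 for day k is S_{k,3}."""
    return individual_day_statistic(counts, k, n=3)


def c3_statistic(counts, k):
    """C3 for day k is S_{k,3} + S_{k-1,3} + S_{k-2,3}."""
    return sum(individual_day_statistic(counts, k - i, n=3) for i in range(3))
```

Because C3 sums three consecutive individual-day statistics, a single-day spike and a moderate rise sustained over three days can both push it over the threshold.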
Sample Univariate Algorithm Output: C2
Accessible Alerting Algorithms: courtesy of Dr. Howard Burkom, JHU/APL
Sample Univariate Algorithm Output: C3
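EWMA
Exponentially Weighted Moving Average (EWMA): an average that puts the most weight on the most recent count $X_t$
$$Y_1 = X_1, \qquad Y_t = \omega X_t + (1 - \omega)\, Y_{t-1}$$
28-day baseline, 2-day guard band
Test statistic, where $\mu_t$ and $\sigma_t$ are the mean and standard deviation of the 28-day baseline:
$$\frac{Y_t - \mu_t}{\sigma_t \sqrt{\omega / (2 - \omega)}} \ge \text{threshold}$$
Often threshold = 3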
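The EWMA recursion and test statistic above can be sketched in Python as follows. This sketch assumes $\mu_t$ and $\sigma_t$ are the mean and sample standard deviation of the raw counts in the 28-day baseline ending 2 days before day t, and it starts the EWMA at the first day of the series; the production ESSENCE implementation may differ in these details. `min_sigma` is a hypothetical guard against a flat baseline.

```python
from math import sqrt
from statistics import mean, stdev


def ewma_series(counts, omega):
    """Y_1 = X_1 and Y_t = omega * X_t + (1 - omega) * Y_{t-1}."""
    smoothed = [counts[0]]
    for x in counts[1:]:
        smoothed.append(omega * x + (1 - omega) * smoothed[-1])
    return smoothed


def ewma_test_statistic(counts, t, omega=0.8, baseline=28, guard=2, min_sigma=1e-6):
    """(Y_t - mu_t) / (sigma_t * sqrt(omega / (2 - omega))) for day t.

    mu_t and sigma_t come from the `baseline` days that end `guard` days
    before t (days t-30 .. t-3 with the defaults).
    """
    start = t - guard - baseline
    if start < 0:
        raise ValueError("not enough history for the baseline window")
    window = counts[start : t - guard]
    mu = mean(window)
    sigma = max(stdev(window), min_sigma)  # min_sigma: hypothetical floor
    y_t = ewma_series(counts[: t + 1], omega)[-1]
    # sqrt(omega / (2 - omega)) rescales sigma to the spread of the smoothed series.
    return (y_t - mu) / (sigma * sqrt(omega / (2 - omega)))
```

With omega = 0.8 the factor $\sqrt{0.8/1.2} \approx 0.82$, so the smoothed value is judged against a slightly narrower spread than the raw counts, reflecting the variance reduction from averaging.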
Sample Univariate Algorithm Output: EWMA
Machine Learning for Disease Outbreak Detection
Approach:
Learn the model of regular activities
Use a one-class Support Vector Machine
Detect an anomaly based on its dissimilarity from regular activities
In SVMs, a hyperplane in n-dimensional space divides the data into two classes with the largest margin
[Figure: hyperplane separating Normal from Anomalous data points]
Advantages:
Only normal-behavior data needed
Detectors flag any anomalous behavior
Capable of detecting anomalies caused by new pathogens
No need for a separate model for each disease
Support Vector Machines (SVM)
SVM learning algorithm developed by Vapnik, based on statistical learning theory
Learning problem:
Find the separating hyperplane that divides the two classes with the largest margin
To construct a classifier for a given data set, the SVM solves a quadratic programming problem
To solve non-linear problems, the SVM employs inner-product kernels such as polynomial, RBF, sigmoid, etc.
SVMs build the decision surface using only the training examples that are near the boundary region
Data points at the margin are called support vectors
One-class SVM used
Proposed by Schölkopf et al. (1999)
Only positive (normal) training examples are needed
SVM classifiers are based on hyperplanes corresponding to decision functions:
$$\mathrm{Class}(x) = \mathrm{sign}(w \cdot x + b)$$
where w is the weight vector, b is the threshold of the decision rule, and x is the pattern being classified.
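The talk does not show its SVM code. As a minimal, self-contained sketch (not the authors' implementation), the dual problem of the Schölkopf one-class SVM can be solved in pure Python with a simple pairwise coordinate-descent (SMO-style) loop. The RBF kernel, `nu=0.2`, `gamma=0.5`, and the class name `OneClassSVMSketch` are assumptions chosen for illustration; a real system would use a tested solver such as LIBSVM or scikit-learn's `OneClassSVM`. In the slide's notation, $w \cdot x + b$ becomes the kernel expansion $\sum_j a_j K(x_j, x) - \rho$.

```python
import math
from itertools import combinations


def rbf_kernel(p, q, gamma):
    """K(p, q) = exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))


class OneClassSVMSketch:
    """One-class SVM dual (Schölkopf et al.):

        minimize   0.5 * sum_ij a_i a_j K(x_i, x_j)
        subject to 0 <= a_i <= 1 / (nu * n),  sum_i a_i = 1

    Decision: +1 (normal) if sum_j a_j K(x_j, x) >= rho, else -1 (anomalous).
    """

    def __init__(self, nu=0.2, gamma=0.5, sweeps=50):
        self.nu, self.gamma, self.sweeps = nu, gamma, sweeps

    def fit(self, X):
        n = len(X)
        c = 1.0 / (self.nu * n)  # upper bound on each a_i
        K = [[rbf_kernel(p, q, self.gamma) for q in X] for p in X]
        a = [1.0 / n] * n  # feasible start: sums to 1 and each a_i <= c
        g = [sum(a[k] * K[i][k] for k in range(n)) for i in range(n)]  # g = K a
        for _ in range(self.sweeps):
            for i, j in combinations(range(n), 2):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue
                s = a[i] + a[j]  # keeping a_i + a_j fixed keeps sum(a) = 1
                # Exact minimiser along this pair, then clip to the box.
                new_ai = a[i] + (g[j] - g[i]) / eta
                new_ai = min(max(new_ai, max(0.0, s - c)), min(c, s))
                d = new_ai - a[i]
                if abs(d) < 1e-15:
                    continue
                a[i], a[j] = new_ai, s - new_ai
                for k in range(n):
                    g[k] += d * (K[i][k] - K[j][k])
        # rho = f(x_i) on free support vectors (0 < a_i < c), which the KKT
        # conditions place exactly on the boundary; fall back to all SVs.
        tol = 1e-8
        free = [g[i] for i in range(n) if tol < a[i] < c - tol]
        support = [g[i] for i in range(n) if a[i] > tol]
        pool = free or support
        self.rho = sum(pool) / len(pool)
        self.X, self.a = list(X), a
        return self

    def decision_function(self, x):
        return sum(ai * rbf_kernel(xi, x, self.gamma) for ai, xi in zip(self.a, self.X)) - self.rho

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0 else -1
```

Only the training points with nonzero `a` (the support vectors) contribute to the decision, matching the slide's point that the decision surface depends only on examples near the boundary.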
Details of the Approach
[Figure: Data Streams (3021) -> EWMA Smoothing -> SVM Module (GI SVM, Fever SVM, Resp SVM, …, Combined Syn SVM) -> Normal / Anomalous]
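The flow in this diagram can be sketched as a small Python function: smooth every stream with EWMA, give each syndrome-level detector its group of smoothed streams, and fuse the votes. The talk specifies neither the feature vector nor the fusion rule (a more sophisticated fusion module is listed as future work), so both are assumptions here: each detector sees today's smoothed value of its member streams, and the day is flagged if any detector flags it. Stream names, group names, and the threshold detector in the test are hypothetical stand-ins for trained SVMs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A detector maps a feature vector to +1 (normal) or -1 (anomalous).
Detector = Callable[[Sequence[float]], int]


def ewma_smooth(series: Sequence[float], omega: float = 0.8) -> List[float]:
    """Y_1 = X_1, Y_t = omega * X_t + (1 - omega) * Y_{t-1}."""
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(omega * x + (1 - omega) * smoothed[-1])
    return smoothed


def classify_day(
    streams: Dict[str, Sequence[float]],
    groups: Dict[str, List[str]],
    detectors: Dict[str, Detector],
    omega: float = 0.8,
) -> Tuple[str, Dict[str, int]]:
    """streams: stream name -> daily counts up to today.
    groups: detector name (e.g. "GI", "Fever") -> stream names it looks at.
    Returns the fused label and each detector's vote."""
    smoothed = {name: ewma_smooth(s, omega) for name, s in streams.items()}
    votes = {}
    for det_name, members in groups.items():
        # Hypothetical feature vector: today's smoothed value of each member stream.
        features = [smoothed[m][-1] for m in members]
        votes[det_name] = detectors[det_name](features)
    # Hypothetical fusion rule: anomalous if any detector says so.
    label = "Anomalous" if any(v == -1 for v in votes.values()) else "Normal"
    return label, votes
```

Returning the per-detector votes alongside the fused label is one simple way to support the drill-down capability listed under future directions.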
Data Sets
Training
ESSENCE data with no outbreaks (180 days)
Flu-season weeks removed from the training data
Testing (Recall)
ESSENCE data with no outbreaks (132 days not used in training)
Simulated outbreaks added to real background data:
Tularemia
Hep A – Sets 1, 2 & 3
Real problem
Real background data with simulated outbreaks
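Building these test sets means adding simulated outbreak cases to real background counts. A minimal sketch, assuming one integer count per day and a simulated outbreak given as a list of extra daily cases; the function name and the choice to label every day in the injected window as an outbreak day are assumptions, not details from the talk.

```python
from typing import List, Sequence, Tuple


def inject_outbreak(
    background: Sequence[int], outbreak_counts: Sequence[int], start_day: int
) -> Tuple[List[int], List[bool]]:
    """Add simulated outbreak cases on top of real background counts.

    Returns the combined series and a per-day label that is True on every
    day inside the injected window (the ground truth used for scoring)."""
    end = start_day + len(outbreak_counts)
    if start_day < 0 or end > len(background):
        raise ValueError("outbreak window must lie inside the background series")
    combined = list(background)
    labels = [False] * len(background)
    for offset, extra in enumerate(outbreak_counts):
        combined[start_day + offset] += extra
        labels[start_day + offset] = True
    return combined, labels
```

Varying `start_day` and scaling `outbreak_counts` would correspond to the "day of injection" and "amplitude" variations listed under future directions.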
3021 Data Streams
Time series for each syndrome and subsyndrome: total / gender / age group / county
Syndromes: Bot_Like, Fever, GI, Hem_Ill, Loc_Les, Lymph, Neuro, …
Subsyndromes: AbdominalCramps, AbdominalPain, AbdominalPainGroup, AbdominalTenderness, Abscess, AcuteBloodAbnormalities, …, Unresponsive, UrinaryTract, ViralSyndrome, Vomiting, Wheezing
Breakdowns: Total, Male, Female, Age 0-4, Age 5-17, …, Alexandria, …, Washington
Example streams: Viral Syndrome: 0-4; GI - Male; Vomiting - Female
Number of data streams: (11 syndromes + 148 subsyndromes) × 19 breakdowns = 3021
Results: Univariate EWMA
[Figure: "All Counts" and "All EWMA Alerts" (alerts at 1 sigma) per day, 8/19/2006 to 8/14/2007, showing a proliferation of false alarms]
omega = 0.8, baseline = 28 days, guard band = 2 days
Alert when EWMA test statistic >= 3
Average number of alarms per day on normal data: 28.1
Results
Polling Univariate EWMA (PU EWMA) results: omega = 0.8, baseline = 28 days, guard band = 2 days
Results per day:
Specificity: 94.0% (th = 53), Sensitivity: 29.0%
Specificity: 90.9% (th = 44), Sensitivity: 41.9%
Specificity: 75.8% (th = 35), Sensitivity: 48.4%
Specificity: 68.9% (th = 32), Sensitivity: 54.8%
Results per outbreak:
Specificity: 94.0%, Sensitivity: 80%
Initial SVM results: only Fever SVM and GI SVM trained
Results per day:
Specificity: 94.7%, Sensitivity: 54.8%
Results per outbreak:
Specificity: 94.7%, Sensitivity: 100%
[Figure: Sensitivity vs. 1 - Specificity for EWMA1, EWMA2, EWMA3, EWMA4, and SVM]
Similar specificity -> SVM sensitivity is higher by 25.8 percentage points (54.8% vs. 29.0%)
Similar sensitivity -> SVM specificity is higher by 25.8 percentage points (94.7% vs. 68.9%)
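Per-day sensitivity and specificity compare alarm days against known outbreak days. The sketch below assumes that "polling" means the PU EWMA system alarms on a day when at least `th` univariate EWMA alerts fire across all streams; this is our reading, suggested by thresholds of 32 to 53 against an average of 28.1 univariate alarms per day on normal data, but the slides do not spell the rule out. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def polling_alarms(daily_alert_counts: Sequence[int], th: int) -> List[bool]:
    """Assumed polling rule: alarm on a day when at least `th` of the
    univariate EWMA detectors alert on that day."""
    return [count >= th for count in daily_alert_counts]


def per_day_metrics(
    alarms: Sequence[bool], outbreak_days: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) computed over days."""
    pairs = list(zip(alarms, outbreak_days))
    tp = sum(1 for a, o in pairs if a and o)
    fn = sum(1 for a, o in pairs if not a and o)
    tn = sum(1 for a, o in pairs if not a and not o)
    fp = sum(1 for a, o in pairs if a and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one outbreak day and one normal day")
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `th` from high to low trades specificity for sensitivity, tracing out points on the sensitivity vs. 1 - specificity plot.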
Results
[Figure: All Counts, All Added Counts, EWMA alerts at 1 sigma, and SVM alerts per day, 5/18/2007 to 7/17/2007]
Timeliness of Detection
PU EWMA
Specificity: 94.0%
Avg number of days to detect an outbreak: 2.25
2 outbreaks not detected at all
Specificity: 68.9%
Avg number of days to detect an outbreak: 1.8
1 outbreak not detected at all
SVM
Specificity: 94.7%
Avg number of days to detect an outbreak: 1.67
All outbreaks detected
SVM obtains better timeliness of detection, higher specificity for the same sensitivity, and higher sensitivity for the same specificity than PU EWMA
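Timeliness can be scored by finding, for each injected outbreak, the first alarm inside its window. A minimal sketch, assuming outbreaks are given as inclusive (first_day, last_day) index pairs, that an alarm on the onset day counts as 0 days to detect, and that undetected outbreaks are excluded from the average and reported separately, as the slide does. The slide does not say how days to detect were counted, so its 1.67 and 2.25 figures may follow a different convention.

```python
from typing import List, Optional, Sequence, Tuple


def timeliness(
    alarms: Sequence[bool], outbreaks: Sequence[Tuple[int, int]]
) -> Tuple[Optional[float], int, float]:
    """outbreaks: (first_day, last_day) index pairs, inclusive.

    Returns (average days to detect over detected outbreaks,
             number of outbreaks never detected,
             per-outbreak sensitivity)."""
    if not outbreaks:
        raise ValueError("need at least one outbreak")
    delays: List[int] = []
    missed = 0
    for first, last in outbreaks:
        hit = next((d for d in range(first, last + 1) if alarms[d]), None)
        if hit is None:
            missed += 1
        else:
            # Assumption: an alarm on the onset day counts as 0 days to detect.
            delays.append(hit - first)
    average = sum(delays) / len(delays) if delays else None
    sensitivity = (len(outbreaks) - missed) / len(outbreaks)
    return average, missed, sensitivity
```

Reporting missed outbreaks separately keeps a detector that never fires from looking "fast", which is why the slide lists undetected outbreaks next to each average.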
Conclusions
Designed a new approach for disease outbreak detection
Very promising initial results obtained on normal data, 4 simulated outbreaks, and one real problem
Favorable comparison of initial SVM results (only the Fever and GI SVMs trained) to PU EWMA results
Future Directions
Proof-of-concept:
Remaining SVMs to be trained
More sophisticated decision fusion module
Testing on many different types of simulated outbreaks (varying amplitude, day of injection)
Testing on real influenza outbreaks
Explanation capability for the SVM system:
Capability for users to drill down to find the reason for an alert
Contact info:
Dr. Anna L. Buczak
National Security Technology Department
Johns Hopkins University Applied Physics Laboratory
tel. 443-778-9350