On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp
Santosh K C and Arjun Mukherjee
Department of Computer Science, University of Houston
WWW 2016, April 11-15, 2016, Montréal, Québec, Canada.
Introduction
- Online business boom
- Dominance of online reviews
- Research on spam detection
  - Spammers studied via linguistic, behavioral, latent variable, semi-supervised, and sockpuppet approaches
- Time series in spamming
Contribution
- Spamming Policies
- Causal Modeling of Deceptive Ratings
- Predicting Truthful Popularity and Rating
- Predicting Deceptive Ratings
  - Future
  - Imminent
- Spam Detection with Temporal Features
Labeled Review Dataset of Online Reviews
- Ground truth sources:
  - Human-tagged or solicited (e.g., Amazon Mechanical Turk): not possible
  - Sting operations or spammer confessions
  - Commercial website filtering of fake/deceptive reviews
- Yelp's filter
  - Reasonably reliable
Review on Yelp
Fig: Example review page on Yelp (screenshot).
Yelp Dataset Statistics
- 70 restaurants from the Chicago area
- Each restaurant has 5 years of data

                               Deceptive   Truthful
# of dislike (1-3★) reviews      1630       10042
# of like (4-5★) reviews         4465       30652
# of reviews                     6095       40694
% of reviews                    13.03%      86.97%
# of reviewers                   5359       21761
Spamming Policies
- Normalized rating: 1-5 stars mapped to 0-1
- Time 0 = time of the first review
- Time series clustering using K-Spectral Centroid (K-SC)
  - Invariant to scaling and translation
- Promotion spamming is dominant
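K-SC's invariance to scaling and translation comes from its distance measure, which optimizes away a scaling coefficient and a small time shift before comparing two normalized rating series. A minimal sketch under our own naming (the circular shift is a simplification of the original shift handling):

```python
import numpy as np

def ksc_distance(x, y, max_shift=5):
    # Scale- and shift-invariant distance used by K-Spectral Centroid
    # clustering: min over shifts q and scalings alpha of ||x - alpha*y_q|| / ||x||.
    best = np.inf
    for q in range(-max_shift, max_shift + 1):
        ys = np.roll(y, q)              # circular shift (simplification)
        denom = ys @ ys
        if denom == 0:
            continue
        alpha = (x @ ys) / denom        # closed-form optimal scaling
        d = np.linalg.norm(x - alpha * ys) / np.linalg.norm(x)
        best = min(best, d)
    return best
```

Because the optimal alpha is solved in closed form, a series and any rescaled copy of itself are at distance zero, which is what lets K-SC group rating curves by shape rather than magnitude.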
Spamming Policy Types
Fig: Representative rating time series for the three policy types: a) Early, b) Mid, c) Late spamming.
Behavioral Modalities of Spamming Policies
- # of fake dislike reviews
- Cumulative rating of fake dislike reviews
- # of non-fake dislike reviews
- Cumulative rating of non-fake dislike reviews
- # of fake like reviews
- Cumulative rating of fake like reviews
- # of non-fake like reviews
- Cumulative rating of non-fake like reviews
- # of non-fake reviews
- Cumulative rating of non-fake reviews
Causal Modeling of Deceptive Ratings
- Time series comparison of truthful and deceptive reviews
  - Buffered and Reduced Spamming
- Truthful rating time series
  - Truthful Like
  - Truthful Dislike
  - Truthful Average
- Prediction of deceptive ratings
Buffered Spamming
- What happens when popularity wanes?
Fig: Time series of truthful ratings (solid blue) vs. deceptive like ratings (dashed red) for different representative restaurants.
Reduced Spamming
- Spamming is reduced once the restaurant attains a better standing in the market
Fig: Time series of truthful ratings (solid blue) vs. deceptive like ratings (dashed red) for different representative restaurants.
Cross-correlation Coefficient against Deceptive Rating
- The CCF at lag k estimates the relationship between a response time series Y(t) and a covariate time series X(t) shifted by k time units
Fig: Average CCF plot against deceptive rating. Red lines indicate CCF values; blue lines indicate 99% confidence interval bounds (p < 0.01).
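The CCF itself is just a lagged Pearson correlation on standardized series. A small sketch (names are ours); values outside roughly ±2.58/√n fall beyond the 99% confidence bounds shown in the figure:

```python
import numpy as np

def ccf(x, y, max_lag=10):
    # Cross-correlation of covariate x with response y; positive lag k
    # correlates x(t) with y(t + k), i.e. x "leading" y by k time units.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            out[k] = float(np.mean(x[:n - k] * y[k:]))
        else:
            out[k] = float(np.mean(x[-k:] * y[:n + k]))
    return out
```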
Deceptive Rating Prediction
- Truthful ratings are harbingers of deceptive ratings
- Vector Auto-Regression (VAR) model used to predict next week's deceptive like rating
  - Lags 1 and 2 used
  - Prediction for 10, 20, and 30 week window sizes on spamming policies, buffered and reduced spamming
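A lag-1 VAR can be fit with plain least squares; the sketch below (our own minimal implementation, not the authors' exact estimator) stacks truthful and deceptive ratings into one vector series and forecasts the next step:

```python
import numpy as np

def fit_var1(Y):
    # Lag-1 VAR: Y_t = c + A @ Y_{t-1} + noise, estimated by OLS.
    # Y is a (T, d) array; here d = 2 (truthful rating, deceptive rating).
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])   # regressors [1, Y_{t-1}]
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return B[0], B[1:].T                                # intercept c, matrix A

def forecast_var1(c, A, y_last):
    # One-step-ahead forecast (next week's ratings).
    return c + A @ y_last
```

The lag-2 variant used in the talk simply adds Y_{t-2} as extra regressor columns.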
Example of Deceptive Rating Prediction
Fig: Dotted blue: prediction of the p=1 week lag model; dashed red: p=2 week lag model; solid green: actual rating.
Deceptive Rating Prediction Results: Early Spamming
- Mean Absolute Error for three window sizes and two lag models

          15 weeks   30 weeks   45 weeks   Avg.
Lag = 1     0.55       0.17       0.11     0.28
Lag = 2     0.42       0.16       0.11     0.23
Deceptive Rating Prediction: Comparison across Spamming Types
- Predicting early spamming is the most difficult
- Predicting reduced spamming is easier than predicting buffered spamming

Mean Absolute Error for the lag-1 VAR model:

Early   Mid    Late   Buffered   Reduced
0.28    0.15   0.12     0.21       0.12
Predicting Truthful Popularity and Rating
- Lasso regression model with 10-fold cross-validation
- Training: 10 weeks; prediction horizon: 6 months
- Only truthful reviews used
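Lasso's L1 penalty is what prunes a very large feature space such as word n-grams down to the informative features. A minimal coordinate-descent sketch (our own implementation, standing in for whatever solver the authors used):

```python
import numpy as np

def lasso_cd(X, y, lam=0.1, n_iter=200):
    # Lasso via cyclic coordinate descent with soft-thresholding.
    # Minimizes (1/2n) * ||y - X @ w||^2 + lam * ||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]      # residual with feature j removed
            rho = (X[:, j] @ r) / n
            z = (X[:, j] @ X[:, j]) / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return w
```

Coefficients whose correlation with the residual falls below lam are set exactly to zero, which is how uninformative features drop out.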
Features for Lasso Model
- Feature families incrementally added:
  - Non-text features
  - Opinion lexicon positive/negative words
  - Word n-grams (n = 1, 2)
  - Domain-specific aspect/sentiment lexicon
Regression Prediction Results: Popularity
Mean Absolute Error for Lasso popularity prediction:

                                                      Early   Mid    Late
Non-text                                              3.94    2.02   1.52
Non-text + Opinion lexicon                            3.88    2.01   1.49
Non-text + Opinion lexicon + N-gram                   3.78    1.99   1.29
Non-text + Opinion lexicon + N-gram + Domain lexicon  3.27    1.80   0.92
Regression Prediction Results: Rating
Mean Absolute Error for Lasso rating prediction:

                                                      Early   Mid    Late
Non-text                                              0.47    0.38   0.16
Non-text + Opinion lexicon                            0.44    0.30   0.15
Non-text + Opinion lexicon + N-gram                   0.36    0.29   0.14
Non-text + Opinion lexicon + N-gram + Domain lexicon  0.30    0.28   0.13
Reliability of Yelp's Filter
- Hypothetical regression oracle: if the filter is accurate, adding filtered (fake) reviews to the training data should increase error due to noise
- Regression model used to test filter quality
  - A statistically significant increase in MAE signifies that the filtered reviews carry considerable noise
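The "statistically significant increase in MAE" can be checked with a paired t-test over per-restaurant errors. A sketch (the specific test statistic is our choice for illustration):

```python
import numpy as np

def paired_t(errors_with_fake, errors_truthful):
    # Paired t-statistic on per-restaurant MAE differences; a large positive
    # value means adding filtered reviews significantly worsened the error.
    d = np.asarray(errors_with_fake) - np.asarray(errors_truthful)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

The resulting statistic is compared against the t critical value at n-1 degrees of freedom.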
Deceptive Reviews in Regression

Popularity regression, Mean Absolute Error:
                 Early   Mid    Late
Truthful only    3.27    1.80   0.92
All reviews      3.97    2.38   1.16

Rating regression, Mean Absolute Error:
                 Early   Mid    Late
Truthful only    0.30    0.28   0.13
All reviews      0.37    0.35   0.24
Imminent Future Prediction
- Novel time unit: Mon/Tue, Wed/Thu, and Fri/Sat/Sun
- Features: time series with high CCF
- Training time: 50, 100, 150 time-steps
- VAR model with the features as covariates
- Only truthful reviews used for truthful popularity/rating prediction
- MAE ordering across policies: early > mid > late
Imminent Prediction Example
Fig: Dotted blue: p=1 time-step lag VAR model; dashed red: p=2 time-step lag VAR model; solid green: actual rating.
Do Reviews Filtered by Yelp Affect Imminent Prediction?
- Fake reviews' time series added to the VAR model as exogenous variables
- Prediction deteriorates
- Validation of Yelp's filter

Rating Prediction MAE:
Policy      Early        Mid          Late
Lag          1     2      1     2      1     2
Truthful    .84   .79    .76   .65    .67   .62
All        1.34  1.18   1.29  1.26   1.08  1.02

Popularity Prediction MAE:
Policy      Early        Mid          Late
Lag          1     2      1     2      1     2
Truthful   .145  .137   .134  .130   .096  .091
All        .317  .278   .277  .257   .243  .221
Temporal Dynamics for Spam Detection
- State-of-the-art baselines:
  - N-grams (Ott et al., 2011)
  - Behavioral features (Mukherjee et al., 2013)
- Yelp-labeled time series features (TSF)
- Linear-kernel SVM with 5-fold CV
- Combination of time series features with n-grams and behavioral features
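The feature combination is plain concatenation before a linear SVM. A sketch of that setup with synthetic stand-in features (scikit-learn assumed; the real features would be n-gram counts, behavioral statistics, and time series measures):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 300
labels = rng.integers(0, 2, n)                     # 1 = fake, 0 = truthful
# Synthetic stand-ins for the three feature families.
ngram = rng.standard_normal((n, 20)) + 0.3 * labels[:, None]
behavior = rng.standard_normal((n, 5)) + 0.5 * labels[:, None]
tsf = rng.standard_normal((n, 10)) + 0.4 * labels[:, None]

# NG + BF + TSF: concatenate, then linear-kernel SVM with 5-fold CV.
X = np.hstack([ngram, behavior, tsf])
scores = cross_val_score(LinearSVC(max_iter=5000), X, labels, cv=5)
```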
Classification with Time Series, N-gram and Behavioral Features
- Combining time series features with n-grams and behavioral features significantly improves spam detection

Early Spamming SVM Results:
                    Precision   Recall   F1-score   Accuracy
Ngrams (NG)           63.5       77.1      69.6       65.0
Behavior (BF)         82.1       85.3      83.7       84.4
Time Series (TSF)     65.2       92.7      76.5       73.1
NG+BF+TSF             84.9       94.8      89.6       89.0
Summary
- Three dominant spamming policies: early, mid, and late
- Significant correlation of deceptive ratings with truthful ratings
- Spam injection trends: buffered and reduced spamming
- Frameworks for future and imminent prediction
- Yelp's filter is reasonably reliable
- Temporal features significantly improve spam detection
Thank you!