On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp
Santosh K C and Arjun Mukherjee
Department of Computer Science, University of Houston
WWW 2016, April 11-15, 2016, Montréal, Québec, Canada.

Introduction
- Boom in online business and the growing dominance of online reviews
- Research on spam detection and spammers: linguistic, behavioral, latent-variable, semi-supervised, and sockpuppet approaches have been explored
- This work: time series analysis of spamming

Contribution
- Spamming policies
- Causal modeling of deceptive ratings
- Predicting deceptive ratings
- Predicting truthful popularity and rating
- Future and imminent spam detection with temporal features

Labeled Dataset of Online Reviews
- Human-tagged or solicited labels (e.g., Amazon Mechanical Turk): not usable as ground truth
- Ground-truth alternatives: sting operations or spammer confessions
- Commercial-website filtering of fake/deceptive reviews (Yelp): reasonably reliable

Review on Yelp
(Screenshot of an example Yelp review.)

Yelp Dataset Statistics
- 70 restaurants from the Chicago area
- 5 years of data for each restaurant

                              Deceptive   Truthful
# of dislike (1-3★) reviews     1630       10042
# of like (4-5★) reviews        4465       30652
# of reviews                    6095       40694
% of reviews                   13.03%      86.97%
# of reviewers                  5359       21761

Spamming Policies
- Normalized rating: 1-5 stars mapped to 0-1; time 0 = time of the first review
- Time series clustering using K-Spectral Centroid: invariant to scaling and translation
- Promotion spamming is dominant

Spamming Policy Types
a) Early, b) Mid, c) Late spamming
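The K-Spectral Centroid clustering above depends on a distance that is invariant to scaling and translation, which is what lets rating curves of different magnitudes but the same shape fall into one policy cluster. A minimal numpy sketch of such a distance, plus the 1-5★ to 0-1 rating normalization; function names and the `max_shift` parameter are illustrative, not from the paper:

```python
import numpy as np

def normalize_rating(stars):
    """Map a 1-5 star rating onto [0, 1], as in the slides."""
    return (stars - 1) / 4.0

def ksc_distance(x, y, max_shift=4):
    """Scale- and shift-invariant distance in the spirit of
    K-Spectral Centroid clustering. For each candidate shift q,
    the optimal scaling has the closed form
    alpha = <x, y_q> / <y_q, y_q>. Sketch only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = np.inf
    for q in range(-max_shift, max_shift + 1):
        y_q = np.roll(y, q)               # circular shift as a simplification
        denom = y_q @ y_q
        if denom == 0:
            continue
        alpha = (x @ y_q) / denom         # best scaling for this shift
        d = np.linalg.norm(x - alpha * y_q) / np.linalg.norm(x)
        best = min(best, d)
    return best
```

A series and any scaled or (moderately) shifted copy of itself are at distance 0 under this measure, so early/mid/late spamming shapes can be compared independently of a restaurant's review volume.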
Behavioral Modalities of Spamming Policies
Ten time series per restaurant:
- # and cumulative rating of fake dislike reviews
- # and cumulative rating of non-fake dislike reviews
- # and cumulative rating of fake like reviews
- # and cumulative rating of non-fake like reviews
- # and cumulative rating of all non-fake reviews

Causal Modeling of Deceptive Ratings
- Time series comparison of truthful and deceptive reviews: buffered and reduced spamming
- Truthful rating time series: truthful like, truthful dislike, truthful average
- Prediction of deceptive ratings

Buffered Spamming
- What happens when popularity wanes?
Fig: Time series of truthful ratings (solid blue) vs. deceptive like ratings (dashed red) for different representative restaurants.

Reduced Spamming
- Spamming is reduced once the restaurant has a better standing in the market
Fig: Time series of truthful ratings (solid blue) vs. deceptive like ratings (dashed red) for different representative restaurants.

Cross-correlation Coefficient against Deceptive Rating
- The CCF at lag k estimates the relationship between a response time series Y(t) and a covariate time series X(t) shifted by k time units
Fig: Average CCF plot against deceptive rating. Red lines indicate the CCF values; blue lines indicate the confidence-interval bounds obtained at 99% confidence (p < 0.01).
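The CCF described above can be computed directly: shift one series by k, then take the Pearson correlation of the overlapping parts. A minimal sketch; the function name and the k >= 0 restriction are mine:

```python
import numpy as np

def ccf(x, y, k):
    """Cross-correlation at lag k between a covariate series x(t)
    (e.g. truthful rating) and a response series y(t) (e.g. deceptive
    rating): the Pearson correlation of x(t) with y(t + k).
    Sketch; assumes equal-length series and k >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if k > 0:
        x, y = x[:-k], y[k:]              # align x(t) with y(t + k)
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))
```

A large CCF at positive lags means the covariate leads the response, which is the sense in which truthful ratings act as harbingers of deceptive ones.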
Deceptive Rating Prediction
- Truthful ratings are harbingers of deceptive ratings
- Vector autoregression (VAR) model used to predict the next week's deceptive like rating, with lags 1 and 2
- Prediction for 10-, 20- and 30-week window sizes on the spamming policies and on buffered and reduced spamming

Example of Deceptive Rating Prediction
Fig: Dotted blue: prediction of the p = 1 week lag model; dashed red: the p = 2 week lag predictor; solid green: the actual rating.

Deceptive Rating Prediction Results: Early Spamming
Mean absolute error for three window sizes and two lag models:

           15 weeks  30 weeks  45 weeks  Avg.
Lag = 1      0.55      0.17      0.11    0.28
Lag = 2      0.42      0.16      0.11    0.23

Deceptive Rating Prediction: Comparison across Spamming Types
- Predicting early spamming is difficult
- Predicting reduced spamming is easier than buffered spamming

Mean absolute error for the lag-1 VAR model:

  Early   Mid    Late   Buffered   Reduced
  0.28    0.15   0.12     0.21       0.12

Predicting Truthful Popularity and Rating
- Lasso regression model with 10-fold cross-validation
- Training: 10 weeks; prediction horizon: 6 months
- Only truthful reviews used

Features for the Lasso Model
Feature families, incrementally added:
- Non-text features
- Opinion-lexicon positive/negative words
- Word n-grams (n = 1, 2)
- Domain-specific aspect/sentiment lexicon

Regression Prediction Results: Popularity
Mean absolute error for Lasso popularity prediction:

                                                       Early   Mid    Late
Non-text                                                3.94   2.02   1.52
Non-text + opinion lexicon                              3.88   2.01   1.49
Non-text + opinion lexicon + n-grams                    3.78   1.99   1.29
Non-text + opinion lexicon + n-grams + domain lexicon   3.27   1.80   0.92
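The VAR predictor above regresses this week's values on lagged values of all series jointly. A hand-rolled sketch of a lag-1 VAR fit by ordinary least squares; the paper's models also use lag 2 and more covariates, and all names here are illustrative:

```python
import numpy as np

def fit_var1(series):
    """Fit z_t = c + A @ z_{t-1} by OLS. `series` is a (T, k) array of
    k jointly observed weekly time series, e.g. truthful and deceptive
    ratings. Returns a (k+1, k) matrix: intercept row, then A transposed."""
    Z = np.asarray(series, dtype=float)
    X = np.hstack([np.ones((len(Z) - 1, 1)), Z[:-1]])  # regressors [1, z_{t-1}]
    Y = Z[1:]                                          # targets z_t
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

def predict_next(coef, last):
    """One-step-ahead forecast from the last observed vector."""
    return np.hstack([[1.0], np.asarray(last, dtype=float)]) @ coef
```

On data generated by a noiseless VAR(1), the fit recovers the intercept and transition matrix exactly, and the one-step forecast matches the true next value.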
Regression Prediction Results: Rating
Mean absolute error for Lasso rating prediction:

                                                       Early   Mid    Late
Non-text                                                0.47   0.38   0.16
Non-text + opinion lexicon                              0.44   0.30   0.15
Non-text + opinion lexicon + n-grams                    0.36   0.29   0.14
Non-text + opinion lexicon + n-grams + domain lexicon   0.30   0.28   0.13

Reliability of Yelp's Filter
- Hypothetical regression oracle: adding fake reviews to the training data should increase error, since they act as noise
- The regression model is used to test label quality: a statistically significant increase in MAE signifies considerable noise

Deceptive Reviews in Regression

Popularity regression, mean absolute error:
                 Early   Mid    Late
Truthful only     3.27   1.80   0.92
All reviews       3.97   2.38   1.16

Rating regression, mean absolute error:
                 Early   Mid    Late
Truthful only     0.30   0.28   0.13
All reviews       0.37   0.35   0.24

Imminent Future Prediction
- Novel time unit: Mon/Tue, Wed/Thu and Fri/Sat/Sun
- Features: time series with high CCF
- Training time: 50, 100 and 150 time-steps
- VAR model with the features as covariates
- Only truthful reviews used for truthful popularity/rating prediction
- MAE order: early, mid, late

Imminent Prediction Example
Fig: Dotted blue: the p = 1 time-step lag VAR model; dashed red: the p = 2 time-step lag VAR model; solid green: the actual rating.

Do reviews filtered by Yelp affect imminent prediction?
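The oracle-style filter test, and this question, come down to the same check: mix the filtered "fake" data into training and see whether held-out error on truthful data rises. A tiny synthetic illustration using an ordinary least-squares fit; all data below is made up for the sketch:

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares linear fit with an intercept column."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ w

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Truthful data follows y = 2x + 1; "fake" points inflate the target by +3.
X_true = np.array([[0.0], [1.0], [2.0], [3.0]])
y_true = 2 * X_true[:, 0] + 1
X_fake = np.array([[0.0], [1.0]])
y_fake = 2 * X_fake[:, 0] + 1 + 3.0

X_test = np.array([[4.0], [5.0]])          # held-out truthful points
y_test = 2 * X_test[:, 0] + 1

w_clean = ols_fit(X_true, y_true)
w_noisy = ols_fit(np.vstack([X_true, X_fake]), np.hstack([y_true, y_fake]))

mae_clean = mae(y_test, predict(w_clean, X_test))
mae_noisy = mae(y_test, predict(w_noisy, X_test))
```

Here mae_noisy exceeds mae_clean; in the slides, the analogous statistically significant MAE increase when Yelp-filtered reviews are added is read as evidence that the filter catches genuinely noisy (deceptive) labels.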
- Fake reviews' time series added to the VAR model as exogenous variables
- Prediction deteriorates, a validation of Yelp's filter

Rating prediction MAE:
            Early         Mid           Late
Lag          1     2      1     2       1     2
Truthful    .84   .79    .76   .65     .67   .62
All         1.34  1.18   1.29  1.26    1.08  1.02

Popularity prediction MAE:
            Early         Mid           Late
Lag          1     2      1     2       1     2
Truthful    .145  .137   .134  .130    .096  .091
All         .317  .278   .277  .257    .243  .221

Temporal Dynamics for Spam Detection
- State-of-the-art baselines: n-grams (Ott et al., 2011) and behavioral features (Mukherjee et al., 2013)
- Yelp-labeled time series features (TSF)
- Linear-kernel SVM with 5-fold cross-validation
- Combination of time series features with n-grams and behavioral features

Classification with Time Series, N-gram and Behavioral Features
- Combining time series features with n-grams and behavioral features significantly improves spam detection

Early spamming SVM results:
                    Precision   Recall   F1-score   Accuracy
N-grams (NG)           63.5      77.1      69.6       65.0
Behavior (BF)          82.1      85.3      83.7       84.4
Time series (TSF)      65.2      92.7      76.5       73.1
NG+BF+TSF              84.9      94.8      89.6       89.0

Summary
- Three dominant spamming policies
- Significant correlation of deceptive ratings with truthful ratings
- Spam injection trends: buffered and reduced spamming
- Frameworks for future and imminent prediction
- Yelp's filter is reasonably reliable
- Temporal features significantly improve spam detection

Thank you!
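A quick consistency check on the SVM results above: F1 is the harmonic mean of precision and recall, so it can be recomputed from the first two columns. The helper below is mine; the values are the early-spamming table's:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) rows from the early-spamming SVM table
rows = [
    ("NG",        63.5, 77.1, 69.6),
    ("BF",        82.1, 85.3, 83.7),
    ("TSF",       65.2, 92.7, 76.5),
    ("NG+BF+TSF", 84.9, 94.8, 89.6),
]
for name, p, r, reported in rows:
    # each reported F1 matches the recomputed value to rounding precision
    assert abs(f1_score(p, r) - reported) < 0.15, name
```

Note how TSF alone already reaches 92.7 recall at modest precision; combining it with n-gram and behavioral features lifts both, hence the 89.6 combined F1.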