Simrank++: Query Rewriting through Link Analysis of the Click Graph
Ioannis Antonellis, Hector Garcia-Molina, Chi-Chao Chang
Stanford Infolab

Sponsored Search Model
• Advertisers place bids on queries (advertisers, queries, bids)

Auction Model
• For a query, candidate ads are ranked by relevance and bid amount

Motivating Example
• Query "addicting games": no ads are shown (www.addictinggames.com)

Motivating Example
• Query "free online games" (screenshot not recoverable from the extracted text)

Modified Sponsored Search Model
• Advertisers bid on queries
• For each query:
  – the search engine runs an auction based on ad relevance and bid amount
  – the top 5-10 ads are displayed along with the regular search results
• Extra: advertisers are charged a default amount when their ads are displayed for queries they did not bid on

Outline
• Sponsored Search Model
• Motivating Example
• Query Rewriting using the click graph
• Simrank
• Evidence-based Simrank
• Weighted Simrank
• Evaluation

Sponsored Search System
• q → Sponsored Search System (History, Ads, Bids) → ads

Query Rewriting
• q → Front End → q plus rewrites for q → Back End (History, Ads, Bids) → ads

Click Graph from Sponsored Search
• Queries: pc, camera, digital camera, tv, flower
• Ads: Hp.com, Bestbuy.com, Teleflora.com, Orchids.com
• Edges: pc–Hp.com, camera–Hp.com, camera–Bestbuy.com, digital camera–Hp.com, digital camera–Bestbuy.com, tv–Bestbuy.com, flower–Teleflora.com, flower–Orchids.com
• Edges are labeled with click counts (10, 20, 30, 5, 7, 15, 16, 15 in the figure)
• Similar query pairs: digital camera – camera, pc – camera, pc – digital camera, tv – camera, tv – digital camera, pc – tv

Simrank [JW 2003]
• Intuition:
  – "Two queries are similar if they are connected to similar ads"
  – "Two ads are similar if they are connected to similar queries"
• Iterative procedure: at each iteration, similarity propagates through the graph

Simrank [JW 2003]
• N(q): number of ads connected to q
• E(q): set of ads connected to q
• s_k(q,q'): similarity of q and q' at the k-th iteration
• Initially s(q,q) = 1, s(q,q') = 0, s(a,a) = 1, s(a,a') = 0

$$s_k(q,q') = \frac{C}{N(q)\,N(q')} \sum_{i \in E(q)} \sum_{j \in E(q')} s_{k-1}(i,j)$$

$$s_k(a,a') = \frac{C}{N(a)\,N(a')} \sum_{i \in E(a)} \sum_{j \in E(a')} s_{k-1}(i,j)$$

• Time: O(n^4)

Simrank
• The same two update equations applied to the example click graph: the two random surfers model

Simrank in Matrix Notation
• Input: transition matrix P, decay factor C, number of iterations k
• Output: similarity matrix S
• For i = 1:k, do
  – temp = C Pᵀ S P
  – S = temp + I − Diag(diag(temp))
• end
• Worst-case running time: O(n^3) (see also the next talk)

Simrank, 1st Iteration (C = 0.8), query–query scores on the example click graph

                  pc       camera   digital camera   tv       flower
  pc              1
  camera          0.0889   1
  digital camera  0.0889   0.1778   1
  tv              0        0.0889   0.0889           1
  flower          0        0        0                0        1

Simrank, 2nd Iteration (C = 0.8)

                  pc       camera   digital camera   tv       flower
  pc              1
  camera          0.1244   1
  digital camera  0.1244   0.2489   1
  tv              0.0356   0.1244   0.1244           1
  flower          0        0        0                0        1

Simrank, 12th Iteration (C = 0.8)

                  pc       camera   digital camera   tv       flower
  pc              1
  camera          0.1650   1
  digital camera  0.1650   0.33     1
  tv              0.0761   0.1650   0.1650           1
  flower          0        0        0                0        1

Evidence-based Simrank
• Problem: Simrank scores in complete bipartite graphs are counter-intuitive
• See the theorems in the paper; the examples here are for intuition

Evidence-based Simrank
• Example graphs: pc and camera are both connected only to Hp.com; camera and digital camera are both connected to Hp.com and Bestbuy.com

$$\text{evidence}(q,q') = \sum_{i=1}^{|E(q) \cap E(q')|} \frac{1}{2^i}$$

$$\text{sim}^{\text{evidence}}_k(q,q') = \text{evidence}(q,q') \cdot s_k(q,q')$$

  C = 0.8
  iteration   camera – digital camera       pc – camera
              Simrank    evidence-based     Simrank   evidence-based
  1           0.4        0.3                0.8       0.4
  2           0.56       0.42               0.8       0.4
  3           0.624      0.468              0.8       0.4
  4           0.6496     0.4872             0.8       0.4
  5           0.65984    0.49488            0.8       0.4
  6           0.663936   0.497952           0.8       0.4

• Plain Simrank ranks pc – camera (one common ad) above camera – digital camera (two common ads); the evidence factor (1/2 for one common ad, 3/4 for two) reverses this ordering

Evidence-based Simrank
• Showing only the evidence-based scores: camera – digital camera (0.3, 0.42, ..., 0.497952) overtakes pc – camera (constant 0.4) from the 2nd iteration on

Weighted Simrank
• flower–Teleflora.com 1000 and orchids–Teleflora.com 1000, versus flower–Teleflora.com 1000 and orchids–Teleflora.com 1
• Variance of the weights matters

Weighted Simrank
• flower–Teleflora.com 1000 and orchids–Teleflora.com 1000, versus flower–Teleflora.com 1 and orchids–Teleflora.com 1
• Absolute value of the weights matters

Weighted Simrank

$$p(a,i) = \text{spread}(i) \cdot \text{normalized\_weight}(a,i), \quad \forall i \in E(a)$$

$$p(a,a) = 1 - \sum_{i \in E(a)} p(a,i)$$

$$\text{spread}(i) = e^{-\text{variance}(i)}$$

$$\text{normalized\_weight}(a,i) = \frac{w(a,i)}{\sum_{j \in E(a)} w(a,j)}$$

Simrank++
• Input: transition matrix P', evidence matrix V, decay factor C, number of iterations k
• Output: similarity matrix S'
• For i = 1:k, do
  – temp = C P'ᵀ S' P'
  – S' = temp + I − Diag(diag(temp))
• end
• S' = V .* S' (element-wise product)
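The iterative Simrank equations and the evidence factor can be sketched directly in Python. This is a minimal, unoptimized sketch and not the authors' implementation. It assumes an unweighted click graph given as (query, ad) pairs, and the helper names build_neighbors, update_side, simrank and evidence_simrank are hypothetical. Both sides of the bipartite graph are updated from the previous iteration's scores. On the two example graphs of the evidence-based Simrank slide it reproduces that table (for example 0.4, 0.56, ..., 0.663936 for camera – digital camera).

```python
from itertools import product


def build_neighbors(edges):
    """Return E(q) for every query and E(a) for every ad as dicts of sets."""
    E_q, E_a = {}, {}
    for q, a in edges:
        E_q.setdefault(q, set()).add(a)
        E_a.setdefault(a, set()).add(q)
    return E_q, E_a


def update_side(prev_other, E, C):
    """One Simrank step for one side of the bipartite graph:
    s_k(x, y) = C / (N(x) N(y)) * sum over i in E(x), j in E(y) of s_{k-1}(i, j)."""
    new = {}
    for x, y in product(E, repeat=2):
        if x == y:
            new[x, y] = 1.0  # self-similarity stays fixed at 1
        else:
            total = sum(prev_other[i, j] for i in E[x] for j in E[y])
            new[x, y] = C * total / (len(E[x]) * len(E[y]))
    return new


def simrank(edges, C=0.8, k=6):
    """Run k iterations; returns (query-query scores, ad-ad scores, E_q)."""
    E_q, E_a = build_neighbors(edges)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    s_q = {(x, y): float(x == y) for x, y in product(E_q, repeat=2)}
    s_a = {(x, y): float(x == y) for x, y in product(E_a, repeat=2)}
    for _ in range(k):
        # Both right-hand sides are evaluated with the iteration k-1 scores.
        s_q, s_a = update_side(s_a, E_q, C), update_side(s_q, E_a, C)
    return s_q, s_a, E_q


def evidence(E_q, q1, q2):
    """evidence(q, q') = sum_{i=1}^{|E(q) & E(q')|} 1 / 2^i."""
    n = len(E_q[q1] & E_q[q2])
    return sum(0.5 ** i for i in range(1, n + 1))


def evidence_simrank(edges, q1, q2, C=0.8, k=6):
    """Evidence-based score: evidence(q, q') * s_k(q, q')."""
    s_q, _, E_q = simrank(edges, C, k)
    return evidence(E_q, q1, q2) * s_q[q1, q2]


# The two example graphs from the evidence-based Simrank slide.
k22 = [("camera", "hp.com"), ("camera", "bestbuy.com"),
       ("digital camera", "hp.com"), ("digital camera", "bestbuy.com")]
k21 = [("pc", "hp.com"), ("camera", "hp.com")]

for it in range(1, 7):
    s22 = simrank(k22, k=it)[0]["camera", "digital camera"]
    e22 = evidence_simrank(k22, "camera", "digital camera", k=it)
    s21 = simrank(k21, k=it)[0]["pc", "camera"]
    e21 = evidence_simrank(k21, "pc", "camera", k=it)
    print(it, round(s22, 6), round(e22, 6), round(s21, 6), round(e21, 6))
```

Iterating over all node pairs reflects the O(n^4)-per-iteration cost noted on the Simrank slide. The matrix form is the practical alternative for larger graphs.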
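The Weighted Simrank transition probabilities p(a,i) and p(a,a) can be computed as in the sketch below, which rests on several assumptions. Edge weights are treated as click rates in [0, 1], because large raw counts make e^{−variance(i)} vanish. variance(i) is taken as the population variance of the weights on node i's edges. The same rule is applied from both the query side and the ad side. The function name transition_probabilities and the example weights are hypothetical.

```python
import math
from statistics import pvariance


def transition_probabilities(edge_weights):
    """Weighted-Simrank transition probabilities for an undirected bipartite graph.

    edge_weights maps (query, ad) -> weight. For every node x with neighbours E(x):
      p(x, i) = spread(i) * w(x, i) / sum_j w(x, j)
      p(x, x) = 1 - sum_i p(x, i)
    with spread(i) = exp(-variance(i)).
    """
    adj = {}
    for (q, a), w in edge_weights.items():
        adj.setdefault(q, {})[a] = w
        adj.setdefault(a, {})[q] = w

    def spread(node):
        # Assumption: population variance of the weights on the node's edges
        # (a node with a single edge has variance 0, hence spread 1).
        return math.exp(-pvariance(list(adj[node].values())))

    probs = {}
    for x, nbrs in adj.items():
        total = sum(nbrs.values())
        row = {i: spread(i) * w / total for i, w in nbrs.items()}
        row[x] = 1.0 - sum(row.values())  # leftover probability mass: stay at x
        probs[x] = row
    return probs


# Hypothetical click-rate weights, not taken from the slides.
weights = {("flower", "teleflora.com"): 0.6,
           ("orchids", "teleflora.com"): 0.2,
           ("flower", "orchids.com"): 0.2}
P = transition_probabilities(weights)
print(P["teleflora.com"])
```

Because spread(i) ≤ 1, a neighbour whose edge weights vary a lot receives less probability. The unassigned mass becomes the self-transition p(a,a), so each row still sums to 1.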
Evaluation
• Dataset:
  – 2 weeks of the Yahoo! click graph: 15 million queries, 14 million ads, 28 million edges
  – Extracted the largest connected component and further decomposed it into 5 subgraphs (details in the paper)
  – Edge weights: click-over-impression rate, adjusted to account for position bias
• Evaluation set:
  – 120 queries sampled from search engine traffic

Evaluation
• Comparison with:
  – Pearson similarity, where \bar{w}_q is the average weight of the edges of q:

$$\text{sim}(q,q') = \frac{\sum_{a \in E(q) \cap E(q')} \big(w(q,a) - \bar{w}_q\big)\big(w(q',a) - \bar{w}_{q'}\big)}{\sqrt{\sum_{a \in E(q) \cap E(q')} \big(w(q,a) - \bar{w}_q\big)^2 \sum_{a \in E(q) \cap E(q')} \big(w(q',a) - \bar{w}_{q'}\big)^2}}$$

  – Jaccard similarity:

$$\text{sim}(q,q') = \frac{|E(q) \cap E(q')|}{|E(q) \cup E(q')|}$$

  – Cosine similarity

Metrics
• Precision/recall (manual evaluation):
  – Precision(q) = number of relevant rewrites of q returned by the method / number of rewrites of q returned by the method
  – Recall(q) = number of relevant rewrites of q returned by the method / number of relevant rewrites of q found by all methods combined
• Query coverage:
  – Number of queries for which the method gives at least one rewrite
• Query rewriting depth:
  – Total number of rewrites for a given query

Evaluation
• Result charts (slides 31-34) are not recoverable from the extracted text

Conclusions/Open Issues
• Proposed the use of Simrank for query rewriting
• Two extensions: evidence-based and weighted Simrank
• Simrank++ was the best method overall
• Open issues:
  – Ad selection models
  – Blending with semantic text-similarity methods
  – Incremental computation of Simrank++ values
  – Applications to recommendation systems

Thank You!
http://infoblog.stanford.edu
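The three baselines can be sketched over a weighted query-to-ad map. The slides give no formula for cosine similarity. Here it is assumed to be the cosine of the two queries' ad-weight vectors. Pearson follows the formula on the comparison slide, with \bar{w}_q taken over all of q's edges. The weights, and the functions jaccard, cosine and pearson, are hypothetical illustrations rather than the evaluation code.

```python
import math


def jaccard(weights, q1, q2):
    """|E(q) & E(q')| / |E(q) | E(q')| over the queries' ad sets."""
    a, b = set(weights[q1]), set(weights[q2])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(weights, q1, q2):
    """Cosine of the two queries' ad-weight vectors (assumed definition)."""
    common = weights[q1].keys() & weights[q2].keys()
    dot = sum(weights[q1][a] * weights[q2][a] for a in common)
    n1 = math.sqrt(sum(w * w for w in weights[q1].values()))
    n2 = math.sqrt(sum(w * w for w in weights[q2].values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def pearson(weights, q1, q2):
    """Pearson correlation over the ads shared by q and q'."""
    common = weights[q1].keys() & weights[q2].keys()
    # w-bar terms: mean weight over all of each query's edges.
    m1 = sum(weights[q1].values()) / len(weights[q1])
    m2 = sum(weights[q2].values()) / len(weights[q2])
    num = sum((weights[q1][a] - m1) * (weights[q2][a] - m2) for a in common)
    den = math.sqrt(sum((weights[q1][a] - m1) ** 2 for a in common)
                    * sum((weights[q2][a] - m2) ** 2 for a in common))
    return num / den if den else 0.0  # undefined correlation reported as 0


# Hypothetical edge weights: query -> {ad: weight}.
weights = {
    "camera": {"hp.com": 20, "bestbuy.com": 30},
    "digital camera": {"hp.com": 5, "bestbuy.com": 7},
    "pc": {"hp.com": 10},
}
for pair in [("camera", "digital camera"), ("pc", "camera")]:
    print(pair, jaccard(weights, *pair), round(cosine(weights, *pair), 4),
          pearson(weights, *pair))
```

Unlike Simrank, all three baselines look only at ads the two queries share directly. Queries with no common ad always score 0.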
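The metrics on the Metrics slide can be computed as sketched below. This assumes recall is measured against the pool of relevant rewrites that any method found for a query, as the "(among all methods)" wording suggests. The function evaluate, its input layout, and the example judgments are hypothetical.

```python
def evaluate(rewrites_by_method, judged_relevant):
    """Per-method precision/recall per query, query coverage and rewriting depth.

    rewrites_by_method: {method: {query: set of rewrites}}
    judged_relevant:    {query: set of rewrites a judge marked relevant}
    """
    # Pool of relevant rewrites per query across all methods (recall denominator).
    pool = {}
    for per_query in rewrites_by_method.values():
        for q, rw in per_query.items():
            pool.setdefault(q, set()).update(rw & judged_relevant.get(q, set()))

    report = {}
    for method, per_query in rewrites_by_method.items():
        precision, recall = {}, {}
        for q, rw in per_query.items():
            hits = rw & judged_relevant.get(q, set())
            if rw:  # precision is undefined when the method returns nothing
                precision[q] = len(hits) / len(rw)
            if pool.get(q):  # recall is undefined when no method found a relevant rewrite
                recall[q] = len(hits) / len(pool[q])
        report[method] = {
            "precision": precision,
            "recall": recall,
            "coverage": sum(1 for rw in per_query.values() if rw),
            "depth": {q: len(rw) for q, rw in per_query.items()},
        }
    return report


# Hypothetical rewrites and judgments, for illustration only.
rewrites = {
    "simrank++": {"camera": {"digital camera", "pc", "flower"}, "tv": set()},
    "jaccard": {"camera": {"digital camera"}, "tv": {"camera"}},
}
relevant = {"camera": {"digital camera", "pc"}}
report = evaluate(rewrites, relevant)
print(report["simrank++"])
```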