Personalizing Web Search using Long Term Browsing History
Nicolaas Matthijs, University of Cambridge
Filip Radlinski, Microsoft
In Proceedings of WSDM 2011

Motivating Example
Query: "pia workshop"
[Screenshot: the default search results for this query, with the relevant result marked]

Outline
- Approaches to personalization
- The proposed personalization strategy
- Evaluation metrics
- Results
- Conclusions and future work

Approaches to Personalization
- Observed user interactions
  - Short-term interests: Sriram et al. [24] and [6] find that session data alone is too sparse to personalize effectively
  - Longer-term interests [23, 16]: model users by classifying previously visited Web pages, then promote matching URLs
  - Joachims [11]: uses click-through data to learn a search ranking function
  - PClink [7] and Teevan et al. [28]: build a rich user profile
  - Other related approaches: [20, 25, 26]
- Representing the user
  - Teevan et al. [28] use rich keyword-based representations, but make no use of Web page characteristics
- Commercial personalization systems: Google, Yahoo!

Personalization Strategy
User profile generation workflow:
- Data extraction, from visited URLs (with number of visits) and previous searches with click-through data: full-text unigrams, title unigrams, metadata description unigrams, metadata keywords, extracted terms, and noun phrases
- Filtering: WordNet dictionary filtering, Google N-gram filtering, or no filtering
- Term weighting: TF, TF×IDF, or BM25
- Output: user profile terms and weights

Personalized Search
- Browsing history is recorded by a Firefox add-on, AlterEgo
  [Figure: browsing history as term counts, e.g. dog: 1, cat: 10, india: 2, mit: 4, search: 93, amherst: 12, vegas: 1]
- Data extraction turns the visited pages into user profile terms
  [Figure: extracted profile terms such as web, search, retrieval, ir, mit, csail, ...]
- Term weighting assigns each profile term a weight
  [Figure: weighted profile terms, with weights such as 6.0, 2.7, 1.6, 1.3, 0.2]

Term Weighting
- TF (term frequency): w_TF(t_i) is the number of occurrences of t_i in the profile divided by the total number of profile terms.
  Example: "cow" occurs 2 times among 100 profile terms, so w_TF(cow) = 2/100 = 0.02.
- TF×IDF: w_TFIDF(t_i) = log(1 / DF_{t_i}) * w_TF(t_i), where DF_{t_i} is the term's document frequency on the Web.
  Example: DF(cow) = 10^3/10^7, so w_TFIDF(cow) = log(10^7/10^3) * 2/100 = 0.08.
- Personalized BM25:
  w_pBM25(t_i) = log[ ((r_{t_i} + 0.5)(N - n_{t_i} + 0.5)) / ((n_{t_i} + 0.5)(R - r_{t_i} + 0.5)) ]
  where N is the number of documents on the Web (the "world"), n_{t_i} the number of those containing t_i, R the number of documents in the user's browsing history, and r_{t_i} the number of those containing t_i.
  [Figure: diagram of the Web (N documents, n_i containing the term) versus the browsing history (R documents, r_i containing the term)]

Re-ranking
- Use the user profile to re-rank the top results returned by a search engine
- Candidate documents vs. snippets: snippets are more effective (Teevan et al. [28]) and allow a straightforward personalization implementation
- Matching: for each term that occurs in both a snippet and the user profile, the term's weight is added to the snippet's score
- Scoring methods:
  - Unique matching: counts each unique matching term once
  - Language model: builds a language model of the user profile, using the term weights as frequency counts
  - PClink: the method of Dou et al. [7]
(A Python sketch of the weighting and re-ranking steps follows below.)
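To make the term-weighting and re-ranking steps concrete, here is a minimal Python sketch under stated assumptions: the function names, the whitespace tokenization of snippets, and the 1/log(1 + rank) blending with the original Google rank are all illustrative choices, not the authors' AlterEgo implementation; only the three weighting formulas come from the slides.

```python
import math
from collections import Counter

def tf_weights(profile_terms):
    """w_TF(t): occurrences of t in the browsing-history profile,
    divided by the total number of profile terms."""
    counts = Counter(profile_terms)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}

def tfidf_weights(profile_terms, doc_freq):
    """w_TFIDF(t) = log(1 / DF_t) * w_TF(t), where DF_t is the fraction
    of Web documents containing t (e.g. estimated from the Google
    N-gram corpus). With DF = 1e3/1e7 and w_TF = 2/100, this gives the
    slides' example value log10(1e4) * 0.02 = 0.08."""
    return {t: math.log10(1.0 / doc_freq[t]) * w
            for t, w in tf_weights(profile_terms).items() if t in doc_freq}

def pbm25_weight(r_t, R, n_t, N):
    """Personalized BM25 term weight: r_t of the R browsing-history
    documents contain the term, n_t of the N Web documents do."""
    return math.log((r_t + 0.5) * (N - n_t + 0.5) /
                    ((n_t + 0.5) * (R - r_t + 0.5)))

def rerank(results, profile):
    """Re-rank a list of (url, snippet) pairs. Each profile term that
    occurs in the snippet adds its weight once ('unique matching'
    scoring), and the original rank contributes a 1/log(1 + rank)
    prior (an assumed blending; the slides only list '1/log' as the
    Google-rank parameter)."""
    def score(rank, snippet):
        matched = set(snippet.lower().split()) & set(profile)
        return sum(profile[t] for t in matched) + 1.0 / math.log(1.0 + rank)
    scored = [(score(i, snippet), url)
              for i, (url, snippet) in enumerate(results, start=1)]
    return [url for _, url in sorted(scored, reverse=True)]
```

The language-model scoring variant, which treats the profile weights as frequency counts, is omitted for brevity; it would replace the additive score inside rerank.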
Evaluation Metrics
- Relevance judgements: NDCG@10 = (1/Z) * Σ_{i=1}^{10} (2^{rel_i} - 1) / log_2(1 + i)
- Side-by-side: show two alternative rankings side by side and ask users to vote for the better one
- Clickthrough-based: examine the query and click logs of a large search engine
- Interleaved: a new metric for personalized search; combine the results of two rankings, alternating between them and omitting duplicates (a code sketch of both NDCG@10 and interleaving follows the online-evaluation results below)

Offline Evaluation
- 6 participants, each with 2 months of browsing history
- Each judged the relevance of the top 50 pages returned by Google for 12 queries
- 25 general queries (16 from the TREC 2009 Web search track); each participant judged 6 of these
- Plus 5 queries drawn from the participant's most recent 40 search queries
- Each participant took about 2.5 hours to complete the judgements

Personalization strategies (Rel = relative weighting):
- MaxNDCG (yields the highest average NDCG): title Rel, metadata keywords Rel, noun phrases Rel; TF-IDF term weights; LM snippet scoring; Google rank 1/log; URLs visited v=10
- MaxQuer (improves the most queries): extracted terms Rel, noun phrases Rel; TF term weights; LM snippet scoring; Google rank 1/log; v=10
- MaxNoRank (highest NDCG among methods that ignore the original Google ranking): metadata keywords Rel; TF term weights; LM snippet scoring; no Google rank; v=10
- MaxBestPar (obtained by greedily selecting each parameter in sequence): title Rel, metadata keywords Rel, extracted terms Rel; pBM25 term weights; LM snippet scoring; Google rank 1/log; v=10

Offline evaluation performance (+/=/- : queries improved / unchanged / deteriorated):

Method               Average NDCG     +/=/- queries
Google               0.502 ± 0.067    -
Teevan et al. [28]   0.518 ± 0.062    44/0/28
PClink [7]           0.533 ± 0.057    13/58/1
MaxNDCG              0.573 ± 0.042    48/1/23
MaxQuer              0.567 ± 0.045    52/2/18
MaxNoRank            0.520 ± 0.060    13/52/7
MaxBestPar           0.566 ± 0.044    45/5/22

- MaxNDCG and MaxQuer are both significantly better than Google
- Interestingly, MaxNoRank is also significantly better than Google and Teevan et al. (possibly due to overfitting on the small offline dataset)
- PClink improves the fewest queries, but beats Teevan et al. on average NDCG

[Figure: distribution of relevance at rank for the Google and MaxNDCG rankings]
- 3,600 relevance judgements were collected: 9% Very Relevant, 32% Relevant, 58% Non-Relevant
- Google already places many Very Relevant results in the top 5
- MaxNDCG adds more Very Relevant results to the top 5, and also succeeds in adding Very Relevant results between ranks 5 and 10

Online Evaluation
- Large-scale interleaved evaluation, with users performing their real day-to-day searches
- The first 50 results were requested from Google; a personalization strategy was picked at random for each query
- The Team-Draft interleaving algorithm [18] produced the combined ranking shown to the user
- 41 users, 7,997 queries, 6,033 query impressions; 6,534 queries and 5,335 query impressions received a click

Results of the online interleaving test:

Method       Queries   Google vote    Re-ranked vote
MaxNDCG      2090      624 (39.5%)    955 (60.5%)
MaxQuer      2273      812 (47.3%)    905 (52.7%)
MaxBestPar   2171      734 (44.8%)    906 (55.2%)

Queries impacted by personalization:

Method       Unchanged       Improved       Deteriorated
MaxNDCG      1419 (67.9%)    500 (23.9%)    171 (8.2%)
MaxQuer      1639 (72.1%)    423 (18.6%)    211 (9.3%)
MaxBestPar   1485 (68.4%)    467 (21.5%)    219 (10.1%)

[Figure: rank differences for deteriorated (light) and improved (dark) queries for MaxNDCG; degree of personalization per rank]
- For the large majority of deteriorated queries, the clicked result loses only one rank
- The majority of clicked results on improved queries gain one rank
- On average, the gains from personalization are more than double the losses
- MaxNDCG is the most effective personalization method
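The two evaluation procedures can be sketched just as compactly. Below is an assumed minimal implementation of NDCG@10 exactly as defined on the evaluation-metrics slide, and of Team-Draft interleaving in the spirit of [18]; the per-round coin flip and the credit bookkeeping follow the usual description of the algorithm, not the authors' code.

```python
import math
import random

def ndcg_at_10(ranking, rel):
    """NDCG@10 = (1/Z) * sum_{i=1..10} (2^rel_i - 1) / log2(1 + i).
    rel maps documents to graded relevance (e.g. 0 = Non-Relevant,
    1 = Relevant, 2 = Very Relevant); Z is the DCG@10 of the ideal
    ordering, so a perfect ranking scores 1.0."""
    def dcg(docs):
        return sum((2 ** rel.get(d, 0) - 1) / math.log2(1 + i)
                   for i, d in enumerate(docs[:10], start=1))
    z = dcg(sorted(rel, key=rel.get, reverse=True))
    return dcg(ranking) / z if z > 0 else 0.0

def team_draft(ranking_a, ranking_b):
    """Team-Draft interleaving: in each round a coin flip decides which
    ranking picks first, and each ranking then contributes its highest
    result not already in the combined list (omitting duplicates). The
    'team' of every result is recorded so that clicks can later be
    credited to ranking A or B."""
    combined, team_of = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        order = [("A", a), ("B", b)]
        random.shuffle(order)                  # coin flip per round
        for team, src in order:
            while src and src[0] in team_of:   # skip duplicates
                src.pop(0)
            if src:
                doc = src.pop(0)
                team_of[doc] = team
                combined.append(doc)
    return combined, team_of
```

A query impression is then scored as a vote for whichever ranking's team collected more clicks, which is how the Google-vote and re-ranked-vote columns above would be tallied.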
Conclusions
- First large-scale personalized-search study with an online evaluation
- The proposed personalization techniques significantly outperform both default Google and the best previous approaches
- Key to modeling users: exploit the characteristics and structure of Web pages
- A long-term, rich user profile is beneficial

Future Exploration
- Parameter extension: learning parameter weights; using other fields (e.g., headings in HTML) and learning their weights
- Incorporating temporal information: How much browsing history is needed? Should the weights of older terms decay? How can page-visit duration be used?
- Making use of more personal data
- Using the extracted profiles for other purposes

Thank you!