Combining Content-based and Collaborative Filtering
Gabriela Polčicová, Pavol Návrat
Department of Computer Science and Engineering, Slovak University of Technology

Overview
• Information Filtering and its Types
• Combined Method
• Experiment with Information Filtering Methods
• Conclusions

Information Filtering (1)
– delivery of relevant information to the people who need it
• Types of Information Filtering
  – Content-based - for textual documents
  – Collaborative - for communities of users
• Interests
  – information about interests - stored in profiles
  – expressing opinions on documents - ratings
• Ratings {i, j, r_{ij}}
  – for user i and item j, the value of the rating r_{ij}

Information Filtering (2)
[Diagram: the filter takes rated items {user, item, value} and unrated items {user, item}; it learns interests, estimates the value of each rating, and chooses recommendations; its output is recommendations {user, item, estimation}.]

Content-based Filtering (1)
• Basic idea
  – recommending documents based on the content and properties of the document
• Profile
  – consists of keywords with assigned weights
  – only documents matching the profile are recommended
• Recommendations
  – based on objective, measurable properties

Content-based Filtering (2)
[Diagram: documents rated by the user (documents, ratings) build the PROFILE (keywords and phrases with weights); among the documents unrated by the user, those matching the profile are the documents of interest => recommended documents.]

Collaborative Filtering (1)
• Basic idea
  – automating "word of mouth"
  – leveraging opinions of like-minded users while making decisions
• Schema
  – collecting users' opinions
  – searching for like-minded users
  – making recommendations

Collaborative Filtering (2)
[Diagram: the profile of the current user is compared with the profiles of users 1 to 5; documents from like-minded users' profiles => recommended documents.]

Collaborative Filtering (3)
• Similarity measure: Pearson Correlation Coefficient
  \[ k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}} \]
• Recommendations computation: weighted sum of ratings
  \[ \hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|} \]

Combining Content-based and Collaborative Filtering (1)
• Computing estimates for missing ratings with the Content-based Filtering method, for each user
• Searching for like-minded users
  – computing coefficient k_{ci} between the current and the i-th user (from ratings only)
  – computing coefficient k'_{ci} between the current and the i-th user (from both ratings and estimates)
• New recommendations computation
  – a weighted sum of ratings and estimates: ratings are weighted by the coefficients k_{ci}, CBF estimates by the coefficients k'_{ci}

Datasets for Experiments
• Data:
  – EachMovie - users' ratings for movies, www.research.digital.com/SRC/eachmovie/
  – IMDB - textual information for CBF (movies' descriptions), www.imdb.com/
• Datasets:
  – A - ratings from the period up to Mar 1, 1996 (810 ratings from 71 users)
  – B - ratings from the period up to Mar 15, 1996 (2407 ratings from 131 users)
  – C - ratings from the period up to Apr 1, 1996 (12290 ratings from 651 users)

EachMovie Data and Constant Method
[Figure: Percentage of ratings in EachMovie for datasets A, B and C, by rating value (1 to 6); vertical axis from 0% to 45%.]
• Constant Method
  \[ \hat{r}_{cj} = 5 \]

Experiments with Combination of Content-based and Collaborative Filtering (2)
[Diagram: the dataset is divided into a training set (90%) and a test set (10%); the Content-based Filtering method, the Collaborative Filtering method, the Combined Filtering method and the Constant method are applied to the training and test sets; their recommendations for the test set are passed to the evaluation of the methods' performance.]

Metrics
• Coverage = percentage of items for which the method is able to compute estimates
• Accuracy
  \[ \text{Accuracy} = \frac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|} \]
• F-measure
  \[ \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
• NMAE
  \[ \text{NMAE} = \frac{\sum |r_{ij} - \hat{r}_{ij}|}{n \cdot s} \]
R - set of recommended items
L - set of liked items
  \[ \text{Precision} = \frac{|R \cap L|}{|R|}, \qquad \text{Recall} = \frac{|R \cap L|}{|L|} \]

Results of Experiments
[Figure: results on datasets A, B and C for the CF, CBF, combined and constant methods; panels: Coverage, Accuracy, F-measure.]

Conclusions
• Combining content-based and collaborative filtering might help in the initial phase

Future work
• Weighting of coefficients
• Comparing the method with additional methods

Content-based Filtering - Vector Representation of Documents and Profiles
• Document j contains the keywords "computer", "machine", "learning"; its vector holds TF-IDF weights:
  D = (…, computer, …, learning, …, machine, …)
  W_j = (0, …, 0, 0.5, 0, …, 0, 0.3, 0, …, 0, 0.2, 0, …, 0)
• Profile
  \[ \text{profile}_i = \sum_{j=1}^{n} r_j \cdot w_{ij} \]
• Similarity
  \[ \text{Sim}(W, \text{Profile}) = \frac{W \cdot \text{Profile}}{|W| \cdot |\text{Profile}|} \]

Collaborative Filtering - Example
[Table: example rating matrix with items A to G in columns and the current user and other users in rows.]

Combining Content-based and Collaborative Filtering (2)
• Similarity measure: Pearson Correlation Coefficient
  \[ k'_{ci} = \frac{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)(r^{CBF}_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)^2 \, \sum_{j \in I'_{ci}} (r^{CBF}_{ij} - \bar{r}_i)^2}} \]
• Recommendations computation: weighted sum of ratings and estimates
  \[ \hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|} \]

Experiments with Combination of Content-based and Collaborative Filtering (1)
• Content-based Filtering Method (CBF)
  – documents and profiles: vector representation - weighted keywords (TF-IDF)
  – estimation computation: normalized dot product of document and profile vectors
• Collaborative Filtering (CF)
  – Pearson correlation coefficient
  – weighted sum of ratings
• Combination of CF and CBF
  – Pearson correlation coefficients
  – weighted sum of ratings and CBF estimations
• Constant Method (\hat{r}_{cj} = 5)
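
The CF and combined estimators listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The names `pearson`, `predict`, `ratings` and `estimates` are hypothetical. It assumes that the Pearson sums run over items present in both users' profiles, that each user's mean is taken over that user's actual ratings, and that a CBF estimate is used only for a user who has not rated the item. When no neighbour contributes, the sketch returns `None`, meaning the item is not covered.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b, mean_a, mean_b):
    """Pearson coefficient over the items present in both profiles a and b."""
    common = a.keys() & b.keys()
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(sum((a[j] - mean_a) ** 2 for j in common)
               * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0


def predict(c, item, ratings, estimates=None):
    """Estimate user c's rating of item.

    ratings:   {user: {item: rating}}, actual ratings only
    estimates: {user: {item: CBF estimate}} or None (None gives plain CF)
    """
    mean_c = mean(ratings[c].values())
    # Profile of c with CBF estimates filled in; actual ratings take priority.
    filled_c = {**(estimates or {}).get(c, {}), **ratings[c]}
    num = den = 0.0
    for i, r_i in ratings.items():
        if i == c:
            continue
        mean_i = mean(r_i.values())
        if item in r_i:
            # Actual rating, weighted by k_ci (computed from ratings only).
            k = pearson(ratings[c], r_i, mean_c, mean_i)
            num += (r_i[item] - mean_i) * k
            den += abs(k)
        elif estimates and item in estimates.get(i, {}):
            # CBF estimate, weighted by k'_ci (ratings plus estimates).
            filled_i = {**estimates[i], **r_i}
            k2 = pearson(filled_c, filled_i, mean_c, mean_i)
            num += (estimates[i][item] - mean_i) * k2
            den += abs(k2)
    return mean_c + num / den if den else None


# Nobody has rated item "C" yet, so plain CF cannot cover it,
# while a CBF estimate for user "u1" lets the combined method answer.
cold = {"c": {"A": 4, "B": 2}, "u1": {"A": 5, "B": 1}}
print(predict("c", "C", cold))                       # None
print(predict("c", "C", cold, {"u1": {"C": 4.0}}))   # 4.0
```

The usage lines mirror the claim in the Conclusions: with few ratings available, CBF estimates give the weighted sum something to work with where plain CF has no neighbours.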