Collaborative Filtering - Slovak University of Technology in Bratislava

Combining Content-based and
Collaborative Filtering
Gabriela Polčicová
Pavol Návrat
Department of Computer Science and
Engineering, Slovak University of Technology
Overview
• Information Filtering and its Types
• Combined Method
• Experiment with Information
Filtering Methods
• Conclusions
Information Filtering (1)
– delivery of relevant information to the people who need it
• Types of Information Filtering
– Content-based - for textual documents
– Collaborative - for communities of users
• Interests
– information about interests - stored in profiles
– opinions about documents are expressed as ratings
• Ratings {i, j, rij}
– for user i, item j, the value of rating rij
Information Filtering (2)
• Input
– rated items {user, item, value}
– unrated items {user, item}
• Filter
– learning interests
– estimating the value of rating
– choosing recommendations
• Output
– recommendations {user, item, estimation}
Content-based Filtering (1)
• Basic idea
– recommending documents based on content and
properties of document
• Profile
– consists of keywords with assigned weights
– only documents matching profile are recommended
• Recommendations
– based on objective measurable properties
Content-based Filtering (2)
• Documents rated by the user (documents, ratings) are used to build the PROFILE
– keywords and phrases with weights
• Documents unrated by the user are matched against the profile
– documents matching the profile => recommended documents (documents of interest)
Collaborative Filtering (1)
• Basic idea
– automating “word of mouth”
– leverage opinions of like-minded users while making
decisions
• Schema
– collecting users’ opinions
– searching for like-minded users
– making recommendations
Collaborative Filtering (2)
• The profile of the current user is compared with the profiles of the other users (user 1 to user 5)
• Documents from like-minded users’ profiles => recommended documents
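Collaborative Filtering (3)
• Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \; \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$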
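The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: ratings are held in a nested dictionary (user -> item -> value), I_ci is taken to be the set of items rated by both users, U_cj the set of users who rated item j and have a defined coefficient, the correlation is centred on the means over the co-rated items, and the prediction uses each user's mean over all of their ratings. The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson(ratings_c, ratings_i):
    """Pearson coefficient k_ci over the items rated by both users.

    Assumption: means are taken over the co-rated items only.
    Returns None when the coefficient is undefined.
    """
    common = sorted(set(ratings_c) & set(ratings_i))
    if len(common) < 2:
        return None
    mean_c = sum(ratings_c[j] for j in common) / len(common)
    mean_i = sum(ratings_i[j] for j in common) / len(common)
    num = sum((ratings_c[j] - mean_c) * (ratings_i[j] - mean_i) for j in common)
    den = sqrt(
        sum((ratings_c[j] - mean_c) ** 2 for j in common)
        * sum((ratings_i[j] - mean_i) ** 2 for j in common)
    )
    return num / den if den > 0 else None


def predict(ratings, c, item):
    """Weighted sum of mean-centred ratings of `item` by the other users."""
    mean_c = sum(ratings[c].values()) / len(ratings[c])
    num = den = 0.0
    for i, ratings_i in ratings.items():
        if i == c or item not in ratings_i:
            continue  # user i is not in U_cj
        k = pearson(ratings[c], ratings_i)
        if k is None:
            continue
        # Assumption: the mean of user i is taken over all of i's ratings.
        mean_i = sum(ratings_i.values()) / len(ratings_i)
        num += (ratings_i[item] - mean_i) * k
        den += abs(k)
    return mean_c + num / den if den > 0 else None


# Hypothetical sample data: user "c" has not rated item "D".
ratings = {
    "c": {"A": 4, "B": 2, "C": 5},
    "u1": {"A": 5, "B": 1, "C": 4, "D": 5},
    "u2": {"A": 1, "B": 5, "C": 2, "D": 1},
}
estimate = predict(ratings, "c", "D")
```

Note that a user with a negative coefficient still contributes: a low rating from a dissimilar user pushes the estimate up, which is why the denominator uses |k_ci|.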
Combining Content-based and
Collaborative Filtering (1)
• Computing estimates for missing ratings by the Content-based Filtering method for each user
• Searching for like-minded users
– computing coefficient kci between current and i-th user
(only from ratings)
– computing coefficient kci’ between current and i-th user
(from both ratings and estimates)
• New recommendations computation
– weighted sum of ratings and estimates: ratings are
weighted by the coefficients kci, estimates by the
coefficients kci’
Datasets for Experiments
• Data:
– EachMovie - users’ ratings for movies
www.research.digital.com/SRC/eachmovie/
– IMDB - textual information for CBF (movies’ descriptions)
www.imdb.com/
• Datasets:
– A - ratings from the period up to Mar 1, 1996
(810 ratings from 71 users)
– B - ratings from the period up to Mar 15, 1996
(2407 ratings from 131 users)
– C - ratings from the period up to Apr 1, 1996
(12290 ratings from 651 users)
EachMovie Data and Constant Method
[Figure: Percentage of ratings in EachMovie, by rating value (1 to 6), for datasets A, B and C; vertical axis from 0% to 45%]
• Constant Method: rcj = 5
Experiments with Combination of Content-based and Collaborative Filtering (2)
• Divide the dataset into a training set (90%) and a test set (10%)
• Apply the filtering methods to the training and test sets and evaluate their performance
– Content-based Filtering method
– Collaborative Filtering method
– Combined Filtering method
– Constant method
• The recommendations of each method for the test set are used in the evaluation of the methods’ performance
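The 90%/10% division can be sketched as follows. The slides do not say how the ratings were assigned to the two sets, so the random shuffle, the fixed seed and the function name `split_ratings` are assumptions made only for this illustration.

```python
import random


def split_ratings(ratings, test_fraction=0.1, seed=0):
    """Split a list of (user, item, value) triples into training and test sets.

    Assumption: ratings are assigned to the test set uniformly at random.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # local generator, reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 10 users x 10 items, all rated 3.
all_ratings = [(user, item, 3) for user in range(10) for item in range(10)]
training_set, test_set = split_ratings(all_ratings)
```

Each method is then fitted on `training_set` and asked to estimate the ratings held out in `test_set`.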
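Metrics
• Coverage = percentage of items for which the method is able to
compute estimates
• Accuracy:
$$\mathrm{Accuracy} = \frac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|}$$
• F-measure:
$$\mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
• NMAE:
$$\mathrm{NMAE} = \frac{\sum |r_{ij} - \hat{r}_{ij}|}{n \cdot s}$$
• Precision and Recall:
$$\mathrm{Precision} = \frac{|R \cap L|}{|R|}, \qquad \mathrm{Recall} = \frac{|R \cap L|}{|L|}$$
R - set of recommended items
L - set of liked items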
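The metrics can be computed with a few set operations. The sketch below is illustrative only: the function names are hypothetical, the complements in the accuracy formula are taken relative to an explicit set of all test items, n is assumed to be the number of compared ratings and s the width of the rating scale (neither is defined on the slide), and how a rating or estimate is turned into "liked" or "recommended" is left to the caller.

```python
def precision_recall_f(recommended, liked):
    """Precision, recall and F-measure for two sets of items."""
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


def accuracy(recommended, liked, all_items):
    """Share of items that are recommended and liked, or neither."""
    not_recommended = all_items - recommended
    not_liked = all_items - liked
    return (len(recommended & liked) + len(not_recommended & not_liked)) / len(all_items)


def nmae(actual, estimated, scale):
    """Mean absolute error divided by `scale` (assumed to be s)."""
    keys = [k for k in actual if k in estimated]
    return sum(abs(actual[k] - estimated[k]) for k in keys) / (len(keys) * scale)


def coverage(test_keys, estimated):
    """Fraction of test (user, item) pairs for which an estimate exists."""
    return sum(1 for k in test_keys if k in estimated) / len(test_keys)


# Hypothetical example.
all_items = {"A", "B", "C", "D", "E"}
R = {"A", "B", "C"}   # recommended
L = {"A", "B", "D"}   # liked
p, r, f = precision_recall_f(R, L)
acc = accuracy(R, L, all_items)
error = nmae({("u", "A"): 5, ("u", "B"): 3}, {("u", "A"): 4, ("u", "B"): 5}, scale=5)
cov = coverage([("u", "A"), ("u", "B"), ("u", "C")], {("u", "A"): 4, ("u", "B"): 5})
```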
Results of Experiments
[Figure: four panels (titles as extracted: Coverage, Accuracy, F-measure, F-measure) comparing the CF, CBF, combined and constant methods on datasets A, B and C]
Conclusions
• The combination of content-based and collaborative
filtering might help in the initial phase, when only few ratings are available
Future work
• Weighting of coefficients
• Comparing the combined method with additional methods
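Content-based Filtering - Vector
Representation of Documents and Profiles
• Document j is represented by a vector of TF-IDF weights over the dictionary D, e.g. for a document containing "computer", "learning" and "machine":
D = (…, computer, …, learning, …, machine, …)
Wj = (0, …, 0, 0.5, 0, …, 0, 0.3, 0, …, 0, 0.2, 0, …, 0)
• Profile: document vectors weighted by the user's ratings
$$\mathit{profile}_i = \sum_{j=1}^{n} r_j \, w_{ij}$$
• Similarity of a document and the profile
$$\mathrm{Sim}(W, \mathit{Profile}) = \frac{W \cdot \mathit{Profile}}{|W| \, |\mathit{Profile}|}$$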
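A minimal sketch of this representation is given below. The slides only name TF-IDF, so the particular weighting used here (relative term frequency times log(N/df)), the sparse dictionary vectors, the function names and the toy documents are all assumptions made for illustration. The sketch stops at the similarity value; how the similarity is mapped to a rating estimate is not shown on the slides.

```python
from collections import Counter, defaultdict
from math import log, sqrt


def tfidf_vectors(docs):
    """docs: name -> list of tokens. Returns name -> {term: weight}.

    Assumption: weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def build_profile(vectors, user_ratings):
    """profile_i = sum over rated documents j of r_j * w_ij."""
    profile = defaultdict(float)
    for doc, rating in user_ratings.items():
        for term, weight in vectors[doc].items():
            profile[term] += rating * weight
    return dict(profile)


def cosine(w, profile):
    """Normalized dot product of a document vector and a profile."""
    dot = sum(weight * profile.get(term, 0.0) for term, weight in w.items())
    norm = sqrt(sum(v * v for v in w.values())) * sqrt(sum(v * v for v in profile.values()))
    return dot / norm if norm > 0 else 0.0


# Hypothetical toy collection: the user rated d1 highly and d2 poorly.
docs = {
    "d1": ["machine", "learning", "computer"],
    "d2": ["cooking", "recipes", "kitchen"],
    "d3": ["machine", "learning", "systems"],
    "d4": ["cooking", "kitchen", "food"],
}
vectors = tfidf_vectors(docs)
profile = build_profile(vectors, {"d1": 5, "d2": 1})
sim_d3 = cosine(vectors["d3"], profile)
sim_d4 = cosine(vectors["d4"], profile)
```

With these toy documents the unrated document d3 shares its weighted terms with the highly rated d1, so it scores higher against the profile than d4.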
Collaborative Filtering - Example
[Table: example ratings of items A to G by the current user and other users; cell layout not recoverable]
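Combining Content-based and
Collaborative Filtering (2)
• Similarity measure: Pearson Correlation Coefficient
$$k'_{ci} = \frac{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)(r^{CBF}_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}_c)^2 \; \sum_{j \in I'_{ci}} (r^{CBF}_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings
and estimates
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$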
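The combined computation might be sketched as below. Several points are assumptions rather than details from the slides: U_cj is taken to be the users who rated item j, U'_cj the users who did not rate it but have a CBF estimate for it, r^CBF stands for a rating where one exists and a CBF estimate otherwise, and the correlation helper centres on the means over the compared items (a simplification of the formula above, which centres on the users' mean ratings). All names and the sample data are hypothetical, and every user is assumed to have at least one rating.

```python
from math import sqrt


def pearson(a, b):
    """Pearson coefficient over the keys shared by two dictionaries."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None
    mean_a = sum(a[j] for j in common) / len(common)
    mean_b = sum(b[j] for j in common) / len(common)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(
        sum((a[j] - mean_a) ** 2 for j in common)
        * sum((b[j] - mean_b) ** 2 for j in common)
    )
    return num / den if den > 0 else None


def combined_predict(ratings, cbf, c, item):
    """Weighted sum of ratings (weights k_ci) and CBF estimates (weights k'_ci)."""
    # Ratings take precedence over CBF estimates for the same item.
    filled = {u: {**cbf.get(u, {}), **ratings[u]} for u in ratings}
    mean = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    num = den = 0.0
    for i in ratings:
        if i == c:
            continue
        if item in ratings[i]:  # i in U_cj: real rating, coefficient from ratings only
            k, value = pearson(ratings[c], ratings[i]), ratings[i][item]
        elif item in cbf.get(i, {}):  # i in U'_cj: estimate, coefficient from filled data
            k, value = pearson(filled[c], filled[i]), cbf[i][item]
        else:
            continue
        if k is None:
            continue
        num += (value - mean[i]) * k
        den += abs(k)
    return mean[c] + num / den if den > 0 else None


# Hypothetical data: only u1 rated "D"; u2 contributes through a CBF estimate.
ratings = {
    "c": {"A": 4, "B": 2},
    "u1": {"A": 5, "B": 1, "D": 5},
    "u2": {"A": 4, "B": 2},
}
cbf = {"c": {"D": 4.0}, "u2": {"D": 3.0}}
estimate = combined_predict(ratings, cbf, "c", "D")
```

Compared with plain collaborative filtering, users who never rated the target item can still contribute, which matters most when the rating matrix is sparse.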
Experiments with Combination of Content-based and Collaborative Filtering (1)
• Content-based Filtering Method (CBF)
– documents and profiles: vector representation - weighted
keywords (TF-IDF)
– estimation computation: normalized dot product of
document and profile vectors
• Collaborative Filtering (CF)
– Pearson correlation coefficient
– weighted sum of ratings
• Combination of CF and CBF
– Pearson correlation coefficients
– weighted sum of ratings and CBF estimations
• Constant Method (rcj = 5)