Learning User Preferences for 2CP-Regression for a Recommender System

Alan Eckhardt, Peter Vojtáš
Department of Software Engineering, Charles University in Prague, Czech Republic
Outline

- Motivation
- User model
- Peak and 2CP
- Experiments
- Conclusion and future work

SOFSEM 2010, Špindlerův Mlýn, Czech Republic, 23.-29.1.2010
User preference learning

- Helping the user find what she is looking for
- Only a small amount of information is required from the user
  - e.g. ratings of notebooks
- Construction of a general user preference model
  - e.g. for notebooks
  - Each user has his/her own preference model
- Recommendation of the top-k notebooks to the user
  - Those the preference model has chosen as the most preferred for the user
User preference learning

- Recommendation process:
  1. Initial set - centers of clusters of objects
  2. Construction of the user model
  3. Recommendation
- More iterations are possible - in each iteration the user model is refined

[Diagram: loop between the User (decision making), the Recommender system (initial set, construction of user model), and the Recommended items]
Two step user model

- User model learning is divided into two steps:
  1. Local preferences - normalization of the attribute values of notebooks to their preference degrees:
       f_i : D_{A_i} → [0,1]
     transforms the attribute space ∏_i D_{A_i} into [0,1]^N
  2. Global preferences - aggregation of the preference degrees of attribute values into the predicted rating:
       @ : [0,1]^N → [0,1]
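The two steps can be sketched as follows. This is a minimal sketch; the attribute names, value ranges, and weights are illustrative assumptions, not values from the paper.

```python
# Sketch of the two-step user model. Attribute ranges and weights are
# illustrative assumptions, not values from the paper.

def f_price(price, max_price=2000.0):
    """Local preference for price: cheaper is preferred (linear)."""
    return max(0.0, 1.0 - price / max_price)

def f_ram(ram_gb, max_ram=8.0):
    """Local preference for RAM: more is preferred (linear)."""
    return min(1.0, ram_gb / max_ram)

def aggregate(prefs, weights):
    """Global preference @: weighted average of local preference degrees."""
    return sum(w * p for w, p in zip(weights, prefs)) / sum(weights)

# A notebook is first mapped into [0,1]^N, then aggregated into one rating.
notebook = {"price": 500.0, "ram_gb": 4.0}
local = [f_price(notebook["price"]), f_ram(notebook["ram_gb"])]
predicted_rating = aggregate(local, weights=[3, 5])
```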
User model

- Fuzzy sets
  - Normalize the space to the monotone space [0,1]^N
- Define the Pareto front
  - Set of mutually incomparable objects
  - Candidates for the best object
  - (1,…,1) is the best object

[Figure: normalization f_Price mapping Price [$] (0, 100, 500, 2 000) to preference degrees in [0,1]]
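In the normalized space the Pareto front follows directly from the dominance relation. A sketch, with objects represented as tuples of preference degrees:

```python
def dominates(a, b):
    """a dominates b if a is at least as preferred in every attribute
    and strictly more preferred in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(objects):
    """Objects not dominated by any other object: candidates for the best one."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]

# Example in [0,1]^2: (0.4, 0.4) is dominated by (0.5, 0.5); the rest are
# mutually incomparable and form the front.
objects = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
front = pareto_front(objects)
```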
User model

- Aggregation
  - Resolves the best object from the Pareto front
  - The second best object may not be on the Pareto front
  - Two methods - Statistical and Instances

Example: @(RAM_U1, CPU_U1, Price_U1) = (5·RAM_U1 + 1·CPU_U1 + 3·Price_U1) / 9

[Figure: 1st and 2nd best objects in the normalized [0,1]^2 space]
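The weighted average from the slide is directly computable, with weights 5, 1, 3 for RAM, CPU, and price:

```python
def at_aggregation(ram_u, cpu_u, price_u):
    """The slide's aggregation @: weighted average with weights 5, 1, 3."""
    return (5 * ram_u + 1 * cpu_u + 3 * price_u) / 9

# The ideal object (1, ..., 1) gets the maximal rating 1.
best = at_aggregation(1.0, 1.0, 1.0)
```

The Statistical method learns such weights from the training ratings rather than fixing them in advance.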
Normalization of numerical attributes

- Linear regression
  - Captures preference of the smallest or the largest value
- Quadratic regression
  - Can detect ideal values, but often fails in experiments

[Figures: rating vs. Price (0-10 000 $) fitted by a linear and by a quadratic curve]
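Both normalizations can be obtained by ordinary polynomial regression on (value, rating) pairs from the training set. A sketch with made-up training data; `numpy.polyfit` stands in for whatever regression routine is actually used:

```python
import numpy as np

# Hypothetical training pairs: cheaper notebooks were rated higher.
prices  = np.array([300.0, 700.0, 1100.0, 1500.0, 1900.0])
ratings = np.array([0.9, 0.7, 0.5, 0.3, 0.1])

# Linear regression: captures "the smaller (or larger) the value, the better".
slope, intercept = np.polyfit(prices, ratings, deg=1)

# Quadratic regression: an interior vertex -a1 / (2 * a2) can indicate an
# ideal value, but the fit is fragile on small training sets.
a2, a1, a0 = np.polyfit(prices, ratings, deg=2)
```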
2CP regression

- Preference dependence between attributes
  - This is not a dependence in the dataset (e.g. the resolution of the display influences the price)
  - It is the influence of the value of attribute A1 on the preference of attribute A2
- E.g. the value of the producer (IBM) of a notebook influences the preference of the price of the notebook (for IBM, the ideal price is 2200$).

Manufacturer ratings:
  ACER 0.2, TOSHIBA 0.7, ASUS 0.5, HP 0.9, FUJITSU 0.8,
  IBM 0.8, MSI 0.7, SONY 0.5, LENOVO 0.4

Ideal price by manufacturer:
  ACER, ASUS, FUJITSU, MSI: 750$
  TOSHIBA, HP, IBM, SONY, LENOVO: 2200$
Peak

- Motivation
  - Users often prefer one particular value of an attribute
- Finding the peak value
  - Traversing the training set (which is small)
  - Testing the error of linear regressions on both sides of the candidate peak
- We know exactly which value is the most preferred
  - Useful for visual representation

[Figure: rating vs. Price (0-10 000 $) with a peak at the preferred value]
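The search described above can be sketched as trying every interior training value as the peak and fitting a line on each side, assuming squared error as the fitting criterion:

```python
import numpy as np

def find_peak(values, ratings):
    """Try each interior training value as the peak; fit a linear regression
    on each side and keep the split with the smallest total squared error."""
    values = np.asarray(values, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    order = np.argsort(values)
    values, ratings = values[order], ratings[order]
    best_peak, best_err = None, float("inf")
    for i in range(1, len(values) - 1):
        err = 0.0
        for side in (slice(0, i + 1), slice(i, len(values))):
            x, y = values[side], ratings[side]
            coeffs = np.polyfit(x, y, deg=1)
            err += float(np.sum((np.polyval(coeffs, x) - y) ** 2))
        if err < best_err:
            best_peak, best_err = values[i], err
    return best_peak
```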
2CP regression + Peak

- Dependence of the price preference on the value of manufacturer
  - ACER => high price
  - ASUS => lower price

[Figure: per-manufacturer (ACER, ASUS) price preference curves]
Experiment settings

- Dataset of 200 notebooks
- Artificial user preferences
  - The preference of price was dependent on the value of the producer
- Training sets of sizes 2-60
  - The rest of the dataset was used as the testing set
- Error measures
  - RMSE
  - Kendall tau rank correlation coefficient
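Both error measures are easy to state precisely. A minimal reference implementation; for Kendall tau this is the simple tau-a variant, where tied pairs contribute zero:

```python
import math
from itertools import combinations

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(predicted, actual):
    """Kendall tau-a: (concordant - discordant pairs) / number of pairs."""
    pairs = list(combinations(range(len(actual)), 2))
    s = sum(sign(predicted[i] - predicted[j]) * sign(actual[i] - actual[j])
            for i, j in pairs)
    return s / len(pairs)
```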
Experiment settings

- Tested methods
  - Support Vector Machines from Weka
  - Mean - returns the average rating from the training set
  - Instances - classification; uses objects from the training set as boundaries on the rating
  - Statistical - weighted average with learned weights
  - 2CP
- Both Instances and Statistical can use local preference normalization - Linear, Quadratic, or Peak
- 2CP serves to find the relation between the preference of an attribute value and the value of another attribute
Experiment results

[Result charts for RMSE and Kendall tau over the tested methods and training set sizes]
Conclusion

- Proposal of the Peak method
- Combination with 2CP
- Experimental evaluation with very good results
  - Using a rank correlation measure
Future work

- nCP-regression
- Clustering of similar values for better robustness
- Degree of relation between two attributes