Mining Utility Functions
based on user ratings
COMP5331
Sepanta Zeighami
Motivation

A hotel booking website

Different users provide ratings for different hotels



Not all hotels are rated by all users
There is information available on each hotel

Its size, price, location, etc.

There is a trade-off between different attributes, e.g. a hotel in a better
location has a higher price.
Understand how the users’ ratings are affected by the hotels’ attributes

What’s the probability of a user choosing a hotel with a lower price
compared to a bigger room in a better location?

Useful for websites’ management to provide more suitable options for
customers
Hotel example
Hotel attributes:

Hotel name    Price  Location
Holiday Inn   8      5
Hilton        2      10
Shangri La    4      8

User ratings:

Name  Hotel        Rating
Alex  Holiday Inn  6.5
Alex  Hilton       6
Alex  Shangri La   6
Sam   Hilton       8.4
Sam   Holiday Inn  5.6
Nick  Shangri La   4.9
Nick  Hilton       4.4

Weight each user attaches to each attribute:

Name  Price  Location
Alex  0.5    0.5
Sam   0.2    0.8
Nick  0.7    0.3
Introduction

Given a set of ratings or scores provided by users on a set
of points, find how the users’ judgement is affected by
different attributes of the points.

First, we need to understand the decision-making process of
each user


How much value does each user attach to each attribute of the
points?
Then, build a general probability distribution model based
on that
Related works


Recommender systems [1, 2]

Predicting a new user’s preferences based on the previous
information.

Grouping users based on their similarities and predicting users’
preferences by assigning them to a group.
Preference learning [3, 4]

Predicting a user’s preferences based on information available
about the user.

Does not provide any information on how much a user values
different attributes of different items.
Understanding Each User
Utility Functions

The rating each user provides for a data point is called the
utility of the user from that point.

Utility quantifies the “satisfaction” a user derives from the data
point.

Assuming the user’s satisfaction can be quantified.

Consider a set of points $D$ for which the user has provided
ratings.

That is, for some points in $D$ we know the utility of the user.

We associate with a user a function $f : D \to \mathbb{R}$, where
for a point $p \in D$, $f(p)$ is a real number equal to the
utility the user derives from the point $p$.

Name  Hotel        Rating
Alex  Holiday Inn  6.5
Alex  Hilton       6
Alex  Shangri La   6
Sam   Hilton       8.4
Sam   Holiday Inn  5.6
Nick  Shangri La   4.9
Nick  Hilton       4.4
Understanding Each User
Linear Utility Functions
Name  Price  Location
Alex  0.5    0.5
Sam   0.2    0.8
Nick  0.7    0.3

Let $|D| = n$. The utility function of a user, $f(p)$, can be
written as an $n$-dimensional vector whose $i$th element is
the utility the user derives from the $i$th point of the
database.

Consider $D$ as a $d$-dimensional database, where each point $p$
is a vector, and $p_i$ is its value in the $i$th dimension.

We call a utility function linear if there exists a $d$-dimensional
vector $w$ consisting of $w_1, w_2, \dots, w_d$ for which
$f(p) = \sum_{i=1}^{d} w_i \times p_i$. Alternatively, we can write
$f(p) = p \cdot w$. We call $w_i$ the weight the user attaches to the $i$th
dimension.

We can use $w$ to refer to the utility function $f$.
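
As a small illustration of the linear form, the sketch below evaluates f(p) = p · w for the hotels and weight vectors from the hotel example. The dictionary layout and the helper name `utility` are assumptions made for this sketch, not part of the original slides.

```python
# Hotel attribute vectors p = (price, location), taken from the hotel example.
hotels = {
    "Holiday Inn": (8, 5),
    "Hilton": (2, 10),
    "Shangri La": (4, 8),
}

# Per-user weight vectors w = (w_price, w_location), taken from the hotel example.
weights = {
    "Alex": (0.5, 0.5),
    "Sam": (0.2, 0.8),
    "Nick": (0.7, 0.3),
}


def utility(w, p):
    """Linear utility f(p) = sum_i w_i * p_i (hypothetical helper)."""
    return sum(wi * pi for wi, pi in zip(w, p))


# Print the utility every user would derive from every hotel.
for user, w in weights.items():
    for hotel, p in hotels.items():
        print(f"{user:5s} {hotel:12s} {utility(w, p):.1f}")
```

With these numbers the linear form reproduces, for instance, Alex's rating of 6.5 for Holiday Inn and Sam's rating of 8.4 for Hilton. For Nick and Shangri La it gives 5.2, while the listed rating is 4.9, so the weights should be read as an approximate fit rather than an exact one.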
Understanding Each User
Modeling User’s Utility

We propose to use a linear model to capture the value a
user attaches to each dimension of a data point.

It provides an understanding of the user’s behavior,
although a linear model might not perfectly fit the data.

The model assumes that a linear utility function can
express the relationship between the points and the
utility of the users.

Note that there might exist utility functions that are
completely independent of the points’ attributes, but in
general we expect to see a correlation.

A customer will usually consider the price of a hotel before
booking it.
Using Linear Models for Utility Functions


For a utility function vector 𝑓, a matrix 𝑋 where each row is a point
vector 𝑥𝑖 and a weight vector 𝑤, we set 𝑋𝑤 = 𝑓.

We want to find a 𝑤 for which the above holds.

The equations might be inconsistent, as users’ utility may not be perfectly linear.

We might have observations on the value of 𝑓 for only a few points in the database.
Least squares linear solution


Find a solution with the least squared error.
Linear regression with Gaussian noise

Finding 𝑤 so as to maximize the likelihood of 𝑓.
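
A minimal sketch of the least-squares fit, assuming the rated points of one user are stacked as rows of X and the ratings as f. It solves the normal equations (XᵀX) w = Xᵀf with plain Gauss-Jordan elimination; the function name `least_squares` is hypothetical, and in practice a numerical library routine would normally be preferred. Under the usual assumption of independent Gaussian noise with a fixed variance, the maximum-likelihood w coincides with this least-squares solution, so the same sketch covers both bullets.

```python
def least_squares(X, f):
    """Return w minimizing ||Xw - f||^2 via the normal equations (hypothetical helper)."""
    d = len(X[0])
    # Augmented matrix [X^T X | X^T f].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(d)]
        + [sum(row[i] * fi for row, fi in zip(X, f))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; more (or more varied) rated points are needed")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


# Alex's rated hotels from the hotel example: rows are (price, location).
X = [(8, 5), (2, 10), (4, 8)]
f = [6.5, 6, 6]
w = least_squares(X, f)
print(w)
```

For Alex the three equations happen to be consistent, so the fit recovers the weights (0.5, 0.5) up to floating-point error. When the equations are inconsistent, the same call returns the w with the smallest squared error.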
Building a Probability Distribution
Using Gaussian Mixture Model

Create a utility distribution based on the inferred utility
functions

Gaussian Mixture Model (GMM)

It divides customers into 𝑘 groups each having a multivariate
Gaussian distribution.

A value for 𝑘 needs to be found through trial and error.

It assumes each group of customers can be modeled by a
multivariate Gaussian distribution.
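
The slides do not say how the mixture is fitted. One common choice is expectation-maximization (EM), sketched below under several assumptions: the weight vectors in `users_w` are made-up illustrative data, each component uses a diagonal covariance to keep the code short, and the initial means are picked by hand. The names `diag_gauss_pdf` and `fit_gmm` are hypothetical.

```python
import math


def diag_gauss_pdf(x, mean, var):
    """Density at x of a Gaussian with diagonal covariance (simplifying assumption)."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
    return p


def fit_gmm(points, means, n_iter=50, min_var=1e-3):
    """EM for a k-component mixture; `means` holds the k initial component means."""
    k, d, n = len(means), len(points[0]), len(points)
    means = [list(m) for m in means]
    variances = [[1.0] * d for _ in range(k)]
    mix = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            dens = [mix[j] * diag_gauss_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([v / total for v in dens])
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mix[j] = nj / n
            means[j] = [
                sum(r[j] * x[i] for r, x in zip(resp, points)) / nj for i in range(d)
            ]
            variances[j] = [
                max(min_var, sum(r[j] * (x[i] - means[j][i]) ** 2
                                 for r, x in zip(resp, points)) / nj)
                for i in range(d)
            ]
    return mix, means, variances


# Made-up weight vectors (price, location) for six users, in two loose groups.
users_w = [(0.8, 0.2), (0.7, 0.3), (0.9, 0.1), (0.2, 0.8), (0.3, 0.7), (0.1, 0.9)]
mix, means, variances = fit_gmm(users_w, means=[(1.0, 0.0), (0.0, 1.0)])
print(mix, means)
```

Here k = 2 is fixed by the number of initial means. Trying several values of k and comparing the fits corresponds to the trial-and-error step mentioned above.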
Building a Probability Distribution
Based on distance from samples

We assume the probability of a utility function getting a specific set of
values changes based on its distance from samples.

That is, given a utility function vector $v$, we assume the probability
of utility functions existing in a region follows a multivariate normal
distribution $N(v, I)$, where $I$ is the identity matrix.

If we have $n$ samples, then we use a mixture distribution consisting
of $n$ Gaussian distributions $N(v_i, \frac{1}{n} \times I)$, each with probability $\frac{1}{n}$.

The cumulative distribution function (cdf) will be:

$$f(x_1, x_2, \dots, x_d) = \frac{1}{n} \sum_{i=1}^{n} \phi_i(x_1, x_2, \dots, x_d)$$

where $\phi_i$ is the cdf of the $N(v_i, \frac{1}{n} \times I)$ normal distribution.
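
The mixture cdf can be evaluated directly, because a Gaussian with covariance (1/n) × I has independent coordinates and its joint cdf is a product of univariate normal cdfs. The sketch below assumes this factorization and uses the three weight vectors from the hotel example as samples. The helper names (`normal_cdf`, `mixture_cdf`, `box_probability`) are hypothetical, and the rectangle probability is written for the 2-dimensional case only.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a univariate normal N(mean, var) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def mixture_cdf(x, samples):
    """(1/n) * sum_i phi_i(x), where phi_i is the cdf of N(v_i, (1/n) I)."""
    n = len(samples)
    var = 1.0 / n
    total = 0.0
    for v in samples:
        # Independent coordinates: the joint cdf is a product of 1-D cdfs.
        phi = 1.0
        for xj, vj in zip(x, v):
            phi *= normal_cdf(xj, vj, var)
        total += phi
    return total / n


def box_probability(lo, hi, samples):
    """Pr(lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]) in 2 dimensions."""
    return (
        mixture_cdf((hi[0], hi[1]), samples)
        - mixture_cdf((lo[0], hi[1]), samples)
        - mixture_cdf((hi[0], lo[1]), samples)
        + mixture_cdf((lo[0], lo[1]), samples)
    )


# Sample utility functions: the weight vectors from the hotel example.
samples = [(0.5, 0.5), (0.2, 0.8), (0.7, 0.3)]
print(mixture_cdf((0.5, 0.5), samples))
print(box_probability((0.0, 0.0), (1.0, 1.0), samples))
```

`box_probability` uses inclusion-exclusion on the cdf, which is the kind of rectangle probability compared in the experiments.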
The Hotel Example
[Figure: two panels, "Clustering Gaussian mixture model" and "Distance based model", each plotted over location and price and shaded from low to high probability.]
Experiments

We ran experiments on 20 samples drawn from a uniform distribution in 2
dimensions.

GMM with 2 components returns 2 multivariate Gaussian distributions
with means (7.99404513, 2.36299952) and (3.1791763, 6.74665699),
each component with probability 0.5.

In this model, $\Pr(5 \le x \le 6 \text{ and } 0 \le y \le 1)$ is less than
$\Pr(7.5 \le x \le 8.5 \text{ and } 2 \le y \le 3)$.

With the distance-based model, the probability of
getting each of these 1 × 1 squares is the same, but
more samples are needed for smaller units.

[Figure: original uniform distribution; both axes ticked at 5 and 10.]
Summary

We’ve provided methods to understand how different users evaluate
different characteristics of different products.

By assuming a linear model for utility functions, we’ve provided 2
methods to find out how much value different users attach to
different attributes of different items.

Using these weights, we proposed two methods to model the
distribution of utility functions of users.
Thank you!
References
[1] A. M. Rashid, G. Karypis, and J. Riedl, "Learning preferences of new users
in recommender systems: An information theoretic approach," SIGKDD
Explor. Newsl., vol. 10, pp. 90-100, Dec. 2008.
[2] R. Burke, "Hybrid recommender systems: Survey and experiments," User
Modeling and User-Adapted Interaction, 2002.
[3] W. Chu and Z. Ghahramani, "Preference learning with Gaussian processes,"
in Proceedings of the 22nd International Conference on Machine Learning,
ICML '05, (New York, NY, USA), pp. 137-144, ACM, 2005.
[4] N. Houlsby, J. M. Hernandez-Lobato, F. Huszar, and Z. Ghahramani,
"Collaborative Gaussian processes for preference learning," in Proceedings of the
25th International Conference on Neural Information Processing Systems,
NIPS'12, (USA), pp. 2096-2104, Curran Associates Inc., 2012.