
Learning Profiles from User Interactions
Pelin Atahan and Sumit Sarkar
School of Management,
The University of Texas at Dallas
[email protected], [email protected]
Introduction

- Personalization systems tailor content and services to individuals
- Consider a vendor selling products through its website
  - personalize recommendations
  - learn profiles based on the links visited by a user
- Example:
  - a user visits a link (l) to which 70% of visitors are male
  - predict the user is male with probability 0.7, and
  - revise this probability as the user navigates through the website, i.e., clicks on other links
Research Framework

- Learn profiles for targeting purposes
  - personal profiles – demographic, psychographic, and geographic attributes
  - a predetermined set of attributes, e.g., gender, income, risk taker
- Profile representation – attribute values with their associated probabilities (a probabilistic representation)
  - for attribute "gender" (G), the profile may be represented as P(G=m|l)=0.7 and P(G=f|l)=0.3
  - for attribute "risk taker" (R) with values risk taker (r) and conservative (c), P(R=r|l)=0.6 and P(R=c|l)=0.4
Data Requirements

- Data requirements – link-level statistics only (for all links); a data-layout sketch follows this slide
  - examples:
    - P(G=m | "finance" link)=0.7, P(G=f | "finance" link)=0.3
    - P(R=r | "finance" link)=0.6, P(R=c | "finance" link)=0.4
- Data can be acquired from one of the following sources:
  - registered users, if available
  - sampling – explicitly asking a subset of users
  - professional market research agencies such as comScore, Claritas, and Nielsen/Net Ratings
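To make the data requirement concrete, below is a minimal sketch (assuming Python and dictionary-based storage; the variable names are illustrative, not from the slides) of the link-level statistics and site priors the framework relies on.

```python
# Site-wide priors over the attribute values (here: gender).
priors = {"m": 0.45, "f": 0.55}

# Link-level statistics: P(attribute value | link) for every link on the site.
link_stats = {
    "finance":   {"m": 0.7, "f": 0.3},
    "insurance": {"m": 0.4, "f": 0.6},
    "sports":    {"m": 0.7, "f": 0.3},
}
```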
Research Problems

- Learn the personal profile of a user based on the links traversed during a session
- Two types of learning are considered:
  1. Learning profiles passively by observing the links traversed
  2. Learning profiles quickly by dynamically determining the links available on a page
Literature Review

- Prior work primarily studies profiling in an information retrieval context
  - user interests
  - identifying interesting pages based on pages visited
  - profiles represented as feature (term) vectors
- Montgomery (2001) addresses learning demographic profiles from the websites visited by a user
  - the approach is faulty (the conditioning is incorrect)
- Baglioni et al. (2003) address identifying the gender of a user based on the links visited
  - consider a subset of pages
  - apply several classification models
Passive Learning

- Consider: Yahoo wants to learn the gender of a user who is traversing its website
- The user clicks on the following links:
  - the "finance" link (l_1)
  - the "investing ideas" link (l_2)
  - the "insurance" link (l_3)
  - the "sports" link (l_4)
- Problem: determine the probability that the visitor is male (or female) given this clickstream {l_1, l_2, l_3, l_4}, i.e., P(G=m | l_1, l_2, l_3, l_4)
- In general, for an attribute A and clickstream {l_1, l_2, ..., l_n}, learn P(A=a_i | l_1, l_2, ..., l_n)
Passive Learning Cont'd

- Use Bayes' rule:

  P(a_i \mid l_1, l_2, \ldots, l_n) = \frac{1}{K} \, P(l_1, l_2, \ldots, l_n \mid a_i) \, P(a_i),
  \quad \text{where } K = \sum_i P(l_1, l_2, \ldots, l_n \mid a_i) \, P(a_i)

- Assume conditional independence, i.e., the probability of clicking a link is independent of the probability of clicking another link when the user profile is known:

  P(l_1 \mid a_i, l_2) = P(l_1 \mid a_i)
Passive Learning Cont'd

- After algebraic manipulation, we get:

  P(a_i \mid l_1, l_2, \ldots, l_n) = \frac{1}{K} \cdot \frac{\prod_{j=1}^{n} P(a_i \mid l_j)}{P(a_i)^{\,n-1}}

- We can learn a customer's profile from simple link statistics
- The process is not computationally intensive (see the sketch below)
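A minimal sketch of the closed form above, assuming the link statistics and priors are stored as dictionaries keyed by attribute value (function and variable names are illustrative):

```python
from math import prod

def passive_posterior(link_stats, priors, clickstream):
    """P(a_i | l_1, ..., l_n), proportional to prod_j P(a_i | l_j) / P(a_i)^(n-1)."""
    n = len(clickstream)
    unnormalized = {
        a: prod(link_stats[l][a] for l in clickstream) / (p_a ** (n - 1))
        for a, p_a in priors.items()
    }
    K = sum(unnormalized.values())  # normalizing constant
    return {a: v / K for a, v in unnormalized.items()}
```

With the statistics on the next slide, this should return roughly {'m': 0.91, 'f': 0.09}.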
Illustrative Example

- Consider the following site priors and link probabilities:
  - site priors: P(m)=0.45, P(f)=0.55
  - "finance" link (l_1): P(m|l_1)=0.6, P(f|l_1)=0.4
  - "investing ideas" link (l_2): P(m|l_2)=0.7, P(f|l_2)=0.3
  - "insurance" link (l_3): P(m|l_3)=0.4, P(f|l_3)=0.6
  - "sports" link (l_4): P(m|l_4)=0.7, P(f|l_4)=0.3
- Then P(m | l_1, l_2, l_3, l_4) = 0.91 and P(f | l_1, l_2, l_3, l_4) = 0.09 (worked out below)
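As a quick check of the 0.91/0.09 figures, plugging these statistics into the closed form from the previous slide gives:

```latex
P(m \mid l_1,\ldots,l_4) \propto \frac{0.6 \cdot 0.7 \cdot 0.4 \cdot 0.7}{0.45^{3}}
  = \frac{0.1176}{0.0911} \approx 1.290, \qquad
P(f \mid l_1,\ldots,l_4) \propto \frac{0.4 \cdot 0.3 \cdot 0.6 \cdot 0.3}{0.55^{3}}
  = \frac{0.0216}{0.1664} \approx 0.130 \\
P(m \mid l_1,\ldots,l_4) = \frac{1.290}{1.290 + 0.130} \approx 0.91, \qquad
P(f \mid l_1,\ldots,l_4) = \frac{0.130}{1.290 + 0.130} \approx 0.09
```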
Learning Profiles in Real Time

- What happens when the user clicks on a new link?
  - e.g., the "NBA scoreboard" link (l_5)
- Incremental belief revision
  - LH denotes the link history (the links clicked prior to the last click)

  P(a_i \mid LH, l_n) = \frac{1}{K} \cdot \frac{P(a_i \mid LH) \, P(a_i \mid l_n)}{P(a_i)}
Incremental Revision Example

- P(m | LH, l_5) = ?
  - P(m | LH) = P(m | l_1, l_2, l_3, l_4) = 0.91 and P(f | LH) = 0.09
  - P(m | l_5) = 0.65 and P(f | l_5) = 0.35
  - P(m | LH, l_5) = 0.96 and P(f | LH, l_5) = 0.04 (see the sketch below)
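A minimal sketch of the incremental revision step, with the example's numbers used as a check (names are illustrative):

```python
def incremental_update(posterior_lh, link_probs, priors):
    """P(a_i | LH, l_n), proportional to P(a_i | LH) * P(a_i | l_n) / P(a_i)."""
    unnormalized = {a: posterior_lh[a] * link_probs[a] / priors[a] for a in priors}
    K = sum(unnormalized.values())
    return {a: v / K for a, v in unnormalized.items()}

# History posterior 0.91/0.09, l_5 statistics 0.65/0.35, site priors 0.45/0.55:
print(incremental_update({"m": 0.91, "f": 0.09},
                         {"m": 0.65, "f": 0.35},
                         {"m": 0.45, "f": 0.55}))
# -> approximately {'m': 0.96, 'f': 0.04}, matching the slide
```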
Active Learning of User Profiles

- By learning profiles quickly, websites start getting the benefits sooner
- Learning is the reduction in uncertainty about the profile attributes
- Our objective: learn profiles quickly by carefully selecting the links to offer on each page (the offer set)
  - the information value of an offer set is measured as the expected information gain
  - the number of links to offer (n) is predetermined
  - assume the user will click one of the links offered
  - stop learning when the expected additional information is not statistically significant
Click Probabilities Conditional on an Offer Set

- Offer set O = {o_1, o_2, ..., o_n}
- We estimate P'(l_j | a_i) for each attribute value and each link in the offer set
- From Bayes' rule (see the sketch below):

  P'(l_j \mid a_i) = \frac{P(a_i \mid l_j) \, P(l_j)}{\sum_{l_j \in O} P(a_i \mid l_j) \, P(l_j)}

- We need some measure of the likelihood of a link being clicked, P(l_j)
  - it does not need to be absolute; a relative measure is sufficient
  - e.g., the number of clicks a link gets per month
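A minimal sketch of this renormalization; link_popularity can be any relative measure, such as monthly clicks (names are illustrative):

```python
def click_prob_given_attr(link_stats, link_popularity, offer_set):
    """P'(l_j | a_i) for the links in the offer set: renormalize
    P(a_i | l_j) * P(l_j) over the offered links only."""
    attr_values = next(iter(link_stats.values())).keys()
    result = {}
    for a in attr_values:
        weights = {l: link_stats[l][a] * link_popularity[l] for l in offer_set}
        total = sum(weights.values())
        result[a] = {l: w / total for l, w in weights.items()}
    return result  # result[a_i][l_j] = P'(l_j | a_i)
```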
Belief Revision Conditional on an Offer Set

- Belief revision:

  P'(a_i \mid LH, l_j) = \frac{P'(LH, l_j \mid a_i) \, P'(a_i)}{\sum_i P'(LH, l_j \mid a_i) \, P'(a_i)}

- Manipulating the above expression, we get:

  P'(a_i \mid LH, l_j) = \frac{P'(l_j \mid a_i) \, P'(a_i \mid LH)}{\sum_i P'(l_j \mid a_i) \, P'(a_i \mid LH)}

- P'(a_i | LH) corresponds to the prior on the attribute value at each iteration
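A minimal sketch of this revision step, assuming P'(l_j | a_i) has already been computed for the offer set (as in the previous sketch); names are illustrative:

```python
def revise_beliefs(p_link_given_attr, prior_lh, clicked_link):
    """P'(a_i | LH, l_j), proportional to P'(l_j | a_i) * P'(a_i | LH).
    p_link_given_attr maps a_i -> {link: P'(link | a_i)}."""
    unnormalized = {
        a: p_link_given_attr[a][clicked_link] * prior_lh[a] for a in prior_lh
    }
    K = sum(unnormalized.values())
    return {a: v / K for a, v in unnormalized.items()}
```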
Information Gain Given a Link is Clicked

- Information gain: defined as the reduction in the entropy of the attribute's distribution given that a link is clicked

  I(A \mid LH, l_j) = H(A \mid LH) - H(A \mid LH, l_j)

- Entropy prior to a click:

  H(A \mid LH) = -\sum_i P(a_i \mid LH) \log_2 P(a_i \mid LH)

- Entropy given a link is clicked:

  H(A \mid LH, l_j) = -\sum_i P(a_i \mid LH, l_j) \log_2 P(a_i \mid LH, l_j)
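A minimal sketch of these two quantities, with distributions given as dictionaries (names are illustrative):

```python
from math import log2

def entropy(dist):
    """Shannon entropy (base 2) of a distribution {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def information_gain(prior_lh, posterior_lh_lj):
    """I(A | LH, l_j) = H(A | LH) - H(A | LH, l_j)."""
    return entropy(prior_lh) - entropy(posterior_lh_lj)
```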
Expected Information Gain Given an Offer Set

- When n links are offered (sketch below):

  EI(A \mid LH, O) = \sum_{l_j \in O} P'(l_j \mid LH) \, I(A \mid LH, l_j, O)

- P'(l_j | LH) is the probability of a link being clicked given the offer set:

  P'(l_j \mid LH) = \sum_i P'(l_j \mid a_i) \, P'(a_i \mid LH)
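A minimal sketch of the expectation above, assuming the click probabilities P'(l_j | LH) and the revised beliefs P'(a_i | LH, l_j) have already been computed for each offered link (names are illustrative):

```python
from math import log2

def expected_information_gain(prior_lh, click_probs, posteriors):
    """EI(A | LH, O) = sum_j P'(l_j | LH) * [H(A | LH) - H(A | LH, l_j)].
    click_probs: {link: P'(l_j | LH)}; posteriors: {link: {a_i: P'(a_i | LH, l_j)}}."""
    def H(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)
    prior_entropy = H(prior_lh)
    return sum(click_probs[l] * (prior_entropy - H(posteriors[l]))
               for l in click_probs)
```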
Optimal Offer Set – One-Step Look-Ahead

  \arg\max_O \; EI(A \mid LH, O)

- The prior entropy is constant given the link history
- We can therefore determine the optimal offer set by minimizing the expected entropy (a brute-force sketch follows):

  \arg\min_O \sum_{l_j \in O} P'(l_j \mid LH) \, H(A \mid LH, l_j, O)
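A minimal sketch of the one-step look-ahead by brute force, enumerating every n-link subset and keeping the one with the lowest expected entropy; this is only feasible for small sites, which motivates the heuristic later in the deck (function and parameter names are illustrative):

```python
from itertools import combinations
from math import log2

def choose_offer_set(candidate_links, link_stats, link_popularity, prior_lh, n):
    """Enumerate all n-link offer sets and return the one that minimizes the
    expected posterior entropy (equivalently, maximizes expected information gain)."""
    def H(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    best_set, best_expected_entropy = None, float("inf")
    for offer_set in combinations(candidate_links, n):
        expected_entropy = 0.0
        for l in offer_set:
            # P'(l | a_i): renormalize P(a_i | l) * P(l) over the offered links.
            p_l_given_a = {
                a: (link_stats[l][a] * link_popularity[l]) /
                   sum(link_stats[o][a] * link_popularity[o] for o in offer_set)
                for a in prior_lh
            }
            # P'(l | LH) and the revised beliefs P'(a_i | LH, l).
            p_l = sum(p_l_given_a[a] * prior_lh[a] for a in prior_lh)
            posterior = {a: p_l_given_a[a] * prior_lh[a] / p_l for a in prior_lh}
            expected_entropy += p_l * H(posterior)
        if expected_entropy < best_expected_entropy:
            best_set, best_expected_entropy = offer_set, expected_entropy
    return best_set, best_expected_entropy
```

With the numbers on the next slide, this should select the "investing ideas" and "family and home" links, i.e., the slide's O_2.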
Illustrative Example

- The user has visited the "finance" link (LH), there are three candidate links to consider, and the offer set size is n = 2
- P(m|LH) = 0.6, P(f|LH) = 0.4
  - "investing ideas" link (l_1): P(m|l_1)=0.7, P(f|l_1)=0.3, P(l_1)=0.2
  - "insurance" link (l_2): P(m|l_2)=0.4, P(f|l_2)=0.6, P(l_2)=0.3
  - "family and home" link (l_3): P(m|l_3)=0.2, P(f|l_3)=0.8, P(l_3)=0.3
- Three possible offer sets: O_1={l_1, l_2}, O_2={l_1, l_3}, O_3={l_2, l_3}
  - EI(G | LH, O_1) = 0.06
  - EI(G | LH, O_2) = 0.18
  - EI(G | LH, O_3) = 0.04
- Offering O_2 is optimal (a numeric check follows)
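As a check of EI(G | LH, O_2) = 0.18, the intermediate quantities for O_2 = {l_1, l_3} work out as follows:

```latex
P'(l_1 \mid m) = \tfrac{0.7 \cdot 0.2}{0.7 \cdot 0.2 + 0.2 \cdot 0.3} = 0.7, \qquad
P'(l_1 \mid f) = \tfrac{0.3 \cdot 0.2}{0.3 \cdot 0.2 + 0.8 \cdot 0.3} = 0.2 \\
P'(l_1 \mid LH) = 0.7(0.6) + 0.2(0.4) = 0.5, \qquad P'(l_3 \mid LH) = 0.5 \\
P'(m \mid LH, l_1) = \tfrac{0.7(0.6)}{0.5} = 0.84, \qquad
P'(m \mid LH, l_3) = \tfrac{0.3(0.6)}{0.5} = 0.36 \\
H(G \mid LH) \approx 0.971, \quad H(G \mid LH, l_1) \approx 0.634, \quad H(G \mid LH, l_3) \approx 0.943 \\
EI(G \mid LH, O_2) \approx 0.971 - 0.5(0.634) - 0.5(0.943) \approx 0.18
```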
Determining the Optimal Offer Set

- The number of potential offer sets to evaluate could be very large
- For a site with M links and offer set size n, the number of possible combinations is \binom{M}{n}
- E.g., for M = 100 and n = 10, there are more than 17 trillion combinations
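A quick check of the figure above:

```python
from math import comb

# Number of 10-link offer sets drawn from 100 links: roughly 1.73e13,
# i.e., more than 17 trillion candidate offer sets.
print(comb(100, 10))
```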
Heuristic Approach to Determine the Optimal Offer Set

- Consider the expected entropy expression for learning gender (n = 2):

  \arg\min_O \; -\Big[ P'(l_1, m) \log_2 \tfrac{P'(l_1, m)}{P'(l_1)}
                     + P'(l_1, f) \log_2 \tfrac{P'(l_1, f)}{P'(l_1)}
                     + P'(l_2, m) \log_2 \tfrac{P'(l_2, m)}{P'(l_2)}
                     + P'(l_2, f) \log_2 \tfrac{P'(l_2, f)}{P'(l_2)} \Big]

- P'(l_j, a_i) is proportional to P(l_j, a_i), the joint distribution of the aggregate link probabilities
Heuristic Approach to Determine the Optimal Offer Set

- To select the n links to offer:
  - for each attribute value, select the link that maximizes P(a_i, l_j)
  - if more links are needed, evaluate the links with the next highest joint probabilities
  - continue until all n links have been determined (see the sketch below)
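A minimal sketch of this greedy selection, assuming link statistics and a relative popularity measure are available so that P(a_i, l_j) is proportional to P(a_i | l_j) P(l_j); names are illustrative and tie-breaking is ignored:

```python
def heuristic_offer_set(candidate_links, link_stats, link_popularity, n):
    """Greedily pick n links: for each attribute value take the link with the
    highest joint probability P(a_i, l_j), then keep adding the links with the
    next highest joint probabilities until n links are chosen."""
    attr_values = list(next(iter(link_stats.values())).keys())
    # Candidate links ranked per attribute value by descending joint probability.
    ranked = {
        a: sorted(candidate_links,
                  key=lambda l: link_stats[l][a] * link_popularity[l],
                  reverse=True)
        for a in attr_values
    }
    chosen, position = [], {a: 0 for a in attr_values}
    while len(chosen) < min(n, len(candidate_links)):
        for a in attr_values:
            # Skip links that are already in the offer set.
            while position[a] < len(ranked[a]) and ranked[a][position[a]] in chosen:
                position[a] += 1
            if position[a] < len(ranked[a]) and len(chosen) < n:
                chosen.append(ranked[a][position[a]])
                position[a] += 1
    return chosen
```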
Discussions

- Assumption: the probability of clicking a link is conditionally independent of the probability of clicking other links
- If this assumption does not hold for some links, we can group the correlated links into disjoint sets and
  - use the joint probabilities associated with these groups of links for belief revision, or
  - use aggregate group-level probability parameters to revise beliefs
Discussions

- Assumption: the user will follow one of the links being offered
- Other possibilities:
  - the user may leave the site
  - the user may click the back button and select a different link
  - if there is a search engine available on the site, the user may submit a query and navigate to the results page
Conclusion

- Presented a framework for modeling user profiles for targeting purposes
- Showed how a profile can be learnt implicitly from the links traversed
- Showed how the learning process can be expedited by dynamically determining the offer set at each iteration
- Data requirements are reasonable
- The approach is not computationally intensive
On-going Work

- Solution approaches to the optimal offer set selection problem – refine the heuristic
- Validate the models
- Extend the model to learn multiple attributes simultaneously
Thank you!