
NLP Course Seminar
WEB PERSONALIZATION
Group 14
Vishaal Jatav (04d05013)
Varun Garg (04d05015)
Roadmap







Motivation
Introduction
The Personalization Process
Personalization Approaches
Personalization Techniques
Issues
Conclusion
Motivation

Some Facts





Overwhelming amount of information on the web
Not all the documents are relevant to the user
Users cannot convey their information needs
Users never find any document 100% relevant
Users expect more personal behavior


I don't want results for Delhi when I am in Bombay.
I was looking for crane (the bird) not crane (the machine).
Google Customization
Google (without personalization)
Google (with personalization)
Google Search History
Introduction

Personalization




React differently to different users
System reacts in a way the users want it to
Ultimately bring back the user to the system
Web Personalization




Apply machine learning and data mining
Build models of user behavior (called profiles)
Predict user's needs and expectations
Adaptively estimate better models
The Personalization Process

Consider the following pieces of information





Geographical Location
Age, gender, ethnicity, religion, etc.
Interests
Previous reviews on products
......

How could these pieces of information help?

How can this information be collected?
The Personalization Process
(Contd...)

Collect lots of information on user behavior
Information must be attributable to a single user
Decide on a user model
Featuring user needs, lifestyle, situations, etc.
Create a user profile for each user of the system
Profile captures the individuality of the user
Habits, browsing behavior, lifestyle, etc.
With every interaction, modify the user profile
The Personalization Process
More Formally



Web is a collection of n items $I = \{i_1, i_2, \ldots, i_n\}$
User comes from a set $U = \{u_1, u_2, \ldots, u_m\}$
User $u_k$ has rated items by $r_{u_k} : I \to [0,1] \cup \{\bot\}$
where $r_{u_k}(i_j) = \bot$ means $i_j$ is not rated by the user
$I_k^{(u)}$ is the set of items not yet rated by user $u_k$
$I_k^{(r)}$ is the set of items rated by user $u_k$
GOAL: recommend to user $u_a$ items $i_j$ from $I_a^{(u)}$ that might be of interest to that user
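The formal setup above can be sketched in a few lines of Python. This is only an illustration: the toy users, items, and the helper names `rated_items` and `unrated_items` are hypothetical, and `None` stands in for the "not rated" value.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: ratings[user][item] is a value in [0, 1],
# or None when the user has not rated the item.
ratings: Dict[str, Dict[str, Optional[float]]] = {
    "u1": {"i1": 0.9, "i2": None, "i3": 0.4},
    "u2": {"i1": None, "i2": 0.7, "i3": None},
}


def rated_items(user: str) -> Set[str]:
    """Items the user has already rated."""
    return {item for item, r in ratings[user].items() if r is not None}


def unrated_items(user: str) -> Set[str]:
    """Items the user has not rated yet; recommendations come from this set."""
    return {item for item, r in ratings[user].items() if r is None}
```

A recommender then only has to rank the items returned by `unrated_items` for the active user.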
Classification of Personalization
Approaches

Individual Vs Collaborative

Reactive Vs Proactive

User Vs Item Information
Classification of Personalization Approaches
Individual Vs Collaborative

Individual approach (Google Personalized Search)
Use only the individual user's data
Generate the user profile by analyzing:
User's browsing behavior
User's active feedback on the system
Advantage: can be implemented on the client side, so no privacy violation
Disadvantage: based only on past interactions, so it lacks serendipity
Classification of Personalization Approaches
Individual Vs Collaborative
Contd...

Collaborative approach (Amazon recommendations)
Find the neighborhood of the active user
React according to an assumption
If A is like B, then B likes the same things as A likes
Disadvantages
New item rating problem
New user problem
Advantage
Better than the individual approach once the two problems are solved
Classification of Personalization Approaches
Reactive Vs Proactive

Reactive approach

Explicitly ask user for preferences


Either in the form of query or feedback
Proactive approach

Learn user preferences from user behavior
No explicit preferences are requested from the user
Behavior is extracted from:
Click-through rates
Navigational patterns
Classification of Personalization Approaches
User Vs Item Information

User Information




Geographic location (from IP address)
Age, gender, marital status, etc. (explicit query)
Lifestyle, etc. (inference from past behavior)
Item Information


Content of Topics – movie genre, etc.
Product/ domain ontology
Personalization Techniques

Content-Based Filtering

Collaborative Filtering

Model Based Personalization

Rule based

Graph theoretic

Language Model
Content-Based Filtering

Syskill and Webert use explicit feedback
Individual, Reactive, Item-information
Uses naïve Bayes to distinguish likes from dislikes
Initial probabilities updated with new interactions
Uses 128 most informative words from each item
Letizia uses implicit feedback
Individual, Proactive, Item-information
Find likes/dislikes based on tf-idf similarity
Others use nearest-neighborhood for similarity
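As an illustration of tf-idf similarity for content-based filtering, here is a minimal sketch. It is not Letizia's actual implementation: the unsmoothed idf, the whitespace tokenization, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "crane bird wetland migration".split(),     # page the user liked
    "crane bird nesting habitat".split(),       # candidate A
    "crane machine construction lift".split(),  # candidate B
]
liked, cand_a, cand_b = tfidf_vectors(docs)
```

Because "crane" occurs in every document its idf is zero, so candidate A is preferred over candidate B only through the shared term "bird". This mirrors the crane (bird) versus crane (machine) example from the motivation.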
Collaborative Filtering

Found successful in recommendation systems

General Technique

For every user, a user neighborhood is computed
The neighborhood contains users who have rated several items almost equally
Get candidate items for recommendations
Items seen by the neighborhood but not by the active user ua
Data is stored in the form of a rating matrix
Items as rows and users as columns
Collaborative Filtering
Contd....

System must provide the following algorithms
Measure similarity between users
For creation of the neighborhood
Pearson and Spearman correlation, cosine similarity, etc.
Predict the rank of an item not rated by the user
To decide the order in which these items will be presented
Weighted sum of ranks is most common
Select a neighborhood subset for prediction
To reduce the large amount of computation
A threshold on the similarity value is most common
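The three algorithms listed above can be sketched together as follows. This is a minimal sketch under stated assumptions: Pearson correlation is used for similarity, the neighborhood is selected with a similarity threshold, and the prediction is a similarity-weighted average of neighbor ratings (ratings are used here in place of ranks). The rating data and function names are hypothetical.

```python
import math

# Hypothetical rating data: ratings[user][item] in [0, 1]; a missing key means unrated.
ratings = {
    "ua": {"i1": 0.9, "i2": 0.8, "i3": 0.2},
    "u1": {"i1": 1.0, "i2": 0.7, "i3": 0.1, "i4": 0.9},
    "u2": {"i1": 0.1, "i2": 0.3, "i3": 0.9, "i4": 0.2},
}


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[a][i] for i in common]
    ys = [ratings[b][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def neighborhood(active, threshold=0.5):
    """Users whose similarity to the active user reaches the threshold."""
    result = {}
    for user in ratings:
        if user == active:
            continue
        sim = pearson(active, user)
        if sim >= threshold:
            result[user] = sim
    return result


def predict(active, item, threshold=0.5):
    """Similarity-weighted average of the neighbors' ratings for the item."""
    rated_by = {
        u: s for u, s in neighborhood(active, threshold).items()
        if item in ratings[u]
    }
    total = sum(rated_by.values())
    if not total:
        return None
    return sum(s * ratings[u][item] for u, s in rated_by.items()) / total
```

With this toy data only `u1` ends up in the neighborhood of `ua`, so the prediction for the unseen item `i4` follows `u1`'s rating.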
Model Based Personalization
Approaches

Executed in two stages
Offline process: creates the actual model
Online process: uses the model together with the current interaction
Common data used for model generation
Web usage data (web history, click-through rates, etc.)
Item structure and content data
Examples
Rule-Based Models
Graph-Theoretic Models
Language Models
Model Based Personalization
Rule Based Models

Association rule-based
Item ia is in unordered association with ib
If the user considers ib, then ia is a good recommendation
Sequence rule-based
Item ia is in sequential association with ib
If the user considers ia, then ib is a good recommendation
Associations between items can be stored as a dependency graph
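A dependency graph for unordered associations could be built roughly as follows. This sketch assumes that an association is kept when its confidence (co-occurrence count divided by the count of the considered item) reaches a threshold. The sessions, the threshold value, and the function name are hypothetical, and sequence rules are not covered.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical usage data: each session is the set of items a user considered.
sessions = [
    {"camera", "tripod", "sd_card"},
    {"camera", "sd_card"},
    {"camera", "tripod"},
    {"laptop", "mouse"},
]


def association_graph(sessions, min_confidence=0.6):
    """Map each item ib to the items ia with confidence(ib -> ia) >= threshold."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for session in sessions:
        for item in session:
            item_count[item] += 1
        # Count every ordered pair so both rule directions are evaluated.
        for b, a in permutations(session, 2):
            pair_count[(b, a)] += 1
    graph = defaultdict(dict)
    for (b, a), n in pair_count.items():
        confidence = n / item_count[b]
        if confidence >= min_confidence:
            graph[b][a] = confidence
    return dict(graph)


graph = association_graph(sessions)
```

Looking up `graph["tripod"]` then gives the items to recommend when the user considers a tripod, which corresponds to the offline (graph building) and online (lookup) stages of a model-based approach.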
Model Based Personalization
Graph Theoretic Model

Ratings data is transformed into a directed graph
Nodes are users
An edge from ui to uj means that ui predicts uj
Weights on edges represent the predictability
To predict whether an item ik will be of interest to ui
Calculate the shortest path from ui to any user ur who has rated ik
The predicted rating is calculated as a function of the path between ui and ur
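The shortest-path search can be sketched with Dijkstra's algorithm. Several assumptions are made here that the slides do not state: edge weights are predictabilities in (0, 1], the path cost is the sum of negative log weights (so the shortest path is the most predictable one), and the function returns the reached user's rating together with the path predictability rather than combining them, since the combining function is not specified. The graph and ratings are hypothetical.

```python
import heapq
import math

# Hypothetical graph: edges[ui][uj] is the predictability of uj from ui, in (0, 1].
edges = {
    "ua": {"u1": 0.9, "u2": 0.5},
    "u1": {"u3": 0.8},
    "u2": {"u3": 0.9},
    "u3": {},
}
# Hypothetical ratings: only u3 has rated item ik.
item_ratings = {"u3": {"ik": 0.7}}


def predict(source, item):
    """Return (rating of nearest rater, path predictability), or None if unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, user = heapq.heappop(heap)
        if cost > best.get(user, math.inf):
            continue  # stale heap entry
        if user != source and item in item_ratings.get(user, {}):
            # cost is the sum of -log(weight), so exp(-cost) is the weight product.
            return item_ratings[user][item], math.exp(-cost)
        for neighbor, weight in edges.get(user, {}).items():
            new_cost = cost - math.log(weight)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None
```

In this toy graph the path through `u1` (0.9 × 0.8) is preferred over the path through `u2` (0.5 × 0.9).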
Model Based Personalization
Language Modeling Approaches

Without using user's relevance feedback


Simple language modeling
Using user's relevance feedback


N gram based methods
Noisy channel model based method
Language Model Approach
Simple Language Modeling



Without using user's feedback
History consists of all the words in the past
queries
Learn the user profile as {(w1, P(w1)), ..., (wn, P(wn))}
where P(wi) is the probability of word wi in the user's query history
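A profile of this shape can be built from past queries as sketched below. The estimator for P(wi) is not given in the text here, so a plain relative-frequency estimate is assumed. The function name and the sample queries are hypothetical.

```python
from collections import Counter


def unigram_profile(past_queries):
    """Relative frequency of each word across all past queries (assumed estimator)."""
    counts = Counter(w for query in past_queries for w in query.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


profile = unigram_profile(["crane bird", "bird migration", "wetland bird"])
```

Here "bird" accounts for three of the six history words, so it gets the highest probability in the profile.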
Language Model Approach
Simple Language Modeling

Sample User profile
Language Model Approach
Simple Language Modeling

Re-ranking of unpersonalized results

Re-ranking is done according to P(Q|D,u)


α is a weighting parameter between 0 and 1
UP is the user profile
Language Model Approach
N gram based approach


Using user's relevance feedback
Learn User Profile

Let Hu represent the search history of user u
Hu = {(q1, rf1), (q2, rf2), (q3, rf3), ..., (qn, rfn)}, where rfi is the relevance feedback for query qi

Unigram
Now the user profile consists of
{(w1, P(w1)), (w2, P(w2)), (w3, P(w3)), ...., (wn, P(wn))}
Language Model Approach
N gram based approach

Sample Unigram User Profile
Language Model Approach
N gram based approach

Bigram
the user profile consists of
{(w1w2, P(w2|w1)), (w2w3, P(w3|w2)), ... , (wn-1wn, P(wn|wn-1))}
Language Model Approach
N gram based approach

Sample Bigram User Profile
Language Model Approach
N gram based approach

Re-ranking unpersonalized results

Based on unigram (α = weighting parameter)
Q = q1 q2 q3 .... qn
P(q1 q2 q3 .... qn)= P(q1) P(q2) P(q3) ....... P(qn)
Language Model Approach
N gram based approach

Based on bigrams
Q = q1 q2 q3 .... qn
P(q1 q2 q3 .... qn) = P(q1) P(q2|q1) P(q3|q2) ....... P(qn|qn-1)
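The unigram and bigram query probabilities can be sketched as follows, with each bigram conditioned on the preceding word, matching the profile entries P(wn|wn-1). This is a minimal sketch with assumptions: probabilities are unsmoothed relative frequencies learned from relevance-feedback text, and the α interpolation is left out because its formula is not given in the text. All names and data are hypothetical.

```python
from collections import Counter


def ngram_profile(feedback_texts):
    """Return (unigram, bigram) probability tables learned from feedback text."""
    uni, bi = Counter(), Counter()
    for text in feedback_texts:
        words = text.lower().split()
        uni.update(words)
        bi.update(zip(words, words[1:]))
    total = sum(uni.values())
    p_uni = {w: c / total for w, c in uni.items()}
    # P(w2 | w1) = count(w1 w2) / count(bigrams starting with w1)
    first = Counter()
    for (w1, _), c in bi.items():
        first[w1] += c
    p_bi = {(w1, w2): c / first[w1] for (w1, w2), c in bi.items()}
    return p_uni, p_bi


def unigram_score(query, p_uni):
    """P(q1 ... qn) as a product of unigram probabilities."""
    p = 1.0
    for q in query.lower().split():
        p *= p_uni.get(q, 0.0)
    return p


def bigram_score(query, p_uni, p_bi):
    """P(q1) times the product of P(qi | qi-1) over consecutive query words."""
    words = query.lower().split()
    if not words:
        return 0.0
    p = p_uni.get(words[0], 0.0)
    for w1, w2 in zip(words, words[1:]):
        p *= p_bi.get((w1, w2), 0.0)
    return p


p_uni, p_bi = ngram_profile(["sarus crane bird", "crane bird habitat"])
```

Unlike the unigram score, the bigram score is sensitive to word order: "crane bird" scores above zero while "bird crane" does not. In practice some smoothing would be needed so that unseen words do not zero out the whole product.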
Language Model Approach
Noisy Channel based approach


Uses the user's feedback (implicit)
User history is represented as
Hu = {(Q1, D1), (Q2, D2), ..., (QN, DN)}
Di is the document visited for Qi
Each document D consists of words w1, w2, ..., wm
Basic Idea – Statistical Machine Translation



Given Parallel Text of languages S and T
We get P(ti|si) ∀ si ∈ S and ti ∈ T
Using EM we get the optimized model P(T|S)
Language Model Approach
Noisy Channel based approach

Similarly
T = past queries Q1, Q2, ..., QK
S = text of relevant documents for queries T
We learn the model P(Q|D) or, more precisely, P(qi|wj)
Assumption
Translate the ideal [information containing] document into a query
Document – a verbose language
Query – a compact language
User profile is stored as

Tuples < qi , wj , P(qi|wj) >
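One way to estimate the P(qi|wj) entries with EM is the word-translation model known as IBM Model 1, sketched below. Whether the referenced work uses exactly this model is an assumption. The sketch omits the NULL source word, uses a fixed number of iterations, and works on hypothetical (query words, document words) pairs.

```python
from collections import defaultdict


def train_translation_model(history, iterations=10):
    """Estimate P(q | w) from (query words, document words) pairs with EM."""
    query_vocab = {q for query_words, _ in history for q in query_words}
    # Uniform initialization over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of w generating q
        total = defaultdict(float)  # expected count of w generating anything
        # E-step: share each query word among the words of its document.
        for query_words, doc_words in history:
            for q in query_words:
                norm = sum(t[(q, w)] for w in doc_words)
                for w in doc_words:
                    frac = t[(q, w)] / norm
                    count[(q, w)] += frac
                    total[w] += frac
        # M-step: renormalize so that P(. | w) sums to 1 for every w.
        t = defaultdict(
            float, {(q, w): c / total[w] for (q, w), c in count.items()}
        )
    return dict(t)


history = [
    (["crane"], ["bird", "wetland"]),
    (["heron"], ["bird", "marsh"]),
]
model = train_translation_model(history)
# The profile as tuples <q, w, P(q|w)>
profile = [(q, w, p) for (q, w), p in model.items()]
```

In this toy history "wetland" only ever co-occurs with "crane", so P(crane|wetland) is 1, while "bird" is shared equally between "crane" and "heron".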
Language Model Approach
Noisy Channel based approach

Sample Noisy Channel User Profile
Language Model Approach
Noisy Channel based approach

Re-ranking

Re-rank the documents using P(Q|D,u)


α = weighting parameter
P(qi|GE) is the lexical probability of qi
Issues in Personalization







Cold Start Problem (new user problem)
Latency Problem (new item problem)
Data sparseness
Scalability
Privacy
Recommendation List Diversity
Robustness
Conclusion


Web personalization is the need of the hour for e-businesses
A relatively new research topic
Several issues are yet to be solved effectively
Data should be collected without invading user privacy
Creating user models effectively and scaling them to a large number of users and items is at the core of personalization
Bibliography





Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical
Machine Translation Models for Personalized Search. In the
Proceedings of 3rd International Joint Conference on Natural
Language Processing (IJCNLP 2008), January 7-12, 2008,
Hyderabad, India.
Sarabjot S. Anand and Bamshad Mobasher. Intelligent
techniques for web personalization. In Intelligent Techniques for
Web Personalization, pages 1-36. Springer, 2005.
Vasudeva Varma. Personalization in Information Retrieval,
Extraction and Access. In Workshop on Ontology, NLP,
Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July
2008.
http://en.wikipedia.org/wiki/Personalisation
Snapshots from Google Inc.
Questions