NLP Course Seminar
WEB PERSONALIZATION
Group 14
Vishaal Jatav (04d05013)
Varun Garg (04d05015)
Roadmap
Motivation
Introduction
The Personalization Process
Personalization Approaches
Personalization Techniques
Issues
Conclusion
Motivation
Some Facts
Overwhelming amount of information on the Web
Not all documents are relevant to the user
Users cannot always convey their information needs precisely
Users rarely find a document that is 100% relevant
Users expect more personalized behavior
I don't want results for Delhi when I am in Bombay.
I was looking for crane (the bird), not crane (the machine).
Google Customization
Screenshots (from Google Inc.): Google without personalization, Google with personalization, Google Search History
Introduction
Personalization
React differently to different users
The system reacts in the way users want it to
Ultimately brings the user back to the system
Web Personalization
Apply machine learning and data mining
Build models of user behavior (called profiles)
Predict user's needs and expectations
Adaptively estimate better models
The Personalization Process
Consider the following pieces of information
Geographical Location
Age, gender, ethnicity, religion, etc.
Interests
Previous reviews on products
......
How could these pieces of information help?
How can this information be collected?
The Personalization Process
(Contd...)
Collect lots of information on user behavior
Information must be attributable to a single user
Decide on a user model
Featuring user needs, lifestyle, situations, etc.
Create a user profile for each user of the system
Profile captures the individuality of the user
Habits, browsing behavior, lifestyle, etc.
With every interaction, modify the user profile
The Personalization Process
More Formally
The Web is a collection of n items I = {i1, i2, ..., in}
Users come from a set U = {u1, u2, ..., um}
Each user uk rates items via ruk : I → [0,1] ∪ {⊥}
where ruk(ij) = ⊥ means ij is not rated by the user
Ik(u) is the set of items not yet rated by user uk
Ik(r) is the set of items rated by user uk
GOAL: recommend to the active user ua items ij from Ia(u) that might be of interest to them
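A minimal Python sketch of this formal setup, with a toy dictionary standing in for the rating functions (all names and values are illustrative, not from the slides):

ratings = {
    "u1": {"i1": 0.9, "i2": 0.4},   # ru1 over the items u1 has rated
    "u2": {"i1": 0.8, "i3": 1.0},
}
items = {"i1", "i2", "i3", "i4"}

def rated(u):        # Ik(r): items rated by user uk
    return set(ratings.get(u, {}))

def unrated(u):      # Ik(u): items not yet rated by user uk
    return items - rated(u)

# GOAL: recommend to the active user items drawn from unrated(u)
print(unrated("u1"))   # {'i3', 'i4'} (set order may vary)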
Classification of Personalization
Approaches
Individual Vs Collaborative
Reactive Vs Proactive
User Vs Item Information
Classification of Personalization Approaches
Individual Vs Collaborative
Individual approach (Google Personalized Search)
Use only individual user's data
Generate user profile by analyzing
User's browsing behavior
User's active feedback on the system
Advantage
Can be implemented on the client side – no privacy violation
Disadvantage
Based only on past interactions – lack of serendipity
Classification of Personalization Approaches
Individual Vs Collaborative
Contd...
Collaborative approach (Amazon recommendations)
Find the neighborhood of the active user
React according to an assumption
If A is like B, then B likes the same things as A likes
Disadvantages
New item rating problem
New user problem
Advantage
Better than the individual approach once the two problems are solved
Classification of Personalization Approaches
Reactive Vs Proactive
Reactive approach
Explicitly ask user for preferences
Either in the form of query or feedback
Proactive approach
Learn user preferences by user behavior
No explicit preference demand from the user
Behavior is extracted from
Click-through rates
Navigational patterns
Classification of Personalization Approaches
User Vs Item Information
User Information
Geographic location (from IP address)
Age, gender, marital status, etc. (explicit query)
Lifestyle, etc. (inference from past behavior)
Item Information
Content of Topics – movie genre, etc.
Product/domain ontology
Personalization Techniques
Content-Based Filtering
Collaborative Filtering
Model Based Personalization
Rule based
Graph theoretic
Language Model
Content-Based Filtering
Syskill and Webert use explicit feedback
Individual, Reactive, Item-information
Uses naïve Bayes to distinguish likes from dislikes
Uses the 128 most informative words from each item
Initial probabilities updated with new interactions
Letizia uses implicit feedback
Individual, Proactive, Item-information
Finds likes/dislikes based on tf-idf similarity
Others use nearest-neighbor similarity
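A rough Python sketch of the tf-idf style content matching described above; the liked pages, the candidate pages, and the choice to sum the liked-page vectors into a profile are illustrative assumptions:

import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of token lists; returns one tf-idf dict per document
    df = Counter()
    for d in docs:
        df.update(set(d))
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(d).items()} for d in docs]

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

liked = [["cricket", "score", "india"], ["cricket", "match", "highlights"]]
candidates = [["cricket", "india", "tour"], ["stock", "market", "news"]]
vecs = tfidf_vectors(liked + candidates)

profile = Counter()
for v in vecs[:len(liked)]:
    profile.update(v)                  # profile = sum of liked-page vectors

for cand, v in zip(candidates, vecs[len(liked):]):
    print(cand, round(cosine(profile, v), 3))   # higher cosine => predicted "like"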
Collaborative Filtering
Found successful in recommendation systems
General technique
For every user, a user neighborhood is computed
Neighborhood contains users who have rated several items almost equally
Get candidate items for recommendation
Items seen by the neighborhood but not by the active user ua
Data is stored in the form of a rating matrix
Items as rows and users as columns
Collaborative Filtering
Contd....
The system must provide the following algorithms
Measure similarity between users
For creation of the neighborhood
Pearson and Spearman correlation, cosine similarity, etc.
Predict the rating of an item not rated by the user
To decide the order in which items will be presented
Weighted sum of ratings – most common
Select a neighborhood subset for prediction
To reduce the large amount of computation
Threshold on similarity value – most common
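A compact Python sketch of this neighborhood scheme using Pearson correlation and a weighted sum of the neighbors' ratings; the toy rating matrix and similarity threshold are invented for illustration:

from math import sqrt

# Toy ratings matrix: user -> {item: rating in [0,1]}
R = {
    "alice": {"i1": 0.9, "i2": 0.2, "i3": 0.8},
    "bob":   {"i1": 0.8, "i2": 0.3, "i4": 0.9},
    "carol": {"i1": 0.1, "i2": 0.9, "i4": 0.2},
}

def pearson(u, v):
    common = set(R[u]) & set(R[v])
    if len(common) < 2:
        return 0.0
    mu = sum(R[u][i] for i in common) / len(common)
    mv = sum(R[v][i] for i in common) / len(common)
    num = sum((R[u][i] - mu) * (R[v][i] - mv) for i in common)
    den = sqrt(sum((R[u][i] - mu) ** 2 for i in common)) * \
          sqrt(sum((R[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(u, item, threshold=0.0):
    # Neighborhood: users above a similarity threshold who have rated the item
    neigh = [(pearson(u, v), v) for v in R if v != u and item in R[v]]
    neigh = [(s, v) for s, v in neigh if s > threshold]
    if not neigh:
        return None
    # Weighted sum of the neighbors' ratings
    return sum(s * R[v][item] for s, v in neigh) / sum(abs(s) for s, _ in neigh)

print(predict("alice", "i4"))   # rating prediction for an item alice has not seen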
Model Based Personalization
Approaches
Executed in two stages
Offline process – to create the actual model
Online process – using the model during interaction
Common data used for model generation
Web usage data (web history, click-through rates, etc.)
Item structure and content data
Examples
Rule-Based Models
Graph-Theoretic Models
Language Models
Model Based Personalization
Rule Based Models
Association rule-based
Item ia is in unordered association with ib
If the user considers ib, then ia is a good recommendation
Sequence rule-based
Item ia is in sequential association with ib
If the user considers ia, then ib is a good recommendation
Associations between items can be stored as a dependency graph
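A small Python sketch of mining unordered and sequential associations from sessions and querying them as a dependency graph; the session data and the support threshold are illustrative assumptions:

from collections import Counter
from itertools import combinations

sessions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b"]]

# Unordered association: item pairs co-occurring in a session
pair_support = Counter()
for s in sessions:
    for x, y in combinations(sorted(set(s)), 2):
        pair_support[(x, y)] += 1

# Sequential association: ordered pairs (x seen before y in a session)
seq_support = Counter()
for s in sessions:
    for i, x in enumerate(s):
        for y in s[i + 1:]:
            seq_support[(x, y)] += 1

def recommend(item, min_support=2):
    recs = []
    for (x, y), c in seq_support.items():        # follow sequential edges out of item
        if x == item and c >= min_support:
            recs.append(y)
    for (x, y), c in pair_support.items():       # follow unordered edges touching item
        if c >= min_support and item in (x, y):
            recs.append(y if x == item else x)
    return sorted(set(recs))

print(recommend("a"))   # ['b', 'c']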
Model Based Personalization
Graph Theoretic Model
Ratings data is transformed into a directed graph
Nodes are users
An edge between ui and uj means that ui predicts uj
Weights on edges represent the predictability
To predict whether an item ik will be of interest to ui
Calculate the shortest path from ui to any user ur
where ur has rated ik
The predicted rating is calculated as a function of the path between ui and ur
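A rough Python sketch of the shortest-path idea; the toy graph, the treatment of edge weights as path costs, and the exponential damping of the rating by path length are assumptions made only for illustration:

import heapq

# Directed predictability graph: u -> {v: edge weight used as a cost}
graph = {
    "u1": {"u2": 1.0, "u3": 2.5},
    "u2": {"u4": 1.0},
    "u3": {"u4": 0.5},
    "u4": {},
}
ratings = {"u4": {"i9": 0.8}}   # u4 has rated item i9

def shortest_path_cost(src, dst):
    # Dijkstra over the predictability graph
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return None

def predict(u, item, decay=0.5):
    # Nearest user who rated the item; damp that rating by the path cost
    best = None
    for r, rated in ratings.items():
        if item in rated:
            cost = shortest_path_cost(u, r)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, rated[item])
    return None if best is None else best[1] * (decay ** best[0])

print(predict("u1", "i9"))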
Model Based Personalization
Language Modeling Approaches
Without using user's relevance feedback
Simple language modeling
Using user's relevance feedback
N-gram based methods
Noisy channel model based methods
Language Model Approach
Simple Language Modeling
Without using user's feedback
The history consists of all the words in the user's past queries
Learn the user profile as {(w1, P(w1)), ..., (wn, P(wn))}
where P(wi) is estimated as the relative frequency of wi in the history
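A minimal Python sketch of building such a profile as relative word frequencies over past queries (the query history is invented):

from collections import Counter

past_queries = ["cricket score", "india cricket news", "delhi weather"]
words = [w for q in past_queries for w in q.lower().split()]
counts = Counter(words)
total = sum(counts.values())

# User profile: {(w, P(w))}, with P(w) = relative frequency of w in the history
user_profile = {w: c / total for w, c in counts.items()}
print(round(user_profile["cricket"], 3))   # 2/7 ≈ 0.286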
Language Model Approach
Simple Language Modeling
Sample User profile
Language Model Approach
Simple Language Modeling
Re-ranking of unpersonalized results
Re-ranking is done according to P(Q|D,u)
α is a weighting parameter between 0 and 1
UP is the user profile
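The slides do not reproduce the exact formula, so the Python sketch below takes one plausible reading: mix the query likelihood P(Q|D) with how well the document's words match the user profile UP, weighted by α. The documents, the profile, and this particular mixing choice are illustrative assumptions:

from collections import Counter

def doc_lm(text):
    c = Counter(text.lower().split())
    n = sum(c.values())
    return {w: cnt / n for w, cnt in c.items()}

def score(query, doc_text, user_profile, alpha=0.6, eps=1e-6):
    d = doc_lm(doc_text)
    p_q_d = 1.0
    for q in query.lower().split():
        p_q_d *= d.get(q, 0.0) + eps            # query likelihood under the document
    p_d_up = sum(user_profile.get(w, 0.0) for w in d) / len(d)   # document vs. UP
    return alpha * p_q_d + (1 - alpha) * p_d_up

docs = {"d1": "crane bird migration habitat", "d2": "crane machine rental prices"}
user_profile = {"bird": 0.3, "migration": 0.3, "wildlife": 0.2, "cricket": 0.2}
ranked = sorted(docs, key=lambda k: score("crane", docs[k], user_profile), reverse=True)
print(ranked)   # ['d1', 'd2'] – the bird-watcher's profile lifts d1 above d2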
Language Model Approach
N-gram based approach
Using user's relevance feedback
Learn User Profile
Let Hu represent the search history of user u
Hu = {(q1, rf1), (q2, rf2), (q3, rf3), ..., (qn, rfn)}
where rfi is the relevance feedback for query qi
Unigram
The user profile consists of
{(w1, P(w1)), (w2, P(w2)), (w3, P(w3)), ..., (wn, P(wn))}
Language Model Approach
N-gram based approach
Sample Unigram User Profile
Language Model Approach
N-gram based approach
Bigram
The user profile consists of
{(w1w2, P(w2|w1)), (w2w3, P(w3|w2)), ..., (wn-1wn, P(wn|wn-1))}
Language Model Approach
N-gram based approach
Sample Bigram User Profile
Language Model Approach
N-gram based approach
Re-ranking unpersonalized results
Based on unigram (α = weighting parameter)
Q = q1 q2 ... qn
P(q1 q2 ... qn) = P(q1) P(q2) ... P(qn)
Language Model Approach
N-gram based approach
Based on bigrams
Q = q1 q2 ... qn
P(q1 q2 ... qn) = P(q1) P(q2|q1) P(q3|q2) ... P(qn|qn-1)
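A compact Python sketch of the N-gram profiles and scoring described on the preceding slides, built from (query, relevance-feedback text) pairs; the history, the smoothing constant, and the zero-probability floor are illustrative assumptions:

from collections import Counter

# Search history: (query, text of the document the user found relevant)
history = [
    ("crane", "crane bird species migrate across wetland habitats"),
    ("bird watching", "bird watching spots near wetland areas"),
]

uni, bi = Counter(), Counter()
for _, doc in history:
    toks = doc.lower().split()
    uni.update(toks)                    # unigram counts
    bi.update(zip(toks, toks[1:]))      # bigram counts
total = sum(uni.values())

def p_uni(w):
    return uni[w] / total

def p_bi(w2, w1, eps=1e-3):
    # P(w2|w1) with a small additive floor for unseen pairs
    return (bi[(w1, w2)] + eps) / (uni[w1] + eps * len(uni))

def score(query):
    q = query.lower().split()
    s = p_uni(q[0]) + 1e-6              # floor avoids zero for unseen first words
    for w1, w2 in zip(q, q[1:]):
        s *= p_bi(w2, w1)               # P(q1) * prod P(qi|q_{i-1})
    return s

print(score("bird watching"), score("crane machine"))   # profile favors the first query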
Language Model Approach
Noisy Channel based approach
Using the user's implicit feedback
User history is represented as
Hi = {(Q1, D1), (Q2, D2), ..., (QN, DN)}
Di is the document visited for query Qi
Each Di consists of words w1, w2, ..., wm
Basic Idea – Statistical Machine Translation
Given parallel text of languages S and T
we get P(ti|si) ∀ si ∈ S and ti ∈ T
Using EM we obtain the optimized model P(T|S)
Language Model Approach
Noisy Channel based approach
Similarly
Assumption
T = past queries Q1, Q2, ..., QK
S = text of the relevant documents for the queries in T
We learn the model P(Q|D), or more precisely P(qi|wj)
Intuition: translate the ideal [information containing] document into a query
Document – a verbose language
Query – a compact language
User profile is stored as
Tuples <qi, wj, P(qi|wj)>
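A toy EM sketch in Python, in the spirit of IBM Model 1, that treats past queries as translations of their clicked documents and produces <qi, wj, P(qi|wj)> tuples; the parallel pairs and the iteration count are made up, and a real system would use a full SMT toolkit:

from collections import defaultdict

# Parallel "corpus": (query words, clicked-document words)
pairs = [
    ("crane bird".split(), "crane species wetland habitat".split()),
    ("wetland birds".split(), "wetland habitat bird species".split()),
]

# Uniform initialisation of the translation table P(q|w)
q_vocab = {q for qs, _ in pairs for q in qs}
t = defaultdict(lambda: 1.0 / len(q_vocab))

for _ in range(10):                        # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for qs, ws in pairs:
        for q in qs:                       # E-step: fractional alignment counts
            z = sum(t[(q, w)] for w in ws)
            for w in ws:
                c = t[(q, w)] / z
                count[(q, w)] += c
                total[w] += c
    for (q, w), c in count.items():        # M-step: re-estimate P(q|w)
        t[(q, w)] = c / total[w]

# User profile stored as tuples <q, w, P(q|w)>
profile = sorted(((q, w, p) for (q, w), p in t.items()), key=lambda x: -x[2])
print(profile[:5])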
Language Model Approach
Noisy Channel based approach
Sample Noisy Channel User Profile
Language Model Approach
Noisy Channel based approach
Re-ranking
Re-rank the documents using P(Q|D,u)
α = weighting parameter
P(qi|GE) is the lexical probability of qi under the general English (background) model
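The exact re-ranking formula is not shown on the slide; the Python sketch below assumes P(qi|D,u) interpolates the translation probability Σj P(qi|wj)·P(wj|D) with the general-English probability P(qi|GE) using α. The translation table and background probabilities are toy values:

from collections import Counter

# Toy translation table P(q|w) of the kind learned above, plus a background model
t = {("crane", "crane"): 0.6, ("crane", "bird"): 0.3, ("crane", "machine"): 0.05}
p_ge = {"crane": 0.01}                 # general-English lexical probability P(q|GE)

def p_w_given_d(doc_words):
    c = Counter(doc_words)
    n = sum(c.values())
    return {w: cnt / n for w, cnt in c.items()}

def score(query_words, doc_words, alpha=0.8):
    pd = p_w_given_d(doc_words)
    s = 1.0
    for q in query_words:
        trans = sum(t.get((q, w), 0.0) * pw for w, pw in pd.items())
        s *= alpha * trans + (1 - alpha) * p_ge.get(q, 1e-6)
    return s

d1 = "crane bird wetland habitat".split()
d2 = "crane machine rental prices".split()
print(score(["crane"], d1), score(["crane"], d2))   # d1 outranks d2 for this user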
Issues in Personalization
Cold Start Problem (new user problem)
Latency Problem (new item problem)
Data sparseness
Scalability
Privacy
Recommendation List Diversity
Robustness
Conclusion
Web personalization is the need of the hour for e-businesses
A relatively new research topic
Several issues are yet to be solved effectively
Data should be collected without invading user privacy
Creating user models effectively, and scaling them to large numbers of users and items, is at the core of personalization
Bibliography
Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India, January 7-12, 2008.
Sarabjot S. Anand and Bamshad Mobasher. Intelligent Techniques for Web Personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005.
Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, July 15-17, 2008.
http://en.wikipedia.org/wiki/Personalisation
Snapshots from Google Inc.
Questions