GetJar Mobile Application Recommendations with Very Sparse Datasets
KDD 2012
Presented by Ting-Yu Cheng
2013/1/7

GetJar
Outline
• Introduction
• GetJar Data Usage
• Models
• Evaluation
• Discussion
• Conclusion
App Store Nowadays
• No personalization
• Users browse no more than a few pages
  – This limits exposure for the majority of apps
• Dominated by a few apps
  – They occupy the front pages of the store
  – The apps on the recommendation list are already known to most users
We need a better recommendation system!
Why don’t we use existing approaches?
Because the dataset is different
Compared to Netflix
GetJar vs. Netflix

                                  Netflix   GetJar
# users of the first percentile     20%      0.6%
# users of the first movie/app      42%      1.3%
The Reasons (1/2)
• The low cost of publishing an app leads to
  – many apps with similar functionalities
  – many apps serving very specific needs
• The disparity in available resources among app developers is larger than that among movie producers
The Reasons (2/2)
• Discovery mechanisms in the app space are less effective and mature. The only ways to find apps are:
  – Listings of apps sorted by the number of downloads
  – Category-based browsing
  – Keyword-based searching
Challenges
• The traditional collaborative filtering
techniques and latent factor models are not
designed to handle this level of sparsity
Our Goal
• Recommend a top-N list of apps to each user
based on his/her recent app usage
• Use personalization to help users find a
greater variety of appealing apps
Outline
• Introduction
• GetJar Data Usage
• Models
• Evaluation
• Discussion
• Conclusion
GetJar Data Usage (1/2)
• Rely upon app usage
Why don’t we rely upon app installation?
Usage is Better than Installation
• Many app installations are just experimental
• Many apps are uninstalled right after installation
• Many users have lots of installed apps that never get used
Then why don’t we rely upon app ratings?
Usage vs. Ratings (1/2)
• The drawbacks of usage
  – It is influenced by many factors, such as mood or unforeseen events
  – There is correlation between usage and category
    • e.g. social apps are used more frequently than others
Usage vs. Ratings (2/2)
• The drawbacks of ratings
  – Since users’ tastes may change and developers update their apps frequently, ratings may become obsolete in a short period of time
  – Ratings tend to be polarized
  – It is difficult to collect a large number of ratings
GetJar Data Usage (2/2)
• Filter out apps that were not used other than on the day of installation
• Remove users that joined or left midway through the observation period
• Only include apps with more than 20 users
Outline
• Introduction
• GetJar Data Usage
• Models
  – Non-personalized Models
  – Memory-based Models
  – Latent Factor Models
  – Eigenapp Model
• Evaluation
• Discussion
• Conclusion
Non-personalized Models
• Sort items by popularity
• Serve the same list of items to all users
• A baseline algorithm in this paper (see the sketch below)
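A minimal sketch of this baseline in Python; the data layout (a user → used-apps mapping) is illustrative, not from the paper:

```python
from collections import Counter

def popularity_top_n(usage_by_user, n=10):
    """Rank apps by how many distinct users used them; everyone gets the same list."""
    counts = Counter(app for apps in usage_by_user.values() for app in set(apps))
    return [app for app, _ in counts.most_common(n)]

# Example: 'chat' is used by all three users, so it tops the shared list.
usage = {"u1": ["maps", "chat"], "u2": ["chat", "game"], "u3": ["chat", "maps"]}
print(popularity_top_n(usage, n=2))  # -> ['chat', 'maps']
```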
Outline
• Introduction
• GetJar Data Usage
• Models
  – Non-personalized Models
  – Memory-based Models
  – Latent Factor Models
  – Eigenapp Model
• Evaluation
• Discussion
• Conclusion
Memory-based Models
• User-based models
• Item-based models
  – Our memory-based model uses the item-based approach
  – There are usually far fewer items than users
  – Research shows that item-based algorithms generally perform better than user-based algorithms
Common Memory-based Models
• Pearson correlation coefficient
• Cosine similarity
Pearson Correlation Coefficient
• Computed for a pair of items based on their common users (see the sketch below):

  r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

  – 𝑛 : the number of common users of item 𝑋 and item 𝑌
  – 𝑋𝑖 : the usage of item 𝑋 by user 𝑖
  – 𝑌𝑖 : the usage of item 𝑌 by user 𝑖
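A small sketch of the item-pair computation, assuming each item's usage is a dict mapping user to days of usage (the helper name is ours, not the paper's):

```python
import math

def pearson_item_sim(x, y):
    """Pearson correlation between items x and y over their common users."""
    common = set(x) & set(y)
    n = len(common)
    if n < 2:
        return 0.0  # the long-tail problem: too few common users to correlate
    mx = sum(x[u] for u in common) / n
    my = sum(y[u] for u in common) / n
    num = sum((x[u] - mx) * (y[u] - my) for u in common)
    den = math.sqrt(sum((x[u] - mx) ** 2 for u in common)) * \
          math.sqrt(sum((y[u] - my) ** 2 for u in common))
    return num / den if den else 0.0

sim = pearson_item_sim({"u1": 5, "u2": 2, "u3": 7}, {"u1": 4, "u2": 1, "u3": 6})
```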
Pearson Correlation Coefficient
• Because of the long tail, many items are unlikely to share common users with most other items
Cosine Similarity
• 𝑹 : the 𝑚 × 𝑛 user-item matrix
• 𝑺 : the item-item similarity matrix, whose entries are:

  s_{ij} = \frac{R_{*,i} \cdot R_{*,j}}{\|R_{*,i}\| \, \|R_{*,j}\|}

• 𝑡(𝑢, 𝑖) : the affinity between user 𝑢 and item 𝑖 (see the sketch below):

  t(u, i) = \sum_{j \in I_u} s_{ij}, where I_u is the set of items used by 𝑢
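A numpy sketch of both quantities; the small matrix is made up for illustration:

```python
import numpy as np

# R: m x n user-item matrix (rows = users, columns = items), 0 = unused.
R = np.array([[3., 0., 1.],
              [2., 1., 0.],
              [0., 4., 1.]])

norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0                  # guard against all-zero item columns
S = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarities
np.fill_diagonal(S, 0.0)                 # an item should not vote for itself

# Affinity t(u, i): sum the similarities between i and the items user u used.
u = 0
used = np.nonzero(R[u])[0]
t = S[:, used].sum(axis=1)               # t[i] = sum over j in I_u of s(i, j)
```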
Cosine Similarity
• Still suffers from low overlap support: the closest neighbors for a less popular item will often occur by coincidence, simply because they are the only ones that produced non-zero similarity scores
Optimizing the Cosine Similarity Method
• Normalize the similarity scores with z-scores
• For each item candidate 𝑖, we only consider the 𝑙 nearest items, i.e. those with the greatest normalized similarity scores to 𝑖. This has the effect of noise reduction (sketch below)
• For the GetJar dataset, 𝑙 = 5 works best
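Continuing the sketch above, one way to apply the two refinements; normalizing each item's row of 𝑺 is our reading of the z-score step:

```python
mu = S.mean(axis=1, keepdims=True)
sd = S.std(axis=1, keepdims=True)
sd[sd == 0] = 1.0
Z = (S - mu) / sd                 # z-score-normalized similarity scores

l = 5                             # l = 5 works best on the GetJar dataset
pruned = np.zeros_like(Z)
for i in range(Z.shape[0]):
    top = np.argsort(Z[i])[-l:]   # the l largest normalized scores for item i
    pruned[i, top] = Z[i, top]    # everything else is zeroed out (noise reduction)
np.fill_diagonal(pruned, 0.0)     # an item is never its own neighbor
```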
Outline
• Introduction
• GetJar Data Usage
• Models
  – Non-personalized Models
  – Memory-based Models
  – Latent Factor Models
  – Eigenapp Model
• Evaluation
• Discussion
• Conclusion
Latent Factor Models
• Factorize the user-item matrix into user factors and item factors
• Often used for rating prediction
• We substitute days of usage for ratings and find that the results are by far the worst of all algorithms
  – It is questionable whether these models can perform well when there is no explicit feedback like ratings
PureSVD¹
• The only latent factor algorithm that generates reasonable recommendations

¹ P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems, RecSys ’10, pages 39–46, New York, NY, USA, 2010. ACM.
PureSVD
• Replace all missing values in 𝑹 with 0
• Factorize 𝑹 with SVD, truncated to the top k singular values: R \approx U' \cdot \Sigma' \cdot (V')^T
• Affinity between user 𝑢 and item 𝑖:

  t(u, i) = U'_{u,*} \cdot \Sigma' \cdot (V'_{i,*})^T = R_{u,*} \cdot Q \cdot q_i^T

  – 𝑸 : the top k singular vectors extracted from 𝑽
  – 𝒒𝑖 : the row in 𝑸 corresponding to item 𝑖
PureSVD
• PureSVD views 𝑡(𝑢, 𝑖) as a relative score instead of a real predicted value
• It is better to replace all missing values with 0 and use relative scores than to replace them with some other estimate (sketch below)
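A minimal numpy sketch of PureSVD as described above, reusing the toy matrix from earlier; the rank k = 2 is arbitrary:

```python
import numpy as np

R = np.array([[3., 0., 1.],
              [2., 1., 0.],
              [0., 4., 1.]])               # missing values already set to 0

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
T = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # t(u, i) for every user-item pair

# Rank a user's unused items by relative score; the values themselves
# are not treated as predicted usage.
u = 0
unseen = np.where(R[u] == 0)[0]
ranked = unseen[np.argsort(-T[u, unseen])]
```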
Outline
• Introduction
• GetJar Data Usage
• Models
  – Non-personalized Models
  – Memory-based Models
  – Latent Factor Models
  – Eigenapp Model
• Evaluation
• Discussion
• Conclusion
Eigenapp Model
Memory-based Model + Latent Factor Model
• Improves the results of memory-based models by borrowing ideas from latent factor models
Steps of Eigenapp Model
1. Replace all missing values in 𝑹 with 0
2. Normalize the item vectors
3. PCA: 𝑹 → 𝑫
4. Apply the memory-based model
Recall PCA (1/2)
• A dimensionality reduction method
• The goal is to find k vectors such that, after projecting the instances onto them, the new data helps the classifier/regressor most
Recall PCA (2/2)
[Figure: a 2-D point cloud and its first principal component, i.e. the eigenvector of the covariance matrix corresponding to the largest eigenvalue]
PCA
• Uses the eigen decomposition of the covariance matrix 𝑪 = 𝑨𝑨^T
  – 𝑨 is the matrix holding the normalized, mean-centered usage values
• 𝑪 is an 𝑚 × 𝑚 matrix (𝑚 is the number of users), which is too large for an eigen decomposition
Eigen Decomposition of 𝑪
• Since the number of items 𝑛 is likely to be much lower, perform the eigen decomposition on the 𝑛 × 𝑛 matrix 𝑨^T𝑨 instead to optimize the process (sketch below):

  A^T A \, w_j = \lambda_j w_j \;\Rightarrow\; A A^T (A w_j) = \lambda_j (A w_j)

  so each v_j = A w_j is an eigenvector of 𝑪
• Normalize each v_j to length 1 and keep only the k eigenvectors with the highest eigenvalues
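A sketch of this trick in numpy; the shapes are illustrative (many users, few items):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 50))        # m x n normalized usage matrix, m >> n
A -= A.mean(axis=0)               # illustrative mean-centering

# Decompose the small n x n matrix A^T A instead of the m x m matrix A A^T.
lam, W = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
k = 10
W_top = W[:, -k:]                 # eigenvectors with the k largest eigenvalues

V = A @ W_top                     # v_j = A w_j are eigenvectors of C = A A^T
V /= np.linalg.norm(V, axis=0)    # normalize each eigenapp to length 1
```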
Eigenapp
• Let’s denote these eigenvectors as eigenapps
• Finally, project all the items onto the reduced
eigenspace. We get a new dataset 𝑫
Steps of Eigenapp Model
1. Replace all missing values in 𝑹 with 0
2. Normalize the item vectors
3. PCA: 𝑹 → 𝑫
4. Apply the memory-based model
Apply the Memory-based Model
• Item-item similarity can be computed in the same way (sketch below)
• Since 𝑫 is dense, similarity scores will likely be non-zero for all item pairs
• The remainder is identical to the memory-based algorithm described earlier
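Continuing the sketch: project the items onto the eigenapps and compute similarities in the reduced space, where 𝑫 is dense:

```python
D = V.T @ A                                   # k x n: items in eigenapp coordinates

norms = np.linalg.norm(D, axis=0)
norms[norms == 0] = 1.0
S_dense = (D.T @ D) / np.outer(norms, norms)  # dense item-item similarities
np.fill_diagonal(S_dense, 0.0)
# Z-score normalization and top-l pruning then proceed exactly as before.
```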
Similar Works
• Eigentaste¹
  – Uses the Jester joke dataset
  – Requires a gauge item set in which every user has rated every item
  – Uses a user-based approach

¹ K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Inf. Retr., 4(2):133–151, July 2001.
Outline
• Introduction
• GetJar Data Usage
• Models
• Evaluation
• Discussion
• Conclusion
Evaluation
• Use the GetJar dataset
  – Divide the users into five equal-sized groups: four for training, one for evaluation
  – Repeat 5 times, so every group is used as the evaluation group once
• Two versions of 𝑹:
  – Days of usage
  – Binary version
Evaluation
• For each user in the test set, we sort the apps by install time
• Feed the first M − 1 apps to the model to generate its recommendation list of 𝑁 apps, then check whether the left-out (Mth) app is in the recommended list (sketch below)
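A sketch of the per-user test loop; `model.recommend` is a stand-in interface, not the paper's API:

```python
def hit_rate_at_n(model, test_users, n=10):
    """For each test user, hide the most recently installed app and check
    whether the model ranks it among its top-N recommendations."""
    hits = 0
    for apps in test_users:                  # each user's apps, sorted by install time
        history, held_out = apps[:-1], apps[-1]
        topn = model.recommend(history, n)   # feed the first M-1 apps
        hits += held_out in topn
    return hits / len(test_users)
```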
Evaluation Criteria
• Accuracy
• Ability to recommend tail apps
• The variety of the apps it recommends
Accuracy
• Set ℎ𝑢 equal to 1 if the relevant item is in the top-N list for user 𝑢 and 0 otherwise
• The accuracy is the hit rate averaged over all test users:

  \text{accuracy} = \frac{1}{|U|} \sum_{u \in U} h_u
Accuracy
[Figure: accuracy results of the models on the GetJar dataset]
Accuracy of Less Popular Items
• Exclude the 100 most popular apps from each user's recommended list
• Remove the users whose missing app is one of the 100 most popular apps
Accuracy of Less Popular Items
[Figure: accuracy results with the 100 most popular apps excluded]
Diversity
• Judging the diversity of the items each algorithm presents
Variety
• We measure variety by calculating the entropy:

  H = -\sum_i p(i) \log p(i)

  – 𝑝(𝑖) is the relative frequency of item 𝑖 among the top-N recommended items for all users
Variety
[Figure: entropy of the recommended lists for each model]
Outline
• Introduction
• GetJar Data Usage
• Models
• Evaluation
• Discussion
• Conclusion
Discussion
• The memory-based model performed surprisingly well under precision-recall because it recommends a small set of popular apps to almost every user
Discussion
• The Eigenapp model performed the best in accuracy and in promoting less popular apps
• The Eigenapp model recommends a diverse list of apps
  – All item vectors are normalized prior to applying PCA, so the usage of less popular apps can be captured by the top eigenvectors
Discussion
• A limitation of the Eigenapp model is that it includes only apps with a certain minimum amount of usage, a condition that most apps do not satisfy
Outline
• Introduction
• GetJar Data Usage
• Models
• Evaluation
• Discussion
• Conclusion
Conclusion
• We find that most recent research focuses on the movie domain, where interest is explicitly expressed through ratings. It is questionable how well these models would translate to other domains
• Because the Eigenapp model still has limitations, we are currently exploring content-based models that extract useful features from app metadata as future work
Conclusion
• The paper does not clearly explain why most latent factor models perform badly when ratings are replaced with days of usage
• The proposed model just combines several methods we are already familiar with