GetJar Mobile Application Recommendations with Very Sparse Datasets
KDD 2012
Presented by Ting-Yu Cheng, 2013/1/7

Outline
• Introduction
• GetJar Data Usage
• Models
  – Non-personalized Models
  – Memory-based Models
  – Latent Factor Models
  – Eigenapp Model
• Evaluation
• Discussion
• Conclusion

App Stores Nowadays
• No personalization
• Users browse no more than a few pages
  – This limits exposure for the majority of apps
• Dominated by a few apps
  – They occupy the front pages of the store
  – The apps on the recommendation list are already known to most users

We need a better recommendation system!

Why don't we use existing approaches? Because the dataset is different.

GetJar vs. Netflix
• (# users of the item at the first percentile of popularity) / (# users of the most popular movie/app):
  – Netflix: 20%
  – GetJar: 0.6%, 1.3%
• The users of the first percentile: 42%

The Reasons (1/2)
• The low cost of publishing an app leads to
  – many apps with similar functionality
  – many apps serving very specific needs
• The disparity in available resources among app developers is larger than that among movie producers

The Reasons (2/2)
• Discovery mechanisms in the app space are less effective and mature. The only ways to find apps are:
  – Listings of apps sorted by the number of downloads
  – Category-based browsing
  – Keyword-based searching

Challenges
• Traditional collaborative filtering techniques and latent factor models are not designed to handle this level of sparsity

Our Goal
• Recommend a top-N list of apps to each user based on his/her recent app usage
• Use personalization to help users find a greater variety of appealing apps

GetJar Data Usage (1/2)
• Rely upon app usage

Why don't we rely upon app installation?

Usage is Better than Installation
• Many app installations are just experimental
• Many apps are uninstalled right after installation
• Many users have lots of installed apps that never get used

Then why don't we rely upon app ratings?

Usage vs. Ratings (1/2)
• The drawbacks of usage:
  – It is influenced by many factors such as mood and unforeseen events
  – There is correlation between usage and category
    • e.g. social apps are used more frequently than others

Usage vs. Ratings (2/2)
• The drawbacks of ratings:
  – Since users' tastes may change and developers update their apps frequently, ratings may become obsolete within a short period of time
  – Ratings tend to be polarized
  – It is difficult to collect a large number of ratings

GetJar Data Usage (2/2)
• Filtering out apps that were never used after the day of installation
• Removing users who joined or left midway through the observation period
• Including only apps with more than 20 users

Non-personalized Models
• Sort items by popularity
• Serve the same list of items to all users
• Used as the baseline algorithm in this paper

Memory-based Models
• User-based models
• Item-based models
  – Our memory-based model uses the item-based approach
  – There are usually far fewer items than users
  – There is research showing that item-based algorithms generally perform better than user-based algorithms

Common Memory-based Models
• Pearson correlation coefficient
• Cosine similarity

Pearson Correlation Coefficient
• Computed for a pair of items based on their common users:
  r(X, Y) = Σ_{i=1..n} (X_i − X̄)(Y_i − Ȳ) / √( Σ_{i=1..n} (X_i − X̄)² · Σ_{i=1..n} (Y_i − Ȳ)² )
  – n : the number of common users of item X and item Y
  – X_i : the usage of item X by user i
  – Y_i : the usage of item Y by user i
• Because of the long tail, many items are unlikely to share common users with most other items

Cosine Similarity
• R : the m × n user-item matrix
• S : the item-item similarity matrix, whose entries are
  s_{ij} = (R_{∗,i} · R_{∗,j}) / (‖R_{∗,i}‖ ‖R_{∗,j}‖)
• t(u, i) : the affinity between user u and item i,
  t(u, i) = Σ_{j ∈ I_u} s_{ij}, where I_u is the set of items used by u
• It still suffers from low overlap support: the closest neighbors of a less popular item will often occur by coincidence, simply because they are the only ones that produced non-zero similarity scores

Optimizing the Cosine Similarity Method
• Normalize the similarity scores with the z-score
• For each candidate item i, consider only the l nearest items, i.e., those with the greatest normalized similarity scores to i. This has the effect of noise reduction
• For the GetJar dataset, we find l = 5 works best
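A minimal NumPy sketch of the optimized item-based model described above, assuming R is an m × n array of days-of-usage with missing entries already replaced by 0. Function and variable names are illustrative, and the direction of the z-score normalization (over each item's similarity distribution) is one plausible reading of the slides:

```python
import numpy as np

def item_item_scores(R, l=5):
    """Affinity t(u, i) under the optimized cosine-similarity model.

    R : (m users x n items) usage matrix, missing entries = 0.
    l : number of nearest neighbors kept per candidate item.
    """
    # Cosine similarity between item columns.
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                       # guard against empty items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)                      # ignore self-similarity

    # Z-score normalization of each item's similarity distribution.
    Z = (S - S.mean(axis=0)) / (S.std(axis=0) + 1e-12)

    # For each candidate item i, keep only its l nearest items (noise reduction).
    for i in range(Z.shape[1]):
        top = np.argsort(Z[:, i])[-l:]
        mask = np.ones(Z.shape[0], dtype=bool)
        mask[top] = False
        Z[mask, i] = 0.0

    # t(u, i) = sum of retained similarities between i and the items u used.
    used = (R > 0).astype(float)
    return used @ Z                               # (m x n) score matrix

def top_n(scores_u, used_u, N=10):
    """Top-N recommendations for one user, excluding apps already used."""
    s = np.where(used_u > 0, -np.inf, scores_u)
    return np.argsort(-s)[:N]
```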
Latent Factor Models
• Factorize the user-item matrix into user factors and item factors
• Often used for rating prediction
• We substitute days of usage for ratings and find that the results are by far the worst of all algorithms
  – It is questionable whether these models can perform well when there is no explicit feedback such as ratings

PureSVD [1]
• The only latent factor algorithm that generates reasonable recommendations

[1] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 39–46, New York, NY, USA, 2010. ACM.

PureSVD
• Replace all missing values in R with 0
• Factorize R with SVD
• Affinity between user u and item i:
  t(u, i) = U′_{u,∗} · Σ′ · (V′_{i,∗})ᵀ, where the primes denote the rank-k truncation
  – Q : the top-k singular vectors extracted from V
  – q_i : the row of Q corresponding to item i, so equivalently t(u, i) = r_u · Q · q_iᵀ
• PureSVD views t(u, i) as a relative score rather than a real predicted value
• It is better to replace all missing values with 0 and use relative scores than to replace them with other estimates
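A short sketch of PureSVD as described above (zero-filled R, truncated SVD, and relative scores t(u, i) = r_u · Q · q_iᵀ). The dense SVD call and k = 50 are illustrative; at GetJar scale a sparse or randomized SVD would be used instead:

```python
import numpy as np

def puresvd_scores(R, k=50):
    """Relative top-N scores from PureSVD (Cremonesi et al., RecSys '10)."""
    # Thin SVD of the zero-filled user-item matrix: R ~= U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Q = Vt[:k].T                     # n x k: top-k right singular vectors
    # t(u, i) = r_u . Q . q_i^T for all users and items at once; the scores
    # are relative rankings, not predicted usage values.
    return (R @ Q) @ Q.T             # (m x n)
```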
Eigenapp Model
• Combines the memory-based model with the latent factor model
• Improves the results of memory-based models by borrowing ideas from the latent factor models

Steps of the Eigenapp Model
Replace all missing values in R with 0 → Normalization → PCA (R → R′) → Memory-based model

Recall PCA (1/2)
• A dimensionality reduction method
• The goal is to find k vectors such that, after projecting the instances onto them, the new data helps the classifier/regressor most

Recall PCA (2/2)
• The first principal component is the eigenvector of the covariance matrix corresponding to the largest eigenvalue

PCA
• Uses the eigendecomposition of the covariance matrix C
  – A is the normalized user-item matrix obtained from R in the previous step
• C is an m × m matrix (m is the number of users), which is too large for a direct eigendecomposition

Eigendecomposition of C
• Since the number of items n is likely to be much lower, perform the eigendecomposition on the n × n matrix AᵀA instead
• If w_j is an eigenvector of AᵀA, then v_j = A · w_j is an eigenvector of C with the same eigenvalue
• Normalize each v_j to length 1 and keep only the k eigenvectors with the highest eigenvalues

Eigenapps
• We denote these eigenvectors as eigenapps
• Finally, project all items onto the reduced eigenspace to obtain a new dataset D

Apply the Memory-based Model
• Item-item similarity can be computed in the same way as before
• Since D is dense, similarity scores will likely be non-zero for all item pairs
• The remainder is identical to the memory-based algorithm described above
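A sketch of the Eigenapp pipeline up to the projection step, following the flow above. The exact normalization (here: centering each item vector and scaling it to unit length) is an assumption based on the later discussion that item vectors are normalized prior to PCA:

```python
import numpy as np

def eigenapp_projection(R, k=50):
    """Project items onto the top-k eigenapps, returning the dense dataset D."""
    # Normalization: center each item (column) and scale it to unit length.
    A = R - R.mean(axis=0)
    A = A / (np.linalg.norm(A, axis=0) + 1e-12)

    # Eigendecompose the small n x n matrix A^T A instead of the
    # m x m covariance matrix C = A A^T (m users >> n items).
    evals, W = np.linalg.eigh(A.T @ A)       # eigenvalues in ascending order
    W = W[:, ::-1][:, :k]                    # top-k eigenvectors of A^T A

    # v_j = A w_j is an eigenvector of C; normalize each to length 1.
    V = A @ W
    V = V / (np.linalg.norm(V, axis=0) + 1e-12)

    # D holds every item projected onto the k eigenapps (k x n, dense).
    D = V.T @ A
    return D
```

The returned D then replaces R in the item-based similarity computation, with the z-score and l-nearest-neighbor steps unchanged.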
Similar Work
• Eigentaste [2]
  – Uses the Jester joke dataset
  – Requires a gauge item set in which every user has rated every item in the gauge set
  – Uses the user-based approach

[2] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Inf. Retr., 4(2):133–151, July 2001.

Evaluation
• Uses the GetJar dataset
  – Divide the users into five equal-sized groups: four for training, one for evaluation
  – Repeat 5 times, so every group is used as the evaluation group once
• Two versions of R:
  – Days of usage
  – Binary version
• For each user in the test set, sort the apps by install time
• Feed the first M − 1 apps to the model to generate its recommendation list of N apps, then check whether the left-out app is in the recommended list

Evaluation Criteria
• Accuracy
• Ability to recommend tail apps
• The variety of the apps it recommends

Accuracy
• Set h_u to 1 if the relevant (left-out) item is in the top-N list for user u, and 0 otherwise
• Accuracy@N = (1/|U|) · Σ_{u ∈ U} h_u, the fraction of test users whose left-out app is recovered

Accuracy of Less Popular Items
• Exclude the 100 most popular apps from the recommended list of each user
• Remove the users whose missing app is one of the 100 most popular apps

Diversity
• Judging the diversity of the items each algorithm presents

Variety
• We measure variety by calculating the entropy:
  H = −Σ_i p(i) · log p(i)
  – p(i) is the relative frequency of item i among the top-N recommended items over all users
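A sketch of the two metrics defined above: the hit-rate accuracy over held-out apps and the entropy-based variety score. `recommended` (one top-N list per test user) and `held_out` (the left-out app per user) are illustrative names:

```python
import math
from collections import Counter

def hit_rate(recommended, held_out):
    """Fraction of test users whose held-out app appears in their top-N list."""
    hits = sum(1 for recs, app in zip(recommended, held_out) if app in recs)
    return hits / len(held_out)

def variety_entropy(recommended):
    """Entropy of the distribution of recommended items across all users."""
    counts = Counter(app for recs in recommended for app in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```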
Discussion
• The memory-based model performed surprisingly well under precision–recall because it recommends a small set of popular apps to almost every user
• The Eigenapp model performed best in accuracy and in the promotion of less popular apps
• The Eigenapp model recommends a diverse list of apps
  – All item vectors are normalized prior to applying PCA, so the usage of less popular apps can be captured by the top eigenvectors
• A limitation of the Eigenapp model is that it includes only apps with a certain minimum amount of usage, a condition that most apps do not satisfy

Conclusion
• We find most recent research to be focused on the movie domain, where interest is explicitly expressed through ratings. It is questionable how well those models would translate to other domains
• Because the Eigenapp model still has limitations, we are currently exploring, as future work, content-based models that extract useful features from app metadata

Comments
• The paper does not clearly explain why most latent factor models perform badly when ratings are replaced with days of usage
• The proposed model is a combination of several methods we are already familiar with