Mobile App Recommendation: Maximize the Total App Downloads

Zhuohua Chen, School of Economics and Management, Tsinghua University
Yinghui (Catherine) Yang, Graduate School of Management, University of California, Davis
Hongyan Liu, School of Economics and Management, Tsinghua University

Research questions

Smart phones are now widely used to carry out a large variety of activities. As their use grows, millions of smart phone applications (mobile apps) are available for download through app markets (e.g., Apple's App Store and Google Play). Because so many apps are available on a single market, it is increasingly difficult for smart phone users to discover apps. A well-designed app recommendation system can guide users to relevant apps. This can drive more app downloads, which in turn can generate profit for both the app market provider and app developers.

An app recommendation system normally generates a list of apps, shown on the same screen, that are related to the specific app a user is browsing or downloading. One of the most common ways to generate this list is to recommend apps similar to the focal app, which implies that the recommended apps are also similar to each other. For example, when a user is browsing a flashlight app, many other flashlight apps are recommended. Because the recommended apps have similar functionality, a user has little incentive to download more than one of them. Yet a mobile user could easily press several download buttons on the same screen if they saw several relevant apps, so such a recommendation mechanism may miss many opportunities. Motivated by this, we should consider not only the probability that each app is downloaded individually, but also the effect that the recommended apps have on each other.
The recommended list should therefore be optimized as a whole to maximize the number of apps a user downloads. We are working with one of the biggest Android app markets in China (around 100 million users). We are helping it design an app recommendation system whose goal is to maximize the number of apps users download, taking into account that apps on the recommendation list can influence each other. To the best of our knowledge, this problem has not been studied in either the mobile app recommendation literature or the recommendation literature in general. In addition, we will be able to conduct experiments on their platform once the recommendation system is developed.

Problem Formulation and Approach

For a specific mobile user, the app market tracks all the apps she has downloaded. Based on this download history, the user's rating for each app can be inferred with well-developed recommendation techniques such as collaborative filtering and matrix factorization. In addition, the apps viewed during the current browsing session may help identify the apps the user could be interested in downloading. Given the user's rating for each app and her behavior in the current browsing session, we want to find the recommendation list with the maximum expected total number of downloads. This list will appear at the bottom of the page of the app the user is currently browsing.

Let $n$ be the number of apps in the app market, $m$ the number of apps viewed in the current browsing session, and $l$ the number of apps on the recommendation list. Let $C = \{C_1, C_2, \ldots, C_m\}$ denote the apps a specific user viewed in the current browsing session, ordered by viewing time. Each app $C_k$ viewed in the current session has a binary download status $D_k$. $D_k = 1$ means that $C_k$ has been downloaded during this session, and $D_k = 0$ indicates that $C_k$ was only viewed by the user.
$C_1$ is the most recently viewed app, and the recommendation list normally appears at the bottom of the information page of $C_1$. Let $R = \{R_1, R_2, \ldots, R_l\}$ denote the recommended apps, ordered by their position on the list. Each recommended app $R_i$ has a rating $RT_i$ for the given user, computed beforehand from all users' download histories. Let $S = \{S_1, S_2, \ldots, S_l\}$ be the binary download statuses of the recommended apps. $S_i = 1$ means that the user downloads $R_i$, and $S_i = 0$ means that she does not. Given $C$, the objective is to find the list $R^*$ that maximizes the expected sum of the $S_i$:

$$R^* = \arg\max_{R} \; E\Big(\sum_{i=1}^{l} S_i \,\Big|\, C, R\Big)$$

To solve this problem, we first need to model the probability distribution over $S$ conditional on $C$ and $R$. We introduce a conditional random field model, shown in Figure 1, to represent this conditional distribution.

Figure 1. The conditional random field model (input nodes $C$, $R$; output nodes $S_1, S_2, \ldots, S_l$).

The factor functions of the model are defined as follows.

1. The rating $RT_i$ reflects the user's preference for app $R_i$, so it influences the download status $S_i$. The position $i$ of $R_i$ on the recommendation list also influences $S_i$. We use logistic regression to model the relationship between rating, position, and download status. The factor $f(S_i \mid R_i)$ captures the influence of $R_i$ on $S_i$:

$$f(S_i \mid R_i) = \begin{cases} \mathrm{lg}(RT_i, i)^{\alpha}, & S_i = 1 \\ \big(1 - \mathrm{lg}(RT_i, i)\big)^{\alpha}, & \text{otherwise} \end{cases}$$

where $\mathrm{lg}(RT_i, i)$ is the logistic function, which returns a value between 0 and 1, and $\alpha$ is a parameter that controls the decline rate.

2. Each app $C_k$ viewed in the current session has an impact on the download status $S_i$ of $R_i$. Based on the download and browsing histories of all users, we can compute the associations between pairs of apps. Let $A$ and $B$ be any two apps, and let $S_A$ and $S_B$ be their download statuses.
In general, the probability that $A$'s download status is $S_A$, given that $A$ is viewed, can be estimated as

$$P(S_A \mid A) = \frac{\#\,\text{sessions where } A \text{ appears and its download status is } S_A}{\#\,\text{sessions where } A \text{ appears}}$$

When another app $B$ has been viewed earlier in the session, the probability of $A$'s download status $S_A$ may be influenced by the fact that $B$ was viewed and by $B$'s download status $S_B$. Let $P(S_A \mid A, B, S_B)$ denote the probability that $A$'s download status is $S_A$, given that $A$ and $B$ are viewed in the same session and $B$'s download status is $S_B$:

$$P(S_A \mid A, B, S_B) = \frac{\#\,\text{sessions where } A \text{ and } B \text{ both appear and their download statuses are } S_A \text{ and } S_B \text{ respectively}}{\#\,\text{sessions where } A \text{ and } B \text{ both appear and } B\text{'s download status is } S_B}$$

Now let app $A$ be the recommended app $R_i$ and app $B$ be the viewed app $C_k$. The ratio $P(S_i \mid R_i, C_k, D_k) / P(S_i \mid R_i)$ measures how much more or less likely $R_i$'s download status is to be $S_i$ after $C_k$ has been viewed with download status $D_k$. We also assume that the influence of $C_k$ on $S_i$ diminishes as the index $k$ increases (i.e., a more recently viewed app has more influence on $S_i$), which the exponent $\beta / k$ below implements. The factor $g(S_i \mid R_i, C_k)$ captures this influence:

$$g(S_i \mid R_i, C_k) = \left(\frac{P(S_i \mid R_i, C_k, D_k)}{P(S_i \mid R_i)}\right)^{\beta / k}$$

where $\beta$ is a parameter that controls the g factor's weight in the overall model.

3. Any two apps on the same recommendation list could affect each other's download status. Let $A$ and $B$ be any two apps, and let $S_A$ and $S_B$ be their download statuses. The joint probability that $A$'s download status is $S_A$ and $B$'s download status is $S_B$, given that $A$ and $B$ are viewed, can be estimated as
$$P(S_A, S_B \mid A, B) = \frac{\#\,\text{sessions where } A \text{ and } B \text{ both appear and their download statuses are } S_A \text{ and } S_B \text{ respectively}}{\#\,\text{sessions where } A \text{ and } B \text{ both appear}}$$

If the download statuses $S_A$ and $S_B$ are independent when $A$ and $B$ are viewed in the same session, this joint probability should equal $P(S_A \mid A) \times P(S_B \mid B)$. Now let $A$ and $B$ be two recommended apps $R_i$ and $R_j$. The ratio of $P(S_i, S_j \mid R_i, R_j)$ to $P(S_i \mid R_i) \times P(S_j \mid R_j)$ measures how much more or less likely the pair of download statuses $(S_i, S_j)$ is when $R_i$ and $R_j$ are shown together. The factor $h(S_i, S_j \mid R_i, R_j)$ captures this interaction:

$$h(S_i, S_j \mid R_i, R_j) = \left(\frac{P(S_i, S_j \mid R_i, R_j)}{P(S_i \mid R_i) \times P(S_j \mid R_j)}\right)^{\gamma}$$

where $\gamma$ is a parameter that controls the h factor's weight in the overall model.

The joint conditional probability $P(S \mid R, C)$ then follows from the conditional random field model:

$$P(S \mid R, C) = \frac{1}{Z(\theta \mid R, C)} \prod_{i=1}^{l} f(S_i \mid R_i) \prod_{i=1}^{l} \prod_{k=1}^{m} g(S_i \mid R_i, C_k) \prod_{i=1}^{l} \prod_{j=1, j \neq i}^{l} h(S_i, S_j \mid R_i, R_j)$$

where $Z(\theta \mid R, C)$ is the partition function:

$$Z(\theta \mid R, C) = \sum_{S_1} \sum_{S_2} \cdots \sum_{S_l} \prod_{i=1}^{l} f(S_i \mid R_i) \prod_{i=1}^{l} \prod_{k=1}^{m} g(S_i \mid R_i, C_k) \prod_{i=1}^{l} \prod_{j=1, j \neq i}^{l} h(S_i, S_j \mid R_i, R_j)$$

Since the recommendation list is usually short (no more than 10 apps), there are at most $2^{10} = 1024$ configurations of $S$, so we can solve the inference problem of the model by direct enumeration. We use maximum likelihood estimation to learn all the model parameters $\theta$, with a gradient ascent method that iteratively finds the optimal parameters given the data.

With the conditional probability distribution in hand, we can then search for the solution to the objective function. A greedy method can quickly generate the recommendation list by picking the best apps one at a time. We first determine the best $R_1$, then the best $R_2$ given $R_1$, and so forth.
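The direct-enumeration inference described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The logistic model and the session-count estimates are passed in as hypothetical callables (`lg`, `p_single`, `p_after`, `p_pair`), and the function names are our own. For each of the $2^l$ configurations of $S$, the sketch multiplies the f, g and h factors (summing logs for numerical stability), normalizes by the partition function, and returns $E(\sum_i S_i \mid C, R)$.

```python
import itertools
import math


def log_potential(S, R, C, D, rating, lg, p_single, p_after, p_pair,
                  alpha, beta, gamma):
    """Log of the unnormalized product of f, g and h factors for one S."""
    total = 0.0
    for i, (app, s) in enumerate(zip(R, S), start=1):
        p = lg(rating[app], i)  # assumed to lie strictly between 0 and 1
        # f factor: lg^alpha if downloaded, (1 - lg)^alpha otherwise
        total += alpha * math.log(p if s == 1 else 1.0 - p)
        # g factors: effect of viewed app C_k with status D_k, exponent beta / k
        for k, (viewed, d) in enumerate(zip(C, D), start=1):
            ratio = p_after(app, viewed, d, s) / p_single(app, s)
            total += (beta / k) * math.log(ratio)
    # h factors over all ordered pairs (i, j) with j != i, exponent gamma
    for i, j in itertools.permutations(range(len(R)), 2):
        joint = p_pair(R[i], R[j], S[i], S[j])
        indep = p_single(R[i], S[i]) * p_single(R[j], S[j])
        total += gamma * math.log(joint / indep)
    return total


def expected_downloads(R, C, D, rating, lg, p_single, p_after, p_pair,
                       alpha=1.0, beta=1.0, gamma=1.0):
    """E(sum_i S_i | C, R), computed by enumerating all 2^l configurations."""
    configs = list(itertools.product((0, 1), repeat=len(R)))
    logs = [log_potential(S, R, C, D, rating, lg, p_single, p_after, p_pair,
                          alpha, beta, gamma) for S in configs]
    top = max(logs)  # shift by the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights)  # partition function Z, rescaled by exp(-top)
    return sum(w * sum(S) for w, S in zip(weights, configs)) / z


if __name__ == "__main__":
    # Toy, made-up inputs: two recommended apps and one viewed app.
    rating = {"x": 1.0, "y": 0.5}
    lg = lambda rt, pos: 1.0 / (1.0 + math.exp(-(rt - 0.3 * pos)))
    p_single = lambda a, s: 0.2 if s == 1 else 0.8
    p_after = lambda a, b, d, s: p_single(a, s)  # no session effect
    table = {(1, 1): 0.1, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.7}
    p_pair = lambda a, b, s, t: table[(s, t)]    # a complementary pair
    print(expected_downloads(["x", "y"], ["c"], [0], rating,
                             lg, p_single, p_after, p_pair))
```

By linearity of expectation, the returned value equals $\sum_i P(S_i = 1 \mid C, R)$. Each of these marginals, however, depends on the whole list through the h factors, which is why the list has to be scored as a whole. The number of configurations doubles with every extra slot, so exact enumeration is practical only because $l$ is small.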
When picking $R_i$, the apps that precede it on the list, $R_1, R_2, \ldots, R_{i-1}$, have already been chosen. At this step, we set the length of the recommendation list to $i$ and find the $R_i$ for which the list $(R_1, R_2, \ldots, R_{i-1}, R_i)$ has the maximum expected total number of downloads.

Expected contributions

We are designing a mobile app recommendation system that aims to maximize the number of apps users download, taking into account that apps on the recommendation list can influence each other. To the best of our knowledge, this problem has not been studied before. Moreover, our dataset was collected from one of the biggest Android app markets in China, and it is detailed enough to support our experiments. Once our method is fully developed and verified both theoretically and computationally, it will be deployed on this app market. We will then be able to run controlled experiments in real production to measure how much better it performs than both the system currently in place at the app market and other alternative methods. Very few published studies on recommendation systems have had the opportunity to run experiments on a real recommendation platform.

Current status of the manuscript

The model based on a conditional random field has been designed. We are evaluating it on the offline data we have, first to see whether incorporating interactions between recommended apps provides value. Once this is verified, we will use the initial greedy algorithm to find the best apps to recommend. We will then fine-tune the algorithm using large-scale data and deploy it on the app market to conduct controlled experiments.
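The greedy list construction described in the Problem Formulation section can be sketched as follows. This is an illustrative sketch, not the authors' code. `greedy_list` and `toy_score` are hypothetical names, and the scorer passed in is assumed to return the expected total downloads of a partial list (in the model above, that value would come from exact enumeration of $P(S \mid R, C)$). The toy scorer in the demo replaces the conditional random field with independent download probabilities minus a made-up penalty for each same-category pair, echoing the flashlight example.

```python
import itertools


def greedy_list(candidates, length, score):
    """Build a recommendation list one position at a time.

    `score(partial_list)` is a hypothetical callable returning the expected
    total downloads of a partial list.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(length, len(remaining))):
        # R_1..R_{i-1} stay fixed; try every remaining app as R_i.
        best = max(remaining, key=lambda app: score(chosen + [app]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Made-up stand-in scorer (NOT the CRF model): independent probabilities
    # minus 0.4 for every pair of same-category apps, e.g. two flashlights.
    prob = {"torch1": 0.5, "torch2": 0.45, "music": 0.3, "map": 0.25}
    category = {"torch1": "flashlight", "torch2": "flashlight",
                "music": "media", "map": "travel"}

    def toy_score(apps):
        penalty = sum(0.4 for a, b in itertools.combinations(apps, 2)
                      if category[a] == category[b])
        return sum(prob[a] for a in apps) - penalty

    print(greedy_list(prob, 3, toy_score))  # ['torch1', 'music', 'map']
```

Each step evaluates every remaining candidate, so building a list of length $l$ from $n$ candidates calls the scorer roughly $n \cdot l$ times. Being greedy, the procedure is a heuristic and is not guaranteed to return the optimal $R^*$. In the toy run it skips the second flashlight app even though its individual probability is the second highest.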