Mobile App Recommendation: Maximize the Total App Downloads

Zhuohua Chen
School of Economics and Management
Tsinghua University
Yinghui (Catherine) Yang
Graduate School of Management
University of California, Davis
Hongyan Liu
School of Economics and Management
Tsinghua University
Research questions
Smart phones are now widely used to carry out a large variety of activities. With the increasing
use of smart phones, millions of smart phone applications (mobile apps) are available for download
through apps markets (e.g., Apple's App Store and Google Play). Due to the large number of mobile
apps available on an apps market, it is becoming increasingly difficult for smart phone users to
discover apps. A well-designed app recommendation system can guide users to relevant apps,
which leads to more app downloads that could subsequently generate profit for the apps market
provider as well as app developers.
An app recommendation system normally generates a list of apps (on the same screen)
which are related to a specific app a user is browsing or downloading. One of the most common
ways to generate recommended apps is to recommend apps similar to the focal app, which implies
that the recommended apps are also similar to each other. For example, when a user is browsing a
flashlight app, many other flashlight apps are recommended. Due to the similarity in functionality
of the recommended apps, a user hardly has an incentive to download multiple apps listed on the
same screen. Given that a mobile user could easily press multiple download buttons from the same
mobile screen if they see several relevant apps, such a recommendation mechanism could miss
many download opportunities. Motivated by this, we consider not only the probability of
downloading each app individually when making recommendations, but also the effect that
apps have on each other. The recommended list of apps should be optimized as a
whole to maximize the number of apps that a user downloads.
We are working with one of the biggest Android apps markets (with around 100 million
users) in China, and helping them design an app recommendation system whose goal is to
maximize the number of apps users download considering that apps on the recommendation list
can influence each other. To the best of our knowledge, this problem has not been studied in either
the mobile app recommendation literature or the recommendation literature in general. In addition,
we will be able to conduct experiments on their platform once the recommendation system is
developed.
Problem Formulation and Approach
For a specific mobile user, the apps market tracks all the apps she has downloaded. Based on the
download history, the user’s rating for each app can be inferred by applying well-developed
recommendation techniques such as collaborative filtering and matrix factorization. In addition,
apps viewed during the current browsing session may also help to identify the apps the user could
potentially be interested in downloading. Given the user’s rating for each app and her behavior in
the current browsing session, we want to find the recommendation list that has the maximum
expectation of the total number of downloads. This recommended list of apps will appear at the
end of the screen of the app the user is currently browsing.
Let n be the number of apps in the apps market, m be the number of apps viewed in the current
browsing session, and l be the number of apps on the recommendation list. Let C = {C1, C2, …, Cm}
refer to the apps a specific user viewed in the current browsing session, ordered by time viewed.
Each app Ck viewed in the current session has a binary download status Dk. Dk = 1 means that Ck
has been downloaded during this session, and Dk = 0 indicates that Ck was just viewed by the user.
C1 is the most recently viewed app, and normally the recommended list of apps will appear at the
end of the screen of the information page of C1. Let R = {R1, R2… Rl} denote the recommended
apps ordered by the position on the recommendation list. Each recommended app Ri has a rating
RTi for the given user (which was computed previously based on all users’ download history). Let
S = {S1, S2… Sl} be the binary download status of the apps in the recommendation list. Si = 1 means
that the recommended Ri is downloaded by the user, and Si = 0 indicates that the user did not
download Ri. Given C, the objective is to find an optimal R that maximizes the expectation
of the sum of the Si, i.e.
$$R^{*} = \operatorname*{arg\,max}_{R} \; E\left( \sum_{i=1}^{l} S_i \,\middle|\, C, R \right)$$
To solve this problem, we first need to model the probability distribution over S conditioning on
C and R. We introduce a conditional random field model, which is shown in Figure 1, to represent
the conditional probability distribution.
Figure 1. The conditional random field model (observed node C, R; download-status nodes S1, S2, ..., Sn)
The factor functions of the model and their explanations are as follows.
1.
The rating RTi reflects the user’s preference for app Ri and thus influences the
download status Si. The position i of Ri in the recommendation list also influences
Si. We use logistic regression to model the relationship between rating, position, and download
status. Factor f(Si | Ri) is introduced to capture the influence of Ri on Si.
$$f(S_i \mid R_i) = \begin{cases} \left(\lg(RT_i, i)\right)^{\alpha}, & S_i = 1 \\ \left(1 - \lg(RT_i, i)\right)^{\alpha}, & \text{otherwise} \end{cases}$$
where lg(RTi, i) is the logistic function that returns a value within [0, 1], and α is a parameter
that controls the decline rate.
2.
Each Ck viewed in the current session has an impact on the download status Si of Ri. Based on
the download and browsing history of all users, we can compute the associations
between any pair of apps. Let A and B refer to any two given apps, and let SA and SB be the download
statuses of apps A and B. In general, the probability of A’s download status being SA given that A is
viewed can be estimated as follows.
$$P(S_A \mid A) = \frac{\#\text{ of sessions where } A \text{ appears and its download status is } S_A}{\#\text{ of sessions where } A \text{ appears}}$$
When another app B was viewed previously, the probability of A’s download status SA might
be influenced by the fact that B was viewed and by its download status SB. Let P(SA | A, B, SB)
denote the probability of A’s download status being SA given that A and B are viewed in the
same session and B’s download status is SB.
$$P(S_A \mid A, B, S_B) = \frac{\#\text{ of sessions where } A \text{ and } B \text{ both appear and their download statuses are } S_A \text{ and } S_B \text{ respectively}}{\#\text{ of sessions where } A \text{ and } B \text{ both appear and } B\text{'s download status is } S_B}$$
Now we take app A to be Ri and app B to be Ck. The ratio of P(Si | Ri, Ck, Dk) over P(Si | Ri)
can be used to measure how much more or less likely Ri’s download status will be Si after Ck
is viewed and Ck’s download status is Dk. We also assume that the extent of the influence of Ck
on Ri’s download status Si weakens as the index k increases (i.e., more recently
browsed apps have more influence on Si). Factor g(Si | Ri, Ck) is introduced to capture this
influence of Ck on Ri’s download status Si.
$$g(S_i \mid R_i, C_k) = \left( \frac{P(S_i \mid R_i, C_k, D_k)}{P(S_i \mid R_i)} \right)^{\beta / k}$$
where β is a parameter that controls the effect of the g factor on the whole model.
3.
Any two apps on the same recommendation list could affect the download status of each other.
Let A and B refer to any two given apps, and let SA and SB be the download statuses of apps A and
B. The joint probability of A’s download status being SA and B’s download status being SB
given that A and B are viewed can be estimated as follows.
$$P(S_A, S_B \mid A, B) = \frac{\#\text{ of sessions where } A \text{ and } B \text{ both appear and their download statuses are } S_A \text{ and } S_B \text{ respectively}}{\#\text{ of sessions where } A \text{ and } B \text{ both appear}}$$
If the download statuses SA and SB are independent when A and B are viewed in the same session,
the expected joint probability of A’s download status being SA and B’s download status being SB
given that A and B are viewed is P(SA | A)⨯P(SB | B). Now we take app A to be Ri and app
B to be Rj. The ratio of P(Si, Sj | Ri, Rj) over P(Si | Ri)⨯P(Sj | Rj) can be used to measure how
much more or less likely it is that Ri’s download status will be Si and Rj’s download status will be Sj
when Ri and Rj are viewed simultaneously. Factor h(Si, Sj | Ri, Rj) is introduced to capture this
interaction.
$$h(S_i, S_j \mid R_i, R_j) = \left( \frac{P(S_i, S_j \mid R_i, R_j)}{P(S_i \mid R_i) \times P(S_j \mid R_j)} \right)^{\gamma}$$
where γ is a parameter that controls the effect of the h factor on the whole model.
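The three factors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `SessionStats`, the session format (one dict per session mapping an app id to its download status), and the linear form and weights inside `logistic` are all hypothetical, since the text does not specify them. The count-based estimates are unsmoothed, so an app pair that never co-occurs would cause a division by zero; a production version would need some form of smoothing.

```python
import math
from collections import Counter


class SessionStats:
    """Session counts behind the count-based probability estimates."""

    def __init__(self, sessions):
        # sessions: iterable of dicts mapping app id -> download status (0 or 1)
        self.seen = Counter()    # app -> sessions where the app appears
        self.single = Counter()  # (app, status) -> sessions
        self.both = Counter()    # (a, b) -> sessions where both apps appear
        self.pair = Counter()    # (a, s_a, b, s_b) -> sessions
        for session in sessions:
            for a, s_a in session.items():
                self.seen[a] += 1
                self.single[(a, s_a)] += 1
                for b, s_b in session.items():
                    if a != b:
                        self.both[(a, b)] += 1
                        self.pair[(a, s_a, b, s_b)] += 1

    def p_single(self, a, s_a):
        # P(S_A | A)
        return self.single[(a, s_a)] / self.seen[a]

    def p_given(self, a, s_a, b, s_b):
        # P(S_A | A, B, S_B): the denominator counts sessions where both
        # apps appear and B's download status is s_b
        denom = self.pair[(a, 0, b, s_b)] + self.pair[(a, 1, b, s_b)]
        return self.pair[(a, s_a, b, s_b)] / denom

    def p_joint(self, a, s_a, b, s_b):
        # P(S_A, S_B | A, B)
        return self.pair[(a, s_a, b, s_b)] / self.both[(a, b)]


def logistic(rating, position, w0=0.0, w_rating=1.0, w_position=-0.1):
    # Hypothetical linear predictor; the weights are placeholders, not from the source.
    return 1.0 / (1.0 + math.exp(-(w0 + w_rating * rating + w_position * position)))


def f_factor(s_i, rating, position, alpha=1.0):
    # Factor f: rating and list position of R_i acting on S_i
    p = logistic(rating, position)
    return (p if s_i == 1 else 1.0 - p) ** alpha


def g_factor(stats, s_i, r_i, c_k, d_k, k, beta=1.0):
    # Factor g: influence of the k-th most recently viewed app C_k on S_i
    ratio = stats.p_given(r_i, s_i, c_k, d_k) / stats.p_single(r_i, s_i)
    return ratio ** (beta / k)


def h_factor(stats, s_i, r_i, s_j, r_j, gamma=1.0):
    # Factor h: interaction between two recommended apps R_i and R_j
    independent = stats.p_single(r_i, s_i) * stats.p_single(r_j, s_j)
    return (stats.p_joint(r_i, s_i, r_j, s_j) / independent) ** gamma
```

A factor value below 1 means the configuration is less likely than the baseline (for h, less likely than under independence), and a value above 1 means it is more likely.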
The joint conditional probability P(S | R, C) can be derived from the conditional random field
model.
$$P(S \mid R, C) = \frac{1}{Z(\theta \mid R, C)} \prod_{i=1}^{l} f(S_i \mid R_i) \prod_{i=1}^{l} \prod_{k=1}^{m} g(S_i \mid R_i, C_k) \prod_{i=1}^{l} \prod_{j=1, j \neq i}^{l} h(S_i, S_j \mid R_i, R_j)$$
where Z(θ | R, C) is the partition function.
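$$Z(\theta \mid R, C) = \sum_{S_1} \sum_{S_2} \cdots \sum_{S_l} \prod_{i=1}^{l} f(S_i \mid R_i) \prod_{i=1}^{l} \prod_{k=1}^{m} g(S_i \mid R_i, C_k) \prod_{i=1}^{l} \prod_{j=1, j \neq i}^{l} h(S_i, S_j \mid R_i, R_j)$$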
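Because S has only 2^l possible values, the joint distribution and the partition function can be computed by brute force. The sketch below illustrates this under assumptions of our own: the factor callables `f(i, s_i)`, `g(i, k, s_i)` and `h(i, j, s_i, s_j)` are hypothetical signatures that are expected to close over R and C, and indices are 0-based rather than 1-based as in the formulas.

```python
from itertools import product


def unnormalized_score(s, m, f, g, h):
    """Product of all factors for one assignment s = (s_1, ..., s_l)."""
    l = len(s)
    score = 1.0
    for i in range(l):
        score *= f(i, s[i])
        for k in range(m):
            score *= g(i, k, s[i])
        for j in range(l):
            if j != i:
                # each unordered pair is visited twice, as in the formula
                score *= h(i, j, s[i], s[j])
    return score


def distribution(l, m, f, g, h):
    """Return {s: P(s | R, C)} by enumerating all 2**l assignments."""
    scores = {s: unnormalized_score(s, m, f, g, h)
              for s in product((0, 1), repeat=l)}
    z = sum(scores.values())  # partition function Z
    return {s: score / z for s, score in scores.items()}


def expected_downloads(l, m, f, g, h):
    """Expected number of downloads, E[sum_i S_i | C, R]."""
    return sum(sum(s) * p for s, p in distribution(l, m, f, g, h).items())
```

With neutral g and h factors (always 1) the download statuses are independent and the expectation reduces to the sum of the individual f probabilities; an h factor below 1 for joint downloads lowers it.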
Since the size of the recommendation list is usually quite small (no more than 10 apps), we can use
direct enumeration to solve the inference problem of the model: a list of l apps has only 2^l
possible download-status vectors S, i.e., at most 1,024 when l = 10. We use maximum
likelihood estimation to learn the parameters of the model, with a gradient ascent method that
iteratively searches for the optimal parameters given the data. With the conditional probability
distribution, we can then proceed to solve the optimization problem. We can use a greedy method
to quickly generate the recommendation list by picking the best apps one at a time.
We first determine the best R1, then determine the best R2 given R1, and so on. When
picking Ri, the apps that appear before it on the list, R1, R2, …, Ri-1, have already been chosen. At this step,
we set the length of the recommendation list to i, and find the Ri that makes the
recommendation list (R1, R2, …, Ri-1, Ri) have the maximum expected total downloads.
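The greedy procedure can be sketched as follows. This is an illustration only: `expected_downloads` is a hypothetical callable standing in for the model's expected total downloads of a partial list, and the toy scoring function, app names, and numbers below are invented for the example.

```python
def greedy_list(candidates, list_length, expected_downloads):
    """Build the list one position at a time, keeping the best extension at each step."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(list_length, len(remaining))):
        # evaluate the list of length i formed by appending each remaining app
        best = max(remaining, key=lambda app: expected_downloads(chosen + [app]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy stand-in for the model (hypothetical numbers): an app whose category is
# already on the list contributes only half of its base download probability.
base = {"flashlight_a": 0.5, "flashlight_b": 0.45, "weather": 0.4, "notes": 0.3}
category = {"flashlight_a": "flashlight", "flashlight_b": "flashlight",
            "weather": "weather", "notes": "notes"}


def toy_expected_downloads(apps):
    total, seen = 0.0, set()
    for app in apps:
        total += base[app] * (0.5 if category[app] in seen else 1.0)
        seen.add(category[app])
    return total


recommended = greedy_list(base, 3, toy_expected_downloads)
```

In this toy setting the second flashlight app is skipped in favor of apps from other categories, even though its individual download probability is higher, which is the behavior the list-level objective is meant to encourage. The greedy choice is a heuristic and is not guaranteed to find the globally optimal list.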
Expected contributions
We are designing a mobile app recommendation system aiming to maximize the number of apps
users download considering that apps on the recommendation list can influence each other. To the
best of our knowledge, this problem has not been studied before. Moreover, our dataset was
collected from one of the biggest Android apps markets in China, and we have very detailed data to
support our experiments. Once fully developed and verified both theoretically and computationally,
our method will be deployed on this apps market, and we will be able to run controlled experiments
in real production to measure how much better our method performs compared to the system
currently in place at the apps market and to other alternative methods. Very few published studies
on recommendation systems have had the opportunity to run experiments on a real recommendation
platform.
Current status of the manuscript
The model based on a conditional random field has been designed, and we are in the process
of evaluating it on the offline data we have, to see whether incorporating
interactions between recommended apps provides value. Once this is verified, we will use
the initial greedy algorithm to find the best apps to recommend. We will then fine-tune our
algorithm using large-scale data and deploy it on the apps market to conduct controlled
experiments.