Mining the Network Value of Customers

Zhenwei He & Cen Zhe Qiao
School of Informatics
University of Edinburgh
Outline
• Introduction
• Modeling Markets as a Markov Random Field
• Mining from Collaborative Filtering Systems (CFS)
• Example: the EachMovie collaborative filtering database
• Future work
• Conclusion
Introduction
• Mass Marketing
• Direct Marketing: assumes customers decide independently
• Viral Marketing: customers' decisions are strongly dependent
• Data mining: plays a key role
 - General framework
 - Optimize the choice of which customers to market to
 - Estimate what customer acquisition cost is justified for each
How to do that?
• Modeling markets as a social network
• Mining the network from collaborative filtering databases
Modeling Markets as Social Network
• Some mathematical notation:
$n$ - the number of customers
$X_i \in \{0,1\}$ - $X_i = 1$ if customer $i$ buys the product
$N_i = \{X_{i,1}, \ldots, X_{i,n_i}\}$ - the set of neighbors of $X_i$
$X^k$ ($X^u$) - the customers whose value is known (unknown)
$N_i^u = N_i \cap X^u$ - the set of unknown neighbors of $X_i$
$Y = \{Y_1, \ldots, Y_m\}$ - the set of attributes of the product
$M_i$ - the marketing action that is taken for customer $i$
$C$ - the cost of marketing to a customer
Modeling Markets as Social Network
$r_0$ - the revenue from selling the product to a customer if NO marketing action is performed
$r_1$ - the revenue from selling the product to a customer if a marketing action is performed
$f_i^1(M)$ - the result of setting $M_i$ to 1 and leaving the rest of $M$ unchanged
$f_i^0(M)$ - likewise, with $M_i$ set to 0
where $M = \{M_1, \ldots, M_n\}$
Modeling Markets as Social Network
• The customer’s network value =
{the customer’s TOTAL value} - {the customer’s INTRINSIC value}
• The total value of a customer is measured by the global lift in profit $ELP(X^k, Y, M)$, as
$ELP(X^k, Y, f_i^1(M)) - ELP(X^k, Y, f_i^0(M))$
• The intrinsic value of a customer is
$ELP_i(X^k, Y, M)$
Modeling Markets as Social Network
• The global lift in profit:
$ELP(X^k, Y, M) = \sum_{i=1}^{n} r_i P(X_i = 1 \mid X^k, Y, M) - r_0 \sum_{i=1}^{n} P(X_i = 1 \mid X^k, Y, M_0) - |M|\,C$
where $r_i = r_1$ if $M_i = 1$, $r_i = r_0$ otherwise, $|M|$ is the number of 1’s in $M$, and $M_0$ is the all-zeros assignment (no marketing)
• The expected lift in profit from marketing to customer $i$:
$ELP_i(X^k, Y, M) = r_1 P(X_i = 1 \mid X^k, Y, f_i^1(M)) - r_0 P(X_i = 1 \mid X^k, Y, f_i^0(M)) - C$
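As a small numeric illustration of the global lift in profit, the sketch below evaluates the formula from purchase probabilities that are simply given as inputs. The function name, argument names, and numbers are illustrative assumptions and do not come from the slides; in the real model the probabilities would come from the network.

```python
from typing import Sequence


def global_lift_in_profit(
    p_with: Sequence[float],     # P(X_i = 1 | X^k, Y, M) for each customer (assumed given)
    p_without: Sequence[float],  # P(X_i = 1 | X^k, Y, M_0) for each customer (assumed given)
    M: Sequence[int],            # marketing plan: M_i = 1 if customer i is marketed to
    r0: float,                   # revenue per sale without marketing
    r1: float,                   # revenue per sale with marketing (e.g. discounted)
    cost: float,                 # cost C of marketing to one customer
) -> float:
    # Expected revenue under plan M: r_i = r1 where M_i = 1, r0 otherwise.
    revenue_with = sum((r1 if m else r0) * p for m, p in zip(M, p_with))
    # Expected revenue with no marketing at all.
    revenue_without = r0 * sum(p_without)
    # Subtract |M| * C, the total marketing cost.
    return revenue_with - revenue_without - sum(M) * cost


# Hypothetical numbers: three customers, the first and third are marketed to.
elp = global_lift_in_profit(
    p_with=[0.5, 0.2, 0.4], p_without=[0.3, 0.1, 0.2],
    M=[1, 0, 1], r0=10.0, r1=8.0, cost=1.0,
)
```

Note that the second customer's purchase probability rises from 0.1 to 0.2 even though she is not marketed to; that is the network effect the global ELP is meant to capture.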
Modeling Markets as Social Network
• Our goal:
- to find the assignment of values to $M$ that maximizes ELP
• Problem:
- requires trying all possible combinations of assignments!
• Solution:
- approximate procedures
 - Single pass
 - Greedy search
 - Hill-climbing search
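The three approximate procedures can be sketched as follows. This is a minimal sketch under stated assumptions: `make_toy_market` is a hypothetical stand-in for the real probability model (purchase probability rises by a fixed `boost` when a customer is marketed to and by `spread` for each marketed neighbor), and all function names are illustrative. Only the search logic is meant to mirror the slide: single pass markets to every customer whose individual lift under no marketing is positive, greedy search scans customers repeatedly and keeps any change that raises ELP, and hill climbing adds the single best customer at each step.

```python
from typing import Callable, Dict, List

Plan = List[int]  # M: one 0/1 marketing decision per customer


def flip(M: Plan, i: int, value: int) -> Plan:
    """Return a copy of M with M_i set to value, i.e. f_i^value(M)."""
    out = list(M)
    out[i] = value
    return out


def make_toy_market(base: List[float], neighbors: Dict[int, List[int]],
                    r0: float = 1.0, r1: float = 1.0, cost: float = 0.25,
                    boost: float = 0.2, spread: float = 0.1):
    n = len(base)

    def buy_prob(i: int, M: Plan) -> float:
        # ASSUMPTION: toy stand-in for P(X_i = 1 | X^k, Y, M).
        influence = spread * sum(M[j] for j in neighbors[i])
        return min(1.0, base[i] + boost * M[i] + influence)

    def elp(M: Plan) -> float:
        # Global lift in profit of plan M relative to no marketing.
        with_m = sum((r1 if M[i] else r0) * buy_prob(i, M) for i in range(n))
        without = r0 * sum(buy_prob(i, [0] * n) for i in range(n))
        return with_m - without - cost * sum(M)

    def elp_i(i: int, M: Plan) -> float:
        # Lift from marketing to customer i alone, ignoring her effect on others.
        return r1 * buy_prob(i, flip(M, i, 1)) - r0 * buy_prob(i, flip(M, i, 0)) - cost

    return elp, elp_i


def single_pass(elp_i: Callable[[int, Plan], float], n: int) -> Plan:
    M0 = [0] * n
    return [1 if elp_i(i, M0) > 0 else 0 for i in range(n)]


def greedy_search(elp: Callable[[Plan], float], n: int) -> Plan:
    M = [0] * n
    changed = True
    while changed:  # keep scanning until a full pass makes no change
        changed = False
        for i in range(n):
            if M[i] == 0 and elp(flip(M, i, 1)) > elp(M):
                M[i] = 1
                changed = True
    return M


def hill_climbing(elp: Callable[[Plan], float], n: int) -> Plan:
    M = [0] * n
    while True:
        candidates = [(elp(flip(M, i, 1)), i) for i in range(n) if M[i] == 0]
        if not candidates:
            return M
        best_value, best_i = max(candidates)
        if best_value <= elp(M):  # no single addition improves ELP
            return M
        M[best_i] = 1


# Customer 0 influences customers 1 and 2; nobody influences customer 0.
elp, elp_i = make_toy_market([0.1, 0.1, 0.1], {0: [], 1: [0], 2: [0]})
plans = (single_pass(elp_i, 3), greedy_search(elp, 3), hill_climbing(elp, 3))
```

In this toy market no customer is worth marketing to on her own, so the single pass selects nobody, while the two searches that evaluate the global ELP pick customer 0 because of her influence on the others.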
Modeling Markets as Social Network
• There may be another problem.
• How do we compute $P(X_i \mid X^k, Y, M)$?
$P(X_i \mid X^k, Y, M)$
$= \sum_{C(N_i^u)} P(X_i, N_i^u \mid X^k, Y, M)$
$= \sum_{C(N_i^u)} P(X_i \mid N_i^u, X^k, Y, M)\, P(N_i^u \mid X^k, Y, M)$
$= \sum_{C(N_i^u)} P(X_i \mid N_i, Y, M) \prod_{X_j \in N_i^u} P(X_j \mid X^k, Y, M)$
where $C(N_i^u)$ is the set of all possible configurations of the unknown neighbors of $X_i$
• L. Pelkowitz (1990), A continuous relaxation labeling algorithm for Markov random fields
• $P(N_i^u \mid X^k, Y, M)$ can be approximated by its maximum entropy estimate given the marginals $P(X_j \mid X^k, Y, M)$, for $X_j \in N_i^u$
Modeling Markets as Social Network
• The $P(X_i \mid X^k, Y, M)$ are expressed as a function of themselves
• They can therefore be found iteratively
• Relaxation labeling:
- guaranteed to converge to locally consistent values as long as the initial assignment is sufficiently close to them.
• Initialization: the network-less probabilities $P(X_i \mid Y, M)$
$P(X_i \mid Y, M) = P(X_i)\, P(M_i \mid X_i) \prod_{k=1}^{m} P(Y_k \mid X_i) / P(Y, M_i)$
• Problem:
- the number of terms in the sum is exponential in the number of unknown neighbors $|N_i^u|$
• Solution:
- Gibbs sampling / k-shortest-path algorithm
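The iterative scheme can be sketched as below. This is an illustrative sketch only: it assumes every customer is unknown, updates all customers synchronously, enumerates every neighbor configuration exactly (so it is exponential in the neighborhood size), and uses a made-up `toy_cond_prob` in place of the real $P(X_i \mid N_i, Y, M)$. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def relaxation_labeling(
    neighbors: Dict[int, List[int]],
    cond_prob: Callable[[int, Tuple[int, ...]], float],  # P(X_i = 1 | neighbor config)
    init: Dict[int, float],                              # network-less starting estimates
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Dict[int, float]:
    p = dict(init)
    for _ in range(max_iter):
        new_p = {}
        for i, nbrs in neighbors.items():
            total = 0.0
            # Sum over all configurations of the (unknown) neighbors of i.
            for config in product((0, 1), repeat=len(nbrs)):
                # Probability of this configuration as a product of current marginals.
                weight = 1.0
                for j, x_j in zip(nbrs, config):
                    weight *= p[j] if x_j else 1.0 - p[j]
                total += cond_prob(i, config) * weight
            new_p[i] = total
        delta = max(abs(new_p[i] - p[i]) for i in new_p)
        p = new_p
        if delta < tol:  # estimates have stopped changing
            break
    return p


def toy_cond_prob(i: int, config: Tuple[int, ...]) -> float:
    # ASSUMPTION: purchase probability grows with the fraction of buying neighbors.
    if not config:
        return 0.1
    return 0.1 + 0.3 * sum(config) / len(config)


# Two customers who influence each other, both starting from 0.1.
marginals = relaxation_labeling({0: [1], 1: [0]}, toy_cond_prob, {0: 0.1, 1: 0.1})
```

For this two-customer example the fixed point satisfies $p = 0.1 + 0.3p$, so both marginals settle near $1/7$.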
Modeling Markets as Social Network
• Recall:
$P(X_i \mid X^k, Y, M) = \sum_{C(N_i^u)} P(X_i \mid N_i, Y, M) \prod_{X_j \in N_i^u} P(X_j \mid X^k, Y, M)$
• $P(X_i \mid N_i, Y, M)$ is still unknown!
• From naïve Bayes:
$P(X_i \mid N_i, Y, M) = \frac{P(X_i \mid N_i)\, P(M_i \mid X_i)}{P(Y, M_i \mid N_i)} \prod_{k=1}^{m} P(Y_k \mid X_i)$
where $P(Y, M_i \mid N_i) = P(Y, M_i \mid X_i = 1)\, P(X_i = 1 \mid N_i) + P(Y, M_i \mid X_i = 0)\, P(X_i = 0 \mid N_i)$
• Now $P(X_i \mid N_i, Y, M)$ can be computed from:
$P(X_i \mid N_i)$, $P(X_i)$, $P(M_i \mid X_i)$, $P(Y_k \mid X_i)$
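A minimal sketch of the naïve Bayes combination follows. The function and argument names are assumptions made for illustration; each likelihood is passed as a pair (value given $X_i = 0$, value given $X_i = 1$) for the observed marketing action and attribute values.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (probability given X_i = 0, probability given X_i = 1)


def purchase_prob(p_buy_given_neighbors: float,
                  p_m_given_x: Pair,
                  p_y_given_x: List[Pair]) -> float:
    """Return P(X_i = 1 | N_i, Y, M) under the naive Bayes assumption."""
    # P(Y, M_i | X_i) = P(M_i | X_i) * prod_k P(Y_k | X_i), for X_i = 0 and X_i = 1.
    like_0, like_1 = p_m_given_x
    for p0, p1 in p_y_given_x:
        like_0 *= p0
        like_1 *= p1
    numerator = like_1 * p_buy_given_neighbors
    # P(Y, M_i | N_i): mix the two cases X_i = 1 and X_i = 0.
    evidence = numerator + like_0 * (1.0 - p_buy_given_neighbors)
    return numerator / evidence


# Hypothetical inputs: P(X_i = 1 | N_i) = 0.2, one product attribute.
prob = purchase_prob(0.2, (0.5, 0.5), [(0.1, 0.4)])
```

With these numbers the attribute is four times as likely among buyers, which lifts the probability from 0.2 to about 0.5.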
Mining the network from Collaborative Filtering
Databases
• $P(X_i \mid N_i)$: varies from application to application
• Collaborative Filtering System:
 - Users rate a set of items (e.g. amazon.com)
 - These ratings are then used to recommend other items the user might be interested in
• But…how?
• The basic idea (given by GroupLens):
 - Predict a user’s rating of an item as a weighted average of the ratings given by similar users
 - Then recommend items with high predicted ratings
Mining the network from Collaborative
Filtering Databases
• The Pearson correlation coefficient:
$W_{ij} = \frac{\sum_k (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_k (R_{ik} - \bar{R}_i)^2 \sum_k (R_{jk} - \bar{R}_j)^2}}$
where $R_{ik}$ is user $i$’s rating of item $k$, $\bar{R}_i$ is the mean of user $i$’s ratings, likewise for $j$; and the summations and means are computed over the items $k$ that both $i$ and $j$ have rated.
• Given an item $k$ that user $i$ has not rated, the rating of $k$ for the user is then predicted as:
$\hat{R}_{ik} = \bar{R}_i + \rho \sum_{X_j \in N_i} W_{ji} (R_{jk} - \bar{R}_j)$
where $\rho = 1 / \sum_{X_j \in N_i} |W_{ji}|$ is a normalization factor, and $N_i$ is the set of the $n_i$ users most similar to $i$ according to the Pearson correlation coefficient.
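The two formulas above can be sketched as follows. This is an illustrative sketch: ratings are held in a plain dictionary, neighbors are restricted to users who have rated the target item, and the prediction step uses each user's overall mean rating. Those choices, the function names, and the sample ratings are assumptions rather than details taken from the slides.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(ratings: Ratings, i: str, j: str) -> float:
    """W_ij computed over the items both users have rated."""
    common = set(ratings[i]) & set(ratings[j])
    if len(common) < 2:
        return 0.0
    mean_i = sum(ratings[i][k] for k in common) / len(common)
    mean_j = sum(ratings[j][k] for k in common) / len(common)
    num = sum((ratings[i][k] - mean_i) * (ratings[j][k] - mean_j) for k in common)
    den = sqrt(sum((ratings[i][k] - mean_i) ** 2 for k in common)
               * sum((ratings[j][k] - mean_j) ** 2 for k in common))
    return num / den if den else 0.0


def predict_rating(ratings: Ratings, i: str, k: str, n_neighbors: int = 2) -> float:
    """Predicted rating of item k for user i from the most similar users."""
    def mean(u: str) -> float:
        return sum(ratings[u].values()) / len(ratings[u])

    # Candidate neighbors: other users who have rated item k (assumption).
    weights = {j: pearson(ratings, i, j)
               for j in ratings if j != i and k in ratings[j]}
    nearest = sorted(weights, key=weights.get, reverse=True)[:n_neighbors]
    norm = sum(abs(weights[j]) for j in nearest)
    if norm == 0:
        return mean(i)  # no usable neighbors: fall back to the user's own mean
    offset = sum(weights[j] * (ratings[j][k] - mean(j)) for j in nearest)
    return mean(i) + offset / norm


# Hypothetical ratings for three users and four movies.
sample = {
    "a": {"m1": 5, "m2": 3, "m3": 4},
    "b": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "c": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
w_ab = pearson(sample, "a", "b")
r_hat = predict_rating(sample, "a", "m4", n_neighbors=1)
```

Users "a" and "b" rate the shared movies in the same pattern, so their correlation is 1 and "b" alone drives the prediction for "m4" when a single neighbor is used.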
Mining the network from Collaborative
Filtering Databases
• Thus we can compute $P(X_i \mid N_i)$:
$P(X_i \mid N_i) = P(X_i \mid \hat{R}_i(N_i))$
 - Piecewise-linear model
 - Obtained by dividing $\hat{R}_i$’s range into bins
 - Compute the mean $\hat{R}_i$ and $P(X_i \mid \hat{R}_i)$ for each bin
 - Estimate $P(X_i \mid \hat{R}_i(N_i))$ by interpolating linearly between the two nearest means
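The binning and interpolation steps can be sketched as below. The bin edges, the clamping outside the range of the bin means, and all names are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (mean predicted rating in a bin, purchase rate in that bin)


def fit_piecewise(r_hat: Sequence[float], bought: Sequence[int],
                  edges: Sequence[float]) -> List[Point]:
    """Bin the predicted ratings and return one (mean rating, purchase rate) per bin."""
    n_bins = len(edges) - 1
    stats = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of ratings, buyers, count]
    for r, x in zip(r_hat, bought):
        b = min(max(bisect_right(edges, r) - 1, 0), n_bins - 1)
        stats[b][0] += r
        stats[b][1] += x
        stats[b][2] += 1
    # Empty bins are skipped; the result is ordered by rating.
    return [(s / c, buyers / c) for s, buyers, c in stats if c > 0]


def purchase_prob_from_rating(points: List[Point], r: float) -> float:
    """Interpolate linearly between the two nearest bin means."""
    if r <= points[0][0]:
        return points[0][1]  # clamp below the first mean (assumption)
    for (r_lo, p_lo), (r_hi, p_hi) in zip(points, points[1:]):
        if r_lo <= r <= r_hi:
            t = (r - r_lo) / (r_hi - r_lo)
            return p_lo + t * (p_hi - p_lo)
    return points[-1][1]  # clamp above the last mean (assumption)


# Hypothetical data: low predicted ratings never buy, high ones always do.
points = fit_piecewise([1.0, 2.0, 4.0, 5.0], [0, 0, 1, 1], edges=[0, 3, 6])
mid = purchase_prob_from_rating(points, 3.0)
```

Here the two bins have means 1.5 and 4.5, so a predicted rating of 3.0 falls exactly halfway between them.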
• Finally, the model requires:
$P(X_i \mid \hat{R}_i)$, $P(X_i)$, $P(M_i \mid X_i)$, $P(Y_k \mid X_i)$, $P(R_i \mid Y)$
Example: the ‘EachMovie’
collaborative filtering database
• ‘EachMovie’---word of mouth
---Rating
---Movie Information
• The Data
• Model Accuracy
• Network Value
• Marketing Experiments
The Model
• Y={Y1,Y2,…,Y10}
p(Y|Xi)
• Pearson correlation coefficient for $W_{ij}$ (with penalized value 0.05)
• $P(X_i \mid \hat{R}_i)$, $P(X_i)$, $P(M_i \mid X_i)$, $P(Y_k \mid X_i)$, $P(R_i \mid Y)$
• $P(X_i = 1 \mid M_i = 1) = \min\{\alpha\, P(X_i = 1 \mid M_i = 0),\, 1\}$
Frame of the model
Empirical distribution
The Data
• Training set: all movies before Sep 1, 1996
---$S_{old}$: before Jan 1996
---$S_{recent}$: Jan-Sep 1996
• Test set: movies Sep-Dec 1996
• Inactive people
Model Accuracy
• Set $M = M_0$
• Estimate $P(X_i \mid X^k, Y, M)$
• No ratings from inactive people: $P(X_i \mid Y) = 0$
• Correlation between $P(X_i \mid X^k, Y, M)$ and the actual $X_i$
• Not really satisfactory, as the genre is the only input
Network Value
Weight ranking function
A good customer to market to
• Likely to give high rating
• Strong weight to influence
• Has many neighbors who are easily influenced
• High probability of purchasing
Marketing Experiments
• Traditional direct marketing
• Network-based marketing
---single pass
---greedy search
---hill climbing
• Scenarios: Free Movie, Discounted Movie, Advertising
Profits and runtimes obtained using
different marketing strategies
Related Work
• Regarding the Network
---Email logs (Schwartz and Wood)
---ReferralWeb
---MRF classification of Web pages (Chak)
• Regarding the Marketing
---impact on the customers’ closest friends (Krackhardt)
Future Work
• Expect larger network to be mined
• Mining a network from multiple sources of
relevant information
• Mining the unknown networks
• Towards more detailed node models and
multiple types of relations between nodes
Conclusion
• Data mining in viral marketing
• Customers as nodes and impact on each other
• Social network from collaborative filtering database
• Optimize marketing decision