Factorization Machine

I’m Jerry
• Factorization Methods
• Support Vector Machine

[Figure: rating matrix built from user features and item features, with ratings as the entries]
• Training data: D = {(xᵢ, yᵢ) | xᵢ ∈ ℝᵖ, yᵢ ∈ {-1, 1}}, i = 1…n
• Separating hyperplane: y(x) = w·x + b = 0
• For all yᵢ = 1: y(xᵢ) = w·xᵢ + b ≥ 1
• For all yᵢ = -1: y(xᵢ) = w·xᵢ + b ≤ -1
• Minimize ‖w‖
Recommender Group: Y U NO USE SVM?
• Real-valued ratings vs. classification
• Sparsity: with one-hot user/item features, a linear SVM degenerates to
  y(x) = w·x + b = wᵤ + wᵢ + b, with no user-item interaction term.
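As a quick sanity check, here is a toy sketch (all names and numbers are my own) of why one-hot user/item features make a linear model collapse to per-user and per-item biases:

```python
import numpy as np

# Hypothetical toy setup: 3 users, 2 items, one-hot encoded side by side.
n_users, n_items = 3, 2

def one_hot_pair(u, i):
    """Concatenate one-hot user and item indicators into a single x."""
    x = np.zeros(n_users + n_items)
    x[u] = 1.0
    x[n_users + i] = 1.0
    return x

# A linear model w.x + b then reduces to w_u + w_i + b:
w = np.array([0.5, -0.2, 0.1, 0.3, -0.4])  # per-user then per-item weights
b = 0.1
x = one_hot_pair(u=1, i=0)
assert np.isclose(w @ x + b, w[1] + w[n_users + 0] + b)
```

Only two coordinates of x are nonzero, so the dot product picks out exactly one user weight and one item weight.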
On Ensemble
• Several models (Model 1, Model 2, Model 3) each map (user, item) to a prediction; the final prediction is a weighted combination of them.
• Stacking: collect each model's predictions on the train set, fit an SVM on (predictions on train set → train-set answers) to learn the model weights, then apply those weights to the models' predictions on the test set to get the final prediction.
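The stacking scheme above can be sketched as follows (all numbers are made up; a least-squares blender stands in for the SVM that learns the model weights):

```python
import numpy as np

# Three base models' predictions on the train set (rows: examples,
# columns: Model 1..3) and the train-set answers.
train_preds = np.array([
    [3.0, 2.8, 3.2],
    [4.1, 4.0, 3.9],
    [1.9, 2.2, 2.0],
    [4.8, 5.0, 4.9],
])
train_answer = np.array([3.0, 4.0, 2.0, 5.0])

# Learn blending weights on the train set (the slides use an SVM here;
# least squares plays the same role in this sketch).
weights, *_ = np.linalg.lstsq(train_preds, train_answer, rcond=None)

# Reuse the learned weights on the models' test-set predictions.
test_preds = np.array([[2.5, 2.4, 2.6]])
final_prediction = test_preds @ weights
```

The key point is that the weights are fit once against train-set answers and then frozen before touching the test set.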
• Original SVM (linear):
  y(x) = w·x + b = b + Σᵢ wᵢxᵢ
• Factorization Machine:
  y(x) = b + Σᵢ wᵢxᵢ + Σᵢ Σⱼ₌ᵢ₊₁ (vᵢ·vⱼ) xᵢxⱼ
  (the extra term models the interaction between variables)
• The pairwise weight matrix W looks like a CF matrix: under sparsity, most entries wᵢⱼ are never observed together ("?").
• Factorize it as W = VᵀV with factor dimension k, so each wᵢⱼ = vᵢ·vⱼ.
• Factorization Machine:
  y(x) = b + Σᵢ wᵢxᵢ + Σᵢ Σⱼ₌ᵢ₊₁ (vᵢ·vⱼ) xᵢxⱼ
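A minimal sketch of the FM prediction (function and variable names are my own), using the standard rewriting of the pairwise term that brings the cost from O(kn²) down to O(kn):

```python
import numpy as np

def fm_predict(x, b, w, V):
    """Factorization Machine prediction.

    x : (n,) feature vector; V : (k, n) factor matrix, so the pairwise
    weight is w_ij = v_i . v_j (column i dot column j of V).
    Uses the identity
      sum_{i<j} (v_i.v_j) x_i x_j
        = 0.5 * sum_f [ (sum_i V[f,i] x_i)^2 - sum_i V[f,i]^2 x_i^2 ],
    which costs O(k n) instead of O(k n^2).
    """
    s = V @ x                                  # (k,) per-factor sums
    pairwise = 0.5 * np.sum(s**2 - (V**2) @ (x**2))
    return b + w @ x + pairwise

# Check against the naive double sum on random data.
rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
b, w, V = 0.1, rng.normal(size=n), rng.normal(size=(k, n))
naive = b + w @ x + sum(
    (V[:, i] @ V[:, j]) * x[i] * x[j]
    for i in range(n) for j in range(i + 1, n)
)
assert np.isclose(fm_predict(x, b, w, V), naive)
```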
• SVM fails with sparsity: each interaction weight wᵢⱼ can only be estimated from examples where xᵢ and xⱼ co-occur.
• FM learns with SGD (directly in the primal); SVM is trained via the dual.
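One SGD step for FM under squared loss can be sketched like this (my own naming; the factor gradient follows from the model's multilinearity, ∂ŷ/∂v_{f,i} = xᵢ(s_f − v_{f,i}xᵢ)):

```python
import numpy as np

def fm_predict(x, b, w, V):
    """FM prediction via the O(kn) identity for the pairwise term."""
    s = V @ x
    return b + w @ x + 0.5 * np.sum(s**2 - (V**2) @ (x**2))

def sgd_step(x, y, b, w, V, lr=0.01):
    """One SGD step on squared error for a single example."""
    err = fm_predict(x, b, w, V) - y
    s = V @ x
    b -= lr * err                                  # d yhat / d b   = 1
    w -= lr * err * x                              # d yhat / d w_i = x_i
    V -= lr * err * (np.outer(s, x) - V * (x**2))  # x_i (s_f - v_fi x_i)
    return b, w, V

# Repeatedly fitting a single example should drive its error down.
rng = np.random.default_rng(1)
n, k = 5, 2
x, y = rng.normal(size=n), 3.0
b, w, V = 0.0, np.zeros(n), 0.1 * rng.normal(size=(k, n))
before = (fm_predict(x, b, w, V) - y) ** 2
for _ in range(200):
    b, w, V = sgd_step(x, y, b, w, V)
after = (fm_predict(x, b, w, V) - y) ** 2
assert after < before
```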
Polynomial kernel SVM (degree 2):

y(x) = b + √2 Σᵢ₌₁ⁿ wᵢxᵢ + Σᵢ₌₁ⁿ wᵢ,ᵢ⁽²⁾xᵢ² + √2 Σᵢ₌₁ⁿ Σⱼ₌ᵢ₊₁ⁿ wᵢ,ⱼ⁽²⁾xᵢxⱼ

Compare to FM: here the wᵢ,ⱼ are all independent of each other, whereas FM ties them together through wᵢ,ⱼ = vᵢ·vⱼ.
• MF:
  y(x) = b + wᵤ + wᵢ + vᵤ·vᵢ
• SVD++:
  y(x) = b + wᵤ + wᵢ + vᵤ·vᵢ + (1/√|Nᵤ|) Σₗ vᵢ·vₗ
• Claim: FM is more general; with suitable feature encodings it subsumes MF, SVD++, and similar models.
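The MF special case above can be checked directly: with one-hot user/item features, the FM pairwise term has exactly one surviving product xᵤxᵢ, so FM reduces to b + wᵤ + wᵢ + vᵤ·vᵢ (a sketch with my own names):

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k = 4, 3, 2
n = n_users + n_items
b, w, V = 0.5, rng.normal(size=n), rng.normal(size=(k, n))

def fm_predict(x, b, w, V):
    """FM prediction via the O(kn) identity for the pairwise term."""
    s = V @ x
    return b + w @ x + 0.5 * np.sum(s**2 - (V**2) @ (x**2))

# One-hot encode a (user, item) pair side by side.
u, i = 2, 1
x = np.zeros(n)
x[u] = 1.0
x[n_users + i] = 1.0

# MF prediction with the same parameters: bias + user/item biases + dot.
mf = b + w[u] + w[n_users + i] + V[:, u] @ V[:, n_users + i]
assert np.isclose(fm_predict(x, b, w, V), mf)
```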