
Supervised Class-Ratio Estimation
Joint work by: Arun Iyer, J. Saketha Nath, Sunita Sarawagi
Motivation & Definition
Motivating Example
Sample comments on a YouTube lecture video:

k roninson (10 months ago, edited): "This man is brilliant! Every American teacher could learn from this man!!!"

Ramapriya D (1 year ago): "Sorry but the first 9-odd minutes are utter nonsense"

Pallavi Eshwaran (8 months ago): "Sir, I have a doubt. In the last part (dual supply voltage measurement demonstration).. what will the multimeter show if we connect the positive and negative terminals of the multimeter to the corresponding +ve and -ve terminals of the dual supply of RPS"
Motivating Example
The quantity of interest is the ratio of comment types:
% +ve comments: 80%
% neutral: 5%
% -ve comments: 15%

• Each comment need NOT be labelled
• From an ML perspective:
  • Direct estimation is a simpler problem
  • It leads to better confidence in the estimates
Class-Ratio (CR) Estimation
Definition: Given an unlabeled set of objects sampled from an unknown distribution, the task is to estimate the true probabilities of the labels (viz., the class-ratios).

Supervised Class-Ratio (CR) Estimation
Definition: Given appropriate training data, construct an estimator 𝑔 that takes any set of unlabeled objects and outputs an estimate of the corresponding class-ratios.
(The fraction of objects carrying each label would be a good estimate, but it is unknown, since the objects are unlabeled.)
Baseline & Beyond
Baseline ML Set-up
[Flowchart] Training phase: labelled training data (+/−) → training algorithm → comment classifier (the model). Inference phase: the classifier labels each comment 𝑧₁, 𝑧₂, …, 𝑧𝑚 in the unlabeled set one at a time, and the predicted labels are aggregated into the CR estimate.
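A minimal sketch of this classify-and-count baseline (assuming scikit-learn; the function and variable names are illustrative, and labels are assumed to be 0, …, n_classes−1):

```python
# Classify-and-count baseline: train a per-comment classifier, label every
# object in the unlabeled set, and report label frequencies as the CR estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

def baseline_cr(X_train, y_train, X_unlabeled, n_classes):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    preds = clf.predict(X_unlabeled)                  # z_1, ..., z_m labelled one by one
    counts = np.bincount(preds, minlength=n_classes)  # aggregate the predictions
    return counts / counts.sum()                      # estimated class-ratios
```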
Direct Estimator (Strong Supervision)
[Flowchart] Training phase: labelled training data (+/−) → training algorithm → CR estimator (the model). Inference phase: the CR estimator consumes the whole unlabeled set 𝑧₁, …, 𝑧𝑚 at once and outputs the CR directly.
Strong → Regular Supervision
[Flowchart] Drop the instance-level labels: each training set now carries only its label pmf (its class-ratios, %). Training phase: training sets with known pmfs → training algorithm → CR estimator (the model). Inference phase: the estimator maps the unlabeled set 𝑧₁, …, 𝑧𝑚 to its pmf.
E.g., demography-based voter analysis, where aggregate vote shares are known but individual votes are not.
Key observations …
Learning requires training and test data to be related.
• Traditional ML algorithms insist that 𝑇𝑟 and 𝑇𝑒 come from the same distribution
• This renders CR estimation meaningless: if the test distribution matches the training one, the training class-ratios are already the answer
• The baseline predicts well ONLY with i.i.d. data
• So we must invent a better `relatedness' assumption
Our assumption … "Target-Shift"
The distribution of all possible comments and labels factorizes as
$$f_{XY} = f_{X|Y}\, f_Y$$
Training sets come from different videos, e.g. an interesting one 𝑇₁ (label marginal $f_Y^1$), a boring one 𝑇₂ ($f_Y^2$), …, a decent one 𝑇𝑚 ($f_Y^m$). For the 𝑖-th set,
$$f_{XY}^{i} = f_{X|Y}\, f_Y^{i}$$
• Assume $f_{X|Y}$ is the same across all sets
• But $f_Y$ changes from set to set
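A tiny synthetic illustration of the target-shift assumption (the Gaussian class-conditionals and the specific label marginals below are made up for illustration): $f_{X|Y}$ stays fixed while only $f_Y$ changes across sets.

```python
# Target-shift simulation: a shared f_{X|Y} (one Gaussian per label),
# with a different label marginal f_Y for each training set.
import numpy as np

rng = np.random.default_rng(0)
MEANS = np.array([-2.0, 0.0, 2.0])  # f_{X|Y=y} is N(MEANS[y], 1), fixed across sets

def sample_set(f_y, n):
    """Draw n (x, y) pairs: y from this set's f_Y, x from the shared f_{X|Y}."""
    y = rng.choice(len(f_y), size=n, p=f_y)
    x = rng.normal(MEANS[y], 1.0)
    return x, y

x1, y1 = sample_set([0.80, 0.05, 0.15], 1000)  # e.g. an "interesting" video
x2, y2 = sample_set([0.10, 0.30, 0.60], 1000)  # e.g. a "boring" video
```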
Proposed CR Estimator
Key Idea
[Figure: the probability simplex over the 𝑐 labels, with vertices (1,0,0), (0,1,0), (0,0,1); the training class-ratios $f_Y^1, f_Y^2, f_Y^3$ and the unknown class-ratio $f_Y^U$ of the unlabeled set are points in this simplex.]
The affine hull of the training-set CRs contains the simplex. Each label marginal induces an observable feature marginal:
$$f_X^i(x) = \sum_{y=1}^{c} f_Y^i(y)\, f_{X|Y}(x \mid y), \qquad f_X^U(x) = \sum_{y=1}^{c} f_Y^U(y)\, f_{X|Y}(x \mid y)$$
Under mild conditions², this map from class-ratios to feature marginals is affine and one-to-one. So writing the observable $f_X^U$ as an affine combination of the observable $f_X^1, \ldots, f_X^m$ and applying the same combination to the known $f_Y^1, \ldots, f_Y^m$ recovers the unknown $f_Y^U$.
² Details in Iyer et al., KDD, 2016.
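A toy sanity check (illustrative numbers, not from the slides): with two labels and training CRs $f_Y^1 = (0.9, 0.1)$ and $f_Y^2 = (0.1, 0.9)$, if the unlabeled marginal decomposes as $f_X^U = 0.5\, f_X^1 + 0.5\, f_X^2$, then linearity in $f_Y$ gives $f_Y^U = 0.5\, f_Y^1 + 0.5\, f_Y^2 = (0.5, 0.5)$.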
Formulation for CR Estimation
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \mathbb{R}^{m}} \; \Big\| f_X^U - \sum_{i=1}^{m} \theta_i\, f_X^i \Big\|^2, \qquad \hat{f}_Y^U = \sum_{i=1}^{m} \hat{\theta}_i\, f_Y^i$$
But the feature marginals are unknown densities of which we only see samples; we need a notion of distance computable from samples.
Kernel Embedding
• Given 𝜒, a set of objects, and a kernel 𝑘: 𝜒 × 𝜒 ↦ ℝ (a similarity function), the kernel embedding
$$\phi_k(X)(\cdot) \equiv \mathbb{E}\left[ k(X, \cdot) \right]$$
maps 𝑋, from the space of random variables over 𝜒, into the RKHS of real-valued functions over 𝜒.
• E.g., $k(x, s) \equiv e^{x^\top s}$ gives $\phi(X)(s) = \mathbb{E}\left[ e^{s^\top X} \right]$, the moment-generating function.

Kernel Embedding – Sample approx.
• Given a sample $x_1, \ldots, x_m$ of 𝑋, the embedding is approximated by the empirical average
$$\phi_k(X)(\cdot) \approx \frac{1}{m} \sum_{i=1}^{m} k(x_i, \cdot)$$
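A short sketch of the empirical embedding and the induced RKHS distance (the RBF kernel and its bandwidth are illustrative choices, not prescribed by the slides):

```python
# Empirical kernel mean embedding, and the squared RKHS distance (MMD^2)
# between two embeddings; both are computable from samples alone.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def embedding_at(sample, queries, gamma=1.0):
    """Evaluate phi_k(X)(.) ~ (1/m) sum_i k(x_i, .) at the given query points."""
    return rbf_kernel(queries, sample, gamma).mean(axis=1)

def mmd2(X, Z, gamma=1.0):
    """|| phi_k(X) - phi_k(Z) ||^2 in the RKHS, expanded via the kernel trick."""
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Z, gamma).mean()
            + rbf_kernel(Z, Z, gamma).mean())
```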
Formulation for CR Estimation
With the embeddings, the objective becomes computable from samples, the norm now being taken in the RKHS:
$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \mathbb{R}^{m}} \; \Big\| f_X^U - \sum_{i=1}^{m} \theta_i\, f_X^i \Big\|^2_{\mathcal{H}_k}, \qquad \hat{f}_Y^U = \sum_{i=1}^{m} \hat{\theta}_i\, f_Y^i$$
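A compact sketch of the resulting estimator, reusing rbf_kernel from the embedding sketch above (the final clip-and-renormalize step is a simplification, not necessarily the constraint handling used in the paper):

```python
# Kernel CR estimator: write the unlabeled set's mean embedding as a
# combination of the training sets' mean embeddings (a least-squares
# problem whose coefficients are Gram-matrix averages), then apply the
# same combination to the known training class-ratios.
import numpy as np

def estimate_cr(train_sets, train_ratios, X_u, gamma=1.0, ridge=1e-6):
    m = len(train_sets)
    G = np.empty((m, m))  # G[i, j] = <phi(f_X^i), phi(f_X^j)> in the RKHS
    b = np.empty(m)       # b[i]    = <phi(f_X^i), phi(f_X^U)>
    for i, Xi in enumerate(train_sets):
        b[i] = rbf_kernel(Xi, X_u, gamma).mean()
        for j, Xj in enumerate(train_sets):
            G[i, j] = rbf_kernel(Xi, Xj, gamma).mean()
    # Minimize theta' G theta - 2 b' theta (a small ridge added for stability).
    theta = np.linalg.solve(G + ridge * np.eye(m), b)
    est = theta @ np.asarray(train_ratios)  # combine the known f_Y^i
    est = np.clip(est, 0.0, None)
    return est / est.sum()                  # renormalize onto the simplex
```

Here each RKHS inner product between empirical embeddings reduces to the mean of pairwise kernel values between the two samples, which is why only Gram-matrix averages appear. With 1-D features such as those in the target-shift sketch, reshape to column vectors (e.g. x1.reshape(-1, 1)) before calling.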
Experiments (plots omitted): the proposed method adapts to shifting class-ratios, and compares well even in the baselines' best case.
Learning Bounds
Theorem: If 𝑘 is a normalized characteristic kernel, then with probability at least 1 − 𝛿, the following holds:
$$\|\hat{\rho} - \rho\|_2 \;\le\; \frac{2}{\mathrm{minsig}(A)} \left( (1 + Q) \sum_{i=1}^{m} \frac{C_\delta}{\sqrt{n_i}} \;+\; Q\, \frac{C_\delta}{\sqrt{n_u}} \right)$$
where $n_i$ is the size of the $i$-th training set, $n_u$ is the size of the unlabeled set, and $\mathrm{minsig}(A)$ is the smallest singular value of $A$.
Summary & Conclusions
Take-home notes
• Watch out for direct estimation opportunities
  • New problem set-up: class-ratio estimation
  • More accurate estimates and simpler analysis
• Be aware of distributional assumptions
  • The right one leads to significant improvements in accuracy
• Insights provided by theoretical bounds can be far-reaching
  • New kernel selection algorithm
  • Hints for data publishers
Thanks