Supervised Class-Ratio Estimation
Joint work by: Arun Iyer, J. Saketha Nath, Sunita Sarawagi

Motivation & Definition

Motivating Example
Comments on a YouTube lecture video:
• k roninson (10 months ago, edited): "This man is brilliant! Every American teacher could learn from this man!!!"
• Ramapriya D (1 year ago): "Sorry, but the first 9-odd minutes are utter nonsense."
• Pallavi Eshwaran (8 months ago): "Sir, I have a doubt. In the last part (dual-supply voltage measurement demonstration), what will the multimeter show if we connect the positive and negative terminals of the multimeter to the corresponding +ve and -ve terminals of the dual supply of the RPS?"

Often all we want is the aggregate sentiment:
• % +ve comments: 80%
• % neutral: 5%
• % -ve comments: 15%

• Each comment need NOT be labelled.
• From an ML perspective:
  • Direct estimation is the simpler problem.
  • It leads to better confidence in the estimates.

Class-Ratio (CR) Estimation
Definition: Given an unlabeled set of objects sampled from an unknown distribution, the task is to estimate the true probabilities of the labels (viz., the class-ratios).
Supervised Class-Ratio (CR) Estimation
Definition: Given appropriate training data, construct an estimator g that takes any set of unlabeled objects and outputs an estimate of the corresponding class-ratios.
The fraction of objects carrying each label would be a good estimate, but the labels are unknown.

Baseline & Beyond

Baseline ML Set-up
Training phase: labelled training data (+/-) is fed to a training algorithm, which outputs a model: a comment classifier. Inference phase: the classifier labels every object in each unlabeled bag z_1, ..., z_m, and the CR is read off as the fraction of each predicted label ("classify and count").

Direct Estimator (Strong Supervision)
Same labelled training data, but the training algorithm now outputs a CR estimator as its model; at inference it maps each unlabeled bag z_1, ..., z_m directly to its class-ratios.

Strong -> Regular Supervision
Under regular supervision the training data itself carries only aggregate annotations: each training bag is labelled with its pmf (its class-ratios) rather than with per-object +/- labels. E.g.:
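The classify-and-count baseline can be sketched in a few lines. This is our own illustration, not the talk's actual setup: synthetic 2-D Gaussian data and a simple nearest-mean classifier stand in for real comments and a real comment classifier.

```python
# "Classify and count" baseline: train a classifier on labelled data,
# predict a label for every object in the unlabeled bag, and report
# the fraction of each predicted label as the class-ratio estimate.
import numpy as np

rng = np.random.default_rng(0)

# Labelled training data: two classes centred at -1 and +1, 50/50 mix.
X_tr = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y_tr = np.array([0] * 100 + [1] * 100)

# Unlabelled test bag drawn with a *shifted* class ratio (20% / 80%).
X_te = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(+1, 1, (160, 2))])

# Training phase: the "model" here is just one mean per class.
means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])

# Inference phase: classify every object, then count predicted labels.
dists = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
pred = dists.argmin(axis=1)
ratios = np.bincount(pred, minlength=2) / len(pred)
print(ratios)  # baseline CR estimate; biased toward 50/50 under shift
```

Because the classifier makes symmetric errors, the counted fractions drift toward the training ratio whenever the test ratio shifts, which is exactly the weakness the talk turns to next.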
Demography-Based Voter Analysis
Training bags are annotated with vote percentages (%), not with individual votes.

Key Observations
• Learning requires the training and test data to be related.
• Traditional ML algorithms insist that Tr and Te come from the same distribution; this renders CR estimation meaningless, since the test ratios would simply equal the training ratios.
• The baseline predicts well ONLY with i.i.d. data.
• So we must invent a better "relatedness" assumption.

Our Assumption: "Target Shift"
Factor the distribution of all possible comments and labels as f_XY = f_{X|Y} f_Y. Across bags T_1 (f_Y^1), T_2 (f_Y^2), ..., T_m (f_Y^m) — interesting, boring, ..., decent — we have f^i_XY = f_{X|Y} f_Y^i:
• Assume f_{X|Y} is the same for all bags,
• but f_Y changes from bag to bag.

Proposed CR Estimator

Key Idea²
[Figure: the simplex of label distributions with vertices (1,0,0), (0,1,0), (0,0,1); the training CRs f_Y^1, f_Y^2, f_Y^3 and the unknown test CR f_Y^U are points inside it, and the affine hull of the training-set CRs contains the simplex. A corresponding simplex of marginals has vertices f_{X|Y=1}, f_{X|Y=2}, f_{X|Y=3} and points f_X^1, f_X^2, f_X^3; the test marginal f_X^U is the point to be matched.]
Each bag's marginal mixes the shared class-conditionals with that bag's class-ratios:
f_X^i(x) = Σ_{y=1}^{c} f_Y^i(y) f_{X|Y}(x|y)

² Details in Iyer et al., KDD, 2016.
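When X is finite, the key idea reduces to linear algebra, which makes it easy to see why matching marginals recovers class-ratios. The table A and all the ratios below are our own synthetic numbers, chosen only so that the system is well-conditioned.

```python
# Discrete sketch of the key idea. Under target shift every bag's
# marginal over X is f_X = A @ f_Y, with A[x, y] = f_{X|Y}(x|y)
# shared across bags; only the class-ratios f_Y change.
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.1],
              [0.1, 0.2, 0.8]])          # columns sum to 1: f_{X|Y}

F_Y = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8]]).T      # training class-ratios f_Y^i (columns)

F_X = A @ F_Y                            # training marginals f_X^i (columns)

f_Y_U = np.array([0.5, 0.3, 0.2])        # true (unknown) test class-ratio
f_X_U = A @ f_Y_U                        # observed test marginal

# Match f_X_U against mixtures of training marginals: f_X_U ≈ Σ θ_i f_X^i,
# then read off the estimate f_Y^U = Σ θ_i f_Y^i.
theta, *_ = np.linalg.lstsq(F_X, f_X_U, rcond=None)
f_Y_hat = F_Y @ theta
print(f_Y_hat)  # recovers the true ratios [0.5, 0.3, 0.2]
```

Note that θ comes out as an affine combination (its entries sum to 1), matching the geometric picture: the affine hull of the training CRs covers the simplex, so any test CR is reachable.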
In particular, for the training bags and for the unlabeled bag:
f_X^1(x) = Σ_{y=1}^{c} f_Y^1(y) f_{X|Y}(x|y)
f_X^2(x) = Σ_{y=1}^{c} f_Y^2(y) f_{X|Y}(x|y)
f_X^U(x) = Σ_{y=1}^{c} f_Y^U(y) f_{X|Y}(x|y)
Under mild conditions², the correspondence between class-ratios and marginals is one-to-one, so matching f_X^U against mixtures of the training marginals recovers f_Y^U.

Formulation for CR Estimation
θ̂ = argmin_{θ ∈ ℝ^m} ‖ f_X^U − Σ_{i=1}^{m} θ_i f_X^i ‖² ,  f̂_Y^U = Σ_{i=1}^{m} θ̂_i f_Y^i
The densities themselves are unknown, so we compare them through kernel embeddings.

Kernel Embedding
Given χ, a set of objects, and a kernel k: χ × χ ↦ ℝ (a similarity function), the map φ_k takes X, the space of random variables over χ, into the RKHS of real-valued functions over χ:
φ_k(X)(·) ≡ 𝔼[k(X, ·)]
E.g., k(x, s) ≡ e^{xᵀs} gives φ_k(X)(s) = 𝔼[e^{sᵀX}], the moment-generating function.

Kernel Embedding – Sample Approximation
φ_k(X)(·) ≈ (1/m) Σ_{i=1}^{m} k(x_i, ·)
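The sample approximation above also gives a computable distance between distributions: the RKHS distance between two empirical embeddings (the maximum mean discrepancy). A minimal sketch, assuming a Gaussian RBF kernel; the data and bandwidth are illustrative choices of ours.

```python
# Empirical kernel mean embeddings and the squared RKHS distance
# between them, estimated entirely from Gram matrices.
import numpy as np

def rbf(Xa, Xb, gamma=1.0):
    """Gram matrix k(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(Xa, Xb, gamma=1.0):
    """|| phi(Xa) - phi(Xb) ||^2 in the RKHS, via sample averages."""
    return (rbf(Xa, Xa, gamma).mean()
            - 2 * rbf(Xa, Xb, gamma).mean()
            + rbf(Xb, Xb, gamma).mean())

rng = np.random.default_rng(0)
same = rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2))
diff = rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))

print(mmd2(*same))  # small: both samples share a distribution
print(mmd2(*diff))  # larger: the distributions differ
```

With a characteristic kernel such as the RBF, the embedding is injective, so a zero RKHS distance really does mean equal distributions; this is what makes the matching formulation identifiable.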
Formulation for CR Estimation (in the RKHS)
θ̂ = argmin_{θ ∈ ℝ^m} ‖ f_X^U − Σ_{i=1}^{m} θ_i f_X^i ‖²_{ℋ_k} ,  f̂_Y^U = Σ_{i=1}^{m} θ̂_i f_Y^i
The proposed method adapts, and compares well with the best case.

Learning Bounds
Theorem. If k is a normalized characteristic kernel, then with probability at least 1 − δ:
‖ρ̂ − ρ‖ ≤ (2 / minsig(A)) [ (1 + ‖Q‖) Σ_{i=1}^{m} C_δ/√n_i + ‖Q‖ C_δ/√n_u ]
where n_i is the size of training bag i, n_u the size of the unlabeled bag, and minsig(A) the smallest singular value of A.

Summary & Conclusions

Take-Home Notes
• Watch out for direct-estimation opportunities.
  • New problem set-up: class-ratio estimation.
  • More accurate, and simpler to analyse.
• Be aware of distributional assumptions.
  • They lead to significant improvements in accuracy.
• Insights provided by theoretical bounds can be far-reaching.
  • New kernel-selection algorithm.
  • Hints for data publishers.

Thanks!