A Game-Theoretic Approach to
Avoiding Sample Bias in
Machine Learning
By Ming Gu
2013.3.20
Background
Smart Building
Sensors
Actuators
Problem Statement
We are learning a real-valued function on
an input space 𝑋 based on labels from a
group of agents 𝑁 = {1, 2, … , 𝑛}
Each agent 𝑖 ∈ 𝑁 has his own value
function 𝑣𝑖(𝒙) on the input space, which
is unknown to the others.
Problem Statement
We have a hypothesis space 𝐹, where
every 𝑓 ∈ 𝐹 is a function from 𝑋 → 𝑅;
we are searching for the optimal 𝑓
Accuracy is measured by a loss function
𝑙: 𝑅 × 𝑅 → 𝑅+, where 𝑙(𝑓(𝑥), 𝑦) is the loss
associated with the prediction 𝑓(𝑥) when
the true output is 𝑦.
Problem Statement
Agent 𝑖's risk under function 𝑓 is
◦ 𝑅𝑖(𝑓) = 𝐸[𝑙(𝑓(𝑥), 𝑣𝑖(𝑥))]
The global risk is
◦ 𝑅𝑔𝑙𝑜𝑏𝑎𝑙(𝑓) = (1/𝑛) Σ_{𝑖=1}^{𝑛} 𝑅𝑖(𝑓)
Every agent is selfish and aims to
minimize 𝑅𝑖(𝑓), while the mechanism
designer aims to minimize 𝑅𝑔𝑙𝑜𝑏𝑎𝑙(𝑓)
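To make this concrete, here is a minimal Python sketch of the two risk definitions; the three agents, their value functions, the absolute loss, and the Monte Carlo sample of inputs are illustrative assumptions, not part of the original setting.

import random

# Hypothetical private value functions v_i(x) for n = 3 agents on X = [0, 10].
value_fns = [
    lambda x: 18.0,            # agent 1 always wants 18
    lambda x: 22.0,            # agent 2 always wants 22
    lambda x: 20.0 + 0.1 * x,  # agent 3's preference drifts with x
]

def loss(a, b):
    """Absolute loss l(a, b) = |a - b| (one possible choice of l)."""
    return abs(a - b)

def agent_risk(f, v_i, xs):
    """Monte Carlo estimate of R_i(f) = E[l(f(x), v_i(x))]."""
    return sum(loss(f(x), v_i(x)) for x in xs) / len(xs)

def global_risk(f, value_fns, xs):
    """R_global(f) = (1/n) * sum_i R_i(f)."""
    return sum(agent_risk(f, v, xs) for v in value_fns) / len(value_fns)

xs = [random.uniform(0.0, 10.0) for _ in range(1000)]
f = lambda x: 20.0  # one candidate predictor: the constant function 20
print(global_risk(f, value_fns, xs))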
ERM Solution
We obtain a training set
𝑆𝑖 = {(𝑥𝑖𝑗, 𝑦𝑖𝑗)}_{𝑗=1}^{𝑚} from each agent 𝑖;
the global training set is 𝑆 = ∪_{𝑖∈𝑁} 𝑆𝑖
Every agent and the mechanism designer
uses the empirical risk as an estimate of
the true risk.
◦ 𝑅𝑖(𝑓, 𝑆𝑖) = (1/|𝑆𝑖|) Σ_{(𝒙,𝑦)∈𝑆𝑖} 𝑙(𝑓(𝒙), 𝑦)
◦ 𝑅𝑔𝑙𝑜𝑏𝑎𝑙(𝑓, 𝑆) = (1/|𝑆|) Σ_{(𝒙,𝑦)∈𝑆} 𝑙(𝑓(𝒙), 𝑦)
ERM Solution
We are finding
◦ 𝑓 = argmin_{𝑓∈𝐹} 𝑅𝑔𝑙𝑜𝑏𝑎𝑙(𝑓, 𝑆)
◦ If more than one minimizer exists, choose
the one with the smallest ||𝑓||
The question is: are the agents motivated to
report untruthfully, i.e. 𝑦𝑖𝑗 ≠ 𝑣𝑖(𝑥𝑖𝑗)? If
not, the mechanism is strategy-proof.
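A minimal sketch of this ERM mechanism for the special case of constant predictors, with ties broken by the smallest constant in magnitude (mirroring the smallest-||𝑓|| rule); the two agents' training sets, the candidate grid, and the choice of absolute loss are made-up illustrations.

def emp_risk(c, S, loss):
    """Empirical risk of the constant predictor c on the sample S."""
    return sum(loss(c, y) for _, y in S) / len(S)

def erm(S, loss, candidates):
    """argmin over candidate constants; ties broken by the smallest |c|."""
    return min(candidates, key=lambda c: (emp_risk(c, S, loss), abs(c)))

S1 = [("x11", 18.0), ("x12", 18.5)]   # agent 1's reported labels
S2 = [("x21", 22.0), ("x22", 21.5)]   # agent 2's reported labels
S = S1 + S2                           # global training set S = S1 ∪ S2

abs_loss = lambda a, b: abs(a - b)
candidates = [c / 10.0 for c in range(150, 251)]  # constants 15.0 .. 25.0
print(erm(S, abs_loss, candidates))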
ERM Solution with Absolute Loss
Function
𝑙(𝑎, 𝑏) = |𝑎 − 𝑏|
If 𝐹 is convex, then the global risk
minimizing mechanism is strategy-proof
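A quick numerical illustration of this claim (not a proof) for the simplest case, where each agent reports a single label and 𝐹 is the class of constant functions: ERM then returns a median of the reports, and the made-up misreport below cannot move the outcome toward the lying agent's true value.

import statistics

# With absolute loss and constant functions, ERM returns a median of the
# reported labels; the labels below are made-up.
def erm_constant_abs(labels):
    return statistics.median(labels)

truthful = [18.0, 20.0, 22.0, 25.0, 26.0]   # every agent reports its true value
f_truth = erm_constant_abs(truthful)        # median = 22.0

# The agent with true value 18.0 tries to drag the outcome down by reporting 0.
manipulated = [0.0, 20.0, 22.0, 25.0, 26.0]
f_lie = erm_constant_abs(manipulated)       # median is still 22.0

print(abs(f_truth - 18.0), abs(f_lie - 18.0))  # lying does not reduce the loss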
ERM Solution with Other Loss
Functions
𝑙(𝑎, 𝑏) = (𝑎 − 𝑏)²
Example:
◦ 𝑆1 = {(𝒙𝟏, 0)}, 𝑆2 = {(𝒙𝟐, 2)}
◦ 𝐹 is the set of all constant functions
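Working this example through in Python shows why squared loss breaks strategy-proofness; the particular misreport (agent 2 saying 4 instead of 2) is our own illustrative choice.

# Under squared loss, ERM over constant functions returns the mean of the
# reported labels.  With S_1 = {(x_1, 0)} and S_2 = {(x_2, 2)}:
def erm_constant_sq(labels):
    return sum(labels) / len(labels)

sq_loss = lambda a, b: (a - b) ** 2

f_truth = erm_constant_sq([0.0, 2.0])   # truthful reports -> f = 1.0
f_lie = erm_constant_sq([0.0, 4.0])     # agent 2 misreports 4 -> f = 2.0

# Agent 2's true value is 2.0; compare its true loss under each outcome:
print(sq_loss(f_truth, 2.0))  # 1.0 when reporting truthfully
print(sq_loss(f_lie, 2.0))    # 0.0 after misreporting -> manipulation pays off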
ERM Solution with Other Loss
Functions
𝑙(𝑎, 𝑏) = |𝑎 − 𝑏|
Example:
◦ 𝑆 = {(𝑥1, 1), (𝑥2, 2), (𝑥3, 4), (𝑥4, 4)}
◦ 𝐹 is the set of all constant functions
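A quick check of this example over a small grid of constants; our reading is that every constant in [2, 4] attains the same empirical risk, so the smallest-||𝑓|| tie-breaking rule decides the outcome here.

labels = [1.0, 2.0, 4.0, 4.0]   # the y-values of S

def emp_risk_abs(c, labels):
    """Empirical absolute-loss risk of the constant predictor c."""
    return sum(abs(c - y) for y in labels) / len(labels)

for c in [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]:
    print(c, emp_risk_abs(c, labels))
# Every c in [2, 4] gives risk 1.25, so the minimizer is not unique;
# the smallest-norm rule selects the constant 2.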
ERM Solution with Other Loss
Functions
General Result?
Machine Learning with Payment
Universal Equivalent
◦ Loss Function
◦ Grid Power Price
◦ Real Reward/Penalty
◦ Virtual Reward/Penalty
Machine Learning with Payment
Basic Idea
◦ Introduce external payments to align the
interests of the individual agents with those
of the mechanism designer.
Machine Learning with Payment
Basic Idea
◦ The outcome is (𝑓, 𝒑) rather than 𝑓
◦ 𝒑 = (𝑝1 , … , 𝑝𝑛 ) is the payment vector
◦ Agent 𝑖’s utility function:
𝑢𝑖(𝑓, 𝒑) = −𝑅𝑖(𝑓, 𝑆) − 𝑝𝑖
VCG Mechanism
𝑝𝑖 =
◦ the optimal welfare for the other players if
agent 𝑖 were not participating, minus
◦ the welfare of the other players under the
chosen outcome
VCG Mechanism
𝑝𝑖 = Σ_{𝑗≠𝑖} 𝑅𝑗(𝑓, 𝑆) − (𝑛 − 1) 𝑅(𝑓−𝑖, 𝑆−𝑖)
𝑓−𝑖 = argmin_{𝑓∈𝐹} 𝑅(𝑓, 𝑆−𝑖)
𝑢𝑖(𝑓, 𝒑) = −𝑅𝑖(𝑓, 𝑆) − 𝑝𝑖
= −𝑅𝑖(𝑓, 𝑆) − Σ_{𝑗≠𝑖} 𝑅𝑗(𝑓, 𝑆) + (𝑛 − 1) 𝑅(𝑓−𝑖, 𝑆−𝑖)
The last term does not depend on agent 𝑖's
report, so agent 𝑖 maximizes utility exactly
when 𝑓 minimizes the sum of all agents'
risks, i.e. the global risk.
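A sketch of these payments for constant-function ERM under absolute loss; the helper functions, the agents' one-point training sets, and the candidate grid are illustrative assumptions (with one point per agent, (𝑛 − 1) times the average risk on 𝑆−𝑖 equals the sum of the other agents' risks, as the formula requires).

# VCG sketch: the mechanism picks f minimizing the global empirical risk and
# charges agent i
#   p_i = sum_{j != i} R_j(f, S) - (n - 1) * R(f_{-i}, S_{-i}),
# where f_{-i} is the ERM outcome computed without agent i's data.

def emp_risk(c, S):
    """Absolute-loss empirical risk of the constant predictor c on S."""
    return sum(abs(c - y) for _, y in S) / len(S)

def erm(S, candidates):
    """ERM over constant predictors, ties broken by smallest |c|."""
    return min(candidates, key=lambda c: (emp_risk(c, S), abs(c)))

def vcg(agent_sets, candidates):
    pooled = [pt for S_i in agent_sets for pt in S_i]
    f = erm(pooled, candidates)                       # chosen outcome
    n = len(agent_sets)
    payments = []
    for i in range(n):
        others = [S_j for j, S_j in enumerate(agent_sets) if j != i]
        pooled_minus_i = [pt for S_j in others for pt in S_j]
        f_minus_i = erm(pooled_minus_i, candidates)   # optimum without agent i
        p_i = (sum(emp_risk(f, S_j) for S_j in others)
               - (n - 1) * emp_risk(f_minus_i, pooled_minus_i))
        payments.append(p_i)
    return f, payments

agent_sets = [[("x1", 18.0)], [("x2", 20.0)], [("x3", 25.0)]]  # one point each
candidates = [c / 2.0 for c in range(30, 61)]                  # 15.0 .. 30.0
print(vcg(agent_sets, candidates))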
Future Work
Do we have results when the amount of
training data is small, so that we cannot
obtain a training set of identical size from
each agent?
Future Work
What is the result for asymmetric loss
functions?
Is it feasible to let the agents report their
own loss functions? If so, how do we define
and ensure strategy-proofness?
Future Work
What is the effect of regularization?
Future Work
What are the results for machine learning
problems of other kinds?
◦ Learning target
◦ Learning algorithm
Questions