Group meeting

Hyunsoo Park
2013-09-16
What is our PROBLEM?
• Learning a target object in an interactive scenario
– Modeling an opponent in an interactive environment
• Main problems
– How to collect informative data with a restricted number of interactions
– How to model the target object from restricted data
Modeling the target object
• Conventional way
– Collect a large amount of data about the target
– Preprocess the data
– Apply a modeling algorithm
– The learned model can then explain new phenomena
Interactive scenario
• Restrictions
– User interaction data is very scarce
• Benefits
– Can actively collect data about the current user state
• Assumptions
– Informative (less redundant) data helps to model the target even when data is scarce
– We can collect informative data if we can identify it
• So,
– We have to identify informative data, then plan and execute a collection scenario within the restricted interactions!
Our approach
• Exploration-Estimation Algorithm
• Assumptions
– There are many possible explanations for an observed phenomenon
– Where the explanations disagree with each other is where they fail to explain the phenomenon
– To improve the explanations, we need data that resolves the disagreement
Exploration-Estimation Algorithm
[Figure: diagram of the algorithm loop. Box labels: Collect data; Data; Generate candidate models (Model 1, Model 2, …, Model N); Searching maximum disagreements; Planning next interaction / Choose next actions; Other; Self]
• Representation
– Neural networks
– Evolution for learning the models
– Evolution for the structure of the NN
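The disagreement idea on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the candidate models here are two hand-written functions rather than evolved neural networks, and the names `disagreement`, `choose_next_action`, and the `Model` type are hypothetical. Each model is assumed to predict the opponent's reply to my last action.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical model type: maps my last action to the predicted opponent reply.
Model = Callable[[str], str]


def disagreement(predictions: Sequence[str]) -> float:
    """Fraction of model pairs whose predictions differ (0.0 = full agreement)."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def choose_next_action(models: Sequence[Model], actions: Sequence[str]) -> str:
    """Pick the action on which the candidate models disagree the most."""
    return max(actions, key=lambda a: disagreement([m(a) for m in models]))


# Two hand-written candidate explanations of the opponent (assumptions for the demo).
def always_cooperate(my_last: str) -> str:
    return "C"


def tit_for_tat(my_last: str) -> str:
    return my_last


models = [always_cooperate, tit_for_tat]
print(choose_next_action(models, ["C", "D"]))  # D
```

Both candidates predict cooperation after I cooperate, so cooperating yields no new information. Defecting is the action that separates the two explanations, so it is chosen as the next interaction.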
Our toy problem
• Iterated Prisoner’s Dilemma
– Two players
– Each player chooses an action at the same time
• Cooperation
• Defection
– The opponent's action is determined by the opponent's and my past actions
• Problem
– How can I interact efficiently to model the opponent player?
• Few interactions, high performance
– Model performance
• How precisely the model predicts the opponent's next action
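Model performance as described above can be read as plain prediction accuracy. The sketch below assumes a model is a function from the interaction history to a predicted action, and that test cases are (history, actual next action) pairs; `prediction_accuracy` and `tft_model` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

# One round of history: (my action, opponent action).
History = List[Tuple[str, str]]


def prediction_accuracy(
    model: Callable[[History], str],
    cases: Sequence[Tuple[History, str]],
) -> float:
    """Share of test cases where the model predicts the opponent's next action."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(model(history) == actual for history, actual in cases)
    return correct / len(cases)


# Example: a model that assumes the opponent repeats my last action.
def tft_model(history: History) -> str:
    return history[-1][0] if history else "C"


cases = [
    ([], "C"),            # first round: opponent cooperated
    ([("D", "C")], "D"),  # I defected, opponent defected next
    ([("C", "D")], "D"),  # I cooperated, opponent still defected
]
print(prediction_accuracy(tft_model, cases))  # 2 of 3 cases correct
```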
Opponent players
• AllC
– Always cooperate
• TFT (Tit for Tat)
– Cooperates first, then copies the opponent's last action from the second game on
• Noisy TFT
– Similar to TFT, but changes its behavior randomly 10% of the time
• Major
– Cooperates first, then plays the opponent's majority choice from the second game on
• Pavlov
– Cooperates first, then cooperates if its own and the opponent's last actions were the same, otherwise defects
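The five opponents above can be written as small functions. This is a sketch under stated assumptions, not the code used in the experiments: `own` is the strategy's own past actions, `other` is the past actions of the player it faces, Noisy TFT is assumed to flip the TFT action with probability 0.1, and Major is assumed to cooperate on a tie. The helper `play` is hypothetical.

```python
import random
from typing import Callable, List

C, D = "C", "D"  # cooperate, defect


def all_c(own: List[str], other: List[str]) -> str:
    return C


def tft(own: List[str], other: List[str]) -> str:
    return other[-1] if other else C


def noisy_tft(own: List[str], other: List[str], rng=random, noise: float = 0.1) -> str:
    move = tft(own, other)
    if rng.random() < noise:  # assumption: noise flips the TFT action
        return D if move == C else C
    return move


def major(own: List[str], other: List[str]) -> str:
    if not other:
        return C
    # Assumption: ties are resolved in favor of cooperation.
    return C if other.count(C) >= other.count(D) else D


def pavlov(own: List[str], other: List[str]) -> str:
    if not own:
        return C
    return C if own[-1] == other[-1] else D


def play(strategy: Callable[[List[str], List[str]], str], my_actions: List[str]) -> List[str]:
    """Return the strategy's actions against a fixed sequence of my actions."""
    own: List[str] = []
    other: List[str] = []
    for mine in my_actions:
        own.append(strategy(own, other))  # both players move simultaneously
        other.append(mine)
    return own


print(play(tft, [D, C, D]))     # ['C', 'D', 'C']
print(play(pavlov, [D, C, C]))  # ['C', 'D', 'D']
```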
Experiment methods
• Algorithms
– EEA
– C4.5 (WEKA)
– MLP (WEKA)
• EEA
– Collects data actively
• C4.5 and MLP
– Collect data randomly
• Performance test
– All possible scenarios (32 cases from the observer’s perspective)
– There is no overfitting
AllC
[Figure: accuracy (y-axis, 0.4 to 1) vs. # of games (x-axis, 1 to 32) for EEA, C4.5, and MLP]
• C4.5 and MLP are more accurate
• But probably most learning algorithms are more accurate than evolutionary algorithms
– The model is too simple
TFT
[Figure: accuracy (y-axis, 0.4 to 1) vs. # of games (x-axis, 1 to 32) for EEA, C4.5, and MLP]
• TFT is modeled more accurately up to 10 games
– It is more complex than AllC, but still quite simple
Noisy TFT
[Figure: accuracy (y-axis, 0.4 to 1) vs. # of games (x-axis, 1 to 32) for EEA, C4.5, and MLP]
• In theory, 90% is the maximum accuracy
• Maybe C4.5 is able to reduce noise or prevent overfitting
– Overfitting: noise creates too many possible scenarios
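The 90% ceiling can be checked with a short simulation. The sketch assumes the noise flips the TFT action with probability 0.1 (the reading under which 90% is the maximum), and that the predictor knows the noise-free TFT rule perfectly. The function name `ceiling_accuracy` and its parameters are hypothetical.

```python
import random


def flip(action: str) -> str:
    return "D" if action == "C" else "C"


def ceiling_accuracy(rounds: int = 100_000, noise: float = 0.1, seed: int = 0) -> float:
    """Accuracy of a perfect TFT predictor against a Noisy TFT opponent."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        my_last = rng.choice("CD")   # my previous action, chosen at random
        predicted = my_last          # noise-free TFT reply
        actual = flip(predicted) if rng.random() < noise else predicted
        correct += predicted == actual
    return correct / rounds


print(round(ceiling_accuracy(), 2))
```

The predictor is wrong exactly when the noise fires, so the expected accuracy is 1 - 0.1 = 0.9 regardless of how I choose my own actions.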
Major
[Figure: accuracy (y-axis, 0.4 to 1) vs. # of games (x-axis, 1 to 32) for EEA, C4.5, and MLP]
• C4.5 is less accurate
– Major is a more complex player
– It is hard to model with a decision tree
Pavlov
[Figure: accuracy (y-axis, 0.4 to 1) vs. # of games (x-axis, 1 to 32) for EEA, C4.5, and MLP]
• EEA is more precise with a small number of games, but the difference is not big
• The C4.5 result is strange; I don’t know why
Conclusion
• EEA is more precise with a small number of games, but the difference is not big
– Its usefulness cannot be judged yet
• EEA results are similar to MLP
– Because both use a NN as the representation
– C4.5 is less accurate than the others, except in the noisy case
• In many cases, EEA is more precise than MLP
– Ensemble effect? Informative data collection?
Future work
• Verify the effect of informative data collection
• Analyze log data
– Transition of data collection in each case
– Calculate the possibilities of data collection
– Disagreement data in each case