Group meeting
Hyunsoo Park
2013-09-16

What is our PROBLEM?
• Learning a target object in an interactive scenario
  – Modeling an opponent in an interactive environment
• Main problems
  – How to collect informative data with restricted interactions
  – How to model the target object from restricted data

Modeling the target object
• Conventional way
  – Collect a large amount of data about the target
  – Preprocess it
  – Apply a modeling algorithm
  – Explain new phenomena using the learned model

Interactive scenario
• Restrictions
  – User interaction data is very small
• Benefits
  – Can actively collect data about the current user state
• Assumptions
  – Informative (less redundant) data helps to model the target even when the data is small
  – We can collect informative data if we can identify it
• So,
  – We have to identify informative data, then plan and execute a collection scenario within restricted interactions!

Our approach
• Exploration-Estimation Algorithm (EEA)
• Assumptions
  – There are many explanations for an observed phenomenon
  – If the explanations disagree with each other, that is a point the explanations cannot yet explain
  – To improve the explanations, we need data that resolves the disagreement

Exploration-Estimation Algorithm
[Diagram: Exploration-Estimation loop. Labels: Collect data, Data, Generate candidate models (Model 1, Model 2, …, Model N), Searching maximum disagreements, Planning next interaction, Choose next actions, Other, Self]
• Representation
  – Neural networks
  – Evolution for model learning
  – Evolution for the structure of the NN

Our toy problem
• Iterated Prisoner’s Dilemma
  – Two players
  – Each player chooses an action at the same time
    • Cooperation
    • Defection
  – The opponent’s action is determined by the opponent’s and my past actions
• Problem
  – How can I interact efficiently to model the opponent player?
    • Few interactions, high performance
  – Model performance
    • How precisely the model predicts the opponent’s next action

Opponent players
• AllC
  – Always cooperates
• TFT
  – Cooperates in the first game, then repeats the opponent’s last action from the second game on
• Noisy TFT
  – Similar to TFT, but changes its behavior randomly 10% of the time
• Major
  – Cooperates in the first game, then plays the opponent’s majority choice from the second game on
• Pavlov
  – Cooperates in the first game, then cooperates if its own and the opponent’s last actions were the same, otherwise defects

Experiment methods
• Algorithms
  – EEA
  – C4.5 (WEKA)
  – MLP (WEKA)
• EEA
  – Collects data actively
• C4.5 and MLP
  – Collect data randomly
• Performance test
  – All possible scenarios (32 cases from the observer’s perspective)
  – There is no overfitting

AllC
[Figure: accuracy (0.4 to 1.0) vs. # of games (1 to 32) for EEA, C4.5, and MLP]
• C4.5 and MLP are more accurate
• But most learning algorithms are probably more accurate than evolutionary algorithms
  – The model is too simple

TFT
[Figure: accuracy (0.4 to 1.0) vs. # of games (1 to 32) for EEA, C4.5, and MLP]
• TFT is more correct until 10 games
  – It is more complex than AllC, but still quite simple

Noisy TFT
[Figure: accuracy (0.4 to 1.0) vs. # of games (1 to 32) for EEA, C4.5, and MLP]
• In theory, 90% is the maximum
• Maybe C4.5 is able to reduce noise or prevent overfitting
  – Overfitting: there are too many possible scenarios because of noise

Major
[Figure: accuracy (0.4 to 1.0) vs. # of games (1 to 32) for EEA, C4.5, and MLP]
• C4.5 is less accurate
  – Major is a more complex player
  – Hard to model with a decision tree

Pavlov
[Figure: accuracy (0.4 to 1.0) vs. # of games (1 to 32) for EEA, C4.5, and MLP]
• EEA is more precise with a small number of games, but the difference is not big
• The C4.5 result is strange; I don’t know why

Conclusion
• EEA is more precise with a small number of games, but the difference is not big
  – Can’t predict its usefulness yet
• EEA results are similar to MLP
  – Because it uses an NN as its representation
  – C4.5 is less accurate than the others, except in the noisy case
• In many cases, EEA is more precise than MLP
  – Ensemble effect? Informative data collection?

Future work
• Confirm the effect of informative data collection
• Analyze log data
  – Transition of data collection in each case
  – Calculate possibilities of data collection
  – Disagreement data in each case
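
The five opponent players listed in the slides can be sketched in a few lines of Python. This is only an illustration, not the code used in the experiments: the function names, the `play` helper, and the move encoding ("C"/"D") are hypothetical. Two details are assumptions because the slides do not specify them: Noisy TFT is taken to flip the TFT move with probability 0.1, and Major breaks ties toward cooperation.

```python
import random

C, D = "C", "D"  # cooperation, defection (encoding is an assumption)

def all_c(own, other, rng):
    """AllC: always cooperate."""
    return C

def tft(own, other, rng):
    """TFT: cooperate first, then repeat the other player's last move."""
    return C if not other else other[-1]

def noisy_tft(own, other, rng, noise=0.1):
    """Noisy TFT: TFT whose move is flipped with probability `noise`.
    Flipping the move is an assumed reading of '10% randomly'."""
    move = tft(own, other, rng)
    if rng.random() < noise:
        return D if move == C else C
    return move

def major(own, other, rng):
    """Major: cooperate first, then play the other player's majority move.
    Ties go to cooperation (assumption; the slides do not say)."""
    if not other:
        return C
    return C if other.count(C) >= other.count(D) else D

def pavlov(own, other, rng):
    """Pavlov: cooperate first, then cooperate only if both players
    made the same move in the previous game."""
    if not other:
        return C
    return C if own[-1] == other[-1] else D

def play(strategy, partner_moves, seed=0):
    """Return the strategy's moves against a fixed sequence of partner moves.
    At game t the strategy sees only the partner's moves before t."""
    rng = random.Random(seed)
    own = []
    for t in range(len(partner_moves)):
        own.append(strategy(own, partner_moves[:t], rng))
    return own
```

For example, `play(tft, list("CDDC"))` returns `['C', 'C', 'D', 'D']`: TFT cooperates in the first game and afterwards copies the partner's previous move.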
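
The idea of searching for maximum disagreement among candidate models can also be shown with a small sketch. The slides use evolved neural networks as candidate models; here three hand-written rules stand in for them, purely as an assumption for illustration. A query is taken to be the pair (my last move, opponent's last move), and the names `disagreement` and `most_informative` are hypothetical.

```python
from collections import Counter

def disagreement(models, query):
    """Fraction of models whose prediction differs from the majority
    prediction for this query (0.0 means all models agree)."""
    votes = Counter(model(query) for model in models)
    majority_size = votes.most_common(1)[0][1]
    return 1.0 - majority_size / len(models)

def most_informative(models, queries):
    """Pick the query on which the candidate models disagree the most.
    Ties are resolved in favor of the earliest query."""
    return max(queries, key=lambda q: disagreement(models, q))

# Stand-in candidate models (assumption: simple rules, not evolved NNs).
# Each maps (my_last, opp_last) to the predicted next opponent move.
def always_c(query):
    return "C"

def copy_me(query):
    my_last, _ = query
    return my_last

def win_stay(query):
    my_last, opp_last = query
    return "C" if my_last == opp_last else "D"

models = [always_c, copy_me, win_stay]
queries = [(m, o) for m in "CD" for o in "CD"]
best = most_informative(models, queries)
```

With these three stand-in models, the query `("C", "C")` has zero disagreement because every model predicts cooperation, so observing it would not help to tell the models apart. The selected query is one where the predictions split, which is the kind of interaction the planning step would try to reach next.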