An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types — Appendix
Stefano V. Albrecht
The University of Edinburgh
Edinburgh, United Kingdom
[email protected]
Jacob W. Crandall
Masdar Institute of Science and Technology
Abu Dhabi, United Arab Emirates
[email protected]
Subramanian Ramamoorthy
The University of Edinburgh
Edinburgh, United Kingdom
[email protected]
This document is an appendix to [1]. It contains a listing of all parameter
settings used in the study as well as plots for each experiment.
Contents

1 Parameter Settings
  1.1 HBA
  1.2 Leader-Follower-Trigger Agents
  1.3 Co-Evolved Decision Trees / Neural Networks

2 Plots of Results
  2.1 Colour Codes
  2.2 Results
List of Figures

1  Colour codes for heat matrices
2  Colour codes for payoff plots
3  Leader-Follower-Trigger Agents – Random Type – No-Conflict Games
4  Leader-Follower-Trigger Agents – Random Type – Conflict Games
5  Leader-Follower-Trigger Agents – Fictitious Player – No-Conflict Games
6  Leader-Follower-Trigger Agents – Fictitious Player – Conflict Games
7  Leader-Follower-Trigger Agents – Conditioned Fictitious Player – No-Conflict Games
8  Leader-Follower-Trigger Agents – Conditioned Fictitious Player – Conflict Games
9  Co-Evolved Decision Trees – Random Type – No-Conflict Games
10 Co-Evolved Decision Trees – Random Type – Conflict Games
11 Co-Evolved Decision Trees – Fictitious Player – No-Conflict Games
12 Co-Evolved Decision Trees – Fictitious Player – Conflict Games
13 Co-Evolved Decision Trees – Conditioned Fictitious Player – No-Conflict Games
14 Co-Evolved Decision Trees – Conditioned Fictitious Player – Conflict Games
15 Co-Evolved Neural Networks – Random Type – No-Conflict Games
16 Co-Evolved Neural Networks – Random Type – Conflict Games
17 Co-Evolved Neural Networks – Fictitious Player – No-Conflict Games
18 Co-Evolved Neural Networks – Fictitious Player – Conflict Games
19 Co-Evolved Neural Networks – Conditioned Fictitious Player – No-Conflict Games
20 Co-Evolved Neural Networks – Conditioned Fictitious Player – Conflict Games
1 Parameter Settings

1.1 HBA
• 10 hypothesised types for player 2 (i.e. |Θ∗2| = 10)
• True type of player 2 always included in Θ∗2
• Value/LP priors: t = 5 and b = 10 (a generic belief-update sketch follows this list)
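
For orientation, the sketch below (generic Python, not code from [1]; the function name and dictionary layout are illustrative) shows how a prior over hypothesised types is typically turned into a posterior by re-weighting it with the likelihood each type assigns to player 2's observed actions. The exact belief operator used by HBA is defined in [1].

def update_belief(prior, likelihood):
    # prior: {type: prior probability}, e.g. one of the prior beliefs studied here
    # likelihood: {type: probability the hypothesised type assigns to the observed history}
    unnorm = {t: prior[t] * likelihood[t] for t in prior}
    z = sum(unnorm.values())
    return {t: w / z for t, w in unnorm.items()}

# Example with two hypothesised types and a uniform prior:
print(update_belief({"t1": 0.5, "t2": 0.5}, {"t1": 0.9, "t2": 0.3}))  # {'t1': 0.75, 't2': 0.25}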
1.2 Leader-Follower-Trigger Agents
• Maximum number of joint actions in target solutions: 2
• Target solution admissible if average payoff ≥ maximin value (for each player); see the sketch below
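
To make the admissibility criterion concrete, the following sketch (illustrative Python; the payoff matrices and the use of the pure-strategy maximin are assumptions, not the authors' code) checks whether the average per-round payoff of a candidate cycle of joint actions reaches each player's maximin value in a 2 × 2 game.

def maximin(payoff, player):
    # Pure-strategy maximin (security level); payoff[a1][a2] is the payoff to
    # `player` when player 1 plays a1 and player 2 plays a2.
    if player == 1:
        return max(min(payoff[a1][a2] for a2 in (0, 1)) for a1 in (0, 1))
    return max(min(payoff[a1][a2] for a1 in (0, 1)) for a2 in (0, 1))

def is_admissible(cycle, payoff1, payoff2):
    # cycle: target solution, a list of at most 2 joint actions (a1, a2)
    for player, payoff in ((1, payoff1), (2, payoff2)):
        avg = sum(payoff[a1][a2] for a1, a2 in cycle) / len(cycle)
        if avg < maximin(payoff, player):
            return False
    return True

# Illustrative Prisoner's-Dilemma-style payoffs in [1, 4] (action 0 = cooperate):
p1 = [[3, 1], [4, 2]]
p2 = [[3, 4], [1, 2]]
print(is_admissible([(0, 0)], p1, p2))  # mutual cooperation -> True
print(is_admissible([(0, 1)], p1, p2))  # player 1 exploited every round -> False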
1.3 Co-Evolved Decision Trees / Neural Networks
• Number of populations: 2 (one for each player)
• Individuals per population: 50 (first population randomly generated)
• Fitness = average payoff after 20 rounds (∈ [1, 4]) minus average similarity (∈ [0, 1])
• Each individual evaluated against random 40% of other population
• Resampling method: linear ranking
• Decision Trees:
– Tree depth: 3 (up to three previous actions of other player)
– Similarity: percentage of nodes with same action choice
– Evolutions: 300 (evolution with highest average fitness used)
– Random mutation of single node (flipping action): 5% of population
– Random crossing of sub-trees (preserving tree depth): 30% of population
• Neural Networks:
– Input layer: 4 nodes (up to two previous joint actions)
– Hidden layer: 5 nodes
– Output layer: 1 node (probability of action 1)
– Each node fully-connected with nodes of next layer
– Standard sigmoidal threshold function
– Similarity: 1 minus the average difference of the outputs over all inputs (see the sketch after this list)
– Evolutions: 1000 (evolution with highest average fitness used)
– Random mutation of single input weight (standard normal shift): 20% of population
– Random crossing of nodes (preserving network structure): 10% of population
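
The sketch below (one possible reading of the bullets above, not the authors' implementation) spells out this architecture and the similarity measure: four binary inputs assumed to encode the two previous joint actions, five hidden sigmoid units, one sigmoid output giving the probability of action 1, and similarity computed as 1 minus the average absolute output difference over the 16 binary input patterns. Fitness, as defined above, would then subtract the average similarity from the average payoff over 20 rounds.

import itertools, math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class EvolvedNet:
    # 4 inputs (two previous joint actions, one bit per action), 5 hidden sigmoid
    # units, 1 sigmoid output interpreted as the probability of choosing action 1.
    def __init__(self):
        self.w_hidden = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
        self.w_out = [random.gauss(0, 1) for _ in range(5)]

    def act_prob(self, inputs):
        hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in self.w_hidden]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)))

def similarity(net_a, net_b):
    # 1 minus the average absolute output difference over all 2^4 binary inputs.
    diffs = [abs(net_a.act_prob(x) - net_b.act_prob(x))
             for x in itertools.product((0, 1), repeat=4)]
    return 1.0 - sum(diffs) / len(diffs)

print(similarity(EvolvedNet(), EvolvedNet()))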
2 Plots of Results

2.1 Colour Codes
[Figure 1 graphic: heat-matrix legend. Rows are the prior beliefs Random, Utility, Stackelberg, Welfare, Fairness, LP-Utility, LP-Stackel, LP-Welfare, LP-Fairness; columns are the criteria Payoff (p1), Payoff (p2), Conv (p1), Conv (p2), Welfare, Fairness, Welfare op, Fairness op, Nash eq, Pareto op; colour scale from 0 to 1.]
Figure 1: Colour codes for heat matrices. Performance criteria are measured over time slices, which are a temporal partitioning of a play. Each element (r, c) in the heat matrix corresponds to the percentage of time slices in which the prior belief r produced significantly higher values for the criterion c than the Uniform prior, averaged over all plays in all tested games. (Note: binary criteria such as convergence and Nash equilibrium assume 0/1-values in each time slice.) All significance statements are based on paired right-sided t-tests with a 5% significance level.
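
As a concrete reading of this procedure, the sketch below (illustrative only; the data layout and the use of SciPy's paired t-test are assumptions, not the authors' code) computes a single heat-matrix element for one prior r and criterion c: one paired right-sided t-test per time slice across plays, then the percentage of slices significant at the 5% level.

from scipy import stats

def heat_element(values_r, values_uniform, alpha=0.05):
    # values_r[s][p], values_uniform[s][p]: criterion value in time slice s of play p,
    # for prior r and for the Uniform prior (same plays, hence paired).
    significant = 0
    for slice_r, slice_u in zip(values_r, values_uniform):
        test = stats.ttest_rel(slice_r, slice_u, alternative='greater')
        if test.pvalue < alpha:
            significant += 1
    return 100.0 * significant / len(values_r)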
[Figure 2 graphic: payoff-plot legend. Average Payoff (p1) vs. Time Slice, one curve per prior belief: Uniform, Random, Utility, Stackelberg, Welfare, Fairness, LP-Utility, LP-Stackelberg, LP-Welfare, LP-Fairness.]
Figure 2: Colour codes for payoff plots. The payoff plots
show the average payoffs of player 1 (i.e.
HBA) in each time slice. Every play against a random type lasted 100 rounds and was partitioned
into 20 time slices, consisting of 5 consecutive time steps each. Every play against a (conditioned)
fictitious player lasted 10,000 rounds and was partitioned into 100 time slices, consisting of 100
consecutive time steps each. The minimum and maximum achievable payoffs per round were 1 and
4, respectively.
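
A minimal sketch of this partitioning (illustrative only): the per-round payoffs of a play are grouped into consecutive slices of equal length and averaged within each slice.

def slice_averages(round_payoffs, slice_len):
    # slice_len = 5 for the 100-round plays, 100 for the 10,000-round plays
    return [sum(round_payoffs[i:i + slice_len]) / slice_len
            for i in range(0, len(round_payoffs), slice_len)]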
2.2 Results
Note: A planning horizon of h means that HBA made predictions for the next h actions of player 2.
[Figure 3 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 3: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.
[Figure 4 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 4: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.
[Figure 5 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 5: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.
[Figure 6 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 6: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.
[Figure 7 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 7: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.
[Figure 8 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 8: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.
[Figure 9 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 9: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.
[Figure 10 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 10: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.
[Figure 11 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 11: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.
[Figure 12 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 12: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.
[Figure 13 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 13: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.
[Figure 14 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 14: Player 1 controlled by HBA using co-evolved decision trees and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.
[Figure 15 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 15: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.
[Figure 16 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 5 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 16: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.
[Figure 17 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 17: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.
[Figure 18 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 18: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.
[Figure 19 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 19: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.
[Figure 20 graphics: heat matrices (criteria as in Figure 1) and average payoff (p1) per prior vs. time slice (each 100 steps); panels (a) h = 5, (b) h = 3, (c) h = 1.]
Figure 20: Player 1 controlled by HBA using co-evolved neural networks and planning horizons
h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.
References
[1] S. V. Albrecht, J. W. Crandall, and S. Ramamoorthy. An empirical study on the practical impact
of prior beliefs over policy types. In Proceedings of the 29th AAAI Conference on Artificial
Intelligence, Austin, Texas, USA, January 2015.