An Empirical Study on the Practical Impact of Prior Beliefs over Policy Types — Appendix

Stefano V. Albrecht, The University of Edinburgh, Edinburgh, United Kingdom
Jacob W. Crandall, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
Subramanian Ramamoorthy, The University of Edinburgh, Edinburgh, United Kingdom

This document is an appendix to [1]. It contains a listing of all parameter settings used in the study as well as plots for each experiment.

Contents

1 Parameter Settings
  1.1 HBA
  1.2 Leader-Follower-Trigger Agents
  1.3 Co-Evolved Decision Trees / Neural Networks
2 Plots of Results
  2.1 Colour Codes
  2.2 Results

List of Figures

1  Colour codes for heat matrices
2  Colour codes for payoff plots
3  Leader-Follower-Trigger Agents – Random Type – No-Conflict Games
4  Leader-Follower-Trigger Agents – Random Type – Conflict Games
5  Leader-Follower-Trigger Agents – Fictitious Player – No-Conflict Games
6  Leader-Follower-Trigger Agents – Fictitious Player – Conflict Games
7  Leader-Follower-Trigger Agents – Conditioned Fictitious Player – No-Conflict Games
8  Leader-Follower-Trigger Agents – Conditioned Fictitious Player – Conflict Games
9  Co-Evolved Decision Trees – Random Type – No-Conflict Games
10 Co-Evolved Decision Trees – Random Type – Conflict Games
11 Co-Evolved Decision Trees – Fictitious Player – No-Conflict Games
12 Co-Evolved Decision Trees – Fictitious Player – Conflict Games
13 Co-Evolved Decision Trees – Conditioned Fictitious Player – No-Conflict Games
14 Co-Evolved Decision Trees – Conditioned Fictitious Player – Conflict Games
15 Co-Evolved Neural Networks – Random Type – No-Conflict Games
16 Co-Evolved Neural Networks – Random Type – Conflict Games
17 Co-Evolved Neural Networks – Fictitious Player – No-Conflict Games
18 Co-Evolved Neural Networks – Fictitious Player – Conflict Games
19 Co-Evolved Neural Networks – Conditioned Fictitious Player – No-Conflict Games
20 Co-Evolved Neural Networks – Conditioned Fictitious Player – Conflict Games

1 Parameter Settings

1.1 HBA

• 10 hypothesised types for player 2 (i.e. |Θ∗2| = 10)
• True type of player 2 always included in Θ∗2
• Value/LP priors: t = 5 and b = 10

1.2 Leader-Follower-Trigger Agents

• Maximum number of joint actions in target solutions: 2
• Target solution admissible if average payoff ≥ maximin value (for each player); see the sketch below
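To illustrate the admissibility condition above, the following Python sketch checks a candidate target solution against each player's maximin value. It is illustrative only: the function and variable names are ours, and it uses the pure-strategy maximin for simplicity, whereas the maximin value used in the study may be defined over mixed strategies.

```python
import numpy as np

def pure_maximin(payoffs):
    """Pure-strategy maximin (security) value of the row player, where
    payoffs[i, j] is the row player's payoff for own action i and
    opponent action j."""
    return payoffs.min(axis=1).max()

def admissible(target_avg_p1, target_avg_p2, payoffs_p1, payoffs_p2):
    """A target solution is admissible if each player's average payoff under
    the target solution is at least that player's maximin value.
    Both payoff matrices are indexed as [action of p1, action of p2]."""
    return (target_avg_p1 >= pure_maximin(payoffs_p1) and
            target_avg_p2 >= pure_maximin(payoffs_p2.T))

# Example 2x2 game with payoffs in [1, 4]; the target solution alternates
# between the joint actions (0, 0) and (1, 1), i.e. two joint actions.
payoffs_p1 = np.array([[4.0, 1.0], [3.0, 2.0]])
payoffs_p2 = np.array([[3.0, 2.0], [1.0, 4.0]])
avg_p1 = (payoffs_p1[0, 0] + payoffs_p1[1, 1]) / 2
avg_p2 = (payoffs_p2[0, 0] + payoffs_p2[1, 1]) / 2
print(admissible(avg_p1, avg_p2, payoffs_p1, payoffs_p2))  # True
```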
1.3 Co-Evolved Decision Trees / Neural Networks

• Number of populations: 2 (one for each player)
• Individuals per population: 50 (first population randomly generated)
• Fitness = average payoff after 20 rounds (∈ [1, 4]) – average similarity (∈ [0, 1])
• Each individual evaluated against random 40% of other population
• Resampling method: linear ranking
• Decision Trees:
  – Tree depth: 3 (up to three previous actions of other player)
  – Similarity: percentage of nodes with same action choice
  – Evolutions: 300 (evolution with highest average fitness used)
  – Random mutation of single node (flipping action): 5% of population
  – Random crossing of sub-trees (preserving tree depth): 30% of population
• Neural Networks (see the sketch below):
  – Input layer: 4 nodes (up to two previous joint actions)
  – Hidden layer: 5 nodes
  – Output layer: 1 node (probability of action 1)
  – Each node fully-connected with nodes of next layer
  – Standard sigmoidal threshold function
  – Similarity: 1 – average difference of output for each input
  – Evolutions: 1000 (evolution with highest average fitness used)
  – Random mutation of single input weight (standard normal shift): 20% of population
  – Random crossing of nodes (preserving network structure): 10% of population
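The neural-network specification above can be made concrete with a minimal sketch. The code below is illustrative only: the class and weight names are ours, bias terms are omitted because the appendix does not mention them, and the similarity measure is interpreted as an average over all 16 binary input patterns.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class EvolvedNetwork:
    """Illustrative 4-5-1 fully connected network with sigmoidal units.

    Inputs: the two previous joint actions, each player's action encoded
    as 0 or 1, giving 4 input values. Output: probability of action 1.
    """

    def __init__(self, rng):
        self.w_hidden = rng.standard_normal((4, 5))  # input -> hidden weights
        self.w_out = rng.standard_normal(5)          # hidden -> output weights

    def prob_action1(self, prev_joint_actions):
        x = np.asarray(prev_joint_actions, dtype=float)
        h = sigmoid(x @ self.w_hidden)               # hidden activations
        return float(sigmoid(h @ self.w_out))        # output in (0, 1)

def similarity(net_a, net_b):
    """1 minus the average absolute output difference, taken here over all
    16 binary input patterns (our reading of 'for each input')."""
    inputs = [(a, b, c, d) for a in (0, 1) for b in (0, 1)
              for c in (0, 1) for d in (0, 1)]
    diffs = [abs(net_a.prob_action1(x) - net_b.prob_action1(x)) for x in inputs]
    return 1.0 - float(np.mean(diffs))

rng = np.random.default_rng(0)
net_a, net_b = EvolvedNetwork(rng), EvolvedNetwork(rng)
# fitness = average payoff over 20 rounds (in [1, 4]) minus similarity (in [0, 1])
print(net_a.prob_action1([0, 1, 1, 0]), similarity(net_a, net_b))
```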
2 Plots of Results

2.1 Colour Codes

[Figure 1 shows an example heat matrix: rows are the prior beliefs Random, Utility, Stackelberg, Welfare, Fairness, LP-Utility, LP-Stackel, LP-Welfare, LP-Fairness; columns are the criteria Conv (p1), Conv (p2), Payoff (p1), Payoff (p2), Fairness, Welfare, Pareto op, Nash eq, Welfare op, Fairness op; the colour scale ranges from 0 to 1.]

Figure 1: Colour codes for heat matrices. Performance criteria are measured over time slices, which are a temporal partitioning of a play. Each element (r, c) in the heat matrix corresponds to the percentage of time slices in which the prior belief r produced significantly higher values for the criterion c than the Uniform prior, averaged over all plays in all tested games. (Note: binary criteria such as convergence and Nash equilibrium assume 0/1-values in each time slice.) All significance statements are based on paired right-sided t-tests with a 5% significance level.

[Figure 2 shows an example payoff plot: one line per prior belief (Uniform, Random, Utility, Stackelberg, Welfare, Fairness, LP-Utility, LP-Stackelberg, LP-Welfare, LP-Fairness), with the time slice on the x-axis and the average payoff of player 1 on the y-axis.]

Figure 2: Colour codes for payoff plots. The payoff plots show the average payoffs of player 1 (i.e. HBA) in each time slice. Every play against a random type lasted 100 rounds and was partitioned into 20 time slices, consisting of 5 consecutive time steps each. Every play against a (conditioned) fictitious player lasted 10,000 rounds and was partitioned into 100 time slices, consisting of 100 consecutive time steps each. The minimum and maximum achievable payoffs per round were 1 and 4, respectively.
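The heat-matrix statistic described in the caption of Figure 1 can be sketched as follows. This is an illustrative reconstruction, not the study's code: the array names and shapes are assumptions, and SciPy's paired t-test stands in for whatever implementation was actually used.

```python
import numpy as np
from scipy.stats import ttest_rel

def heat_matrix_cell(criterion_prior, criterion_uniform, alpha=0.05):
    """Fraction of time slices in which the given prior yields significantly
    higher criterion values than the Uniform prior.

    criterion_prior, criterion_uniform: arrays of shape (num_plays, num_slices)
    holding the criterion value of each play in each time slice (binary
    criteria such as convergence take 0/1 values per slice), with plays
    matched pairwise between the two priors.
    """
    num_slices = criterion_prior.shape[1]
    significant = 0
    for s in range(num_slices):
        # paired right-sided t-test across plays for this time slice
        result = ttest_rel(criterion_prior[:, s], criterion_uniform[:, s],
                           alternative='greater')
        if result.pvalue < alpha:
            significant += 1
    return significant / num_slices

# Toy example: 50 plays and 20 time slices of synthetic payoff data.
rng = np.random.default_rng(1)
prior_vals = rng.normal(3.1, 0.2, size=(50, 20))
uniform_vals = rng.normal(3.0, 0.2, size=(50, 20))
print(heat_matrix_cell(prior_vals, uniform_vals))
```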
2.2 Results

Note: A planning horizon of h means that HBA made predictions for the next h actions of player 2.

[The plots for Figures 3–20 are omitted here. Each figure contains, for the planning horizons h = 5, 3, 1, a heat matrix (as in Figure 1) and a plot of player 1's average payoff per time slice (as in Figure 2).]

Figure 3: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.

Figure 4: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.

Figure 5: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.

Figure 6: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.

Figure 7: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.

Figure 8: Player 1 controlled by HBA using leader-follower-trigger agents and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.
Figure 9: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.

Figure 10: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.

Figure 11: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.
Figure 12: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.

Figure 13: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.

Figure 14: Player 1 controlled by HBA using co-evolved decision trees and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.
Figure 15: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from no-conflict games.

Figure 16: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by random type. Results from conflict games.

Figure 17: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from no-conflict games.
Figure 18: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by fictitious player. Results from conflict games.

Figure 19: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from no-conflict games.

Figure 20: Player 1 controlled by HBA using co-evolved neural networks and planning horizons h = 5, 3, 1. Player 2 controlled by conditioned fictitious player. Results from conflict games.

References

[1] S. V. Albrecht, J. W. Crandall, and S. Ramamoorthy. An empirical study on the practical impact of prior beliefs over policy types.
In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 2015.