Multiagent Systems
Lecture 2: Introduction to Multiagent Systems
System Characteristics for Coordination: Objective Alignment in Multiagent Systems
Difference Objectives: Design and Visualization
Kagan Tumer, [email protected]

Definitions
• Distributed Computing: parallelization, synchronization
  − Information is distributed; control generally is not
• Distributed AI: problem solving, communication
  − Information and control are distributed
• Distributed Problem Solving: task decomposition
  − Information and control are distributed
• Distributed Control: local solutions, synchronization
  − Control is distributed; information may not be
• Multiagent Systems: coordination, interaction
  − Simple behaviors, complex interactions
  − No guarantees about other agents or system interactions
  − Limited synchronization, communication, and decomposition

Multiagent Systems
• Design autonomous agents that:
  − Interact with one another
  − Have limited observation of the environment
  − Have limited communication
  − Have only local control capabilities
  − Interact in an asynchronous manner
• Analyze autonomous agents (possibly pre-existing) that:
  − Have all the properties listed above
• Key issues:
  − No centralized control
  − No centralized data
  − No synchronization

Types of Multiagent Systems
• Cooperative multiagent systems
  − Homogeneous agents
  − Heterogeneous agents
  − Partially heterogeneous agents
• Competitive multiagent systems
  − General-sum games
• Hybrid multiagent systems
  − Some cooperative agents, some competitive agents

Learning in Multiagent Systems
• Multiagent learning
  − Single learner (is this a MAS?)
  − Individual learning (all agents)
  − Individual learning (some agents)
  − Team learning
  − Models?
    o Agents model other agents
    o Agents respond to the environment (which includes the other agents)

Multiagent System Organization
• Hierarchy: task and resource allocation propagates down from above
  − What happens when an agent breaks down?
    o It takes out the whole tree under it
  − Who solves the full problem?
    o The top agent? A centralized solution?
• Market: bids for tasks and resources
  − Inherently distributed
    o Robust to agent failures
  − Who solves the full problem?
    o System interactions? The invisible hand?
• Specialization: tasks decomposed based on resources and capabilities
  − Inherently distributed
    o Robust to some agent failures; some specializations may not be duplicated
  − Who solves the full problem?
    o System interactions?

Agent Interactions
• Communication
  − Who talks to whom?
• Trust
  − Is information from other agents reliable?
    o Trading agents
    o Teams of robots
• Modeling
  − Do agents know what other agents will do?
• Team Formation
  − Cooperation, shared benefits
• Coalitions
  − Cooperation, self-interest
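The design constraints above (local observation, local control, asynchrony, no centralized controller) can be made concrete with a short sketch. The following Python loop is not from the lecture; the Agent and Environment classes and all method names are hypothetical stand-ins for a real domain:

```python
import random

class Agent:
    """Hypothetical agent: it can only sense and act locally."""
    def __init__(self, agent_id):
        self.agent_id = agent_id

    def observe(self, env):
        # Limited observation: the agent sees only its own local state,
        # never the full system state.
        return env.local_state(self.agent_id)

    def act(self, observation):
        # Local control stub: a real agent would map its observation to an
        # action; here we just pick randomly.
        return random.choice(["stay", "move"])

class Environment:
    """Toy environment; local_state and apply stand in for a real domain."""
    def __init__(self, n_agents):
        self.state = {i: 0 for i in range(n_agents)}

    def local_state(self, agent_id):
        return self.state[agent_id]

    def apply(self, agent_id, action):
        if action == "move":
            self.state[agent_id] += 1

agents = [Agent(i) for i in range(5)]
env = Environment(n_agents=5)

for step in range(100):
    # Asynchrony: agents act in a random order, with no shared clock and
    # no centralized controller coordinating their decisions.
    for agent in random.sample(agents, len(agents)):
        obs = agent.observe(env)
        env.apply(agent.agent_id, agent.act(obs))
```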
Measuring System Performance
• System performance can be measured by:
  − Utility functions
    o Economics, game theory
  − Reward functions
    o Reinforcement learning
  − Objective functions
    o Optimization
  − Goals
    o Planning
  − Evaluation functions
    o Evolutionary algorithms

Measuring Agent Performance
• Self-interested agents: each agent maximizes its own performance
• Each agent has its own objective function
• Agents do not know about a possible "system" objective function
• Benefits:
  − Simple to pose the problem
  − Local actions, local performance measures
  − Leverages insights from economics and social science

Measuring Agent Performance
• Potential problems with self-interested agents:
  − What happens to global behavior?
  − Tragedy of the commons?
  − Duplication of tasks
• Potential solutions:
  − Market-based methods
  − Negotiation
  − Reward shaping
  − Difference objectives

System Characteristics for Coordination: Objective Alignment in Multiagent Systems

Relation to Other Fields
• Multiagent systems and objective alignment sit at the intersection of:
  − Mechanism design
  − Reinforcement learning
  − Operations research
  − Adaptive control
  − Game theory
  − Econophysics
  − Computational economics
  − Swarm intelligence

Multiagent Objective Alignment
• Consider a large multiagent system where:
  − Each agent has a private objective it is trying to optimize; and
  − There is a system objective function measuring the full system's performance
• Key questions:
  − How to set agent objective functions?
  − How to update them (team formation)?
  − How to modify them for changing objectives (reconfiguration)?
  − What happens when agents can't compute those objectives?
  − What happens when information is missing?
  − What happens when some agents start to fail?

Analogy: A company
• Full system → Company
• System objective → Valuation of the company
• Agents → Employees
• Agent objectives → Compensation packages
• Design problem (faced by the board):
  − How to set/modify the compensation packages (agent objectives) of the employees to increase the valuation of the company (system objective)?
    o Salary
    o Bonuses
    o Benefits
    o Stock options
  − Note: the board does not tell each individual what to do. It sets the "incentive packages" for employees (including the CEO).

Key Concepts for Coordinated MAS
• Factoredness: degree to which an agent's objective is "aligned" with the system objective
  − e.g., stock options are factored with respect to company valuation
• Learnability: based on the sensitivity of an agent's private objective to changes in its own state (signal-to-noise)
  − e.g., performance bonuses increase the learnability of an agent's objective
• Interesting question: if you could, would you want everyone's objective to be the valuation of the company?
  − Factored, yes; but what about learnability?
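As a toy illustration of the "tragedy of the commons" risk noted above (this example is not from the slides; the congestion model, capacity value, and function names are assumptions), consider agents choosing between two shared resources. Each agent's private objective looks fine in isolation, yet uncoordinated self-interest destroys the system objective:

```python
def system_objective(counts, capacity=5):
    # Hypothetical congestion model: a resource yields value 1 per user
    # while at or under capacity, and nothing once it is overloaded.
    return sum(c if c <= capacity else 0 for c in counts)

def selfish_payoff(my_choice, counts, capacity=5):
    # An agent's private objective: 1 if its own resource is uncongested.
    return 1 if counts[my_choice] <= capacity else 0

# Self-interested play: resource 0 looks attractive to each of the 10
# agents in isolation, so all of them pick it and overload it.
counts = [10, 0]
print(system_objective(counts))   # -> 0  (tragedy of the commons)
print(selfish_payoff(0, counts))  # -> 0  (each agent loses as well)

# A coordinated split keeps both resources under capacity.
counts = [5, 5]
print(system_objective(counts))   # -> 10
```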
Nomenclature
• z: state of the full system
• zi: state of agent i
• z-i: state of the full system, except agent i
• ci: fixed vector (independent of agent i)
• z-i + ci: full state with a "counterfactual" agent i
• G(z): reward/objective for the full system
• gi(z): reward/objective for agent i

Factoredness
• Factoredness: degree to which an agent's objective function is "aligned" with the system objective
• Compare pairs of states z and z' that differ only in agent i's state (z'-i = z-i):

  Fgi = (actions of i that improve/deteriorate gi AND G) / (all actions of i)

• [Figure: example plots of gi(z) against G(z) illustrating full, high, low, zero, and anti-factoredness]

Learnability
• Learnability: degree to which an agent's objective function is sensitive to its own actions, as opposed to the "background" noise of other agents' actions

  Lgi = (change in gi as a result of i's actions) / (change in gi as a result of other agents' actions)

• [Figure: example plots of gi(z) against G(z) contrasting low and high learnability]

Properties
• [Figure: gi(z) vs G(z) plots contrasting high factoredness / low learnability, low factoredness / high learnability, and high factoredness / high learnability]

General Solution
• To get an agent objective with high factoredness and learnability, start with:

  gi(z) = G(z) − G(z-i + ci)

  − gi is aligned with G: G(z-i + ci) is independent of agent i
  − gi has a cleaner signal than G: G(z-i + ci) removes noise
• If g and G are differentiable, then:

  ∂G(z-i + ci)/∂zi = 0, and therefore ∂gi(z)/∂zi = ∂G(z)/∂zi

General Solution
• Two examples for ci:
  − ci = 0: gi(z) = G(z) − G(z-i), the "world without me"
  − ci = ai: gi(z) = G(z) − G(z-i + ai), the "world with average me"
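A minimal sketch of the difference objective above, assuming a toy coverage-style system objective G (the domain, the choice of G, and the function names here are illustrative, not from the slides):

```python
def G(z):
    """Hypothetical system objective over the joint state z (one entry per
    agent): reward the number of distinct locations covered."""
    return len(set(z))

def difference_reward(z, i, c_i=None):
    """Difference objective g_i(z) = G(z) - G(z_-i + c_i).

    c_i = None removes agent i entirely ("world without me", i.e. c_i = 0);
    otherwise agent i's state is replaced by the fixed vector c_i, e.g. the
    average agent state ("world with average me")."""
    z_counterfactual = [s for j, s in enumerate(z) if j != i]
    if c_i is not None:
        z_counterfactual.append(c_i)
    return G(z) - G(z_counterfactual)

# Three agents on a grid; agents 0 and 1 sit on the same cell.
z = [(0, 0), (0, 0), (2, 3)]
print(difference_reward(z, i=0))              # 0: agent 0 adds nothing unique
print(difference_reward(z, i=2))              # 1: agent 2's coverage is unique
print(difference_reward(z, i=2, c_i=(1, 1)))  # 0: the counterfactual agent
                                              # also covers a fresh cell
```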
Research Issues
• In general, agents may not be able to compute gi:
  − Limited observability
  − Restricted communication
  − Temporal separation
  − Spatial separation
  − Limited computation
• Solutions:
  − Estimate missing information
  − Leverage local information
  − Approximate G or z
  − Trade off factoredness vs. learnability

Difference Objectives: Design and Visualization

Thinking in Terms of Multiagent Systems
• A distributed system can be designed from the ground up:
  − Routing data in a network
  − Each router gets a new goal (not shortest-path routing)
• A distributed system kept as-is, with a MAS "floating" on top:
  − Data download from a constellation of satellites
  − A fixed algorithm controls the download of data
  − The MAS simply sets "ghost" traffic along the links, modifying how the algorithm perceives the system
• A non-distributed system viewed as a MAS:
  − A traditional optimization algorithm (e.g., simulated annealing)
  − Variables viewed as agents

Example 1: Rover Coordination
• Rovers observe points of interest (POIs)
  − POIs vary in value, time, and place
  − Rovers get more information closer to a POI
  − Only the primary observation counts
• Learning problem
  − Rovers learn in a single trial (non-episodic)
    o Dynamic: POIs appear/disappear
  − Rovers reset at regular intervals (episodic)
    o Static: POIs the same in each episode
    o Dynamic: POIs different in each episode
• [Figure: rover domain with low-valued POIs and a high-valued POI]

Objective Functions
• Global: fully factored, low learnability
• "Perfectly learnable": low factoredness, infinite learnability
• Difference: high factoredness, high learnability

Agent State Projection
• [Figures: rover sensors sum POI and rover readings per quadrant; the resulting agent states are projected onto x and y coordinates for the factoredness computation, for varying numbers of rovers and POIs]

Analyzed Rewards
• Pi: sum of POI values observed by agent i
• Gi: sum of POI values observed by all agents
• Di: sum of POI values observed by agent i that would have gone unobserved by other agents
• Di(PO): Di with rover communication restricted to the distance a rover can travel in one step (3% of the space)

Dynamic Environments
• [Figures: rewards Pi and Di(PO) projected onto the problem domain in dynamic environments]

Example 2: Constellation of Satellites
• Problem:
  − A set of satellites receives data faster than they can download it (e.g., in orbit around Earth, or for that matter Mars)
  − Central control is difficult (e.g., too many satellites, communication delays)
  − G is given by:

    Gt = 1 − (Σij wj lijt) / (Σij wj yijt)

    o lijt: amount of data of importance j lost at satellite i at time t
    o yijt: amount of data of importance j introduced at satellite i at time t
    o wj: importance of data class j

Constellation of Satellites: Approach
• Adaptively route data to minimize importance-weighted data loss:
  − Devise a baseline algorithm, for example a shortest-path-like algorithm that aims to maximize unused bandwidth
• Now "fool" the baseline algorithm by introducing "ghost" traffic among the satellites
  − The algorithm is the same, but its view of the world is distorted
• Agents sit on each link
• Agents set the "ghost" traffic
• Important note: this approach does NOT depend on the baseline algorithm; it "floats" on top of it

Constellation of Satellites: Results
• [Figures: results for the satellite constellation experiments]
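As a sketch of the download-loss objective Gt defined above (the data layout and function name are assumptions; only the formula comes from the slides):

```python
def constellation_objective(lost, introduced, weights):
    """Importance-weighted data-loss objective from the slides:

        G_t = 1 - (sum_ij w_j * l_ijt) / (sum_ij w_j * y_ijt)

    lost[i][j]       -- data of importance class j lost at satellite i at time t
    introduced[i][j] -- data of importance class j introduced at satellite i at time t
    weights[j]       -- importance of data class j
    """
    weighted_loss = sum(weights[j] * l
                        for row in lost for j, l in enumerate(row))
    weighted_intro = sum(weights[j] * y
                         for row in introduced for j, y in enumerate(row))
    return 1.0 - weighted_loss / weighted_intro

# Two satellites, two importance classes (hypothetical numbers).
lost = [[0.0, 1.0], [2.0, 0.0]]
introduced = [[10.0, 5.0], [8.0, 4.0]]
weights = [1.0, 3.0]
print(constellation_objective(lost, introduced, weights))  # -> about 0.889
```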