
Multiagent Systems
Lecture 2
Introduction to Multiagent Systems
System Characteristics for Coordination:
Objective Alignment in Multiagent Systems
Difference Objectives:
Design and Visualization
Kagan Tumer, [email protected]
Definitions
• Distributed Computing: Parallelization, synchronization
− Information is distributed, control generally is not
• Distributed AI: Problem solving, communication
− Information and control distributed
• Distributed Problem Solving: Task Decomposition
− Information and control distributed
• Distributed Control: Local solutions, synchronization
− Control is distributed, information may not be
• Multiagent Systems: Coordination, interaction
− Simple behavior, complex interactions
− No guarantees about other agents or system interactions
− Limited synchronization, communication, decomposition
Multiagent Systems
• Design autonomous agents that:
− Interact with one another
− Have limited observation of the environment
− Have limited communication
− Have only local control capabilities
− Interact in an asynchronous manner
• Analyze autonomous agents (possibly pre-existing) that:
− Have all the properties listed above
• Key issues:
− No centralized control
− No centralized data
− No synchronization
Types of Multiagent Systems
• Cooperative multiagent systems
− Homogeneous agents
− Heterogeneous agents
− Partially heterogeneous agents
• Competitive multiagent systems
− General sum games
• Hybrid multiagent systems
− Some cooperative agents, some competitive agents
Learning in Multiagent Systems
• Multiagent Learning
− Single learner (is this a MAS?)
− Individual learning (all agents)
− Individual learning (some agents)
− Team learning
− Models?
o Agents model other agents
o Agents respond to environment (which includes other agents)
Multiagent System Organization
• Hierarchy: task and resource allocation propagates down from above
− What happens when an agent breaks down?
o It takes out the whole subtree under it
− Who solves the full problem?
o Top agent? Centralized solution?
• Market: Bids for tasks and resources (see the sketch after this list)
− Inherently distributed
o Robust to agent failures
− Who solves the full problem?
o System interactions? Invisible hand?
• Specialization: tasks decomposed based on resources, capabilities
− Inherently distributed
o Robust to some agent failures, though some specializations may not be duplicated
− Who solves the full problem?
o System interactions?
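A minimal Python sketch of the bid-based (market) organization referenced above, assuming a single-round, lowest-bid-wins auction; the function names and the distance-based cost model are illustrative, not from the slides:

```python
# Minimal sketch: single-round, lowest-bid-wins task auction
# (an illustrative assumption, not the slides' specification).
def allocate_tasks(tasks, agents, estimate_cost):
    """Assign each task to its lowest bidder. No central planner is needed
    beyond the auctioneer, and a failed agent simply stops bidding."""
    allocation = {}
    for task in tasks:
        bids = {agent: estimate_cost(agent, task) for agent in agents}
        allocation[task] = min(bids, key=bids.get)  # lowest-cost bid wins
    return allocation

# Toy usage: agents bid their Manhattan distance to each task location.
positions = {"r1": (0, 0), "r2": (5, 5), "r3": (9, 1)}
tasks = [(1, 1), (8, 2), (4, 6)]
cost = lambda a, t: abs(positions[a][0] - t[0]) + abs(positions[a][1] - t[1])
print(allocate_tasks(tasks, positions, cost))
# {(1, 1): 'r1', (8, 2): 'r3', (4, 6): 'r2'}
```

Note the robustness property from the slide: removing an agent from the bidder pool changes the allocation, but never breaks the mechanism itself.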
Agent Interactions
• Communication
− Who talks to whom
• Trust
− Is information from other agents reliable?
o Trading agents
o Team of robots
• Modeling
− Do agents know what other agents will do?
• Team Formation
− Cooperation, shared benefits
• Coalitions
− Cooperation, self-interest
Measuring System Performance
• System performance can be measured by:
− Utility functions
o Economics, game theory
− Reward functions
o Reinforcement Learning
− Objective functions
o Optimization
− Goals
o Planning
− Evaluation functions
o Evolutionary algorithms
Measuring Agent Performance
• Self-interested agents: agents maximize their own performance
• Each agent has its own objective function
• Agents do not know about a possible “system” objective function
• Benefits:
− Simple to pose problem
− Local actions, local performance measure
− Leverage economics, social science insights
Measuring Agent Performance
• Potential problems with self-interested agents:
− What happens to global behavior?
− Tragedy of the commons?
− Duplication of tasks
• Potential solutions:
− Market based methods
− Negotiations
− Reward shaping
− Difference objectives
Relation to Other Fields
[Diagram: “Multiagent Systems / Objective Alignment” at the center, linked to Mechanism Design, Reinforcement Learning, Operations Research, Adaptive Control, Game Theory, Econophysics, Computational Economics, and Swarm Intelligence.]
Multiagent Objective Alignment
• Consider a large multiagent system where
− Each agent has a private objective it is trying to optimize; and
− There is a system objective function measuring the full system’s performance
• Key Questions:
− How to set agent objective functions?
− How to update them (team formation)?
− How to modify them for changing objectives (reconfiguration)?
− What happens when agents can’t compute those objectives?
− What happens when information is missing?
− What happens when some agents start to fail?
Analogy: A company
• Full System → Company
• System objective → Valuation of company
• Agents → Employees
• Agent objectives → Compensation packages
• Design problem (faced by the board):
− How to set/modify the compensation packages (agent objectives) of the employees to increase the valuation of the company (system objective)
o Salary
o Bonuses
o Benefits
o Stock options
− Note: The board does not tell each individual what to do; it sets the “incentive packages” for employees (including the CEO).
Key Concepts for Coordinated MAS
• Factoredness: Degree to which an agent’s objective is “aligned” with the system objective
– e.g., stock options are factored w.r.t. company valuation.
• Learnability: Sensitivity of an agent’s private objective to changes in its own state, relative to the noise from other agents (signal-to-noise)
– e.g., performance bonuses increase the learnability of an agent’s objective
• Interesting question: If you could, would you want everyone’s objective to be the valuation of the company?
– Factored, yes; but what about learnability?
Nomenclature
• z : State of the full system
• zi : State of agent i
• z-i : State of the full system, except agent i
• ci : Fixed vector (independent of agent i)
• z-i + ci : Full state with “counterfactual” agent i
• G(z) : Reward/Objective for the full system
• gi(z) : Reward/Objective for agent i
Factoredness
Factoredness: Degree to which an agent’s objective function is “aligned” with the system objective.

Fgi = (actions of i that improve/deteriorate both gi and G) / (all actions of i)

where the comparison is made with all other agents held fixed (z'-i = z-i). An analogous definition, taken over states rather than counted actions, applies for continuous states.

[Figure: gi(z) plotted against G(z) for objectives ranging from Full to High, Low, Zero, and Anti factoredness.]
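Factoredness can be estimated empirically by sampling. The sketch below assumes user-supplied callables (sample_state, resample_agent_i, and the objectives themselves); none of these names come from the slides:

```python
# Empirical factoredness estimate: a minimal sketch. It counts how often
# changing only agent i's state moves g_i and G in the same direction.
def factoredness(g_i, G, sample_state, resample_agent_i, i, n_pairs=10_000):
    aligned = 0
    for _ in range(n_pairs):
        z = sample_state()                # a full joint state z
        z_prime = resample_agent_i(z, i)  # new z_i, others fixed (z'_-i = z_-i)
        dg = g_i(z) - g_i(z_prime)
        dG = G(z) - G(z_prime)
        if dg * dG > 0:                   # both improve, or both deteriorate
            aligned += 1
    return aligned / n_pairs              # near 1.0: fully factored
```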
Learnability
Learnability: Degree to which an agent’s objective function is sensitive to its own actions, as opposed to the “background” noise of other agents’ actions.

Lgi = (change in gi as a result of i’s actions) / (change in gi as a result of other agents’ actions)

[Figure: gi(z) plotted against G(z) for objectives with low and high learnability.]
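Learnability can be estimated the same way. The sketch below reuses the illustrative callables from the factoredness sketch, plus an assumed resample_others that perturbs everyone except agent i:

```python
# Empirical learnability estimate: a minimal sketch (all names illustrative).
# Compares how much g_i moves when agent i's state changes vs. when
# everyone else's state changes.
def learnability(g_i, sample_state, resample_agent_i, resample_others, i,
                 n_pairs=10_000):
    own, background = 0.0, 0.0
    for _ in range(n_pairs):
        z = sample_state()
        own += abs(g_i(z) - g_i(resample_agent_i(z, i)))     # i's own effect
        background += abs(g_i(z) - g_i(resample_others(z, i)))  # others' noise
    return own / background  # > 1 means i's signal dominates the noise
```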
Properties
[Figure: gi(z) plotted against G(z) for three objectives: high factoredness with low learnability, low factoredness with high learnability, and high factoredness with high learnability.]
General Solution
• To get an agent objective with high factoredness and learnability, start with:

gi(z) = G(z) − G(z-i + ci)

− gi is aligned with G: G(z-i + ci) is independent of agent i
− gi has a cleaner signal than G: subtracting G(z-i + ci) removes the noise of other agents
• If g, G are differentiable, then:

∂G(z-i + ci)/∂zi = 0, and therefore ∂gi(z)/∂zi = ∂G(z)/∂zi
General Solution
• Two examples for ci:
− ci = 0: gi(z) = G(z) − G(z-i), the “world without me” objective
− ci = ai: gi(z) = G(z) − G(z-i + ai), the “world with average me” objective
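To make the “world without me” objective concrete, here is a minimal sketch in Python; the toy global objective (count of distinct tasks covered) is an illustrative assumption, not from the slides:

```python
# Toy global objective (assumption): G counts tasks that at least one agent
# covers, so agents are redundant when they pick the same task.
def G(joint_action):
    return len(set(joint_action))

# Difference reward with c_i = 0 ("world without me"): remove agent i's
# contribution from the joint action and re-evaluate G.
def difference_reward(joint_action, i):
    z_minus_i = joint_action[:i] + joint_action[i + 1:]
    return G(joint_action) - G(z_minus_i)

# Usage: agents 0 and 1 duplicate task "a"; neither gets credit for it.
z = ["a", "a", "b"]
print([difference_reward(z, i) for i in range(len(z))])  # [0, 0, 1]
```

Note how duplicated effort earns zero reward here: exactly the aligned, low-noise signal the derivation above promises.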
Research Issues:
• In general, agents may not be able to compute gi:
− Limited Observability
− Restricted Communication
− Temporal separation
− Spatial separation
− Limited Computation
• Solutions:
− Estimate missing information
− Leverage local information
− Approximate G or z
− Trade-off factoredness vs. learnability
Thinking in terms of Multiagent Systems
• A distributed system can be designed from the ground up:
− Routing data in a network.
− Each router gets a new goal (not shortest-path routing)
• A distributed system kept as is, with a MAS “floating” on top:
− Data download from a constellation of satellites.
− A fixed algorithm controls the download of data.
− The MAS simply sets “ghost” traffic along the links, modifying how the algorithm perceives the system.
• A non-distributed system viewed as a MAS:
− A traditional optimization algorithm (e.g., simulated annealing)
− Variables viewed as agents.
Example 1: Rover Coordination
• Rovers observe points of interest (POIs)
– POIs vary in value, time, and place
– Get more information closer to a POI
– Only the primary observation counts
• Learning problem
– Rovers learn in a single trial (non-episodic)
o Dynamic: POIs appear/disappear
– Rovers reset at regular intervals (episodic)
o Static: POIs the same in each episode
o Dynamic: POIs different in each episode
[Figure: Rovers in a field of low-valued POIs and one high-valued POI.]
Objective Functions
• Global objective (fully factored, low learnability)
• “Perfectly learnable” objective (low factoredness, ∞ learnability)
• Difference objective (high factoredness, high learnability)
Agent State Projection
[Figures: the rover state-projection model. Panels show the points of interest, a rover’s POI sensor and rover sensor, the summation (Σ) of sensor readings, the projection onto the x and y coordinates, the effect of more rovers and more POIs, and the factoredness computation in the projected space.]
Analyzed Rewards
• Pi: Sum of POI values observed by agent i
• G: Sum of POI values observed by all agents
• Di: Sum of POI values observed by agent i that would have gone unobserved by other agents
• Di(PO): Di with rover communication restricted to the distance a rover can travel in one step (3% of the space)
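A sketch of these rewards in code, under an assumed one-step, closest-rover observation model; the proximity scaling and all names are illustrative, not the slides’ exact formulation:

```python
# rovers: list of (x, y) positions; pois: list of (value, (x, y)) pairs.
def observation_value(poi, rover):
    value, (px, py) = poi
    dx, dy = px - rover[0], py - rover[1]
    return value / max(dx * dx + dy * dy, 1.0)  # more information when closer

def G(rovers, pois):
    # Only the primary (best) observation of each POI counts.
    return sum(max(observation_value(p, r) for r in rovers) for p in pois)

def P(rovers, pois, i):
    # "Perfectly learnable" P_i: agent i's own observations, others ignored.
    return sum(observation_value(p, rovers[i]) for p in pois)

def D(rovers, pois, i):
    # Difference reward D_i: POI value that would go unobserved without i
    # (assumes at least two rovers, so G of the remainder is well defined).
    return G(rovers, pois) - G(rovers[:i] + rovers[i + 1:], pois)
```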
Dynamic Environments
[Results plots comparing Pi and Di(PO) in dynamic environments.]
Project to Problem Domain
[Visualizations of rover behavior under Pi and Di(PO), projected onto the problem domain.]
Example 2: Constellation of Satellites
• Problem:
− A set of satellites receives data faster than they can download it (e.g., in orbit around Earth, or for that matter Mars)
− Central control is difficult (e.g., too many satellites, communication delays)
− G is given by:

Gt = 1 − (Σij wj lijt) / (Σij wj yijt)

o lijt: amount of data of importance j lost at satellite i at time t
o yijt: amount of data of importance j introduced at satellite i at time t
o wj: importance of data j
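Gt transcribes directly into code. The sketch below assumes per-satellite, per-importance-class arrays; the shapes and names are illustrative:

```python
import numpy as np

# Direct transcription of G_t: lost[i, j] / introduced[i, j] hold data of
# importance class j lost at / introduced at satellite i during step t;
# w[j] is the importance weight of class j.
def G_t(lost, introduced, w):
    return 1.0 - (lost @ w).sum() / (introduced @ w).sum()

# Toy usage: 3 satellites, 2 importance classes.
lost = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]])
introduced = np.array([[10.0, 5.0], [8.0, 6.0], [4.0, 9.0]])
w = np.array([1.0, 3.0])
print(G_t(lost, introduced, w))  # ~0.872: about 13% of weighted data lost
```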
Constellation of Satellites: Approach
• Adaptively route data to minimize importance-weighted data loss:
− Devise a baseline algorithm, for example a shortest-path-like algorithm that aims to maximize unused bandwidth.
• Now, “fool” the baseline algorithm by introducing “ghost” traffic among the satellites
− The algorithm is the same, but its view of the world is distorted
• Agents sit on each link
• Agents set the “ghost” traffic
• Important note: This approach does NOT depend on the baseline algorithm; it “floats” on top of it.
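A minimal sketch of the ghost-traffic idea: the baseline router below is plain Dijkstra and is never modified; agents distort only the link loads it perceives. The cost model and all names are illustrative assumptions:

```python
import heapq

def shortest_path(links, ghost, src, dst):
    """Baseline router. links: {(u, v): real_load}; ghost: {(u, v): ghost_load}
    set by the agents. Routes on perceived load = real + ghost."""
    dist, frontier = {src: 0.0}, [(0.0, src)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        for (a, b), load in links.items():
            if a == u:
                nd = d + load + ghost.get((a, b), 0.0)  # distorted view
                if nd < dist.get(b, float("inf")):
                    dist[b] = nd
                    heapq.heappush(frontier, (nd, b))
    return float("inf")

# An agent on link (s1, s2) discourages its use by adding ghost traffic;
# the router's code is untouched, only its view of the world changes.
links = {("s1", "s2"): 1.0, ("s1", "s3"): 2.0, ("s3", "s2"): 0.5}
print(shortest_path(links, {}, "s1", "s2"))                   # 1.0, direct
print(shortest_path(links, {("s1", "s2"): 5.0}, "s1", "s2"))  # 2.5, via s3
```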
Constellation of Satellites: Results
[Results plots.]