
A Proposed Game-Theoretic Model of
Cooperation between Nodes in a MANET
Jim Catt
ECE 695
Department of Electrical and Computer Engineering
Purdue School of Engineering and Technology
Spring 2006
Introduction and Motivation




In mobile ad hoc networks (MANETs), nodes must provide some level of relay
service to other nodes in the network to achieve optimal global
efficiency of network operation.
However, packet relay imposes a power cost on the relaying node.
Since MANET nodes are often battery powered, this cost shortens node
lifetime.
The most rational local strategy for each node is therefore not to
cooperate and to transmit only its own packets.
Introduction and Motivation

If all nodes adopt this locally rational strategy,
network connectedness drops to zero.





All nodes lose in this case – nodal utility drops to zero
Yet, if each node cooperates, there is the possibility
to maximize the utility of all nodes.
This is a classical game theory scenario
Game theory has been utilized to analyze several
aspects of MANET operation
This project is restricted to analysis of cooperation
Objective

The objective of this work is to develop a practical
game-theoretic model of nodal cooperation that uses
measurable, realistic parameters to make strategy
choices, and when combined with feasible protocol
modifications, can be reasonably implemented in
MANET nodes.
Prisoner’s Dilemma


The Prisoner’s Dilemma is often used as a pedagogic
example of game theory.
Preliminaries




Player – an entity with preferences
Strategy – a set of actions available to a player, in response
to the strategies of other players
Outcome – the result of the complete set of strategic choices
made by all players in the game
Utility – the amount of welfare a player derives from an
outcome (or strategy)


Often expressed as a utility function, a mathematical
mapping of the welfare received by the player from an
outcome.
Payoff – Usually formulated as: p = utility - cost
Prisoner’s Dilemma




The Prisoner’s Dilemma scenario:
Two people are arrested for armed robbery.
There is not enough evidence to convict for armed robbery, but enough
to convict for theft of the getaway car.
Each prisoner is given the following choices:
If you confess and implicate your partner, and your partner doesn’t
confess, you go free and she gets 10 years in prison.
If you both confess, both get 5 years in prison.
If neither confesses, both get 2 years for auto theft.
Utility (payoff) mapping:
Go free → 4
2 years → 3
5 years → 2
10 years → 0
Prisoner’s Dilemma

The game can be represented in strategic form by a
matrix (each cell lists the payoff to Prisoner 1, then Prisoner 2):

                           Prisoner 2
                      Defect       Cooperate
                      (Confess)    (Refuse)
Prisoner 1
  Defect (Confess)      2,2          4,0
  Cooperate (Refuse)    0,4          3,3

The prisoners are separated and cannot communicate.
What will they decide?
Prisoner’s Dilemma



Consider one prisoner at a time
For a specific strategy – either defect or cooperate –
there are two possible payoffs
Which strategy offers the best set of potential
payoffs? Or, equivalently, which strategy maximizes
the minimum payoff?
Prisoner’s Dilemma


(Defect, Defect) is an equilibrium solution to the game (Nash
Equilibrium)
However, this clearly isn’t the optimal solution, which is (Cooperate,
Cooperate).
Hence, a Nash equilibrium isn’t necessarily an optimal solution to a
game.
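The equilibrium argument can be checked mechanically. The following is a minimal sketch, not part of the original slides: it encodes the payoff matrix above and tests every action pair for profitable unilateral deviations. The function names are illustrative only.

```python
from itertools import product

# Payoffs (Prisoner 1, Prisoner 2), indexed by (action 1, action 2),
# copied from the strategic-form matrix in the slides.
PAYOFFS = {
    ("defect", "defect"): (2, 2),
    ("defect", "cooperate"): (4, 0),
    ("cooperate", "defect"): (0, 4),
    ("cooperate", "cooperate"): (3, 3),
}
ACTIONS = ("defect", "cooperate")


def pure_nash_equilibria(payoffs, actions):
    """Return every action pair from which neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        # Prisoner 1 varies its action while Prisoner 2's action stays fixed.
        p1_content = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        # Prisoner 2 varies its action while Prisoner 1's action stays fixed.
        p2_content = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if p1_content and p2_content:
            equilibria.append((a1, a2))
    return equilibria


def maximin_action(payoffs, actions):
    """Prisoner 1's action that maximizes its minimum (worst-case) payoff."""
    return max(actions, key=lambda a1: min(payoffs[(a1, a2)][0] for a2 in actions))


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('defect', 'defect')]
    print(maximin_action(PAYOFFS, ACTIONS))        # defect
```

The only pure-strategy equilibrium is (Defect, Defect) with payoff 2,2, even though (Cooperate, Cooperate) pays 3,3 to both players.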
Strategies

Types of strategies:


Pure Strategy – a player chooses to play a certain strategy
with probability 1. Usually only encountered in games of
perfect information.
Mixed Strategy – a player has a set of strategies to choose
from. A probability distribution describes the likelihood that a
particular strategy will be chosen.
Game Theory and Cooperation in MANETs

Classical game theory models for cooperation in
MANETs:






economic payment model
punishment/reward model.
Regardless of model, there is little consistency in the
formulation of utility functions.
Many formulations employ abstractions for utilities
and costs (less practical)
Some are based on some energy measure (more
practical).
Many require extraordinary overhead in the exchange
of information between nodes
Proposed Approach



Premise: the basic resource available to a node is its
lifetime store of energy, i.e., its battery life.
This resource is available to be consumed for either
computational functions or information exchange
functions, both part of “mission” execution.
Node behavior: obtain a balance between


achieving maximum lifetime
executing its mission.
Proposed Approach: Ground Rules




Sending and receiving packets requires cooperation.
Payment is in-kind (punishment/reward framework)
Payoff should be proportional to the benefit received.
Cost for cooperation:


decrease in potential lifetime, or
alternately, lost opportunity to transmit own packets in the
future.
Problem Formulation


Dual objectives:

Maximization of the lifetime function

Subject to maintaining reward $R \ge 0$.
Assumptions and conventions

Slotted communication intervals of fixed length

Packet length L is fixed for this study.

Data (symbol) rate Rb is fixed for this study.

One packet time = Tp = L/Rb.
Assumptions and Conventions (cont.)



On average, a node is connected to two or more adjacent
nodes.
Nodes are uniformly distributed throughout the region of
interest.
The average mobility of the network is sufficiently high that
no node is confined to an edge or border region for
long periods of time.
Restrictions




Only selfish nodes are considered, not malicious
nodes
The proposed approach is for steady state conditions.
Modification for startup conditions requires further
study.
Energy consumption associated with packet reception
is ignored because even a selfish node will listen for
its own packets.
Playing the Game

A node has a relay buffer and own buffer.

At each slot time, a node plays a mixed strategy, and may
choose from the following action set:

Neither transmit nor relay

Transmit its own packet, given a packet is available in
its own buffer

Relay a received packet, given that a packet is
available in its relay buffer.

For this version of the game, the node will not transmit if:
• both its own buffer and its relay buffer are empty, or
• either sending its own packet or relaying a packet causes
its cumulative payoff to be negative for the current slot
time.
Playing the Game





$P_R$ = probability that node i relays a packet.
$P_O$ = probability that node i sends its own packet.
$\pi_R$ = payoff received by node i when it relays a packet.
$\pi_O$ = payoff received by node i when it sends its own packet.
The expected payoff (reward) for node i is:
\[ R = P_R \cdot \pi_R^i + P_O \cdot \pi_O^i \]

A rational node will act to maintain cumulative $R \ge 0$. Or, for $\pi_O > 0$:
\[ \frac{P_O}{P_R} \ge -\frac{\pi_R}{\pi_O} \]

Equality with zero is allowed because temporarily, the only
strategy available to node i may cause $R = 0$.
Definitions

Total available energy at $t = 0$ is $E_T$.

$k = 1, 2, \dots, N$, the number of packets relayed by node i for other
nodes.

$m = 1, 2, \dots, M$, the number of own packets transmitted by node i.

The total number of relay nodes (end-to-end) required for node
i’s m-th packet is a random variable, $h_m^i$.

$j = 0, 1, 2, \dots, J$, the set of links to adjacent nodes.

The power used to transmit the m-th packet over the j-th link is a
random variable denoted by $W_m^j$.

The energy used to transmit the m-th packet over the j-th link is
given by:
\[ E_m^j = W_m^j \cdot T_p = W_m^j \cdot \left( \frac{L}{R_b} \right) \]

Denote relay energy as $E_r$, and energy used to transmit own
packet as $E_o$.
Energy usage function


Average CPU power is $W_{cpu}$.
At time t, the total energy remaining for node i is:
\[
\begin{aligned}
E(k, m, t) &= E_T - W_{cpu} \cdot t - \sum_{k=1}^{N} E_{r,k} - \sum_{m=1}^{M} E_{o,m} \\
           &= E_T - W_{cpu} \cdot t - T_p \sum_{k=1}^{N} W_{r,k} - T_p \sum_{m=1}^{M} W_{o,m}
\end{aligned}
\]
Lifetime function


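The maximum possible lifetime is:
\[ T_{max} = \frac{E_T}{W_{cpu}} \]
Maximum remaining lifetime at time t is:
\[
\begin{aligned}
T_{max}(\alpha, \gamma) = \frac{E(k, m, t)}{W_{cpu}}
  &= T_{max} - t - \frac{T_p}{W_{cpu}} \left( \sum_{k=1}^{N} W_{r,k} + \sum_{m=1}^{M} W_{o,m} \right) \\
  &= T_{max} - t - T_p \left( \sum_{k=1}^{N} \alpha_k + \sum_{m=1}^{M} \gamma_m \right)
\end{aligned}
\]
where
\[ \alpha_k = \frac{W_{r,k}}{W_{cpu}}, \qquad \gamma_m = \frac{W_{o,m}}{W_{cpu}} \]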
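As a numeric check of the energy and lifetime expressions, the sketch below evaluates both and confirms that remaining lifetime equals remaining energy divided by the CPU power. All input values are made-up illustrations, and the names alpha and gamma for the power ratios (relay power over CPU power, own-packet power over CPU power) are labels chosen for this example.

```python
import math


def remaining_energy(e_total, w_cpu, t, t_p, relay_powers, own_powers):
    """E(k, m, t): energy left after CPU drain plus all relay and own transmissions."""
    return e_total - w_cpu * t - t_p * sum(relay_powers) - t_p * sum(own_powers)


def remaining_lifetime(e_total, w_cpu, t, t_p, relay_powers, own_powers):
    """T_max - t - T_p * (sum of alpha_k + sum of gamma_m)."""
    t_max = e_total / w_cpu                           # maximum possible lifetime
    alphas = [w / w_cpu for w in relay_powers]        # relay power ratios
    gammas = [w / w_cpu for w in own_powers]          # own-packet power ratios
    return t_max - t - t_p * (sum(alphas) + sum(gammas))


if __name__ == "__main__":
    # Hypothetical numbers: 8000-bit packets at 1 Mbit/s give T_p = 8 ms.
    t_p = 8000 / 1_000_000
    args = dict(e_total=1000.0, w_cpu=0.5, t=100.0, t_p=t_p,
                relay_powers=[1.0, 2.0], own_powers=[1.5])
    energy = remaining_energy(**args)
    lifetime = remaining_lifetime(**args)
    # Remaining lifetime is the remaining energy spent at CPU power only.
    assert math.isclose(lifetime, energy / args["w_cpu"])
    print(energy, lifetime)
```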

Payoff functions
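•Payoff = utility - cost.
\[ \pi_r^i(h, \alpha) = \frac{1}{E_{r,k+1}^{l}} \cdot \frac{1 + h_{k+1}}{h_{k+1}} - \alpha_{k+1} \cdot \bar{u}_o^i \]
\[ \pi_o^i(h, \gamma) = \frac{1 + h_{m+1}}{E_{o,m+1}} - \gamma_{m+1} \cdot \bar{u}_o^i \]
where $\alpha_{k+1}$ and $\gamma_{m+1}$ are the ratios of the transmit power for the next relayed and own packet to $W_{cpu}$, and $\bar{u}_o^i$ is the average utility node i receives from transmitting one of its own packets.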
Constructing PR and PO



$P_R$ and $P_O$ give the strategy rule that can be used by the
node to pick its strategy at each slot time.
$P_R$ and $P_O$ should be proportional to the payoffs received by
node i, and the level of cooperation received by node i.
Define V as a measure of the relationship between the
payoffs, or, the ratio of the absolute values of the payoffs:
\[ V = \frac{|\pi_O|}{|\pi_R|}, \qquad \pi_R = -\frac{\pi_O}{V} \]
The expected payoff R becomes:
\[ R = P_O \cdot \pi_O - P_R \cdot \frac{\pi_O}{V} \ge 0, \qquad \pi_O > 0, \; \pi_R < 0 \]
or, $V \cdot P_O \ge P_R$.
Constructing PR and PO

Define the following events:
• AQR = the event that there is a packet in the relay buffer
• AQO = the event that there is a packet in the node’s own transmit
buffer
• AR = the event that a packet is relayed
• AO = the event that an own packet is transmitted
• AT = the event that a packet is transmitted, either a
relayed packet or an own packet
• ARS = the event that a relayed packet successfully
reaches its destination
• AOS = the event that the node’s own packet successfully
reaches its destination
Constructing PR and PO

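Assertions:
\[ A_{RS} \subseteq A_R \subseteq A_{QR}, \qquad A_{OS} \subseteq A_O \subseteq A_{QO}, \]
\[ A_T = A_R \cup A_O, \qquad A_R \cap A_O = \emptyset, \qquad P(A_O \mid A_T) + P(A_R \mid A_T) = 1 \]
[Figure: Venn diagrams with regions labeled AQO, AO, AOS and AQR, AR, ARS]

• The relevant event space is $A_T = A_R \cup A_O$.
• $P_O = P(A_O \mid A_T)$, and $P_R = P(A_R \mid A_T)$.
• $P_O + P_R = 1$.
From $V \cdot P_O \ge P_R$, we get
\[ P_R \le \frac{V}{V + 1} \]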
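The bound on the relay probability follows by substituting $P_O = 1 - P_R$ into the constraint $V \cdot P_O \ge P_R$. A short derivation, assuming $V > 0$:

```latex
\begin{aligned}
V \cdot P_O &\ge P_R \\
V \, (1 - P_R) &\ge P_R \\
V &\ge P_R \, (1 + V) \\
P_R &\le \frac{V}{V + 1}
\end{aligned}
```

For example, if sending an own packet pays three times the magnitude of the relay payoff ($V = 3$), the node can relay at most 75% of the packets it transmits while keeping $R \ge 0$.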
Constructing PR and PO


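The cooperation experienced by node i for relay of its own
packets is $P(A_{OS} \mid A_O)$.
Define the weighted payoff $\pi_O'$ and the weighted ratio $V'$:
\[ \pi_O' = P(A_{OS} \mid A_O) \cdot \pi_O, \qquad V' = \frac{P(A_{OS} \mid A_O) \cdot |\pi_O|}{|\pi_R|} \quad \Rightarrow \quad P_R \le \frac{V'}{V' + 1} \]
• As $P(A_{OS} \mid A_O) \to 0$: $V' \to 0$, $P_O \to 1$, $P_R \to 0$.
• As $P(A_{OS} \mid A_O) \to 1$: $V'$, $P_O$, and $P_R$ all approach equilibrium
values.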
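The limiting behavior can be seen numerically. This is a minimal sketch with hypothetical payoff values; the function name and arguments are illustrative and not taken from the slides.

```python
def relay_probability_bound(pi_o, pi_r, p_success):
    """Upper bound V'/(V'+1) on P_R, where V' = P(A_OS|A_O) * |pi_O| / |pi_R|.

    p_success stands for P(A_OS|A_O), the cooperation the node experiences.
    """
    if pi_r == 0:
        raise ValueError("pi_r must be non-zero")
    v_weighted = p_success * abs(pi_o) / abs(pi_r)
    return v_weighted / (v_weighted + 1.0)


if __name__ == "__main__":
    # Hypothetical payoffs: own packet pays 3, relaying costs 1.
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, relay_probability_bound(pi_o=3.0, pi_r=-1.0, p_success=p))
```

A node whose own packets never arrive (P(AOS|AO) = 0) is allowed a relay probability of zero, so it stops relaying; a node that receives full cooperation reaches the unweighted bound V/(V+1).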
Strategy Rule parameters
•Define β as an estimate of $P(A_{OS} \mid A_O)$:
\[ \beta = \frac{\sum \text{own packets reaching destination}}{\sum \text{own packets sent}} \]
•Define an estimate of $P_R$:
\[ \hat{P}_R = \frac{\sum \text{packets relayed}}{\sum \text{packets relayed} + \sum \text{own packets transmitted}} \]
•Update each parameter prior to each new slot time.
Strategy Rule parameters
•Define the cumulative reward up to the current slot time:
\[ R_C = \sum_{k=1}^{N} \pi_{R,k} + \beta \sum_{m=1}^{M} \pi_{O,m} \]
•Define the candidate updates for $R_C$:
\[ R_O = R_C + \beta \cdot \pi_{O,m+1}, \qquad R_R = R_C + \pi_{R,k+1} \]
•Define:
\[ \frac{V'}{V' + 1} = \frac{\beta \, |\pi_{O,m+1}| / |\pi_{R,k+1}|}{\beta \, |\pi_{O,m+1}| / |\pi_{R,k+1}| + 1} \]
Strategy Rule algorithm

If AQR=1, calculate π_R,k+1 and RR.
If AQO=1, calculate π_O,m+1 and RO.

if (AQO=1 & AQR=0),
    if π_O > 0, then AO=1 (send own packet), else do nothing
else if (AQO=0 & AQR=1),
    if RR >= 0, then AR=1 (accept relay request), else do nothing
else if (AQO=1 & AQR=1),
    if P̂_R > V'/(V'+1), then AR=0 (reject relay), and if π_O > 0, AO=1 (send own packet), else do nothing
    else if RR >= 0, then AR=1 (accept relay request)
    else if π_O > 0, then AO=1 (send own packet)
    else do nothing
end
update β, P̂_R and RC.
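One way the per-slot rule could be coded is sketched below. This is an illustration under stated assumptions, not the author's implementation: the class and function names are invented here, β defaults to 1.0 before any own packet is sent (start-up behavior is outside the scope of the slides), the own-packet reward is weighted by β, and the relay request is rejected only when the relay estimate strictly exceeds V'/(V'+1).

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    own_sent: int = 0               # own packets transmitted
    own_delivered: int = 0          # own packets acknowledged by their destination
    relayed: int = 0                # packets relayed for other nodes
    cumulative_reward: float = 0.0  # R_C

    @property
    def beta(self) -> float:
        """Estimate of P(A_OS|A_O); 1.0 before any packet is sent (assumption)."""
        return self.own_delivered / self.own_sent if self.own_sent else 1.0

    @property
    def p_relay_hat(self) -> float:
        """Estimate of P_R: relayed / (relayed + own transmitted)."""
        total = self.relayed + self.own_sent
        return self.relayed / total if total else 0.0


def choose_action(state, has_own, has_relay, pi_o=0.0, pi_r=0.0):
    """Return 'own', 'relay' or 'idle' for the current slot."""
    send_own_ok = has_own and pi_o > 0
    relay_ok = has_relay and state.cumulative_reward + pi_r >= 0  # R_R >= 0
    if has_own and has_relay:
        if pi_r == 0:
            bound = 1.0  # relaying is free, so the bound never binds
        else:
            v_weighted = state.beta * abs(pi_o) / abs(pi_r)  # V'
            bound = v_weighted / (v_weighted + 1.0)
        if state.p_relay_hat > bound:
            # Relay share already too high: reject the relay request.
            return "own" if send_own_ok else "idle"
        if relay_ok:
            return "relay"
        return "own" if send_own_ok else "idle"
    if has_own:
        return "own" if send_own_ok else "idle"
    if has_relay:
        return "relay" if relay_ok else "idle"
    return "idle"


def apply_action(state, action, pi_o=0.0, pi_r=0.0):
    """Update R_C and the packet counters after the slot's action."""
    if action == "own":
        state.cumulative_reward += state.beta * pi_o  # R_O = R_C + beta * pi_O
        state.own_sent += 1
    elif action == "relay":
        state.cumulative_reward += pi_r               # R_R = R_C + pi_R
        state.relayed += 1
```

The `own_delivered` counter is expected to be incremented elsewhere when an ACK arrives, which in turn raises β for later slots.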
Strategy Rule algorithm

This algorithm can be applied on a global basis (no
discrimination between nodes requesting relays) or
on a node-by-node basis (a β parameter is calculated
for each node).
Proposed Protocol Modifications for Own Packets

Routing Tables



For AODV, routing tables are modified to include all nodes on
the path to the destination. However, the current routing
method is still employed (i.e. next hop routing).
No change to DSR for path routing list
Furthermore, the routing table is modified by adding two
fields to hold values that are used to estimate cooperation
from other nodes.




NUM_PKT_OFFERED
NUM_PKT_ACCEPTED
These fields can be used to estimate each node’s unique β if
distinguishing between nodes achieves better fairness.
Otherwise, when summed over all nodes, they can be used
to calculate a global β
Proposed Protocol Modifications for Own Packets

Transport protocol must support an ACK mechanism
in order to estimate P(AOS|AO)


A destination node k sends an ACK for each packet
successfully received from node i (i.e., use a wireless,
pseudo connection-oriented transport protocol)
To reduce overhead, an ACK could be applied to a block of
packets, where block size is adjustable
Implementation for Own Packets

When node i transmits its own packet to destination
node k:

If node j is an intermediate (relay) on the path to node k,
    NUM_PKT_OFFEREDj += 1
If an ACK is received from node k,
    NUM_PKT_ACCEPTEDj += 1
If the ACK timer expires, execute normal transport protocol
congestion adaptation.
If a RERR is received for node k before the ACK timeout,
    NUM_PKT_OFFEREDj -= 1
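The counter bookkeeping and the resulting β estimates might look like the sketch below. The class and method names are hypothetical; only the two field names come from the slides. It assumes the sender knows the full list of relay nodes on the path, as the proposed routing-table change provides, and that a packet is counted as offered when it is sent.

```python
from dataclasses import dataclass, field


@dataclass
class RelayStats:
    num_pkt_offered: int = 0   # NUM_PKT_OFFERED
    num_pkt_accepted: int = 0  # NUM_PKT_ACCEPTED


@dataclass
class CooperationTable:
    stats: dict = field(default_factory=dict)  # relay node id -> RelayStats

    def _entries(self, path):
        for node in path:
            yield self.stats.setdefault(node, RelayStats())

    def on_send(self, path):
        """Own packet handed to every relay node on the path."""
        for entry in self._entries(path):
            entry.num_pkt_offered += 1

    def on_ack(self, path):
        """Destination acknowledged the packet: every relay cooperated."""
        for entry in self._entries(path):
            entry.num_pkt_accepted += 1

    def on_rerr(self, path):
        """Route error before the ACK timeout: do not hold it against the relays."""
        for entry in self._entries(path):
            entry.num_pkt_offered -= 1

    def beta(self, node=None):
        """Per-node beta if node is given, otherwise the global beta; None if no data."""
        if node is not None:
            entries = [self.stats.get(node, RelayStats())]
        else:
            entries = list(self.stats.values())
        offered = sum(e.num_pkt_offered for e in entries)
        accepted = sum(e.num_pkt_accepted for e in entries)
        return accepted / offered if offered > 0 else None
```

Calling `beta("j1")` gives the node-by-node estimate, while `beta()` sums the counters over all relay nodes for the global estimate.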
Summary

Developed payoff functions that include parameters
incorporating energy usage and cooperation level.




Can be calculated from available or reasonably measurable
information, or from minor modifications to protocol
Developed a stochastic decision rule based on
modified payoff functions, thereby taking into
account the influence on battery life and cooperation
Proposed minor protocol modification and routing
table modification that enable the strategy rule.
Developed an algorithm implementing the strategy
rule
Future Work

Formally verify that the proposed approach achieves a stable
and optimal or pseudo-optimal equilibrium.
    Alternately, prove that the proposed framework is Pareto-efficient.

Test the model using a network simulation tool to verify that:
    it achieves optimality
    it is stable
    it is insensitive to noisy β and estimate of PR
    the proposed protocol modifications are viable and do not add
    unacceptable overhead cost.

Develop a better method to estimate P(AOS|AO), as the
estimator should take into account the impact of packet loss
due to congestion or noise, i.e., remove or reduce the influence
of these effects on β.
    β may also need smoothing to account for lag in feedback.

Develop modifications to the model that take into account
start-up conditions.
Backup
Utility functions

The utility function for a node transmitting its own packet is:
\[ u_{o,k}(h, W) = \frac{1 + h_k}{E_{o,k}} = \frac{1 + h_k}{T_p \cdot W_{o,k}^{j}} \]

Utility has units of hops per joule. Maximizing utility with
regard to resource usage also maximizes remaining lifetime.
Utility associated with relaying a packet


When node i relays a packet for node j, it should receive a
benefit (utility) that is proportional to the utility accrued to
node j.
Let $h^j$ be the total number of relay nodes required for j’s
packet. Node i’s share of the utility accrued to j is:
\[ \frac{1}{E_{r,k+1}^{l}} \cdot \frac{1 + h^j}{h^j} \]
where $E_{r,k+1}^{l}$ is the relay energy over link l for the (k+1)-th packet.
Cost functions


The cost incurred by node i for either transmitting its own
packet or relaying a packet is the incremental decrease in its
potential future utility.
The incremental cost in lifetime for relaying a packet is:
\[ \Delta t_{life} = \frac{E(k+1, m, t) - E(k, m, t)}{W_{cpu}} = -\frac{E_{r,k+1}^{j}}{W_{cpu}} = -T_p \cdot \alpha_{k+1} \]
where
\[ \alpha_{k+1} = \frac{W_{r,k+1}^{j}}{W_{cpu}} \]
Cost functions

Likewise, the incremental cost in lifetime for transmitting
own packet is:
\[ \Delta t_{life} = -T_p \cdot \gamma_{m+1} \]
where
\[ \gamma_{m+1} = \frac{W_{o,m+1}^{j}}{W_{cpu}} \]

Let $\bar{u}_o^i$ be the average utility received by node i in one
packet time as a result of transmitting one of its own
packets. Then, the incremental utility cost to node i when it
relays a packet is proportional to the incremental cost in
lifetime:
\[ \alpha_{k+1} \cdot \bar{u}_o^i \]
Cost functions

Likewise, the incremental utility cost to node i for transmitting
its own packet is:
\[ \gamma_{m+1} \cdot \bar{u}_o^i \]
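Putting the utility and cost definitions together, the payoffs (utility minus cost) can be computed as in the sketch below. This is one reading of the backup slides, with hypothetical function names and made-up input values: the relay utility is taken as (1/E_r)·(1+h)/h, and the cost as the power ratio W/W_cpu times the node's average own-packet utility.

```python
def own_utility(hops, power, t_p):
    """Own-packet utility (1 + h) / (T_p * W), in hops per joule."""
    return (1 + hops) / (t_p * power)


def relay_utility(hops, power, t_p):
    """Relay node's share: (1 / E_r) * (1 + h) / h, with E_r = T_p * W_r."""
    if hops < 1:
        raise ValueError("a relayed packet needs at least one relay node")
    return (1.0 / (t_p * power)) * (1 + hops) / hops


def utility_cost(power, w_cpu, avg_own_utility):
    """Incremental utility cost: (W / W_cpu) * average own-packet utility."""
    return (power / w_cpu) * avg_own_utility


def own_payoff(hops, power, t_p, w_cpu, avg_own_utility):
    """Payoff for sending an own packet: utility minus cost."""
    return own_utility(hops, power, t_p) - utility_cost(power, w_cpu, avg_own_utility)


def relay_payoff(hops, power, t_p, w_cpu, avg_own_utility):
    """Payoff for relaying a packet: utility share minus cost."""
    return relay_utility(hops, power, t_p) - utility_cost(power, w_cpu, avg_own_utility)


if __name__ == "__main__":
    # Hypothetical values: T_p = 10 ms, transmit power 2 W, CPU power 0.5 W.
    print(own_payoff(hops=3, power=2.0, t_p=0.01, w_cpu=0.5, avg_own_utility=10.0))
    print(relay_payoff(hops=2, power=2.0, t_p=0.01, w_cpu=0.5, avg_own_utility=10.0))
```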