Game Theoretic Approaches for Spectrum Sharing
in Cognitive Radio Networks: A Survey
Manoj C∗ and Amrita Mishra†
Department of Electrical Engineering, Indian Institute of Technology Kanpur
Kanpur, Uttar Pradesh, India.
Email: ∗ [email protected], † [email protected]
Abstract—Cognitive Radio is gaining popularity and acceptance all over the world as an efficient way to utilise the limited
wireless spectrum resources. Among the many design requirements of Cognitive Radio, such as spectrum sensing, spectrum
allocation and power control, spectrum sharing is one of the main
challenges in ensuring peaceful co-existence of the primary and
secondary users. Since multiple users compete to get maximum
spectrum resources for themselves, Game Theory is an efficient
framework to design robust, stable and scalable spectrum sharing
schemes. In this paper, we discuss how the concepts of game
theory can be exploited to design spectrum sharing protocols.
Game theoretic models such as the Cournot model, the Bertrand model
and the auction based approach, which model
the spectrum sharing scenario from different aspects, are also
presented. Further, the paper considers the
scenario in which users competing for spectrum resources may
have no incentive to co-operate and may also exchange false
information about the channel conditions to gain more access to
spectrum. Cheat-proof strategies are thus developed to maintain
the efficiency of spectrum usage. Finally, this paper presents
a comparative study of the various models along with
research challenges and future directions in game theoretic
modelling.
Index Terms—Cognitive Radio, Spectrum Sharing, Game Theory.
I. INTRODUCTION
Cognitive radio is gaining acceptance worldwide as the
next breakthrough technology in wireless services for efficient
utilisation of the available spectrum and thus providing faster
and reliable communication [1], [2]. It stands out from the
usual wireless networks in the sense that the users have to intelligently and dynamically change their operating parameters in
response to changes in their immediate surroundings. This ability of the
radio transceiver enables the frequency spectrum to be shared
among primary (licensed) and secondary (unlicensed) users.
Among the various design requirements of a cognitive radio
setup, dynamic spectrum sharing among the various users
forms an important aspect. It requires that the performance of
the primary user should not degrade due to the opportunistic,
selfish or malicious behaviour of other users. The final objective is
to design a robust spectrum sharing scheme to ensure peaceful
functioning of the primary as well as the secondary users.
Users in a cognitive radio network are intelligent: they
observe and learn in order to enhance their performance. In traditional
spectrum sharing policies we assume that all users cooperate
unconditionally in a static environment. However, in a cognitive
radio scenario, if the users work towards different goals, e.g.,
to compete for an open unlicensed band, fully cooperative
behaviour cannot be taken for granted. Instead, users will only
cooperate with others if cooperation can bring them more
benefit. The need for users to change and adapt is
further reinforced by the continual changes in the radio
environment.
This intelligent behaviour of the users motivates researchers
to use game theory to analyse the cognitive interaction processes [3]. Game theory is a mathematical framework used to analyse the strategic decisions made by multiple
decision makers. Different game models (e.g., non-cooperative/cooperative, static/dynamic, and complete/incomplete
information games) have been deployed to model and study
the user behaviours in different scenarios [4]. The common
aim of these models is to improve network performance metrics
such as throughput, resource consumption and QoS
(Quality of Service), given the self-interest of the participating
users.
There are many advantages of studying cognitive radio
networks in a game theoretic framework. First, by modelling
dynamic spectrum sharing among network users (primary and
secondary users) as games, users' behaviours and actions can
be analyzed in a formalized game structure, and the theoretical
advancements in game theory can be used to derive
performance upper bounds. Second, the optimization of spectrum usage
is generally a multi-objective optimization problem, which
is very difficult to analyze and solve. Game theory provides us with well-defined equilibrium criteria to measure
game optimality under various game settings. Third, non-cooperative game theory, one of the most important branches
of game theory, enables us to derive efficient approaches for
dynamic spectrum sharing using only local information. Such
approaches become highly desirable when centralized control
is not available [5].
The organisation of the paper is as follows. We discuss
the basic concepts of game theory in Section II. We then
present the various game theoretic models for spectrum sharing
along with their simulation results: the Cournot model, the auction based
model, the Bertrand model and the model incorporating
cheat-proof strategies in Sections III, IV, V and VI respectively.
Finally, we conclude in Section VII by putting forth the
research challenges and future directions in game theoretic
modelling along with a comparison between the various models for spectrum sharing.
II. BASICS OF GAME THEORY
Game theory is a bag of analytical tools designed to help
us understand the phenomena that we observe when decision-makers interact [6]. Game theory provides a mathematical
model that describes the interaction between various agents
that try to maximize their payoffs. A game is a description
of strategic interaction that includes the constraints on the
actions that the players can take and the players’ interests,
but does not specify the actions that the players do take. The
three basic components in a game are the set of players, set
of actions available for each player and set of preferences for
each player.
A. Terminologies
The players are the decision makers in the game. In a cognitive
radio scenario, the players are the wireless nodes. The set of
actions is the set of alternatives available to each player.
At any instant, the player must choose an element from
its set of actions. In general, the set of actions can
be different for different players. In cognitive radio, the set of
actions can be the choice of modulation scheme, coding rate,
protocol, flow control parameter, transmit power level, or any
other factor that is under the control of the node [7]. When
each player chooses an action, the resulting “action profile”
determines the outcome of the game. We also assume that
the player, when presented with any pair of actions, knows
which of the pair he/she prefers or knows that he/she regards
both actions as equally desirable [8]. Usually, preferences are
given by defining a utility function. A higher value of the utility
function means that the outcome is more desirable than
an outcome with a lower utility. The utility is also
called the pay-off of the user.
Game theory assumes that the players are rational. This
means that each player tries to maximize his/her own pay-off.
B. Strategic form game and Nash Equilibrium
A strategic game is a model of interactive decision-making
in which each decision-maker chooses his plan of action once
and for all. Also, these choices are made simultaneously.
The Nash equilibrium is a joint strategy from which no player
can increase his/her utility by unilaterally deviating. At a Nash
equilibrium, each player's strategy is a best response to the
best responses of all the other
players. The best response of a player is the action that gives
him/her the maximum pay-off, given that all other players' strategies
remain unchanged. The mathematical model of the Nash equilibrium
is given below.
A game consists of a finite set of players $\mathcal{N} = \{1, 2, \ldots, N\}$.
Each of the players $i \in \mathcal{N}$ selects a strategy $s_i \in S_i$
with the objective of maximizing the utility $u_i$. The strategy
profile $s$ is the vector containing the strategies of all players,
$s = (s_i)_{i \in \mathcal{N}} = (s_1, s_2, \ldots, s_N)$. The collective strategies of all
players except player $i$ are denoted by $s_{-i}$. A strategy profile $s \in S$ is a
Nash equilibrium if $u_i(s) \geq u_i(s'_i, s_{-i})$ $\forall s'_i \in S_i$, $\forall i \in \mathcal{N}$.
Nash equilibrium corresponds to a steady state. If, whenever
the game is played, the action profile is the same Nash
equilibrium, then no player has a reason to choose any action
different from the current action; there is no incentive for the
player to change.
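The definition above can be checked mechanically for small games. The following is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-player game given as payoff tables. The function name `pure_nash_equilibria`, the payoff numbers and the low/high power labels are hypothetical and serve only as an illustration; they are not taken from the surveyed papers.

```python
from itertools import product

def pure_nash_equilibria(u1, u2):
    """Return all pure profiles (a1, a2) from which no player gains by deviating alone.

    u1[a1][a2] and u2[a1][a2] are the utilities of players 1 and 2.
    """
    n1, n2 = len(u1), len(u1[0])
    equilibria = []
    for a1, a2 in product(range(n1), range(n2)):
        # Player 1 cannot improve by changing a1 while a2 stays fixed.
        best_1 = all(u1[a1][a2] >= u1[d][a2] for d in range(n1))
        # Player 2 cannot improve by changing a2 while a1 stays fixed.
        best_2 = all(u2[a1][a2] >= u2[a1][d] for d in range(n2))
        if best_1 and best_2:
            equilibria.append((a1, a2))
    return equilibria

# Hypothetical two-user power game: action 0 = low power, action 1 = high power.
u1 = [[3, 0],
      [4, 1]]
u2 = [[3, 4],
      [0, 1]]
equilibria = pure_nash_equilibria(u1, u2)
print(equilibria)
```

In this toy example the only equilibrium is the profile in which both users transmit at high power, even though both would be better off at low power.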
C. Cooperative and Non-Cooperative Games
A cooperative game in game theory is one where players
form groups or coalitions and these coalitions enforce cooperative behaviour. Here, the game is a competition between
coalitions of players, rather than between individual players. A
non-cooperative game is one in which players make decisions
independently, without coordination with other players, and
each player has its own objectives towards which it
moves. In this case, the players see only their own payoff and
they do not consider the payoff of others or of the whole system.
Thus, while they may be able to cooperate, any cooperation
must be self-enforcing.
In a cognitive radio framework, each user usually makes
its own decisions (possibly relying also on the information
collected from other users). These decisions may be dominated
by the rules of the operating protocol, but ultimately each user
has some freedom in setting parameters or changing its own
mode of operation. These users are autonomous agents, taking
their own decisions about transmit power, packet forwarding
etc. The users can exhibit three kinds of behaviour:
1) Users may work towards the overall good of the entire
network community as a whole.
2) In some cases, the same users may behave selfishly,
looking out for only their own interests.
3) Finally users may behave maliciously, seeking to ruin
network performance of other users.
Game theory can be applied in all the three cases.
III. COURNOT MODEL OF SPECTRUM SHARING
A. What is a Cournot game
The spectrum sharing problem is formulated as an oligopoly
market [9]. An oligopoly market is one in which a few firms
compete with each other in terms of the amount of commodity
supplied to the market in order to maximise their profit. In the
spectrum sharing context, the secondary users (SUs) are analogous to the firms,
and they compete for the spectrum offered by the primary user (PU). The cost of
the spectrum is determined by a pricing function set by the
PU. A Cournot game is used to analyze this situation and the
Nash equilibrium (NE) is considered as the solution of this
game. The main objective of this Cournot game formulation
is to maximize the profit of all SUs based on the equilibrium
adopted by all SUs.
B. System Model
We consider a wireless system with a PU and multiple
SUs (i.e., total number of SUs is denoted by N ) who want
to share the spectrum allocated to the PU. The PU shares
some portion of the spectrum bi with secondary user i. The
primary user charges the secondary user for the spectrum at
a rate of c(b) per unit bandwidth, where b is the amount of
available bandwidth that can be shared. The SUs transmit in
the allocated spectrum using adaptive modulation to enhance
their transmission performance. The revenue of secondary user
i is denoted by ri per unit of achievable transmission rate. The
spectral efficiency of the transmission for secondary user i can
be obtained from
$$k_i = \log_2(1 + K\gamma_i) \tag{1}$$
where
$$K = \frac{1.5}{\ln(0.2/BER_i^{tar})} \tag{2}$$
We assume that the received SNR information is available
at the transmitting end through channel estimation.
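As a small numerical illustration of (1) and (2), the sketch below evaluates the spectral efficiency for a given SNR and target BER. The function name `spectral_efficiency` is hypothetical, and expressing the SNR in dB at the interface is an assumption made here for convenience; the formula itself uses the linear SNR.

```python
import math

def spectral_efficiency(snr_db, ber_target):
    """Spectral efficiency k = log2(1 + K * gamma), with K = 1.5 / ln(0.2 / BER_tar)."""
    gamma = 10.0 ** (snr_db / 10.0)        # assumed input in dB, converted to linear scale
    K = 1.5 / math.log(0.2 / ber_target)   # constant from (2)
    return math.log2(1.0 + K * gamma)

k_low = spectral_efficiency(10.0, 1e-4)
k_high = spectral_efficiency(15.0, 1e-4)
print(round(k_low, 3), round(k_high, 3))
```

A better channel (higher SNR) gives a larger spectral efficiency for the same target BER, which is what drives the different equilibria discussed later.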
C. Spectrum Sharing Scheme
We discuss the static and the dynamic Cournot game. A
static game model is presented in an ideal case in which all
the SUs can observe the strategies and payoffs of all other SUs.
The dynamic case is, however, a practical version in which the
strategies and payoffs of the other SUs are not known to a particular SU. The SU
observes the change in its payoff due to the different prices charged by
the PU and adapts its strategy accordingly.
1) Static Cournot Game: The players (i.e., firms in the
oligopoly market) in this game are the SUs. The strategy of
each of the players corresponds to the allocated spectrum size
(denoted by bi for SU i) which is non-negative. The payoff
for each player is the profit (i.e., revenue minus cost) of secondary
user i (denoted by pi ). The pricing function used by the PU
for charging is given by
$$c(B) = x + y \Big( \sum_{j} b_j \Big)^{\tau} \tag{3}$$
where $x$ and $y$ are non-negative constants, $\tau \geq 1$ (so
that this pricing function is convex), and $B$ denotes the set of
strategies of all secondary users (i.e., $B = \{b_1, \ldots, b_N\}$). Let
$w$ denote the worth of the spectrum for the PU. The condition
$c(B) > w \times \sum_j b_j$ is necessary to ensure that the PU is willing
to share spectrum of size $b$ with the SUs. The PU charges all
of the SUs the same price. The revenue of SU $i$ can be
obtained from $r_i \times k_i \times b_i$, while the cost of spectrum allocation
is $b_i c(B)$. The profit of user $i$ can be obtained as
$$p_i(B) = r_i k_i b_i - b_i c(B) \tag{4}$$
We assume that the guard band used to separate the spectrum allocated to different users is fixed and small. Then, the
profit can be rewritten as follows
$$p_i(B) = r_i k_i b_i - b_i \Big( x + y \Big( \sum_{j} b_j \Big)^{\tau} \Big) \tag{5}$$
Let $B_{-i} = \{b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_N\}$ denote the set of
strategies adopted by all except secondary user $i$, such that
$B = B_{-i} \cup \{b_i\}$. As the optimal allocated spectrum of one user
depends on the strategies of all other users, the NE is considered
to be the solution of the game to ensure that all the secondary
users are satisfied with the solution. The NE is obtained by
using the best response function, which is the best strategy
of one player given the others' strategies [8]. The best response
function of secondary user $i$ given the allocated spectrum sizes
of the other secondary users $b_j$, where $j \neq i$, is defined as follows
$$BR_i(B_{-i}) = \arg\max_{b_i} \; p_i(B_{-i} \cup \{b_i\}) \tag{6}$$
The set $B^* = \{b_1^*, \ldots, b_N^*\}$ denotes the Nash equilibrium of
this game if and only if
$$b_i^* = BR_i(B_{-i}^*), \quad \forall i \tag{7}$$
where $B_{-i}^*$ denotes the set of best responses of secondary users
$j$ for $j \neq i$. We formulate an optimization problem with the
objective defined as follows:
$$\text{Minimize:} \quad \sum_{i=1}^{N} \left| b_i - BR_i(B_{-i}) \right| \tag{8}$$
i.e., we want to minimize the difference between the decision
variables $b_i$ and the corresponding best response functions.
The minimum value of the objective function is zero if the
algorithm reaches the NE.
2) Dynamic Cournot Game: Here the NE of every SU
is obtained by interaction with the PU only. Thus, each
SU communicates with the PU to obtain the differentiated
pricing function for different strategies. The adjustment of the
allocated spectrum size can be modelled as a repeated Cournot
game as:
$$b_i(t+1) = b_i(t) + \alpha_i b_i(t) \frac{\partial p_i(B)}{\partial b_i(t)} \tag{9}$$
where $b_i(t+1)$ is the allocated spectrum size at time $t+1$ and $\alpha_i$ is
the speed adjustment parameter (i.e., learning rate) of SU $i$.
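The static and dynamic Cournot games above can be sketched numerically for the special case $\tau = 1$, where setting $\partial p_i / \partial b_i = 0$ in (5) gives the closed-form best response $b_i = (r_i k_i - x - y \sum_{j \neq i} b_j)/(2y)$. The sketch below is an illustration under stated assumptions, not the implementation used in [9]: the SNR values (11 dB and 12 dB), the learning rate 0.02, the starting point and all function names are hypothetical choices, and the stability of the dynamic update depends on the learning rate.

```python
import math

def spectral_efficiency(snr_db, ber_target=1e-4):
    """k = log2(1 + K * gamma) with K = 1.5 / ln(0.2 / BER_tar), as in (1)-(2)."""
    gamma = 10.0 ** (snr_db / 10.0)
    return math.log2(1.0 + 1.5 / math.log(0.2 / ber_target) * gamma)

def marginal_profit(i, b, r, k, x, y):
    """dp_i/db_i for tau = 1, where p_i = r_i k_i b_i - b_i (x + y * sum(b))."""
    return r[i] * k[i] - x - y * sum(b) - y * b[i]

def best_response(i, b, r, k, x, y):
    """Closed-form root of dp_i/db_i = 0 for tau = 1, kept non-negative."""
    others = sum(b) - b[i]
    return max(0.0, (r[i] * k[i] - x - y * others) / (2.0 * y))

def static_ne(r, k, x, y, iters=200):
    """Iterate simultaneous best responses until they stop changing."""
    b = [1.0] * len(r)
    for _ in range(iters):
        b = [best_response(i, b, r, k, x, y) for i in range(len(b))]
    return b

def dynamic_game(r, k, x, y, alpha, b0, steps=2000):
    """Repeated update b_i(t+1) = b_i(t) + alpha_i b_i(t) dp_i/db_i, following (9)."""
    b = list(b0)
    for _ in range(steps):
        b = [b[i] + alpha[i] * b[i] * marginal_profit(i, b, r, k, x, y)
             for i in range(len(b))]
    return b

r = [10.0, 10.0]                                            # revenue per unit rate
k = [spectral_efficiency(11.0), spectral_efficiency(12.0)]  # assumed SNRs in dB
b_static = static_ne(r, k, x=0.0, y=1.0)
b_dynamic = dynamic_game(r, k, 0.0, 1.0, alpha=[0.02, 0.02], b0=[1.0, 1.0])
print(b_static, b_dynamic)
```

Under these assumptions both procedures settle at the same point, and the user with the better channel obtains the larger share of spectrum.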
D. Simulation Results
We consider a cognitive radio environment with one PU and
two SUs sharing a spectrum of 15 MHz. The target BER for
both SUs is $BER_i^{tar} = 10^{-4}$. For the pricing function
of the PU, we use $x = 0$ and $y = 1$, while $\tau$ is adjusted based
on the evaluation scenario (e.g., $\tau = 1.0$), and the worth of
spectrum for the PU is $w = 1$. The revenue of an SU per unit
transmission rate is $r_i = 10$ $\forall i$. We also assume that the
SNR information $\gamma_i$ is available to all SUs through channel
estimation [10].
Fig. 2 shows the best responses of both SUs in the static
Cournot game. For $\tau = 1$, the best response of each SU is a linear
function of the other user's strategy. The Nash equilibrium
is located at the point at which the best responses intersect.
We also observe that under different channel qualities, the
Nash equilibrium is located at different points. Also, the
trajectory of spectrum sharing in the dynamic Cournot game
is shown for the case of $\alpha_1 = \alpha_2 = 0.14$. We again observe
that, with the same speed adjustment parameter, better channel
quality results in more fluctuations in the trajectory to the NE.
Fig. 2. Best responses and trajectories of both SUs to NE.
IV. SPECTRUM SHARING USING AUCTION BASED APPROACH
A. System Model
Let us consider a system where there is only one primary
user (PU) and a group $\mathcal{I} = \{1, \ldots, I\}$ of secondary users
(SUs) who want to share the spectrum $B_{tot}$ allocated to the primary
user (as shown in Fig. 1). The primary user retains a
given amount of bandwidth $B_{rem} > B_{req}$, where $B_{req}$ is the
bandwidth required to meet its quality of service requirement.
The primary user charges secondary users at a price of $p$ per
unit bandwidth. After the allocation, the secondary users may
transmit in the allocated spectrum using adaptive modulation
to enhance the transmission performance. The revenue of
secondary user $i$ is denoted by $r_i$ per unit of achievable
transmission rate. The spectral efficiency of transmission for
user $i$ is denoted by $k_i$. We assume that through channel
estimation, the secondary users can obtain the received SNR
of the channel. For secondary user $i$, given the received
SNR, the target BER $BER_i^{tar}$ and the assigned spectrum
$B_i$, the transmission rate (in bits per second) can be obtained.
Fig. 1. System Model for Spectrum Sharing.
B. Bandwidth Auction
The problem of spectrum sharing is formulated as an
auction in which the secondary users (SUs) make bids for
the bandwidth allocated to the primary user (PU). An auction
with relatively simple rules is proposed below to characterize
the interaction between the primary user and multiple
secondary users.
1) Information: Each SU $i$ knows its revenue $r_i$ per unit
of achievable transmission rate, and it also knows its spectral
efficiency $k_i$. In a real network, $r_i$ relates to the QoS. The PU
announces a positive reserve bid $\beta > 0$ and the price $p > 0$
to all SUs before the auction starts.
2) Bids: SU $i$ submits a bid $b_i$ ($0 \leq b_i \leq B_{tot}$), which
generally represents the maximum bandwidth that the SU desires
for data transmission.
3) Allocation: The PU allocates bandwidth according to (10)
(here we only consider the FDM scheme). Once the bandwidth is
allocated by the PU, there is no contention among the SUs.
$$B_i = \frac{b_i}{\sum_{j \in \mathcal{I}} b_j + \beta} B_{tot} \tag{10}$$
4) Payments: SU $i$ pays the PU
$$C_i = p \theta_i b_i \tag{11}$$
where $\theta_i$ is a user dependent priority parameter. We adopt a
‘prepay’ mechanism in which the SU pays for the bandwidth
it bids instead of that which is assigned by the PU. The prepay
mechanism is a crucial part of the auction rules as it prevents
the SUs from over-bidding, since they pay for
their own bids.
A bidding profile is defined as the vector containing the SUs'
bids, $b = (b_1, \ldots, b_I)$. The bidding profile of SU $i$'s opponents
is defined as $b_{-i} = (b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_I)$, such that $b =
(b_i; b_{-i})$. Under the rules of this auction, we notice that $b_i \in
b_r \triangleq [0, B_{tot}]$ and the bidding profile $b$ is constrained by
$$b \in b_R \triangleq \{ b \mid 0 \leq b_i \leq B_{tot} \ \forall i \in \mathcal{I} \} \tag{12}$$
In this auction, a positive reserve bid β is used by the PU
to control the remaining portion of the spectrum for its own
usage. The PU sets β such that β ≥ Breq is satisfied.
Given the allocated bandwidth, SU $i$'s revenue is given
by
$$R_i = r_i k_i B_i \tag{13}$$
SU $i$ chooses the bid $b_i$ which maximises its payoff (revenue minus payment)
$$U_i(b_i; b_{-i}; p) = R_i[B_i(b_i; b_{-i})] - C_i(b_i, p) \tag{14}$$
The desirable outcome of an auction is a Nash Equilibrium
(NE), which is a bidding profile $b^*$ such that no user wants to
deviate from it, i.e.,
$$U_i(b_i^*; b_{-i}^*; p) \geq U_i(b_i; b_{-i}^*; p), \quad \forall i \in \mathcal{I}, \ \forall b_i \in b_r \tag{15}$$
We define SU $i$'s best response as
$$\mathcal{B}(b_{-i}; p) = \Big\{ b_i \ \Big| \ b_i = \arg\max_{b_i \in b_r} U_i(b_i; b_{-i}; p) \Big\} \tag{16}$$
This in general could be a set. A NE is also a fixed point
solution of all the best responses of the SUs. We state certain
properties of NE along with a dynamic updating algorithm to
reach the NE in a distributed fashion.
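Theorem 1: There are two extreme prices $\underline{p}_i$ and $\bar{p}_i$ defined
as
$$\underline{p}_i = \frac{r_i k_i B_{tot} \{(I-1) B_{tot} + \beta\}}{\theta_i (I B_{tot} + \beta)^2} \tag{17}$$
$$\bar{p}_i = \frac{r_i k_i B_{tot}}{\theta_i \beta} \tag{18}$$
If $p < \underline{p}_i$, all the SUs would bid for the maximum bandwidth
allocated to the PU (i.e., $b_i = B_{tot}$ $\forall i \in \mathcal{I}$); if $p > \bar{p}_i$, no SU
would be willing to use any of the spectrum offered by the
PU (i.e., $b_i = 0$ $\forall i \in \mathcal{I}$).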
Theorem 2: There is a unique NE for the bids of the SUs. In
addition, if $p \in (\underline{p}_i, \bar{p}_i)$, SU $i$'s unique best response function
is given as follows:
$$\mathcal{B}(b_{-i}, p) = \left[ \sqrt{ \frac{ r_i k_i B_{tot} \big( \sum_{j \neq i} b_j + \beta \big) }{ p \theta_i } } - \Big( \sum_{j \neq i} b_j + \beta \Big) \right]_{0}^{B_{tot}} \tag{19}$$
where $[x]_a^b$ is defined as
$$[x]_a^b = \max \{ \min \{x, b\}, a \} \tag{20}$$
Fig. 3. Region of values for stable Nash Equilibrium.
Fig. 4. Nash Equilibrium of bid under different channel qualities.
Theorem 3: If the unique NE is interior (Interior NE implies
that none of the participating users selects a strategy on the
boundary of his strategy space), then the bandwidth allocation
is fair.
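The price bounds of Theorem 1 and the best response of Theorem 2 can be evaluated numerically. The sketch below uses $B_{tot} = 10$, $p = 10$, $r_i = 10$ and $\beta = 0.2$ from the simulation settings of this section; the spectral efficiencies `k`, the priority parameters `theta`, the starting bids and all function names are hypothetical values chosen only for illustration. Repeated best responses are used here as one simple way to locate the fixed point; this is an assumption of the sketch, not a method stated in [11].

```python
import math

def price_bounds(r_i, k_i, theta_i, b_tot, beta, n_users):
    """Lower and upper prices of Theorem 1, following (17) and (18)."""
    low = (r_i * k_i * b_tot * ((n_users - 1) * b_tot + beta)
           / (theta_i * (n_users * b_tot + beta) ** 2))
    high = r_i * k_i * b_tot / (theta_i * beta)
    return low, high

def best_response(i, bids, r, k, theta, p, b_tot, beta):
    """Best response (19): interior stationary point clipped to [0, b_tot]."""
    s = sum(bids) - bids[i] + beta  # opponents' bids plus the reserve bid
    interior = math.sqrt(r[i] * k[i] * b_tot * s / (p * theta[i])) - s
    return max(min(interior, b_tot), 0.0)

def marginal_payoff(i, bids, r, k, theta, p, b_tot, beta):
    """dU_i/db_i for U_i = r_i k_i B_i - p theta_i b_i, with B_i from (10)."""
    total = sum(bids) + beta
    return r[i] * k[i] * b_tot * (total - bids[i]) / total ** 2 - p * theta[i]

r, k, theta = [10.0, 10.0], [1.8, 2.05], [1.0, 1.0]  # k and theta are assumed
p, b_tot, beta = 10.0, 10.0, 0.2
bounds = [price_bounds(r[i], k[i], theta[i], b_tot, beta, 2) for i in range(2)]

bids = [1.0, 1.0]
for _ in range(100):  # simultaneous best responses until a fixed point is reached
    bids = [best_response(i, bids, r, k, theta, p, b_tot, beta) for i in range(2)]
print(bounds, bids)
```

With these values the announced price lies strictly between the two bounds for both users, so the equilibrium bids are interior and the first-order condition holds at the fixed point.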
C. Dynamic Updating Algorithm
In a practical cognitive radio scenario, the SUs may only be
able to observe the pricing and assignment information from
the primary user (PU), but not the strategies and payoffs of
other secondary users. Hence, we also investigate a distributed
algorithm for each SU to achieve Nash equilibrium based on
its interaction with the PU only. Here, each SU communicates
with the PU to obtain the price and different assignment
functions for different bids and updates its bid as follows:
$$b_i(t+1) = b_i(t) + \alpha_i b_i(t) \frac{\partial U_i(b)}{\partial b_i(t)} \tag{21}$$
where $b_i(t+1)$ is the bid in terms of bandwidth at time $t+1$
and $\alpha_i$ is the speed adjustment parameter of SU $i$.
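A minimal sketch of the update (21) is given below. It uses the analytical derivative of the payoff (14) with the allocation (10) and payment (11); in practice an SU would estimate this derivative from its interaction with the PU. The spectral efficiencies, priority parameters, learning rate 0.05, starting bids and function names are hypothetical, and the clipping of bids to $[0, B_{tot}]$ is an added safeguard rather than part of (21).

```python
def marginal_payoff(i, bids, r, k, theta, p, b_tot, beta):
    """dU_i/db_i for U_i = r_i k_i B_i - p theta_i b_i, with B_i from (10)."""
    total = sum(bids) + beta
    return r[i] * k[i] * b_tot * (total - bids[i]) / total ** 2 - p * theta[i]

def run_dynamic_bidding(bids, alpha, r, k, theta, p, b_tot, beta, steps=500):
    """Apply b_i(t+1) = b_i(t) + alpha_i b_i(t) dU_i/db_i and record every step."""
    history = [list(bids)]
    for _ in range(steps):
        grads = [marginal_payoff(i, bids, r, k, theta, p, b_tot, beta)
                 for i in range(len(bids))]
        # Clip to the feasible bid range [0, b_tot] (added safeguard).
        bids = [min(max(b + alpha[i] * b * grads[i], 0.0), b_tot)
                for i, b in enumerate(bids)]
        history.append(list(bids))
    return history

r, k, theta = [10.0, 10.0], [1.8, 2.05], [1.0, 1.0]  # k and theta are assumed
p, b_tot, beta = 10.0, 10.0, 0.2
history = run_dynamic_bidding([1.0, 1.0], [0.05, 0.05], r, k, theta, p, b_tot, beta)
final_bids = history[-1]
print(final_bids)
```

For this choice of learning rate the bids settle where both marginal payoffs vanish; larger learning rates may cause the fluctuations mentioned in the simulation results.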
D. Simulation Results
A cognitive radio environment with one PU and two SUs
sharing a spectrum of $B_{tot} = 10$ MHz is considered. The
target BER for both SUs is $BER_i^{tar} = 10^{-4}$. The revenue
of an SU per unit transmission rate is $r_i = 10$ $\forall i \in \mathcal{I}$. We also
assume that the SNR information $\gamma_i$ is available to all SUs
through channel estimation. The PU sets the price $p = 10$ per
unit bandwidth and the reserve bid $\beta = 0.2$ [11].
In Fig. 3, the regions indicated by arrows are the regions
for which the spectrum sharing is stable and the NE would be
reached; elsewhere the sharing would be unstable and fluctuations
would occur.
In Fig. 4, we observe the adaptation of the SUs' bids under
different channel qualities. As expected, SU 2 bids for more
bandwidth and achieves higher revenue when its channel
quality becomes better. We also observe that the bid of one user
depends on the channel quality and bid of the other user.
V. BERTRAND GAME MODEL
An oligopoly market can also be modeled as a Bertrand
Game where the firms fix their prices game theoretically. Here
at least two sellers producing homogeneous products compete
by setting prices simultaneously; buyers buy everything from
the firm with lower price. The solution of this Bertrand game
is the Nash Equilibrium. Here we apply Bertrand game model
to the problem of competitive spectrum pricing for dynamic
spectrum access [12]. In this model, a few primary services
compete to offer spectrum to a secondary service.
A. System Model
Consider a wireless environment with N primary services
operating on frequency bands Fi and a secondary service with
a group of secondary users. The primary service i serving Mi
local connections wants to sell part of the spectrum Fi at price
pi per unit bandwidth to the secondary user. The spectrum
demand depends on the data rate in the spectrum and the price
charged. The spectral efficiency $k$ of transmission by a secondary
user is given by
$$k = \log_2(1 + K\gamma), \quad \text{where} \quad K = \frac{1.5}{\ln(0.2/BER^{tar})} \tag{22}$$
where $\gamma$ is the SNR at the receiver and $BER^{tar}$ is the target
bit-error-rate.
B. Spectrum Pricing Competition
To quantify the spectrum demand, we consider the quadratic
utility function given by
$$U(b) = \sum_{i=1}^{N} b_i k_i^{(s)} - \frac{1}{2} \Big( \sum_{i=1}^{N} b_i^2 + 2\nu \sum_{i \neq j} b_i b_j \Big) - \sum_{i=1}^{N} p_i b_i \tag{23}$$
where $b$ is the set of sizes of spectrum shared by all primary
services, i.e., $b = \{b_1, \ldots, b_i, \ldots, b_N\}$, $p_i$ is the price offered
by primary service $i$, and $k_i^{(s)}$ denotes the spectral efficiency
of transmission by a secondary user using the spectrum $F_i$
owned by primary service $i$. The spectrum substitutability
parameter $\nu$ represents the ability of the secondary user to
switch among the frequencies offered by the primary services.
$\nu = 0$ means that the secondary user cannot switch among the
frequency spectra, while $\nu = 1$ implies that the secondary
user can switch among the spectra freely.
The spectrum demand function $D_i(p)$ of spectrum $F_i$ at
the secondary service is obtained by differentiating $U(b)$ with
respect to $b_i$ and equating the result to zero. It is given by
$$D_i(p) = \frac{ (k_i^{(s)} - p_i)(\nu(N-2) + 1) - \nu \sum_{j \neq i} (k_j^{(s)} - p_j) }{ (1 - \nu)(\nu(N-1) + 1) } \tag{24}$$
The cost function of the primary user is developed by
considering the degradation of QoS of the primary user. The
revenue function $R_i$ and the cost function $C_i$ are defined as
$$R_i = c_1 M_i, \qquad C_i(b_i) = c_2 M_i \Big( B_i^{req} - k_i^{(p)} \frac{W_i - b_i}{M_i} \Big)^2 \tag{25}$$
where $c_1$ and $c_2$ denote the constant weights for the revenue
and cost functions respectively, $B_i^{req}$ is the bandwidth requirement of the primary connection, $W_i$ is the size of spectrum, $M_i$
is the number of primary connections and $k_i^{(p)}$ is the spectral
efficiency of wireless transmission for primary service $i$. Based
on this model, a Bertrand game is formulated as
Players: Primary services
Strategy: Price per unit spectrum $p_i$ (non-negative)
Payoff: Profit $P_i$ (revenue minus cost) realized by selling
the spectrum to the secondary user
Based on the spectrum demand, revenue and cost functions,
the profit of each primary firm is given by $P_i(p) = b_i p_i + R_i -
C_i(b_i)$, where $p = \{p_1, \ldots, p_i, \ldots, p_N\}$ is the set of prices
offered by all players in the game.
The NE is obtained by using the fact that it is the best strategy
of each player, given the others' strategies. The best response of
primary service $i$ given the prices offered by the other primary
services $p_{-i}$ (with $p = p_{-i} \cup \{p_i\}$) is defined as
$$\mathcal{B}_i(p_{-i}) = \arg\max_{p_i} \; P_i(p_{-i} \cup \{p_i\}) \tag{26}$$
The set $p^* = \{p_1^*, \ldots, p_N^*\}$ denotes the NE of this game if and only if
$$p_i^* = \mathcal{B}_i(p_{-i}^*), \quad \forall i \tag{27}$$
where $p_{-i}^*$ denotes the best responses of all players except
player $i$. We can obtain the NE by solving the equations
$\partial P_i(p) / \partial p_i = 0$ for all $i$.
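The demand function (24) and the profit built from (25) can be sketched as follows. The sketch assumes that the cross term in (23) counts each pair of services once, so that the first-order condition reads $b_i + \nu \sum_{j \neq i} b_j = k_i^{(s)} - p_i$, which is the condition that (24) solves. All numerical values, including the primary spectral efficiency of 1.0, and the function names are hypothetical.

```python
def demand(i, prices, k_s, nu):
    """Spectrum demand D_i(p) from (24) for N primary services."""
    n = len(prices)
    own = (k_s[i] - prices[i]) * (nu * (n - 2) + 1)
    others = nu * sum(k_s[j] - prices[j] for j in range(n) if j != i)
    return (own - others) / ((1 - nu) * (nu * (n - 1) + 1))

def profit(i, prices, k_s, nu, c1, c2, m, b_req, k_p, w):
    """P_i(p) = b_i p_i + R_i - C_i(b_i), with R_i and C_i from (25)."""
    b_i = demand(i, prices, k_s, nu)
    revenue = c1 * m[i]
    cost = c2 * m[i] * (b_req[i] - k_p[i] * (w[i] - b_i) / m[i]) ** 2
    return b_i * prices[i] + revenue - cost

# Hypothetical three-service example.
k_s = [2.0, 2.5, 3.0]       # assumed spectral efficiencies seen by the secondary user
prices = [1.0, 1.0, 1.0]
nu = 0.4
shares = [demand(i, prices, k_s, nu) for i in range(3)]
profits = [profit(i, prices, k_s, nu, 2.0, 2.0, [10] * 3, [2.0] * 3, [1.0] * 3, [20.0] * 3)
           for i in range(3)]
print(shares, profits)
```

At equal prices the service with the best channel attracts the largest demand, while the service with the weakest channel attracts none in this example.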
In a cognitive radio scenario, a primary service will not be
able to observe the profit gained and the strategy adopted by the other
primary services. It therefore has to decide its strategy from the
observed history. We thus use a distributed price adjustment
algorithm which progressively reaches the NE.
Let $p_i[t]$ denote the price offered by primary service $i$ at
time $t$; $p[t]$ and $p_{-i}[t]$ are defined similarly. We consider two
cases: the first, in which the strategies of the other primary services
in the previous iteration are known to all, and the second, in which
they are not observable. In the first case, the price offered by the
primary service can be obtained from
$$p_i[t+1] = \mathcal{B}_i(p_{-i}[t]) \quad \forall i \tag{28}$$
In the second case, the primary service has only local information and the spectrum demand. Using this, it adjusts its
price in the direction that maximizes its profit, as given by the
equation
$$p_i[t+1] = p_i[t] + \alpha_i \frac{\partial P_i(p)}{\partial p_i} \tag{29}$$
where $\alpha_i$ is the learning rate.
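The price adjustment (29) can be sketched for two primary services as follows. The derivative is written analytically from $P_i(p) = b_i p_i + R_i - C_i(b_i)$ with $b_i = D_i(p)$; a primary service with only local information would instead estimate it from the observed demand. The spectral efficiencies, $\nu = 0.4$, the learning rate 0.1 and the function names are hypothetical; the remaining values follow the performance evaluation settings of this section.

```python
def demand(i, prices, k_s, nu):
    """Spectrum demand D_i(p) from (24)."""
    n = len(prices)
    own = (k_s[i] - prices[i]) * (nu * (n - 2) + 1)
    others = nu * sum(k_s[j] - prices[j] for j in range(n) if j != i)
    return (own - others) / ((1 - nu) * (nu * (n - 1) + 1))

def marginal_profit(i, prices, k_s, nu, c2, m, b_req, k_p, w):
    """dP_i/dp_i = D_i + (p_i - dC_i/db_i) * dD_i/dp_i."""
    n = len(prices)
    slope = -(nu * (n - 2) + 1) / ((1 - nu) * (nu * (n - 1) + 1))  # dD_i/dp_i
    b_i = demand(i, prices, k_s, nu)
    dcost_db = 2 * c2 * k_p[i] * (b_req[i] - k_p[i] * (w[i] - b_i) / m[i])
    return b_i + (prices[i] - dcost_db) * slope

def adjust_prices(prices, alpha, steps, *args):
    """Iterate p_i[t+1] = p_i[t] + alpha_i dP_i/dp_i for all services, as in (29)."""
    for _ in range(steps):
        grads = [marginal_profit(i, prices, *args) for i in range(len(prices))]
        prices = [prices[i] + alpha[i] * grads[i] for i in range(len(prices))]
    return prices

k_s = [2.8, 2.0]   # assumed: service one offers the better channel
nu, c2 = 0.4, 2.0
m, b_req, k_p, w = [10, 10], [2.0, 2.0], [1.0, 1.0], [20.0, 20.0]  # k_p is assumed
args = (k_s, nu, c2, m, b_req, k_p, w)
final_prices = adjust_prices([1.0, 1.0], [0.1, 0.1], 300, *args)
print(final_prices)
```

With this learning rate the prices converge to the point where both marginal profits vanish, and the service with the better channel ends up charging the higher price.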
The first case has no control parameters and it is proved to
be stable [12] by considering the eigenvalues of the Jacobian
matrix. In the second case, the algorithm can be either stable
or unstable depending on the learning rate αi , number of local
connections Mi and spectrum substitutability factor ν.
C. Inefficiency of Nash Equilibrium
The total profit for all primary services is given by
$\sum_{j=1}^{N} P_j(p)$. The optimal price for all primary services can
be obtained from
$$\frac{\partial \sum_{j=1}^{N} P_j(p)}{\partial p_i} = 0 \tag{30}$$
The optimal values of $p_i$ obtained from this equation differ from those at the NE. So, primary services may cooperate
to achieve higher profit. In a repeated game, the game is
played multiple times and the users can observe the outcome
of previous games, so they can learn to cooperate. Since this
optimal price is not the NE, some primary services may deviate
unilaterally to increase their own profit. So, the optimal
pricing is not a stable equilibrium. As the optimal pricing is
desirable, it can be sustained by using a punishment mechanism
that punishes any user that deviates from the optimal price.
When a user deviates from the optimal pricing, the punishment
action is triggered and all the users switch to the NE state, from
which no player will deviate. We consider a trigger strategy
in which any primary service maintains the collusion as long
as the other services agree to do so. But, if a primary service
deviates, a punishment action is triggered.
A primary service usually gives a smaller weight to the
profit in future stages than to the profit in the current stage.
If the current profit is $P_i$, the same profit in the next stage is
worth $\delta_i P_i$, where $\delta_i$ is the weight. Let $P_i^o$, $P_i^n$ and $P_i^d$ denote
the profits of primary service $i$ when following the optimal price,
when following the price at the NE, and when deviating, respectively.
The collusion will be maintained if the long-term profit from
adopting collusion is higher than that obtained by deviation.
Mathematically [12],
$$\frac{1}{1 - \delta_i} P_i^o \geq P_i^d + \frac{\delta_i}{1 - \delta_i} P_i^n \tag{31}$$
A lower bound on $\delta_i$ can be obtained from this as
$$\delta_i \geq \frac{P_i^d - P_i^o}{P_i^d - P_i^n} \tag{32}$$
Collusion will be maintained only when $\delta_i$ satisfies (32).
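The bound (32) follows from multiplying (31) by $1 - \delta_i$ and rearranging, using $P_i^d > P_i^n$. A short sketch with hypothetical profit values (the numbers and function names below are illustrative assumptions only) checks both inequalities against each other:

```python
def critical_discount(p_opt, p_dev, p_ne):
    """Smallest weight delta_i satisfying (32), assuming p_dev > p_opt > p_ne."""
    return (p_dev - p_opt) / (p_dev - p_ne)

def collusion_sustained(delta, p_opt, p_dev, p_ne):
    """Check inequality (31) for a given weight 0 <= delta < 1."""
    return p_opt / (1 - delta) >= p_dev + delta / (1 - delta) * p_ne

# Hypothetical stage profits: collusive, deviating and NE profit.
p_opt, p_dev, p_ne = 10.0, 12.0, 8.0
threshold = critical_discount(p_opt, p_dev, p_ne)
print(threshold,
      collusion_sustained(0.6, p_opt, p_dev, p_ne),
      collusion_sustained(0.4, p_opt, p_dev, p_ne))
```

A service that values future profit sufficiently (weight above the threshold) keeps the collusive price, whereas a short-sighted one prefers to deviate.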
D. Performance Evaluation
We consider a cognitive radio environment with two primary
services and a secondary service. A frequency spectrum of
20 MHz is available to each primary service. The number of
local connections at each primary service is 10. The target
BER of the secondary service is $BER^{tar} = 10^{-4}$. The bandwidth
requirement of the connections at each primary service is 2
Mbps ($B^{req} = 2$), and $c_1 = c_2 = 2$. The channel quality
for the secondary service varies between 9 and 22 dB. The
spectrum substitutability factor lies between 0.1 and 0.6. For
the dynamic price adaptation algorithms, the initial prices are
set as $p_1[0] = p_2[0] = 1$.
If the primary services can observe each other's strategies
(Case 1), the price converges to the equilibrium price in a few iterations. But, if only the spectrum demand from the secondary
service is observable (Case 2), and the price is adjusted based
on this information, the speed of convergence depends on the
learning rate $\alpha$. An optimum learning rate makes the algorithm
converge as fast as in Case 1. But a larger learning rate
causes fluctuations in the price adaptation and the algorithm
requires a large number of iterations to converge.
Fig. 5. Profit of each primary service at equilibrium under different channel
qualities of frequency spectrum offered by primary service one.
The profit of both primary services at the Nash equilibrium is shown in Fig. 5. When the channel quality of the spectrum
offered by primary service one is better, the spectrum demand
becomes higher. So, primary service one can increase the price
as well as the size of the offered spectrum share to gain higher
revenue. When primary service one gains a higher profit due to
larger demand, primary service two gains a lower profit
due to smaller demand. The spectrum substitutability factor $\nu$
also impacts the prices under different channel qualities. A
larger $\nu$ only slightly affects the price offered by primary
service one, whereas the price offered by primary service two decreases
at a higher rate for a larger value of $\nu$. A smaller value of $\nu$
lowers the price offered by primary service one, and the rate of
decrease in the price offered by service two has to be higher
to attract the secondary service. This is required to achieve the
highest profit given the channel qualities corresponding to the
spectrum offered by primary service one.
VI. CHEAT-PROOF STRATEGIES
In a cognitive radio environment, users competing for the
open spectrum may have no incentive to cooperate with each
other. They may even exchange false private information, such as
channel conditions, to get more access to the spectrum. So,
cheat-proof spectrum sharing schemes should be developed
to maintain the efficiency of spectrum usage. Mechanism
design theory is used to provide incentives for
players to be honest [13], and statistical approaches are used
to make cheating unprofitable.
A. System Model
We consider a situation where K pairs of unlicensed users
coexist in the same area and compete for an unlicensed
spectrum band. The users trying to communicate with their
pair cause interference to other pairs. At time slot n, all pairs
try to occupy the spectrum and the received signal at the i-th
receiver $y_i[n]$ can be expressed as
$$y_i[n] = \sum_{j=1}^{K} h_{ji}[n]\, x_j[n] + w_i[n], \qquad i = 1, 2, \ldots, K \tag{33}$$
where $x_j[n]$ is the information transmitted by the j-th pair,
$h_{ji}[n]$ ($j = 1, 2, \ldots, K$; $i = 1, 2, \ldots, K$) represents the channel gain from the j-th transmitter to the i-th receiver, and $w_i[n]$
is the white noise at the i-th receiver. The transmission power of
the i-th user is bounded by $P_i^M$, i.e., $|x_i[n]|^2 \le P_i^M$ at all n.
Each user is selfish and tries to maximize its own
profit. This spectrum sharing game can be modelled as:
Players: K transmitter-receiver pairs
Strategy: Transmission power $p_i \in [0, P_i^M]$ of each user
Payoff: $R_i(p_1, p_2, \ldots, p_K)$, the gain of transmission
achieved by the i-th player after the players have chosen the
power levels $p_1, p_2, \ldots, p_K$.
The averaged payoff of the i-th player is given by
$$R_i(p_1, p_2, \ldots, p_K) = \log_2\left(1 + \frac{p_i |h_{ii}|^2}{N_0 + \sum_{j \neq i} p_j |h_{ji}|^2}\right). \tag{34}$$
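As an illustration, the payoff in (34) can be evaluated numerically. The following is a minimal Python sketch, not code from [13]; the function name `payoff`, the nested-list layout of the channel gains, and the two-pair numbers are assumptions made only for this example.

```python
import math

def payoff(i, powers, gains, noise_power):
    """One-slot payoff R_i of Eq. (34).

    powers[j]   : transmit power p_j of pair j
    gains[j][i] : channel gain h_ji from transmitter j to receiver i
    noise_power : noise level N_0
    """
    signal = powers[i] * abs(gains[i][i]) ** 2
    interference = sum(
        powers[j] * abs(gains[j][i]) ** 2
        for j in range(len(powers))
        if j != i
    )
    return math.log2(1.0 + signal / (noise_power + interference))

# Hypothetical two-pair example: unit direct gains, cross gains of 0.5.
gains = [[1.0, 0.5],
         [0.5, 1.0]]
alone = payoff(0, [1.0, 0.0], gains, noise_power=1.0)   # pair 1 transmits alone
shared = payoff(0, [1.0, 1.0], gains, noise_power=1.0)  # both pairs transmit
```

In this example the payoff of the first pair drops as soon as the second pair also transmits, which is the mutual interference effect that the rest of the section builds on.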
First, we consider a one-shot game in which the players
consider only the current payoff. It is proved in [13] that the
only Nash equilibrium (NE) of this game is $(P_1^M, P_2^M, \ldots, P_K^M)$.
The payoff at the NE is given by
$$R_i^S(h_{1i}, h_{2i}, \ldots, h_{Ki}) = \log_2\left(1 + \frac{P_i^M |h_{ii}|^2}{N_0 + \sum_{j \neq i} P_j^M |h_{ji}|^2}\right) \tag{35}$$
The superscript ‘S’ stands for selfish. This is the only possible
outcome of a one-shot game. When all users transmit at maximum power, every user suffers strong mutual interference.
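The reason maximum power is the one-shot outcome is that the payoff in (34) increases with a player's own power whatever the others do. The sketch below checks this numerically with a grid search; it is an illustrative check under assumed channel values, not the proof given in [13], and `best_response` is a hypothetical helper.

```python
import math

def payoff(i, powers, gains, noise_power):
    """One-slot payoff of Eq. (34); gains[j][i] is the gain h_ji."""
    signal = powers[i] * abs(gains[i][i]) ** 2
    interference = sum(
        powers[j] * abs(gains[j][i]) ** 2
        for j in range(len(powers))
        if j != i
    )
    return math.log2(1.0 + signal / (noise_power + interference))

def best_response(i, powers, gains, noise_power, max_power, steps=100):
    """Grid-search the power in [0, max_power] that maximizes player i's payoff."""
    candidates = [max_power * k / steps for k in range(steps + 1)]

    def value(p):
        trial = list(powers)
        trial[i] = p
        return payoff(i, trial, gains, noise_power)

    return max(candidates, key=value)

# Assumed two-player setting with power limits 2.0 and 3.0.
gains = [[1.0, 0.5],
         [0.5, 1.0]]
max_powers = [2.0, 3.0]
# Player 1's best response for several power levels of player 2.
responses = [
    best_response(0, [0.0, other], gains, 1.0, max_powers[0])
    for other in (0.0, 1.5, 3.0)
]
```

For every tested power of the second player, the best response of the first player is its maximum power, which is consistent with $(P_1^M, \ldots, P_K^M)$ being the equilibrium.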
Spectrum sharing lasts over a long period of time. So,
everyone will be better off if the users take turns to transmit.
Such cooperation must be self-enforced. Now we consider
a repeated game which lasts over several rounds. The players
view these rounds as a whole. The payoff is given by
$$U_i = (1 - \delta) \sum_{n=0}^{+\infty} \delta^n R_i[n] \tag{36}$$
where $R_i[n]$ is the payoff of player i at time slot n and $\delta$ is the
discount factor as defined earlier.
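The normalization by $(1 - \delta)$ in (36) keeps the discounted payoff on the same scale as a one-slot payoff. A minimal Python sketch follows; the infinite sum is truncated to a finite list of slot payoffs, which is an assumption of this example.

```python
def discounted_payoff(slot_payoffs, delta):
    """Truncated version of Eq. (36): (1 - delta) * sum_n delta**n * R_i[n]."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    total = 0.0
    weight = 1.0  # delta**n for the current slot n
    for r in slot_payoffs:
        total += weight * r
        weight *= delta
    return (1.0 - delta) * total

# A constant one-slot payoff of 2.0 over many slots gives U_i close to 2.0.
constant = discounted_payoff([2.0] * 2000, delta=0.9)
```

With a constant one-slot payoff, the discounted payoff approaches that same constant as the number of slots grows.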
If all the players follow some predetermined rules to share
the spectrum, a higher expected one-slot payoff $R_i^C$ (‘C’ stands
for cooperation, $R_i^C > R_i^S$ for all $i = 1, 2, \ldots, K$) can be achieved.
But selfish players can take advantage of others by transmitting in the time slots not allotted to them. This gives a
payoff $R_i^D$ (‘D’ stands for deviation). Cooperation is not a
stable equilibrium in the one-shot game, but it can be enforced
in a repeated game by the threat of punishment. We denote
the discounted payoff with deviation as $U_i^D$ and that without
deviation as $U_i^C$. As $\delta \to 1$, $U_i^D$ converges to $R_i^S$ almost
surely and $U_i^C$ converges to $R_i^C$ almost surely.
Hence, cooperation exists only if $U_i^C (= r_i^C) > U_i^D (= r_i^S)$,
i.e., all players are self-enforced to cooperate because of
punishment after deviation. We use a “punish-and-forgive”
strategy where the punishment state lasts only for $T - 1$
time slots and cooperation resumes from the T-th time slot. The
parameter T can be determined by analyzing the incentive
of the players. If the tendency to deviate is stronger, the
punishment should also be harsher to prevent deviation.
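To make the role of T concrete, the sketch below searches for the shortest punishment that removes the incentive to deviate. It is a simplified illustration, not the derivation in [13]: it assumes deterministic one-slot payoffs (cooperation, deviation, and selfish payoffs passed as `r_coop`, `r_dev`, and `r_selfish`), a single deviation at slot 0, and punishment at the selfish payoff for the following T - 1 slots.

```python
def min_punishment_length(r_coop, r_dev, r_selfish, delta, t_max=10_000):
    """Smallest T for which one deviation followed by T - 1 punishment slots
    does not pay off, under deterministic one-slot payoffs.

    Cooperating for T slots yields   sum_{n=0}^{T-1} delta**n * r_coop.
    Deviating yields                 r_dev + sum_{n=1}^{T-1} delta**n * r_selfish.
    Returns None if no T <= t_max deters the deviation.
    """
    gain = r_dev - r_coop               # one-slot gain from deviating
    loss_per_slot = r_coop - r_selfish  # loss in each punishment slot
    accumulated_loss = 0.0
    weight = 1.0
    for t in range(1, t_max + 1):
        # accumulated_loss covers punishment slots 1 .. t - 1
        if accumulated_loss >= gain:
            return t
        weight *= delta
        accumulated_loss += weight * loss_per_slot
    return None

mild = min_punishment_length(r_coop=2.0, r_dev=3.0, r_selfish=1.0, delta=0.9)
strong = min_punishment_length(r_coop=2.0, r_dev=5.0, r_selfish=1.0, delta=0.9)
```

Under these assumed numbers, the larger deviation payoff requires a longer punishment (`strong` exceeds `mild`), which matches the remark that a stronger tendency to deviate calls for harsher punishment. When the discount factor is too small, no finite T may be enough and the function returns `None`.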
B. Cooperation with Optimal Detection
We assume that there is a common control channel over
which players can exchange information. Based on the information transmitted, the players decide who should transmit in
a given slot. Only one player transmits at a time. Each slot is
divided into three phases: in the first phase, each player exchanges
channel information with the others; in the second phase, players
decide whether to access the spectrum or not, according to the
cooperation rule; in the third phase, the eligible player transmits
data. During the third phase, the eligible player pauses transmission and ‘listens’ to the channel for some time to catch
deviators. If the player finds any other player deviating, the
system switches to punishment mode.
We consider two cooperation rules: the maximum total throughput (MTT) criterion, which maximizes the sum of individual
payoffs, and the approximate proportional fairness (APF) criterion, which maximizes their product. The punishment-based spectrum
sharing game provides an incentive for players to be honest, as
deviation is deterred by the threat of punishment. Detection of
deviating behavior is necessary for the threat to be credible.
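The two rules can lead to different choices of the transmitting player in the same slot. The sketch below is one possible reading rather than the exact rules of [13]: it assumes that, with a single transmitter per slot, an MTT-style rule picks the player with the largest interference-free rate, and that an APF-style rule picks the player with the largest gain normalized by its own mean. The function names and the numbers are hypothetical.

```python
import math

def select_mtt(gains, max_powers, noise_power):
    """MTT-style rule: pick the player with the largest interference-free rate.

    gains[i] is the direct power gain |h_ii|^2 of player i in this slot.
    """
    rates = [
        math.log2(1.0 + p * g / noise_power)
        for g, p in zip(gains, max_powers)
    ]
    return max(range(len(rates)), key=rates.__getitem__)

def select_apf(gains, mean_gains):
    """APF-style rule: pick the player with the largest normalized gain."""
    normalized = [g / m for g, m in zip(gains, mean_gains)]
    return max(range(len(normalized)), key=normalized.__getitem__)

# Assumed slot: player 2 has the better absolute channel,
# but player 1 is far above its own average.
gains = [0.5, 1.2]
mtt_choice = select_mtt(gains, max_powers=[1.0, 1.0], noise_power=1.0)
apf_choice = select_apf(gains, mean_gains=[0.25, 2.0])
```

In this assumed slot the MTT-style rule selects the second player (index 1), while the APF-style rule selects the first player (index 0), which illustrates the throughput versus fairness trade-off between the two criteria.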
C. Cheat-Proof Strategies
The repeated game discussed above inherently assumes that
complete and perfect information is available. But information
such as the power constraints and channel gains is private
to each player. So, selfish players may provide false
information to get a higher payoff. Therefore, enforcing truth-telling is a crucial problem.
1) Mechanism-design-based strategy: Mechanism design
provides incentives for players to be honest. The players
claiming high values are asked to pay a tax, and the amount
of the tax increases as the claimed value increases. Some
monetary compensation is given to the players reporting low
values. Now, the spectrum sharing game becomes a new game
with the original payoffs replaced by the overall payoffs, which
include the monetary transfers. The transfer function can be
designed such that the players get the highest payoff only
when they claim their true private values. With these transfer
functions, all players' payments and incomes add up to 0 in any
time slot. This means that the monetary transfer is exchanged
only within the community of cooperative players at any time.
This property is suitable for the open spectrum sharing scenario.
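The zero-sum property of the transfers can be illustrated with a small sketch. The transfer rule below is a generic budget-balanced construction chosen for illustration, not the transfer function of [13], and the quadratic `tax` is a hypothetical choice; the sketch only shows that transfers can sum to zero while higher claims pay more, and it does not by itself guarantee truth-telling.

```python
def transfers(claims, tax):
    """Budget-balanced monetary transfers for the claimed values.

    Each player pays tax(claim) and receives an equal share of the taxes
    paid by the other players, so the transfers sum to zero.
    A positive entry is income, a negative entry is a net payment.
    """
    k = len(claims)
    if k < 2:
        raise ValueError("need at least two players")
    taxes = [tax(c) for c in claims]
    total = sum(taxes)
    return [(total - t) / (k - 1) - t for t in taxes]

# Hypothetical increasing tax on the claimed value.
result = transfers([0.5, 1.0, 3.0], tax=lambda c: c ** 2)
```

Here the player with the highest claim makes a net payment, the players with lower claims receive compensation, and the entries of `result` add up to zero.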
2) Statistics-based strategy: For the APF rule, every player
reports the normalized channel gain, and the player with the
highest reported value gets access to the spectrum. The normalized
gains are exponentially distributed with mean 1. So, in the long
run, each player will have access to the spectrum for 1/K of the total
slots. If player i occupies the spectrum for more than (1/K + ε)
of the total time, where ε is a pre-determined threshold, it is
highly likely that the player has cheated. Such a player is
marked as a cheater and punished. In this way, the profit
of cheating is greatly limited.
Fig. 6. The payoffs under a heterogeneous setting with different cooperation
rules.
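The statistical test above amounts to comparing each player's share of slots with the threshold 1/K + ε. A minimal Python sketch follows; the function name `flag_cheaters` and the slot counts are assumptions for illustration.

```python
def flag_cheaters(access_counts, total_slots, epsilon):
    """Return indices of players whose access share exceeds 1/K + epsilon."""
    k = len(access_counts)
    threshold = 1.0 / k + epsilon
    return [
        i for i, count in enumerate(access_counts)
        if count / total_slots > threshold
    ]

# Four players over 1000 slots: the fair share is 0.25 and the threshold is 0.30.
flagged = flag_cheaters([260, 240, 350, 150], total_slots=1000, epsilon=0.05)
```

Only the third player (index 2), with a share of 0.35, is flagged here. A smaller ε catches cheaters sooner but raises the chance of flagging an honest player who was lucky with the channel.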
D. Simulation Results
We consider a scenario with two players (K = 2) with the same
maximum power constraint and the same relative interference γ.
The players can gain more by cooperating than by being selfish. But cooperation is unnecessary when the interference
is very low (γ ≈ 0). It was observed that the payoff of cooperation
is higher than that of non-cooperation for γ > 0.15.
Now we consider a heterogeneous environment where
players have different power constraints. We fix the power
constraint of player 1 and increase the power constraint of
player 2, $P_2^M$. The payoffs with the MTT and APF rules
are shown in Fig. 6, where ‘1’ and ‘2’ refer to the
payoffs of player 1 and player 2, respectively. The payoffs
without cooperation and the payoffs using the max-min fairness
criterion (denoted by “NOC” and “MMF”, respectively) are
also shown for comparison. It can be seen from the figure that
both the MTT and APF rules outperform the non-cooperation case.
This means players have an incentive to cooperate under both
rules. It was also seen that the payoff is maximized only if
the player honestly claims his/her true information. Therefore,
players are self-enforced to tell the truth with this mechanism.
VII. CONCLUSION
Although game theory has been used extensively to model the interactions between the users in a cognitive radio
network, it also faces certain challenges. Choosing a proper
payoff function does not always result in a simple analysis for
the game theoretic model. As cognitive radio networks benefit
from technology evolution, the same technologies can also
be used by malicious users to launch more complicated and
unpredictable attacks. It is therefore wise to use the framework
of game theory judiciously.
Dynamic spectrum sharing is one of the key functions of
cognitive radio networks. In this paper, we first discussed
the basics of game theory. Then we presented and elaborated
on the various game theoretic models, namely Cournot, auction-based, Bertrand, and cheat-proof, which can be applied
to the spectrum sharing scenario. Each model has been dealt with
separately, explaining how the problem
is formulated, which governing conditions are imposed, and
which equilibrium is ultimately attained. The Cournot model is the most
primitive form of modelling a spectrum sharing problem, and it
concludes that the NE is the most desirable solution. The auction-based
model is a novel way of modelling the spectrum
sharing problem against the background of auction theory. The Bertrand
game moves a step ahead and shows the inefficiency of the NE
and how to improve upon it. Cheat-proof strategies give
insight into mechanism design, which induces the players to be
honest. We have analysed and presented the existing
game theoretic models for the spectrum sharing scenario.
REFERENCES
[1] S. Haykin. Cognitive radio: brain-empowered wireless communications.
IEEE Journal on Selected Areas in Communications, 23(2):201–220,
Feb. 2005.
[2] Ian F. Akyildiz, Won-Yeol Lee, Mehmet C. Vuran, and Shantidev
Mohanty. Next generation/dynamic spectrum access/cognitive radio
wireless networks: A survey. Computer Networks, 50(13):2127–2159,
2006.
[3] Magnús M. Halldórsson, Joseph Y. Halpern, Li (Erran) Li, and Vahab S.
Mirrokni. On spectrum sharing games. In Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing,
PODC ’04, pages 107–114, New York, NY, USA, 2004. ACM.
[4] Jane Wei Huang and Vikram Krishnamurthy. Game theoretic issues in
cognitive radio systems (invited paper). Journal of Communications,
4(10), November 2009.
[5] Beibei Wang, Yongle Wu, and K.J. Ray Liu. Game theory for cognitive
radio networks: An overview. Computer Networks, 54(14):2537–2561,
2010.
[6] Martin J. Osborne and Ariel Rubinstein. A course in game theory. MIT
Press, 1994.
[7] Allen B. MacKenzie. Game Theory for Wireless Engineers. Morgan &
Claypool Publishers, 2006.
[8] Martin J. Osborne. An introduction to game theory. Oxford University
Press, 2003.
[9] Li Yan-bin, Wang Li-feng, and Li Ying. An improved game-theoretic
spectrum sharing algorithm in cognitive radio networks. In 2011 3rd
International Conference on Computer Research and Development (ICCRD),
volume 2, pages 499–503, March 2011.
[10] D. Niyato and E. Hossain. A game-theoretic approach to competitive
spectrum sharing in cognitive radio networks. In IEEE Wireless Communications and Networking Conference (WCNC 2007), pages 16–20,
March 2007.
[11] Xinbing Wang, Zheng Li, Pengchao Xu, Youyun Xu, Xinbo Gao, and
Hsiao-Hwa Chen. Spectrum sharing in cognitive radio networks –
an auction-based approach. IEEE Transactions on Systems, Man, and
Cybernetics, Part B: Cybernetics, 40(3):587–596, June 2010.
[12] D. Niyato and E. Hossain. Competitive pricing for spectrum sharing in
cognitive radio networks: Dynamic game, inefficiency of Nash equilibrium, and collusion. IEEE Journal on Selected Areas in Communications,
26(1):192–202, Jan. 2008.
[13] Yongle Wu, B. Wang, K.J.R. Liu, and T.C. Clancy. Repeated open
spectrum sharing game with cheat-proof strategies. IEEE Transactions on Wireless Communications, 8(4):1922–1933, April 2009.