Universitá degli Studi di Padova
FACOLTÀ DI INGEGNERIA
DIPARTIMENTO DI INGEGNERIA DELL’INFORMAZIONE
Corso di Laurea Magistrale in Ingegneria delle Telecomunicazioni
TESI DI LAUREA MAGISTRALE
Access Policies for
Two Cognitive Secondary Users
under a Primary ARQ Process
LAUREANDA:
Giulia Agostini
RELATORE:
Dr. Michele Zorzi
Anno Accademico 2013-2014
A mamma, papà,
Andrea e Sara:
le colonne portanti del mio mondo;
e a chi, da lassù,
ha sempre vegliato su di me.
ii
Contents
1 Introduction
1
2 Cognitive Radio
2.1 The Birth of Cognitive Radio . . . . . . . . . . . . . . . . . . . .
2.2 Cognitive Radio and Spectrum Management in Cognitive Radio
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 Cognitive Radio Technology and Network Architecture . .
2.2.2 Spectrum Management . . . . . . . . . . . . . . . . . . .
2.2.3 Markov Decision Processes and Reinforcement Learning
in Cognitive Radio . . . . . . . . . . . . . . . . . . . . . .
3
3
5
6
8
9
3 Centralized Access Policies in a Cognitive Radio Network with
two SUs
3.1 System Model and Policy Definition . . . . . . . . . . . . . . . .
3.2 Optimal Access Policies for two SUs . . . . . . . . . . . . . . . .
3.2.1 Upper Bound to the Average Long Term SUs Sum Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 Low SU Access Rate Regime . . . . . . . . . . . . . . . .
3.2.3 High SU Access Rate Regime . . . . . . . . . . . . . . . .
3.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
11
14
17
17
18
19
4 Decentralized Access Policies in a Cognitive Radio Network
with two SUs and a Completely Observable System
4.1 System Model and Policy Definition . . . . . . . . . . . . . . . .
4.2 Optimal Access Policies for Two Independent SUs . . . . . . . .
4.2.1 SU1 Optimal Access Policy . . . . . . . . . . . . . . . . .
4.2.2 Numerical results . . . . . . . . . . . . . . . . . . . . . . .
21
21
22
26
28
5 Heuristic Decentralized Access Policies
Network with Two Independent SUs
5.1 Heuristic Access Policy H1 . . . . . . . .
5.2 Heuristic Access Policy H2 . . . . . . . .
5.3 Heuristic Access Policy H3 . . . . . . . .
5.4 Heuristic Access Policy H4 . . . . . . . .
41
44
60
73
82
in a Cognitive Radio
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6 Decentralize Access Policies in a Cognitive Radio Network with
two SUs and a Partially Observable System
93
7 Conclusions
101
iv
A MATLAB Code
103
A.1 DecMmdp.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
A.2 System evolution in CR network simulator . . . . . . . . . . . . . 107
A.3 H4 : transmission probabilities rearrangement . . . . . . . . . . . 113
Bibliography
117
v
List of Figures
2.1
2.2
Spectrum holes and DSA . . . . . . . . . . . . . . . . . . . . . . .
Cognitive radio network architecture . . . . . . . . . . . . . . . . . .
4
7
3.1
3.2
Cognitive radio system model . . . . . . . . . . . . . . . . . . . . .
MMDP: Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
MMDP: Average PU throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
Markov Chain behind the model . . . . . . . . . . . . . . . . . . . .
DEC-MMDP: Average SUs sum throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 DEC-MMDP: Average PU throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4 DEC-MMDP: Average SUs sum Throughput vs γsip . . . . . . . . .
4.5 DEC-MMDP: Average SUs sum throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6 DEC-MMDP: Average PU throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.7 DEC-MMDPASY : Average SUs sum Throughput vs γs2p . . . . . . .
4.8 DEC-MMDPASY : Average SUs sum throughput with respect to PU
throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . .
4.9 DEC-MMDPASY : Average PU throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.10 DEC-MMDPASY : Tx probabilities vs γs2p for P U = 0.2 . . . . . . .
4.11 DEC-MMDPASY : Tx probabilities vs γs2p for P U = 1 . . . . . . . .
25
5.1
5.2
5.3
46
46
3.3
4.1
4.2
H1 : Average SUs sum Throughput vs (α, β) . . . . . . . . . . . . . .
H1 : (α∗ , β ∗ ) vs P U . . . . . . . . . . . . . . . . . . . . . . . . . .
H1 : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4 H1 : Average PU throughput with respect to PU throughput constraint
5.5 H1 : Average SUs sum Throughput vs γsip . . . . . . . . . . . . . . .
5.6 H1 : (α∗ , β ∗ ) vs γsip . . . . . . . . . . . . . . . . . . . . . . . . . .
5.7 H1 Robustness: Average SUs sum throughput vs γsip . . . . . . . . .
5.8 H1 Robustness: Average PU throughput vs γsip . . . . . . . . . . . .
5.9 H1 Robustness: Average SUs sum throughput-Average PU throughput
tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.10 H1 : (α1∗ , β1∗ , α2∗ , β2∗ ) vs P U . . . . . . . . . . . . . . . . . . . . . . .
vi
19
20
28
29
31
32
33
35
36
37
38
38
47
48
50
53
53
54
54
55
5.11 H1,subOpt : (α1∗ , β1∗ , α2∗ , β2∗ ) vs P U . . . . . . . . . . . . . . . . . . .
5.12 H1,subOpt : Average SUs sum throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.13 H1,ASY : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.14 H1,ASY : Average PU throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.15 H1,ASY : Average SUs sum throughput-Average PU throughput tradeoff
5.16 H1,ASY : α1∗ and β1∗ vs γs2p . . . . . . . . . . . . . . . . . . . . . . .
5.17 H1,ASY : α2∗ and β2∗ vs γs2p . . . . . . . . . . . . . . . . . . . . . . .
5.18 H2 : Average SUs sum Throughput vs η . . . . . . . . . . . . . . . .
5.19 H2 : η ∗ vs P U . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.20 H2 : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.21 H2 : Average SUs sum Throughput vs γsip . . . . . . . . . . . . . . .
5.22 H2 : η ∗ vs γsip . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.23 H2 Robustness: Average SUs sum throughput vs γsip . . . . . . . . .
5.24 H2 Robustness: Average SUs sum throughput-Average PU throughput
tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.25 H2,ASY : Average SUs sum Throughput vs (η1 , η2 ) . . . . . . . . . . .
5.26 H2,ASY : (η1∗ , η2∗ ) vs P U . . . . . . . . . . . . . . . . . . . . . . . .
5.27 H2,ASY : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.28 H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 1 . . . . . .
5.29 H3 (η1 = 1): (α2∗ , β2∗ ) vs P U . . . . . . . . . . . . . . . . . . . . . .
5.30 H3 (η1 = 1): Average SUs sum throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.31 H3 (η1 = 1): Average PU throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.32 H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 0.5 . . . . .
5.33 H3 (η1 = 0.5): (α2∗ , β2∗ ) vs P U . . . . . . . . . . . . . . . . . . . . .
5.34 H3 (η1 = 0.5): Average SUs sum throughput with respect to PU
throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . .
5.35 H3 (η1 = 0.5): Average PU throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.36 H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 0.05 . . . .
5.37 H3 (η1 = 0.05): (α2∗ , β2∗ ) vs P U . . . . . . . . . . . . . . . . . . . . .
5.38 H3 (η1 = 0.05): Average SUs sum throughput with respect to PU
throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . .
5.39 H3 (η1 = 0.05): Average PU throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.40 H3 comparison: Average SUs sum throughput with respect to PU
throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . .
5.41 H4 : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.42 H4 : Transmission probabilities convergence . . . . . . . . . . . . . .
5.43 H4,ASY : Average SUs sum throughput with respect to PU throughput
constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.44 H4,ASY : Average PU throughput with respect to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii
56
57
57
58
59
59
60
62
63
63
66
68
69
69
71
72
72
74
74
75
75
76
77
77
78
79
80
80
81
82
83
87
88
89
5.45
5.46
5.47
5.48
H4,ASY : Average SUs sum throughput-Average PU throughput tradeoff
H4,ASY : α1∗ and β1∗ vs γs2p . . . . . . . . . . . . . . . . . . . . . . .
H4,ASY : α2∗ and β2∗ vs γs2p . . . . . . . . . . . . . . . . . . . . . . .
H1 − H2 − H4 comparison: Average SUs sum throughput with respect
to PU throughput constraint . . . . . . . . . . . . . . . . . . . . . .
viii
89
90
90
91
Chapter 1
Introduction
The fact that almost all the available frequencies are assigned to licensed users
and that these frequencies are often under-utilized caused a lack of spectrum
resources in wireless communication which gives rise to cognitive radio (CR)
as a way to improve spectral efficiency in wireless networks. Cognitive radio
enables the licensed primary user (PU) and unlicensed secondary users (SUs)
to coexist and transmit in the same frequency band; in the underlay cognitive
radio approach, the smart SUs are allowed to simultaneously transmit in the
licensed frequency band allotted to PU, while PU is oblivious to the presence of
SU, so SU needs to control the limited interference it causes at the PU receiver.
The idea of exploiting the ARQ retransmissions implemented by the PU is employed in [1], [2] and [3]. Levorato et al. in [1] consider a cognitive radio network
composed of one PU and one SU and does not utilize interference cancellation
(IC) at the SU receiver. Tannious and Nosratinia in [2] apply Hybrid ARQ
with incremental redundancy with at most one retransmission, where the SU
receiver tries to decode the PU message in the first time slot and if successful, it
removes this PU message in the second time slot to improve the SU throughput.
Michelusi et al. in [3] propose to exploit the intrinsic redundancy, introduced by
the Type-I HARQ implemented by the PU by enabling IC at the SU receiver;
in particular, they consider an arbitrary number of retransmissions and apply
backward and forward IC after decoding the PU message at the SU receiver.
Forward IC (FIC) provides IC on SU transmissions performed in future time
slots, since the SUrx, after decoding the PU message, performs IC in the next
PU retransmission attempts, if these occur. Backward IC (BIC) provides IC
on SU transmissions performed in previous time slots within the same primary
ARQ transmission window, whose decoding failed due severe interference from
the PU. However the number of SUs is limited to one in all these papers which
leverage the PU ARQ retransmission. Joda and Zorzi in [4] consider an underlay cognitive radio network that consists of two SUs and one PU in which the
PU employs Type-I HARQ; exploiting the redundancy in PU retransmissions,
each SU receiver applies IC to remove a successfully decoded PU message in
the subsequent PU retransmissions. Using a Constrained Markov Decision Process (CMDP) model, they propose centralized optimum access policies for the
two SUs in order to maximize the average SUs sum throughput under a PU
throughput constraint.
1
The scenario we consider in our work is a cognitive radio that consists of
two secondary users and one primary user in which there are no centralized
mechanisms, so the SUs make their decisions independently; every SU has only
a partial view of the state of system that is based on what it can observe from
its own perspective. The aim of our work is to analyze the described system and
find decentralized access policies for the two SUs. In order to design optimum
decentralized policies for the considered scenario we should model our system
by a Decentralized Constrained Partially Observable Markov Decision Process
(DEC-CPOMDP) model; however this represents a very hard challenge, so in
our work we decide to concentrate our efforts in the development of some valid
approximation (heuristic policy) and only in the last section we briefly try to
introduce the DEC-POMDP solution to our problem and to suggest some future
research topics.
2
Chapter 2
Cognitive Radio
2.1
The Birth of Cognitive Radio
Cognitive radio has opened up a new way of sensing and utilizing wireless spectrum resources; essentially, CR is a dynamically reconfigurable radio that can
adapt its operating parameters to the surrounding environment, which has made
been feasible by recent advances, such as software-defined radio (SDR) and
smart antennas, that enable flexible and agile access to the wireless spectrum,
and thus improve efficiency in spectrum utilization significantly. So far wireless
networks are characterized by a static spectrum allocation policy, where governmental agencies assign wireless spectrum to license holders on a long-term
basis for large geographical region. Recently, because of the increase in spectrum demand, this policy faces spectrum scarcity in particular spectrum bands:
recent studies [5] have shown that the licensed spectrum bands are severely
under-utilized mainly due to the traditional command-and-control type spectrum regulation that has prevailed for decades. Under such a spectrum policy,
each spectrum band is assigned to a designated party, which is given an exclusive
spectrum usage right for a specific type of service and radio device. Hence, dynamic spectrum access (DSA) techniques have been recently proposed to solve
these spectrum inefficiency problems. In particular, the key enabling technology
of DSA techniques is CR technology which allows unlicensed users/devices to
identify the un/under-utilized portions of licensed spectrum and utilize them
opportunistically as long as they do not cause any harmful interference to the
legacy spectrum users’ communications. The temporarily unused portions of
spectrum are called spectrum white spaces (WS) or spectrum holes that may
exist in time, frequency and space domains. Typically, spectrum holes are considered as the total or partial lack of power in the time-frequency plane. Thus,
a DSA, depicted in Fig. 2.1, which consists in ’jumping’ from a spectrum hole
to another allows to improve the spectrum usage. In particular there are three
different DSA access model:
• Dynamic Exclusive Access Model: the spectrum bands are allotted to
licensed users for exclusive usage. In order to introduce flexibility in the
spectrum employment two approaches are proposed: spectrum property
rights and dynamic spectrum allocation. The first allows the licensed users
to sell or lease the portions of the spectrum assigned to them and choose
3
the desired technology. The second allots, for a certain period and in a
given place, a portion of the spectrum for some service at exclusive usage.
• Open Sharing Model: also called common spectrum, it employs an open
spectrum sharing between equal users, i.e., it is an access model in which
there are no licensed and unlicensed users and the radio-frequency spectrum is available for use by all. However, sharing spectrum between unlicensed equipment requires that mitigation techniques (e.g. power limitation, dynamic frequency selection) are imposed to ensure that these
devices operate without interference.
• Hierarchical Access Model: this model adopts a hierarchical access in the
presence of primary users (PUs), the legacy users, and secondary users
(SUs), with the aim of allowing SUs to exploit the spectrum under a
constraint on the interference they cause to PUs. There are two possible
spectrum sharing approaches: underlay spectrum and overlay spectrum
that we describe below. This is the DSA approach we consider in our CR
model.
Figure 2.1: Spectrum holes and DSA
The concept of CR was first proposed in 1999 by Joseph Mitola III in his
pioneering work; since then, there has been rapidly increasing interest in CR
due to its potential for reshaping the way of utilizing spectrum resources: in the
United States the regulations on exploiting spectrum WS have been developed
by the Federal Communications Commission (FCC) that in 2000 released the
first notice of proposed rulemaking, discussing the necessary actions to remove
barriers to the development of the secondary spectrum market. After proposing
to allow unlicensed operation in the TV white spaces (TVWS), in 2008 the
FCC specified the rules in such unlicensed transmission in rural and urban areas
for fixed and personal/portable devices, thus paving the way for the CR-based
spectrum access. In the United Kingdom the Office of Communications (Ofcom)
launched the Digital Dividend Review (DDR) project in 2005 to explore the
options available after the digital TV switchover. The Ofcom proposed to allow
license-exempt use of interleaved spectrum for cognitive devices and decided to
allow cognitive access unless harmful interference is imposed on the licensed
users. Recently, the Ofcom also proposed parameters for license-exempt CRs
to provide PU protection, including those for spectrum sensing and geolocation
databases ([5]).
4
In the industrial field, in 2008, to evaluate the potential of WS devices
(WSDs), the FCC tested prototype WSDs in indoor and outdoor environments:
each tested device was capable of performing a combination of functions including DTV sensing, wireless microphone sensing, transmission and geolocation.
In 2009 the first public WS network was launched in Virginia using devices
which led to the first large scale ’Smart City’ network in North Carolina in
2010; such movement has shown that TVWS has real market value and thus
draws significant attention from the industry ([5]).
There have also been efforts to create international standards to utilize
TVWS using CR technology, in particular, IEEE 802.22 WRAN and Ecma
392. The former is designed for last-mile service in rural areas with fixed devices including the BS and the end-customer devices called customer premises
equipment (CPE); the latter has been proposed more recently to create an international standard for the personal/portable use of TVWS in urban areas.
IEEE 802.11af (also known as Wi-Fi 2.0 or White-Fi) has also been introduced
as a potential application of CR that may enhance the capacity and services of
current Wi-Fi systems by utilizing the TVWS, which provides better channel
propagation characteristics ([5]).
2.2
Cognitive Radio and Spectrum Management
in Cognitive Radio Networks
Although numerous definitions of cognitive radio exist, there are some common
features which characterize a CR:
• observation capability: a CR is aware of its environment and is able to
catch information from the latter;
• adaptability: a CR can dynamically and autonomously change its state
and/or its operating mode according to changes in its environment;
• intelligence: a CR uses information collected by observation to make decisions in order to achieve an objective.
CR networks are envisioned to provide high bandwidth to mobile users via
heterogeneous wireless architectures and DSA techniques; this goal can be realized only through dynamic and efficient spectrum management techniques. CR
networks, however, impose unique challenges due to the high fluctuation in the
available spectrum, as well as the diverse quality of service (QoS) requirements
of various applications.
In order to address these challenges, each CR user in the CR network must:
• determine which portions of the spectrum are available;
• select the best available channel;
• coordinate access to this channel with other users;
• vacate the channel when a licensed user is detected.
These capabilities can be realized through spectrum management functions that
address four main challenges: spectrum sensing, spectrum decision, spectrum
sharing and spectrum mobility.
5
2.2.1
Cognitive Radio Technology and Network Architecture
Formally, a CR is defined as a radio that can change its tramitter parameters
based on the interaction with its environment. Specifying what we have just
pointed out, a CR has to have two main characteristics [6]:
• cognitive capability: through real-time interaction with the radio environment, the portions of the spectrum that are unused at a specific time or
location (known as spectrum hole or white space) can be identified; conseguently, the best spectrum can be selected, shared with other users and
exploited without interference with the licensed user.
• Reconfigurability: a CR can be programmed to transmit and receive on
a variety of frequencies, and use different access technologies supported
by its hardware design. In this way, the best spectrum and the most
appropriate operating parameters can be selected and reconfigured.
As described in [6], in order to provide these capabilities, CR requires a novel
radio frequency (RF) transceiver, whose main components are the radio frontend and the baseband processing unit. In the former, the received signal is
amplified, mixed, and analo-to-digital converted; in the latter, the signal is
modulated/demodulated. Each component can be reconfigured via a control
bus to adapt to the time-varying RF environment. The novel characteristic of
the CR transceiver is the wideband RF front-end that is capable of simultaneous
sensing over a wide frequency range; this functionality is related mainly to
the RF hardware technologies, such as wideband antenna, power amplifier and
adaptive filter. RF hardware should be capable of being tuned to any part of
a large range spectrum. However, because the CR transceiver receives signals
from various transmitters operating at different power levels, bandwidths and
locations, the RF front-end should have the capability to detect a weak signal
in a large dynamic range, which is a major challenge in CR transceiver design.
As far as CR network architecture is concerned, its components are substantially two: the primary network and the CR network.
The primary (or licensed) network is referred to as an existing network,
where the PUs have a license to operate in a certain spectrum band; if such a
network has an infrastructure, PU activities are controlled through a primary
base station. Due to their priority in spectrum access, the operations of PUs
should not be affected by unlicensed users.
The CR network (or DSA network, or unlicensed network) does not have a
license to operate in a desired band; hence, additional functionality is required
for CR users to share the licensed spectrum band. CR networks also can be
equipped with CR base stations that provide single-hop connection to CR users.
Finally, they may include spectrum brokers that play a role in distributing the
spectrum resources among different CR networks.
6
Figure 2.2: Cognitive radio network architecture
Due to spectrum heterogeneity, CR users are capable of both licensed band
and unlicensed band operations. In the first case, they access the licensed
portions of the spectrum that is primarily used by PUs; hence, they are focused
mainly on the detection of PUs: if a PU appears in the spectrum band occupied
by CR users, they should vacate that spectrum band and move to available
spectrum immediately. In the second case, in the absence of PUs, the CR users
have the same right to access the spectrum, so spectrum sharing methods are
required to compete for the unlicensed band. Due to network heterogeneity, CR
users can perform three different access types [6]:
• CR network access: a CR user can access its own CR base station, on
both licensed and unlicensed spectrum bands. Since all interactions occur
inside the CR network, its spectrum sharing policy can be independent of
that of the primary network.
• CR ad hoc access: a CR user can communicate with other CR users
through an ad hoc connection on both licensed and unlicensed spectrum
bands.
• Primary network access: a CR user can also access the primary base station through the licensed band; unlike the other access type, CR user requires an adaptive medium access control (MAC) protocol, which enables
roaming over multiple primary networks with different access technologies.
According to the CR architecture shown in Fig. 2.2 various functionalities are
required to support spectrum management in CR networks.
7
2.2.2
Spectrum Management
CR networks impose unique challenges, so new spectrum management functions
are required: in particular, CR networks should avoid interference with primary
networks, should support QoS-aware communication considering the dynamic
and heterogeneous environment in order to decide on an appropriate spectrum
band, and should provide seamless communication regardless of the appearance
of the primary users. Specifically, the spectrum management process consists
of four major steps:
• Spectrum sensing: a CR user can allocate only an unused portion of the
spectrum. Therefore, it has to monitor the available spectrum bands,
capture their information and then detect spectrum holes without causing
interference to the primary network. Generally, there are three type of
spectrum sensing techniques [6]: primary transmitter detection, primary
receiver detection and interference temperature management. Transmitter detection is based on the detection of a weak signal from a primary
transmitter through the local observations of CR users; due to the lack
of interactions between PUs and CR users, this type of techniques alone
cannot avoid interference to primary receivers. Therefore, sensing information from other users, referred to as cooperative detection, grants more
accuracy minimizing the uncertainty of a single user’s detection and improving the detection probability in a heavily shadowed environment. The
most efficient way to detect spectrum holes is to detect the primary users
that are receiving data within the communication range of a CR user, i.e.,
using primary receiver detection. Interference temperature is a new model
for measuring interference recently introduced by the FCC, it limits the
interference at the receiver through an interference temperature threshold, which is the amount of new interference the receiver could tolerate: as
long as the CR users do not exceed this limit, they can use the spectrum
band.
• Spectrum decision: based on the spectrum availability, CR users can allocate a channel; thus, they have to be capable to decide which is the
best spectrum band among the available bands according to the QoS requirements of the applications. Spectrum decision is closely related to
the channel characteristics, such as interference at the PU receiver that
influences the CR transmitting power, path loss, wireless link errors and
link layer delay, and operations of primary users. Finally, also the activities of the other CR users in the network affect the spectrum decision.
The decision procedure consists of two steps: first, each spectrum band is
characterized, based on not only local observations of CR users but also
statistical information of primary network. Then, based on this characterization, the most appropriate spectrum band can be chosen. In particular,
to describe the dynamic nature of CR networks, a new metric is considered: the primary user activity [6], which represents the probability of a
primary user appearance during a CR user transmission. Because of the
PUs’ operation, CR users cannot obtain a reliable communication channel
for a long time period; therefore, multiple noncontiguous spectrum bands
can be simultaneously used for CR users’ transmissions, this method can
create a signal that is immune to the interference of the PU’s activity.
8
• Spectrum sharing: because there may be multiple CR users trying to access the spectrum, CR network access should be coordinated to prevent
multiple users colliding in overlapping portions of the spectrum. There
are three possible classification of the spectrum sharing based respectively
on network architecture, allocation behaviour and access technology [6].
In a centralized spectrum sharing the spectrum allocation and access procedures are controlled by a central entity, while in decentralized spectrum
sharing they are based on local (or possibly global) policies that are performed by each node distributively. From the allocation behaviour point of
view, in a cooperative spectrum sharing collaborative solutions exploit the
interference measurements of each node such that the effect of the communication of one node on other nodes is considered. Tipically, clusters
are formed to share interference information locally. In a non-cooperative
spectrum sharing only a single node is considered in non-collaborative (selfish) solutions. Because interference in other CR nodes is not considered,
this type of solutions may result in reduced spectrum utilization; however,
they do not require frequent message exchanges between neighbors as in
cooperative solutions. Considering the access technology, in overlay spectrum sharing nodes access the network using a portion of the spectrum
that has not been used by licensed users; this minimizes interference to the
primary network. In underlay spectrum sharing the spread spectrum techniques are exploited such that the transmission of a CR node is regarded
as noise by licensed users. Thus, underlay techniques can utilize higher
bandwidth at the cost of a slight increase in complexity. Finally, spectrum
sharing techniques are generally focused on two types of solutions: spectrum sharing inside a CR network (intranetwork spectrum sharing) and
among multiple coexisting CR networks (internetwork spectrum sharing).
• Spectrum mobility: CR users are regarded as visitors to the spectrum.
Hence, if the specific portion of the spectrum in use is required by a
PU, the communication must be continued in another vacant portion of
the spectrum, referred as spectrum handoff [6]. Since each time a CR
user changes its frequency of operation, protocols for different layers of
the network stack must adapt to the channel parameters for the operating frequency. The purpose of the spectrum mobility management in
CR networks is to ensure smooth and fast transition leading to minimum
performance degradation during a spectrum handoff.
It is evident from the significant number of interactions that the spectrum management functions require a cross-layer design approach.
2.2.3
Markov Decision Processes and Reinforcement Learning in Cognitive Radio
Markov decision processes (MDP) represent a mathematical model useful to
describe and analyze decisional processes when the results are random and/or
under the control of a decision maker. In CR the decision process is crucial
to choose the best action in response to changes in the environment. Another
useful technique in CR is reinforcement learning which concerns how an agent
should make decisions in a certain environment with the aim of maximizing
9
its reward in a long term horizon. The premise behind this approach is that
an agent receives external responses to the actions it perform, i.e. a ’good’
action will involve a reward, while a ’bad’ action will cause a cost. Thus, the
reinforcement learning algorithms search some rules which could map external
environment states into actions that the agent should select to improve its reward. Typically, the environment is modelled by an MDP with a finite number
of states.
Our work is structured as follows: in Chapter 3 we report the centralized
access policies design developed in paper [4] which represents a reference bound
for our work performance, i.e., a comparing instrument to evaluate the ’goodness’ of the access policies we propose; in Chapter 4 we extend the work of [4]
to the decentralized case and we find the optimal access policies for two independent SUs in a cognitive radio network under a primary ARQ process. Then,
in Chapter 5, we focus on the search of decentralized heuristic access policies
for the same CR scenario, in particular, we propose three offline and one online
policies considering both the symmetric and the asymmetric case, i.e., since the
SUs have the same transmission parameters (e.g. transmission rates, average
SNRs on channels from SUtx to possible network receivers), first we suppose
they adopt the same access strategy with the same transmission probabilities
and then we consider the possibility they adopt different transmitting behavior.
Furthermore, for each heuristic proposed we present numerical results in order to
analyze and compare the system performance in all considered cases. In Chapter 6 we suggest some interesting starting point to find an efficient solution to
the DEC-POMDP problem; finally, we conclude with some considerations and
we point out some open research problems which can be investigated in future
works.
10
Chapter 3
Centralized Access Policies
in a Cognitive Radio
Network with two SUs
The design of centralized access policy for two secondary users under a primary
ARQ process has jet been studied and discussed in [4] where the authors design
an optimum access policy for two SUs, which exploits the redundancy introduced
by the HARQ protocol in transmitting copies of the same PU message and
interference cancellation at the SU receivers. The aim of the paper is the same
we have, i.e., to maximize the average long term sum throughput of the SUs
under a constraint on the average long term PU throughput degradation. The
basic assumption they make is that the number of retransmissions is limited
and both SUs have a new packet to transmit in each time slot. Noting the
PU message knowledge state at each of the SU receivers and also the ARQ
retransmission time, the network is modeled using a Markov Decision Process
(MDP). Due to the constraint on the average long term PU throughput, they
then have a Constrained Markov Decision Process (CMDP).
3.1
System Model and Policy Definition
In the system we consider, there exist one primary and two secondary transmitters denoted by P Utx , SUtx1 and SUtx2 , respectively. These transmitters
transmit their messages with constant power over block fading channels and, in
each time slot, the channels are considered to be constant. The signal to noise
ratios (SNRs) of the channels P Utx → P Urx , P Utx → SUrx1 , P Utx → SUrx2 ,
SUtx1 → P Urx , SUtx1 → SUrx1 , SUtx1 → SUrx2 , SUtx2 → P Urx , SUtx2 →
SUrx1 , SUtx2 → SUrx2 are denoted by γpp , γps1 , γps2 , γs1p , γs1s1 , γs1s2 , γs2p ,
γs2s1 and γs2s2 , respectively. We assume that no channel State Information
(CSI) is available at the transmitters. Thus, transmissions are under outage,
when the selected rates are greater than the current channel capacity.
PU is unaware of the presence of the SUs and employs Type-I HARQ with
at most T transmissions of the same PU message. We assume that the ARQ
feedback is received by the PU transmitter at the end of the time-slot and a
11
Figure 3.1: Cognitive radio system model
retransmission can be performed in the next time-slot. Retransmission of the
PU message is performed if it is not successfully decoded at the PU receiver
until the PU message is correctly decoded or the maximum number of transmissions allowed, T , is reached. In each time-slot, each SU, if it accesses the
channel, transmits its own message, otherwise stays idle and does not transmit.
This decision is based on its access policy. The activity of the SUs affects the
outage performance of the PU, by creating interference to the PU receiver. The
objective is to find access policies for the two SUs to maximize the average sum
throughput of the SUs under a constraint on the PU average throughput degradation.
In paper [4] it is assumed that there is a central unit which controls the activities of the SUs. The central unit sends the ARQ transmission time, PU codebook, maximum transmission deadline T and feedback from P Urx (ACK/NACK
message). This unit also computes the secondary access probabilities and provides them to the two SUs.
In the system we want to model, we have four different combinations of
the accessibility of the SUs to the channel, listed in the accessibility vector
ϕ = [{0, 0} , {0, 1} , {1, 0} , {1, 1}]; the lth element of the accessibility vector is
referred to as accessibility action l ∈ A, where A = {0, 1, 2, 3}. For example,
ϕ(1) = {0, 1} shows that only SU2 accesses the channel.
If SUrx1 or SUrx2 succeds to decode the PU message, it can cancel the PU
message from the received signal in the future retransmissions. We refer to
12
this as Forward Interference Cancellation (FIC) [5]. We call the PU message
kowledge state as φ = [{U, U } , {U, K} , {K, U } , {K, K}], which denotes the
knowledge of the PU message at the two SU receivers. We suppose PU message
knowledge state, ARQ transmission time, maximum transmission deadline, T ,
and feedback from PU are known to SUtx1 and SUtx2 .
Based on PU message knowledge state ϕ and accessibility actions of the two
SUs, the rate of the secondary user i can be adapted and it is denoted by
Rsi,l,φ , i = 1, 2 and l ∈ {1, 2, 3}. (If l = 0 the rate is zero.) Therefore, we have:
Rs1,2,{K,K} = Rs1,2,{K,U } = Rs1,2,K
(3.1)
Rs1,2,{U,K} = Rs1,2,{U,U } = Rs1,2,U
(3.2)
Rs2,1,{K,K} = Rs2,1,{K,U } = Rs2,1,K
(3.3)
Rs2,1,{U,K} = Rs2,1,{U,U } = Rs1,1,U
(3.4)
We also define Rs1,3,{K,K} = Rs1,3,K and Rs2,3,{K,K} = Rs2,3,K . Note that we
can use (1) to (4) for action 3 if the channels from SUtx1 to SUrx2 and viceversa
are interference free.
The outage probabilities of the channel P Utx → P Urx in SU accessibility
action 0, 1, 2 and 3 are denoted by ρp,0 , ρp,1 , ρp,2 and ρp,3 , respectively. Noting
that the SU1 and SU2 transmissions are considered as background noise at the
P Urx , we have:
ρp,0 = 1 − P r(Rp ≤ C(γpp ))
(3.5)
ρp,i = 1 − P r Rp ≤ C
γpp 1 + γsip
ρp,3 = 1 − P r Rp ≤ C
γpp
1 + γs1p + γs2p
i ∈ {1, 2}
(3.6)
(3.7)
where Rp denotes the PU transmission rate in bits/s/Hz, C(x) = log2 (1 + x)
is the (normalized) capacity of the Gaussian channel with SNR x at the receiver.
The outage probability of the channel SUtxi → SUrxi at the PU message
knowledge state φ and accessibility action l is denoted by ρsi,l,φ , i ∈ {1, 2}. At
PU knowledge state {K, K} or {K, U }, the PU message is known at SUrx1 and
therefore the PU message may be canceled at this receiver. Thus, at accessibility
action 2, i.e., when only SU1 transmits its message, we have ρs1,2,{K,K} =
ρs1,2,{K,U } = ρs1,2,K , where:
ρs1,2,K = P r(Rs1,2,K > C(γs1s1 ))
(3.8)
In contrast, at PU knowledge {U, K} or {U, U }, where the PU message is not
decoded at SUrx1 , the outage probability of the channel from SUtx1 to SUrx1
is under the influence of the received PU message. Thus, at accessibility action
2, we have ρs1,2,{U,K} = ρs1,2,{U,U } = ρs1,2,U , where:
ρs1,2,U = P r(Rs1,2,U ∈
/ Γs1 (Rs1,2,U , Rp ))
13
(3.9)
Similarly, at accessibility action 1, i.e., when only SU2 transmits its message,
we obtain ρs2,1,{K,K} = ρs2,1,{K,U } = ρs2,1,K and ρs2,1,{U,K} = ρs2,1,{U,U } =
ρs2,1,U , where:
ρs2,1,K = P r(Rs2,1,K > C(γs2s2 ))
(3.10)
ρs2,1,U = P r(Rs2,1,U ∈
/ Γs2 (Rs2,1,U , Rp ))
(3.11)
The SNR region Γsi (Rsi,j,U , Rp ), i, j ∈ {1, 2} and i 6= j, is the union of two region: the first region guarantees that SUi and PU messages, transmitted at rates
Rsi,j,U and Rp , respectively, are correctly decoded at SUrxi via joint decoding;
on the other hand, in the second region, only SUi message can be successfully
decoded by assuming the interference from PU as background noise. Note that
the other source is idle.
For accessibility action 3, i.e., when both the SUs transmit their own message, we have:
ρs1,3,{K,φ2 } = P r(Rs1,3,{K,φ2 } ∈
/ Γ̇s1 (Rs1,3,{K,φ2 } , Rs2,3,{K,φ2 } ))
(3.12)
ρs2,3,{φ1 ,K} = P r(Rs2,3,{φ1 ,K} ∈
/ Γ̇s2 (Rs1,3,{φ1 ,K} , Rs2,3,{φ1 ,K} ))
(3.13)
ρs1,3,{U,φ2 } = P r(Rs1,3,{U,φ2 } ∈
/ Γ̈s1 (Rs1,3,{U,φ2 } , Rs2,3,{U,φ2 } , Rp ))
(3.14)
ρs2,3,{φ1 ,U } = P r(Rs2,3,{φ1 ,U } ∈
/ Γ̈s2 (Rs1,3,{φ1 ,U } , Rs2,3,{φ1 ,U } , Rp ))
(3.15)
The SNR region Γ̇si (Rsi,3,φ , Rsj,3,φ ), {i, j} ∈ {{1, 2} , {2, 1}}, guarantees that
the SUi message transmitted at rate Rsi,3,φ is successfully decoded at SUrxi
when another SU message is transmitted at rate Rsj,3,φ . Note that the PU
message received at SUrxi is canceled using FIC. On the other hand, if PU message is not decoded at SUrxi , the SNR region Γ̈si (Rsi,3,φ , Rsj,3,φ , Rp ), {i, j} ∈
{{1, 2} , {2, 1}}, guarantees that the SUi message transmitted at rate (Rsi,3,φ is
successfully decoded at SUrxi when other SU and PU messages are transmitted
at rates Rsj,3,φ and Rp , respectively.
Since the value of Rsi,j,K does not affect the outage performance at P Urx and
SUrxj , {i, j} ∈ {{1, 2} , {2, 1}}, this rate is chosen so as to maximize the SUi
throughput. Rate Rsi,3,K does not affect the outage performance at P Urx ; thus,
the value of Rsi,3,K nad Rsj,3,K are selected such that the SUs sum throughput
is maximized, whereas the same argument can not be applied for the states
in which the PU message is unknown, because in this case there is a tradeoff
between the SUs sum throughput and helping the SU receivers to decode the
PU message.
3.2
Optimal Access Policies for two SUs
The state of the system can be modeled by an MDP s = (t, φ(s)), where t ∈
{1, 2, ..., T } is the primary ARQ state and φ(s) ∈ {{U, U } , {U, K} , {K, U } , {K, K}}
denotes the PU message knowledge state. The set of all states is indicated by S.
14
The policy µ maps the state of the network, s, to the probability that the
secondary users take accessibility action l ∈ {0, 1, 2, 3}. The probability that
action l is selected in state s is denoted by µl (s). If accessibility action l is
selected, the expected throughputs of SU1 and SU2 in state s are respectively
computed as:
(
Rs1,l,φ(s) (1 − ρs1,l,φ(s) ) for l ∈ {2, 3}
(3.16)
Ts1,l,φ(s) =
0 for l ∈ {0, 1}
Rs2,l,φ(s) (1 − ρs2,l,φ(s) ) for l ∈ {1, 3}
(
Ts2,l,φ(s) =
(3.17)
0
for l ∈ {0, 2}
Since the model considered is a stationary Markov chain, the average long term
SUs sum throughput can be obtained as:
T̄s (µ) = El,s=(t,φ(s)) [T̄s1,l,φ(s) + T̄s2,l,φ(s) ]
= Es=(t,φ(s))
2
hX
µl (s)Rsi,l,φ(s) (1 − ρsi,l,φ(s) )+
l=1
µ3 (s)(Rs1,3,φ(s) (1 − ρs1,3,φ(s) ) + Rs2,3,φ(s) (1 − ρs2,3,φ(s) ))
i
(3.18)
where ρs1,2,φ(s) , ρs2,1,φ(s) , ρs1,3,φ(s) and ρs2,3,φ(s) are given in (3.8) to (3.15).
The average long term PU throughput is given by:
3
X
T̄p = Rp 1 −
Es=(t,φ(s)) [µl (s)]ρp,l
(3.19)
l=0
Using µ0 = 1 − µ1 − µ2 − µ3 , T̄p can be rewritten as follows:
3
X
T̄p = Rp 1 −
Es=(t,φ(s)) [πl (s)]ρp,l
l=1
3
X
− Rp ρp,0 −
Es=(t,φ(s)) [πl (s)]ρp,0
l=1
= TpI − Rp (Es=(t,φ(s)) [ρp,l − ρp,0 ])
(3.20)
TpI = Rp (1 − ρp,0 )
(3.21)
where
ρp,0 , ρp,1 , ρp,2 and ρp,3 are given in (3.5) to (3.7).
Thus, if we request that T̄p ≥ TpI (1 − P U ), the PU throughput degradation
constraint can be computed as follows:
TpI − T̄p = Rp El,s=(t,φ(s)) [ρp,l − ρp,0 ] ≤ Rp (1 − ρp,0 )P U
15
(3.22)
Now we can formalize the optimization problem as follows:
maximizeµl (s) T̄s = El,s=(t,φ(s)) [T̄s1,l,φ(s) + T̄s2,l,φ(s) ]
(3.23)
s.t. El,s=(t,φ(s)) [ρp,l − ρp,0 ] ≤ (1 − ρp,0 )P U = ω
(3.24)
(3.25)
where πl (s) is the probability that accessibility action l is selected in state s.
In paper [4], the authors first compute an upper bound to the average long
term SUs sum throughput, then they give a solution to (3.25) in low SU access
rate regime and in high access rate regime; in order to do this, they provide the
following definition, which identifies the boundary between low and high access
rate regimes.
Definition 1: Let µinit = {µ0,init , µ1,init , µ2,init , µ3,init } be the policy such
that SU1 or/and SU2 in all states s ∈ SK = {(t, {K, K}) : t ∈ {1, 2, ..., T }}
access the channel as follows:
{0, 0, 1, 0} if max(a,b,c)=a
{0, 1, 0, 0} if max(a,b,c)=b
(3.26)
µinit =
{0, 0, 0, 1} if max(a,b,c)=c
and for all other states, s ∈
/ SK , µinit = {1, 0, 0, 0}, where
a=
Rs1,2,K (1 − ρs1,2,K )
ρp,2 − ρp,0
(3.27)
b=
Rs2,1,K (1 − ρs2,1,K )
ρp,1 − ρp,0
(3.28)
c=
Rs1,3,K (1 − ρs1,3,K ) + Rs2,3,K (1 − ρs2,3,K )
ρp,3 − ρp,0
(3.29)
For access probability πinit , the constraint given in (3.24) can be computed and
referred to as ωinit . Hence, replacing (3.26) in (3.24) and then computing the
expectation with respect to l and s, ωinit can be obtained as follows:
PT
(ρp,2 − ρp,0 ) t=1 µ(t, {K, K}) if max(a,b,c)=a
PT
(3.30)
ωinit =
(ρp,1 − ρp,0 ) t=1 µ(t, {K, K}) if max(a,b,c)=b
PT
(ρp,3 − ρp,0 ) t=1 µ(t, {K, K}) if max(a,b,c)=c
where µ(t, {K, K}) is the steady-state probability of being in state s = (t, {K, K}),
and a, b, c are given in (3.27) to (3.29).
16
3.2.1
Upper Bound to the Average Long Term SUs Sum
Throughput
An upper bound to the average long term SUs sum throughput is achieved
when the receivers are assumed to be aware of the PU message, so that they
can always cancel the PU interference. Since each SU always knows the PU
message, there exist an optimal access policy which is independent of the ARQ
state and therefore is the same in each slot. Thus, in this case problem (3.25)
may be rewritten as follows:
maxµ1 ,µ2 ,µ3 T̄s =
2
X
µl Rsi,l,K (1 − ρsi,l,K )
l=1
+ µ3 (Rs1,3,K (1 − ρs1,3,K ) + Rs2,3,K (1 − ρs2,3,K ))
s.t.
3
X
µl (ρp,l − ρp,0 ) ≤ ω
(3.31)
(3.32)
l=1
(3.33)
In paper [4] the solution to problem (3.33) is given by the following proposition:
Proposition 1: An access policy to achieve the upper bound is given by
n
o
ω
ω
,
0,
,
0
if max(a,b,c)=a
1
−
ρp,2 −ρp,0
ρp,2 −ρp,0
n
o
ω
ω
1 − ρp,1−ρ
(3.34)
,
, 0, 0
if max(a,b,c)=b
µup =
p,0 ρp,1 −ρp,0
o
n
ω
ω
1−
if max(a,b,c)=c
ρp,3 −ρp,0 , 0, 0, ρp,3 −ρp,0
Furthermore, the upper bound to the average long term SUs sum throughput
is obtained as:
ω
ρp,2 −ρp,0 Rs1,2,K (1 − ρs1,2,K ) if max(a,b,c)=a
ω
T̄sup =
(3.35)
ρp,1 −ρp,0 Rs2,1,K (1 − ρs2,1,K ) if max(a,b,c)=b
P
2
ω
i=1 Rsi,3,K (1 − ρsi,3,K ) if max(a,b,c)=c
ρp,3 −ρp,0
3.2.2
Low SU Access Rate Regime
In low SU access rate regime ω ≤ ωinit ; in paper [4] the optimum access policy
is characterized by the following proposition:
Proposition 2: In the low SU rate regime ω ≤ ωinit , the optimal access
17
policy ∀s ∈ SK is given by
n
1−
n
∗
µ =
1−
n
1−
o
if max(a,b,c)=a
o
ω
ω
ωinit , ωinit , 0, 0
o
ω
ω
ωinit , 0, 0, ωinit
if max(a,b,c)=b
ω
ω
ωinit , 0, ωinit , 0
(3.36)
if max(a,b,c)=c
and
µ∗ = {1, 0, 0, 0}
∀s ∈
/ SK
(3.37)
Furthermore, the average long term SUs sum throughput is obtained as:
ω
ωinit Rs1,2,K (1 − ρs1,2,K ) if max(a,b,c)=a
ω
(3.38)
T̄s∗ =
ωinit Rs2,1,K (1 − ρs2,1,K ) if max(a,b,c)=b
P
2
ω
i=1 Rsi,3,K (1 − ρsi,3,K ) if max(a,b,c)=c
ωinit
3.2.3
High SU Access Rate Regime
In paper [4] to obtain a solution to the CMDP problem described in (3.25) in
high access rate regime, i.e., ω > ωinit , the authors employ the equivalent LP
formulation corresponding to CMDP. To provide the equivalent LP, it is necessary the transition probability matrix of the Markov process denoted by P,
where Pss0 ,l is the probability of moving from state s to s0 if the accessibility
action l is chosen.
For any unichain Constrained Markov Decision Process, there exists an equivalent Linear Programming (LP) formulation, where an MDP is considered unichain
if it contains a single recurrent class plus a (perhaps empty) set of transient
states. Thus, the equivalent LP problem for problem (3.25) is:
X X
maxx
(T̄s1,l,φ(s) + T̄s2,l,φ(s) )x(s, l)
(3.39)
s∈S l∈A×A
s.t.
X X
(ρp,l − ρp,0 )x(s, l) ≤ ω
(3.40)
X X
(3.41)
s∈S l∈A×A
X
x(s0 , l) −
l∈A×A
X X
P (ss0 , l)x(s, l) = 0 ∀s0 ∈ S
s∈S l∈A×A
x(s, l) = 1
(3.42)
s∈S l∈A×A
x(s, l) ≥ 0
∀s ∈ S,
l ∈A×A
(3.43)
(3.44)
18
The relationship between the optimal solution of LP problem (3.44) and the
optimal solution to problem (3.25) is obtained as follows:
P
0
P x(s,l) 0 ,
if
l0 ∈A×A x(s, l ) > 0
x(s,l )
0
l ∈A×A
µl (s) =
(3.45)
arbitrary,
otherwise
3.3
Numerical results
Now we report some numerical results of paper [4]. The channels considered are
Rayleigh fading, thus, the SNRs γx , x ∈ {pp, ps1, ps2, s1s1, s1p, s1s2, s2s2, s2p,
s2s1}, are exponentially distributed random variables with mean γ̄x . For mathematical convenience the links from SUtx1 to SUrx2 and viceversa are assumed
interference free. They consider the following parameters: the average SNRs are
γ̄pp = 10, γ̄si = 5, γ̄psi = 5, γ̄sip = 2, i ∈ {1, 2}, and the ARQ deadline is T = 5.
The PU rate Rp is selected such that PU throughput is maximized when both
SUs are idle, i.e., Rp = argmaxR TpI (R). The SUi rate Rsi,l,U under PU message
unknown to SUrxi is computed as Rsi,l,U = argmaxRsi Tsi,l,U (Rsi , Rp ), where
i = 1, l ∈ {2, 3} or i = 2, l ∈ {1, 3}, so as to maximize the SUs sum throughput.
The SUi rate Rsi,l,K under PU message known to the SUrxi is computed as
Rsi,l,K = argmaxR Tsi,l,K (R), where i = 1, l ∈ {2, 3} or i = 2, l ∈ {1, 3}. The
PU throughput constraint is set to (1 − P U )TpI , where P U ∈ {0.01, 0.05, 0.08,
0.1, 0.13, 0.15, 0.18, 0.2, 0.25, 0.3, 0.4, 0.4861, 0.6, 0.7, 0.8, 1}. FIC is exploited at
the SUrxi , i ∈ {1, 2}.
Figure 3.2: MMDP: Average SUs sum throughput with respect to PU throughput
constraint
The SUs sum throughput with respect to the PU throughput constraint
by varying the value of P U is depicted in Fig. 3.2. Obviously, as the PU
throughput increases, the average SUs sum throughput decreases.
19
Figure 3.3: MMDP: Average PU throughput with respect to PU throughput constraint
Fig. 3.3 depicts the PU throughput with respect to the PU throughput constraint by varying the value of P U . Obviously, as P U decreases the constraint
increases and the PU throughput degradation decreases.
20
Chapter 4
Decentralized Access
Policies in a Cognitive
Radio Network with two
SUs and a Completely
Observable System
The design of decentralized access policies for two secondary users under a
primary ARQ process has not been studied jet. Thus, the purpose of our work
is to study a decentralized CR network with two SUs and one PU exploiting the
redundancy introduced by the HARQ protocol in transmitting copies of the same
PU message and interference cancellation at the SU receivers. Our aim remains
the same, i.e., we want to maximize the average long term sum throughput of the
SUs under a constraint on the average long term PU throughput degradation. To
start with a simple scenario, we suppose that the system is completely observable
for the two SUs, which means that in each time-slot they know the system state,
s = (t, φ(s)) = (t, {φ1 , φ2 }), as in the centralized case, but in this case there is
no central unit which controls the activities of the SUs, i.e., they select their own
action in a certain state independently. Again we assume that the number of
retransmissions is limited and both SUs have a new packet to transmit in each
time slot. Noting the PU message knowledge state at each of the SU receivers
and also the ARQ retransmission time, the network can again be modeled using
a Markov Decision Process (MDP), and due to the constraint on the average long
term PU throughput, using a Constrained Markov Decision Process (CMDP).
4.1
System Model and Policy Definition
The system we consider is the same as in the centralized case. Thus, there are
the same transmitters, receivers and channels, whose signal to noise ratio is denoted in the same way, i.e., γpp , γps1 , γps2 , γs1p , γs1s1 , γs1s2 , γs2p , γs2s1 and γs2s2
are the SNR of the channels P Utx → P Urx , P Utx → SUrx1 , P Utx → SUrx2 ,
21
SUtx1 → P Urx , SUtx1 → SUrx1 , SUtx1 → SUrx2 , SUtx2 → P Urx , SUtx2 →
SUrx1 , SUtx2 → SUrx2 , respectively.
We mantain the same assumptions as in the centralized case about the PU
transmissions and retransmissions, and about the SUs activities and their influence on the outage performance of the PU. The substantial difference consists
in the absence of a central unit that monitors the activities of the SUs and
provides them with the optimal access policies.
In the system we want to model, we have the same accessibility vector, ϕ,
and the same PU message knowledge state, φ, as in the centralized scenario;
in particular, each element of ϕ can be considered as the couple of actions
performed by the two SUs, {a1 , a2 } ∈ A × A, in the considered time-slot, where
ai = 1 means that SUi accesses the channel and transmits its message, i =
1, 2. The relationship between the accessibility action l in the MMDP and
a= {a1 , a2 } in the DEC-MMDP is simple, since l corresponds to a in decimal
notation. For example, l = 1 corresponds to {a1 , a2 } = {0, 1}, which means
that only SU2 accesses the channel. Similarly, each element of φ represents the
couple of PU message knowledge at the two SUrx s, i.e. {φ1 , φ2 } ∈ Φ × Φ, where
Φ = {U, K}. The transmission rates for the possible accessible action and the
outage probabilities are the same as in the centralized case, given in (3.1) to
(3.4), and (3.5) to (3.15), respectively.
4.2
Optimal Access Policies for Two Independent SUs
As in the centralized case, the state of the system can be modeled by an MDP
s = (t, φ(s)), where t ∈ {1, 2, ..., T } is the primary ARQ state and φ(s) ∈
{{U, U } , {U, K} , {K, U } , {K, K}} denotes the PU message knowledge state.
Since we want to analize a decentralized scenario in which each SU selects its
own action independently, there are two policies, one for each SU, denoted by
π1 and π2 , that map the state of the network s to the probability that the SU
takes one of the two possible action, ai ∈ A, where A = {0, 1}, i.e., each SU can
accesses or not the channel, independently of the action of the other SU. The
probability that action ai is selected by SUi in state s is denoted by πi (ai |s),
i ∈ {1, 2}. Since the SUs are independent, the relationship between the policy
µ of the MMDP case and the policies π1 and π2 of the DEC-MMDP ∀s ∈ S is
given by:
µ1 (s) = π1 (0|s)π2 (1|s)
µ2 (s)
= π1 (1|s)π2 (0|s)
µ3 (s)
= π1 (1|s)π2 (1|s)
Since there is no central unit which controls the SUs activities and provides them
the optimal access policy, we can adopt the following optimization strategy: first
we assume that SU2 has a fixed stochastic policy, π2 (a2 |s) ∀a2 ∈ A, ∀s ∈ S,
which is known to SU1 , and we try to find SU1 ’s optimal stochastic policy, given
that the system state, s = (t, φ1 , φ2 ), is known to both the SUs. Then we invert
the perspective, we assume that SU1 has the fixed stochastic policy just found,
22
π1 (a1 |s) ∀a1 ∈ A, ∀s ∈ S, which is known to SU2 , and we try to find SU2 ’s
optimal stochastic policy, given that the system state, s = (t, φ1 , φ2 ), is known
to both the SUs, and so on until the policies of the two SUs converge to the
optimal ones.
Specifically:
• at optimization round k we fix SU2 ’s stochastic policy, π2,k−1 (a2 |s) (for
k = 0 it is randomly chosen), and find SU1 ’s optimal stochastic policy,
π1,k (a1 |s), by solving a LP problem;
• then, we invert the perspective: we fix SU1 ’s stochastic policy just found,
π1,k (a1 |s), and find SU2 ’s optimal stochastic policy, π2,k (a2 |s), by solving
another LP problem.
• At this point we can check the stopping conditions: π1,k = π1,k−1 , π2,k =
π2,k−1 ; if they are satisfied we have convergence and the SU stochastic
policies found are the optimal ones, i.e., π1∗ = π1,k and π2∗ = π2,k , otherwise
we have to start another optimization round, i.e., k = k + 1.
Once we have found the optimal stochastic policy for both the SUs to evaluate
the performace we have to calculate the average long term SUs sum throughput,
T̄s , and the average long term PU throughput, T̄p :
T̄s = Es,a [T̄s1,a,φ(s) + T̄s2,a,φ(s) ]
=
Xh X i
Rs1,a,φ1 (1 − ρs1,a,φ1 ) + Rs2,a,φ2 (1 − ρs2,a,φ2 ) π1 (a1 |s)π2 (a2 |s) P r(s)
s∈S a∈A×A
(4.1)
X
T̄p = Rp 1 −
Es [π1 (a|s)π2 (a|s)]ρp,a
a∈A×A
X X
= Rp 1 −
π1 (a1 |s)π2 (a2 |s)P r(s) ρp,a
(4.2)
a∈A×A s∈S
where P r(s) is the stationary probability of being in state s. We know that for
a regular Markov chain the stationary equations are valid: if we let Ps (j) be the
stationary probability of state j, we have
X
Ps (j) =
Ps (k)Pkj ∀j ∈ S
(4.3)
k∈S
X
Ps (k) = 1
(4.4)
k∈S
Ps (k) ≥ 0
k∈S
(4.5)
where Pkj is the transition probability from state k to state j. In matrix form
we have Ps =(I - P)−1 , where P is the transition probability matrix and I is the
identity matrix. The transition probability Pss0 ,a = P r(s|s0 , a) if accessibility
23
action a= {a1 , a2 } is chosen in state s can be computed as:
P r(s0 = (t − 1, U, U )|s = (t, U, U ), a = {i, j})
= ρpp,{i,j} ρps1,{i,j} ρps2,{i,j}
(4.6)
P r(s0 = (t − 1, U, U )|s = (t, U, K), a = {i, j})
= ρpp,{i,j} ρps1,{i,j} (1 − ρps2,{i,j} )
(4.7)
P r(s0 = (t − 1, U, U )|s = (t, K, U ), a = {i, j})
= ρpp,{i,j} (1 − ρps1,{i,j} )ρps2,{i,j}
(4.8)
P r(s0 = (t − 1, U, U )|s = (t, K, K), a = {i, j})
= ρpp,{i,j} (1 − ρps1,{i,j} )(1 − ρps2,{i,j} )
P r(s0 = (t − 1, U, K)|s = (t, U, U ), a = {i, j}) = 0
(4.9)
(4.10)
P r(s0 = (t − 1, U, K)|s = (t, U, K), a = {i, j})
= ρpp,{i,j} ρps1,{i,j}
(4.11)
P r(s0 = (t − 1, U, K)|s = (t, K, U ), a = {i, j}) = 0
(4.12)
P r(s0 = (t − 1, U, K)|s = (t, K, K), a = {i, j})
= ρpp,{i,j} (1 − ρps1,{i,j} )
(4.13)
P r(s0 = (t − 1, K, U )|s = (t, U, U ), a = {i, j}) = 0
(4.14)
P r(s0 = (t − 1, K, U )|s = (t, U, K), a = {i, j}) = 0
(4.15)
P r(s0 = (t − 1, K, U )|s = (t, K, U ), a = {i, j})
= ρpp,{i,j} ρps2,{i,j}
(4.16)
P r(s0 = (t − 1, K, U )|s = (t, K, K), a = {i, j})
= ρpp,{i,j} (1 − ρps2,{i,j} )
(4.17)
P r(s0 = (t − 1, K, K)|s = (t, U, U ), a = {i, j}) = 0
(4.18)
P r(s0 = (t − 1, K, K)|s = (t, U, K), a = {i, j}) = 0
(4.19)
P r(s0 = (t − 1, K, K)|s = (t, K, U ), a = {i, j}) = 0
(4.20)
P r(s0 = (t − 1, K, K)|s = (t, K, K), a = {i, j}) = ρpp,{i,j}
(4.21)
P r(s0 = (t − 1, φ1 , φ2 )|s = (1, U, U ), a = {i, j})
= 1 − ρpp,{i,j} ,
t − 1 ∈ {0, ..., T − 2}
24
(4.22)
P r(s0 = (T, φ1 , φ2 )|s = (1, U, U ), a = {i, j}) = 1
(4.23)
where ρpp,{i,j} are the PU outage probabilities of the channel P Utx → P Urx
given in (3.5) to (3.7), ρps1,{i,j} are the PU outage probabilities of the channel
P Utx → SUrx1 and ρps2,{i,j} are the PU outage probabilities of the channel
P Utx → SUrx2 ; the last two are given by:
γps1
(4.24)
ρps1,{i,j} = P r Rp > C
1 + i · γs1s1 + j · γs2s1
γps2
ρps2,{i,j} = P r Rp > C
(4.25)
1 + j · γs2s2 + i · γs1s2
The relationship between Pss0 and Pss0 ,a is given by:
X
Pss0 =
Pss0 ,a π1 (a1 |s)π2 (a2 |s)
a∈A×A
Fig. 4.1 shows the Markov chain behind our MMDP model.
Figure 4.1: Markov Chain behind the model
25
(4.26)
4.2.1
SU1 Optimal Access Policy
If we consider SU1 ’s perspective, under the assumptions that it knows the system state s and SU2 ’s policy, in each time slots it has two possible actions to
select, a1 ∈ A. Our aim is to maximize the average long term SUs sum throughput under the average long term PU throughput degradation constraint.
If accessibility action a1 is selected by SU1 , the expected SUs sum throughput in state s, denoted by R1 (s, a1 ), can be computed as:
X
R1 (s, a1 ) =
R1 (s, a1 , a2 )π2 (a2 |s)
(4.27)
a2 ∈A
where
R1 (s, a1 , a2 ) = T̄s1,{a1 ,a2 },φ(s) + T̄s2,{a1 ,a2 },φ(s)
(4.28)
Specifically, we have:
R1 (s = (t, U, U ), a1 = 0, a2 = 0) = 0
R1 (s = (t, U, U ), a1 = 0, a2 = 1) = Rs2,1,U (1 − ρs2,1,U )
R1 (s = (t, U, U ), a1 = 1, a2 = 0) = Rs1,2,U (1 − ρs1,2,U )
R1 (s = (t, U, U ), a1 = 1, a2 = 1) = Rs1,3,U (1 − ρs1,3,U ) + Rs2,3,U (1 − ρs2,3,U )
R1 (s = (t, U, K), a1 = 0, a2 = 0) = 0
R1 (s = (t, U, K), a1 = 0, a2 = 1) = Rs2,1,K (1 − ρs2,1,K )
R1 (s = (t, U, K), a1 = 1, a2 = 0) = Rs1,2,U (1 − ρs1,2,U )
R1 (s = (t, U, K), a1 = 1, a2 = 1) = Rs1,3,U (1 − ρs1,3,U ) + Rs2,3,K (1 − ρs2,3,K )
R1 (s = (t, K, U ), a1 = 0, a2 = 0) = 0
R1 (s = (t, K, U ), a1 = 0, a2 = 1) = Rs2,1,U (1 − ρs2,1,U )
R1 (s = (t, K, U ), a1 = 1, a2 = 0) = Rs1,2,K (1 − ρs1,2,K )
R1 (s = (t, K, U ), a1 = 1, a2 = 1) = Rs1,3,K (1 − ρs1,3,K ) + Rs2,3,U (1 − ρs2,3,U )
R1 (s = (t, K, K), a1 = 0, a2 = 0) = 0
R1 (s = (t, K, K), a1 = 0, a2 = 1) = Rs2,1,K (1 − ρs2,1,K )
R1 (s = (t, K, K), a1 = 1, a2 = 0) = Rs1,2,K (1 − ρs1,2,K )
R1 (s = (t, K, K), a1 = 1, a2 = 1) = Rs1,3,K (1 − ρs1,3,K ) + Rs2,3,K (1 − ρs2,3,K )
where Rs1,l,φ1 , Rs2,l,φ2 are the SUs transmission rates given in (3.1) to (3.4),
and ρs1,l,φ1 , ρs2,l,φ2 are the SUs outage probabilities given in (3.8) to (3.15).
26
Similarly, if accessibility action a1 is selected by SU1 , the expected PU
throughput degradation in state s, denoted by C1 (s, a1 ), can be computed as:
X
C1 (s, a1 ) =
C1 (s, a1 , a2 )π2 (a2 |s)
(4.29)
a2 ∈A
where
C1 (s, a1 , a2 ) = ρp,{a1 ,a2 } − ρp,0
(4.30)
and ρp,l are the PU outage probabilities given in (3.5) to (3.7).
The optimization problem we want to solve can be formalized as follows:
maximizeπ1 (a1 |s) Ea1 ,s [R1 (s, a1 )]
s.t. Ea1 ,s [C1 (s, a1 )] ≤ (1 − ρp,0 )P U = ω
(4.31)
where π1 (a1 |s) is the probability that accessibility action a1 is selected by SU1
in state s.
As in the centralized case, for high SU access rate regime, we can employ
the equivalent Linear Programming (LP) formulation corresponding to CMDP.
To provide the equivalent LP, we need the transition probability matrix of the
Markov process denoted by P , where Pss0 ,a1 = P r(s0 |s, a1 ) is the probability
of moving from state s to s0 if accessibility action a1 is chosen by SU1 . The
transition probability from state s = (t, φ1 , φ2 ) to s0 = (t + 1, φ01 , φ02 ) when SU1
selects action a1 is given by:
X
P r(s0 |s, a1 ) =
P r(s0 |s, a1 , a2 )π2 (a2 |s)
(4.32)
a2 ∈A
Since for every unichain Constrained Markov Decision Process there exists an
equivalent Linear Programming (LP) formulation, where a MDP is considered
unichain if it contains a single recurrent class plus a (perhaps empty) set of
transient states, the equivalent LP problem of problem (4.6) is the following:
X X
maxx
R1 (s, a1 )x(s, a1 )
(4.33)
s∈S a1 ∈A
s.t.
X X
C1 (s, a1 )x(s, a1 ) ≤ ω
(4.34)
s∈S a1 ∈A
X
x(s0 , a1 ) −
a1 ∈A
X X
X X
P (s0 |s, a1 )x(s, a1 ) = 0 ∀s0 ∈ S
(4.35)
s∈S a1 ∈A
x(s, a1 ) = 1
(4.36)
s∈S a1 ∈A
x(s, a1 ) ≥ 0
∀s ∈ S,
a1 ∈ A
(4.37)
(4.38)
27
The relationship between the optimal solution of LP problem (4.13) and the
optimal solution to our problem is obtained as follows:
P
0
P x(s,a1 ) 0 ,
if
a01 ∈A x(s, a1 ) > 0
x(s,a
)
0 ∈A
1
a
(4.39)
π1 (a1 |s) =
1
arbitrary,
otherwise
The symmetric case, i.e., SU2 ’s optimal access policy search, can be dealt with
in the same way simply by inverting the role of the SUs and exchanging the
indexes.
4.2.2
Numerical results
Now we present some numerical results. The channels considered are Rayleigh
fading, like in the centralized case, thus, the SNRs γx , x ∈ {pp, ps1, ps2, s1s1, s1p,
s1s2, s2s2, s2p, s2s1}, are exponentially distributed random variables with mean
γ̄x . For mathematical convenience the links from SUtx1 to SUrx2 and viceversa
are assumed interference free. We consider the same parameters as in the centralized case, i.e., the average SNRs are γ̄pp = 10, γ̄si = 5, γ̄psi = 5, γ̄sip = 2,
i ∈ {1, 2}, and the ARQ deadline is T = 5. The PU rate Rp is selected such that
PU throughput is maximized when both SUs are idle, i.e. Rp = argmaxR TpI (R).
The SUi rate Rsi,l,U under PU message unknown for SUrxi is computed as
Rsi,l,U = argmaxRsi Tsi,l,U (Rsi , Rp ), where i = 1, l ∈ {2, 3} or i = 2, l ∈ {1, 3}
so as to maximize the SUs sum throughput. The SUi rate Rsi,l,K under PU message known for the SUrxi is computed as Rsi,l,K = argmaxR Tsi,l,K (R), where
i = 1, l ∈ {2, 3} or i = 2, l ∈ {1, 3}. The PU throughput constraint is set to
(1−P U )TpI , where P U ∈ {0.01, 0.05, 0.08, 0.1, 0.13, 0.15, 0.18, 0.2, 0.25, 0.3, 0.4,
0.4861, 0.6, 0.7, 0.8, 1}. FIC is exploited at the SUrxi , i ∈ {1, 2}.
Figure 4.2: DEC-MMDP: Average SUs sum throughput with respect to PU throughput constraint
28
The SUs sum throughput with respect to the PU throughput constraint
by varying the value of P U is depicted in Fig. 4.2 in contrast with the SUs
sum throughput in the centralized case (MMDP) that can be considered as an
upper bound for the network performance in the decentralized case. Obviously,
as the PU throughput increases, the average SUs sum throughput decreases.
Furthermore, we can note that the performance in the decentralized case (DECMMDP) is almost the same as in the centralized ones.
Figure 4.3: DEC-MMDP: Average PU throughput with respect to PU throughput
constraint
Fig. 4.3 depicts the PU throughput with respect to the PU throughput constraint by varying the value of P U . Obviously, as P U decreases the constraint
increases and the PU throughput degradation decreases.
29
30
Figure 4.4: DEC-MMDP: Average SUs sum Throughput vs γsip
So far we have considered only one specific SNR value, γsip = 2; it is interesting to analyze the perfomance of the system for varying SNRs, γsip . Fig. 4.4
depicts the average SUs sum throughput with respect to γsip for varying P U ;
it shows different evolutions based on the PU constraint: for P U ≤ 0.08 the
average SUs sum throughput decreases as γsip increases, this is because as the
SNR grows the interference power of the SUs affects more significantly the PU
performace, so they have to limit their channel accesses in order to respect the
PU constraint with a consequent reduction of the maximum achievable throughput. For 0.1 < P U < 0.8 the average SUs sum throughput increases for γsip
below a certain value and decreases for γsip over it; this is due to the fact that
31
the PU degradation constraint is not active for small values of the SNR, i.e.,
for a low interference level at P Utx , thus, the SUs can exploit the transmitting
chances much more and utilize their transmitting power to gain a higher reward.
On the other hand, when the interference power of the SUs become too high
they affect significantly the PU performace, so they have to limit their channel
accesses in order to respect the PU constraint with a consequent reduction of
the maximum achievable throughput. For P U = 1 instead the average SUs
sum throughput tends to grow as the SNR increases, i.e., the PU degradation
constraint is not active for the considered SNRs and the SUs can exploit the
transmitting chances as much as possible.
Figure 4.5: DEC-MMDP: Average SUs sum throughput with respect to PU throughput constraint
Fig. 4.5 depicts the average SUs sum throughput with respect to the PU
throughput constraint for varying P U for different values of γsip ; we can note
that for small values of P U as γsip increases there is a performance degradation since the SUs create a higher level of interference at P Urx (see Fig. 4.6),
thus, since the PU constraint is active, they have to limit their own channel
accesses and consequently the chances to increase their average reward. On the
other hand, as γsip decreases there is an improvement since the SUs’ interference
power affects less the PU performance, thus, they can exploit more transmitting
chances. Considerations developed so far do not hold for high values of P U :
performance results degrade as the SNR decreases and improve as the SNR increases; this is reasonable since for small values of SUs’ SNR the interference
level is very low, so the PU retransmits rarely and the SUs have less chances to
transmit successfully and increase their own reward, whereas for high values of
SUs’ SNR the situation is inverted, i.e., the PU retransmits more frequently and
so the SUs have more chances to exploit FIC and improve their own throughput. In effect, the curves corresponding to γsip < 5 have an initial flat part that
corresponds to P U values for which the PU constraint is not active, i.e., the
32
SUs behavior does not degrade further the PU performance and so the SUs can
exploit the transmitting chances as much as possible with a consequent maximization of their own throughput. Then, in correspondence to the activation
of the PU constraint the performace begins to degrade in order to satisfy the
transmission limitations. For γsip ≥ 5 the curves are decreasing, but P U = 1
is probably a borderline case in which the PU constraint has an unperceivable
effect on SUs’ reward that consequently is very close to the maximum they can
achieve.
Figure 4.6: DEC-MMDP: Average PU throughput with respect to PU throughput
constraint
33
34
Figure 4.7: DEC-MMDPASY : Average SUs sum Throughput vs γs2p
Another interesting aspect to analyze is the influence of the asymmetries in
the SNR on the system performance, i.e., we suppose that γs1p = 2 whereas
γs2p 6= γs1p and varying. Fig. 4.7 depicts the average SUs sum throughput with
respect to γs2p for varying P U ; as in the symmetric case we can note three
different evolutions, obviously the P U bounds change, in particular the region
in which the average SUs sum throughput decreases as γs2p increases is larger
(until P U = 0.4861); this is reasonable since SU1 creates a constant interference
level at P Urx with a significant effect on the PU performance especially when the
constraint is very tight, i.e., for small P U , whereas in the absence of asymmetries
γs1p grows in parallel to γs2p and the effect of SUs interference power is weaker at
35
P Urx . Consequently the region in which the SUs first exploit as much as possible
the opportunities to increase their reward and then limit their accesses to the
channel in order to satisfy the constraint is narrower, but for P U = 1 the PU
degradation constraint is not active, thus the SUs can exploit the transmitting
chances as much as possible and the average SUs sum throughput tends to grow
as the SNR increases.
Figure 4.8: DEC-MMDPASY : Average SUs sum throughput with respect to PU
throughput constraint
Fig. 4.8 depicts the average SUs sum throughput with respect to the PU
throughput constraint for varying P U in different asymmetric situations, i.e., for
different values of γs2p . We can note that for γs2p > γs1p there is a performance
degradation since SU2 creates a higher level of interference at P Urx than in the
symmetric case (see Fig. 4.9), thus it has to limit its own channel accesses and
consequently the chances to increase the average reward. On the other hand,
for γs2p < γs1p there is an improvement since SU2 interference power affects
less the PU performance than in the symmetric case, so can mainly exploit the
transmitting chances. Finally, it is important to note that the considerations
done so far do not hold for P U = 1: performance results are worse than the
symmetric case for γs2p < γs1p , whereas they are better than the symmetric
case for γs2p > γs1p ; this is probably due to the fact that for small values of
SU2 ’s SNR the interference level is very low, so the PU retransmits rarely and
the SU2 has less chances to transmit successfully and increase its own reward,
whereas for high values of SU2 ’s SNR the situation is inverted, i.e., the PU
retransmits more frequently and so SU2 has more chances to exploit FIC and
improve its own throughput.
36
Figure 4.9: DEC-MMDPASY : Average PU throughput with respect to PU throughput constraint
Fig. 4.9 depicts the average PU throughput with respect to the PU throughput constraint by varying the value of P U in the same asymmetric situations
illustrated in Fig 4.5. We can note that for P U ≥ 0.4861 the asymmetries in the
SNR influence the PU throughput degradation, in particular, for γs2p > γs1p
SU2 ’s interference power at P Urx causes a degradation of PU performance with
respect to the symmetric case, whereas for γs2p < γs1p there is an improvement
in PU performance due to the lower interference level SU2 creates at P Urx .
Fig. 4.10 and 4.11 depict SU2 ’s optimal transmission probabilities, denoted
by π2∗ , for P U = 0.2 and P U = 1, respectively, in different system states.
37
Figure 4.10: DEC-MMDPASY : Tx probabilities vs γs2p for P U = 0.2
In Fig. 4.10 we can note that SU2 tends to transmit more than in the
symmetric case for γs2p = 0.25, i.e., when its interference power is almost unperceivable at P Urx , instead for γs2p > 2 it mantains the same transmitting
behavior as in the symmetric case because, even if the interference it creates at
P Urx is higher, the PU performance is not affected significantly, as shown in
Fig. 4.9.
Figure 4.11: DEC-MMDPASY : Tx probabilities vs γs2p for P U = 1
38
In Fig. 4.11 it is evident that SU2 mantains the same transmitting behavior
as in the symmetric case irrespective of its SNR changes; we suppose this is
related to the relaxation of the PU constraint which induces SU2 to access the
channel as much as possible.
39
Chapter 5
Heuristic Decentralized
Access Policies in a
Cognitive Radio Network
with Two Independent SUs
In the previous chapter we found the optimal access policies for the decentralized
case of two independent SUs in a CR network and we obtained that their performance was almost the same as that of the optimal centralized policy described
in Chapter 3. Now we are interested in developing some heuristic policies for
the same decentralized scenario and in analyzing their performance. In order
to achieve our purpose, we developed a network simulator which reproduces
the behavior of a decentralized cognitive radio scenario with one PU and two
independent SUs.
In simulation our work consists in monitoring the evolution of the system
we model in order to evaluate the average long term SUs sum throughput and
the corresponding average long term PU throughput degradation due to the
SUs interference. In the system we consider, there exist one primary and two
secondary transmitters denoted by P Utx , SUtx1 and SUtx2 , respectively. These
transmitters transmit their messages with constant power over block fading
channels and, in each time slot, the channels are considered to be constant.
The signal to noise ratios (SNR) of the channels P Utx → P Urx , P Utx → SUrx1 ,
P Utx → SUrx2 , SUtx1 → P Urx , SUtx1 → SUrx1 , SUtx1 → SUrx2 , SUtx2 →
P Urx , SUtx2 → SUrx1 , SUtx2 → SUrx2 are denoted by γpp , γps1 , γps2 , γs1p ,
γs1s1 , γs1s2 , γs2p , γs2s1 and γs2s2 , respectively. We assume that no channel
State Information (CSI) is available at the transmitters. Thus, transmissions
are under outage, when the selected rates are greater than the current channel
capacity. The system model, trasmission rates and outage probabilities are described in Section 4.1.
The harder step in building a CR network simulator consists in evaluating the decoding event at P Urx , SUrx1 and SUrx2 . Thus, it is necessary to
41
identify some outage conditions which help to characterize whether or not decoding is successful We denote by αU iU j the fading coefficient on the channel
U itx → U jrx .
We first consider the PU message correct decoding at the P Urx . The conditions to have a successful transmission of the PU message in SU accessibility
actions 0, 1, 2 and 3 are respectively:
Rp ≤ C(γpp αpp ) = log2 (1 + γpp αpp ) if l = 0
Rp ≤ C
γpp αpp γpp αpp = log2 1 +
1 + γsip αsip
1 + γsip αsip
(5.1)
if l = 1, 2,
i ∈ {1, 2}
(5.2)
Rp ≤ C
γpp αpp
γpp αpp
= log2 1 +
if l = 3
1 + γs1p αs1p + γs2p αs2p
1 + γs1p αs1p + γs2p αs2p
(5.3)
From the simulation point of view it is interesting to consider the SNRs which
influence the evolution of the system in each time-slot and the messages of interest for the various receivers. P Urx is interested in decoding only its own
message and it is oblivious of the presence of the SUs in the network; thus, their
messages are considered as background noise when they transmit. The conditions to check to establish the possible success of the PU message transmission
are given in (4.52) to (4.54).
SUrxi is interested in decoding its own message, but also the PU message,
if it is unknown, in order to perform FIC. At PU knowledge state {K, K} or
{K, U } ({U, K}), the PU message is known at SUrx1 (SUrx2 ) and therefore the
PU message may be canceled at this receiver. The condition to have a successful
transmission of the SUi message, i ∈ {1, 2}, in SU accessibility action l ∈ {1, 2},
l 6= i, i.e. when the other SU is idle, is:
Rsi,l,K ≤ C(γsisi αsisi ) = log2 (1 + γsisi αsisi ),
i ∈ {1, 2}
(5.4)
In contrast, at PU knowledge state {U, U } or {U, K} ({K, U }), where the PU
message is not decoded at SUrx1 (SUrx2 ), the outage probability of the channel
from SUtx1 (SUtx2 ) to SUrx1 (SUrx2 ) is under the influence of the received PU
message. Thus, we have two different SNRs to consider corresponding to the
two messages of interest:
SN RP =
γpsi αpsi
1 + γsisi αsisi
SN RSi =
γsisi αsisi
1 + γpsi αpsi
Hence, the conditions to have a successful decoding of the SUi message when
42
the other SU is idle are:
γ α
sisi sisi
Rsi,l,U ≤C
1 + γpsi αpsi
γsisi αsisi = log2 1 +
1 + γpsi αpsi
if Rp > C γpsi αpsi
(5.5)
Rsi,l,U ≤C(γsisi αsisi )
= log2 (1 + γsisi αsisi ) if Rp ≤ C γpsi αpsi
(5.6)
Similarly, the conditions to have a successful decoding of the PU message at the
SUrxi when the other SU is idle are:
γ α
psi psi
Rp ≤C
1 + γsisi αsisi
γpsi αpsi if Rsi,l,U > C γsisi αsisi
(5.7)
= log2 1 +
1 + γsisi αsisi
Rp ≤C(γpsi αpsi )
= log2 (1 + γpsi αpsi ) if Rsi,l,U ≤ C γsisi αsisi
(5.8)
For accessibility action 3 and PU message known by SUi , SUrxi is interested in
decoding its own message like for l = 1, 2, the only difference is the presence of
the other SU message. In this case, we have only one SNR corresponding to the
message of interest:
γsisi αsisi
SN RSi =
1 + γslsi αslsi
Hence, the condition to have a successfully decoding of the SUi message is:
γ α
γsisi αsisi sisi sisi
= log2 1 +
(5.9)
Rsi,l,K ≤ C
1 + γslsi αslsi
1 + γslsi αslsi
For accessibility action 3 and PU message unknown by SUi , SUrxi is interested
in decoding its own message, but also the PU message in order to perform FIC,
like for l = 1, 2, the only difference is the presence of the other SU message.
Again we have two different SNRs corresponding to the two messages of interest:
SN RP =
γpsi αpsi
1 + γsisi αsisi + γslsi αslsi
SN RSi =
γsisi αsisi
1 + γpsi αpsi + γslsi αslsi
Hence, the conditions to have a successful decoding of the SUi message when
43
the other SU is not idle are:
γsisi αsisi
Rsi,l,U ≤C
1 + γpsi αpsi + γslsi αslsi
γ α
γsisi αsisi
psi psi
= log2 1 +
if Rp > C
1 + γpsi αpsi + γslsi αslsi
1 + γslsi αslsi
(5.10)
γ α
sisi sisi
1 + γslsi αslsi
γsisi αsisi = log2 1 +
1 + γslsi αslsi
Rsi,l,U ≤C
if Rp ≤ C
γpsi αpsi 1 + γslsi αslsi
(5.11)
Similarly, the conditions to have a successful decoding of the PU message at the
SUrxi when the other SU is not idle are:
γpsi αpsi
Rp ≤C
1 + γsisi αsisi + γslsi αslsi
γ α
γpsi αpsi
sisi sisi
= log2 1 +
if Rsi,l,U > C
1 + γsisi αsisi + γslsi αslsi
1 + γslsi αslsi
(5.12)
γpsi αpsi 1 + γslsi αslsi
γ α
γpsi αpsi sisi sisi
if Rsi,l,U ≤ C
= log2 1 +
1 + γslsi αslsi
1 + γslsi αslsi
Rp ≤C
5.1
(5.13)
Heuristic Access Policy H1
The evolution of a CR network is deeply influenced by the state of the system. In the model we develop, the state of the system is represented by the
PU message knowledge state, φ, and the primary ARQ state, t; since the PU
message knowledge state of the single SU can assume only two possible values,
φi ∈ {U, K}, with i ∈ {1, 2}, the first idea we have in designing a heuristic
access policy for the two SUs consists in identifying two different transmission
probabilities for the two possible PU message knowledge state, denoted by α
and β, respectively. We mean that α is the probability that the SU accesses the
channel and transmits its own message when it knows the PU message in the
current time-slot, whereas β is the probability that the SU accesses the channel
and transmits its own message when it does not know the PU message in the
current time-slot. Obviously, α ≥ β, since the probability of success for the SU
is higher when it knows the PU message and so can perform FIC. We suppose
that the two SUs are symmetric, i.e., they have the same transmission parameters, thus, it is reasonable to suppose they adopt the same access policy.
The aim of our policy design is to develop a heuristic access policy which has to
maximize the SUs sum throughput as much as possible under the PU throughput degradation constraint. Thus, it is useful to identify which values of α and
44
β guarantee the higher possible SUs sum throughput. Considering the same average SNRs, transmission rates and PU throughput constraints used to obtain
numerical results in the optimal centralized and decentralized case, we examine
the performance of the CR network for different values of the probability couple
(α, β).
45
Figure 5.1: H1 : Average SUs sum Throughput vs (α, β)
Figure 5.2: H1 : (α∗ , β ∗ ) vs P U
46
Fig. 5.1 illustrates the α − β search for various values of P U ; it shows that
the higher P U , the bigger the tendency to grow of the α and β which grant the
maximum SUs sum throughput, denoted by α∗ and β ∗ , i.e., as P U increases and
the PU throughput constraint is relaxed, the two SUs adopt a more aggressive
policy to exploit the transmitting chances as much as possible (see Fig. 5.2).
Figure 5.3: H1 : Average SUs sum throughput with respect to PU throughput constraint
The SUs sum throughput with respect to the PU throughput constraint for
various values of P U is depicted in Fig. 5.3 and compared with the upper bound
represented by the centralized case (MMDP). Obviously, as the PU throughput
increases, the average SUs sum throughput decreases. The numerical results are
obtained by selecting for each value of P U considered the couple (α, β) which
guarantees the maximum SUs sum throughput achievable under this heuristic.
Fig. 5.4 depicts the average PU throughput with respect to the PU throughput constraint for various values of P U . Obviously, as P U decreases the constraint, (1 − P U )T̄pI , increases and the PU throughput degradation decreases.
In other words, as P U increases and the PU thoughput constraint is relaxed,
the SUs can mainly exploit the transmitting chances and gain a higher throughput; although, in doing so they create more interference at the P Urx and conseguently cause a degradation of the PU throughput which is more significant
as the transmitting chances grow, i.e., as P U increases. Furthermore, since the
MMDP case represents the upper bound for the maximum achievable SUs sum
throughput, on the other hand it represents the lower bound for the maximum
allowable PU throughput degradation.
47
Figure 5.4: H1 : Average PU throughput with respect to PU throughput constraint
48
49
Figure 5.5: H1 : Average SUs sum Throughput vs γsip
The numerical results obtained so far consider only one specific value of SNR
for the SUs, γsip = 2, i ∈ {1, 2}. Fig. 5.5 depicts the average SUs sum throughput with respect to γsip for varying P U ; it shows different evolutions based on
the PU constraint: for P U < 0.2 the bigger the SNR, the lower the maximum
SUs sum throughput; this is reasonable, since small values of P U correspond
to a tight constraint, i.e., harder to satisfy, which implies a decrease in the SUs
sum throughput as they create a higher interference level at P Urx since they
have to limit their transmissions. For 0.2 ≤ P U ≤ 0.8 the average SUs sum
throughput increases for γsip below a certain value and decreases for γsip over
it; this is due to the fact that the PU degradation constraint is not active for
small values of the SNR, i.e., for a low interference level at P Utx , thus, the SUs
can exploit the transmitting chances much more and utilize their transmitting
power to gain a higher reward. On the other hand, when the interference powers
of the SUs become too high they affect significantly the PU performace, so they
have to limit their channel accesses in order to respect the PU constraint with
a consequent reduction of the maximum achievable throughput. For P U = 1
instead the average SUs sum throughput tends to grow as the SNR increases,
i.e., the PU degradation constraint is not active for the considered SNRs and
the SUs can exploit the transmitting chances as much as possible.
Fig. 5.6 depicts the α and β which grant the maximum SUs sum throughput with respect to γsip for varying P U ; as we already observed, for γsip = 2
50
they tend to increase as P U increases since the SUs exploit the transmitting
chances as much as possible. If we consider their evolution with respect to the
SNR they tend to decrease as γsip grows and this is reasonable since the level
of interference they cause at P Urx increases, so they have to limit their access
to the channel to satisfy the PU constraint; only for P U = 1 do the SUs always
transmit since the PU constraint is not active for the considered SNRs.
51
52
Figure 5.6: H1 : (α∗ , β ∗ ) vs γsip
In order to analyze the degradation of H1 performance, we test it by using
the α∗ and β ∗ obtained for γs1p = γs2p = 2 and imposing γs1p = γs2p 6= 2.
Figure 5.7: H1 Robustness: Average SUs sum throughput vs γsip
Fig. 5.7 depicts the degradation of the average SUs sum throughput with
respect to the PU throughput constraint for various value of γsip ; it shows the
gap between performance obtained for γsip = 2 and performance obtained for
different SNRs values grows as the SNR decreases. This is reasonable since
for γsip < 2 the SUs create a lower interference level at P Urx , thus the PU
retransmits less and they have less chances to exploit FIC and increase their
reward, whereas for γsip > 2 the transmitting behavior adopted, which is the
optimal one for γsip = 2, is too aggressive and we are not able to find a result
because the SUs do not succeed in satisfying the PU throughput constraint
except for P U = 1 for which the constraint is not active (see Fig. 5.8) and for
P U = 0.01 case in which probably the transmission probabilities adopted are
53
very low or the transmitting chances very sporadic. We can note that for high
SNRs, i.e., for γsip > 2, and P U = 1 there is a performance improvement; we
suppose it is due to the fact that the higher SUs interference power at P Urx
makes PU to retransmit frequently so gives them more chances to increase their
throughput. This consideration clearly appear in Fig. 5.8.
Figure 5.8: H1 Robustness: Average PU throughput vs γsip
Figure 5.9: H1 Robustness: Average SUs sum throughput-Average PU throughput
tradeoff
Fig. 5.9 is very useful to better understand the tradeoff that affects the
heuristic performance: it depicts the average long term SUs sum throughput
with respect to the average PU throughput in the same situations examined in
54
Fig 5.7. We can note that as the SNR of the interference channel, γsip , decreases
the performance generally improves and the ranking reverses with respect to the
results shown in Fig. 5.7. It is evident that greater SNR values allow to reach
a higher average SUs sum throughput but at price of a more significant degradation of PU performance, i.e., the tradeoff between what we can gain and the
cost we have to pay clearly appears, whereas in Fig. 5.7 the PU advantage due
to the interference level reduction as γsip decreases is not evident because it
underlines only the effect of using a transmitting behavior too conservative in
respect to the real interference level at P Urx .
In the performance analysis it is interesting to consider also the asymmetric
case, i.e., to suppose that the SUs have different transmission probabilities,
denoted by αi and βi , which represents SUi transmission probability when the
PU message is known and unknown, respectively, i ∈ {1, 2}. As in the symmetric
case, we explore the space of all possible transmission probabilities and find the
α1 , β1 , α2 and β2 which maximize the SUs sum throughput.
Figure 5.10: H1 : (α1∗ , β1∗ , α2∗ , β2∗ ) vs P U
Fig. 5.10 illustrates the best transmissions probabilities for varying P U
compared with the optimal ones in the symmetric case; it is interesting to note
that in almost all the considered cases the best choice is represented by the symmetric behavior. This is reasonable since the SUs have the same transmission
parameters, i.e., the same transmission rates and the same average SNRs on the
accessible channels, so they can exchange their role and access the channel with
the same frequency in order to maximize their reward.
Fig. 5.11 instead illustrates the best transmission probabilities for varying
P U , again compared with the optimal ones in the symmetric case, but imposing
that α1 6= α2 and β1 6= β2 , i.e., considering the first suboptimal case; we can
note that for intermediate values of P U the optimal transmission probabilities
in the asymmetric case are distributed around the symmetric ones, in particular
one is bigger and the other is smaller than α∗ or β ∗ , i.e., there is a sort of balance between the SU activities. This is reasonable since the SUs have the same
55
Figure 5.11: H1,subOpt : (α1∗ , β1∗ , α2∗ , β2∗ ) vs P U
transmission paremeters, thus, if one of them is a little more aggressive the other
has to be a little more ’idle’ in order to gain a reward close to the symmetric one
which represent the best case. For low and high values of P U , instead, one of
the SU reaches the symmetric transmission probability while the other adopts
the closest one to respect the asymmetric conditions; this is reasonable since for
extreme values of P U the SUs in order to gain the maximum reward have to
transmit as much as possible (for high P U , i.e., when the PU constraint is more
relaxed) or the least as possible (for low P U , i.e., when the PU constraint is
very tight), thus they try to exploit or not the transmitting chances in the best
way they are allowed to. Finally, in both figures we can note that the higher
P U , the bigger the tendency to grow of the αi and βi which maximize the SUs
sum throughput, denoted by αi∗ and βi∗ , i.e., as usual as P U increases the two
SUs adopt a more aggressive policy to exploit the transmitting chances as much
as possible, i ∈ {1, 2}.
The SUs sum throughput with respect to the PU throughput constraint for
varying P U is depicted in Fig. 5.12 and compared with the upper bound represented by the centralized case (MMDP). As in the symmetric case, as the
PU throughput increases, the average SUs sum throughput decreases and the
numerical results are obtained by selecting for each value of P U considered the
values of αi and βi which maximize the SUs sum throughput under this heuristic, i ∈ {1, 2}. In Fig. 5.12 is also depicts the symmetric case; we can note that
the performace in the first suboptimal case almost coincides with the symmetric
one for low and intermidiate P U , whereas are below the latter for high P U , this
is because as we have just evidence examining the transmission probabilities,
the SUs manifest a tendency to balance their actions in order to reach the best
reward for intermediate value of P U ; when the PU constraint is relaxed instead
the best performance can be reached only with a symmetric approach, i.e., to
completely exploit the transmitting chances, the SUs have to transmit with the
same probability, and thus there exist an unavoidable degradation due to the
asymmetry imposed to the transmission probabilities.
56
Figure 5.12: H1,subOpt : Average SUs sum throughput with respect to PU throughput
constraint
A final aspect interesting to analyze is the effect of the asymmetries in the
SNR on the heuristic performance; Fig. 5.13 depicts the average SUs sum
throughput with respect to the PU throughput constraint for three specific
values of γs2p and compares them with the symmetric case.
Figure 5.13: H1,ASY : Average SUs sum throughput with respect to PU throughput
constraint
In all the examined scenarios there is SU1 which creates a constant average
interference level at P Urx and affects the PU performance more or less in depen-
57
dence of the tightness of the constraint, then there is SU2 whose SNR changes:
for γs2p = 0.25 SU2 creates an interference level almost unperceivable at P Urx ,
thus it can exploit its transmitting chances more than in the symmetric case, i.e.,
for γs2p = 2, and increase the average SUs reward. Obviously, once it reaches
its maximum transmission capability there is a saturation of the SUs throughput which correspond to the maximum degradation of PU throughput (see Fig.
5.14). On the other hand, for γs2p > 2 the total interference caused by the SUs
at P Urx is higher than the symmetric case with a consequent degradation of
the performace, in effect as γs2p increases SU2 has to limit its accesses to the
channel in order to respect the PU constraint. As we already unerlined in the
DEC-MMDP analysis, the considerations done so far do not hold for P U = 1:
performance results are worse than the symmetric case for γs2p = 0.25, whereas
they are better than the symmetric case for γs2p > 2; this is probably due to the
fact that for small values of SU2 ’s SNR the interference level is very low, so the
PU retransmits rarely and the SU2 has less chances to transmit successfully and
increase its own reward, whereas for high values of SU2 ’s SNR the situation is
inverted, i.e., the PU retransmits more frequently and so SU2 has more chances
to exploit FIC and improve its own throughput.
Figure 5.14: H1,ASY : Average PU throughput with respect to PU throughput constraint
Fig. 5.14 depicts the average PU throughput with respect to the PU throughput constraint by varying the value of P U in the same asymmetric situations
illustrated in Fig 5.13. We can note that for P U ≥ 0.6 the asymmetries in
the SNR influence the PU throughput degradation, in particular, for γs2p > 2
SU2 ’s interference power at P Urx causes a degradation of PU performance with
respect to the symmetric case, i.e., for γs2p = 2, whereas for γs2p = 0.25 there
is an improvement in PU performance due to the lower interference level SU2
creates at P Urx .
58
Figure 5.15: H1,ASY : Average SUs sum throughput-Average PU throughput tradeoff
Fig. 5.15 is very useful to better understand the tradeoff that affects the
heuristic performance: it depicts the average long term SUs sum throughput
with respect to the average PU throughput in the same asymmetric situations
examined in Fig. 5.13. We can note that as the SNR of the interference channel,
γs2p , decreases the performance generally improves; furthermore, it is evident
that greater SNR values allow to reach a higher average SUs sum throughput but
at price of a more significant degradation of PU performance, i.e., the tradeoff
between what we can gain and the cost we have to pay clearly appears, whereas
in Fig. 5.13 the PU advantage due to the interference level reduction as γs2p
decreases is not so evident.
Figure 5.16: H1,ASY : α1∗ and β1∗ vs γs2p
Fig. 5.16 and 5.17 depict SU1 ’s and SU2 ’s optimal transmission probabilities,
respectively, for the same values of γs2p we just considered compared with the
optimal ones in the symmetric case. We can note that for γs2p = 0.25 SU2
59
Figure 5.17: H1,ASY : α2∗ and β2∗ vs γs2p
adopts a more aggressive behavior than in the symmetric case as we supposed
before, in particular for small values of P U and if it does not know the PU
message, whereas SU1 limits its accesses to the channel; on the other hand, as
γs2p increases the SUs exchange their roles, i.e., SU2 tends to limit its accesses
to the channel as SU1 exploits as much as possible the transmitting chances.
In other word, there is a sort of balance of the SUs behavior: according to its
SNR SU2 adopts a certain trasmitting behavior and SU1 , whose SNR does not
change, tries to balance the effects of SU2 ’s interference at P Urx .
5.2
Heuristic Access Policy H2
A simpler heuristic policy we analyze, denoted by H2 , does not consider the
influence of the PU message knowledge in the activities of the SUs. We simply
suppose that there is only one transmission probability, η, irrespective of the
PU knowledge state of the system. As in heuristic H1 , we need to identify which
value of η guarantees the highest possible SUs sum throughput. Considering the
same average SNRs, transmission rates and PU throughput constraints used to
obtain numerical results in the optimal centralized and decentralized case, we
examine the performance of the CR network for different values of η.
60
61
Figure 5.18: H2 : Average SUs sum Throughput vs η
Fig. 5.18 illustrates the η search for varying P U ; it shows again that the
higher P U , the bigger the tendency to grow of the η which grants the maximum
SUs sum throughput, denoted by η ∗ , i.e., as P U increases the two SUs adopt a
more aggressive policy to exploit the transmitting chances as much as possible
(see Fig. 5.19). Note that when η = 0 or the SUs sum throughput is zero means
that it is impossible to respect the PU throughput constraint.
62
Figure 5.19: H2 : η ∗ vs P U
The SUs sum throughput with respect to the PU throughput constraint
for varying P U is depicted in Fig. 5.20 and compared with the upper bound
represented by the centralized case (MMDP). Obviously, as the PU throughput
increases, the average SUs sum throughput decreases. The numerical results
are obtained by selecting for each value of P U considered the value of η which
maximizes the SUs sum throughput under this heuristic.
Figure 5.20: H2 : Average SUs sum throughput with respect to PU throughput constraint
63
As for H1 , in Fig. 5.21 we consider the average SUs sum throughput with
respect to γsip for varying P U ; again, it shows different evolutions based on the
PU constraint: for P U < 0.2 the bigger the SNR, the lower the maximum SUs
sum throughput, for 0.2 ≤ P U ≤ 0.8 the SUs reward increases for γsip below a
certain value and decreases for γsip over it, and for P U = 1 the average SUs sum
throughput tends to grow as the SNR increases. As in H1 case, we can explain
the different behaviors repeating the same considerations on the effect of the
SUs interference power on the PU performance and the consequent limitation
of the SUs accessibility actions to satisfy the PU degradation constraint.
64
65
Figure 5.21: H2 : Average SUs sum Throughput vs γsip
Fig. 5.22 depicts the η which grants the maximum SUs sum throughput with
respect to γsip for varying P U ; as we already observed, for γsip = 2 it tends
to increase as P U increases since the SUs exploit the transmitting chances as
much as possible. As α and β in H1 , if we consider its evolution with respect to
the SNR it tends to decrease as γsip grows because the interference SUs cause
at P Urx increases, so they have to limit their access to the channel to satisfy
the PU constraint; only for P U = 1 do the SUs always transmit since the PU
constraint is not active for the considered SNRs.
66
67
Figure 5.22: H2 : η ∗ vs γsip
As in H1 case, in order to analyze the degradation of H2 performance, we test
it by using the η ∗ obtained for γs1p = γs2p = 2 and imposing γs1p = γs2p 6= 2.
Fig. 5.23 depicts the degradation of the average SUs sum throughput with
respect to the PU throughput constraint for various value of γsip ; while in H1
case the gap between curves corresponding to different γsip is almost costant
as the PU constraint changes, in this case there is no gap for small values of
P U , while as the PU constraint is relaxed the gap grows. This is because for
small value of P U the PU constraint is more difficul to satisfy, thus SUs limit
their transmission and even if the η we adopt is lower than the best one for
γsip < 2, the SUs performance degradation is unperceivable due to the scarsity
of transmitting chances created by the PU that retransmits rarely. On the other
hand, as P U increases SUs tend to mainly exploit the transmitting chances and,
since they adopt a transmission probability that is ideal for a higher SNR, they
do not exploit as much as possible their trasmitting power with a consequent
degradation of the maximum achievable reward which becomes more evident for
high P U values, i.e., when they waste much more transmitting opportunities.
68
Figure 5.23: H2 Robustness: Average SUs sum throughput vs γsip
Fig. 5.24 is very useful to better understand the tradeoff that affects the
heuristic performance: it depicts the average long term SUs sum throughput
with respect to the average PU throughput in the same situations examined
in Fig 5.23. As in H1 case, we can note that as the SNR of the interference
channel, γsip , decreases the performance generally improves and the ranking
reverses with respect to the results shown in Fig. 5.23.
Figure 5.24: H2 Robustness: Average SUs sum throughput-Average PU throughput
tradeoff
It is evident that greater SNR values allow to reach a higher average SUs
sum throughput but at price of a more significant degradation of PU perfor69
mance, i.e., the tradeoff between what we can gain and the cost we have to pay
clearly appears, whereas in Fig. 5.23 the PU advantage due to the interference
level reduction as γsip decreases is not evident because it underlines only the
effect of using a transmitting behavior too conservative in respect to the real
interference level at P Urx .
As for H1 , it is interesting to analyze also the asymmetric case, i.e., to suppose that the SUs have two different transmission probabilities, denoted by η1
and η2 , respectively, irrespective of their own PU message knowledge. Fig. 5.25
illustrates the η1 − η2 search for varying P U ; while in the symmetric case for
each considered value of P U there exists only one η which grants the maximum
SUs sum throughput, in the asymmetric case there exist two couples of transmission probabilities, (η10 , η20 ) and (η100 , η200 ), which maximize the SUs reward;
in particular, the elements of each one of these couples are ’symmetric’, i.e.,
η10 = η200 and η20 = η100 . This result is reasonable because the two SUs have the
same transmission parameters, i.e., the same transmission rates and the same
average SNRs on the accessible channels, thus we can exchange their role without altering the performance of the system.
70
Figure 5.25: H2,ASY : Average SUs sum Throughput vs (η1 , η2 )
As in the symmetric case, we can note that the higher P U , the bigger the
tendency to grow of the η1 and η2 which maximize the SUs sum throughput,
denoted by η1∗ and η2∗ , i.e., as P U increases the two SUs adopt a more aggressive
policy to exploit the transmitting chances as much as possible (see Fig. 5.26).
Note that when P U = 0.01 there exist no values of the transmission probabilities which satisfy the PU constraint, i.e., the heuristic access policy analyzed
creates too much interference at P Urx .
71
Figure 5.26: H2,ASY : (η1∗ , η2∗ ) vs P U
Fig. 5.26 also represents the optimal transmission probability in the symmetric case, η ∗ ; we can note that the optimal transmission probabilities in the
asymmetric case are distributed around the symmetric one. The fact that one
is a little bigger and the other a litte smaller than η ∗ reveal a balance between
the SU activities, this is reasonable since the SUs have the same transmission
parameters so if one of them is a little more aggressive the other has to be a
little more ’idle’ in order to gain a reward close to the symmetric one.
Figure 5.27: H2,ASY : Average SUs sum throughput with respect to PU throughput
constraint
The SUs sum throughput with respect to the PU throughput constraint for
varying P U is depicted in Fig. 5.27 and compared with the upper bound represented by the centralized case (MMDP). As in the symmetric case, as the
PU throughput increases, the average SUs sum throughput decreases and the
72
numerical results are obtained by selecting for each value of P U considered
the values of η1 and η2 which maximize the SUs sum throughput under this
heuristic. Fig. 5.27 also depicts the symmetric case; we can note that the
performace in the asymmetric case almost coincides with the symmetric one,
indeed is slightly above the latter, since, as we have just evidenced examining
the transmission probabilities, the SUs manifest a tendency to balance their
actions in order to reach the best reward. Only for high P U is the symmetric
approach better than the asymmetric one since, to completely exploit the transmitting chances, the SUs have to transmit with the same probability, i.e., always.
5.3
Heuristic Access Policy H3
The third heuristic policy we analyze, denoted by H3 , is a mix between H1 and
H2 , and represents an interesting case for the performance analysis: we suppose
that one of the two SUs, for example SU1 , has a fixed transmission probability,
η1 , irrespective of its own PU message knowledge state, while the other SU has
two different transmission probabilities, one selected if it knows the PU message
and the other chosen otherwise, denoted by α2 and β2 , respectively. In particular, we consider three possible values of η1 (1, 0.5 and 0.05), which correspond
to an aggressive, a moderate or a weak SU1 , respectively, and for each possible
case we identify which values of α2 and β2 guarantee the higher possible SUs
sum throughput. In other words, considering the same average SNRs, transmission rates and PU throughput constraints used to obtain numerical results in
the optimal centralized and decentralized case, we examine the performance of
the CR network for different values of the probability couple (α2 , β2 ), given a
certain value of η1 , and we choose the one that gives us the maximum achievable
SUs sum throughput.
73
Figure 5.28: H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 1
Fig. 5.28 illustrates the α2 − β2 search for varying P U and given η1 = 1; as
for the previous heuristic, we can do the same considerations about the increase
of P U and the parallel growth of the α2 and β2 which grant the maximum
SUs sum throughput, denoted by α2∗ and β2∗ , respectively, (see Fig. 5.29). It
is important to note that there exists no couple (α2 , β2 ) which satisfies the
PU throughput constraint when P U < 0.4861. This is because the value of
η1 chosen determines a too aggressive channel access of SU1 , which creates a
costant level of interference at P Urx that makes it impossible to satisfy the PU
constraint when the latter is very tight, i.e., for small values of P U .
Figure 5.29: H3 (η1 = 1): (α2∗ , β2∗ ) vs P U
The SUs sum throughput with respect to the PU throughput constraint
for varying P U is depicted in Fig. 5.30 and compared with the upper bound
represented by the centralized case (MMDP). Obviously, as the PU throughput
increases, the average SUs sum throughput decreases. As before, the numerical
results are obtained by selecting for each value of P U considered the couple
(α2 , β2 ) which maximizes SUs sum throughput under this heuristic, given η1 =
1.
74
Figure 5.30: H3 (η1 = 1): Average SUs sum throughput with respect to PU throughput constraint
Fig. 5.31 depicts the average PU throughput with respect to the PU throughput constraint for varying P U . Again, as P U decreases the constraint increases
and the PU throughput degradation decreases and the same considerations developed for previous heuristics are valid.
Figure 5.31: H3 (η1 = 1): Average PU throughput with respect to PU throughput
constraint
Fig. 5.32 illustrates the α2 − β2 search for varying P U and given η1 = 0.5;
as before, it shows that as P U increases SU2 adopts a more aggressive policy to
exploit the transmitting chances as much as possible, i.e., α2∗ and β2∗ grow (see
75
Fig. 5.33). It is important to note that there exists no couple (α2 , β2 ) which satisfies the PU throughput constraint when P U < 0.25. This is because the value
of η1 chosen determines a too aggressive channel access of SU1 for P U < 0.25,
which creates a costant level of interference at P Urx that makes it impossible
to satisfy the PU constraint. On the other side, when P U ≥ 0.25 SU1 does not
exploit as much as possible the opportunity of accessing the channel and thus
the SUs sum throughput achieved is lower than the optimal case, in particular
for high values of P U .
Figure 5.32: H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 0.5
76
Figure 5.33: H3 (η1 = 0.5): (α2∗ , β2∗ ) vs P U
The SUs sum throughput with respect to the PU throughput constraint
for varying P U is depicted in Fig. 5.34 and compared with the upper bound
represented by the centralized case (MMDP). Again, as the PU throughput
increases, the average SUs sum throughput decreases, and the numerical results
are obtained by selecting for each value of P U considered the couple (α2 , β2 )
which maximizes the SUs sum throughput under this heuristic, given η1 = 0.5.
Figure 5.34: H3 (η1 = 0.5): Average SUs sum throughput with respect to PU throughput constraint
77
Figure 5.35: H3 (η1 = 0.5): Average PU throughput with respect to PU throughput
constraint
Fig. 5.35 depicts the average PU throughput with respect to the PU throughput constraint for varying P U . As in previous cases, we can do the same condiderations about the P U increase, the resulting constraint relaxation and the
PU throughput degradation increase.
78
Figure 5.36: H3 : Average SUs sum Throughput vs (α2 , β2 ) given η1 = 0.05
79
Fig. 5.36 illustrates the α2 − β2 search for varying P U and given η1 = 0.05;
as before, it shows the growth of the α2 and β2 which grant the maximum SUs
sum throughput and simultaneously the increase of P U , i.e., the relaxation of
the PU throughput constraint (see Fig. 5.37). It is important to note that
there exists no couple (α2 , β2 ) which satisfies the PU throughput constraint
when P U = 0.01. This is because the value of η1 chosen determines a too
aggressive channel access of SU1 for P U = 0.01, which creates a costant level of
interference at P Urx that makes it impossible to satisfy the PU constraint. On
the other side, for high values of P U SU1 does not exploit as much as possible
the opportunity of accessing the channel and thus the SUs sum throughput
achieved is lower than the optimal case.
Figure 5.37: H3 (η1 = 0.05): (α2∗ , β2∗ ) vs P U
Figure 5.38: H3 (η1 = 0.05): Average SUs sum throughput with respect to PU
throughput constraint
80
The SUs sum throughput with respect to the PU throughput constraint
for varying P U is depicted in Fig. 5.38 and compared with the upper bound
represented by the centralized case (MMDP). Obviously, as the PU throughput
increases, the average SUs sum throughput decreases. As for previous cases, the
numerical results are obtained by selecting for each value of P U considered the
couple (α2 , β2 ) which maximizes the SUs sum throughput under this heuristic,
given η1 = 0.05.
Figure 5.39: H3 (η1 = 0.05): Average PU throughput with respect to PU throughput
constraint
Fig. 5.39 depicts the average PU throughput with respect to the PU throughput constraint for varying P U . As in previous cases, the increase of P U leads
to a more significant PU throughput degradation.
Fig. 5.40 offers a comparison of the performance given by heuristic H3 in
the three different cases considered. Since SU1 transmission probability, η1 , is
fixed, it creates a constant level of interference at the PU receiver. Thus, if η1
is too high, for example η1 = 1, the PU throughput degradation constraint can
not be satisfy since the latter is very tight; this is why it is not possible to find a
valid (α2 , β2 ) couple for P U under a certain value. On the other side, the fact
that SU1 has a fixed transmission probability affects negatively the SUs sum
throughput for high values of P U ; in these cases if η1 < 1 SU1 does not exploit
the transmitting chances as much as possible at the expense of the maximum
achievable SUs sum throughput.
81
Figure 5.40: H3 comparison: Average SUs sum throughput with respect to PU
throughput constraint
5.4
Heuristic Access Policy H4
The heuristic access policies proposed so far are offline strategies since, after a research of the optimal transmission probability values, we analyze the
performance of a CR network where the SUs select their action according to
an established policy strictly depending on the identified optimal transmission
probabilities. The last heuristic policy we propose, denoted by H4 , is an online
strategy, i.e., the SUs do not follow an established policy, but try to rearrange
their transmission probability during the evolution of the system according to
the PU feedback they receive and their own PU message knowledge with the
aim of maximizing their own throughput under the PU throughput constraint.
Since the state of the system deeply influences the network evolution, as in H1 ,
we consider two different transmission probabilities for each one of the SUs in
each time-slot; thus, we have αi and βi , i ∈ {1, 2}, which are the SUi probability of accessing the channel when it knows and does not know the PU message,
respectively. At the beginning of the simulation, αi and βi are initialized randomly with αi > βi and we suppose they are the same for the two SUs; instead,
during the evolution of the system they are not bound to assume the same value.
The aim of each SU is to maximize its own throughput under the PU throughput constraint. The primary ARQ feedback gives the SUs the notion of how
the system is evolving; in addition, they know the maximum PU throughput,
TpI , and the constraint they have to satisfy. In each time-slot the SU can check
if the PU throughput constraint is satisfied, i.e., if TpI − Tp ≤ Rp ω , and thus
rearrange its own transmission probability to regulate the level of interference
at the PU receiver. In other words, if the primary ARQ feedback, t, is equal to 1
and the PU message is successfully decoded at P Urx , each SUs can compute the
PU throughput collected until this moment, Tp ; if it is too high, i.e., Tp > TpI ,
the SUs transmission probabilities are too small so they can increase them by
82
an arbitrary small quantity (in simulation we use an increment of 0.05). On the
other hand, if t > 1, i.e., the PU is retransmitting, and the PU message is not
successfully decoded at P Urx , each SU can check the PU throughput constraint;
if it is not satisfied, i.e., TpI − Tp > Rp ω , the SUs create too much interference
at the PU receiver, so they have to decrease their transmission probability by
the same arbitrary small quantity. This kind of approach allows to rearrange
the SU access policy according to the evolution of the system.
Figure 5.41: H4 : Average SUs sum throughput with respect to PU throughput constraint
Fig. 5.41 depicts the SUs sum throughput with respect to the PU throughput constraint for varying P U and compared with the upper bound represented
by the centralized case (MMDP). Obviously, as the PU throughput increases,
the average SUs sum throughput decreases. The gap between the performance
of H4 and the upper bound is not very large; thus, we can consider this heuristic
approach as a good approximation of the optimal decentralized case.
83
84
85
86
Figure 5.42: H4 : Transmission probabilities convergence
The offline heuristic, H1 , represents the best we can do since after a search
among all possible cases we choose the optimal transmission probabilities which
maximize the SUs sum throughput under the PU throughput constraint; the online heuristic, H4 , instead is an adaptive algorithm which rearranges the transmission probabilities as the system evolves with the aim to gain the higher
reward and satisfy the PU constraint. Thus, H4 transmission probabilities have
to converge to the optimal ones or at least to get them as close as possible.
Fig. 5.42 depicts the evolution of α1∗ , α2∗ , β1∗ and β2∗ compared with the optimal
ones for varying P U . We can note that for small and high values of P U the
transmission probabilities tend to easily converge to the optimal values, while
87
for intermediate values the convergence is not completely reached. This clarifies
why the gap between H1 and H4 is larger for intermediate values of P U ; if the
SUs adopt transmission probabilities far from the optimal ones the best reward
can not be achieved with a consequent degradation of the performance.
As in H1 case, it is interesting to analyze the effect of the asymmetries in
the SNR on the heuristic performance; Fig. 5.43 depicts the average SUs sum
throughput with respect to the PU throughput constraint for three specific values of γs2p and compares them with the symmetric case. It is important to
underline that the online heuristic is based on the rearrangement of the SUs
transmission probabilities according to the fluctuations of the PU throughput
with the aim of maximizing the channel accesses, but under the PU constraint.
In all the examined scenarios there is SU1 which creates a constant average interference level at P Urx and affects the PU performance more or less in dependence
of the tightness of the constraint, then there is SU2 whose SNR changes: for
γs2p = 0.25 SU2 creates an interference level almost unperceivable at P Urx , thus
it can exploit its transmitting chances more than in the symmetric case, i.e., for
γs2p = 2, and increase the average SUs reward. Obviously, once it reaches its
maximum transmission capability there is a saturation of the SUs throughput.
For γs2p > 2 the total interference caused by the SUs at P Urx is higher than
the symmetric case with a consequent degradation of the performace, in effect
as γs2p increases SU2 has to limit its accesses to the channel in order to respect
the PU constraint. We can note that the performance is almost the same for
γs2p = 5 and γs2p = 10 until P U < 0.6, then the performance degradation SU2
causes is more significant as its SNR increases; this is reasonable if we observe
Fig. 5.44: it clearly shows that for P U ≥ 0.6 the degradation of PU performance grows as γs2p increases and thus SU2 has to rearrange its transmitting
behavior in a different way according to the interference it creates at P Urx .
Figure 5.43: H4,ASY : Average SUs sum throughput with respect to PU throughput
constraint
88
Figure 5.44: H4,ASY : Average PU throughput with respect to PU throughput constraint
Fig. 5.45 is very useful to better understand the tradeoff that affects the
heuristic performance: it depicts the average long term SUs sum throughput
with respect to the average PU throughput in the same asymmetric situations
examined in Fig. 5.43.
Figure 5.45: H4,ASY : Average SUs sum throughput-Average PU throughput tradeoff
As in the offline case, we can note that as the SNR of the interference channel,
γs2p , decreases the performance generally improves; furthermore, it is evident
that greater SNR values allow to reach a higher average SUs sum throughput but
at price of a more significant degradation of PU performance, i.e., the tradeoff
89
between what we can gain and the cost we have to pay clearly appears, whereas
in Fig. 5.43 the PU advantage due to the interference level reduction as γs2p
decreases is not so evident.
Figure 5.46: H4,ASY : α1∗ and β1∗ vs γs2p
Fig. 5.46 and 5.47 depict SU1 ’s and SU2 ’s optimal transmission probabilities, respectively, for the same values of γs2p we just considered compared with
the optimal ones in the symmetric case. As we already underlined in the offline
case, for γs2p = 0.25 SU2 adopts a more aggressive behavior than in the symmetric case, in particular for small values of P U and if it does not know the PU
message, whereas SU1 limits its accesses to the channel; on the other hand, as
γs2p increases the SUs exchange their roles, i.e., SU2 tends to limit its accesses
to the channel as SU1 exploits as much as possible the transmitting chances.
In other word, there is a sort of balance of the SUs behavior: according to its
SNR SU2 adopts a certain trasmitting behavior and SU1 , whose SNR does not
change, tries to balance the effects of SU2 ’s interference at P Urx .
Figure 5.47: H4,ASY : α2∗ and β2∗ vs γs2p
90
Fig. 5.48 offers a comparison of the performance given by the offline heuristic H1 and H2 , and the online heuristic H4 .
Figure 5.48: H1 − H2 − H4 comparison: Average SUs sum throughput with respect
to PU throughput constraint
If we consider the two offline heuristic, H1 and H2 , the best is H1 . This
result is reasonable since the access policy proposed in H1 depends on the PU
message knowledge of the SUs and thus exploits the information which the SUs
know about the state of the system, while the access policy proposed in H2 does
not exploit the PU message knowledge, which leads to inferior performance in
general.
The online heuristic curve, H4 , stays a little below the offline heuristic H1 ; in
particular, it shows slightly worse performance for intermediate values of P U .
This means that the auto-rearrangement of the SUs transmission probabilities
does not allow them to completely exploit the transmitting chances in these
cases. In our opinion this is due to the fact that for intermediate P U the system evolution affects more significantly the SUs behavior, i.e., there is more
diversification in SU transmission probabilities than when P U is high and the
SUs try to transmit as much as possible or when P U is small and the SUs limit
their channel accesses as much as possible. In other words for intermediate values of P U the system possible changes and evolutions are more heterogeneous,
thus simulative results are more approximate.
In terms of control informations used by the heuristics, the offline policies
only exploit the knowledge of the state of the system, i.e., the primary ARQ
feedback and the PU message knowledge, while the online one is more complex,
since in each time- slot it has to evaluate the instantaneous PU throughput, Tp ,
and check if the PU constraint is satisfied.
91
Chapter 6
Decentralize Access Policies
in a Cognitive Radio
Network with two SUs and
a Partially Observable
System
The problem to efficiently model a partially observable decentralized system is a
very hard challenge; it can not be reduced to the adaptation of a POMDP model
to a decentralized scenario because a lot of factors affect the system evolution
and the lack of information is an obstacle very difficult to overcome. As a final
contribute to our work we suggest some starting strategy which combined or reelaborated could offer some interesting idea for a solution to the DEC-POMDP
problem.
A first interesting paper is represented by [25] which offers a new framework
for learning without state estimation in a partially observable scenario: typically
in an MDP the value of the state of the system, s, under a specific policy, π,
can be written recursively as follows:
X
X
0
0
V π (s) =
P (a|π, s)[Ra (s) +
P a (s, s ) · V π (s )]
s0 ∈S
a∈A
but in a POMDP the value of an observation, x, under a specific policy, π, cannot
be defined in a similar form. However, the value of a state in the underlying
MDP does not change just because it is inacessible; therefore the value of an
observation, x, under a specific policy, π, can be define as follows:
X
V π (x) =
P π (s|x) · V π (s)
s∈S
where P π (s|x) is the asymptotic probability that the state of the underlying
93
MDP is s when the observation is known to be x and can be defined as:
P (x|s)P π (s)
P (x|s)P π (s)
P
P π (s|x) =
=
0
π 0
P π (x)
s0 ∈S P (x|s )P (s )
Note that the last equation is only a definition of V π (x) because the states are
not observable in POMDPs. The authors use a Q-learning algorithm applied
with a fixed stationary persistent exitation learning policy, i.e. a policy that
assigns a non-zero probability to every action in every state and for which the
underlying Markov chain is ergodic; in POMDPs that satisfy the assumption
that the underlying MDPs are ergodic for every stationary policy all stationary policies with the characteristics just illustrated are persistent exciting. In
the paper is demonstrated that in a POMDP of the type just described, if a
persistent excitation policy π is followed during learning, the Q-learnimg algorithm will converge to the solution of the following system of equations with
probability one: ∀x ∈ X
i
X
X
Q(x, a) =
P π1 (s|x, a) Ra (s) + γ
P a (s, x0 ) · maxa0 ∈A {Q(x0 , a0 )} (6.1)
s∈S
x0 ∈X
where P π (s|x, a) is the asymptotic probability, under policy π, that the underlying state is s given that the observation-action pair is (x, a), and P a (s, x0 ) =
P
a
0
0 0
s0 ∈S P (s, s )P (x |s ). Since the scenario we are considering is decentralized,
i.e. the SUs are independent and take their action independently to each other
on the base of their own observation of the state of the system, our approach
will consists on fixing the policy of one SU and finding the optimum policy for
the other, and viceversa. A possible way to exploit this approach can be the
following: since we aim to find the optimal policy of agent SUi assuming that
policy of agent SUj is fixed and known to agent SUi , with i, j ∈ {1, 2} and
i 6= j, we have to use equation (6.1) assuming that P πi (s|x, a) is the asymptotic probability, under policy πi , that the underlying
P state is s given that the
observation-action pair is (x, a), and P a (s, x0 ) = s0 ∈S P a (s, s0 )P (x0 |s0 ). To
implement equation (6.1) we have to calculate previously the single components; we assume that a = ai ∈ {0, 1}, s = (t, φi , φj ) with t ∈ {1, ..., T } and
φi , φj ∈ {U, K}, x = (t, φi ) is the observation on PU message knowledge of
agent SUi .
The probability that SUi observes x when the system is in state s, P (x|s),
can be calculated as follows:
P [xi = (t, U )|s = ((t, U, φj )]
=1
P [xi = (t, U )|s = ((t, K, φj )]
=0
P [xi = (t, K)|s = ((t, K, φj )]
=1
P [xi = (t, K)|s = ((t, U, φj )]
=0
(6.2)
The instantaneous SUs sum throughput when the system is in state s and SUi
take action a, Ria (s), can be computed as follows:
P
Ria (s) = aj ∈A Ri (s, ai , aj )πj (aj |xj )
(6.3)
P
= aj ∈A Ri (xi , ai , aj )πj (aj |xj )
94
where Ri (s, ai , aj ) = Ri (xi , ai , aj ) since in simulations we suppose that there is
no interference between SUi trasmitter and SUj receiver.
The instantaneous PU throughput degradation when the system is in state s
and SUi take action a, Cia (s), can be computed as follows:
P
Cia (s) = aj ∈A Ci (s, ai , aj )πj (aj |xj )
(6.4)
P
= aj ∈A Ci (xi , ai , aj )πj (aj |xj )
where Ci (s, ai , aj ) = ρp,(a1 ,a2 ) − ρp,0 = Ci (xi , ai , aj ) for the same reason as
before.
The transition probability from state s to s0 when SUi take action a, P a (s, s0 ) =
P (s0 |s, a), can be calculated as follows:
X
P (s0 |s, ai , aj )πj (aj |xj )
(6.5)
P ai (s, s0 ) = P (s0 |s, ai ) =
aj ∈A
The asymptotic probability under policy πi that the underlying state is s given
that the observation-action pair of SUi is (x, a), P πi (s|x, a), can be compute in
the following way:
P πi (s|x, a)
= P πi (s|xi , ai )
P (xi |s,ai )P (s|ai )
P (xi |s0 ,ai )P (s0 |ai )
0
=P
(6.6)
s ∈S
where
P (xi |s, ai ) = P (xi |s)
(6.7)
and P (s|ai ) is:
P (s|ai )
=P
P (ai |s)P (s)
P (ai |s0 )P (s0 )
=P
πi (ai |xi )P (s)
πi (ai |x0i )P (s0 )
0
s0 ∈S
(6.8)
s ∈S
Let be s = (t, φ1 , φ2 ) the actual state, s0 = (t − 1, φ01 , φ02 ) the previous state and
a = (a1 , a2 ) the actions selected by the two agents in state s0 ; the probability
of being in state s, P (s), can be compute as follows:
X X
P (s) =
P a (s0 , s)π1 (a1 |x01 )π2 (a2 |x02 )
(6.9)
s0 ∈S a∈AxA
where P a (s0 , s) is the transition probability from state s0 to state s under the
action pair a.
However, this procedure does not consider any constraints and thus the optimum policies we can obtain are deterministic, even if we fix stochastic starting
policies at convergence the optimum policies are deterministic.
A second interesting perspective is represented by paper [28] in which the
authors demonstrate that a constrained optimization problem can often be reduced to one with no constraints through the introduction of parameters, called
Lagrange multipliers. They illustrates as a discrete-time Markovian system with
a finite state space, a compact action space and an average reward subject to a
95
global constraint can be optimized by simple dynamic programming equations
and the use of Lagrangian multipliers; moreover, they demonstrate that there is
always a stationary optimal policy and that it can be simple (non-randomized)
or a mixed policy, i.e. a mix of two non-randomized policies. Therefore, the
solution of our problem can be divided in two steps:
1) solve the Lagrangian unconstrained POMDP problem using a suitable policy
iteration algorithm;
2) compute the optimal randomized policy as a probabilistic mixture of two
pure policies.
The problem we have to solve is the following:
n
hP
io
π
J = maxπ Es,a
s,a R(s, a)
hP
i
π
s.t. Es,a
s,a C(s, a) ≤ W
(6.10)
where R(s, a) is the SU s sum throughput for the state s = (s1 , s2 ) and the
SU s joint action a = (a1 , a2 ), and C(s, a) is the PU throughput degradation in
the state s = (s1 , s2 ) when the SU s take the joint action a = (a1 , a2 ). Since
the analyzed scenario is distributed, i.e., agents SU1 and SU2 take their action
indipendently from each other and have only a partial view of the state of the
system, the observation x, we can use the same strategy adopted before, i.e.,
first we consider the perspective of agent SU1 , so we fix agent SU2 stochastic
policy and try to find the optimum stochastic policy for agent SU1 with the
Lagrangian procedure, then we exchange the role of the two agents and after
fixing the policy of agent SU1 just found we try to find the optimum stochastic
policy for agent SU2 in the same way; henceforward when we refer to action,
a, and observation, x, we intend action and observation of the agent SUi for
which we want to find the optimum policy. In the Lagrangian perspective the
problem above described can be reformulated as follows:
B λ (s, a) = R(s, a) − λC(s, a)
V λ (x, a) = maxa∈A
nX
(6.11)
h
io
X
0
0
P π (s|x, a) B λ (s, a) +
P a (s, x ) · V λ (x )
x0 ∈X
s∈S
(6.12)
π λ (x) = argmaxa∈A V λ (x, a)
(6.13)
where, since each agent does not know the state of the system, s, but only its
own perspective, the calculation of V λ depends on the actual observation of
agent SUi , x. R(s, a) = Ra (s) given in (6.3) and C(s, a) = C a (s) given in (6.4)
are respectively the SUs sum throughput and the PU throughput degradation
when the system is in state s = (s1 , s2 ) and agent SUi take action a = ai , for
λ
i ∈ {1, 2}, and P πi (s|x, a) is the asymptotic probability under policy πiλ that the
underlying state is s given that the observation-action pair is (x, a) = (xi , ai ).
Thus for every possible observation, x ∈ Xi , and SUi action , a ∈ Ai , we
have V λ (x, a) and we can find the deterministic corresponding policy by policy
iteration.
Finding a policy iteration algortithm that solve problem (6.10) is anything but
96
simple, beacuse the update of V λ (x, a) has not to depend on the policies to
grant convergence. Supposing to have an efficient policy iteration algorithm, at
the end of this procedure we have πiλ optimum associated to a certain λ, thus
we can evaluate the corresponding PU throughput degradation:
X
Cλ =
C λ (x) · P (x)
(6.14)
x∈Xi
where
C λ (x) =
X X
λ
P πi (s|x, a) · C a (s) πiλ (a|x)
(6.15)
X
(6.16)
a∈Ai s∈S
and
P (x) =
P (x|s) · P (s)
s∈S
where C a (s), P (x|s), P (s) are given in (6.4), (6.2) and (6.9), respectively.
Since V λ and C λ are monotone non-increasing function in λ (as demonstrated in [28]), we can find two different values of λ (λ1 and λ2 ), such that
the corrisponding PU throughput degradations (C λ1 and C λ2 ) are respectively
C λ1 > W and C λ2 < W in the most strictly way, i.e., λ1 is the greater value
of λ for which C λ1 > W and λ2 is the smaller value of λ for which C λ2 < W ,
where W is the constraint we have to satisfy in the optimization procedure.
The optimum policy we expected to find is an optimal constrained policy in the
form:
µ∗i (x) = qµλi 1 (x) + (1 − q)µλi 2 (x) ∀x ∈ Xi
(6.17)
where µλi 1 and µλi 2 are respectively the trasmission probability for λ1 and λ2
associated to the policy obtained by policy iteration, q ∈ [0, 1] and it has to
satisfy Cµ∗i (x) = W ∀x ∈ Xi that can be calculate as follows:
Cµ∗i (x) =
X X
∗
P πi (s|x, a) · C a (s) πi∗ (a|x),
∀x ∈ Xi
(6.18)
a∈Ai s∈S
where πi∗ is the SUi ’s optimum stochastic policy we aimed to find and is related
to µ∗i simply by
(
πi∗ (a|x)
=
πi∗ (0|x)
= 1 − µ∗i (x),
πi∗ (1|x)
= µ∗i (x),
∀x ∈ Xi
∀x ∈ Xi
The last approach we suggest is the optimal one from the analitical point
of view: an optimization procedure based on belief states. A POMDP model
can be formally defined by the 6-tuple Ξ = (S, A, Z, R, T , O) where: S is
the state space, A is the action set, T is the transition probability from one
state to another given the action, Z and O are the observation set and related
probabilities and R generalizes the COMDP rewards to add a dependency on
the observation. As in any MDP our goal is to find the optimal policy and the
97
only way for a policy to specify the truly optimal behavior is to remember the
entire history of the process. In a POMDP it is possible to derive a summary
statistic for the entire history of the process, called information state or belief
state, that unlike the entire history is of fixed dimension. A belief state is a
sufficient statistic for the history, which means that optimal behavior can be
achieved using the belief state in place of the history. A belief state, b, is simply
a probability distribution over the set of the states, Π(S), with b(s) being the
probability of occupying state s; thus, we can define B = Π(S) to be the space of
all probability distributions over S. A single belief state can capture the relevant
aspects of the entire previous history of the process and more importantly can
be easily updated after each transition to incorporate one additional step into
the history.
A starting point we suggest for a complete formulation is the following: assuming
that SUj has a fixed stochastic policy, πj , and we want to find SUi ’s optimal
stochastic policy, with i, j ∈ {1, 2} and i 6= j, our state is sM DP = (t, φi , φj ) =
(xi , φj ) in underlay MDP, where xi = (t, φi ) and φi , φj ∈ Φ = {U, K}. The
unknow part of the state is φj , while the action selected by SUj given xj =
(t, φj ) is known to SUi , since it knows SUj ’s fixed stochastic policy. Hence,
in new model, we can use the belief on φj and our new state is given by s =
(t, φi , P r(φj |hi )) = (xi , bi (φj )), where bi (φj ) = P r(φj |hi ) and hi is the history
of previous actions and observations in SUi . The new MDP model for SUi is a
tuple (Ṡ, Ṗ , Ṙi , Ċi ), where Ṡ is the state space, Ṗ is the transition probability
matrix, Ṙi is the instantaneous reward function and Ċi is the instantaneous cost
function.
In order to characterize our new model we need to find a formula which given
0
i
a belief state vector bi (φj ) computes the resulting belief state, bai,x
0 (φj ), after
i
a transition in the process, then we have to compute Ṗ , Ṙi , Ċi . Once we
have all this elements, since for every unichain Constrained Markov Decision
Process there exists an equivalent Linear Programming (LP) formulation, where
an MDP is considered unichain if it contains a single recurrent class plus a
(perhaps empty) set of transient states. The equivalent LP problem of the
problem we want to solve is the following:
X X
maxz
Ṙi (s, ai )z(ai , s)
(6.19)
s∈Ṡ ai ∈A
s.t.
X X
Ċi (s, ai )z(ai , s) ≤ W
(6.20)
s∈Ṡ ai ∈A
X
z(ai , s0 ) −
ai ∈A
X X
X X
Ṗ (s0 |s, ai )z(ai , s) = 0 ∀s0 ∈ Ṡ
(6.21)
s∈Ṡ ai ∈A
z(ai , s) = 1
(6.22)
s∈Ṡ ai ∈A
z(ai , s) ≥ 0 ∀s ∈ Ṡ,
ai ∈ A
(6.23)
(6.24)
98
The relationship between the optimal solution of LP problem (6.24) and the
optimal solution to our problem is obtained as follows:
P
P z(ai ,xi ,bi )
,
if
ai ∈A z(ai , xi , bi ) > 0
z(ai ,xi ,bi )
ai ∈A
πi (ai |xi , bi ) =
arbitrary,
otherwise
The development of an efficient DEC-POMDP formulation based on belief states
remain an open research problem that could be investigated in future works.
99
Chapter 7
Conclusions
Cognitive radio represents an innovative and intelligent way to exploit the frequency spectrum and offers a lot of idea for interesting future research work.
In this thesis work we focus on the decentralized access policies design in a CR
network with one PU and two independent SUs under a primary ARQ process;
exploiting FIC we extend previous research works which considered only the
centralized scenario and/or a network with only one secondary agent. The decentralized scenario is very actual and recurrent in many applications and real
contests, thus our work can be considered as a little stone in the wall of future
reasearch on cognitive radio network. Cognitive radio represents a very hard
challenge for complexity, lack of information, problems in system modelling and
expecially in analytical formulation; for these reason we concentrate our efforts
on the decentralized heuristic policies design. We suggest some interesting offline and online access strategies and we point out that in such a scenario with
independent secondary users characterized by the same transmission parameters a symmetric approach gives good results very closed to the optimal ones.
Since the only partial knowledge of the state of the system represents a crucial
element which affects significantly the SUs performance, we show how important could be exploiting this information in the development of a good channel
access strategy. Finally, we point out the lack of an efficient analytical formulation of the DEC-POMDP problem and suggest some starting point for future
research work, like Q-learning exploitation, Lagrangian multipliers utilization
and formulation based on belief states. Cognitive radio represents a continuous
challenge and thus a rich inspiration source for future research works: in addition to the development of a correct analytical formulation, another interesting
field to investigate is the multiple CR scenario, i.e., to study how the growth in
SUs number affects the performance of system and thus design suitable access
policies which let to efficiently employ the spectrum bands.
101
Appendix A
MATLAB Code
In this appendix we report some fragment of the Matlab code developed to
implement the access policies proposed.
A.1
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
%
DecMmdp.m
..Transmission paramiters initialization..
T = maximum number of primary transmission
Ns = cardinality of states space
Na = number of possible actions
gammaP = max power of PU
gammaS1max = max power of SU1
gammaS1min = min power of SU1
gammaS2max = max power of SU2
gammaS2min = min power of SU2
alphaBarS1S2 = ch. coeff. from SU1 to SU2
alphaBarS2S1 = ch. coeff. from SU2 to SU1
alphaBarS1S1 = ch. coeff. from SU1 to SU1
alphaBarS2S2 = ch. coeff. from SU2 to SU2
alphaBarS1P = ch. coeff. from SU1 to PU
alphaBarS2P = ch. coeff. from SU2 to PU
alphaBarPS1 = ch. coeff. from PU to SU1
alphaBarPS2 = ch. coeff. from PU to SU2
alphaBarPP = ch. coeff. from PU to PU
gammaPS1 = average SNR on ch. PUtx->SU1rx;
gammaPS2 = average SNR on ch. PUtx->SU2rx;
gammaPP = average SNR on ch. PUtx->PUrx;
R_P = PU tx rate
gammaS1 = power of SU1
gammaS2 = power of SU2
R_SK1 = SU1 tx rate if PU msg known
R_SK2 = SU2 tx rate if PU msg known
R_SU1 = SU1 tx rate if PU msg unknown
R_SU2 = SU2 tx rate if PU msg unknown
rhoP = PU outage on ch. PUtx->PUrx
103
%
%
%
%
%
%
rhoPS1
rhoPS2
rhoS1K
rhoS2K
rhoS1U
rhoS2U
=
=
=
=
=
=
PU outage on ch. PUtx->SU1rx
PU outage on ch. PUtx->SU2rx
SU1 outage when PU msg known
SU2 outage when PU msg known
SU1 outage when PU msg unknown
SU2 outage when PU msg unknown
...
EpsPU = [.01 .05 .08 .1 .13 .15 .18 .2 .25 .3 .4 .4861 .6 .7 .8 1];
Nsim = length(EpsPU);
% Optimization procedure..
for ns = 1:Nsim
fprintf(’Optimization for epsPU=%f..\n’,EpsPU(ns));
epsPU = EpsPU(ns);
epsW = (1-rhoP(1,1))*epsPU; % PU thr degradation constraint
...
ThrSmax = 0;
ThrPmin = 0;
pi1opt = zeros(Na,Ns);
pi2opt = zeros(Na,Ns);
Nit = 1000;
for iter = 1:Nit
if mod(iter,1000)==0
fprintf(’Simulation #%d..\n’,iter);
end
ThrS = 0;
ThrP = 0;
% initialization of SU1’s policy
pi1 = zeros(Na,Ns);
mu1 = zeros(1,Ns);
% initialization of SU2’s fixed policy
pi2 = zeros(Na,Ns);
mu2 = zeros(1,Ns);
for s = 1:Ns
mu2(s) = random(’unif’,0,1);
end
for s=1:Ns
for j=1:Na
if j==1
pi2(j,s) = 1-mu2(s);
else
pi2(j,s) = mu2(s);
end
end
end
stop = 0;
pi1_old = zeros(Na,Ns);
pi2_old = zeros(Na,Ns);
round = 1;
104
while(stop~=1)
if round==10
break;
end
found1 = 0;
% LP problem formulation for SU1..
% transition probability from state s to state sprim
% when agent SU1 selects action a1=i (P^{a1}(s,s’)=Pr(s’|s,a1))
Ptran1 = zeros(Ns,Ns,Na);
for i = 1:Na
for s = 1:Ns
for sprim = 1:Ns
PR = 0;
for j = 1:Na
PR = PR+Ptran(s,sprim,i,j)*pi2(j,s);
end
Ptran1(s,sprim,i) = PR;
end
end
end
% reward when SU1 is in state s
% and selects action a1=i (R^{a1}(s)=R1(s,a1))
R1 = zeros(Ns,Na);
for i = 1:Na
for s = 1:Ns
R1(s,i) = rwd1(s,i,mu2);
end
end
% cost in terms of PU thr degradation when SU1 is in state s
% and selects action a1=i (C^{a1}(s)=C1(s,a1))
C1 = zeros(Ns,Na);
for i = 1:Na
for s = 1:Ns
C1(s,i) = cost1(s,i,mu2);
end
end
Ts = zeros(Ns*Na,1); % SUs reward
for s = 1:Ns
for i = 1:Na
Ts((s-1)*Na+i) = R1(s,i);
end
end
Tc = zeros(1,Ns*Na); % PU thr degradation(cost)
for s = 1:Ns
for i = 1:Na
Tc((s-1)*Na+i) = C1(s,i);
end
end
DeltaMinusP = zeros(Ns,Ns*Na);
for sprim = 1:Ns
105
for s = 1:Ns
delta = 1*(s==sprim)+0*(s~=sprim);
for i = 1:Na
DeltaMinusP(sprim,(s-1)*Na+i) = DeltaMinusP(sprim,(s-1)*Na+i)
+(delta-Ptran1(s,sprim,i));
end
end
end
sumZ = zeros(1,Ns*Na);
for s = 1:Ns
for i = 1:Na
sumZ(1,(s-1)*Na+i) = 1;
end
end
Aeq=[DeltaMinusP;sumZ];
beq=[zeros(Ns,1);1];
lb=zeros(Ns*Na,1);
options=optimset(’LargeScale’,’on’,’Simplex’,’off’,’Display’,’off’);
[z,fval,exitflag,output,lambda]=linprog(-1*Ts,Tc,epsW,Aeq,beq,lb,[],[],options);
if length(z)==0
break;
end
pi1_old = pi1;
for s = 1:Ns
sum = 0;
for i = 1:Na
sum = sum+z((s-1)*Na+i);
end
if sum>0
for i = 1:Na
pi1(i,s) = z((s-1)*Na+i)/sum;
if i==2
mu1(s) = pi1(i,s);
end
end
else
pi1(1,s) = random(’unif’,0,1);
pi1(2,s) = 1-pi1(1,s);
mu1(s) = pi1(2,s);
end
end
if exitflag==1
found1 = 1;
elseif exitflag==-2
fprintf(’No feasible solution found!\n’);
end
found2 = 0;
% ..LP problem formulation for SU2 identical to SU1 case..
if exitflag==1
found2 = 1;
106
elseif exitflag==-2
fprintf(’No feasible solution found!\n’);
end
% check stopping condition..
eq1 = 0;
eq2 = 0;
for s = 1:Ns
if abs(pi1_old(1,s)-pi1(1,s))<0.001
eq1 = eq1+1;
end
if abs(pi2_old(1,s)-pi2(1,s))<0.001
eq2 = eq2+1;
end
end
if (eq1==Ns) && (eq2==Ns)
if found1==1 && found2==1
% Average SUs sum throughput calculation:
[ThrS,ThrP] = thr(pi1,pi2,Ptran);
stop = 1;
if ThrS>ThrSmax
ThrSmax = ThrS;
ThrPmin = ThrP;
pi1opt = pi1;
pi2opt = pi2;
end
else
break;
end
else
round = round+1;
end
end
end
...
end
end
A.2
System evolution in CR network simulator
% ..Transmission paramiters initialization (as in DecMmdp.m)..
...
%
%
%
%
%
Rp
R1
R2
a1
a2
=
=
=
=
=
PU tx rate
SU1 tx rate
SU2 tx rate
SU1 action in current time-slot
SU2 action in current time-slot
107
%
%
%
%
%
%
%
%
%
%
%
%
fpp = fading coeff. on ch. PUtx->PUrx
fps1 = fading coeff. on ch. PUtx->SU1rx
fps2 = fading coeff. on ch. PUtx->SU2rx
fs1p = fading coeff. on ch. SU1tx->PUrx
fs2p = fading coeff. on ch. SU2tx->PUrx
fs1s1 = fading coeff. on ch. SU1tx->SU1rx
fs1s2 = fading coeff. on ch. SU1tx->SU2rx
fs2s2 = fading coeff. on ch. SU2tx->SU2rx
fs2s1 = fading coeff. on ch. SU2tx->SU1rx
t = primary ARQ feedback
S1k = SU1’s knowledge of PU msg
S2k = SU2’s knowledge of PU msg
if a1==0 && a2==0 % only PU tx
snrPP = gammaPP*fpp;
snrPS1 = gammaPS1*fps1;
snrPS2 = gammaPS2*fps2;
if Rp<=log2(1+snrPP) % PU success
succPU = succPU+1;
t = 1;
S1k = 0;
S2k = 0;
else
if t==T
t = 1;
S1k = 0;
S2k = 0;
else
t = t+1;
if S1k==0
if Rp<=log2(1+snrPS1)
S1k = 1;
end
end
if S2k==0
if Rp<=log2(1+snrPS2)
S2k = 1;
end
end
end
end
elseif a1==1 && a2==0 % PU and only SU1 tx
% SU1 tx success/insuccess
if S1k==0
R1 = R_SU1(a1+1,a2+1);
snrPS1 = gammaPS1*fps1;
cP = (Rp<=log2(1+snrPS1));
if cP==1
snrS1S1 = gammaS1S1(a1+1)*fs1s1;
cS1 = (R1<=log2(1+snrS1S1));
108
cPS1 = ((R1+Rp)<=log2(1+snrS1S1+snrPS1));
if cS1==1 && cPS1==1 % SU1 success;
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
S1k = 1;
end
elseif cP==0
snrS1S1 = (gammaS1S1(a1+1)*fs1s1)/(1+gammaPS1*fps1);
cS1 = (R1<=log2(1+snrS1S1));
if cS1==1 % SU1 success
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
end
end
if S1k==0
snrS1S1 = gammaS1S1(a1+1)*fs1s1;
cS1 = (R1<=log2(1+snrS1S1));
if cS1==0
snrPS1 = (gammaPS1*fps1)/(1+gammaS1S1(a1+1)*fs1s1);
cP = (Rp<=log2(1+snrPS1));
if cP==1
S1k = 1;
end
end
end
elseif S1k==1
R1 = R_SK1(a1+1,a2+1);
snrS1S1 = gammaS1S1(a1+1)*fs1s1;
if R1<=log2(1+snrS1S1) % SU1 success
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
end
end
% PU tx success/insuccess
snrPP = (gammaPP*fpp)/(1+gammaS1P(a1+1)*fs1p);
snrPS2 = (gammaPS2*fps2)/(1+gammaS1S2(a1+1)*fs1s2);
if Rp<=log2(1+snrPP) % PU success
succPU = succPU+1;
t = 1;
S1k = 0;
S2k = 0;
else
if t==T
t = 1;
S1k = 0;
S2k = 0;
else
t = t+1;
if S2k==0
if Rp<=log2(1+snrPS2)
109
S2k = 1;
end
end
end
end
elseif a1==0 && a2==1 % PU and only SU2 tx
% SU2 tx success/insuccess
if S2k==0
R2 = R_SU2(a1+1,a2+1);
snrPS2 = gammaPS2*fps2;
cP = (Rp<=log2(1+snrPS2));
if cP==1
snrS2S2 = gammaS2S2(a2+1)*fs2s2;
cS2 = (R2<=log2(1+snrS2S2));
cPS2 = ((R2+Rp)<=log2(1+snrS2S2+snrPS2));
if cS2==1 && cPS2==1 % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
S2k = 1;
end
elseif cP==0
snrS2S2 = (gammaS2S2(a2+1)*fs2s2)/(1+gammaPS2*fps2);
cS2 = (R2<=log2(1+snrS2S2));
if cS2==1 % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
end
end
if S2k==0
cS2 = (R2<=log2(1+gammaS2S2(a2+1)*fs2s2));
if cS2==0
snrPS2 = (gammaPS2*fps2)/(1+gammaS2S2(a2+1)*fs2s2);
cP = (Rp<=log2(1+snrPS2));
if cP==1
S2k = 1;
end
end
end
elseif S2k==1
R2 = R_SK2(a1+1,a2+1);
snrS2S2 = gammaS2S2(a2+1)*fs2s2;
if R2<=log2(1+snrS2S2) % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
end
end
% PU tx success/insuccess
snrPP = (gammaPP*fpp)/(1+gammaS2P(a2+1)*fs2p);
snrPS1 = (gammaPS1*fps1)/(1+gammaS2S1(a2+1)*fs2s1);
if Rp<=log2(1+snrPP) % PU success
110
succPU = succPU+1;
t = 1;
S1k = 0;
S2k = 0;
if t==T
t = 1;
S1k = 0;
S2k = 0;
else
t = t+1;
if S1k==0
if Rp<=log2(1+snrPS1)
S1k = 1;
end
end
end
end
elseif a1==1 && a2==1 % PU and both SUs tx
% SU1 tx success/insuccess
if S1k==0
R1 = R_SU1(a1+1,a2+1);
snrPS1 = (gammaPS1*fps1)/(1+gammaS2S1(a2+1)*fs2s1);
cP = (Rp<=log2(1+snrPS1));
if cP==1
snrS1S1 = (gammaS1S1(a1+1)*fs1s1)/(1+gammaS2S1(a2+1)*fs2s1);
cS1 = (R1<=log2(1+snrS1S1));
cPS1 = ((R1+Rp)<=log2(1+snrPS1+snrS1S1));
if cS1==1 && cPS1==1 % SU1 success
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
S1k = 1;
end
else
snrS1S1 = (gammaS1S1(a1+1)*fs1s1)/(1+gammaPS1*fps1+gammaS2S1(a2+1)*fs2s1);
cS1 = (R1<=log2(1+snrS1S1));
if cS1==1 % SU1 success
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
end
end
if S1k==0
snrS1S1 = (gammaS1S1(a1+1)*fs1s1)/(1+gammaS2S1(a2+1)*fs2s1);
cS1 = (R1<=log2(1+snrS1S1));
if cS1==0
snrPS1 = (gammaPS1*fps1)/(1+gammaS1S1(a1+1)*fs1s1+gammaS2S1(a2+1)*fs2s1);
cP = (Rp<=log2(1+snrPS1));
if cP==1
S1k = 1;
end
end
111
end
elseif S1k==1
R1 = R_SK1(a1+1,a2+1);
snrS1S1 = (gammaS1S1(a1+1)*fs1s1)/(1+gammaS2S1(a2+1)*fs2s1);
if R1<=log2(1+snrS1S1) % SU1 success
succSU1 = succSU1+1;
thrS1 = thrS1+R1;
end
end
% SU2 tx success/insuccess
if S2k==0
R2 = R_SU2(a1+1,a2+1);
snrPS2 = (gammaPS2*fps2)/(1+gammaS1S2(a1+1)*fs1s2);
cP = (Rp<=log2(1+snrPS2));
if cP==1
snrS2S2 = (gammaS2S2(a2+1)*fs2s2)/(1+gammaS1S2(a1+1)*fs1s2);
cS2 = (R2<=log2(1+snrS2S2));
cPS2 = ((R2+Rp)<=log2(1+snrPS2+snrS2S2));
if cS2==1 && cPS2==1 % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
S2k = 1;
end
else
snrS2S2 = (gammaS2S2(a2+1)*fs2s2)/(1+gammaPS2*fps2+gammaS1S2(a1+1)*fs1s2);
cS2 = (R2<=log2(1+snrS2S2));
if cS2==1 % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
end
end
if S2k==0
snrS2S2 = (gammaS2S2(a2+1)*fs2s2)/(1+gammaS1S2(a1+1)*fs1s2);
cS2 = (R2<=log2(1+snrS2S2));
if cS2==0
snrPS2 = (gammaPS2*fps2)/(1+gammaS2S2(a2+1)*fs2s2+gammaS1S2(a1+1)*fs1s2);
cP = (Rp<=log2(1+snrPS2));
if cP==1
S2k = 1;
end
end
end
elseif S2k==1
R2 = R_SK2(a1+1,a2+1);
snrS2S2 = (gammaS2S2(a2+1)*fs2s2)/(1+gammaS1S2(a1+1)*fs1s2);
if R2<=log2(1+snrS2S2) % SU2 success
succSU2 = succSU2+1;
thrS2 = thrS2+R2;
end
end
112
% PU tx success/insuccess
snrPP = (gammaPP*fpp)/(1+gammaS1P(a1+1)*fs1p+gammaS2P(a2+1)*fs2p);
if Rp<=log2(1+snrPP) % PU success
succPU = succPU+1;
t = 1;
S1k = 0;
S2k = 0;
else
if t==T
t = 1;
S1k = 0;
S2k = 0;
else
t = t+1;
end
end
end
...
A.3
H4 : transmission probabilities rearrangement
% ..Transmission paramiters initialization (as in DecMmdp.m)..
...
%
%
%
%
%
%
%
%
%
%
alpha1 = SU1 tx prob when PU msg known
alpha2 = SU2 tx prob when PU msg known
beta1 = SU1 tx prob when PU msg unknown
beta2 = SU2 tx prob when PU msg unknown
Rp = PU tx rate
t = primary ARQ feedback
S1k = SU1’s knowledge of PU msg
S2k = SU2’s knowledge of PU msg
phi1 = S1k in current time-slot
phi2 = S2k in current time-slot
if Rp<=log2(1+snrPP) % PU success
succPU = succPU+1;
% E4 tx prob. updating
if t==1
Tp = (Rp*succPU)/n;
if Tp>TpI % too much PU success
if phi1==1
if alpha1<0.96
alpha1 = alpha1+0.05;
else
alpha1 = 1;
end
113
else
if beta1<0.95
if (beta1+0.05)<alpha1
beta1 = beta1+0.05;
end
else
if alpha1==1
beta1 = 1;
end
end
end
if phi2==1
if alpha2<0.96
alpha2 = alpha2+0.05;
else
alpha2 = 1;
end
else
if beta2<0.95
if (beta2+0.05)<alpha2
beta2 = beta2+0.05;
end
else
if alpha2==1
beta2 = 1;
end
end
end
end
end
t = 1;
S1k = 0;
S2k = 0;
else
% E4 tx prob. updating
if t>1
Tp = (Rp*succPU)/n;
if (TpI-Tp)>Rp*epsW % too much SUs interference
if phi1==1
if (alpha1>0.05) && ((alpha1-0.05)>beta1)
alpha1 = alpha1-0.05;
end
else
if beta1>0.05
beta1 = beta1-0.05;
else
beta1 = 0;
end
end
if phi2==1
114
if (alpha2>0.05) && ((alpha2-0.05)>beta2)
alpha2 = alpha2-0.05;
end
else
if beta2>0.05
beta2 = beta2-0.05;
else
beta2 = 0;
end
end
end
end
if t==T
t = 1;
S1k = 0;
S2k = 0;
else
t = t+1;
if S1k==0
if Rp<=log2(1+snrPS1)
S1k = 1;
end
end
if S2k==0
if Rp<=log2(1+snrPS2)
S2k = 1;
end
end
end
end
...
115
Bibliography
[1] M. Levorato, U. Mitra, and M. Zorzi, ”Cognitive interference management
in retransmission-based wireless networks”, IEEE Trans. Inform. Theory.,
vol. 58, pp. 3023 − 3046, May 2012.
[2] R. Tannious and A. Nosratinia, ”Cognitive radio protocols based on exploiting
hybrid ARQ retransmissions”, IEEE Trans. Wireless Commun., vol. 9, pp.
2833.2841, Sept. 2010.
[3] N. Michelusi, P. Popovski, O. Simeone, M. Levorato, and M. Zorzi, ”Cognitive access policies under a primary ARQ process via forward-backward interference cancellation”, IEEE J. Sel. Areas Commun., vol. 31, pp. 2374 − 2386,
Nov. 2013.
[4] M. Zorzi and R. Joda, ”Centralized access policy design for two cognitive
secondary users under a primary ARQ process”, IEEE Int. Conf. Commun.
Worksh. ICCW , pp. 268 − 273 , June 2014 .
[5] K. G. Shin, H. Kim, A. W. Min, and A. Kumar, ”Cognitive radios for
dynamic spectrum access: from concept to reality”, IEEE Trans. Wireless
Commun., vol. 17, pp. 64 − 74, Dec. 2010.
[6] I. Akyildiz, W. Y. Lee, M. Vuran, and S. Mohanty, ”A survey on spectrum
management in cognitive radio networks”, IEEE Commun. Mag., vol. 46,
pp. 40 − 48, Apr. 2008.
[7] M. L. Puterman, ”Markov Decision Processes. Discrete Sthochastic Dynamic
Programming”, Wiley, 1994, chapter 8.
[8] R. Tajan, ”Mécamismes de retransmission Hybrid-ARQ en radio-cognitive”,
PhD Thesis, Université de Cergy-Pontoise, 2013.
[9] B. Makki, and T. Eriksson, ”On the AverageRate of HARQ-based Quasistatic Spectrum Sharing Networks”, IEEE Trans. Wireless Commun., vol.
11, pp. 65 − 77, Jan. 2012.
[10] H.-S. Ahn, and R. Righter, ”Multi-Actor Markov Decision Processes”, J.
Appl. Prob. 42, pp. 15 − 26, 2005.
[11] C. Boutilier, ”Planning, Learning and Coordination in Multiagent Decision
Processes”, Proceedings of the 6th Conference on Theoretical Aspects of
Rationality and Knowledge, pp. 195 − 210, 1996.
117
[12] P. Xuan, V. Lesser, and S. Zilberstein, ”Communication in Multi-agent
Markov Decision Processes”, Technical Report TR-2000-01, Department of
Computer Science, University of Massachusetts, 2000.
[13] P. Xuan, and V. Lesser, ”Multi-Agent Policies: from Centralized Ones to
Decentralized Ones”, Proceedings of the 1st International Joint Conference
on Autonomous Agents and Multi Agents Systems, pp. 1098 − 1105, 2002.
[14] M. Petrik, and S. Zilberstein, ”Average-Reward Decentralized Markov Decision Processes”, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-07), pp.1997 − 2002, 2007.
[15] R. Becker, S. Zilberstein, V. Lesser, and C. V. Goldman, ”Solving Transition Independent Decentralized Markov Decison Processes”, Journal of Artificial Intelligence Research 22, pp. 423 − 455, 2004.
[16] D. S. Bernstein, S. Zilberstein, and N. Immerman, ”The Complexity of
Decentralized Control of Markov Decision Processes”, Proceedings of the
Conference on Uncertainty in Artificial Intelligence, pp. 32 − 37, 2000.
[17] C. Amato, D. S. Bernstein, and s. Zilberstein, ”Optimizing fixed-size
stochastic controllers for POMDPs and decentralized POMDPs”, Journal
of Autonomous Agents and Multi Agents Systems, vol. 21, pp. 293 − 310,
2010.
[18] D. S. Bernstein, E. A. Hansen, and S. Zilberstein, ”Bounded Policy Iteration for Decentralized POMDPs”, Proceedings of the 15th International
Conference on Automated Planning and Scheduling, pp. 52 − 57, 2005.
[19] T. Jaakkola, S. P. Singh, and M. I. Jordan, ”Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems”, Advances in
Neural Information Processing Systems 7, MIT Press, Cambridge, MA, pp.
345 − 352, 1995.
[20] S. P. Singh, ”Reinforcement Learning Algorithms for Average-Payoff
Markovian Decision Processes”, Proceedings of the 12th National Conference on Artificial Intelligence, pp. 700 − 704, 1994.
[21] C. Amato, G. Chowdhary, A. Geramifard, N. K. Ü re, and M. J. Kochenderfer, ”Decentralized Control of Partially Observable Markov Decision Processes’, Proceedings of the 52nd IEEE Conference on Decision and Control,
pp. 2398 − 2405, Dec. 2013.
[22] J. S. Dibangoye, C. Amato, O. Buffet, and F. Charpillet, ”Optimally Solving
Dec-POMDPs as Continuous-State MDPs”, Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI-13), Aug. 2013.
[23] R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella, ”Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent
Settings”, Proceedings of the International Joint Conference on Artificial
Intelligence (IJCAI-03), pp. 705 − 711, Aug. 2003.
[24] I. Chades, B. Scherrer, and F. Charpillet, ”A Heuristic Approach for Solving
Decentralized-POMDP: Assessment on the Pursuit Problem”, Proceedings of
the 2002 ACM Symposium on Applied Computing (SAC), March 2002.
118
[25] S. P. Singh, T. Jaakkola, and M. I. Jordan, ”Learning Without StateEstimation in Partially Observable Markovian Decision Processes”, Proceedings of the 11th International Conference on Machine Learning, pp.
284 − 292, 1994.
[26] D. Kim, J. Lee, K.-E. Kim, and P. Poupart, ”Point-Based Value Iteration for Constrained POMDPs”, Proceedings of the 22nd International Joint
Conference on Artificial Intelligence (IJCAI-11), July 2011.
[27] R. C. Chen, and G. L. Blankenship, ”Dynamic Programming Equations
for Discounted Constrained Stochastic Control”, IEEE Trans. on Automatic
Control, vol. 49, pp. 699 − 709, May 2004.
[28] F. J. Beutler, and K. W. Ross, ”Optimal Policies for Controlled Markov
Chains with a Constraint”, Journal of Mathematical Analysis and Applications 112, pp. 236 − 252, 1985.
[29] D. V. Djonin, and V. Krishnamurthy, ”MIMO Transmission Control in
Fading Channels: A Constrained Markov Decision Process Formulation
With Monotone Randomized Policies”, IEEE Trans. on Signal Processing,
vol. 55, pp. 5069 − 5083, Oct. 2007.
[30] K. P. Murphy, ”A Survey of POMDP Solution Techniques”, Sept. 2000.
119
© Copyright 2026 Paperzz