This paper was presented as part of the main technical program at IEEE INFOCOM 2011
Optimal Cooperative Sensing Scheduling for
Energy-Efficient Cognitive Radio Networks
Tengyi Zhang and Danny H. K. Tsang
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
Email: {zhangty, eetsang}@ust.hk
Abstract: Due to the problem of spectrum scarcity and
large energy consumption in wireless communications, designing energy-efficient Cognitive Radio Networks (CRNs) becomes
important and necessary. In this paper, we consider the problem
of optimal Cooperative Sensing Scheduling (CSS) and parameter
design to achieve energy efficiency in CRNs using the framework
of Partially Observable Markov Decision Process (POMDP). In
particular, we consider the CSS problem for a CRN with $M$
Secondary Users (SUs) and $N$ primary channels to determine
how many SUs should be assigned to sense each channel in
order to maximize the objective function that is related to energy
efficiency. By assigning more SUs to sense one channel, higher
sensing accuracy can be gained; however, by spreading out the
SUs to sense more channels, spectrum opportunities can be better
exploited. The CSS problem is formulated as a combinatorial
optimization problem. Although such problems are generally hard and
are usually solved only by numerical methods with high computational
complexity, in this paper we provide a detailed analysis whose
analytical results yield useful and interesting insights. The
optimality of the myopic CSS is proved for the case of two
channels, and it is also conjectured for the general case. We
also study the tradeoff between the sensing and transmission
durations. In addition, the structure of the optimal sensing time
that maximizes the energy efficiency objective is also analyzed,
the condition for the optimality of the myopic sensing time is
obtained, and the performance upper bound of the myopic policy
is derived. Based on the numerical results, we show that by
carefully tuning a punishment parameter, better energy efficiency
can be achieved.
I. INTRODUCTION
In recent years, the dramatic growth of wired and
wireless communication applications has led to a great increase
in the related energy consumption. It is therefore time for the
communication world to investigate radio and networking
solutions that are energy-efficient and resource-efficient, i.e.,
green communications.
As one of the emerging wireless technologies, Cognitive
Radio (CR) is considered to be a promising solution for
improving spectrum efficiency. By intelligently monitoring the
spectrum, Secondary Users (SUs) are able to opportunistically
access the idle spectrum originally assigned to Primary Users
(PUs). On the other hand, these new functionalities and additional tasks (e.g., spectrum sensing) make CR-enabled
devices energy-consuming. Meanwhile, with its agility and
intelligence, CR technology creates new possibilities and
methods for realizing green communications [1]. These facts
motivate us to study the analysis and design issues in
energy-efficient Cognitive Radio Networks (CRNs).
978-1-4244-9920-5/11/$26.00 ©2011 IEEE
Consider a centralized CRN with multiple SUs and a
Base Station (BS), which is responsible for the scheduling and
coordination among the SUs. Cooperative spectrum sensing
[2] is adopted to improve the sensing accuracy, so as to better
protect the PUs and capture the spectrum opportunities in
a slotted primary system. In each slot, the BS determines:
(1) how many SUs to assign to sense each channel; (2) how
long the sensing duration should be; (3) whether to allow the SUs to
access the primary channels based on the sensing outcomes.
This sequential decision-making problem is studied in the
framework of Partially Observable Markov Decision Process
(POMDP). The problem of how to improve energy
efficiency by optimally designing a parameter is also studied.
A. Fundamental Tradeoffs and Contributions
1) Cooperative sensing scheduling: At the beginning of each
slot, the BS decides how many SUs should be assigned to
sense each channel in order to maximize the immediate reward
(which is related to energy efficiency). This Cooperative
Sensing Scheduling (CSS) problem is a combinatorial optimization problem. Such problems are usually solved by
numerical methods with high computational complexity.
For the problem studied in this paper, however, we provide
an analytical framework. With the help of Discrete Convex
Analysis, we obtain the optimal solution analytically
and gain some useful and interesting insights. We show
that for a given number of channels that need to be sensed, the
best combination is to assign the SUs as equally as possible for
each channel. In addition, we show the necessary and sufficient
condition for the case that assigning all the SUs to cooperatively sense one channel is the optimal solution of the CSS
problem. By applying our framework, the optimal solution for
general cases, i.e. any number of SUs and channels, can be
obtained as well. To the best of our knowledge, this is the
first work that analytically obtains the optimal solution for
this combinatorial optimization problem.
2) Optimality of the myopic CSS: The myopic CSS is shown
to be the optimal CSS for the case of two channels and
fixed sensing durations. This result is useful for reducing the
complexity of obtaining the optimal CSS, since the myopic
CSS is obtained analytically, while the general method requires recursive computation. From the numerical results we
conjecture that this optimality also holds for a general number
of SUs and channels.
3) Structure of the optimal sensing duration: As a sequential
decision problem, the BS needs to determine the sensing
duration after performing the CSS. Although the myopic
sensing duration can be found by numerical methods, it is
not always optimal. We therefore analyze
the structure of the optimal sensing duration
and show that it is always larger than or equal to the myopic
sensing duration. This conclusion is similar to that in [10], but is obtained under
a different model. Moreover, we establish the condition
for the optimality of the myopic sensing duration.
4) Designing the punishment parameter to improve energy
efficiency: Motivated by the result in [11], we introduce a
punishment parameter for unsuccessful transmissions, which
helps the CRN cause fewer collisions with the PUs and thus
achieve higher energy efficiency by saving the power spent on retransmission. We derive the performance upper bound of the myopic policy
by adopting the methods in [8]. Numerical results show that
the energy efficiency of the CRN can be improved by carefully
tuning the punishment parameter.
B. Related Works
Designing optimal MAC protocols for opportunistic spectrum access in the framework of POMDP started from [4].
A separation principle for the joint design of the
spectrum sensor operating point, the sensing channel selection
and the access policy was established in [12]. [13] showed that
a simple but robust round-robin myopic channel
selection policy is optimal for a general number of positively correlated
channels, while [8] extended the optimality to the imperfect
sensing case.
The literature on cooperative sensing is also rich.
[2] provided a detailed survey of various schemes
for fusing the sensing information from SUs. The impact of
the cooperative sensing overhead on the system throughput
was studied in [14], with the consideration of the number
of reporting packets. [15] studied the tradeoff of finding the
optimal sensing time and the parameter for the result fusion in
order to maximize the SUs' throughput. [16] extended the analysis
to the case of multiple channels using a soft decision fusion rule.
Existing works on the cooperative sensing problem mainly
focus on how to achieve the best sensing performance
on a single channel. However, the discussion on how to
assign SUs to sense multiple channels, i.e., the CSS problem,
is still missing except for a recent work [17]. Moreover, the
cooperative sensing problem is commonly formulated as a
static optimization problem, including in the literature mentioned above.
It is important to consider the CSS problem under a
varying spectrum environment with uncertainty, which results
in the sequential decision problem studied in this paper. As
mentioned in [17], the CSS problem is considered to be NP-hard. Different from [17] and other works that adopted
numerical methods, we obtain the optimal CSS solution
analytically. Only the probability of detection is considered
in [17], while we consider a more practical case in which the
spectrum sensing performance is described by the probability
of detection as well as the probability of false alarm. Moreover,
a soft result fusion method was applied in [17], while we
utilize the hard result fusion method, which requires much
less overhead. Unlike [8], our work considers a dynamic CSS
problem, and a different number of channels may be sensed
in each slot depending on the CSS result. In addition, we consider
an energy-efficiency-oriented objective in the POMDP, and the
sensing duration is variable in each slot. In analyzing the
structure of the optimal sensing duration, a result similar to that in [10]
is obtained. However, in this paper the case of multiple primary
channels and a different reward function are considered. We
also present the condition for the optimality of the myopic
duration.
II. SYSTEM MODEL
Assume there exist $N$ independent and stochastically
identical Gilbert-Elliot channels owned by PUs, denoted by
$\mathcal{N} = \{1, 2, ..., N\}$. The CRN consists of $M$ SUs and a BS,
where $N \geq M$.¹ The primary system operates in a time-slotted
manner with a fixed slot duration $L$. The occupancy state of
each channel transits according to a two-state discrete-time
Markov chain with transition probabilities $\{p_{ij}\}_{i,j=0,1}$ at the
beginning of each slot, where $p_{00} \geq p_{10}$.² This system model
is commonly used to abstract physical channels with memory,
and the slotted system structure is shown to fit well in the
application of CRN (see [4], [16] and references therein). Let
$s_n(t) \in \{0 \text{ (idle)}, 1 \text{ (busy)}\}$ denote the occupancy state of
channel $n$ in time slot $t$. The primary system state in slot $t$
can be denoted as $\mathbf{s}(t) \triangleq [s_1(t), ..., s_N(t)]$.
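The channel model above can be illustrated with a short simulation. This is only a sketch: it assumes that $p_{ij}$ is read as the probability of moving from state $i$ to state $j$, and the function names, the seed and the numerical values are hypothetical choices made for illustration.

```python
import random


def simulate_channel(p00, p10, num_slots, initial_state=0, seed=0):
    """Simulate the occupancy of one channel (0 = idle, 1 = busy).

    p00 and p10 are read as Pr(next = idle | current = idle) and
    Pr(next = idle | current = busy); this reading is an assumption.
    """
    rng = random.Random(seed)
    state = initial_state
    states = []
    for _ in range(num_slots):
        # The chance of being idle in the next slot depends on the current state.
        p_idle = p00 if state == 0 else p10
        state = 0 if rng.random() < p_idle else 1
        states.append(state)
    return states


def stationary_idle_probability(p00, p10):
    """Long-run probability of the idle state, p10 / (p01 + p10)."""
    p01 = 1.0 - p00
    return p10 / (p01 + p10)


states = simulate_channel(p00=0.8, p10=0.3, num_slots=10000)
print("empirical idle fraction:", states.count(0) / len(states))
print("stationary idle probability:", stationary_idle_probability(0.8, 0.3))
```

With $p_{00} \geq p_{10}$ the simulated channel tends to stay in its current state, which is the memory that the scheduling decisions later exploit.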
SUs are required to carry out spectrum sensing before
operating on the primary channels, using the energy detection
mechanism, which is widely adopted in CRNs. Each SU can
only sense one channel at a time due to physical limitations.
The spectrum sensor of each SU detects the presence of PU
signals by performing the binary hypothesis test as follows:
$$H_0: s_n(t) = 0 \text{ (idle)}, \quad \text{and} \quad H_1: s_n(t) = 1 \text{ (busy)}. \tag{1}$$
The sensing performance of each SU can be described by the
probability of detection $p_d \triangleq \Pr\{\text{decide } H_1 \mid H_1 \text{ is true}\}$ and
the probability of false alarm $p_f \triangleq \Pr\{\text{decide } H_1 \mid H_0 \text{ is true}\}$.
We focus on the complex-valued PSK signal and Circularly
Symmetric Complex Gaussian (CSCG) noise case [3], without
loss of generality. Under this model, for a given probability
of detection $p_d$, the probability of false alarm is given by:
$$p_f = \mathcal{Q}\left(\sqrt{2\gamma + 1}\, \mathcal{Q}^{-1}(p_d) + \sqrt{\tau f_s}\, \gamma\right), \tag{2}$$
where $\mathcal{Q}(\cdot)$ is the complementary distribution function of the
standard Gaussian, $\mathcal{Q}^{-1}(\cdot)$ denotes the inverse of $\mathcal{Q}(\cdot)$, $\gamma$
denotes the received signal-to-noise ratio (SNR) of the primary
signal at the SU, $\tau$ denotes the sensing time and $f_s$ denotes
the sampling rate.
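As a numerical illustration of (2), the sketch below evaluates the false alarm probability with Python's standard library. The function names and the sample operating point (linear SNR 0.03, 1 ms sensing time, 6 MHz sampling rate) are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and variance 1


def q_function(x):
    """Complementary distribution function Q(x) of the standard Gaussian."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inverse(p):
    """Inverse of Q: returns x such that Q(x) = p, for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def false_alarm_probability(p_d, snr, sensing_time, sampling_rate):
    """Evaluate (2): false alarm probability for a target detection probability.

    snr is the linear (not dB) received SNR of the primary signal.
    """
    arg = (sqrt(2.0 * snr + 1.0) * q_inverse(p_d)
           + sqrt(sensing_time * sampling_rate) * snr)
    return q_function(arg)


for tau in (0.5e-3, 1e-3, 2e-3):
    print(tau, false_alarm_probability(0.9, 0.03, tau, 6e6))
```

For a fixed detection target, a longer sensing time yields a smaller false alarm probability, which is one side of the sensing and transmission tradeoff.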
In this paper, the cooperative sensing mechanism is adopted.
Based on the sensing outcomes from the individual SUs, the BS
performs a result fusion procedure to process the individual
outcomes jointly and obtain the final sensing outcome. On
receiving the results from $k$ SUs, the BS applies the "OR"
rule [2] for fusion, which is a hard decision fusion rule³ and
can be mathematically expressed as:
$$P_d(k) = 1 - \prod_{i=1}^{k} (1 - p_{d,i}), \quad P_f(k) = 1 - \prod_{i=1}^{k} (1 - p_{f,i}), \tag{3}$$
where $p_{d,i}$ and $p_{f,i}$ denote the probability of detection and
probability of false alarm obtained by SU $i$, respectively. In
this paper all SUs are assumed to be homogeneous, i.e., with
the same sensing performance. The case of heterogeneous
sensing performance can be easily incorporated.
¹ Generally, the number of channels is greater than that of the SUs [8].
Moreover, our work also holds for the case that $N < M$.
² It means the channels are not negatively correlated. The case that $p_{00} < p_{10}$
can be similarly analyzed.
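The "OR" rule in (3) is simple to compute. The following sketch uses hypothetical helper names; the second function inverts (3) under the homogeneity assumption to obtain the per-SU detection probability needed to reach a given fused target.

```python
from math import prod


def or_rule(probabilities):
    """Fused probability that at least one SU reports 'busy', as in (3)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def individual_target(fused_target, k):
    """Per-SU probability so that k homogeneous SUs reach fused_target.

    Obtained by solving 1 - (1 - p)^k = fused_target for p.
    """
    return 1.0 - (1.0 - fused_target) ** (1.0 / k)


print(or_rule([0.5, 0.5]))                       # two SUs, each 0.5
p = individual_target(0.9, 3)                    # per-SU target for 3 SUs
print(p, or_rule([p] * 3))
```

More cooperating SUs allow each individual sensor to operate at a lower detection probability while the fused result still meets the same target.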
III. PROBLEM FORMULATION
At the beginning of each slot, the BS will sequentially
determine $a_n(t)$, the number of SUs assigned to sense
channel $n$, with what sensor operating point and for how
long. All SUs will be assigned a channel for sensing, with
the same sensing duration for synchronization.
The transmission decision is made based on the fusion outcome
after collecting the sensing reports from the SUs. The BS will
randomly allocate the channels selected for transmission to SUs; the fairness issue is beyond our scope.
At the end of a slot, the SU(s) utilizing a channel will send an
ACK to the BS if the transmission is successful.
Since the channel sensing capability is limited (i.e. some of
the channels may not be sensed) and the sensing performance
is imperfect, the system state is not fully observable to the
CRN. The BS can only abstract the system state in a probabilistic way by incorporating the decision and observation
history. Hence, our problem fits into the framework of Partially
Observable Markov Decision Process (POMDP).
A. Observation
Let $\theta_n(t)$ denote the observation of channel $n$ obtained in
slot $t$. There are four possible outcomes: (i) $\theta_n(t) = 0$ denotes
that data transmission is performed and an ACK is received; (ii)
$\theta_n(t) = 1$ denotes that data transmission is performed and no
ACK is received; (iii) $\theta_n(t) = 2$ denotes that the channel is
determined to be busy based on the result fusion outcome and will
not be utilized; (iv) $\theta_n(t) = 3$ denotes that the BS determines
not to sense the channel. The system observation vector can
be expressed as $\boldsymbol{\theta}(t) \triangleq [\theta_1(t), ..., \theta_N(t)]$. Note that these four
observations can be distinguished since the BS governs the
transmission decisions.
B. Belief Vector
The sufficient statistic of the system state is the belief vector
$\mathbf{b}(t) \triangleq \{b_0^1(t), ..., b_0^N(t)\}$, where $b_0^n(t)$ is the
conditional probability that $s_n(t) = 0$ given the decision and
observation history [4], and $b_1^n(t) = 1 - b_0^n(t)$. $\mathbf{b}(t)$ is computed
at the end of slot $t$ after the observation is received and is used
for decision making in slot $t + 1$. Based on the action and
the observation received in slot $t$, the updated belief
vector $\mathbf{b}(t) \triangleq \mathcal{T}(\mathbf{b}(t-1) \mid A(t), \boldsymbol{\theta}(t))$ can be obtained through
Bayes' rule [19].
³ Since hard decision only requires one bit of information feedback, it is
favored for reducing the overhead.
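The exact update formulas are not spelled out here, so the following is only one plausible instantiation of the Bayes update for a single channel, written under explicit assumptions: an ACK is taken as proof that the channel was idle, a missing ACK as proof that it was busy, and the fused detection and false alarm probabilities are known for a sensed channel. All names are hypothetical.

```python
def update_belief(prev_belief, observation, p00, p10,
                  fused_pd=None, fused_pf=None):
    """Return the idle belief for slot t given the belief for slot t - 1.

    prev_belief : probability that the channel was idle in slot t - 1
    observation : 0 (ACK), 1 (no ACK), 2 (sensed busy), 3 (not sensed)
    fused_pd, fused_pf : fused detection / false alarm probabilities,
                         only needed when observation == 2
    """
    # One Markov step: idle probability in slot t before any observation.
    prior = prev_belief * p00 + (1.0 - prev_belief) * p10
    if observation == 0:
        return 1.0          # assumed: a successful transmission implies idle
    if observation == 1:
        return 0.0          # assumed: a failed transmission implies busy
    if observation == 2:
        # Bayes' rule: the fused outcome is "busy" either by false alarm
        # (channel idle) or by correct detection (channel busy).
        idle_and_alarm = prior * fused_pf
        busy_and_detect = (1.0 - prior) * fused_pd
        return idle_and_alarm / (idle_and_alarm + busy_and_detect)
    if observation == 3:
        return prior        # no new information for an unsensed channel
    raise ValueError("observation must be 0, 1, 2 or 3")


print(update_belief(0.6, 3, p00=0.8, p10=0.3))
print(update_belief(0.6, 2, p00=0.8, p10=0.3, fused_pd=0.9, fused_pf=0.1))
```

Applying the function to every channel gives the full belief vector for the next decision.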
C. Reward
A reward will be received at the end of each slot. Since
our objective is to design energy-efficient CRNs, the energy
consumption is taken into account. Specifically, the reward
for channel $n$, $R_n(A(t), \boldsymbol{\theta}(t))$, consists of the following: (i)
When $\theta_n(t) = 0$, a positive reward $(L - \delta - \tau(t))(B - e_t)$
will be received, where $\tau(t) \in (0, L - \delta)$ denotes the sensing
duration, $\delta$ denotes the duration for sensing scheduling and
result fusion in the BS, $B$ denotes the reward for successful
transmission and $-e_t$ denotes the energy consumed for transmission; both are proportional to the transmission duration. (ii)
When $\theta_n(t) = 1$, a negative reward $(L - \delta - \tau(t))(-e_w - e_t)$
will be received. $-e_w$ can be regarded as the punishment for
the interference generated to the PU, and can also reflect
the energy wasted due to collision. This is the key parameter
for achieving the energy-efficient CRN design, and it will be
analyzed in detail in later sections. (iii) When a channel is
sensed, a negative reward $-\tau(t) e_s$ will be received, where $e_s$
denotes the energy consumed in spectrum sensing per unit
of time. (iv) When a channel is not selected for sensing, no
reward will be received.
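The four reward components can be collected in one small function. This sketch assumes that the sensing cost in (iii) is added to the transmission term whenever a channel is sensed; the parameter names and the sample values are hypothetical.

```python
def channel_reward(observation, slot_len, sched_time, sensing_time,
                   tx_reward, tx_energy, penalty, sensing_energy):
    """Per-channel reward for one slot.

    observation    : 0 (ACK), 1 (no ACK), 2 (sensed busy), 3 (not sensed)
    slot_len       : slot duration
    sched_time     : time spent on scheduling and result fusion
    sensing_time   : sensing duration in this slot
    tx_reward      : reward rate for successful transmission
    tx_energy      : transmission energy rate
    penalty        : punishment rate for a collision
    sensing_energy : sensing energy per unit of time
    """
    if observation == 3:
        return 0.0                                   # (iv) channel not sensed
    tx_time = slot_len - sched_time - sensing_time   # time left to transmit
    reward = -sensing_time * sensing_energy          # (iii) sensing cost
    if observation == 0:
        reward += tx_time * (tx_reward - tx_energy)  # (i) successful transmission
    elif observation == 1:
        reward += tx_time * (-penalty - tx_energy)   # (ii) collision
    return reward


for obs in range(4):
    print(obs, channel_reward(obs, 1.0, 0.1, 0.2, 10.0, 2.0, 5.0, 1.0))
```

Raising the punishment rate makes collisions more costly, which pushes any reward-maximizing policy towards more conservative access.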
D. POMDP Problem Formulation
We expect that the CRN can carry out as many successful
transmissions as possible while minimizing the collisions caused
to the PUs, since collisions result in retransmission and
wasted energy. Therefore, the objective of the POMDP
problem is to find the optimal policy $\pi$ that maximizes
the total reward received in $T$ slots. A policy $\pi$ specifies a
sequence of functions $\pi = [\mu_1, ..., \mu_T]$, where $\mu_t$ maps the
belief vector $\mathbf{b}(t-1)$ to an action $A(t)$ in slot $t$. Our problem
can consequently be formulated as
$$\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\left\{ \sum_{t=1}^{T} R(A(t), \boldsymbol{\theta}(t)) \,\middle|\, \mathbf{b}(0) \right\} \tag{4}$$
with constraints $P_d^n(t) \geq \bar{P}_d$ and $\sum_{n=1}^{N} a_n(t) = M$, $\forall n \in \mathcal{N}$, $t = 1, ..., T$.
$\mathbf{b}(0)$ is the initial belief vector whose entries
are set to the stationary distribution $\frac{p_{10}}{p_{01} + p_{10}}$ of the
underlying Markov chain [4] [12]. Constraint $P_d^n(t) \geq \bar{P}_d$
serves as the protection for the PUs: the
probability of detection $P_d^n(t)$ for every PU channel should
be larger than some threshold $\bar{P}_d$ pre-determined by the PUs.
It has been shown [4] that by tuning the operating point of
the sensors to make the equality hold, the optimal access
policy is to access channel $n$ if the result fusion outcome is idle
and not to access otherwise. Applying this result, the constraint
$P_d^n(t) \geq \bar{P}_d$ can be removed and the original problem becomes
an unconstrained POMDP problem. Moreover, (2) reveals that
the probability of false alarm can be obtained when the target
probability of detection and sensing duration are determined.
As a result, the action of the BS in each slot can now be
expressed as $A(t) \triangleq \{\mathbf{a}(t), \tau(t)\}$, where $\mathbf{a}(t) \triangleq \{a_n(t)\}_{n \in \mathcal{N}}$.
E. Optimal Policy and Myopic Policy
To solve the objective function (4), we resort to the following value function to obtain the optimal policy:
$$V_t(\mathbf{b}(t-1)) = \max_{A(t)} \left\{ I_t(\mathbf{b}(t-1)) + \mathbb{E}\left[ V_{t+1}\big(\mathcal{T}(\mathbf{b}(t-1) \mid A(t), \boldsymbol{\theta}(t))\big) \right] \right\}, \tag{5}$$
with constraint $\sum_{n=1}^{N} a_n(t) = M$, where $I_t(\mathbf{b}(t-1)) = \mathbb{E}[R(A(t), \boldsymbol{\theta}(t)) \mid \mathbf{b}(t-1)]$
denotes the expected immediate reward. The value function (5) represents the maximum expected
reward accumulated from slot $t$ up to the maximum time
horizon $T$. The computational complexity required to obtain the
optimal policy is very high. One of the methods for addressing
this problem is to apply the myopic policy [4], which merely
focuses on the immediate reward and ignores the impact of
the current policy on the future rewards. The myopic policy is
given by $\hat{A}(t) = \arg\max_{A(t)} I_t(\mathbf{b}(t-1))$, with constraint
$\sum_{n=1}^{N} a_n(t) = M$. Generally, the myopic policy reduces the
computational complexity at the possible cost of optimality.
In the following sections, the myopic policy is shown to be in
fact optimal under some conditions.
IV. MYOPIC COOPERATIVE SENSING SCHEDULING
At the beginning of each slot, the first task of the BS
is to determine, for each channel, how many users should
be assigned to perform spectrum sensing cooperatively. As
pointed out in [2] [14], the more SUs sense a channel,
the better the spectrum sensing performance. On the
other hand, some of the channels may not be sensed since
the number of SUs is limited, so the spectrum opportunities
cannot be fully exploited. The objective of the tradeoff in
CSS, i.e., between the sensing accuracy and the spectrum
opportunities, is to find an optimal scheduling of the SUs in
order to maximize the immediate reward received by the BS.
For a fixed sensing time $\tau(t) = \hat{\tau}$, define $\hat{I}_t(\mathbf{b}(t-1)) \triangleq
I_t(\mathbf{b}(t-1))|_{\tau(t)=\hat{\tau}}$. We can obtain the myopic CSS in slot $t$
by solving the following maximization problem:
$$\text{(P1:)} \quad \max_{\mathbf{a}(t)} \hat{I}_t(\mathbf{b}(t-1)), \quad \text{s.t. } \sum_{n=1}^{N} a_n(t) = M. \tag{6}$$
Without loss of generality, we consider the case that in any
slot $t$, the belief values of all the channels are the same. The
case of channels with different belief values will be discussed
later. After careful inspection of $\hat{I}_t(\mathbf{b}(t-1))$, we instead solve
the following problem (P2):
$$\text{(P2:)} \quad \max_{\mathbf{a}(t)} \sum_{\{n:\, a_n(t) > 0\}} \big(1 - P_f(a_n(t))\big) \tag{7}$$
with constraint $\sum_{n=1}^{N} a_n(t) = M$. At the end of this section
we will show that the optimal solution of (P2) is actually the
one for (P1). Although the optimization objective in (P2)
has a simpler form compared to (P1), (P2) is a combinatorial optimization problem and is regarded as NP-hard [17].
Moreover, even though numerical methods can be applied to find the
optimal solution, such methods cannot provide any insight into
the system design, e.g., how the CSS changes with system
parameters such as the total number of SUs $M$.
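For small instances, (P2) can be solved by exhaustive enumeration, which gives a baseline for checking analytical results. The sketch below is such a brute-force search, not the analytical method of this paper. It uses a toy fused false alarm function, 0.95/k, that is decreasing and convex but is purely a stand-in chosen for illustration; all names are hypothetical.

```python
def partitions(total, max_parts, max_value=None):
    """Yield non-increasing tuples of positive integers summing to total.

    Each tuple lists how many SUs sense each of the sensed channels;
    at most max_parts channels are sensed.
    """
    if max_value is None:
        max_value = total
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, max_value), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest


def solve_p2(num_sus, num_channels, fused_false_alarm):
    """Brute-force maximizer of the objective (7) with equal beliefs."""
    def objective(assignment):
        return sum(1.0 - fused_false_alarm(k) for k in assignment)
    return max(partitions(num_sus, num_channels), key=objective)


def toy_false_alarm(k):
    """Stand-in for P_f(k): decreasing and convex in k (illustrative only)."""
    return 0.95 / k


print(list(partitions(4, 2)))
print(solve_p2(3, 4, toy_false_alarm))
print(solve_p2(4, 4, toy_false_alarm))
```

The number of candidate assignments grows quickly with the number of SUs, which is why a closed-form characterization of the maximizer is valuable.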
In this section, we analytically study this combinatorial
problem and find some useful and interesting insights. We
also establish the conditions for some specific scheduling
combinations to be the myopic policy. Before exploiting the
pattern of the sensing scheduling, we need to first examine
the properties of the objective function (7). We begin with the
following two lemmas.
Lemma 1: Let $k$ be a continuous variable with domain
$[1, +\infty)$. Denote $\tilde{p}_d(k)$ as the relaxed individual probability
of detection as a function of $k$:
$$\tilde{p}_d(k) = 1 - (1 - \bar{P}_d)^{1/k}, \tag{8}$$
which is decreasing and convex.
Proof: Eq. (8) relaxes the integer variable in (3) to a
continuous one. Taking the first-order derivative of $\tilde{p}_d(k)$:
$$\nabla \tilde{p}_d(k) = \exp\left(\frac{\ln(1 - \bar{P}_d)}{k}\right) \frac{\ln(1 - \bar{P}_d)}{k^2}, \tag{9}$$
where $\nabla$ denotes the differentiation of the function with
respect to its argument. Since $0 < 1 - \bar{P}_d < 1$, we have
$\nabla \tilde{p}_d(k) < 0$, which means $\tilde{p}_d(k)$ is decreasing in $k$.
Similarly, the second-order derivative of $\tilde{p}_d(k)$ is shown to
be positive [19]. From these two derivatives, $\tilde{p}_d(k)$ is proved
to be decreasing and convex.
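The shape of (8) can be checked numerically with finite differences on a uniform grid. This is a sanity check under assumed values, not a proof. Differentiating (9) once more gives a second derivative proportional to $2k + \ln(1 - \bar{P}_d)$, so the sample target 0.8 used here is chosen such that this factor is positive over the whole domain $k \geq 1$; other targets are worth re-checking in the same way. Function names are hypothetical.

```python
def relaxed_detection(k, fused_target):
    """Relaxed individual detection probability, as in (8)."""
    return 1.0 - (1.0 - fused_target) ** (1.0 / k)


def is_decreasing_and_convex(values, tol=1e-12):
    """Check sampled values on a uniform grid via first and second differences."""
    first = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return all(d < tol for d in first) and all(d > -tol for d in second)


grid = [1.0 + 0.1 * i for i in range(200)]           # k from 1.0 to 20.9
values = [relaxed_detection(k, 0.8) for k in grid]
print(is_decreasing_and_convex(values))
```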
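Lemma 2: Denote $\tilde{p}_f(k)$ as the relaxed individual probability of false alarm as a function of $k$:
$$\tilde{p}_f(k) = \mathcal{Q}\left(\sqrt{2\gamma + 1}\, \mathcal{Q}^{-1}(\tilde{p}_d(k)) + \sqrt{\hat{\tau} f_s}\, \gamma\right), \tag{10}$$
which is obtained from (2) and is decreasing and convex.
Proof: From (10), we have
$$\nabla \tilde{p}_f(k) = -\frac{1}{\sqrt{2\pi}}\, \Delta_1 \sqrt{2\gamma + 1}\, \nabla \mathcal{Q}^{-1}(\tilde{p}_d(k)), \tag{11}$$
where $\Delta_1 = \exp\left[-\left(\sqrt{2\gamma + 1}\, \mathcal{Q}^{-1}(\tilde{p}_d(k)) + \sqrt{\hat{\tau} f_s}\, \gamma\right)^2 / 2\right]$, and
$\nabla \mathcal{Q}^{-1}(\tilde{p}_d(k)) = -\sqrt{2\pi} \exp\left[\frac{(\mathcal{Q}^{-1}(\tilde{p}_d(k)))^2}{2}\right] \nabla \tilde{p}_d(k)$. The
derivative of the inverse of $\mathcal{Q}(x)$ is given by
$\nabla \mathcal{Q}^{-1}(x) = -\sqrt{2\pi} \exp\left[\frac{(\mathcal{Q}^{-1}(x))^2}{2}\right]$, which can be derived from [5]. Since
$\nabla \tilde{p}_d(k) < 0$ from Lemma 1, we have $\nabla \mathcal{Q}^{-1}(\tilde{p}_d(k)) > 0$.
Therefore $\nabla \tilde{p}_f(k) < 0$, i.e., $\tilde{p}_f(k)$ is decreasing in $k$.
Taking the second derivative of $\tilde{p}_f(k)$ and after some
manipulation [19], we can show $\nabla^2 \tilde{p}_f(k) > 0$. Therefore,
$\tilde{p}_f(k)$ is decreasing and convex in $k$.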
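To see how the relaxed false alarm probability in (10) behaves, the sketch below evaluates it for a few values of k and prints the first and second differences. The operating point (fused target 0.9, linear SNR 0.03, 1 ms sensing time, 6 MHz sampling rate) and the function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def relaxed_detection(k, fused_target):
    """Relaxed individual detection probability, as in (8)."""
    return 1.0 - (1.0 - fused_target) ** (1.0 / k)


def relaxed_false_alarm(k, fused_target, snr, sensing_time, sampling_rate):
    """Relaxed individual false alarm probability, as in (10)."""
    # Q^{-1}(p) equals the standard normal quantile at 1 - p.
    q_inv = _STD_NORMAL.inv_cdf(1.0 - relaxed_detection(k, fused_target))
    arg = (sqrt(2.0 * snr + 1.0) * q_inv
           + sqrt(sensing_time * sampling_rate) * snr)
    return 1.0 - _STD_NORMAL.cdf(arg)   # Q(arg)


values = [relaxed_false_alarm(k, 0.9, 0.03, 1e-3, 6e6) for k in (1, 2, 3, 4)]
first = [b - a for a, b in zip(values, values[1:])]
second = [b - a for a, b in zip(first, first[1:])]
print("values:", values)
print("first differences:", first)
print("second differences:", second)
```

For this operating point the first differences are negative and the second differences positive, in line with the lemma.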
These two lemmas present the properties of $\tilde{p}_d(k)$ and $\tilde{p}_f(k)$
with respect to the variable $k$. Originally, the number of
cooperating SUs can only be an integer, which is difficult to
analyze. For analytical convenience, we relax
the number to a continuous variable $k$ while preserving
the definitions of the individual probability of detection and
probability of false alarm, and use the results as the basis
for the following analysis. The actual individual probability
of detection and probability of false alarm can be regarded
as discrete points on the relaxed functions $\tilde{p}_d(k)$ and
$\tilde{p}_f(k)$. Based on these two lemmas, the property of the relaxed
probability of false alarm after result fusion, $\tilde{P}_f(k)$, can be
characterized as follows:
Proposition 1: The probability of false alarm after result
fusion, $\tilde{P}_f(k)$, is decreasing and convex on the domain of $k$
if the following condition holds:
$$\left[\ln(1 - \tilde{p}_f(k)) - \frac{k \nabla \tilde{p}_f(k)}{1 - \tilde{p}_f(k)}\right]^2 - \left[\frac{\sqrt{k}\, \nabla \tilde{p}_f(k)}{1 - \tilde{p}_f(k)}\right]^2 < \frac{2 \nabla \tilde{p}_f(k) + k \nabla^2 \tilde{p}_f(k)}{1 - \tilde{p}_f(k)}, \quad \forall k. \tag{12}$$
Proof: Since we consider homogeneous SUs, the relaxed
probability of false alarm after result fusion can be written
as $\tilde{P}_f(k) = 1 - (1 - \tilde{p}_f(k))^k$. The first-order derivative of
$\tilde{P}_f(k)$ can be shown to be smaller than zero from Lemma 1
and Lemma 2. By taking the second-order derivative of $\tilde{P}_f(k)$
and after some algebraic manipulations [19], it can be shown
that $\nabla^2 \tilde{P}_f(k) > 0$, i.e., $\tilde{P}_f(k)$ is decreasing and convex on
the domain of $k$ if condition (12) holds.
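The algebraic step can be sketched as follows, starting only from the fused expression $\tilde{P}_f(k) = 1 - (1 - \tilde{p}_f(k))^k$. The auxiliary functions $g$ and $h$ are introduced here for convenience and are not part of the original notation.

```latex
\begin{align*}
g(k) &= \bigl(1 - \tilde{p}_f(k)\bigr)^{k}, \qquad \tilde{P}_f(k) = 1 - g(k), \\
h(k) &= \nabla \ln g(k)
      = \ln\bigl(1 - \tilde{p}_f(k)\bigr) - \frac{k\,\nabla\tilde{p}_f(k)}{1 - \tilde{p}_f(k)}, \\
\nabla h(k) &= -\frac{2\,\nabla\tilde{p}_f(k) + k\,\nabla^{2}\tilde{p}_f(k)}{1 - \tilde{p}_f(k)}
      - \frac{k\,\bigl(\nabla\tilde{p}_f(k)\bigr)^{2}}{\bigl(1 - \tilde{p}_f(k)\bigr)^{2}}, \\
\nabla\tilde{P}_f(k) &= -g(k)\,h(k), \qquad
\nabla^{2}\tilde{P}_f(k) = -g(k)\,\bigl[h(k)^{2} + \nabla h(k)\bigr].
\end{align*}
```

Since $g(k) > 0$, the fused false alarm probability is convex exactly when $h(k)^2 + \nabla h(k) < 0$, that is, when the squared bracket $h(k)^2$ minus $k (\nabla \tilde{p}_f(k))^2 / (1 - \tilde{p}_f(k))^2$ is smaller than $(2 \nabla \tilde{p}_f(k) + k \nabla^2 \tilde{p}_f(k)) / (1 - \tilde{p}_f(k))$.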
It is natural to conjecture that $\tilde{P}_f(k)$ is decreasing and convex.
In fact, [2] [14] show that the more cooperating SUs there are, the
smaller $\tilde{P}_f(k)$ is; otherwise there is no incentive to
perform cooperative sensing. Moreover, for $k \to +\infty$,
condition (12) should hold; otherwise $\tilde{P}_f(k)$ will eventually
go below zero, which is impossible. From the extensive simulations we have performed, condition (12) holds for most
cases, i.e., $\tilde{P}_f(k)$ is usually decreasing and convex. Even if
there is a concave section, it can be easily incorporated into our
following analytical framework. Without loss of generality, we
assume $\tilde{P}_f(k)$ to be decreasing and convex throughout this
paper. With the above results, we are ready to analyze the
combinatorial problem (P2). In order to clearly describe our
analysis, we introduce the following definitions.
Definition 1: All combinations, i.e., the ways of assigning
SUs to sense the channels, are divided into groups. Define
$G_g$, $g = 1, ..., M$, as the group consisting of the $|G_g|$ combinations
in which exactly $g$ channels are sensed (i.e., each of the $g$
channels is sensed by at least one SU). At most $M$ channels
will be sensed since $N \geq M$. The reason for such a division
is that in some combinations, some channels may not have
any SU assigned for sensing. Here we consider a general
case, where the channels have different belief values. We order
the channels according to their belief values in descending
order and define that, in $G_g$, the first $g$ channels in the ordering are
selected for sensing. Define $\mathbf{r} \triangleq \{r_1, ..., r_N\}$, where $r_1$ denotes
the real channel number of the first one in the ordering
(i.e., with the largest belief value), $r_2$ denotes the real channel
number of the second one, etc. Denote $C_{g,j} = \{a^i_{g,j}\}$ as the
$j$-th combination in group $G_g$, where $j = 1, ..., |G_g|$. Let $a^i_{g,j}$
denote the number of SUs assigned to sense channel $r_i$ in
combination $C_{g,j}$ of group $G_g$, with $i = 1, ..., g$. Further assume
$a^i_{g,j} \geq a^{i'}_{g,j}$ for $i < i'$. The reason is that, since the channels are
ordered according to the belief values, the case of $a^i_{g,j} \geq a^{i'}_{g,j}$
will produce an objective value of (7) greater than or equal to
that produced by the case of $a^i_{g,j} < a^{i'}_{g,j}$, for $i < i'$. When the
belief values are the same, the logic also applies. Note that by
this assumption we have already excluded some non-optimal
combinations. It is obvious that $a^i_{g,j} \geq 1$ and $\sum_i a^i_{g,j} = M$,
$\forall G_g, C_{g,j}$.
Definition 2: Each combination may produce a different value
of (7). Let the operator $\succ$ denote that a combination
$C_{g,j}$ is larger than $C_{g',j'}$, $\forall g, g', j, j'$. In other words, $C_{g,j} \succ
C_{g',j'}$ means that $C_{g,j}$ produces a larger value of (7) than $C_{g',j'}$.
Similarly, we define $\succeq$ and $\preceq$ for the relationships "larger
than or equal to" and "smaller than or equal to", respectively.
We first investigate which combination is the
largest one in each group. Consider group $G_g$. If $a^i_{g,j}$ can take continuous
values, it is straightforward that the combination $C_{g,j}$ with
$a^i_{g,j} = \frac{M}{g}$, $\forall i$, is the largest one due to
$$\tilde{P}_f\left(\frac{M}{g}\right) \leq \frac{1}{g} \tilde{P}_f(a^1_{g,j}) + \frac{1}{g} \tilde{P}_f(a^2_{g,j}) + ... + \frac{1}{g} \tilde{P}_f(a^g_{g,j}), \quad \forall C_{g,j}, \tag{13}$$
which results from the convexity of $\tilde{P}_f(k)$. However, in (P2) only
integer values are allowed for $a^i_{g,j}$. In this case, we resort to
the theory of Discrete Convex Analysis.
The theory of discrete convex analysis was introduced by
Murota [6], who incorporated discrete settings and the concept of combinatorial optimization into the framework of
convex analysis. A comprehensive survey can be found in [7].
Simply speaking, in our problem, the actual probability of
false alarm $P_f(k)$, $k \in [1, +\infty)$ and $k \in \mathbb{Z}$, is a discrete
convex function, since $\tilde{P}_f(k)$ is convex and $P_f(k) = \tilde{P}_f(k)$,
$\forall k \in [1, +\infty)$ and $k \in \mathbb{Z}$. In other words, $P_f(k)$ takes
the integer points on the
domain of $\tilde{P}_f(k)$.
Murota also introduced the concepts of L-convex functions
and M-convex functions. We briefly introduce these concepts
for the case of two scalar variables. Consider a function $f(x)$
where $x$ is a scalar. If $f(x)$ is an L-convex function, then
$f(a) + f(b) \geq f\left(\left\lceil \frac{a+b}{2} \right\rceil\right) + f\left(\left\lfloor \frac{a+b}{2} \right\rfloor\right)$, $\forall a, b \in \mathbb{Z}$. This property
is referred to as discrete midpoint convexity. If $f(x)$ is an
M-convex function, then $f(a) + f(b) \geq f(a + c) + f(b - c)$,
$\forall a \leq b$, $a + c \leq b - c$, $a, b, c \in \mathbb{Z}$, $c \geq 0$. This property is referred to as
equidistance convexity. We can easily establish the following
lemma due to the fact that we only have a scalar variable:
Lemma 3: $P_f(k)$ is both an L-convex and an M-convex function.
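Discrete midpoint convexity is easy to test on a finite range of integers. The following sketch checks the inequality for all integer pairs in a range; the function name and the two sample functions are hypothetical and serve only as an illustration.

```python
def satisfies_midpoint_convexity(f, lo, hi, tol=1e-12):
    """Check f(a) + f(b) >= f(ceil((a+b)/2)) + f(floor((a+b)/2)) on [lo, hi]."""
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):
            mid_down = (a + b) // 2          # floor of the midpoint
            mid_up = -(-(a + b) // 2)        # ceiling of the midpoint
            if f(a) + f(b) < f(mid_up) + f(mid_down) - tol:
                return False
    return True


# A convex, decreasing sample function passes the check ...
print(satisfies_midpoint_convexity(lambda k: 1.0 / k, 1, 20))
# ... while a concave one fails it.
print(satisfies_midpoint_convexity(lambda k: -k * k, 1, 20))
```

Any function whose integer points lie on a convex curve passes this check, which is the property used in the following argument.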
Now we are ready to show which combination is the largest
one in each group. (13) provides the insight that the largest
combination should be the one that assigns the SUs equally
to each channel, which leads to the largest value of the
objective function. For the case that $\frac{M}{g}$ is an integer, the same
conclusion holds for $P_f(k)$. By also considering the case that
$M$ is not divisible by $g$, we have the following proposition:
Proposition 2: The largest combination $C_{g,\max}$ in group $G_g$
is given by
$$a^i_{g,\max} = \left\lceil \frac{M}{g} \right\rceil \text{ or } a^i_{g,\max} = \left\lfloor \frac{M}{g} \right\rfloor, \quad \sum_i a^i_{g,\max} = M, \quad \forall i, g. \tag{14}$$
The largest combination has the following property:
$$\sum_i P_f(a^i_{g,\max}) < \sum_i P_f(a^i_{g,j}), \tag{15}$$
where $\{a^i_{g,j}\}$ excludes the largest combination $C_{g,\max}$.
Proof: It is easy to see that (14) has a unique solution
under the constraint $\sum_i a^i_{g,\max} = M$. Without loss of generality, consider a combination $C_{g,j}$ in group $G_g$. By recursively
utilizing the properties of discrete midpoint convexity and
equidistance convexity from Lemma 3, we have the following
procedure: 1) For $a^1_{g,j}$ and $a^g_{g,j}$, since
$$P_f(a^1_{g,j}) + P_f(a^g_{g,j}) \geq P_f\left(\left\lceil \frac{a^1_{g,j} + a^g_{g,j}}{2} \right\rceil\right) + P_f\left(\left\lfloor \frac{a^1_{g,j} + a^g_{g,j}}{2} \right\rfloor\right),$$
we replace $a^1_{g,j}$ and $a^g_{g,j}$ with
$\left\lceil \frac{a^1_{g,j} + a^g_{g,j}}{2} \right\rceil$ and $\left\lfloor \frac{a^1_{g,j} + a^g_{g,j}}{2} \right\rfloor$ and obtain a new combination $C_{g,j_1}$.
It is obvious that $C_{g,j_1} \succeq C_{g,j}$. 2) Re-arrange the order within
$C_{g,j_1}$ to meet the ordering requirement for $a^i_{g,j}$ mentioned in
Definition 1. Then repeat the same operation as in 1) to obtain
a larger $C_{g,j_2}$. 3) The logic behind the previous two steps is
that we always take the largest and smallest $a^i_{g,j}$ in a combination
and replace them with their average. By doing so, we
obtain a larger combination due to Lemma 3. It is apparent that by
performing the operations in 1) and 2) several times, all $a^i_{g,j}$
will equal the ones given in (14). The largest combination
in $G_g$ is then obtained and (15) follows.
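The balanced assignment in (14) and the averaging procedure used in the proof can both be written in a few lines. This is an illustrative sketch with hypothetical function names; the second function mimics steps 1) and 2) by repeatedly averaging the largest and smallest entries.

```python
def balanced_assignment(num_sus, num_sensed):
    """Assignment from (14): entries are ceil(M/g) or floor(M/g), summing to M."""
    base, extra = divmod(num_sus, num_sensed)
    # `extra` channels get one additional SU; entries are in non-increasing order.
    return [base + 1] * extra + [base] * (num_sensed - extra)


def balance_by_averaging(assignment):
    """Repeatedly replace the largest and smallest entries by their
    rounded-up and rounded-down average until they differ by at most one."""
    a = sorted(assignment, reverse=True)
    while a[0] - a[-1] > 1:
        total = a[0] + a[-1]
        a[0], a[-1] = -(-total // 2), total // 2   # ceiling and floor of the average
        a.sort(reverse=True)                        # restore the ordering of Definition 1
    return a


print(balanced_assignment(7, 3))        # [3, 2, 2]
print(balance_by_averaging([5, 1, 1]))  # [3, 2, 2]
```

Each averaging step keeps the total number of SUs unchanged and, by discrete midpoint convexity, does not increase the sum of the false alarm probabilities.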
Proposition 2 reveals a result similar to (13), i.e., in each
group, when exactly $g$ channels need to be sensed, one should
distribute the SUs among the channels as equally as possible
in order to obtain the minimum sum of the probabilities
of false alarm. This interesting result is to some extent similar to the water-filling property [18], where the same amount
of power is allocated if the channels are homogeneous
in noise. Here, for homogeneous channels, SUs are equally
assigned for sensing.
Having found the largest combination in each group, the
last step towards the optimal solution to (P2) is to find the
largest one among these $M$ combinations $C_{g,\max}$, $g = 1, ..., M$.
The optimal solution and its related properties are given in the
following theorem.
Theorem 1: (i) The optimal solution of (P2) is $C_{1,\max}$, i.e.,
all $M$ SUs sense one channel cooperatively, if the condition
$$\text{(C0:)} \quad P_f\left(\left\lceil \frac{M}{2} \right\rceil\right) + P_f\left(\left\lfloor \frac{M}{2} \right\rfloor\right) - P_f(M) - 1 \geq 0 \tag{16}$$
holds, which is a necessary and sufficient condition.
(ii) If condition (C0) holds for $M'$, then for all $M < M'$,
condition (C0) also holds.
(iii) If condition (C0) does not hold for $M'$, then for all
$M > M'$, condition (C0) will never hold. In other words,
the BS will never assign all SUs to cooperatively sense one
channel if the network has $M'$ or more SUs.
Proof: We first assume (i) and (ii) hold for $M - 1$. We
begin with the proof of sufficiency. Consider two consecutive
groups, $G_g$ and $G_{g'}$, where $g' = g + 1$ and $g \geq 2$. Consider $G_g$.
From Lemma 3 and Proposition 2, some manipulations can be
performed on $C_{g,\max}$ to obtain
$$C_{g,\max} \succeq C_{g,j}$$
[...] relationship in the following. Meanwhile, $C_{g,j}$ above is also
a valid combination and will not cause any problem. Since
(14) reveals that $a^i_{g,\max} - a^{i+1}_{g,\max} \leq 1$, it then follows that
$$P_f(a^{g'-1}_{g',\max}) + P_f(a^{g'}_{g',\max}) - P_f(a^{g'-1}_{g',\max} + a^{g'}_{g',\max}) - 1 \geq 0,$$
which implies $C_{g,j} \succeq C_{g',\max}$. Since $C_{g,\max} \succeq C_{g,j}$, this result
builds a bridge between two consecutive groups and reveals
that the largest combination among $C_{g,\max}$ for $g \geq 2$ is $C_{2,\max}$. Then,
for $M$, if the inequality $P_f\left(\left\lceil \frac{M}{2} \right\rceil\right) + P_f\left(\left\lfloor \frac{M}{2} \right\rfloor\right) -
P_f(M) - 1 \geq 0$ holds, then $C_{1,\max} \succeq C_{2,\max}$. This inequality
is exactly the one given in (i).
We prove the necessity. In the results above, we have shown
that πΆ2,max is the largest combination excluding πΆ1,max .
Therefore in order for the combination πΆ1,max to be the
optimal solution, the only requirement is πΆ1,max β½ πΆ2,max ,
which is satisfied if condition (C0) holds.
Now we prove (iii). Assume for π condition (C0) does
π
not hold, i.e. ππ (β π
2 β) + ππ (β 2 β) β ππ (π ) β 1 < 0. Then
consider the left hand side in the case of π +1. Since β π
2 β=
π +1
β π2+1 β and β π
2 β + 1 = β 2 β can be easily verified, after
some manipulation we arrive at
π
π
ππ (β β) + ππ (β β) β ππ (π ) β 1
2
2
[
]
π +1
π +1
β
ππ (β
β) + ππ (β
β) β ππ (π + 1) β 1
2
2
π
π +1
= ππ (β β) β ππ (β
β) β [ππ (π ) β ππ (π + 1)]
2
2
which is larger than 0 from the decreasing and convex property
of πΛπ (π). This result implies that ππ (β π2+1 β)+ππ (β π2+1 β)β
ππ (π + 1) β 1 < 0, which means πΆ1,max βΌ πΆ2,max for
π + 1. In this case, assigning all SUs to sense one channel
cooperatively is always not the best action, which finishes the
proof of (iii). In fact, (ii) follows the same argument as (iii)
and hence be proved as well. To this end we complete our
self-contained proof for Theorem 1.
Theorem 1 introduces the condition that the BS assigns
all SUs to cooperatively sense one channel in order to gain
the largest objective value, given the number of SUs π . The
structural results show that sensing less channel to gain higher
sensing accuracy is better in the sense of the objective function
in (P2) and condition (C0). In addition, (ii) and (iii) reveals
that for given system parameters, there exists a threshold value
of π , which determines whether the combination πΆ1,max
is the optimal solution. Now turn back to (P1). After some
manipulation, we have
β²
π β1
π
= {π1πβ² ,max , π2πβ² ,max , ..., πππβ² β2
,max , (ππβ² ,max + ππβ² ,max )}.
β²
β²
ππβ² β1 +ππβ²
β²
π ,max
π ,max
πππβ² β1
β and πππβ² ,max = β π ,max 2 π ,max β.
,max = β
2
In this case, from the assumption that (ii) holds we have
β²
π β2
π
Note that (πππβ² β1
,max +ππβ² ,max ) is larger than ππβ² ,max , however we
do not change to the correct ordering for easier showing the
$$\hat{I}_t(\mathbf{b}(t-1)) = \sum_{\{i:\, n_i(t) > 0\}} \left\{ (1 - P_f(n_i(t)))(L - \theta - \hat{\tau})(B - e_t)\, x_2 \right\} + \sum_{\{i:\, n_i(t) > 0\}} \left\{ (1 - \bar{P}_d)(L - \theta - \hat{\tau})(-e_t - e_w)\, x_2 - \hat{\tau} e_s \right\},$$
where $x_2 = p_{10} + b_i^0(t-1)(p_{00} - p_{10})$. It is obvious that
$\hat{I}_t(\mathbf{b}(t-1))|_{\mathbf{a}(t) = C_{1,\max}} \ge \hat{I}_t(\mathbf{b}(t-1))|_{\mathbf{a}(t) = C_{k,j}}$ for all $k, j$,
since the second term is negative and is incurred once for each sensed channel. As a result, the optimal
solution obtained for (P2) is also the optimal one for (P1).
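As a small numerical aid, the quantity $x_2$ can be read as a one-step prediction of the channel state. The sketch below assumes that $p_{ij}$ denotes the probability of moving from state $i$ to state $j$ and that the belief is the probability of state 0 in the previous slot; the function name is ours. It uses the transition probabilities $p_{00} = 0.8$ and $p_{10} = 0.7$ from the numerical section.

```python
def predict_state0(belief, p00, p10):
    """One-step prediction x2 = p10 + belief * (p00 - p10).

    Algebraically identical to the mixture belief * p00 + (1 - belief) * p10.
    """
    return p10 + belief * (p00 - p10)


P00, P10 = 0.8, 0.7  # transition probabilities used in the numerical section

for b in (0.0, 0.5, 1.0):
    mixture = b * P00 + (1.0 - b) * P10
    print(f"belief={b:.1f}  x2={predict_state0(b, P00, P10):.3f}  mixture={mixture:.3f}")
```

The prediction always lies between $p_{10}$ and $p_{00}$, so with these values $x_2$ stays within [0.7, 0.8] whatever the previous belief was.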
During the derivation of condition (C0), we analytically
show how to obtain the second largest combination
among all possible combinations, and then compare it with our
desired combination (i.e., $C_{1,\max}$) to obtain the final condition. It
can be seen from the derivation that when the channels have different belief values and are sorted
in descending order, (C0) is easier to satisfy. Hence
the case in which all channels have the same belief value is the
most stringent one. By utilizing the intrinsic properties of our
problem and Lemma 3, the optimal solution of a combinatorial
optimization problem is obtained analytically. Another
significance of the result is that, given the number of SUs $M$, one
can immediately determine whether all SUs should be assigned
to sense one channel, i.e., whether
$C_{1,\max}$ is the optimal solution, by simply testing (C0).
Although only the case in which (C0) is satisfied is considered
here, our framework can be extended to more general cases,
i.e., when $M$ is large and (C0) is violated. An interesting
observation from our framework is that when $M$ becomes
large, more channels should be sensed. The reason is that, in
this situation, exploiting more spectrum opportunities is more
beneficial, which is precisely the tradeoff in our problem.
Showing the optimal solution for general cases is left for our
future work.
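Testing (C0) and locating the threshold number of SUs takes only a few lines. The sketch below is illustrative: `p_f` is a hypothetical false-alarm model that is decreasing and convex for one or more SUs (it is not the detector model of the paper), and the function names are ours. The margin is evaluated only for two or more SUs, since the hypothetical model is not defined for zero SUs.

```python
def p_f(n):
    """Hypothetical false-alarm probability for n cooperating SUs (n >= 1).

    Assumption for illustration only: decreasing and convex in n.
    """
    return 0.05 + 0.9 * 0.9 ** (n - 1)


def c0_margin(num_sus):
    """Left-hand side of (C0): P_f(ceil(M/2)) + P_f(floor(M/2)) - P_f(M) - 1."""
    upper = (num_sus + 1) // 2  # ceil(M / 2)
    lower = num_sus // 2        # floor(M / 2)
    return p_f(upper) + p_f(lower) - p_f(num_sus) - 1.0


def c0_holds(num_sus):
    """True if assigning all SUs to one channel beats every other split."""
    return c0_margin(num_sus) >= 0.0


holds = {m: c0_holds(m) for m in range(2, 11)}
threshold = max(m for m, ok in holds.items() if ok)
print(holds)
print("largest M for which (C0) holds:", threshold)
```

For this hypothetical model the condition holds up to four SUs and fails from five SUs onward, which matches the threshold behaviour stated in parts (ii) and (iii) of Theorem 1.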
Specifically, we would like to point out that although we
assume $\hat{P}_f(n)$ is convex, our contribution is not compromised even if $\hat{P}_f(n)$ is not convex over the whole domain of
$n$. As mentioned before, a concave section can only occur
at the beginning of the domain, because the value of $\hat{P}_f(n)$ is
bounded below by zero. By working out the range where $\hat{P}_f(n)$ is
concave, we can follow the same procedure as in the derivation
of the largest combination and the proof of Theorem 1, and
obtain the new combination ordering and the optimal solution
corresponding to the new property of $\hat{P}_f(n)$. In the following
sections, we study the case in which $M$ satisfies condition (C0),
and assume that it holds for all $\hat{\tau}$ within the range $[0, L - \theta]$.
V. STRUCTURE OF THE OPTIMAL POLICY
A. Optimal Cooperative Sensing Scheduling
It is natural to consider whether the myopic CSS is also the
optimal one under the same condition. It has been confirmed
in our extensive simulation results, that when condition (C0)
is satisfied the optimal CSS is to assign all SUs to sense one
channel. However, it turns out to be difficult to analytically
prove this result since the imperfect spectrum sensor introduces complexity in the belief vector update. Moreover, we
have one more observation compared to [8], which implies
more information needs to be handled and the problem studied
in this paper appears to be more challenging.
Although it is difficult to extend the optimality of the
myopic CSS to the general case, we now prove a simple but
nontrivial case with two SUs and two channels under
a fixed sensing duration. The reward parameters $e_t$,
$e_w$ and $e_s$ are set to zero for simplicity of expression; parameters
with general values can be easily incorporated [11].
Similarly to [8], we assume $P_f(n) \le \frac{p_{10} p_{01}}{p_{00} p_{11}}$, $\forall n$.
Theorem 2: Consider the network with 2 SUs and 2 channels. The optimal CSS at any slot $t$ is to assign all SUs to
cooperatively sense the channel given by $\arg\max_i b_i^0(t-1)$,
if condition (C0) holds.
Proof: First assume that we always assign all SUs to sense
one channel. From the result in [8], the BS will choose the
channel with the largest belief value, i.e., $\arg\max_i b_i^0(t-1)$.
Now we need to compare the expected value obtained under
the following two cases: (i) two SUs sense channel 1; (ii) each
SU senses one channel.
At the last time slot $t = T$, the optimal action is the
myopic action. Suppose that for $t + 1 < T$, the BS assigns all
SUs to cooperatively sense the channel with the largest belief
value. We need to show that this also holds for $t$. As in [8], denote by
$\hat{V}_t(\mathbf{b}(t-1); a)$ the expected total reward obtained by taking sensing
action $a$ in slot $t$ followed by the myopic policy in future slots.
Similarly, denote by $I_t(\mathbf{b}(t-1); a)$ the expected immediate
reward obtained by sensing action $a$ in slot $t$. Further define
$a = 1$ as the action that two SUs sense channel 1 and $a = 2$
as the action that each SU senses one channel. In order to
show the optimality of the myopic policy, we need to show
$\hat{V}_t(\mathbf{b}(t-1); a = 1) - \hat{V}_t(\mathbf{b}(t-1); a = 2) \ge 0$.
For both actions, the observation and system state at $t$
determine the channel selected in $t + 1$. Therefore, arguments
similar to those in the proof of Theorem 2 in [8] can be applied. It
follows that
$$\hat{V}_t(\mathbf{b}(t-1); a = 1) - \hat{V}_t(\mathbf{b}(t-1); a = 2) = \left[ I_t(\mathbf{b}(t-1); a = 1) - I_t(\mathbf{b}(t-1); a = 2) \right] + \Gamma + \left[ b_1^0(t-1)(1 - b_2^0(t-1)) P_f(2) - (1 - b_1^0(t-1))\, b_2^0(t-1)\, P_d P_f(1) \right] \Delta\, (p_{00} p_{11} - p_{10} p_{01}), \tag{17}$$
where $\Gamma$ is a positive term and $\Delta = \hat{V}_t(1 \mid [0, 1]) - \hat{V}_t(1 \mid [1, 0])$.
$\hat{V}_t(a \mid [s_1, s_2])$ denotes the expected total reward starting from
$t$ under the action $a$ and system state $\mathbf{s}(t-1)$ in slot $t - 1$,
the same as in [8]. The first term on the right-hand side of
(17) is nonnegative because condition (C0) holds. Denote the
last term by $x_3$. After some manipulation [19], we have $x_3 \ge p_{10} p_{01} P_f(2) P_d P_f(1) > 0$,
which leads to $\hat{V}_t(\mathbf{b}(t-1); a = 1) - \hat{V}_t(\mathbf{b}(t-1); a = 2) > 0$.
Theorem 2 reveals that in this simplified model with 2 SUs and 2 channels,
the myopic CSS is actually optimal, which greatly
simplifies the procedure of finding the optimal policy. This
conclusion provides nontrivial insights into the problem at hand. However, showing the optimality of the myopic
sensing scheduling for general $M$ and $N$ is highly challenging
and is left as our future work. In the following sections
we focus on the case in which the optimal CSS is to assign all
SUs to sense one channel.
B. Structure of the Optimal Sensing Time
The myopic policy consists of two parts: the CSS and the
sensing time $\hat{\tau}^*(t)$, where the latter is given by $\hat{\tau}^*(t) = \arg\max_{\tau(t)} I_t(\mathbf{b}(t-1))$,
the solution of a static
optimization problem. Using a method similar to that in [15], it can be
shown that there is only one
maximum over $\tau(t)$ within the range $[0, L - \theta]$. The myopic
solution can therefore be obtained
by standard low-complexity search algorithms.
Although the optimal sensing time is very difficult
to obtain, some insights into its structure are given in
the following proposition.
Proposition 3: (i) The optimal sensing time $\tau^*(t)$ that maximizes the total expected reward is the same as the myopic
sensing time $\hat{\tau}^*(t)$ that maximizes the immediate expected
reward, if $p_{00} p_{11} = p_{10} p_{01}$. (ii) The optimal sensing time
$\tau^*(t)$ is no smaller than $\hat{\tau}^*(t)$, $\forall t$.
Proof: (i): $p_{00} p_{11} = p_{10} p_{01}$ implies $p_{00} = p_{10}$ (and hence
$p_{01} = p_{11}$). All the beliefs after state transition will then be set to the
same value $p_{10}$, in which case the expected future reward in
any $t$ is not influenced by the probability of false alarm,
which is related to the sensing duration. Since $\hat{\tau}^*(t)$ maximizes
the immediate reward, the optimal sensing time $\tau^*(t)$ is
equal to $\hat{\tau}^*(t)$.
Sketch of the Proof of (ii): Consider $\tau(t) \le \hat{\tau}^*(t)$. It can
be proved that the expected future reward obtained under
$\hat{\tau}^*(t)$ is larger than that under $\tau(t)$. Since $\hat{\tau}^*(t)$ also maximizes
the immediate reward, it maximizes the total expected
remaining reward over this range. Therefore, only when $\tau(t) > \hat{\tau}^*(t)$ could
$\tau(t)$ result in a larger total expected remaining reward than
$\hat{\tau}^*(t)$. For details, please refer to [19].
Here we arrive at a conclusion similar to that in
[10]. Although the myopic sensing time cannot always be
generalized to the optimal sensing time, the analysis reveals
some useful insights into the structure of the optimal sensing
time. Moreover, the myopic sensing time offers a
good tradeoff between computational complexity and
optimality, with acceptable performance [10], [4].
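Because the immediate reward has a single maximum over the admissible sensing times, a one-dimensional search is enough to find the myopic sensing time. The sketch below uses golden-section search and rests on several assumptions: the false-alarm expression is the single-detector energy-detection formula commonly associated with [3] for a target detection probability; the reward is a simplified stand-in (probability of no false alarm times the remaining transmission time), not the paper's full reward function; and the constant names are ours. The frame length, overhead, sampling rate, SNR and detection probability follow the numerical section.

```python
import math
from statistics import NormalDist

FRAME = 0.100            # frame duration: 100 ms
OVERHEAD = 0.1 * FRAME   # scheduling and fusion time: 0.1 of the frame
F_S = 4e6                # sampling rate: 4 MHz
SNR = 10 ** (-14 / 10)   # -14 dB in linear scale
P_D = 0.9                # required detection probability


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    """Inverse of Q: the x for which Q(x) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def p_f(tau):
    """Assumed false-alarm probability for sensing time tau at fixed P_D."""
    return q_func(math.sqrt(2 * SNR + 1) * q_inv(P_D) + math.sqrt(tau * F_S) * SNR)


def reward(tau):
    """Simplified stand-in for the immediate reward (an assumption)."""
    return (1.0 - p_f(tau)) * (FRAME - OVERHEAD - tau)


def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:          # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                 # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


tau_star = golden_section_max(reward, 0.0, FRAME - OVERHEAD)
print(f"myopic sensing time: {tau_star * 1e3:.3f} ms")
```

Golden-section search needs only function evaluations and shrinks the interval by a constant factor per step, so it is one example of the low-complexity search methods mentioned above; any other unimodal search would serve equally well.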
VI. ENERGY-EFFICIENT REWARD PARAMETER DESIGN
Motivated by the findings in [11], we introduce the punishment $e_w$ for collisions. We are interested in whether
this punishment can improve the energy efficiency
of CRNs compared with the general design without the
punishment. Although the myopic policy does not always
preserve optimality, its performance still suffices
to reflect the impact of the punishment parameter on the
energy efficiency. Since our problem shares some
properties with that in [8], we adopt the method of analyzing
the steady-state reward in [8]; the only difference is
that our reward function depends on the sensing time $\tau(t)$.
For detailed derivations of this section, please refer to [19].
A. Analytical Expression of the Throughput
The concept of Transmission Period in [8] is applied here,
where under the situation of $p_{00} \ge p_{10}$, the event of channel
switch is equivalent to a slot without positive reward. In our
model the situation is more complicated, and such a slot may
refer to: (i) no ACK is received, and a negative reward is obtained;
or (ii) the sensing outcome is busy and thus no transmission
is performed, so only the energy for sensing is consumed. Let
$\mathcal{L}_k$ denote the length of the $k$-th transmission period.
1) Average Successful Transmission Amount: We define
the average successful transmission amount $\mathcal{R}_B$ as the ratio
between the overall length of successful transmission duration
and the overall length of transmission duration (including
the successful ones, failed ones and silent ones). Let
$\bar{\mathcal{L}} = \lim_{K \to \infty} \frac{\sum_{k=1}^{K} \mathcal{L}_k}{K}$ denote the average length of a transmission
period. The average successful transmission amount $\mathcal{R}_B$ is
given by
$$\mathcal{R}_B = 1 - \frac{L - \theta - \hat{\tau}(1)}{D},$$
where $D = (L - \theta - \hat{\tau}(1))\bar{\mathcal{L}} + \hat{\tau}(1) - \lim_{K \to \infty} \frac{\sum_{k=1}^{K} \hat{\tau}(b_{1,k})}{K}$
denotes the average length of transmission duration of a transmission period, $b_{j,k}$ denotes the
belief value for the $j$-th slot in the $k$-th transmission period, and $\hat{\tau}(b_{j,k})$
denotes the optimal sensing time given the corresponding
belief value $b_{j,k}$. Moreover, the upper bound of $\mathcal{R}_B$ is given by
$$\mathcal{R}_B \le 1 - \frac{L - \theta - \hat{\tau}(1)}{D'},$$
where $D' = (L - \theta - \hat{\tau}(1))\mathcal{L} + \hat{\tau}(1) - \hat{\tau}(\bar{b})$,
$\mathcal{L} = 1 + \frac{\bar{b}}{1 - p_{00}(1 - P_f(M; \hat{\tau}(1)))}$, and $P_f(M; \hat{\tau}(b))$ denotes the
probability of false alarm achieved by $M$ cooperating SUs and
the sensing time $\hat{\tau}(b)$.
2) Average Collision Amount: The average collision
amount is defined as the ratio of the overall length of transmission
duration that results in collision with the PUs to the overall
length of transmission duration. The upper bound of the average
collision amount $\mathcal{R}_C$ is given by
$$\mathcal{R}_C = \frac{(L - \theta - \hat{\tau}(1))(1 - \bar{P}_d)}{D'}.$$
B. Energy Efficiency Criteria
We define the Successful transmission oVer Collision (SVC)
criterion to measure the energy efficiency of the CRN, which
reflects the ratio between meaningful energy consumption and
energy waste. Denote the SVC criterion by $\mathcal{E}_{SVC}$, which is
defined as the ratio between the overall successful transmission
duration and the overall collision duration. The upper bound
of $\mathcal{E}_{SVC}$ can be expressed as follows [19]:
$$\mathcal{E}_{SVC} = \frac{D' - (L - \theta - \hat{\tau}(1))}{(L - \theta - \hat{\tau}(1))(1 - \bar{P}_d)}. \tag{18}$$
VII. NUMERICAL RESULTS
In this section, simulation results are presented to show
the impact of the punishment parameter $e_w$ on the energy
efficiency of the CRN. We consider the number of primary
channels to be $N = 10$ and the number of SUs to be $M = 8$.
The frame duration is chosen to be $L = 100$ ms, and the
duration for sensing scheduling and result fusion is $\theta = 0.1L$.
The signal model is the same as in [3], where the low-SNR
region is considered and the sampling rate is $f_s = 4$ MHz.
The required detection probability is set to $\bar{P}_d = 0.9$, and the
parameters related to the reward function are set to $B = 10$,
$e_t = 1$ and $e_s = 10$.
In Fig. 1(a), we compare the value of the efficiency criterion
$\mathcal{E}_{SVC}$ given by the derived upper bound with the one
achieved in the simulation, under the transition probabilities
$p_{00} = 0.8$, $p_{10} = 0.7$ and SNR value $\gamma = -14$ dB. We vary
the value of the punishment parameter $e_w$ from 30 to 550
with a step size of 40. Note that, in practice,
the sensing duration cannot use up
the whole frame; otherwise, even if the sensing result is idle,
transmission will never be performed. The value
of $e_w$ therefore cannot be too large, so that this situation
is avoided for all belief values. It can be seen from the figure that
the difference between the upper bound and the simulation
results is small (within 1%). Moreover, the optimal value of
$e_w$ is achieved at the boundary of the range that keeps the model practical.
Fig. 1(a) confirms that, by appropriately choosing the
value of $e_w$, better energy efficiency can be achieved.
Fig. 1. Performance of energy efficiency by introducing punishment $e_w$. (a) Simulation vs. upper bound (energy efficiency versus step; curves: Upper Bound, Simulation). (b) Tuning $e_w$ vs. tuning $B$ (energy efficiency versus $\gamma$; curves: Change $e_w$, Change $B$).
We consider two different schemes in Fig. 1(b): 1) adjusting
the punishment $e_w$, which is introduced in our work; 2) adjusting the reward for successful transmission $B$ while setting
$e_w = 0$, which is the general design. The maximum values of
$\mathcal{E}_{SVC}$ of the two schemes are compared under different SNR
values, which range from $-24$ dB to $-8$ dB. Fig.
1(b) shows that the first scheme leads to better energy efficiency than
the second one. The improvement is
large in the low-SNR region and small in the high-SNR region.
The reason is that, by tuning $e_w$, the BS obtains better sensing
performance and makes cautious transmission decisions to
avoid collisions and perform more successful transmissions,
whereas tuning $B$ does not take collisions into account.
In the low-SNR region, the sensing performance is poor. As a
result, even if the BS tries to gain higher throughput by tuning $B$,
the throughput is limited by the poor sensing performance and the
energy efficiency is low. When the SNR is high, only a very short
sensing duration is needed to maximize the corresponding
value function, and the sensing performance of the two schemes
becomes similar. Fig. 1(b) reveals that, by introducing
the punishment parameter, higher energy efficiency can be
obtained compared with the traditional design.
VIII. CONCLUSION
In this paper, several problems related to energy-efficient
CRN design are studied in the framework of POMDP. For
the myopic CSS problem, which is an NP-hard combinatorial
optimization problem, we provide an analytical framework and
obtain the solution analytically. The optimality of the myopic
CSS is proved for the case of 2 channels and fixed sensing
duration. We also study the structure of the optimal sensing
duration and establish the condition for the optimality of the
myopic sensing duration. Having derived the upper bound of the
myopic policy performance, we show in the simulations that,
by appropriately selecting the punishment parameter, energy
efficiency can be improved compared with the traditional design.
The CSS problem can be further explored in the future
using our framework. In this paper, only the case of assigning
all SUs to sense one channel is discussed. The
optimal solution when $M$ increases and condition (C0) is
not satisfied remains an open question. Another interesting
direction is to investigate the shaping problem of the reward
function. Instead of the simple linear transformation adopted
in this paper, more delicate reward functions that can bring
higher energy efficiency to CRNs may exist.
REFERENCES
[1] J. Palicot, "Cognitive Radio: An Enabling Technology for the Green Radio Communications Concept," in Proc. International Conference on Wireless Communications and Mobile Computing, Jun. 2009.
[2] K. B. Letaief and W. Zhang, "Cooperative Spectrum Sensing," Cognitive Wireless Communication Networks, Springer, pp. 115-138, Oct. 2007.
[3] Y.C. Liang, Y. Zeng, E.C.Y. Peh, and A.T. Hoang, "Sensing-Throughput Tradeoff for Cognitive Radio Networks," IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326-1337, Apr. 2008.
[4] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized Cognitive MAC for Opportunistic Spectrum Access in Ad Hoc Networks: A POMDP Framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589-600, Apr. 2007.
[5] D.E. Dominici, "The Inverse of the Cumulative Standard Normal Probability Function," Integral Transforms and Special Functions, vol. 14, no. 4, pp. 281-292, Aug. 2003.
[6] K. Murota, "Discrete Convex Analysis," Mathematical Programming, Springer, vol. 83, no. 1-3, pp. 313-371, Jan. 1998.
[7] K. Murota, "Recent Developments in Discrete Convex Analysis," Research Trends in Combinatorial Optimization, pp. 219-260, Nov. 2008.
[8] K. Liu, Q. Zhao and B. Krishnamachari, "Dynamic Multichannel Access With Imperfect Channel State Detection," IEEE Trans. Signal Process., vol. 58, no. 5, pp. 2795-2808, Apr. 2010.
[9] R. Smallwood and E. Sondik, "The optimal control of partially observable Markov processes over a finite horizon," Operations Research, pp. 1071-1088, 1971.
[10] A.T. Hoang, Y.C. Liang, D.T.C. Wong, Y. Zeng and R. Zhang, "Opportunistic Spectrum Access for Energy-constrained Cognitive Radios," IEEE Trans. Wireless Commun., vol. 8, no. 3, pp. 1206-1211, Mar. 2009.
[11] A.Y. Ng, D. Harada and S. Russell, "Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping," in Proc. of The Sixteenth International Conference on Machine Learning, Jun. 1999.
[12] Y. Chen, Q. Zhao, A. Swami, "Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053-2071, 2008.
[13] S.H. Ahmad, M. Liu, T. Javidi, Q. Zhao and B. Krishnamachari, "Optimality of Myopic Sensing in Multi-Channel Opportunistic Access," IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040-4050, 2009.
[14] Y.J. Choi, Y. Xin and S. Rangarajan, "Overhead-Throughput Tradeoff in Cooperative Cognitive Radio Networks," in Proc. of IEEE WCNC, pp. 1-6, Apr. 2009.
[15] E.C.Y. Peh, Y.C. Liang, Y.L. Guan and Y. Zeng, "Optimization of Cooperative Sensing in Cognitive Radio Networks: A Sensing-Throughput Tradeoff View," IEEE Trans. Vehicular Technology, vol. 58, no. 9, pp. 5294-5299, Nov. 2009.
[16] R. Fan and H. Jiang, "Optimal Multi-Channel Cooperative Sensing in Cognitive Radio Networks," IEEE Trans. Wireless Commun., vol. 9, no. 3, pp. 1128-1138, Mar. 2010.
[17] C. Song and Q. Zhang, "Cooperative Spectrum Sensing with Multi-Channel Coordination in Cognitive Radio Networks," in Proc. of IEEE ICC 2010, pp. 1-5, Jul. 2010.
[18] Y. Wu and D.H.K. Tsang, "Distributed Power Allocation Algorithm for Spectrum Sharing Cognitive Radio Networks with QoS Guarantee," in Proc. of IEEE INFOCOM 2009, pp. 981-989, Jun. 2009.
[19] T. Zhang and D.H.K. Tsang, "Optimal Cooperative Sensing Scheduling for Energy-Efficient Cognitive Radio Networks," Technical Report, Aug. 2010, http://eez058.ece.ust.hk/publication/OptScheduling.pdf