This paper was presented as part of the main technical program at IEEE INFOCOM 2011
Optimal Cooperative Sensing Scheduling for
Energy-Efficient Cognitive Radio Networks
Tengyi Zhang and Danny H. K. Tsang
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
Email: {zhangty, eetsang}@ust.hk
Abstract—Due to spectrum scarcity and the large energy consumption of wireless communications, designing energy-efficient Cognitive Radio Networks (CRNs) has become important and necessary. In this paper, we consider the problem of optimal Cooperative Sensing Scheduling (CSS) and parameter design to achieve energy efficiency in CRNs using the framework of Partially Observable Markov Decision Process (POMDP). In particular, we consider the CSS problem for a CRN with 𝑀 Secondary Users (SUs) and 𝑁 primary channels, i.e., how many SUs should be assigned to sense each channel in order to maximize an objective function related to energy efficiency. Assigning more SUs to sense one channel yields higher sensing accuracy; however, spreading the SUs out to sense more channels exploits spectrum opportunities better. The CSS problem is formulated as a combinatorial optimization problem. While such problems are generally hard and are usually solved by numerical methods with high computational complexity, in this paper we provide a detailed analysis whose analytical results yield useful and interesting insights. The optimality of the myopic CSS is proved for the case of two channels, and it is conjectured for the general case. We
also study the tradeoff between the sensing and transmission
durations. In addition, the structure of the optimal sensing time
that maximizes the energy efficiency objective is also analyzed,
the condition for the optimality of the myopic sensing time is
obtained, and the performance upper bound of the myopic policy
is derived. Based on the numerical results, we show that by
carefully tuning a punishment parameter, better energy efficiency
can be achieved.
I. INTRODUCTION
In recent years, the dramatic growth of various wired and wireless communication applications has led to a great increase in the related energy consumption. Therefore, it is time for the communication world to investigate radio and networking solutions that are energy-efficient and resource-efficient, i.e., green communications.
As one of the emerging wireless technologies, Cognitive Radio (CR) is considered a promising solution for improving spectrum efficiency. By intelligently monitoring the spectrum, Secondary Users (SUs) are able to opportunistically access the idle spectrum originally assigned to Primary Users (PUs). On the other hand, the new functionalities and additional tasks (e.g., spectrum sensing) make CR-enabled devices more energy-consuming. Meanwhile, with its agility and intelligence, CR technology creates new possibilities and methods for realizing green communications [1]. These facts motivate us to study the analysis and design issues of energy-efficient Cognitive Radio Networks (CRNs).
978-1-4244-9920-5/11/$26.00 ©2011 IEEE
Consider a centralized CRN with multiple SUs and a Base Station (BS), which is responsible for the scheduling and coordination of the SUs. Cooperative spectrum sensing [2] is adopted to improve the sensing accuracy, both to better protect the PUs and to capture spectrum opportunities in a slotted primary system. In each slot, the BS determines: (1) how many SUs to assign to sense each channel; (2) how long the sensing duration is; (3) whether to allow the SUs to access the primary channels based on the sensing outcomes. This sequential decision-making problem is studied in the framework of Partially Observable Markov Decision Process (POMDP). In addition, we study how to improve energy efficiency by optimally designing a parameter.
A. Fundamental Tradeoffs and Contributions
1) Cooperative sensing scheduling: At the beginning of each slot, the BS decides how many SUs should be assigned to sense each channel in order to maximize the immediate reward (which is related to energy efficiency). This Cooperative Sensing Scheduling (CSS) problem is a combinatorial optimization problem. Such problems are usually solved only by numerical methods, at a high computational cost. However, for the problem studied in this paper, we provide an analytical framework. With the help of Discrete Convex Analysis, we obtain the optimal solution analytically and gain some useful and interesting insights. We show that, for a given number of channels to be sensed, the best combination is to assign the SUs to the channels as equally as possible. In addition, we give the necessary and sufficient condition under which assigning all the SUs to cooperatively sense one channel is the optimal solution of the CSS problem. By applying our framework, the optimal solution for general cases, i.e., any number of SUs and channels, can be obtained as well. To the best of our knowledge, this is the first work that analytically obtains the optimal solution of this combinatorial optimization problem.
2) Optimality of the myopic CSS: The myopic CSS is shown to be the optimal CSS in the case of 2 channels and fixed sensing durations. This result helps reduce the complexity of obtaining the optimal CSS, since the myopic CSS is obtained analytically, while the general method requires recursive computation. Based on the numerical results, we conjecture that this optimality also holds for a general number of SUs and channels.
3) Structure of the optimal sensing duration: As part of the sequential decision, the BS needs to determine the sensing duration after performing the CSS. Although the myopic sensing duration can be found by numerical methods, it is not always optimal. We therefore analyze the structure of the optimal sensing duration and show that it is always larger than or equal to the myopic sensing duration. This conclusion is similar to that of [10], but it is obtained under a different model. Moreover, we also establish the condition for the optimality of the myopic sensing duration.
4) Designing the punishment parameter to improve energy efficiency: Motivated by the result in [11], we introduce a punishment parameter for unsuccessful transmissions, which helps the CRN cause fewer collisions with the PUs and thus achieve higher energy efficiency by saving the power spent on retransmissions. We derive the performance upper bound of the myopic policy by adopting the methods in [8]. Numerical results show that the energy efficiency of the CRN can be improved by carefully tuning the punishment parameter.
B. Related Works
Designing optimal MAC protocols for opportunistic spectrum access in the framework of POMDP started with [4]. A separation principle for the joint design of the spectrum sensor operating point, the sensing channel selection, and the access policy is established in [12]. [13] showed that a simple but robust round-robin myopic channel selection policy is optimal for a general number of positively correlated channels, while [8] extended this optimality result to the imperfect sensing case.
The literature on cooperative sensing is also rich. [2] provided a detailed survey of various schemes for fusing the sensing information from SUs. The impact of the cooperative sensing overhead on the system throughput was studied in [14], taking into account the number of reporting packets. [15] studied the tradeoff in finding the optimal sensing time and the parameter of the result fusion rule in order to maximize the SUs’ throughput. [16] extended the analysis to the case of multiple channels using a soft decision fusion rule.
Existing works on the cooperative sensing problem mainly focused on how to achieve the best sensing performance on a single channel. However, the question of how to assign SUs to sense multiple channels, i.e., the CSS problem, has not been addressed except in a recent work [17]. Moreover, the cooperative sensing problem is commonly formulated as a static optimization problem, including in the works mentioned above. It is important to consider this CSS problem in a varying spectrum environment with uncertainty, which results in the sequential decision problem studied in this paper. As mentioned in [17], this CSS problem is considered to be NP-hard. Different from [17] and other works that adopted numerical methods, we obtain the optimal CSS solution in an
analytical way. Only the probability of detection is considered in [17], while we consider a more practical case in which the spectrum sensing performance is described by both the probability of detection and the probability of false alarm. Moreover, a soft result fusion method was applied in [17], while we use a hard result fusion method, which requires much less overhead. Unlike [8], our work considers a dynamic CSS problem, in which a different number of channels may be sensed in each slot depending on the CSS result. In addition, we consider an energy-efficiency-oriented objective in the POMDP, with a sensing duration that may vary from slot to slot. Regarding the structure of the optimal sensing duration, a result similar to that in [10] is obtained; however, this paper considers multiple primary channels and a different reward function. We also present the condition for the optimality of the myopic sensing duration.
II. SYSTEM MODEL
Assume there exist 𝑁 independent and stochastically identical Gilbert-Elliot channels owned by PUs, denoted by 𝒩 = {1, 2, ..., 𝑁}. The CRN consists of 𝑀 SUs and a BS, where 𝑁 ≥ 𝑀 (footnote 1). The primary system operates in a time-slotted manner with a fixed slot duration 𝐿. The occupancy state of each channel transits at the beginning of each slot according to a two-state discrete-time Markov chain with transition probabilities {𝑝𝑖𝑗}, 𝑖, 𝑗 ∈ {0, 1}, where 𝑝00 ≥ 𝑝10 (footnote 2). This system model is commonly used to abstract physical channels with memory, and the slotted system structure has been shown to fit CRN applications well (see [4], [16] and references therein). Let 𝑠𝑛(𝑡) ∈ {0 (idle), 1 (busy)} denote the occupancy state of channel 𝑛 in time slot 𝑡. The primary system state in slot 𝑡 can be denoted as s(𝑡) ≜ [𝑠1(𝑡), ..., 𝑠𝑁(𝑡)].
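As a minimal sketch of this channel model, the Python below simulates one Gilbert-Elliot channel as a two-state Markov chain and compares its long-run idle fraction with the stationary idle probability 𝑝10/(𝑝01 + 𝑝10), the value used later to initialize b(0). The function names and the transition probabilities are illustrative assumptions, not values taken from the paper.

```python
import random


def stationary_idle(p01: float, p10: float) -> float:
    """Stationary probability that a two-state channel is idle (state 0).

    p01 = Pr{idle -> busy}, p10 = Pr{busy -> idle}.
    """
    return p10 / (p01 + p10)


def simulate_channel(p01: float, p10: float, slots: int, seed: int = 0) -> list[int]:
    """Simulate the occupancy state s_n(t) of one channel over `slots` slots."""
    rng = random.Random(seed)
    # Start from the stationary distribution.
    state = 0 if rng.random() < stationary_idle(p01, p10) else 1
    states = []
    for _ in range(slots):
        # The state transits at the beginning of each slot.
        if state == 0:
            state = 1 if rng.random() < p01 else 0
        else:
            state = 0 if rng.random() < p10 else 1
        states.append(state)
    return states


# Illustrative positively correlated channel: p00 = 0.8 >= p10 = 0.3.
states = simulate_channel(p01=0.2, p10=0.3, slots=200_000)
idle_fraction = states.count(0) / len(states)
print(f"empirical idle fraction {idle_fraction:.3f}, "
      f"stationary {stationary_idle(0.2, 0.3):.3f}")
```

The seeded generator keeps the run reproducible; with 𝑝00 ≥ 𝑝10 the states are positively correlated, so the empirical fraction converges more slowly than for independent draws.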
SUs are required to carry out spectrum sensing before operating on the primary channels, using the energy detection mechanism that is widely adopted in CRNs. Each SU can sense only one channel at a time due to physical limitations. The spectrum sensor of each SU detects the presence of PU signals by performing the following binary hypothesis test:
$$H_0: s_n(t) = 0 \ (\text{idle}), \qquad H_1: s_n(t) = 1 \ (\text{busy}). \qquad (1)$$
The sensing performance of each SU can be described by the
probability of detection 𝑝𝑑 ≜ Pr{decide 𝐻1 ∣𝐻1 is true} and
the probability of false alarm 𝑝𝑓 ≜ Pr{decide 𝐻1 ∣𝐻0 is true}.
Without loss of generality, we focus on the case of complex-valued PSK signals and Circularly Symmetric Complex Gaussian (CSCG) noise [3]. Under this model, for a given probability of detection 𝑝𝑑, the probability of false alarm is given by:
$$p_f = \mathcal{Q}\left(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(p_d) + \sqrt{\tau f_s}\,\gamma\right), \qquad (2)$$
where 𝒬(⋅) is the complementary cumulative distribution function of the standard Gaussian distribution, 𝒬⁻¹(⋅) denotes the inverse of 𝒬(⋅), 𝛾 denotes the received signal-to-noise ratio (SNR) of the primary signal at the SU, 𝜏 denotes the sensing time, and 𝑓𝑠 denotes the sampling rate.
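Equation (2) can be evaluated with the Python standard library alone, writing 𝒬(𝑥) = erfc(𝑥/√2)/2 and 𝒬⁻¹(𝑝) = Φ⁻¹(1 − 𝑝), where Φ is the standard Gaussian CDF. The sketch below is only an illustration; the function names and the example values of 𝛾, 𝜏 and 𝑓𝑠 are assumptions.

```python
import math
from statistics import NormalDist


def q_func(x: float) -> float:
    """Complementary CDF of the standard Gaussian, Q(x) = Pr{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p: float) -> float:
    """Inverse of Q: returns x such that Q(x) = p, for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)


def false_alarm(pd: float, snr: float, tau: float, fs: float) -> float:
    """Eq. (2): false-alarm probability of one energy detector.

    pd: target detection probability, snr: linear received SNR (gamma),
    tau: sensing time in seconds, fs: sampling rate in Hz.
    """
    arg = math.sqrt(2.0 * snr + 1.0) * q_inv(pd) + math.sqrt(tau * fs) * snr
    return q_func(arg)


# Illustrative values: SNR of -20 dB, 20 MHz sampling, three sensing times.
snr = 10 ** (-20 / 10)
for tau in (0.5e-3, 1e-3, 2e-3):
    print(f"tau = {tau * 1e3:.1f} ms -> p_f = {false_alarm(0.9, snr, tau, 20e6):.4f}")
```

Longer sensing raises the second term inside 𝒬(⋅) and therefore lowers 𝑝𝑓 for the same 𝑝𝑑, which is the sensing-time tradeoff the paper later optimizes.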
1 Generally, the number of channels is greater than that of the SUs [8]. Moreover, our results also hold for the case 𝑁 < 𝑀.
2 That is, the channels are not negatively correlated. The case 𝑝00 < 𝑝10 can be analyzed similarly.

In this paper, the cooperative sensing mechanism is adopted. Based on the sensing outcomes of the individual SUs, the BS performs a result fusion procedure that processes the individual outcomes jointly to obtain the final sensing outcome. On receiving the results from 𝑀 SUs, the BS applies the “OR” rule [2] for fusion, which is a hard decision fusion rule (footnote 3) and can be mathematically expressed as:
$$P_d(M) = 1 - \prod_{i=1}^{M}(1 - p_{d,i}), \qquad P_f(M) = 1 - \prod_{i=1}^{M}(1 - p_{f,i}), \qquad (3)$$
where 𝑝𝑑,𝑖 and 𝑝𝑓,𝑖 denote the probability of detection and the probability of false alarm of SU 𝑖, respectively. In this paper, all SUs are assumed to be homogeneous, i.e., to have the same sensing performance. The case of heterogeneous sensing performance can be easily incorporated.
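A short sketch of the OR rule (3): the fused probability is one minus the probability that every SU reports 𝐻0. For homogeneous SUs, inverting (3) for a target fused detection probability 𝑃̄𝑑 gives the per-SU value 𝑝𝑑 = 1 − (1 − 𝑃̄𝑑)^{1/𝑀}, which the sketch also checks. The function names are illustrative.

```python
from math import prod


def or_fusion(probs: list[float]) -> float:
    """Eq. (3): probability that at least one SU decides H1 under the OR rule."""
    return 1.0 - prod(1.0 - p for p in probs)


def per_su_detection(target: float, m: int) -> float:
    """Per-SU detection probability that makes the OR-fused value equal `target`
    when m homogeneous SUs cooperate (obtained by inverting Eq. (3))."""
    return 1.0 - (1.0 - target) ** (1.0 / m)


for m in (1, 2, 4):
    pd = per_su_detection(0.9, m)
    print(f"M = {m}: per-SU p_d = {pd:.4f}, fused P_d = {or_fusion([pd] * m):.4f}")
```

The same `or_fusion` call applies to false-alarm probabilities, which is why adding SUs to a channel lets each SU run at a lower individual 𝑝𝑑 (and hence a lower individual 𝑝𝑓) while the fused 𝑃𝑑 stays at its target.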
III. PROBLEM FORMULATION
At the beginning of each slot, the BS sequentially determines 𝑎𝑛(𝑡), the number of SUs assigned to sense channel 𝑛, together with the sensor operating point and the sensing duration. Every SU is assigned a channel for sensing, and all SUs use the same sensing duration for synchronization. The transmission decision is made based on the fusion outcome after the sensing reports from the SUs are collected. The BS randomly allocates the channels selected for transmission to the SUs; the fairness issue is beyond our scope. At the end of a slot, the SU(s) using a channel send an ACK to the BS if the transmission is successful.
Since the channel sensing capability is limited (i.e., some channels may not be sensed) and the sensing performance is imperfect, the system state is not fully observable to the CRN. The BS can only infer the system state probabilistically from the decision and observation history. Hence, our problem fits into the framework of Partially Observable Markov Decision Process (POMDP).
A. Observation
Let 𝜃𝑛(𝑡) denote the observation of channel 𝑛 obtained in slot 𝑡. There are four possible outcomes: (i) 𝜃𝑛(𝑡) = 0 denotes that data transmission is performed and an ACK is received; (ii) 𝜃𝑛(𝑡) = 1 denotes that data transmission is performed and no ACK is received; (iii) 𝜃𝑛(𝑡) = 2 denotes that the channel is declared busy by the result fusion outcome and will not be used; (iv) 𝜃𝑛(𝑡) = 3 denotes that the BS decides not to sense the channel. The system observation vector can be expressed as 𝜽(𝑡) ≜ [𝜃1(𝑡), ..., 𝜃𝑁(𝑡)]. Note that these four observations can be distinguished since the BS governs the transmission decisions.
B. Belief Vector
The sufficient statistic of the system state is the belief vector $\mathbf{b}(t) \triangleq \{b^1_0(t), \ldots, b^N_0(t)\}$, where $b^n_0(t)$ is the conditional probability that $s_n(t) = 0$ given the decision and observation history [4], and $b^n_1(t) = 1 - b^n_0(t)$. b(𝑡) is computed at the end of slot 𝑡 after the observation is received and is used for decision making in slot 𝑡 + 1. Based on the action and the observation in slot 𝑡, the updated belief vector b(𝑡) ≜ 𝒯(b(𝑡 − 1)∣𝐴(𝑡), 𝜽(𝑡)) can be obtained through Bayes’ rule [19].
3 Since a hard decision requires only one bit of feedback information, it is favored for reducing the overhead.
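The paper defers the exact form of the update 𝒯 to [19]. The sketch below shows one common per-channel form under assumptions that are ours, not the paper’s: a channel is accessed if and only if the fused outcome is idle, an ACK is received if and only if an accessed channel was idle (so 𝜃 = 0 reveals idle and 𝜃 = 1 reveals busy), the fused detection and false-alarm probabilities are 𝑃𝑑 and 𝑃𝑓, and the prior for slot 𝑡 is propagated from b(𝑡 − 1) through the Markov chain. The function names are hypothetical.

```python
def predict_idle(b_prev: float, p00: float, p10: float) -> float:
    """Prior Pr{s_n(t) = 0} in slot t, propagated from the belief b_0^n(t-1)."""
    return b_prev * p00 + (1.0 - b_prev) * p10


def update_idle(prior: float, theta: int, pd: float, pf: float) -> float:
    """Posterior b_0^n(t) after observation theta of channel n.

    Assumption: an ACK is received iff the channel is idle and accessed,
    and the channel is accessed iff the fused outcome is idle.
    """
    if theta == 0:  # transmitted and ACK received -> channel was idle
        return 1.0
    if theta == 1:  # transmitted, no ACK -> collision, channel was busy
        return 0.0
    if theta == 2:  # fused outcome "busy": false alarm (idle) or detection (busy)
        idle_and_busy_report = prior * pf
        return idle_and_busy_report / (idle_and_busy_report + (1.0 - prior) * pd)
    if theta == 3:  # channel not sensed: no new information
        return prior
    raise ValueError("theta must be 0, 1, 2 or 3")


prior = predict_idle(b_prev=0.6, p00=0.8, p10=0.3)
print(prior, update_idle(prior, theta=2, pd=0.9, pf=0.1))
```

A "busy" report lowers the idle belief sharply when 𝑃𝑓 is small relative to 𝑃𝑑, whereas an unsensed channel simply keeps its propagated prior.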
C. Reward
A reward is received at the end of each slot. Since our objective is to design energy-efficient CRNs, energy consumption is taken into account. Specifically, the reward for channel 𝑛, 𝑅𝑛(𝐴(𝑡), 𝜽(𝑡)), consists of the following: (i) When 𝜃𝑛(𝑡) = 0, a positive reward (𝐿 − 𝜂 − 𝜏(𝑡))(𝐵 − 𝑒𝑡) is received, where 𝜏(𝑡) ∈ (0, 𝐿 − 𝜂) denotes the sensing duration, 𝜂 denotes the duration of sensing scheduling and result fusion at the BS, 𝐵 denotes the reward for successful transmission, and −𝑒𝑡 denotes the energy consumed for transmission; both are proportional to the transmission duration 𝐿 − 𝜂 − 𝜏(𝑡). (ii) When 𝜃𝑛(𝑡) = 1, a negative reward (𝐿 − 𝜂 − 𝜏(𝑡))(−𝑒𝑤 − 𝑒𝑡) is received. −𝑒𝑤 can be regarded as the punishment for the interference generated to the PU, and it also reflects the energy wasted due to collision. This is the key parameter for energy-efficient CRN design and is analyzed in detail in later sections. (iii) When a channel is sensed, a negative reward −𝜏(𝑡)𝑐 is received, where 𝑐 denotes the energy consumed in spectrum sensing per unit of time. (iv) When a channel is not selected for sensing, no reward is received.
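To make the four reward cases concrete, the sketch below computes the expected reward of one channel in one slot, given the probability 𝜔 that the channel is idle. It assumes (our assumption, consistent with the access rule described in Section III-D) that a sensed channel is accessed if and only if the fused outcome is idle, that a transmission on an idle channel always yields an ACK, and that the sensing cost −𝜏𝑐 is charged once per sensed channel, as the text states. All names and numbers are illustrative.

```python
def expected_channel_reward(omega: float, sensed: bool, pd: float, pf: float,
                            tau: float, L: float, eta: float, B: float,
                            e_t: float, e_w: float, c: float) -> float:
    """Expected reward of one channel in one slot (cases (i)-(iv) above).

    omega: probability that the channel is idle in this slot.
    pd, pf: fused detection and false-alarm probabilities of the channel.
    """
    if not sensed:
        return 0.0                                    # case (iv)
    t_tx = L - eta - tau                              # time left for transmission
    # case (i): idle and declared idle -> successful transmission
    success = omega * (1.0 - pf) * t_tx * (B - e_t)
    # case (ii): busy but missed -> collision, punished through e_w
    collision = (1.0 - omega) * (1.0 - pd) * t_tx * (-e_w - e_t)
    return success + collision - tau * c              # case (iii): sensing cost


# Illustrative slot: L = 10, eta = 1, tau = 1, B = 5, e_t = 1, e_w = 3, c = 2.
print(expected_channel_reward(0.6, True, 0.9, 0.2, 1.0, 10.0, 1.0, 5.0, 1.0, 3.0, 2.0))
```

Raising 𝑒𝑤 makes case (ii) more costly, which is how the punishment parameter discourages accessing channels whose idle probability is low.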
D. POMDP Problem Formulation
We expect the CRN to carry out as many successful transmissions as possible while minimizing the collisions caused to the PUs, since collisions result in retransmissions and wasted energy. Therefore, the objective of the POMDP problem is to find the optimal policy 𝜋 that maximizes the total reward received in 𝑇 slots. A policy 𝜋 specifies a sequence of functions 𝜋 = [𝜋1, ..., 𝜋𝑇], where 𝜋𝑡 maps the belief vector b(𝑡 − 1) to an action 𝐴(𝑡) in slot 𝑡. Our problem can consequently be formulated as
$$\pi^* = \arg\max_{\pi}\ \mathbb{E}_{\pi}\Big\{\sum_{t=1}^{T} R\big(A(t), \boldsymbol{\theta}(t)\big) \,\Big|\, \mathbf{b}(0)\Big\} \qquad (4)$$
with the constraints $P^n_d(t) \ge \bar{P}_d$ and $\sum_{n=1}^{N} a_n(t) = M$, ∀𝑛 ∈ 𝒩, 𝑡 = 1, ..., 𝑇. b(0) is the initial belief vector whose entries are set to the stationary distribution $\bar{b}_n = \frac{p_{10}}{p_{01} + p_{10}}$ of the underlying Markov chain [4], [12]. The constraint $P^n_d(t) \ge \bar{P}_d$ protects the PUs: the probability of detection $P^n_d(t)$ on every PU channel must be no smaller than a threshold $\bar{P}_d$ predetermined by the PUs. It has been shown in [4] that, if the operating points of the sensors are tuned so that the equality holds, the optimal access policy is to access channel 𝑛 if the result fusion outcome is idle and not to access it otherwise. Applying this result, the constraint $P^n_d(t) \ge \bar{P}_d$ can be removed, and the original problem becomes an unconstrained POMDP problem. Moreover, (2) reveals that the probability of false alarm is determined once the target probability of detection and the sensing duration are fixed. As a result, the action of the BS in each slot can now be expressed as 𝐴(𝑡) ≜ {a(𝑡), 𝜏(𝑡)}, where a(𝑡) ≜ {𝑎𝑛(𝑡)}𝑛∈𝒩.
E. Optimal Policy and Myopic Policy
To solve (4), we resort to the following value function to obtain the optimal policy:
$$V_t(\mathbf{b}(t-1)) = \max_{A(t)} \Big\{ I_t(\mathbf{b}(t-1)) + \mathbb{E}\big[V_{t+1}\big(\mathcal{T}(\mathbf{b}(t-1) \mid A(t), \boldsymbol{\theta}(t))\big)\big] \Big\}, \qquad (5)$$
with the constraint $\sum_{n=1}^{N} a_n(t) = M$, where $I_t(\mathbf{b}(t-1)) = \mathbb{E}[R(A(t), \boldsymbol{\theta}(t)) \mid \mathbf{b}(t-1)]$ denotes the expected immediate reward. The value function (5) represents the maximum expected reward accumulated from slot 𝑡 up to the time horizon 𝑇. The computational complexity required to obtain the optimal policy is very high. One way to address this problem is to apply the myopic policy [4], which focuses only on the immediate reward and ignores the impact of the current action on future rewards. The myopic policy is given by $\tilde{A}(t) = \arg\max_{A(t)} I_t(\mathbf{b}(t-1))$, with the constraint $\sum_{n=1}^{N} a_n(t) = M$. Generally, the myopic policy reduces the computational complexity but may sacrifice optimality. In the following sections, the myopic policy is shown to be optimal under some conditions.
IV. MYOPIC COOPERATIVE SENSING SCHEDULING
At the beginning of each slot, the first task of the BS is to determine, for each channel, how many SUs should be assigned to perform spectrum sensing cooperatively. As pointed out in [2], [14], the more SUs sense a channel, the better the spectrum sensing performance. On the other hand, since the number of SUs is limited, some channels may not be sensed, and the spectrum opportunities cannot be fully exploited. The goal of the CSS tradeoff between sensing accuracy and spectrum opportunities is to find an optimal scheduling of the SUs that maximizes the immediate reward received by the BS.
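For a fixed sensing time 𝜏(𝑡) = 𝜏̃, define $\tilde{I}_t(\mathbf{b}(t-1)) \triangleq I_t(\mathbf{b}(t-1))\big|_{\tau(t)=\tilde{\tau}}$. We can obtain the myopic CSS in slot 𝑡 by solving the following maximization problem:
$$(\text{P1}):\ \max_{\mathbf{a}(t)}\ \tilde{I}_t(\mathbf{b}(t-1)) \quad \text{s.t.} \quad \sum_{n=1}^{N} a_n = M. \qquad (6)$$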
Without loss of generality, we consider the case where, in any slot 𝑡, all channels have the same belief value. The case of channels with different belief values will be discussed later. After careful inspection of $\tilde{I}_t(\mathbf{b}(t-1))$, we instead solve the following problem (P2):
$$(\text{P2}):\ \max_{\mathbf{a}(t)} \sum_{\{n:\, a_n(t) > 0\}} \big(1 - P_f(a_n(t))\big), \qquad (7)$$
with the constraint $\sum_{n=1}^{N} a_n(t) = M$. At the end of this section, we will show that the optimal solution of (P2) is also the optimal solution of (P1). Although the objective of (P2) has a simpler form than that of (P1), (P2) is a combinatorial optimization problem and is regarded as NP-hard [17]. Moreover, even though numerical methods can be applied to find the optimal solution, such methods provide no insight into system design, e.g., how the CSS changes with system parameters such as the total number of SUs 𝑀.
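For small 𝑀, (P2) can be solved by exhaustive search, which is useful for checking the analytical results of this section. The sketch below enumerates every non-increasing assignment (the ordering convention of Definition 1) of 𝑀 SUs over at most min(𝑀, 𝑁) channels and evaluates (7). It assumes homogeneous SUs, builds 𝑃𝑓(𝑎) from (2), (3) and the per-SU detection target in (8), and uses illustrative parameter values (𝑃̄𝑑 = 0.9, 𝛾 = 0.01, 𝜏𝑓𝑠 = 2 × 10⁴); all names are hypothetical.

```python
import math
from statistics import NormalDist


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    return NormalDist().inv_cdf(1.0 - p)


def fused_false_alarm(a, pd_target=0.9, snr=0.01, tau_fs=2e4):
    """P_f(a): OR-fused false alarm of a SUs that jointly meet pd_target."""
    pd = 1.0 - (1.0 - pd_target) ** (1.0 / a)                         # Eq. (8)
    pf = q_func(math.sqrt(2.0 * snr + 1.0) * q_inv(pd)
                + math.sqrt(tau_fs) * snr)                            # Eq. (2)
    return 1.0 - (1.0 - pf) ** a                                      # Eq. (3)


def partitions(total, parts, max_part=None):
    """Yield non-increasing tuples of `parts` positive integers summing to `total`."""
    if max_part is None:
        max_part = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(max_part, total - parts + 1), 0, -1):
        if first * parts < total:   # `first` is the largest part, so it is too small
            break
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def solve_p2(M, N, pf_of):
    """Exhaustive search for (P2): maximize the sum of 1 - P_f(a_n) over sensed channels."""
    best, best_value = None, -math.inf
    for i in range(1, min(M, N) + 1):          # i = number of sensed channels
        for combo in partitions(M, i):
            value = sum(1.0 - pf_of(a) for a in combo)
            if value > best_value:
                best, best_value = combo, value
    return best, best_value


print(solve_p2(4, 6, fused_false_alarm))
```

The number of partitions grows quickly with 𝑀, so this brute force is only a checking tool; it is exactly the kind of numerical method whose lack of insight motivates the analysis below.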
In this section, we study this combinatorial problem analytically and obtain some useful and interesting insights. We also establish the conditions under which some specific scheduling combinations constitute the myopic policy. Before exploring the structure of the sensing schedule, we first examine the properties of the objective function (7), beginning with the following two lemmas.
Lemma 1: Let 𝑚 be a continuous variable with domain [1, +∞). Denote by $\tilde{p}_d(m)$ the relaxed individual probability of detection as a function of 𝑚:
$$\tilde{p}_d(m) = 1 - (1 - \bar{P}_d)^{1/m}, \qquad (8)$$
which is decreasing and convex.
Proof: Eq. (8) relaxes the integer variable in (3) to a continuous one. Taking the first-order derivative of $\tilde{p}_d(m)$ gives
$$\nabla\tilde{p}_d(m) = \exp\left(\frac{\ln(1-\bar{P}_d)}{m}\right)\frac{\ln(1-\bar{P}_d)}{m^2}, \qquad (9)$$
where ∇ denotes differentiation of a function with respect to its argument. Since $0 < 1 - \bar{P}_d < 1$, we have $\ln(1 - \bar{P}_d) < 0$ and hence $\nabla\tilde{p}_d(m) < 0$, which means that $\tilde{p}_d(m)$ is decreasing in 𝑚. Similarly, the second-order derivative of $\tilde{p}_d(m)$ is shown to be positive [19]. From these two derivatives, $\tilde{p}_d(m)$ is proved to be decreasing and convex.
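The derivative (9) can be checked numerically. Differentiating (9) once more gives $\nabla^2\tilde{p}_d(m) = -e^{k/m}\,(k/m^3)\,(k/m + 2)$ with $k = \ln(1-\bar{P}_d) < 0$, which is positive exactly when $m > -k/2$; for example, with $\bar{P}_d = 0.9$ this threshold is about 1.15, so convexity on the whole domain [1, +∞) additionally requires $\bar{P}_d \le 1 - e^{-2} \approx 0.865$. The sketch below compares (9) with a central finite difference, checks the sign of the second derivative, and checks the integer second differences that the discrete analysis later relies on. The function names and the value $\bar{P}_d = 0.9$ are illustrative assumptions.

```python
import math


def pd_relaxed(m: float, pd_target: float) -> float:
    """Eq. (8): relaxed per-SU detection probability."""
    return 1.0 - (1.0 - pd_target) ** (1.0 / m)


def dpd_relaxed(m: float, pd_target: float) -> float:
    """Eq. (9): first derivative of Eq. (8) with respect to m."""
    k = math.log(1.0 - pd_target)
    return math.exp(k / m) * k / m ** 2


def d2pd_relaxed(m: float, pd_target: float) -> float:
    """Second derivative of Eq. (8), obtained by differentiating Eq. (9)."""
    k = math.log(1.0 - pd_target)
    return -math.exp(k / m) * (k / m ** 3) * (k / m + 2.0)


pd_target, h = 0.9, 1e-5
for m in (1.0, 1.5, 2.0, 5.0):
    fd = (pd_relaxed(m + h, pd_target) - pd_relaxed(m - h, pd_target)) / (2 * h)
    print(f"m = {m}: Eq. (9) = {dpd_relaxed(m, pd_target):.6f}, "
          f"finite difference = {fd:.6f}, "
          f"second derivative = {d2pd_relaxed(m, pd_target):+.4f}")
```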
Lemma 2: Denote by $\tilde{p}_f(m)$ the relaxed individual probability of false alarm as a function of 𝑚:
$$\tilde{p}_f(m) = \mathcal{Q}\left(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(\tilde{p}_d(m)) + \sqrt{\tilde{\tau} f_s}\,\gamma\right), \qquad (10)$$
which is obtained from (2) and is decreasing and convex.
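Proof: From (10), we have
$$\nabla\tilde{p}_f(m) = -\frac{1}{\sqrt{2\pi}}\,\mathcal{J}_1\sqrt{2\gamma+1}\;\nabla\mathcal{Q}^{-1}(\tilde{p}_d(m)), \qquad (11)$$
where $\mathcal{J}_1 = \exp\big[-\big(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(\tilde{p}_d(m)) + \sqrt{\tilde{\tau} f_s}\,\gamma\big)^2/2\big]$ and $\nabla\mathcal{Q}^{-1}(\tilde{p}_d(m)) = -\sqrt{2\pi}\exp\big[(\mathcal{Q}^{-1}(\tilde{p}_d(m)))^2/2\big]\,\nabla\tilde{p}_d(m)$. The derivative of the inverse of 𝒬(𝑥) is given by $\nabla\mathcal{Q}^{-1}(x) = -\sqrt{2\pi}\exp\big[(\mathcal{Q}^{-1}(x))^2/2\big]$, which can be derived from [5]. Since $\nabla\tilde{p}_d(m) < 0$ from Lemma 1, we have $\nabla\mathcal{Q}^{-1}(\tilde{p}_d(m)) > 0$. Therefore $\nabla\tilde{p}_f(m) < 0$, i.e., $\tilde{p}_f(m)$ is decreasing in 𝑚. Taking the second derivative of $\tilde{p}_f(m)$ and after some manipulation [19], we can show that $\nabla^2\tilde{p}_f(m) > 0$. Therefore, $\tilde{p}_f(m)$ is decreasing and convex in 𝑚.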
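Equation (11) can likewise be cross-checked against a finite difference of (10). The sketch below assumes the illustrative values $\bar{P}_d = 0.9$, 𝛾 = 0.01 and 𝜏̃𝑓𝑠 = 2 × 10⁴ (passed as the single product `TAU_FS`); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    return NormalDist().inv_cdf(1.0 - p)


PD_TARGET, SNR, TAU_FS = 0.9, 0.01, 2e4   # illustrative parameters


def pd_relaxed(m):
    return 1.0 - (1.0 - PD_TARGET) ** (1.0 / m)                        # Eq. (8)


def dpd_relaxed(m):
    k = math.log(1.0 - PD_TARGET)
    return math.exp(k / m) * k / m ** 2                                 # Eq. (9)


def pf_relaxed(m):
    s = math.sqrt(2.0 * SNR + 1.0)
    return q_func(s * q_inv(pd_relaxed(m)) + math.sqrt(TAU_FS) * SNR)   # Eq. (10)


def dpf_relaxed(m):
    """Eq. (11), using the chain rule for the derivative of Q^{-1}."""
    s = math.sqrt(2.0 * SNR + 1.0)
    x = q_inv(pd_relaxed(m))
    j1 = math.exp(-(s * x + math.sqrt(TAU_FS) * SNR) ** 2 / 2.0)
    dqinv = -math.sqrt(2.0 * math.pi) * math.exp(x * x / 2.0) * dpd_relaxed(m)
    return -j1 * s * dqinv / math.sqrt(2.0 * math.pi)


h = 1e-5
for m in (1.5, 2.0, 4.0, 8.0):
    fd = (pf_relaxed(m + h) - pf_relaxed(m - h)) / (2 * h)
    print(f"m = {m}: Eq. (11) = {dpf_relaxed(m):.6f}, finite difference = {fd:.6f}")
```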
These two lemmas establish the properties of $\tilde{p}_d(m)$ and $\tilde{p}_f(m)$ with respect to the variable 𝑚. The number of cooperating SUs can only be an integer, which is difficult to analyze. For analytical convenience, we relax it to a continuous variable 𝑚 while preserving the definitions of the individual probability of detection and probability of false alarm, and we use the results as the basis for the following analysis. The actual individual probability of detection and probability of false alarm can be regarded as discrete points on the relaxed functions $\tilde{p}_d(m)$ and $\tilde{p}_f(m)$. Based on these two lemmas, the relaxed probability of false alarm after result fusion, $\tilde{P}_f(m)$, can be characterized as follows:
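Proposition 1: The probability of false alarm after result fusion, $\tilde{P}_f(m)$, is decreasing and convex on the domain of 𝑚 if the following condition holds:
$$\left[\ln\big(1-\tilde{p}_f(m)\big) - \frac{m\nabla\tilde{p}_f(m)}{1-\tilde{p}_f(m)}\right]^2 < \frac{2\nabla\tilde{p}_f(m) + m\nabla^2\tilde{p}_f(m)}{1-\tilde{p}_f(m)} + \left[\frac{\sqrt{m}\,\nabla\tilde{p}_f(m)}{1-\tilde{p}_f(m)}\right]^2, \quad \forall m. \qquad (12)$$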
Proof: Since we consider homogeneous SUs, the relaxed probability of false alarm after result fusion can be written as $\tilde{P}_f(m) = 1 - (1 - \tilde{p}_f(m))^m$. The first-order derivative of $\tilde{P}_f(m)$ can be shown to be negative using Lemma 1 and Lemma 2. By taking the second-order derivative of $\tilde{P}_f(m)$ and after some algebraic manipulation [19], it can be shown that $\nabla^2\tilde{P}_f(m) > 0$, i.e., $\tilde{P}_f(m)$ is decreasing and convex on the domain of 𝑚, if condition (12) holds.
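Condition (12) can be evaluated numerically and compared directly with the curvature of $\tilde{P}_f(m) = 1 - (1 - \tilde{p}_f(m))^m$. In the sketch below, the derivatives of $\tilde{p}_f(m)$ are approximated by central finite differences rather than the closed forms, and the parameter values ($\bar{P}_d = 0.9$, 𝛾 = 0.01, 𝜏̃𝑓𝑠 = 2 × 10⁴) are illustrative assumptions, not taken from the paper; the names are hypothetical.

```python
import math
from statistics import NormalDist


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    return NormalDist().inv_cdf(1.0 - p)


PD_TARGET, SNR, TAU_FS = 0.9, 0.01, 2e4   # illustrative parameters
H = 1e-4                                  # finite-difference step


def pf_relaxed(m):
    """Eq. (10) with the relaxed per-SU detection probability of Eq. (8)."""
    pd = 1.0 - (1.0 - PD_TARGET) ** (1.0 / m)
    return q_func(math.sqrt(2.0 * SNR + 1.0) * q_inv(pd) + math.sqrt(TAU_FS) * SNR)


def fused_pf(m):
    """Relaxed fused false-alarm probability 1 - (1 - p_f(m))^m."""
    return 1.0 - (1.0 - pf_relaxed(m)) ** m


def d1(f, m):
    return (f(m + H) - f(m - H)) / (2 * H)


def d2(f, m):
    return (f(m + H) - 2 * f(m) + f(m - H)) / H ** 2


def condition_12(m):
    """True if [ln u - m p'/u]^2 < (2p' + m p'')/u + [sqrt(m) p'/u]^2, u = 1 - p."""
    p, dp, ddp = pf_relaxed(m), d1(pf_relaxed, m), d2(pf_relaxed, m)
    u = 1.0 - p
    lhs = (math.log(u) - m * dp / u) ** 2
    rhs = (2 * dp + m * ddp) / u + (math.sqrt(m) * dp / u) ** 2
    return lhs < rhs


for m in (1.5, 2.0, 3.0, 5.0, 8.0):
    print(f"m = {m}: (12) holds: {condition_12(m)}, "
          f"second derivative of fused P_f: {d2(fused_pf, m):+.5f}")
```

Because $\nabla^2\tilde{P}_f(m)$ equals a negative multiple of the difference between the two sides of (12), the two printed columns should agree wherever the curvature is not numerically zero.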
It is natural to expect that $\tilde{P}_f(m)$ is decreasing and convex. In fact, [2], [14] show that the more SUs cooperate, the smaller $\tilde{P}_f(m)$ becomes; otherwise there would be no incentive to perform cooperative sensing. Moreover, condition (12) should hold as 𝑚 → +∞; otherwise $\tilde{P}_f(m)$ would eventually fall below zero, which is impossible. In the extensive simulations we have performed, condition (12) holds in most cases, i.e., $\tilde{P}_f(m)$ is usually decreasing and convex. Even if there is a concave section, it can easily be incorporated into the following analytical framework. Without loss of generality, we assume $\tilde{P}_f(m)$ to be decreasing and convex throughout this paper. With the above results, we are ready to analyze the combinatorial problem (P2). To describe our analysis clearly, we introduce the following definitions.
Definition 1: All combinations, i.e., the ways of assigning SUs to sense the channels, are divided into groups. Define $G_i$, 𝑖 = 1, ..., 𝑀, as the group consisting of $|G_i|$ combinations in which exactly 𝑖 channels are sensed (i.e., each of the 𝑖 channels is sensed by at least one SU). At most 𝑀 channels will be sensed since 𝑁 ≥ 𝑀. The reason for this division is that in some combinations, some channels may not have any SU assigned for sensing. Here we consider a general case in which the channels have different belief values. We order the channels by their belief values in descending order and specify that, in $G_i$, the first 𝑖 channels in the ordering are selected for sensing. Define $\mathbf{r} \triangleq \{r_1, \ldots, r_i\}$, where $r_1$ denotes the real channel number of the first channel in the ordering (i.e., the one with the largest belief value), $r_2$ denotes the real channel number of the second one, etc. Denote by $C_{i,l} = \{a^j_{i,l}\}$ the 𝑙-th combination in group $G_i$, where 𝑙 = 1, ..., $|G_i|$, and let $a^j_{i,l}$ denote the number of SUs assigned to sense channel $r_j$ in combination $C_{i,l}$ of group $G_i$, where 𝑗 = 1, ..., 𝑖. Further assume that $a^j_{i,l} \ge a^{j'}_{i,l}$ for $j < j'$. The reason is that, since the channels are ordered by their belief values, the case $a^j_{i,l} \ge a^{j'}_{i,l}$ produces a value of (7) greater than or equal to that produced by the case $a^j_{i,l} < a^{j'}_{i,l}$, for $j < j'$. The same logic applies when the belief values are equal. Note that this assumption already excludes some non-optimal combinations. Clearly, $a^j_{i,l} \ge 1$ and $\sum_j a^j_{i,l} = M$, $\forall G_i, C_{i,l}$.
Definition 2: Different combinations may produce different values of (7). Let the operator ≻ denote that a combination $C_{i,l}$ is larger than $C_{i',l'}$, ∀𝑖, 𝑖′, 𝑙, 𝑙′; in other words, $C_{i,l} \succ C_{i',l'}$ means that $C_{i,l}$ produces a larger value of (7) than $C_{i',l'}$. Similarly, we define ≽ and ≼ for the relationships “larger than or equal to” and “smaller than or equal to”, respectively.
We first investigate which combination is the largest one in each group. Consider group $G_i$. If $a^j_{i,l}$ could take continuous values, the combination $C_{i,l}$ with $a^j_{i,l} = M/i$, ∀𝑗, would clearly be the largest one, because
$$\tilde{P}_f\Big(\frac{M}{i}\Big) \le \frac{1}{i}\tilde{P}_f(a^1_{i,l}) + \frac{1}{i}\tilde{P}_f(a^2_{i,l}) + \cdots + \frac{1}{i}\tilde{P}_f(a^i_{i,l}), \quad \forall C_{i,l}, \qquad (13)$$
which follows from the convexity of $\tilde{P}_f(m)$ (Jensen’s inequality, since the $a^j_{i,l}$ average to $M/i$). However, in (P2) only integer values are allowed for $a^j_{i,l}$. In this case, we resort to the theory of Discrete Convex Analysis.
The theory of discrete convex analysis was introduced by Murota [6]; it incorporates discrete settings and the concepts of combinatorial optimization into the framework of convex analysis. A comprehensive survey can be found in [7]. Simply speaking, in our problem the actual probability of false alarm $P_f(m)$, defined for integers $m \ge 1$, is a discrete convex function, since $\tilde{P}_f(m)$ is convex and $P_f(m) = \tilde{P}_f(m)$ for every integer $m \ge 1$. In other words, $P_f(m)$ takes the values of $\tilde{P}_f(m)$ at the integer points of its domain.
Murota also introduced the concepts of L-convex and M-convex functions. We briefly introduce these concepts for functions of a single integer variable. Consider a function 𝑓(𝑥), where 𝑥 is a scalar. If 𝑓(𝑥) is an L-convex function, then $f(a) + f(b) \ge f(\lceil \frac{a+b}{2} \rceil) + f(\lfloor \frac{a+b}{2} \rfloor)$, ∀𝑎, 𝑏 ∈ ℤ. This property is referred to as discrete midpoint convexity. If 𝑓(𝑥) is an M-convex function, then $f(a) + f(b) \ge f(a+c) + f(b-c)$, ∀𝑎 ≤ 𝑏, 𝑐 ≥ 0, 𝑎 + 𝑐 ≤ 𝑏 − 𝑐, 𝑎, 𝑏, 𝑐 ∈ ℤ. This property is referred to as equidistance convexity. Since we only have a scalar variable, we can easily establish the following lemma:
Lemma 3: $P_f(m)$ is both an L-convex and an M-convex function.
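For a function of one integer variable, both properties in Lemma 3 can be verified by brute force over a finite range. The sketch below does so for the illustrative decreasing, convex sequence $P_f(a) = 1.2 \cdot 0.8^a$ (an assumption chosen only for demonstration, not derived from (2) and (3)) and, as a negative control, for a concave function.

```python
from typing import Callable


def midpoint_convex(f: Callable[[int], float], lo: int, hi: int,
                    tol: float = 1e-12) -> bool:
    """Discrete midpoint convexity: f(a)+f(b) >= f(ceil((a+b)/2)) + f(floor((a+b)/2))."""
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            s = a + b
            if f(a) + f(b) < f(-(-s // 2)) + f(s // 2) - tol:
                return False
    return True


def equidistance_convex(f: Callable[[int], float], lo: int, hi: int,
                        tol: float = 1e-12) -> bool:
    """Equidistance convexity: f(a)+f(b) >= f(a+c)+f(b-c) for a <= b, c >= 0, a+c <= b-c."""
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):
            for c in range(0, (b - a) // 2 + 1):
                if f(a) + f(b) < f(a + c) + f(b - c) - tol:
                    return False
    return True


def pf(a: int) -> float:
    return 1.2 * 0.8 ** a        # illustrative decreasing, convex P_f(a)


print(midpoint_convex(pf, 1, 12), equidistance_convex(pf, 1, 12))
print(midpoint_convex(lambda x: -x * x, 1, 12))
```

The small tolerance absorbs floating-point rounding; `-(-s // 2)` is the integer ceiling of 𝑠/2.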
Now we are ready to show which combination is the largest one in each group. (13) suggests that the largest combination is the one that assigns the SUs equally to the channels, which leads to the largest value of the objective function. When 𝑀/𝑖 is an integer, the same conclusion holds for $P_f(m)$. By also considering the case where 𝑀/𝑖 is not an integer, we obtain the following proposition:
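Proposition 2: The largest combination $C_{i,\max}$ in group $G_i$ is given by
$$a^j_{i,\max} = \Big\lceil \frac{M}{i} \Big\rceil \ \text{or}\ a^j_{i,\max} = \Big\lfloor \frac{M}{i} \Big\rfloor, \quad \sum_j a^j_{i,\max} = M, \quad \forall i, j. \qquad (14)$$
The largest combination has the following property:
$$\sum_j P_f(a^j_{i,\max}) < \sum_j P_f(a^j_{i,l}), \qquad (15)$$
where $\{a^j_{i,l}\}$ is any combination in $G_i$ other than the largest combination $C_{i,\max}$.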
Proof: It is easy to see that (14) has a unique solution under the constraint $\sum_j a^j_{i,\max} = M$. Without loss of generality, consider a combination $C_{i,l}$ in group $G_i$. By recursively using the properties of discrete midpoint convexity and equidistance convexity from Lemma 3, we have the following procedure: 1) For $a^1_{i,l}$ and $a^i_{i,l}$, since
$$P_f(a^1_{i,l}) + P_f(a^i_{i,l}) \ge P_f\Big(\Big\lceil \frac{a^1_{i,l} + a^i_{i,l}}{2} \Big\rceil\Big) + P_f\Big(\Big\lfloor \frac{a^1_{i,l} + a^i_{i,l}}{2} \Big\rfloor\Big),$$
we replace $a^1_{i,l}$ and $a^i_{i,l}$ with $\lceil (a^1_{i,l} + a^i_{i,l})/2 \rceil$ and $\lfloor (a^1_{i,l} + a^i_{i,l})/2 \rfloor$ and obtain a new combination $C_{i,l_1}$. Clearly, $C_{i,l_1} \succeq C_{i,l}$. 2) Re-arrange the order within $C_{i,l_1}$ to meet the ordering requirement for $a^j_{i,l}$ in Definition 1, and then repeat the operation in 1) to obtain a larger combination $C_{i,l_2}$. 3) The logic behind the previous two steps is that we always take the largest and the smallest $a^j_{i,l}$ in a combination and replace them with their rounded average, which by Lemma 3 yields a combination that is at least as large. After repeating 1) and 2) a finite number of times, all $a^j_{i,l}$ become equal to the values given in (14). The largest combination in $G_i$ is then obtained, and (15) follows.
Proposition 2 reveals a result similar to (13): within each group, i.e., when exactly 𝑖 channels are to be sensed, the SUs should be distributed among the channels as equally as possible in order to minimize the sum of the probabilities of false alarm. This interesting result is, to some extent, similar to the water-filling property [18], where the same amount of power is allocated to channels with identical noise levels. Here, for homogeneous channels, the SUs are assigned equally for sensing.
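Proposition 2 gives the largest combination of each group in closed form. The sketch below builds it from (14), listing the ⌈𝑀/𝑖⌉ entries first as Definition 1 requires, and compares it with an exhaustive search over all combinations of the group. The convex sequence $P_f(a) = 1.2 \cdot 0.8^a$ and the function names are illustrative assumptions.

```python
from itertools import product


def equal_split(M: int, i: int) -> tuple[int, ...]:
    """Eq. (14): M SUs over i channels as equally as possible, in non-increasing order."""
    q, r = divmod(M, i)
    return (q + 1,) * r + (q,) * (i - r)


def group_members(M: int, i: int):
    """All non-increasing tuples of i positive integers summing to M (group G_i)."""
    for combo in product(range(1, M - i + 2), repeat=i):
        if sum(combo) == M and all(x >= y for x, y in zip(combo, combo[1:])):
            yield combo


def pf(a: int) -> float:
    return 1.2 * 0.8 ** a        # illustrative decreasing, convex P_f(a)


for M, i in ((7, 3), (9, 2), (10, 4)):
    best = min(group_members(M, i), key=lambda c: sum(pf(a) for a in c))
    print(M, i, equal_split(M, i), best)
```

Minimizing the sum of 𝑃𝑓 within a group is the same as maximizing (7) within that group, since every combination in 𝐺𝑖 senses exactly 𝑖 channels.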
Having found the largest combination in each group, the last step towards the optimal solution of (P2) is to find the largest among these 𝑀 combinations $C_{i,\max}$, 𝑖 = 1, ..., 𝑀. The optimal solution and its related properties are given in the following theorem.
Theorem 1: (i) The optimal solution of (P2) is $C_{1,\max}$, i.e., all 𝑀 SUs sense one channel cooperatively, if the condition
$$(\text{C0}):\ P_f\Big(\Big\lceil \frac{M}{2} \Big\rceil\Big) + P_f\Big(\Big\lfloor \frac{M}{2} \Big\rfloor\Big) - P_f(M) - 1 \ge 0 \qquad (16)$$
holds, which is a necessary and sufficient condition.
(ii) If condition (C0) holds for 𝑀′, then condition (C0) also holds for all 𝑀 < 𝑀′.
(iii) If condition (C0) does not hold for 𝑀′, then condition (C0) does not hold for any 𝑀 > 𝑀′. In other words, the BS will never assign all SUs to cooperatively sense one channel if the network has 𝑀′ or more SUs.
Proof: We first assume that (i) and (ii) hold for 𝑀 − 1. We begin with the proof of sufficiency. Consider two consecutive groups $G_i$ and $G_{i'}$, where 𝑖′ = 𝑖 + 1 and 𝑖 ≥ 2. Consider $G_i$. From Lemma 3 and Proposition 2, some manipulations can be performed on $C_{i,\max}$ to obtain
$$C_{i,\max} \succeq C_{i,l}, \qquad C_{i,l} = \{a^1_{i',\max}, \ldots, a^{i'-2}_{i',\max},\ a^{i'-1}_{i',\max} + a^{i'}_{i',\max}\},$$
i.e., $C_{i,l}$ is obtained from $C_{i',\max}$ by merging its last two entries; we use this relationship in the following. Although $C_{i,l}$ may not satisfy the ordering in Definition 1, it is still a valid combination in $G_i$ and does not cause any problem. Since (14) implies that $a^j_{i,\max} - a^{j+1}_{i,\max} \le 1$ in every group, the two merged entries equal $\lceil M''/2 \rceil$ and $\lfloor M''/2 \rfloor$ with $M'' = a^{i'-1}_{i',\max} + a^{i'}_{i',\max} < M$, so condition (C0) for $M''$ holds by (ii), i.e.,
$$P_f(a^{i'-1}_{i',\max}) + P_f(a^{i'}_{i',\max}) - P_f(a^{i'-1}_{i',\max} + a^{i'}_{i',\max}) - 1 \ge 0,$$
which implies $C_{i,l} \succeq C_{i',\max}$. Since $C_{i,\max} \succeq C_{i,l}$, this result builds a bridge between two consecutive groups and reveals that, among all $C_{i,\max}$ with 𝑖 ≥ 2, the largest is $C_{2,\max}$. Then, for 𝑀, if the inequality $P_f(\lceil M/2 \rceil) + P_f(\lfloor M/2 \rfloor) - P_f(M) - 1 \ge 0$ holds, we have $C_{1,\max} \succeq C_{2,\max}$. This inequality is exactly condition (C0) in (i).
We now prove necessity. For $C_{1,\max}$ to be the optimal solution, it must in particular satisfy $C_{1,\max} \succeq C_{2,\max}$, which is exactly condition (C0).
Now we prove (iii). Assume for 𝑀 condition (C0) does
𝑀
not hold, i.e. 𝑃𝑓 (⌈ 𝑀
2 ⌉) + 𝑃𝑓 (⌊ 2 ⌋) − 𝑃𝑓 (𝑀 ) − 1 < 0. Then
consider the left hand side in the case of 𝑀 +1. Since ⌈ 𝑀
2 ⌉=
𝑀 +1
⌊ 𝑀2+1 ⌋ and ⌊ 𝑀
2 ⌋ + 1 = ⌈ 2 ⌉ can be easily verified, after
some manipulation we arrive at
𝑀
𝑀
𝑃𝑓 (⌈ ⌉) + 𝑃𝑓 (⌊ ⌋) − 𝑃𝑓 (𝑀 ) − 1
2
2
[
]
𝑀 +1
𝑀 +1
−
𝑃𝑓 (⌈
⌉) + 𝑃𝑓 (⌊
⌋) − 𝑃𝑓 (𝑀 + 1) − 1
2
2
𝑀
𝑀 +1
= 𝑃𝑓 (⌊ ⌋) − 𝑃𝑓 (⌈
⌉) − [𝑃𝑓 (𝑀 ) − 𝑃𝑓 (𝑀 + 1)]
2
2
which is larger than 0 from the decreasing and convex property
of 𝑃˜𝑓 (𝑚). This result implies that 𝑃𝑓 (⌈ 𝑀2+1 ⌉)+𝑃𝑓 (⌊ 𝑀2+1 ⌋)−
𝑃𝑓 (𝑀 + 1) − 1 < 0, which means 𝐶1,max ≼ 𝐶2,max for
𝑀 + 1. In this case, assigning all SUs to sense one channel
cooperatively is always not the best action, which finishes the
proof of (iii). In fact, (ii) follows the same argument as (iii)
and hence be proved as well. To this end we complete our
self-contained proof for Theorem 1.
Theorem 1 introduces the condition that the BS assigns
all SUs to cooperatively sense one channel in order to gain
the largest objective value, given the number of SUs 𝑀 . The
structural results show that sensing less channel to gain higher
sensing accuracy is better in the sense of the objective function
in (P2) and condition (C0). In addition, (ii) and (iii) reveals
that for given system parameters, there exists a threshold value
of 𝑀 , which determines whether the combination 𝐶1,max
is the optimal solution. Now turn back to (P1). After some
manipulation, we have
′
𝑖 −1
𝑖
= {𝑎1𝑖′ ,max , 𝑎2𝑖′ ,max , ..., 𝑎𝑖𝑖′ −2
,max , (𝑎𝑖′ ,max + 𝑎𝑖′ ,max )}.
′
′
𝑎𝑖′ −1 +𝑎𝑖′
′
𝑖 ,max
𝑖 ,max
𝑎𝑖𝑖′ −1
⌋ and 𝑎𝑖𝑖′ ,max = ⌈ 𝑖 ,max 2 𝑖 ,max ⌉.
,max = ⌊
2
In this case, from the assumption that (ii) holds we have
′
𝑖 −2
𝑖
Note that (𝑎𝑖𝑖′ −1
,max +𝑎𝑖′ ,max ) is larger than 𝑎𝑖′ ,max , however we
do not change to the correct ordering for easier showing the
2728
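$$\tilde{I}_t(\mathbf{b}(t-1)) = \sum_{\{n:\,a_n(t)>0\}} \Big\{ \big(1 - P_f(a_n(t))\big)(L - \eta - \tilde{\tau})(B - e_t)\,\mathcal{J}_2 \Big\} + \sum_{\{n:\,a_n(t)>0\}} \Big\{ (1 - \bar{P}_d)(L - \eta - \tilde{\tau})(-e_t - e_w)\,\mathcal{J}_2 \Big\} - \tilde{\tau} c,$$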
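where $\mathcal{J}_2 = p_{10} + b_{n0}(t-1)(p_{00} - p_{10})$. It follows that $\tilde{I}_t(\mathbf{b}(t-1))|_{\mathbf{a}(t)=C_{1,\max}} \ge \tilde{I}_t(\mathbf{b}(t-1))|_{\mathbf{a}(t)=C_{i,l}}$ for all $i, l$: $C_{1,\max}$ is optimal for (P2), and the second term, which is negative for every sensed channel, is incurred for only one channel under $C_{1,\max}$. As a result, the optimal solution obtained for (P2) is also optimal for (P1).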
During the derivation of condition (C0), we analytically show how to obtain the second largest combination among all possible combinations, and then compare it with our desired combination (i.e., $C_{1,\max}$) to obtain the final condition. It can be seen from the derivation that when the channels have different belief values and are ordered accordingly in descending order, (C0) is easier to satisfy. Hence, channels with identical belief values constitute the most stringent case. By utilizing the intrinsic properties of our problem and Lemma 3, the optimal solution of a combinatorial optimization problem is obtained analytically. Another significance of the result is that, given the number of SUs $M$, one can immediately determine whether all effort should be devoted to sensing one channel (i.e., whether $C_{1,\max}$ is the optimal solution) by simply testing (C0).
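Testing (C0) is a scalar check, and parts (ii) and (iii) of Theorem 1 say its left-hand side decreases in $M$. The sketch below evaluates that left-hand side and cross-checks it against exhaustive search over all assignments of $M$ SUs to at most $N$ channels. It assumes, as the form of (C0) suggests, that with equal beliefs the (P2) objective of an assignment is $\sum_n (1 - P_f(a_n))$, and it uses the hypothetical geometric model $P_f(m) = 0.9 \cdot 0.5^{m-1}$; neither is taken from the paper's own definitions.

```python
def pf(m, p1=0.9, q=0.5):
    """Hypothetical P_f(m) = p1 * q**(m - 1): decreasing and strictly convex for m >= 1."""
    return p1 * q ** (m - 1)


def c0_margin(M):
    """Left-hand side of (C0) for M >= 2: P_f(ceil(M/2)) + P_f(floor(M/2)) - P_f(M) - 1."""
    return pf((M + 1) // 2) + pf(M // 2) - pf(M) - 1.0


def objective(combo):
    """Assumed equal-belief (P2) objective: sum of (1 - P_f(a_n)) over sensed channels."""
    return sum(1.0 - pf(a) for a in combo)


def partitions(total, max_part, max_parts):
    """Yield non-increasing tuples of positive integers summing to `total`."""
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first, max_parts - 1):
            yield (first,) + rest


def brute_force_best(M, N):
    """Best assignment of M SUs to at most N channels under the assumed objective."""
    return max(partitions(M, M, N), key=objective)


def largest_m_satisfying_c0(M_max):
    """Threshold from Theorem 1 (ii)-(iii): largest M in [2, M_max] satisfying (C0), else None."""
    feasible = [M for M in range(2, M_max + 1) if c0_margin(M) >= 0]
    return max(feasible) if feasible else None


for M in range(2, 7):
    print(M, round(c0_margin(M), 4), brute_force_best(M, N=10))
print("threshold:", largest_m_satisfying_c0(12))  # threshold: 3
```

Under these assumptions, exhaustive search picks the single-channel assignment $(M,)$ exactly when the margin is nonnegative, and for larger $M$ it spreads the SUs over more channels, in line with the observation that more channels should be sensed as $M$ grows.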
Although only the case where (C0) is satisfied is considered here, our framework can be extended to the more general case, i.e., when $M$ is large and (C0) is violated. An interesting observation from our framework is that when $M$ becomes large, more channels should be sensed. The reason is that, in this situation, exploiting more spectrum opportunities is more beneficial, which is precisely the tradeoff underlying our problem. Finding the optimal solution for the general case is left for future work.
Specifically, we point out that although we assume $\tilde{P}_f(m)$ is convex, our contribution is not compromised even if $\tilde{P}_f(m)$ is not convex over the whole domain of $m$. As mentioned before, a concave section can only occur at the beginning of the domain, since the value of $\tilde{P}_f(m)$ is bounded below by zero. By working out the range where $\tilde{P}_f(m)$ is concave, we can follow the same procedure as in the derivation of the largest combination and the proof of Theorem 1, and obtain the new combination ordering and the optimal solution corresponding to this property of $\tilde{P}_f(m)$. In the following sections, we study the case where $M$ satisfies condition (C0), and assume that (C0) holds for all $\tilde{\tau}$ within the range $[0, L - \eta]$.
V. STRUCTURE OF THE OPTIMAL POLICY
A. Optimal Cooperative Sensing Scheduling
It is natural to ask whether the myopic CSS is also optimal under the same condition. Our extensive simulation results confirm that, when condition (C0) is satisfied, the optimal CSS is to assign all SUs to sense one channel. However, this turns out to be difficult to prove analytically, since the imperfect spectrum sensor complicates the belief vector update. Moreover, compared with [8], our model has one more observation, which means more information must be handled; the problem studied in this paper therefore appears more challenging.
Although extending the myopic CSS to the optimal case is difficult in general, we now prove a simple but nontrivial case with two SUs and two channels and a fixed sensing duration. For simplicity of expression, the reward parameters $e_t$, $e_w$ and $c$ are set to zero; in fact, parameters with general values can be easily incorporated [11]. Similarly to [8], we assume $P_f(m) \le \dfrac{p_{01}p_{10}}{p_{00}p_{11}}$, $\forall m$.
Theorem 2: Consider the network with 2 SUs and 2 channels. The optimal CSS at any slot 𝑡 is to assign all SUs to
cooperatively sense the channel given by arg max𝑛 𝑏𝑛0 (𝑡 − 1),
if condition (C0) holds.
Proof: First assume that all SUs are always assigned to sense one channel. From the result in [8], the BS then chooses the channel with the largest belief value, i.e., $\arg\max_n b_{n0}(t-1)$; without loss of generality, let this be channel 1. We now compare the expected values obtained in the following two cases: (i) both SUs sense channel 1; (ii) each SU senses one channel.
At the last time slot $t = T$, the optimal action is the myopic action. Suppose that in slots $t + 1, \dots, T$ the BS assigns all SUs to cooperatively sense the channel with the largest belief value; we need to show that this also holds for slot $t$. As in [8], denote by $\hat{V}_t(\mathbf{b}(t-1); g)$ the expected total reward obtained by taking sensing action $g$ in slot $t$ and following the myopic policy in future slots. Similarly, denote by $I_t(\mathbf{b}(t-1); g)$ the expected immediate reward obtained by sensing action $g$ in slot $t$. Further, define $g = 1$ as the action in which both SUs sense channel 1, and $g = 2$ as the action in which each SU senses one channel. To show the optimality of the myopic policy, we need to show that $\hat{V}_t(\mathbf{b}(t-1); g=1) - \hat{V}_t(\mathbf{b}(t-1); g=2) \ge 0$.
For both actions, the observation and system state at $t$ determine the channel selected in $t + 1$. Therefore, arguments similar to those in the proof of Theorem 2 in [8] can be applied. It follows that

$$\begin{aligned}
\hat{V}_t(\mathbf{b}(t-1); g=1) - \hat{V}_t(\mathbf{b}(t-1); g=2) ={}& \big[I_t(\mathbf{b}(t-1); g=1) - I_t(\mathbf{b}(t-1); g=2)\big] + \Lambda \\
&+ \Big[b_{10}(t-1)\big(1 - b_{20}(t-1)\big)P_f(2) - \big(1 - b_{10}(t-1)\big)\,b_{20}(t-1)\,P_d P_f(1)\Big]\,\Delta\,(p_{00}p_{11} - p_{10}p_{01}),
\end{aligned} \tag{17}$$

where $\Lambda$ is a positive term and $\Delta = \hat{V}_t(1 \mid [0,1]) - \hat{V}_t(1 \mid [1,0])$. As in [8], $\hat{V}_t(g \mid [s_1, s_2])$ denotes the expected total reward starting from $t$ under action $g$, given the system state $\mathbf{s}(t-1) = [s_1, s_2]$ in slot $t - 1$. The first term on the right-hand side of (17) is nonnegative because condition (C0) holds. Denote the last term by $\mathcal{J}_3$. After some manipulation [19], we have $\mathcal{J}_3 \ge p_{10}p_{01}P_f(2)P_dP_f(1) > 0$, which leads to $\hat{V}_t(\mathbf{b}(t-1); g=1) - \hat{V}_t(\mathbf{b}(t-1); g=2) > 0$.
Theorem 2 shows that in this simplified model with two SUs and two channels, the myopic CSS is optimal, which greatly simplifies finding the optimal policy. This conclusion provides nontrivial insights into the problem at hand. However, showing the optimality of the myopic sensing scheduling for general $M$ and $N$ is highly challenging and is left for future work. In the following sections, we focus on the case where the optimal CSS assigns all SUs to sense one channel.
B. Structure of the Optimal Sensing Time
The myopic policy consists of two parts: the CSS and the sensing time $\tilde{\tau}^*(t)$, where the latter is given by $\tilde{\tau}^*(t) = \arg\max_{\tau(t)} I_t(\mathbf{b}(t-1))$, the solution of a static optimization problem. Using a method similar to that in [15], it can be shown that there is only one maximizing $\tau(t)$ within the range $[0, L - \eta]$. The myopic solution can therefore be obtained by many popular, effective search algorithms with low complexity. Although the optimal sensing time is very difficult to obtain, some insights into its structure are given in the following proposition.
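Because $I_t$ has a single maximizer on $[0, L - \eta]$, a bracketing search suffices to compute $\tilde{\tau}^*(t)$. The sketch below uses golden-section search, one possible low-complexity method (the paper does not name a specific one), on a single-channel version of the $\tilde{I}_t$ expression derived for (P1). Everything model-specific here is a hypothetical assumption: the false-alarm probability follows the energy-detection form of [3], $P_f = Q\big(\sqrt{2\gamma+1}\,Q^{-1}(\bar{P}_d) + \sqrt{\tau f_s}\,\gamma\big)$, with $m$ cooperating SUs modelled as pooling $m\tau f_s$ samples; times are in ms; $\mathcal{J}_2 = 0.75$; and the remaining values are borrowed from the simulation settings of Section VII.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian (mean 0, sigma 1)


def q_func(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 1.0 - STD_NORMAL.cdf(x)


def pf_energy(tau_ms, m=8, snr_db=-14.0, pd_bar=0.9, fs_hz=4e6):
    """Assumed false-alarm probability at target detection probability pd_bar.
    Cooperation is modelled (hypothetically) as pooling m * tau * fs samples."""
    gamma = 10.0 ** (snr_db / 10.0)
    n_samples = m * (tau_ms * 1e-3) * fs_hz
    q_inv_pd = STD_NORMAL.inv_cdf(1.0 - pd_bar)  # Q^{-1}(pd_bar)
    return q_func(math.sqrt(2.0 * gamma + 1.0) * q_inv_pd + math.sqrt(n_samples) * gamma)


def immediate_reward(tau, j2=0.75, L=100.0, eta=10.0, B=10.0, e_t=1.0, e_w=0.0, c=10.0, pd_bar=0.9):
    """Single-channel version of the I~_t expression: one channel sensed by all SUs."""
    t_tx = L - eta - tau  # transmission time left in the frame (ms)
    success = (1.0 - pf_energy(tau, pd_bar=pd_bar)) * t_tx * (B - e_t) * j2
    collision = (1.0 - pd_bar) * t_tx * (-e_t - e_w) * j2
    return success + collision - tau * c


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 >= f2:  # maximizer lies in [a, x2]; old x1 becomes the new x2
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:  # maximizer lies in [x1, b]; old x2 becomes the new x1
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0


tau_star = golden_section_max(immediate_reward, 0.0, 90.0)  # search over [0, L - eta]
print(f"myopic sensing time ~ {tau_star:.3f} ms, P_f = {pf_energy(tau_star):.4f}")
```

Golden-section search reuses one interior evaluation per iteration, so it needs roughly 40 reward evaluations here; under the assumed model the reward is concave in $\tau$, which is what makes the single-bracket search valid.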
Proposition 3: (i) The optimal sensing time 𝜏 ∗ (𝑡) that maximizes the total expected reward is the same as the myopic
sensing time 𝜏˜∗ (𝑡) that maximizes the immediate expected
reward, if 𝑝00 𝑝11 = 𝑝10 𝑝01 . (ii) The optimal sensing time
𝜏 ∗ (𝑡) is no smaller than 𝜏˜∗ (𝑡), ∀𝑡.
Proof: (i) The condition $p_{00}p_{11} = p_{10}p_{01}$ implies $p_{00} = p_{10}$ and $p_{01} = p_{11}$. Hence all belief values after the state transition are set to the same value $p_{10}$, so the expected future reward in any $t$ is not influenced by the probability of false alarm, which depends on the sensing duration. Since $\tilde{\tau}^*(t)$ maximizes the immediate reward, the optimal sensing time $\tau^*(t)$ coincides with $\tilde{\tau}^*(t)$.
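The reduction behind (i) can be written out explicitly. Assuming, consistently with the definition of $\mathcal{J}_2$, that $p_{ij}$ is the probability of moving from state $i$ to state $j$, so that $p_{00} + p_{01} = 1$ and $p_{10} + p_{11} = 1$:

$$\begin{aligned}
p_{00}p_{11} - p_{10}p_{01} &= p_{00}(1 - p_{10}) - p_{10}(1 - p_{00}) = p_{00} - p_{10},\\
p_{00}p_{11} = p_{10}p_{01} &\iff p_{00} = p_{10} \iff p_{01} = p_{11}.
\end{aligned}$$

Whenever this holds, the one-step predicted belief $p_{10} + b_{n0}(t-1)(p_{00} - p_{10})$ equals $p_{10}$ for every $b_{n0}(t-1) \in [0,1]$, so the sensing outcome, and with it the sensing duration, has no effect on future beliefs.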
Sketch of the Proof of (ii): Consider any $\tau(t) \le \tilde{\tau}^*(t)$. It can be proved that the expected future reward obtained under $\tilde{\tau}^*(t)$ is larger than that under $\tau(t)$. Since $\tilde{\tau}^*(t)$ also maximizes the immediate reward, it yields a larger total expected remaining reward than any such $\tau(t)$. Therefore, only a sensing time $\tau(t) > \tilde{\tau}^*(t)$ could result in a larger total expected remaining reward than $\tilde{\tau}^*(t)$. See [19] for details.
Here we arrive at a conclusion similar to that in [10]. Although the myopic sensing time does not always coincide with the optimal sensing time, the analysis reveals useful insights into the structure of the optimal sensing time. Moreover, the myopic sensing time offers a good tradeoff between computational complexity and optimality, with acceptable performance [4], [10].
VI. ENERGY-EFFICIENT REWARD PARAMETER DESIGN
Motivated by the findings in [11], we introduce the punishment $e_w$ for collisions. We are interested in whether this punishment can help to improve the energy efficiency of CRNs compared with the general design without the punishment. Although the myopic policy does not always preserve optimality, its performance still suffices to reflect the impact of the punishment parameter on the energy efficiency. Since our problem shares some properties with [8], we adopt the method of analyzing the steady-state reward in [8]; the only difference is that our reward function depends on the sensing time $\tau(t)$. Detailed derivations for this section can be found in [19].
A. Analytical Expression of the Throughput
The concept of the Transmission Period in [8] is applied here: under the condition $p_{00} \ge p_{10}$, a channel switch is equivalent to a slot without positive reward. In our model the situation is more complicated, and such a slot may correspond to either (i) no ACK being received, so that a negative reward is obtained, or (ii) a busy sensing outcome, so that no transmission is performed and only the energy for sensing is consumed. Let $\mathcal{L}_k$ denote the length of the $k$-th transmission period.
1) Average Successful Transmission Amount: We define the average successful transmission amount $\mathcal{H}_B$ as the ratio between the overall length of successful transmission duration and the overall length of transmission duration (including successful, failed and silent transmissions). Let $\bar{\mathcal{L}} = \lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K}\mathcal{L}_k$ denote the average length of a transmission period. The average successful transmission amount is given by

$$\mathcal{H}_B = 1 - \frac{L - \eta - \hat{\tau}(1)}{\mathcal{D}},$$

where

$$\mathcal{D} = \big(L - \eta - \hat{\tau}(1)\big)\,\bar{\mathcal{L}} + \hat{\tau}(1) - \lim_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\hat{\tau}(b_{1,k})$$

denotes the average length of the transmission duration of a transmission period, $b_{i,k}$ denotes the belief value for the $i$-th slot in the $k$-th transmission period, and $\hat{\tau}(b_{i,k})$ denotes the optimal sensing time given the corresponding belief value $b_{i,k}$. Moreover, $\mathcal{H}_B$ is upper bounded by

$$\mathcal{H}_B \le 1 - \frac{L - \eta - \hat{\tau}(1)}{\mathcal{D}}, \qquad \mathcal{D} = \big(L - \eta - \hat{\tau}(1)\big)\,\mathcal{L} + \hat{\tau}(1) - \hat{\tau}(\bar{b}),$$

where $\mathcal{L} = 1 + \dfrac{\bar{b}}{1 - p_{00}\big(1 - P_f(M; \hat{\tau}(1))\big)}$ and $P_f(m; \hat{\tau}(b))$ denotes the probability of false alarm achieved by $m$ cooperating SUs with sensing time $\hat{\tau}(b)$.
2) Average Collision Amount: The average collision amount is defined as the ratio between the overall length of transmission duration that results in collision with the PUs and the overall length of transmission duration. The average collision amount $\mathcal{H}_C$ is upper bounded by

$$\mathcal{H}_C \le \frac{\big(L - \eta - \hat{\tau}(1)\big)(1 - \bar{P}_d)}{\mathcal{D}}.$$

B. Energy Efficiency Criteria
We define the Successful transmission oVer Collision (SVC) criterion to measure the energy efficiency of a CRN; it reflects the ratio between meaningful energy consumption and energy waste. Denote the SVC criterion by $\mathcal{E}^{SVC}$, defined as the ratio between the overall successful transmission duration and the overall collision duration. The upper bound of $\mathcal{E}^{SVC}$ can be expressed as follows [19]:

$$\mathcal{E}^{SVC} = \frac{\mathcal{D} - \big(L - \eta - \hat{\tau}(1)\big)}{\big(L - \eta - \hat{\tau}(1)\big)(1 - \bar{P}_d)}. \tag{18}$$
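Equation (18) is the ratio of the two upper bounds on $\mathcal{H}_B$ and $\mathcal{H}_C$. The sketch below evaluates both bounds and (18) from the expressions for $\mathcal{L}$ and $\mathcal{D}$. Every numeric input in the usage line (the sensing times $\hat{\tau}(1)$ and $\hat{\tau}(\bar{b})$ in ms, $\bar{b}$, and $P_f(M; \hat{\tau}(1))$) is a hypothetical placeholder, not a value reported in the paper; only $L$, $\eta$, $p_{00}$ and $\bar{P}_d$ follow Section VII.

```python
def svc_bounds(L, eta, tau_1, tau_bbar, b_bar, p00, pf_M, pd_bar):
    """Upper bounds on H_B and H_C, and the SVC criterion of (18).
    tau_1 = tau_hat(1), tau_bbar = tau_hat(b_bar), pf_M = P_f(M; tau_hat(1))."""
    t_tx = L - eta - tau_1                              # transmission time in a slot with belief 1
    L_avg = 1.0 + b_bar / (1.0 - p00 * (1.0 - pf_M))    # average transmission-period length (slots)
    D = t_tx * L_avg + tau_1 - tau_bbar                 # transmission duration per period
    H_B = 1.0 - t_tx / D                                # successful-transmission amount bound
    H_C = t_tx * (1.0 - pd_bar) / D                     # collision amount bound
    E_svc = (D - t_tx) / (t_tx * (1.0 - pd_bar))        # equation (18)
    return H_B, H_C, E_svc


H_B, H_C, E_svc = svc_bounds(L=100.0, eta=10.0, tau_1=2.0, tau_bbar=2.5,
                             b_bar=0.75, p00=0.8, pf_M=0.05, pd_bar=0.9)
print(f"H_B={H_B:.4f}  H_C={H_C:.4f}  E_SVC={E_svc:.2f}")  # H_B=0.7572  H_C=0.0243  E_SVC=31.19
```

Since $\mathcal{D}$ cancels in $\mathcal{H}_B / \mathcal{H}_C$, (18) depends on the sensing design only through $\mathcal{D}$ and $\hat{\tau}(1)$, which is where the punishment $e_w$ enters via the optimal sensing times.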
VII. NUMERICAL RESULTS
In this section, simulation results are presented to show the impact of the punishment parameter $e_w$ on the energy efficiency of the CRN. We consider $N = 10$ primary channels and $M = 8$ SUs. The frame duration is $L = 100$ ms, and the duration for sensing scheduling and result fusion is $\eta = 0.1L$. The signal model is the same as in [3], where the low-SNR region is considered and the sampling rate is $f_s = 4$ MHz. The required detection probability is set to $\bar{P}_d = 0.9$, and the parameters of the reward function are set to $B = 10$, $e_t = 1$ and $c = 10$.
In Fig. 1(a), we compare the derived upper bound of the efficiency criterion $\mathcal{E}^{SVC}$ with the value achieved in simulation, under the transition probabilities $p_{00} = 0.8$, $p_{10} = 0.7$ and SNR $\gamma = -14$ dB. We vary the punishment parameter $e_w$ from 30 to 550 with a step size of 40. Note that, in practice, the sensing duration cannot occupy the whole frame; otherwise, no transmission would ever be performed, even when the sensing result is idle. Hence $e_w$ cannot be so large that this situation arises for all belief values. The figure shows that the difference between the upper bound and the simulation results is small (within 1%). Moreover, the best value of $e_w$ is attained at the boundary of the range that keeps the model practical. Fig. 1(a) confirms that, by appropriately choosing $e_w$, better energy efficiency can be achieved.
Fig. 1. Performance of energy efficiency by introducing punishment $e_w$: (a) simulation vs. upper bound (energy efficiency vs. step; curves: Upper Bound, Simulation); (b) tuning $e_w$ vs. tuning $B$ (energy efficiency vs. $\gamma$; curves: Change $e_w$, Change $B$).
We consider two different schemes in Fig. 1(b): 1) adjusting the punishment $e_w$, which is introduced in our work; and 2) adjusting the reward for successful transmission $B$ while setting $e_w = 0$, which is the general design. The maximum values of $\mathcal{E}^{SVC}$ achieved by the two schemes are compared for SNR values ranging from $-24$ dB to $-8$ dB. Fig. 1(b) shows that the first scheme leads to better energy efficiency than the second, and that the improvement is large in the low-SNR region but small in the high-SNR region. The reason is that, by tuning $e_w$, the BS achieves better sensing performance and makes cautious transmission decisions to avoid collisions and perform more successful transmissions, whereas tuning $B$ does not take collisions into account. In the low-SNR region the sensing performance is poor; as a result, even if the BS tries to gain higher throughput by tuning $B$, the throughput is limited by the poor sensing performance and the energy efficiency is low. When the SNR is high, only a very short sensing duration is needed to maximize the corresponding value function, and the sensing performance of the two schemes becomes close. Fig. 1(b) thus reveals that introducing the punishment parameter yields higher energy efficiency than the traditional design.
VIII. CONCLUSION
In this paper, several problems related to energy-efficient CRN design are studied in the framework of POMDP. For the myopic CSS problem, which is an NP-hard combinatorial optimization problem, we provide an analytical framework and obtain the solution analytically. The optimality of the myopic CSS is proved for the case of two SUs and two channels with a fixed sensing duration. We also study the structure of the optimal sensing duration and establish the condition for the optimality of the myopic sensing duration. Having derived an upper bound on the performance of the myopic policy, we show through simulations that, by appropriately selecting the punishment parameter, energy efficiency can be improved compared with the traditional design.
The CSS problem can be explored further using our framework. In this paper, only the case of assigning all SUs to sense one channel is discussed; the optimal solution when $M$ increases and condition (C0) is no longer satisfied remains an open question. Another interesting direction is to investigate the shaping of the reward function: instead of the simple linear transformation adopted in this paper, more delicate reward functions that bring higher energy efficiency to CRNs may exist.
REFERENCES
[1] J. Palicot, “Cognitive Radio: An Enabling Technology for the Green Radio Communications Concept,” in Proc. International Conference on Wireless Communications and Mobile Computing, Jun. 2009.
[2] K. B. Letaief and W. Zhang, “Cooperative Spectrum Sensing,” in Cognitive Wireless Communication Networks, Springer, pp. 115-138, Oct. 2007.
[3] Y.C. Liang, Y. Zeng, E.C.Y. Peh, and A.T. Hoang, “Sensing-Throughput Tradeoff for Cognitive Radio Networks,” IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326-1337, Apr. 2008.
[4] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized Cognitive MAC for Opportunistic Spectrum Access in Ad Hoc Networks: A POMDP Framework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589-600, Apr. 2007.
[5] D.E. Dominici, “The Inverse of the Cumulative Standard Normal Probability Function,” Integral Transforms and Special Functions, vol. 14, no. 4, pp. 281-292, Aug. 2003.
[6] K. Murota, “Discrete Convex Analysis,” Mathematical Programming, Springer, vol. 83, no. 1-3, pp. 313-371, Jan. 1998.
[7] K. Murota, “Recent Developments in Discrete Convex Analysis,” Research Trends in Combinatorial Optimization, pp. 219-260, Nov. 2008.
[8] K. Liu, Q. Zhao, and B. Krishnamachari, “Dynamic Multichannel Access With Imperfect Channel State Detection,” IEEE Trans. Signal Process., vol. 58, no. 5, pp. 2795-2808, Apr. 2010.
[9] R. Smallwood and E. Sondik, “The optimal control of partially observable Markov processes over a finite horizon,” Operations Research, pp. 1071-1088, 1971.
[10] A.T. Hoang, Y.C. Liang, D.T.C. Wong, Y. Zeng, and R. Zhang, “Opportunistic Spectrum Access for Energy-constrained Cognitive Radios,” IEEE Trans. Wireless Commun., vol. 8, no. 3, pp. 1206-1211, Mar. 2009.
[11] A.Y. Ng, D. Harada, and S. Russell, “Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping,” in Proc. of The Sixteenth International Conference on Machine Learning, Jun. 1999.
[12] Y. Chen, Q. Zhao, and A. Swami, “Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors,” IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053-2071, 2008.
[13] S.H. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, “Optimality of Myopic Sensing in Multi-Channel Opportunistic Access,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040-4050, 2009.
[14] Y.J. Choi, Y. Xin, and S. Rangarajan, “Overhead-Throughput Tradeoff in Cooperative Cognitive Radio Networks,” in Proc. of IEEE WCNC, pp. 1-6, Apr. 2009.
[15] E.C.Y. Peh, Y.C. Liang, Y.L. Guan, and Y. Zeng, “Optimization of Cooperative Sensing in Cognitive Radio Networks: A Sensing-Throughput Tradeoff View,” IEEE Trans. Vehicular Technology, vol. 58, no. 9, pp. 5294-5299, Nov. 2009.
[16] R. Fan and H. Jiang, “Optimal Multi-Channel Cooperative Sensing in Cognitive Radio Networks,” IEEE Trans. Wireless Commun., vol. 9, no. 3, pp. 1128-1138, Mar. 2010.
[17] C. Song and Q. Zhang, “Cooperative Spectrum Sensing with Multi-Channel Coordination in Cognitive Radio Networks,” in Proc. of IEEE ICC 2010, pp. 1-5, Jul. 2010.
[18] Y. Wu and D.H.K. Tsang, “Distributed Power Allocation Algorithm for Spectrum Sharing Cognitive Radio Networks with QoS Guarantee,” in Proc. of IEEE INFOCOM 2009, pp. 981-989, Jun. 2009.
[19] T. Zhang and D.H.K. Tsang, “Optimal Cooperative Sensing Scheduling for Energy-Efficient Cognitive Radio Networks,” Technical Report, Aug. 2010, http://eez058.ece.ust.hk/publication/OptScheduling.pdf