
This paper was presented as part of the main technical program at IEEE INFOCOM 2011
Optimal Cooperative Sensing Scheduling for
Energy-Efficient Cognitive Radio Networks
Tengyi Zhang and Danny H. K. Tsang
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
Email: {zhangty, eetsang}@ust.hk
Abstract: Due to spectrum scarcity and the large energy consumption of wireless communications, designing energy-efficient Cognitive Radio Networks (CRNs) has become important and necessary. In this paper, we consider the problem of optimal Cooperative Sensing Scheduling (CSS) and parameter design for energy efficiency in CRNs, using the framework of the Partially Observable Markov Decision Process (POMDP). In particular, for a CRN with 𝑀 Secondary Users (SUs) and 𝑁 primary channels, we determine how many SUs should be assigned to sense each channel in order to maximize an objective function related to energy efficiency. Assigning more SUs to sense one channel yields higher sensing accuracy, whereas spreading the SUs out to sense more channels exploits spectrum opportunities better. The CSS problem is formulated as a combinatorial optimization problem. Such problems are generally hard and are usually solved by numerical methods with high computational complexity; in this paper, we instead provide a detailed analysis whose analytical results yield useful insights. The optimality of the myopic CSS is proved for the case of two channels and conjectured for the general case. We also study the tradeoff between the sensing and transmission durations. In addition, we analyze the structure of the optimal sensing time that maximizes the energy-efficiency objective, obtain the condition under which the myopic sensing time is optimal, and derive a performance upper bound for the myopic policy. Numerical results show that better energy efficiency can be achieved by carefully tuning a punishment parameter.
I. INTRODUCTION
In recent years, the dramatic growth of wired and wireless communication applications has led to a great increase in the related energy consumption. It is therefore time for the communication community to investigate radio and networking solutions that are energy-efficient and resource-efficient, i.e., green communications.
As one of the emerging wireless technologies, Cognitive Radio (CR) is considered a promising solution for improving spectrum efficiency. By intelligently monitoring the spectrum, Secondary Users (SUs) are able to opportunistically access idle spectrum originally assigned to Primary Users (PUs). On the other hand, these new functionalities and additional tasks (e.g., spectrum sensing) make CR-enabled devices more energy-consuming. Meanwhile, with its agility and intelligence, CR technology creates new possibilities and methods for realizing green communications [1]. These facts motivate us to study the analysis and design of energy-efficient Cognitive Radio Networks (CRNs).
978-1-4244-9920-5/11/$26.00 ©2011 IEEE
Consider a centralized CRN with multiple SUs and a Base Station (BS), which is responsible for scheduling and coordination among the SUs. Cooperative spectrum sensing [2] is adopted to improve the sensing accuracy, both to better protect the PUs and to capture the spectrum opportunities in a slotted primary system. In each slot, the BS determines: (1) how many SUs to assign to sense each channel; (2) how long the sensing duration is; and (3) whether to allow the SUs to access the primary channels based on the sensing outcomes. This sequential decision-making problem is studied in the framework of the Partially Observable Markov Decision Process (POMDP). We also study how to improve energy efficiency by optimally designing a parameter.
A. Fundamental Tradeoffs and Contributions
1) Cooperative sensing scheduling: At the beginning of each slot, the BS decides how many SUs should be assigned to sense each channel in order to maximize the immediate reward (which is related to energy efficiency). This Cooperative Sensing Scheduling (CSS) problem is a combinatorial optimization problem. Such problems are usually solved only by numerical methods, at high computational complexity. For the problem studied in this paper, however, we provide an analytical framework. With the help of Discrete Convex Analysis, we obtain the optimal solution analytically and gain several useful insights. We show that, for a given number of channels to be sensed, the best combination is to assign the SUs to the channels as equally as possible. In addition, we give the necessary and sufficient condition under which assigning all the SUs to cooperatively sense one channel is the optimal solution of the CSS problem. By applying our framework, the optimal solution for general cases, i.e., any number of SUs and channels, can be obtained as well. To the best of our knowledge, this is the first work that analytically obtains the optimal solution of this combinatorial optimization problem.
2) Optimality of the myopic CSS: The myopic CSS is shown to be the optimal CSS for the case of two channels and fixed sensing durations. This result is useful for reducing the complexity of obtaining the optimal CSS, since the myopic CSS is obtained analytically, whereas the general method requires recursive computation. Based on numerical results, we conjecture that this optimality also holds for a general number of SUs and channels.
3) Structure of the optimal sensing duration: As part of the sequential decision problem, the BS needs to determine the sensing duration after performing the CSS. Although the myopic sensing duration can be found by numerical methods, it is not always optimal. We therefore analyze the structure of the optimal sensing duration and show that it is always larger than or equal to the myopic sensing duration. The conclusion is similar to that in [10], but under a different model. Moreover, we establish the condition under which the myopic sensing duration is optimal.
4) Designing the punishment parameter to improve energy efficiency: Motivated by the result in [11], we introduce a punishment parameter for unsuccessful transmissions, which helps the CRN cause fewer collisions with the PUs and thus achieve higher energy efficiency by saving the power otherwise spent on retransmissions. We derive an upper bound on the myopic performance by adopting the methods in [8]. Numerical results show that the energy efficiency of the CRN can be improved by carefully tuning the punishment parameter.
B. Related Works
Designing optimal MAC protocols for opportunistic spectrum access in the POMDP framework started with [4]. A separation principle for the joint design of the spectrum sensor operating point, the sensing channel selection, and the access policy is established in [12]. [13] showed that a simple and robust round-robin myopic channel selection policy is optimal for any number of positively correlated channels, and [8] extended this optimality result to the case of imperfect sensing.
The literature on cooperative sensing techniques is also rich. [2] provided a detailed survey of various schemes for fusing the sensing information from SUs. The impact of cooperative sensing overhead on system throughput was studied in [14], taking into account the number of reporting packets. [15] studied the tradeoff in jointly finding the optimal sensing time and the result-fusion parameter to maximize the SUs' throughput. [16] extended the analysis to the case of multiple channels using a soft decision fusion rule.
Existing works on cooperative sensing mainly focus on how to achieve the best sensing performance on a single channel. The question of how to assign SUs to sense multiple channels, i.e., the CSS problem, is still largely unaddressed, except for a recent work [17]. Moreover, the cooperative sensing problem, including in the literature mentioned above, is commonly formulated as a static optimization problem. It is important to consider the CSS problem in a varying spectrum environment with uncertainty, which results in the sequential decision problem studied in this paper. As mentioned in [17], the CSS problem is considered NP-hard. Unlike [17] and other works that adopt numerical methods, we obtain the optimal CSS solution analytically. Only the probability of detection is considered in [17], whereas we consider a more practical case in which the spectrum sensing performance is described by both the probability of detection and the probability of false alarm. Moreover, [17] applies soft result fusion, while we use hard result fusion, which requires much less overhead. Unlike [8], our work considers a dynamic CSS problem, in which a different number of channels may be sensed in each slot depending on the CSS result. In addition, we consider an energy-efficiency-oriented objective in the POMDP, and the sensing duration is variable in each slot. In analyzing the structure of the optimal sensing duration, we obtain a result similar to that in [10]; however, this paper considers multiple primary channels and a different reward function. We also present the condition under which the myopic duration is optimal.
II. SYSTEM MODEL
Assume there exist $N$ independent and stochastically identical Gilbert-Elliot channels owned by PUs, denoted by $\mathcal{N} = \{1, 2, \dots, N\}$. The CRN consists of $M$ SUs and a BS, where $N \ge M$¹. The primary system operates in a time-slotted manner with a fixed slot duration $L$. The occupancy state of each channel transits at the beginning of each slot according to a two-state discrete-time Markov chain with transition probabilities $\{p_{ij}\}_{i,j=0,1}$, where $p_{00} \ge p_{10}$². This model is commonly used to abstract physical channels with memory, and the slotted system structure has been shown to fit CRN applications well (see [4], [16] and references therein). Let $s_n(t) \in \{0\ \text{(idle)}, 1\ \text{(busy)}\}$ denote the occupancy state of channel $n$ in time slot $t$. The primary system state in slot $t$ is denoted as $\mathbf{s}(t) \triangleq [s_1(t), \dots, s_N(t)]$.
SUs are required to carry out spectrum sensing before operating on the primary channels, using the energy detection mechanism widely adopted in CRNs. Due to physical limitations, each SU can sense only one channel at a time. The spectrum sensor of each SU detects the presence of PU signals by performing the following binary hypothesis test:
$$H_0: s_n(t) = 0 \ \text{(idle)}, \qquad H_1: s_n(t) = 1 \ \text{(busy)}. \qquad (1)$$
The sensing performance of each SU is described by the probability of detection $p_d \triangleq \Pr\{\text{decide } H_1 \mid H_1 \text{ is true}\}$ and the probability of false alarm $p_f \triangleq \Pr\{\text{decide } H_1 \mid H_0 \text{ is true}\}$.
We focus on the case of a complex-valued PSK signal and Circularly Symmetric Complex Gaussian (CSCG) noise [3], without loss of generality. Under this model, for a given probability of detection $p_d$, the probability of false alarm is given by
$$p_f = \mathcal{Q}\left(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(p_d) + \sqrt{\tau f_s}\,\gamma\right), \qquad (2)$$
where $\mathcal{Q}(\cdot)$ is the complementary cumulative distribution function of the standard Gaussian, $\mathcal{Q}^{-1}(\cdot)$ denotes the inverse of $\mathcal{Q}(\cdot)$, $\gamma$ denotes the received signal-to-noise ratio (SNR) of the primary signal at the SU, $\tau$ denotes the sensing time, and $f_s$ denotes the sampling rate.
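As an illustration of Eq. (2), the following Python sketch evaluates the individual false-alarm probability for a target $p_d$. It assumes the standard identities $\mathcal{Q}(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ and $\mathcal{Q}^{-1}(p) = -\Phi^{-1}(p)$; the parameter values (SNR of −15 dB, $f_s$ = 6 MHz, $p_d$ = 0.9) are hypothetical choices for the example, not values taken from this paper.

```python
import math
from statistics import NormalDist


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr{Z > x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p: float) -> float:
    """Inverse of Q: the x with Q(x) = p, for 0 < p < 1 (uses Q^{-1}(p) = -Phi^{-1}(p))."""
    return -NormalDist().inv_cdf(p)


def false_alarm_prob(p_d: float, snr: float, tau: float, f_s: float) -> float:
    """Individual false-alarm probability of Eq. (2) for a target detection probability p_d.

    snr is the linear received SNR gamma, tau the sensing time in seconds and
    f_s the sampling rate in Hz, so tau * f_s is the number of samples.
    """
    arg = math.sqrt(2.0 * snr + 1.0) * q_inv(p_d) + math.sqrt(tau * f_s) * snr
    return q_func(arg)


# Hypothetical parameters: -15 dB SNR, 6 MHz sampling rate, target p_d = 0.9.
snr = 10 ** (-15 / 10)
for tau in (0.5e-3, 1e-3, 2e-3):
    print(f"tau = {tau * 1e3:.1f} ms -> p_f = {false_alarm_prob(0.9, snr, tau, 6e6):.4f}")
```

For a fixed $p_d$, a longer sensing time lowers $p_f$ but leaves less of the slot for transmission, which is the sensing/transmission tradeoff studied in this paper.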
In this paper, cooperative sensing is adopted. Based on the sensing outcomes of the individual SUs, the BS performs a result fusion procedure that processes the individual outcomes jointly to obtain the final sensing outcome. On receiving the results from $M$ SUs, the BS applies the “OR” rule [2] for fusion, which is a hard decision fusion rule³ and can be expressed as
$$P_d(M) = 1 - \prod_{i=1}^{M}\left(1 - p_{d,i}\right), \qquad P_f(M) = 1 - \prod_{i=1}^{M}\left(1 - p_{f,i}\right), \qquad (3)$$
where $p_{d,i}$ and $p_{f,i}$ denote the probability of detection and the probability of false alarm of SU $i$, respectively. In this paper, all SUs are assumed to be homogeneous, i.e., to have the same sensing performance. The case of heterogeneous sensing performance can be easily incorporated.

¹ Generally, the number of channels is greater than the number of SUs [8]. Moreover, our work also holds for the case $N < M$.
² This means that the channels are not negatively correlated. The case $p_{00} < p_{10}$ can be analyzed similarly.
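A minimal Python sketch of the OR-rule fusion in Eq. (3). The same function applies to either set of individual probabilities; the three-SU numbers in the example are hypothetical.

```python
from math import prod


def or_rule(probs):
    """Fused probability under the OR rule of Eq. (3): 1 - prod_i (1 - p_i).

    Applies to the detection probabilities p_{d,i} as well as to the
    false-alarm probabilities p_{f,i} reported by the cooperating SUs.
    """
    return 1.0 - prod(1.0 - p for p in probs)


def or_rule_homogeneous(p: float, m: int) -> float:
    """Special case of m identical SUs: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p) ** m


# Hypothetical example: three identical SUs with p_d = 0.6 and p_f = 0.05 each.
print(or_rule([0.6] * 3), or_rule([0.05] * 3))
```

Under the OR rule, adding SUs can only raise both fused probabilities; this is why the analysis in Section IV fixes the fused detection probability at $\bar{P}_d$ and lets each SU's individual target shrink with the number of cooperating SUs (Lemma 1).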
III. PROBLEM FORMULATION
At the beginning of each slot, the BS sequentially determines $a_n(t)$, the number of SUs assigned to sense channel $n$, together with the sensor operating point and the sensing duration. Every SU is assigned a channel to sense, and all SUs use the same sensing duration for synchronization purposes. The transmission decision is made based on the fusion outcome after the sensing reports from the SUs are collected. The BS randomly allocates the channels selected for transmission to the SUs; the fairness issue is beyond our scope. At the end of a slot, the SU(s) using a channel send an ACK to the BS if the transmission is successful.
Since the channel sensing capability is limited (i.e., some channels may not be sensed) and the sensing performance is imperfect, the system state is not fully observable to the CRN. The BS can only infer the system state probabilistically from the decision and observation history. Hence, our problem fits into the framework of the Partially Observable Markov Decision Process (POMDP).
A. Observation
Let $\theta_n(t)$ denote the observation of channel $n$ obtained in slot $t$. There are four possible outcomes: (i) $\theta_n(t) = 0$: data transmission is performed and an ACK is received; (ii) $\theta_n(t) = 1$: data transmission is performed and no ACK is received; (iii) $\theta_n(t) = 2$: the channel is determined to be busy based on the result fusion outcome and is not used; (iv) $\theta_n(t) = 3$: the BS decides not to sense the channel. The system observation vector is $\boldsymbol{\theta}(t) \triangleq [\theta_1(t), \dots, \theta_N(t)]$. Note that these four observations can be distinguished since the BS governs the transmission decisions.
B. Belief Vector
The sufficient statistic of the system state is the belief vector $\mathbf{b}(t) \triangleq \{b_0^1(t), \dots, b_0^N(t)\}$, where $b_0^n(t)$ is the conditional probability that $s_n(t) = 0$ given the decision and observation history [4], and $b_1^n(t) = 1 - b_0^n(t)$. The vector $\mathbf{b}(t)$ is computed at the end of slot $t$, after the observation is received, and is used for decision making in slot $t+1$. Based on the action and the observation in slot $t$, the belief vector is updated as $\mathbf{b}(t) \triangleq \mathcal{T}(\mathbf{b}(t-1) \mid A(t), \boldsymbol{\theta}(t))$ using Bayes' rule [19].

³ Since a hard decision requires only one bit of feedback, it is favored for reducing the overhead.
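The paper leaves the update operator $\mathcal{T}$ to Bayes' rule [19]. The Python sketch below is one possible per-channel instantiation, written under our own assumptions rather than taken from the paper: the belief is first propagated through the Markov chain, an ACK ($\theta_n = 0$) is taken to prove the channel was idle, a missing ACK ($\theta_n = 1$) is attributed only to a PU collision, and $\theta_n = 2$ is handled with the fused $P_d$ and $P_f$ of Eq. (3). The function names are hypothetical.

```python
def predict_idle(b0_prev, p00, p10):
    """Idle probability in slot t given the belief b_0^n(t-1) at the end of slot t-1."""
    return b0_prev * p00 + (1.0 - b0_prev) * p10


def update_belief(b0_prev, theta, p00, p10, P_d=None, P_f=None):
    """Hypothetical Bayes update of one entry b_0^n for the observation theta_n(t).

    Assumptions (ours): theta = 0 (ACK) proves the channel was idle; theta = 1 (no ACK)
    is caused only by a collision with the PU; P_d and P_f are the fused probabilities
    of Eq. (3) for the SUs that sensed the channel (needed only for theta = 2).
    """
    omega = predict_idle(b0_prev, p00, p10)
    if theta == 0:                                   # transmitted and ACK received
        return 1.0
    if theta == 1:                                   # transmitted, no ACK: PU was active
        return 0.0
    if theta == 2:                                   # fusion declared busy, channel unused
        idle_and_flagged = omega * P_f               # false alarm on an idle channel
        busy_and_flagged = (1.0 - omega) * P_d       # correct detection of the PU
        return idle_and_flagged / (idle_and_flagged + busy_and_flagged)
    if theta == 3:                                   # not sensed: Markov prediction only
        return omega
    raise ValueError("theta must be 0, 1, 2 or 3")


# Hypothetical example: p00 = 0.8, p10 = 0.3, previous belief 0.6.
print(update_belief(0.6, 2, 0.8, 0.3, P_d=0.9, P_f=0.1))
```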
C. Reward
A reward is received at the end of each slot. Since our objective is to design energy-efficient CRNs, energy consumption is taken into account. Specifically, the reward for channel $n$, $R_n(A(t), \boldsymbol{\theta}(t))$, consists of the following: (i) When $\theta_n(t) = 0$, a positive reward $(L - \eta - \tau(t))(B - e_t)$ is received, where $\tau(t) \in (0, L - \eta)$ denotes the sensing duration, $\eta$ denotes the duration of sensing scheduling and result fusion at the BS, $B$ denotes the reward for successful transmission, and $-e_t$ denotes the energy consumed for transmission; both are proportional to the transmission duration. (ii) When $\theta_n(t) = 1$, a negative reward $(L - \eta - \tau(t))(-e_M - e_t)$ is received. Here $-e_M$ can be regarded as the punishment for the interference generated to the PU, and it also reflects the energy wasted due to the collision. This is the key parameter for achieving an energy-efficient CRN design and is analyzed in detail in later sections. (iii) When a channel is sensed, a negative reward $-\tau(t)c$ is received, where $c$ denotes the energy consumed in spectrum sensing per unit of time. (iv) When a channel is not selected for sensing, no reward is received.
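The four reward cases can be collected into a single per-channel function. In this hypothetical Python helper, the parameter names mirror the symbols in the text, and charging the sensing cost $-\tau(t)c$ to every sensed channel (including those then used for transmission) is our reading of case (iii).

```python
def channel_reward(theta, tau, L, eta, B, e_t, e_M, c):
    """Reward R_n of one channel in one slot, following cases (i)-(iv) above.

    theta is the observation of the channel (0, 1, 2 or 3) and tau the sensing duration.
    """
    if theta not in (0, 1, 2, 3):
        raise ValueError("theta must be 0, 1, 2 or 3")
    if not 0.0 < tau < L - eta:
        raise ValueError("the sensing duration must lie in (0, L - eta)")
    if theta == 3:                      # (iv) channel not sensed: no reward
        return 0.0
    t_tx = L - eta - tau                # remaining transmission duration
    reward = -tau * c                   # (iii) sensing energy of a sensed channel
    if theta == 0:                      # (i) successful transmission
        reward += t_tx * (B - e_t)
    elif theta == 1:                    # (ii) collision: punishment plus wasted energy
        reward += t_tx * (-e_M - e_t)
    return reward                       # theta == 2: sensed, declared busy, not used


# Hypothetical numbers: L = 100, eta = 1, tau = 10, B = 5, e_t = 1, e_M = 2, c = 0.5.
for theta in range(4):
    print(theta, channel_reward(theta, 10, 100, 1, 5, 1, 2, 0.5))
```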
D. POMDP Problem Formulation
We expect the CRN to carry out as many successful transmissions as possible while minimizing the collisions caused to the PUs, since collisions result in retransmissions and wasted energy. Therefore, the objective of the POMDP problem is to find the optimal policy $\pi$ that maximizes the total reward received over $T$ slots. A policy $\pi$ specifies a sequence of functions $\pi = [\pi_1, \dots, \pi_T]$, where $\pi_t$ maps the belief vector $\mathbf{b}(t-1)$ to an action $A(t)$ in slot $t$. Our problem can consequently be formulated as
$$\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\left\{\sum_{t=1}^{T} R(A(t), \boldsymbol{\theta}(t)) \,\middle|\, \mathbf{b}(0)\right\} \qquad (4)$$
with constraints $P_d^n(t) \ge \bar{P}_d$ and $\sum_{n=1}^{N} a_n(t) = M$, $\forall n \in \mathcal{N}$, $t = 1, \dots, T$. Here $\mathbf{b}(0)$ is the initial belief vector, whose entries are set to the stationary distribution $\bar{b}_0^n = \frac{p_{10}}{p_{01} + p_{10}}$ of the underlying Markov chain [4], [12]. The constraint $P_d^n(t) \ge \bar{P}_d$ serves to protect the PUs: the probability of detection on every PU channel, $P_d^n(t)$, must be at least a threshold $\bar{P}_d$ predetermined by the PUs. It has been shown in [4] that, when the sensor operating point is tuned so that the equality holds, the optimal access policy is to access channel $n$ if the result fusion outcome is idle and not to access it otherwise. Applying this result, the constraint $P_d^n(t) \ge \bar{P}_d$ can be removed and the original problem becomes an unconstrained POMDP. Moreover, (2) shows that the probability of false alarm is determined once the target probability of detection and the sensing duration are fixed. As a result, the action of the BS in each slot can be expressed as $A(t) \triangleq \{\mathbf{a}(t), \tau(t)\}$, where $\mathbf{a}(t) \triangleq \{a_n(t)\}_{n \in \mathcal{N}}$.
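As a quick check of the initial belief, the stationary idle probability $\bar{b}_0^n = p_{10}/(p_{01}+p_{10})$ is a fixed point of the one-step prediction $b \mapsto b\,p_{00} + (1-b)\,p_{10}$. In this Python sketch, the transition probabilities and the number of channels are hypothetical.

```python
def stationary_idle_prob(p01: float, p10: float) -> float:
    """Stationary probability of the idle state 0: p10 / (p01 + p10)."""
    return p10 / (p01 + p10)


def predict_idle(b0: float, p00: float, p10: float) -> float:
    """One-step prediction of the idle probability through the Markov chain."""
    return b0 * p00 + (1.0 - b0) * p10


# Hypothetical channel statistics with p00 >= p10, and N = 4 channels.
p00, p10 = 0.8, 0.3
b_bar = stationary_idle_prob(1.0 - p00, p10)
b0_vector = [b_bar] * 4                          # initial belief vector b(0)
print(b_bar, predict_idle(b_bar, p00, p10))      # equal up to floating-point rounding
```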
E. Optimal Policy and Myopic Policy
To solve (4), we resort to the following value function to obtain the optimal policy:
$$V_t(\mathbf{b}(t-1)) = \max_{A(t)} \Big\{ I_t(\mathbf{b}(t-1)) + \mathbb{E}\big[V_{t+1}\big(\mathcal{T}(\mathbf{b}(t-1) \mid A(t), \boldsymbol{\theta}(t))\big)\big] \Big\}, \qquad (5)$$
with constraint $\sum_{n=1}^{N} a_n(t) = M$, where $I_t(\mathbf{b}(t-1)) = \mathbb{E}[R(A(t), \boldsymbol{\theta}(t)) \mid \mathbf{b}(t-1)]$ denotes the expected immediate reward. The value function (5) represents the maximum expected reward accumulated from slot $t$ up to the time horizon $T$. Obtaining the optimal policy requires very high computational complexity. One way to address this problem is to apply the myopic policy [4], which focuses only on the immediate reward and ignores the impact of the current action on future rewards. The myopic policy is given by
$$\tilde{A}(t) = \arg\max_{A(t)} I_t(\mathbf{b}(t-1)), \quad \text{s.t.} \ \sum_{n=1}^{N} a_n(t) = M.$$
In general, the myopic policy reduces the computational complexity but may sacrifice optimality. In the following sections, the myopic policy is shown to be optimal under some conditions.
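To make the myopic policy concrete for a fixed sensing time, the Python sketch below enumerates every allocation $\mathbf{a}(t)$ with $\sum_n a_n(t) = M$ and picks the one with the largest expected immediate reward. The expected-reward expression is our own assumption, not a formula from the paper: with predicted idle probability $\omega_n$, a sensed channel is used successfully with probability $\omega_n(1 - P_f(a_n))$ and collides with probability $(1-\omega_n)(1-\bar{P}_d)$, scored with the reward cases above. The callable `fused_pf` and all numbers are hypothetical.

```python
from itertools import combinations


def allocations(M, N):
    """Yield every (a_1, ..., a_N) of non-negative integers summing to M (stars and bars)."""
    for bars in combinations(range(M + N - 1), N - 1):
        alloc, prev = [], -1
        for b in bars:
            alloc.append(b - prev - 1)
            prev = b
        alloc.append(M + N - 2 - prev)
        yield tuple(alloc)


def expected_reward(alloc, omega, fused_pf, P_d_bar, tau, L, eta, B, e_t, e_M, c):
    """Assumed expected immediate reward for one slot and a fixed sensing time tau."""
    t_tx = L - eta - tau
    total = 0.0
    for a_n, w_n in zip(alloc, omega):
        if a_n == 0:
            continue                                   # unsensed channel: no reward
        p_success = w_n * (1.0 - fused_pf(a_n))        # idle and declared idle
        p_collision = (1.0 - w_n) * (1.0 - P_d_bar)    # busy but missed by the fusion
        total += (t_tx * (B - e_t) * p_success
                  - t_tx * (e_M + e_t) * p_collision
                  - tau * c)
    return total


def myopic_allocation(M, omega, **params):
    """Myopic CSS: the allocation maximizing the expected immediate reward."""
    return max(allocations(M, len(omega)),
               key=lambda a: expected_reward(a, omega, **params))


# Hypothetical example: 4 SUs, 3 channels, toy fused false-alarm curve.
params = dict(fused_pf=lambda a: 0.3 ** a, P_d_bar=0.9, tau=1.0, L=10.0,
              eta=0.5, B=3.0, e_t=1.0, e_M=2.0, c=0.2)
print(myopic_allocation(4, [0.7, 0.7, 0.7], **params))
```

Exhaustive enumeration visits $\binom{M+N-1}{N-1}$ allocations, so it is only practical for small networks; this is the complexity that the analysis in Section IV avoids.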
IV. MYOPIC COOPERATIVE SENSING SCHEDULING
At the beginning of each slot, the first task of the BS is to determine, for each channel, how many users should be assigned to perform spectrum sensing cooperatively. As pointed out in [2], [14], the more SUs sense a channel, the better the spectrum sensing performance. On the other hand, since the number of SUs is limited, some channels may then not be sensed, and the spectrum opportunities cannot be fully exploited. The goal of CSS, which trades off sensing accuracy against spectrum opportunities, is to find the scheduling of the SUs that maximizes the immediate reward received by the BS.
For a fixed sensing time $\tau(t) = \tilde{\tau}$, define $\tilde{I}_t(\mathbf{b}(t-1)) \triangleq I_t(\mathbf{b}(t-1))\big|_{\tau(t) = \tilde{\tau}}$. The myopic CSS in slot $t$ is obtained by solving the following maximization problem:
$$\text{(P1):} \quad \max_{\mathbf{a}(t)} \tilde{I}_t(\mathbf{b}(t-1)), \quad \text{s.t.} \ \sum_{n=1}^{N} a_n = M. \qquad (6)$$
Without loss of generality, we consider the case in which, in any slot $t$, all channels have the same belief value. The case of channels with different belief values will be discussed later. After careful inspection of $\tilde{I}_t(\mathbf{b}(t-1))$, we instead solve the following problem (P2):
$$\text{(P2):} \quad \max_{\mathbf{a}(t)} \sum_{\{n:\, a_n(t) > 0\}} \bigl(1 - P_f(a_n(t))\bigr) \qquad (7)$$
with constraint $\sum_{n=1}^{N} a_n(t) = M$. At the end of this section, we will show that the optimal solution of (P2) is also the optimal solution of (P1). Although the objective of (P2) has a simpler form than that of (P1), (P2) is a combinatorial optimization problem and is regarded as NP-hard [17]. Moreover, even though numerical methods can find the optimal solution, they provide no insight into system design, e.g., how the CSS changes with system parameters such as the total number of SUs $M$.
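For small $M$, (P2) can be solved by brute force, which is handy for checking the analysis that follows. With equal beliefs, only the multiset of nonzero $a_n$ matters, so the hypothetical Python sketch below enumerates the partitions of $M$ into at most $\min(M, N)$ parts (in non-increasing order), groups them by the number of sensed channels, and evaluates (7). It assumes that each of the $a$ cooperating SUs targets the individual detection probability $1 - (1 - \bar{P}_d)^{1/a}$, so that the OR rule (3) meets $\bar{P}_d$ exactly, and then applies (2); the parameter values are illustrative only.

```python
import math
from statistics import NormalDist


def fused_pf(a, P_d_bar, snr, tau_fs):
    """P_f(a): OR-fused false alarm of a identical SUs whose fused detection equals P_d_bar."""
    p_d = 1.0 - (1.0 - P_d_bar) ** (1.0 / a)          # per-SU target so that (3) gives P_d_bar
    q_inv = -NormalDist().inv_cdf(p_d)                # Q^{-1}(p_d)
    arg = math.sqrt(2.0 * snr + 1.0) * q_inv + math.sqrt(tau_fs) * snr
    p_f = 0.5 * math.erfc(arg / math.sqrt(2.0))       # Eq. (2)
    return 1.0 - (1.0 - p_f) ** a                     # Eq. (3)


def partitions(M, max_parts, largest=None):
    """Yield non-increasing tuples of positive integers summing to M, with at most max_parts entries."""
    if largest is None:
        largest = M
    if M == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(M, largest), 0, -1):
        for rest in partitions(M - first, max_parts - 1, first):
            yield (first,) + rest


def solve_p2(M, N, **kw):
    """Brute-force (P2): the best combination in each group G_i, plus the overall optimum."""
    best = {}
    for combo in partitions(M, min(M, N)):
        value = sum(1.0 - fused_pf(a, **kw) for a in combo)    # objective (7)
        i = len(combo)                                         # number of sensed channels
        if i not in best or value > best[i][1]:
            best[i] = (combo, value)
    return best, max(best.values(), key=lambda cv: cv[1])


# Hypothetical parameters: 6 SUs, 8 channels, P_d_bar = 0.9, SNR = -15 dB, tau * f_s = 6000.
groups, (combo, value) = solve_p2(6, 8, P_d_bar=0.9, snr=10 ** -1.5, tau_fs=6000.0)
for i in sorted(groups):
    print(f"G_{i}: best {groups[i][0]}, value {groups[i][1]:.4f}")
print("optimum:", combo)
```

The number of partitions still grows quickly with $M$, so this check only scales to small networks.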
In this section, we study this combinatorial problem analytically and obtain several useful insights. We also establish the conditions under which specific scheduling combinations constitute the myopic policy. Before exploring the structure of the sensing schedule, we first examine the properties of the objective function (7), beginning with the following two lemmas.
Lemma 1: Let $m$ be a continuous variable with domain $[1, +\infty)$. Denote by $\tilde{p}_d(m)$ the relaxed individual probability of detection as a function of $m$:
$$\tilde{p}_d(m) = 1 - (1 - \bar{P}_d)^{1/m}, \qquad (8)$$
which is decreasing and convex.
Proof: Eq. (8) relaxes the integer variable in (3) to a continuous one. Taking the first-order derivative of $\tilde{p}_d(m)$ gives
$$\nabla \tilde{p}_d(m) = \exp\left(\frac{\ln(1 - \bar{P}_d)}{m}\right) \frac{\ln(1 - \bar{P}_d)}{m^2}, \qquad (9)$$
where $\nabla$ denotes differentiation of a function with respect to its argument. Since $0 < 1 - \bar{P}_d < 1$, we have $\nabla \tilde{p}_d(m) < 0$, which means that $\tilde{p}_d(m)$ is decreasing in $m$. Similarly, the second-order derivative of $\tilde{p}_d(m)$ is shown to be positive [19]. From these two derivatives, $\tilde{p}_d(m)$ is decreasing and convex.
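For reference, carrying out the differentiation behind Lemma 1 (our own computation, not reproduced from [19]) with $k \triangleq \ln(1 - \bar{P}_d) < 0$, so that $\tilde{p}_d(m) = 1 - e^{k/m}$, gives

$$\nabla \tilde{p}_d(m) = \frac{k}{m^2}\,e^{k/m}, \qquad \nabla^2 \tilde{p}_d(m) = -\frac{k\,e^{k/m}}{m^4}\,(2m + k).$$

Since $-k > 0$, the second derivative is positive exactly when $m > -k/2 = -\tfrac{1}{2}\ln(1 - \bar{P}_d)$. It is therefore positive for all $m > 1$ whenever $\bar{P}_d \le 1 - e^{-2} \approx 0.865$; for stricter detection targets, this expression shows convexity only for $m > -\tfrac{1}{2}\ln(1 - \bar{P}_d)$.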
Lemma 2: Denote by $\tilde{p}_f(m)$ the relaxed individual probability of false alarm as a function of $m$:
$$\tilde{p}_f(m) = \mathcal{Q}\left(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(\tilde{p}_d(m)) + \sqrt{\tilde{\tau} f_s}\,\gamma\right), \qquad (10)$$
which is obtained from (2) and is decreasing and convex.
Proof: From (10), we have
$$\nabla \tilde{p}_f(m) = -\frac{1}{\sqrt{2\pi}}\,\mathcal{J}_1 \sqrt{2\gamma+1}\,\nabla \mathcal{Q}^{-1}(\tilde{p}_d(m)), \qquad (11)$$
where $\mathcal{J}_1 = \exp\left[-\left(\sqrt{2\gamma+1}\,\mathcal{Q}^{-1}(\tilde{p}_d(m)) + \sqrt{\tilde{\tau} f_s}\,\gamma\right)^2 / 2\right]$ and
$$\nabla \mathcal{Q}^{-1}(\tilde{p}_d(m)) = -\sqrt{2\pi}\,\exp\left(\frac{\left(\mathcal{Q}^{-1}(\tilde{p}_d(m))\right)^2}{2}\right) \nabla \tilde{p}_d(m).$$
The derivative of the inverse of $\mathcal{Q}(x)$ is $\nabla \mathcal{Q}^{-1}(x) = -\sqrt{2\pi}\,\exp\left(\frac{(\mathcal{Q}^{-1}(x))^2}{2}\right)$, which can be derived from [5]. Since $\nabla \tilde{p}_d(m) < 0$ from Lemma 1, we have $\nabla \mathcal{Q}^{-1}(\tilde{p}_d(m)) > 0$. Therefore $\nabla \tilde{p}_f(m) < 0$, i.e., $\tilde{p}_f(m)$ is decreasing in $m$. Taking the second derivative of $\tilde{p}_f(m)$, after some manipulation [19], we can show that $\nabla^2 \tilde{p}_f(m) > 0$. Therefore, $\tilde{p}_f(m)$ is decreasing and convex in $m$.
These two lemmas establish the properties of $\tilde{p}_d(m)$ and $\tilde{p}_f(m)$ with respect to the variable $m$. Originally, the number of cooperating SUs can only be an integer, which is difficult to analyze. For analytical convenience, we relax it to a continuous variable $m$ while preserving the definitions of the individual probability of detection and probability of false alarm, and we use the results as the basis for the following analysis. The actual individual probabilities of detection and false alarm can be regarded as discrete points on the relaxed functions $\tilde{p}_d(m)$ and $\tilde{p}_f(m)$. Based on these two lemmas, the property of the relaxed probability of false alarm after result fusion, $\tilde{P}_f(m)$, can be characterized as follows:
Proposition 1: The probability of false alarm after result fusion, $\tilde{P}_f(m)$, is decreasing and convex on the domain of $m$ if the following condition holds:
$$\left[\ln\left(1 - \tilde{p}_f(m)\right) - \frac{m\,\nabla \tilde{p}_f(m)}{1 - \tilde{p}_f(m)}\right]^2 < \frac{2\nabla \tilde{p}_f(m) + m\nabla^2 \tilde{p}_f(m)}{1 - \tilde{p}_f(m)} - \left[\frac{\sqrt{m}\,\nabla \tilde{p}_f(m)}{1 - \tilde{p}_f(m)}\right]^2, \quad \forall m. \qquad (12)$$
Proof: Since we consider homogeneous SUs, the relaxed probability of false alarm after result fusion can be written as $\tilde{P}_f(m) = 1 - (1 - \tilde{p}_f(m))^m$. The first-order derivative of $\tilde{P}_f(m)$ can be shown to be negative from Lemma 1 and Lemma 2. By taking the second-order derivative of $\tilde{P}_f(m)$ and after some algebraic manipulations [19], it can be shown that $\nabla^2 \tilde{P}_f(m) > 0$, i.e., $\tilde{P}_f(m)$ is decreasing and convex on the domain of $m$, if condition (12) holds.
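The algebra behind Proposition 1 can be sketched as follows (our own derivation). Writing $g(m) \triangleq m \ln(1 - \tilde{p}_f(m))$, so that $\tilde{P}_f(m) = 1 - e^{g(m)}$,

$$\nabla \tilde{P}_f(m) = -e^{g(m)}\,\nabla g(m), \qquad \nabla^2 \tilde{P}_f(m) = -e^{g(m)}\left[\left(\nabla g(m)\right)^2 + \nabla^2 g(m)\right],$$

$$\nabla g(m) = \ln\left(1 - \tilde{p}_f(m)\right) - \frac{m\,\nabla \tilde{p}_f(m)}{1 - \tilde{p}_f(m)}, \qquad \nabla^2 g(m) = -\frac{2\nabla \tilde{p}_f(m) + m\nabla^2 \tilde{p}_f(m)}{1 - \tilde{p}_f(m)} - \frac{m\left(\nabla \tilde{p}_f(m)\right)^2}{\left(1 - \tilde{p}_f(m)\right)^2}.$$

Hence $\tilde{P}_f(m)$ is convex exactly when $(\nabla g(m))^2 < -\nabla^2 g(m)$; the squared bracket on the left-hand side of (12) is $(\nabla g(m))^2$.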
It is natural to conjecture that $\tilde{P}_f(m)$ is decreasing and convex. In fact, [2], [14] show that the more SUs cooperate, the smaller $\tilde{P}_f(m)$ becomes; otherwise there would be no incentive to perform cooperative sensing. Moreover, condition (12) should hold as $m \to +\infty$; otherwise $\tilde{P}_f(m)$ would eventually fall below zero, which is impossible. In the extensive simulations we have performed, condition (12) holds in most cases, i.e., $\tilde{P}_f(m)$ is usually decreasing and convex. Even if there is a concave section, it can easily be incorporated into the analytical framework that follows. Without loss of generality, we assume $\tilde{P}_f(m)$ to be decreasing and convex throughout this paper. With the above results, we are ready to analyze the combinatorial problem (P2). To describe the analysis clearly, we introduce the following definitions.
Definition 1: All combinations, i.e., ways of assigning SUs to sense the channels, are divided into groups. Define $G_i$, $i = 1, \dots, M$, as the group consisting of the $|G_i|$ combinations in which exactly $i$ channels are sensed (i.e., each of the $i$ channels is sensed by at least one SU). At most $M$ channels will be sensed since $N \ge M$. The reason for this division is that in some combinations, some channels may not have any SU assigned for sensing. Here we consider a general case in which the channels have different belief values. We order the channels by belief value in descending order, and in $G_i$ the first $i$ channels in this ordering are selected for sensing. Define $\mathbf{r} \triangleq \{r_1, \dots, r_i\}$, where $r_1$ denotes the actual index of the first channel in the ordering (i.e., the channel with the largest belief value), $r_2$ denotes the actual index of the second one, and so on. Denote by $C_{i,l} = \{a_{i,l}^j\}$ the $l$-th combination in group $G_i$, where $l = 1, \dots, |G_i|$, and let $a_{i,l}^j$ denote the number of SUs assigned to sense channel $r_j$ in combination $C_{i,l}$ of group $G_i$, $j = 1, \dots, i$. Further assume $a_{i,l}^j \ge a_{i,l}^{j'}$ for $j < j'$. The reason is that, since the channels are ordered by belief value, the case $a_{i,l}^j \ge a_{i,l}^{j'}$ produces a value of (7) greater than or equal to that produced by the case $a_{i,l}^j < a_{i,l}^{j'}$, for $j < j'$. The same logic applies when the belief values are equal. Note that this assumption already excludes some non-optimal combinations. Clearly, $a_{i,l}^j \ge 1$ and $\sum_j a_{i,l}^j = M$, $\forall G_i, C_{i,l}$.
Definition 2: Different combinations may produce different values of (7). Let the operator $\succ$ denote that a combination $C_{i,l}$ is larger than $C_{i',l'}$, $\forall i, i', l, l'$; that is, $C_{i,l} \succ C_{i',l'}$ means that $C_{i,l}$ produces a larger value of (7) than $C_{i',l'}$. Similarly, we define $\succeq$ and $\preceq$ for the relationships “larger than or equal to” and “smaller than or equal to”, respectively.
We first investigate which combination is the largest within each group. Consider group $G_i$. If $a_{i,l}^j$ could take continuous values, the combination $C_{i,l}$ with $a_{i,l}^j = \frac{M}{i}$, $\forall j$, would clearly be the largest one, because
$$\tilde{P}_f\left(\frac{M}{i}\right) \le \frac{1}{i}\tilde{P}_f(a_{i,l}^1) + \frac{1}{i}\tilde{P}_f(a_{i,l}^2) + \dots + \frac{1}{i}\tilde{P}_f(a_{i,l}^i), \quad \forall C_{i,l}, \qquad (13)$$
which follows from the convexity of $\tilde{P}_f(m)$ (Jensen's inequality, since $\sum_j a_{i,l}^j = M$). However, in (P2) only integer values are allowed for $a_{i,l}^j$. In this case, we resort to the theory of Discrete Convex Analysis.
The theory of discrete convex analysis was introduced by Murota [6]; it incorporates discrete settings and the concepts of combinatorial optimization into the framework of convex analysis. A comprehensive survey can be found in [7]. Simply speaking, in our problem the actual probability of false alarm $P_f(m)$, $m \in [1, +\infty) \cap \mathbb{Z}$, is a discrete convex function, since $\tilde{P}_f(m)$ is convex and $P_f(m) = \tilde{P}_f(m)$ for all $m \in [1, +\infty) \cap \mathbb{Z}$. In other words, $P_f(m)$ takes the values of $\tilde{P}_f(m)$ at the integer points of its domain.
Murota also introduced the concepts of L-convex functions and M-convex functions. We briefly introduce these concepts for the scalar case. Consider a function $f(x)$ where $x$ is a scalar. If $f(x)$ is an L-convex function, then $f(a) + f(b) \ge f(\lceil \frac{a+b}{2} \rceil) + f(\lfloor \frac{a+b}{2} \rfloor)$, $\forall a, b \in \mathbb{Z}$. This property is referred to as discrete midpoint convexity. If $f(x)$ is an M-convex function, then $f(a) + f(b) \ge f(a+c) + f(b-c)$, $\forall a \le b$, $a + c \le b - c$, with $a, b, c \in \mathbb{Z}$ and $c \ge 0$. This property is referred to as equidistance convexity. Since we have only a scalar variable, we can easily establish the following lemma:
Lemma 3: $P_f(m)$ is both an L-convex and an M-convex function.
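Lemma 3 rests on the fact that, for a function of a single integer variable, both discrete midpoint convexity and equidistance convexity reduce to nonnegative second differences. The hypothetical Python helpers below test the two properties on an integer range by brute force; they can be applied to tabulated values of $P_f(m)$, and the quadratic examples only illustrate a passing and a failing case.

```python
def midpoint_convex(f, lo, hi, tol=1e-12):
    """Check f(a) + f(b) >= f(ceil((a+b)/2)) + f(floor((a+b)/2)) for all lo <= a <= b <= hi."""
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):
            s = a + b
            if f(a) + f(b) < f(-(-s // 2)) + f(s // 2) - tol:   # -(-s // 2) is ceil(s / 2)
                return False
    return True


def equidistance_convex(f, lo, hi, tol=1e-12):
    """Check f(a) + f(b) >= f(a+c) + f(b-c) for lo <= a <= b <= hi and integers c >= 0 with a+c <= b-c."""
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):
            for c in range((b - a) // 2 + 1):
                if f(a) + f(b) < f(a + c) + f(b - c) - tol:
                    return False
    return True


print(midpoint_convex(lambda x: x * x, -5, 5), equidistance_convex(lambda x: x * x, -5, 5))
print(midpoint_convex(lambda x: -x * x, 0, 5), equidistance_convex(lambda x: -x * x, 0, 5))
```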
Now we are ready to show which combination is the largest in each group. (13) suggests that the largest combination should be the one that assigns the SUs equally among the channels, which leads to the largest value of the objective function. When $\frac{M}{i}$ is an integer, the same conclusion holds for $P_f(m)$. By also considering the case in which $\frac{M}{i}$ is not an integer, we obtain the following proposition:
Proposition 2: The largest combination $C_{i,\max}$ in group $G_i$ is given by
$$a_{i,\max}^j = \left\lceil \frac{M}{i} \right\rceil \ \text{or} \ a_{i,\max}^j = \left\lfloor \frac{M}{i} \right\rfloor, \qquad \sum_j a_{i,\max}^j = M, \quad \forall i, j. \qquad (14)$$
The largest combination has the following property:
$$\sum_j P_f(a_{i,\max}^j) < \sum_j P_f(a_{i,l}^j), \qquad (15)$$
where $\{a_{i,l}^j\}$ ranges over the combinations other than the largest combination $C_{i,\max}$.
Proof: It is easy to see that (14) has a unique solution under the constraint $\sum_j a_{i,\max}^j = M$. Without loss of generality, consider a combination $C_{i,l}$ in group $G_i$. By recursively using the properties of discrete midpoint convexity and equidistance convexity from Lemma 3, we have the following procedure. 1) For $a_{i,l}^1$ and $a_{i,l}^i$, since
$$P_f(a_{i,l}^1) + P_f(a_{i,l}^i) \ge P_f\left(\left\lceil \frac{a_{i,l}^1 + a_{i,l}^i}{2} \right\rceil\right) + P_f\left(\left\lfloor \frac{a_{i,l}^1 + a_{i,l}^i}{2} \right\rfloor\right),$$
we replace $a_{i,l}^1$ and $a_{i,l}^i$ with $\lceil \frac{a_{i,l}^1 + a_{i,l}^i}{2} \rceil$ and $\lfloor \frac{a_{i,l}^1 + a_{i,l}^i}{2} \rfloor$ and obtain a new combination $C_{i,l_1}$. Clearly, $C_{i,l_1} \succeq C_{i,l}$. 2) Re-arrange the order within $C_{i,l_1}$ to meet the ordering requirement on $a_{i,l}^j$ in Definition 1, then repeat the operation in 1) to obtain a larger combination $C_{i,l_2}$. 3) The idea behind the previous two steps is that we always take the largest and smallest $a_{i,l}^j$ in a combination and replace them with their (rounded) average. By doing so, we obtain a larger (or equal) combination due to Lemma 3. After performing the operations in 1) and 2) several times, all $a_{i,l}^j$ equal the values given in (14). The largest combination in $G_i$ is then obtained, and (15) follows.
Proposition 2 reveals a result similar to (13): within each group, i.e., when exactly $i$ channels are to be sensed, one should distribute the SUs among the channels as equally as possible in order to minimize the sum of the probabilities of false alarm. This result resembles the water-filling property [18] to some extent, where the same amount of power is allocated to channels that are homogeneous in noise. Here, for homogeneous channels, the SUs are assigned equally for sensing.
Having found the largest combination in each group, the last step towards the optimal solution to (P2) is to find the largest one among these $M$ combinations $C_{i,\max}$, $i = 1, \ldots, M$. The optimal solution and its related properties are given in the following theorem.
Theorem 1: (i) The optimal solution of (P2) is $C_{1,\max}$, i.e., all $M$ SUs sense one channel cooperatively, if the condition
$$\text{(C0):}\quad P_f\left(\left\lceil \frac{M}{2} \right\rceil\right) + P_f\left(\left\lfloor \frac{M}{2} \right\rfloor\right) - P_f(M) - 1 \ge 0 \tag{16}$$
holds; this condition is both necessary and sufficient.
(ii) If condition (C0) holds for $M'$, then condition (C0) also holds for all $M < M'$.
(iii) If condition (C0) does not hold for $M'$, then condition (C0) does not hold for any $M > M'$. In other words, the BS will never assign all SUs to cooperatively sense one channel if the network has $M'$ or more SUs.
Proof: We first assume that (i) and (ii) hold for $M - 1$. We begin with the proof of sufficiency. Consider two consecutive groups $G_i$ and $G_{i'}$, where $i' = i + 1$ and $i \ge 2$. Consider $G_i$. From Lemma 3 and Proposition 2, some manipulations can be performed on $C_{i,\max}$ to obtain
$$C_{i,\max} \succcurlyeq C_{i,l'} = \left\{a^1_{i',\max}, a^2_{i',\max}, \ldots, a^{i'-2}_{i',\max}, \left(a^{i'-1}_{i',\max} + a^{i'}_{i',\max}\right)\right\}.$$
Note that $(a^{i'-1}_{i',\max} + a^{i'}_{i',\max})$ is larger than $a^{i'-2}_{i',\max}$; however, we do not restore the ordering required by Definition 1, because the present order makes the following relationship easier to show. Meanwhile, $C_{i,l'}$ above is still a valid combination in $G_i$, so this causes no problem. Since (14) reveals that $a^j_{i,\max} - a^{j+1}_{i,\max} \le 1$, it follows that the two entries $a^{i'-1}_{i',\max}$ and $a^{i'}_{i',\max}$ are $\lceil s/2 \rceil$ and $\lfloor s/2 \rfloor$, where $s = a^{i'-1}_{i',\max} + a^{i'}_{i',\max}$. In this case, since $s < M$ and (C0) is assumed to hold for $M$, the assumption that (ii) holds gives
$$P_f(a^{i'-1}_{i',\max}) + P_f(a^{i'}_{i',\max}) - P_f(a^{i'-1}_{i',\max} + a^{i'}_{i',\max}) - 1 \ge 0,$$
which implies $C_{i,l'} \succcurlyeq C_{i',\max}$. Since $C_{i,\max} \succcurlyeq C_{i,l'}$, this result builds a bridge between two consecutive groups and reveals that the largest combination among $C_{i,\max}$, $i \ge 2$, is $C_{2,\max}$. Then, for $M$, if the inequality $P_f(\lceil M/2 \rceil) + P_f(\lfloor M/2 \rfloor) - P_f(M) - 1 \ge 0$ holds, then $C_{1,\max} \succcurlyeq C_{2,\max}$. This inequality is exactly condition (C0) in (i).
We now prove the necessity. In the results above, we have shown that $C_{2,\max}$ is the largest combination excluding $C_{1,\max}$. Therefore, for $C_{1,\max}$ to be the optimal solution, the only requirement is $C_{1,\max} \succcurlyeq C_{2,\max}$, which is equivalent to condition (C0).
Now we prove (iii). Assume that condition (C0) does not hold for $M$, i.e., $P_f(\lceil M/2 \rceil) + P_f(\lfloor M/2 \rfloor) - P_f(M) - 1 < 0$. Then consider the left-hand side in the case of $M + 1$. Since $\lceil M/2 \rceil = \lfloor (M+1)/2 \rfloor$ and $\lfloor M/2 \rfloor + 1 = \lceil (M+1)/2 \rceil$ can be easily verified, after some manipulation we arrive at
$$\left[P_f\left(\left\lceil \tfrac{M}{2} \right\rceil\right) + P_f\left(\left\lfloor \tfrac{M}{2} \right\rfloor\right) - P_f(M) - 1\right] - \left[P_f\left(\left\lceil \tfrac{M+1}{2} \right\rceil\right) + P_f\left(\left\lfloor \tfrac{M+1}{2} \right\rfloor\right) - P_f(M+1) - 1\right] = P_f\left(\left\lfloor \tfrac{M}{2} \right\rfloor\right) - P_f\left(\left\lceil \tfrac{M+1}{2} \right\rceil\right) - \left[P_f(M) - P_f(M+1)\right],$$
which is larger than 0 by the decreasing and convex property of $\tilde P_f(m)$: since $\lceil (M+1)/2 \rceil = \lfloor M/2 \rfloor + 1$, the first difference is a one-step decrement of $P_f$ at $\lfloor M/2 \rfloor < M$, which exceeds the one-step decrement $P_f(M) - P_f(M+1)$. This result implies that $P_f(\lceil (M+1)/2 \rceil) + P_f(\lfloor (M+1)/2 \rfloor) - P_f(M+1) - 1 < 0$, which means $C_{1,\max} \preccurlyeq C_{2,\max}$ for $M + 1$. In this case, assigning all SUs to cooperatively sense one channel is never the best action, which completes the proof of (iii). Part (ii) is the contrapositive of (iii) and is therefore proved as well. This completes the proof of Theorem 1.
Theorem 1 gives the condition under which the BS assigns all SUs to cooperatively sense one channel in order to achieve the largest objective value, given the number of SUs $M$. The structural results show that, under condition (C0), sensing fewer channels to gain higher sensing accuracy is better in terms of the objective function in (P2). In addition, (ii) and (iii) reveal that, for given system parameters, there exists a threshold value of $M$ that determines whether the combination $C_{1,\max}$ is the optimal solution. Now we turn back to (P1). After some manipulation, we have
πΌΛœπ‘‘ (b(𝑑 βˆ’ 1))
βˆ‘
{
}
=
(1 βˆ’ 𝑃𝑓 (π‘Žπ‘› (𝑑)))(𝐿 βˆ’ πœ‚ βˆ’ 𝜏˜)(𝐡 βˆ’ 𝑒𝑑 )π’₯2
{𝑛:π‘Žπ‘› (𝑑)>0}
+
βˆ‘
{𝑛:π‘Žπ‘› (𝑑)>0}
{
}
(1 βˆ’ 𝑃¯π‘‘ )(𝐿 βˆ’ πœ‚ βˆ’ 𝜏˜)(βˆ’π‘’π‘‘ βˆ’ 𝑒𝑀 )π’₯2 βˆ’ πœΛœπ‘ ,
7
where π’₯2 = 𝑝10 + 𝑏𝑛0 (𝑑 βˆ’ 1)(𝑝00 βˆ’ 𝑝10 ). It is obvious that
πΌΛœπ‘‘ (b(𝑑 βˆ’ 1))∣a(𝑑)=𝐢1,max β‰₯ πΌΛœπ‘‘ (b(𝑑 βˆ’ 1))∣a(𝑑)=𝐢𝑖,𝑙 , for all 𝑖, 𝑙,
since the second term is negative. As a result, the optimal
solution obtained for (P2) is also the optimal one for (P1).
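To make the expression above concrete, the following Python sketch evaluates $\tilde I_t$ term by term for a given assignment vector $\mathbf a(t)$ and idle-belief vector, with the sensing time $\tilde\tau$ held fixed. The frame length, overhead, $B$, $e_t$, $\bar P_d$, $p_{00}$ and $p_{10}$ follow the values used in Section VII; $e_w$, the sensing-cost rate $c$ and the curve `pf` (a stand-in for $P_f(m)$ at the fixed $\tilde\tau$) are hypothetical placeholders. Enumerating all assignments of three SUs to three equally believed channels shows $C_{1,\max}$ attaining the largest value for these parameters.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence


@dataclass
class Params:
    L: float = 100.0      # frame length in ms (Section VII)
    eta: float = 10.0     # scheduling and fusion overhead, eta = 0.1 L
    B: float = 10.0       # reward parameter B
    e_t: float = 1.0      # reward parameter e_t
    e_w: float = 5.0      # collision punishment (hypothetical value)
    c: float = 0.1        # sensing cost per ms (hypothetical value)
    Pd_bar: float = 0.9   # target detection probability
    p00: float = 0.8
    p10: float = 0.7


def pf(m: int) -> float:
    """Hypothetical P_f(m) at the fixed sensing time; decreasing and convex."""
    return 0.9 * 0.8 ** (m - 1)


def immediate_reward(a: Sequence[int], b0: Sequence[float], tau: float, prm: Params) -> float:
    """Evaluate I~_t(b(t-1)) for assignment a (SUs per channel) and idle beliefs b0."""
    total = 0.0
    tx = prm.L - prm.eta - tau                        # transmission duration
    for a_n, b_n in zip(a, b0):
        if a_n == 0:                                  # unsensed channels contribute nothing
            continue
        j2 = prm.p10 + b_n * (prm.p00 - prm.p10)      # J_2 for channel n
        total += (1 - pf(a_n)) * tx * (prm.B - prm.e_t) * j2
        total += (1 - prm.Pd_bar) * tx * (-prm.e_t - prm.e_w) * j2
    return total - tau * prm.c


prm = Params()
beliefs = [0.5, 0.5, 0.5]
candidates = [a for a in product(range(4), repeat=3) if sum(a) == 3]
best = max(candidates, key=lambda a: immediate_reward(a, beliefs, 5.0, prm))
print(best)  # one of the assignments putting all three SUs on a single channel
```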
During the derivation of condition (C0), we analytically show how to obtain the second largest combination among all possible combinations, and then compare it with the desired combination (i.e., $C_{1,\max}$) to obtain the final condition. It can be seen from the derivation that when the channels have different belief values and are ordered accordingly in descending order, (C0) is easier to satisfy. Hence, the case in which all channels have the same belief value is the most stringent one. By utilizing the intrinsic properties of our problem and Lemma 3, the optimal solution of a combinatorial optimization problem is obtained analytically. Another significance of the result is that, given the number of SUs $M$, one can immediately determine whether all efforts should be put into sensing one channel (i.e., whether all SUs should be assigned to it), that is, whether $C_{1,\max}$ is the optimal solution, simply by testing (C0).
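Testing (C0) is a one-line computation once $P_f(m)$ is available. The Python sketch below evaluates the left-hand side of (16), scans $M$ upwards to find the threshold implied by Theorem 1(ii) and (iii), and cross-checks (i) by comparing the groups' largest combinations from (14). It assumes, for illustration, that with equal beliefs the objective of (P2) for a combination is proportional to $\sum_j (1 - P_f(a^j))$ (the form under which (C0) is equivalent to $C_{1,\max} \succcurlyeq C_{2,\max}$), that at least $M$ channels are available, and it again uses the hypothetical curve `pf(m) = 0.9 * 0.8**(m - 1)`.

```python
from typing import Callable


def pf(m: int) -> float:
    """Hypothetical stand-in for P_f(m): decreasing and strictly convex in m."""
    return 0.9 * 0.8 ** (m - 1)


def c0_margin(M: int, pf: Callable[[int], float] = pf) -> float:
    """Left-hand side of (16): P_f(ceil(M/2)) + P_f(floor(M/2)) - P_f(M) - 1."""
    return pf((M + 1) // 2) + pf(M // 2) - pf(M) - 1


def c0_holds(M: int, pf: Callable[[int], float] = pf) -> bool:
    # With a single SU only group G_1 exists, so C_{1,max} is trivially optimal.
    return M == 1 or c0_margin(M, pf) >= 0


def c0_threshold(M_max: int, pf: Callable[[int], float] = pf) -> int:
    """Largest M <= M_max satisfying (C0); by Theorem 1(ii)-(iii), (C0) holds exactly up to it."""
    threshold = 1
    for M in range(2, M_max + 1):
        if not c0_holds(M, pf):
            break
        threshold = M
    return threshold


def best_group(M: int, pf: Callable[[int], float] = pf) -> int:
    """Index i of the best C_{i,max}, scoring a combination by sum_j (1 - P_f(a^j))."""
    def score(i: int) -> float:
        q, r = divmod(M, i)
        comb = [q + 1] * r + [q] * (i - r)            # C_{i,max} from (14)
        return sum(1 - pf(a) for a in comb)
    return max(range(1, M + 1), key=score)


print(c0_threshold(20))  # 3
```

For this particular curve, all SUs should sense one channel only when $M \le 3$; with a real $P_f(m)$ the threshold depends on the SNR, the sensing time and $\bar P_d$.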
Although only the case in which (C0) is satisfied is considered here, our framework can be extended to the more general case, i.e., when $M$ is large and (C0) is violated. An interesting observation from our framework is that when $M$ becomes large, more channels should be sensed. The reason is that, in this situation, exploiting more spectrum opportunities is more beneficial, which is precisely the tradeoff underlying our problem. Deriving the optimal solution for the general case is left for future work.
Specifically, we would like to point out that although we assume $\tilde P_f(m)$ is convex, our contribution is not compromised even if $\tilde P_f(m)$ is not convex on the whole domain of $m$. As mentioned before, a concave section can only occur at the beginning of the domain, because $\tilde P_f(m)$ is decreasing and bounded below by zero. By working out the range where $\tilde P_f(m)$ is concave, we can follow the same procedure as in the derivation of the largest combination and the proof of Theorem 1, and obtain the new combination ordering and the optimal solution corresponding to the new property of $\tilde P_f(m)$. In the following sections, we study the case in which $M$ satisfies condition (C0), and assume that it holds for all $\tilde\tau$ within the range $[0, L - \eta]$.
V. STRUCTURE OF THE OPTIMAL POLICY
A. Optimal Cooperative Sensing Scheduling
It is natural to ask whether the myopic CSS is also optimal under the same condition. Our extensive simulation results confirm that, when condition (C0) is satisfied, the optimal CSS is to assign all SUs to sense one channel. However, it turns out to be difficult to prove this result analytically, since the imperfect spectrum sensor complicates the belief vector update. Moreover, compared to [8], we have one more observation, which means that more information needs to be handled and the problem studied in this paper is more challenging.
Although it is difficult to extend the optimality of the myopic CSS to the general case, we now prove a simple but nontrivial case with two SUs and two channels and a fixed sensing duration. The reward parameters $e_t$, $e_w$ and $c$ are set to zero for simplicity of expression; in fact, general values of these parameters can be easily incorporated [11]. Similarly to [8], we assume $P_f(m) \le \frac{p_{01} p_{10}}{p_{00} p_{11}}$, $\forall m$.
Theorem 2: Consider the network with 2 SUs and 2 channels. If condition (C0) holds, the optimal CSS at any slot $t$ is to assign all SUs to cooperatively sense the channel given by $\arg\max_n b_n^0(t-1)$.
Proof: First assume that we always assign all SUs to sense one channel. From the result in [8], the BS will choose the channel with the largest belief value, i.e., $\arg\max_n b_n^0(t-1)$; without loss of generality, let this be channel 1. Now we need to compare the expected value obtained under the following two cases: (i) both SUs sense channel 1; (ii) each SU senses a different channel.
At the last time slot $t = T$, the optimal action is the myopic action. Suppose that, from slot $t + 1$ on, the BS assigns all SUs to cooperatively sense the channel with the largest belief value; we need to show that this also holds for slot $t$. As in [8], denote by $\hat V_t(\mathbf b(t-1); g)$ the expected total reward obtained by taking sensing action $g$ in slot $t$ followed by the myopic policy in future slots. Similarly, denote by $I_t(\mathbf b(t-1); g)$ the expected immediate reward obtained by taking sensing action $g$ in slot $t$. Further, define $g = 1$ as the action in which both SUs sense channel 1 and $g = 2$ as the action in which each SU senses one channel. To show the optimality of the myopic policy, we need to show that $\hat V_t(\mathbf b(t-1); g=1) - \hat V_t(\mathbf b(t-1); g=2) \ge 0$.
For both actions, the observation and the system state at $t$ determine the channel selected in $t + 1$. Therefore, arguments similar to those in the proof of Theorem 2 in [8] can be applied. It follows that
$$\begin{aligned} &\hat V_t(\mathbf b(t-1); g=1) - \hat V_t(\mathbf b(t-1); g=2) \\ &= \left[I_t(\mathbf b(t-1); g=1) - I_t(\mathbf b(t-1); g=2)\right] + \Lambda \\ &\quad + \left[b_1^0(t-1)\left(1 - b_2^0(t-1)\right)P_f(2) - \left(1 - b_1^0(t-1)\right) b_2^0(t-1) P_d P_f(1)\right]\Delta\left(p_{00}p_{11} - p_{10}p_{01}\right), \end{aligned} \tag{17}$$
where $\Lambda$ is a positive term and $\Delta = \hat V_t(1 \mid [0, 1]) - \hat V_t(1 \mid [1, 0])$. Here $\hat V_t(g \mid [s_1, s_2])$ denotes the expected total reward starting from slot $t$ under action $g$ and system state $\mathbf s(t-1) = [s_1, s_2]$ in slot $t - 1$, as in [8]. The first term on the right-hand side of (17) is nonnegative because condition (C0) holds. Denote the last term by $\mathcal J_3$. After some manipulation [19], we have $\mathcal J_3 \ge p_{10} p_{01} P_f(2) P_d P_f(1) > 0$, which leads to $\hat V_t(\mathbf b(t-1); g=1) - \hat V_t(\mathbf b(t-1); g=2) > 0$.
Theorem 2 reveals that, in this simplified model with two SUs and two channels, the myopic CSS is optimal, which greatly simplifies the procedure of finding the optimal policy. This conclusion provides nontrivial insight into the problem at hand. However, showing the optimality of the myopic sensing scheduling for general $M$ and $N$ is highly challenging and is left for future work. In the following sections, we focus on the case in which the optimal CSS assigns all SUs to sense one channel.
B. Structure of the Optimal Sensing Time
The myopic policy consists of two parts: the CSS and the sensing time $\tilde\tau^*(t)$, where the latter is given by $\tilde\tau^*(t) = \arg\max_{\tau(t)} \tilde I_t(\mathbf b(t-1))$, the solution of a static optimization problem. Using a method similar to that in [15], it can be shown that there is only one maximizer $\tau(t)$ within the range $[0, L - \eta]$. The myopic solution can therefore be obtained with many popular, efficient search algorithms of low complexity. Although the optimal sensing time is very difficult to obtain, some insight into its structure is given by the following proposition.
Proposition 3: (i) The optimal sensing time $\tau^*(t)$ that maximizes the total expected reward is the same as the myopic sensing time $\tilde\tau^*(t)$ that maximizes the immediate expected reward, if $p_{00} p_{11} = p_{10} p_{01}$. (ii) The optimal sensing time $\tau^*(t)$ is no smaller than $\tilde\tau^*(t)$, $\forall t$.
Proof: (i): Since $p_{01} = 1 - p_{00}$ and $p_{11} = 1 - p_{10}$, the condition $p_{00} p_{11} = p_{10} p_{01}$ implies $p_{00} = p_{10}$ and $p_{01} = p_{11}$. All the belief values after the state transition are then equal to the same value $p_{10}$, in which case the expected future reward in any $t$ is not influenced by the probability of false alarm, which is related to the sensing duration. Since $\tilde\tau^*(t)$ maximizes the immediate reward, the optimal sensing time $\tau^*(t)$ is equal to $\tilde\tau^*(t)$.
Sketch of the Proof of (ii): Consider $\tau(t) \le \tilde\tau^*(t)$. It can be proved that the expected future reward obtained under $\tilde\tau^*(t)$ is larger than that under $\tau(t)$. Since $\tilde\tau^*(t)$ also maximizes the immediate reward, it maximizes the total expected remaining reward among all $\tau(t) \le \tilde\tau^*(t)$. Therefore, only a $\tau(t) > \tilde\tau^*(t)$ could result in a larger total expected remaining reward than $\tilde\tau^*(t)$. Please refer to [19] for details.
Here we arrive at a conclusion similar to that in [10]. Although the myopic sensing time does not always coincide with the optimal sensing time, the analysis reveals useful insights into the structure of the optimal sensing time. Moreover, the myopic sensing time offers a good tradeoff between computational complexity and optimality, with acceptable performance [4], [10].
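Because $\tilde I_t$ has a single maximizer in $\tau(t)$ on $[0, L - \eta]$, the myopic sensing time can be found with a standard one-dimensional search. The Python sketch below uses golden-section search, which is one possible choice among the low-complexity algorithms mentioned above, not necessarily the one used by the authors. It assumes that all $M = 8$ SUs sense one channel and, as a stand-in for the paper's cooperative false-alarm model, uses the single-user energy-detection formula of [3] with $M\tau f_s$ samples; this cooperative extension and the values of $e_w$, $c$ and the belief $b$ are assumptions for illustration. The remaining values follow Section VII, and times are in ms.

```python
import math
from statistics import NormalDist
from typing import Callable


def q_func(x: float) -> float:
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pf(m: int, tau_ms: float, snr_db: float = -14.0, fs_hz: float = 4e6, pd_bar: float = 0.9) -> float:
    """Assumed cooperative false-alarm probability: formula of [3] with m * tau * fs samples."""
    gamma = 10 ** (snr_db / 10)
    n_samples = m * tau_ms * 1e-3 * fs_hz
    q_inv_pd = NormalDist().inv_cdf(1 - pd_bar)       # Q^{-1}(pd_bar)
    return q_func(math.sqrt(2 * gamma + 1) * q_inv_pd + math.sqrt(n_samples) * gamma)


def immediate_reward(tau: float, M: int = 8, L: float = 100.0, eta: float = 10.0,
                     B: float = 10.0, e_t: float = 1.0, e_w: float = 30.0, c: float = 1.0,
                     pd_bar: float = 0.9, b: float = 1.0, p00: float = 0.8, p10: float = 0.7) -> float:
    """I~_t with all M SUs on one channel of idle belief b (hypothetical e_w, c, b)."""
    j2 = p10 + b * (p00 - p10)
    tx = L - eta - tau
    return ((1 - pf(M, tau, pd_bar=pd_bar)) * tx * (B - e_t) * j2
            + (1 - pd_bar) * tx * (-e_t - e_w) * j2
            - tau * c)


def golden_section_max(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6) -> float:
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 >= f2:          # the maximizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:                 # the maximizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2


tau_star = golden_section_max(immediate_reward, 0.0, 100.0 - 10.0)
print(f"myopic sensing time ~ {tau_star:.3f} ms")
```

Golden-section search reuses one function evaluation per iteration and shrinks the bracket by a constant factor, which suits a unimodal objective whose derivative is awkward to write down.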
VI. ENERGY-EFFICIENT REWARD PARAMETER DESIGN
Motivated by the findings in [11], we introduce the punishment $e_w$ for collisions. We are interested in whether this punishment can improve the energy efficiency of CRNs compared to the traditional design without the punishment. Although the myopic policy does not always preserve optimality, its performance still suffices to reflect the impact of the punishment parameter on the energy efficiency. Since our problem shares several properties with that in [8], we use the method of analyzing the steady-state reward in [8]; the only difference is that our reward function depends on the sensing time $\tau(t)$. Please refer to [19] for the detailed derivations in this section.
A. Analytical Expression of the Throughput
The concept of the Transmission Period in [8] is applied here: when $p_{00} \ge p_{10}$, the event of a channel switch is equivalent to a slot without positive reward. In our model the situation is more complicated, and such a slot may correspond to: (i) no ACK is received, and a negative reward is obtained; or (ii) the sensing outcome is busy and thus no transmission is performed, so only the energy for sensing is consumed. Let $\mathcal L_k$ denote the length of the $k$-th transmission period.
1) Average Successful Transmission Amount: We define the average successful transmission amount $\mathcal H_B$ as the ratio between the overall length of the successful transmission duration and the overall length of the transmission duration (including the successful, failed and silent ones). Let $\bar{\mathcal L} = \lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} \mathcal L_k$ denote the average length of a transmission period. The average successful transmission amount is given by
$$\mathcal H_B = 1 - \frac{L - \eta - \hat\tau(1)}{\mathcal D},$$
where
$$\mathcal D = (L - \eta - \hat\tau(1))\bar{\mathcal L} + \hat\tau(1) - \lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} \hat\tau(b_{1,k})$$
denotes the average length of the transmission duration of a transmission period, $b_{i,k}$ denotes the belief value for the $i$-th slot in the $k$-th transmission period, and $\hat\tau(b_{i,k})$ denotes the optimal sensing time given the corresponding belief value $b_{i,k}$. Moreover, the upper bound of $\mathcal H_B$ is given by
$$\mathcal H_B \le 1 - \frac{L - \eta - \hat\tau(1)}{\mathcal D}, \quad \text{where } \mathcal D = (L - \eta - \hat\tau(1))\bar{\mathcal L} + \hat\tau(1) - \hat\tau(\bar b),$$
$\bar{\mathcal L} = 1 + \frac{\bar b}{1 - p_{00}(1 - P_f(M; \hat\tau(1)))}$, and $P_f(m; \hat\tau(b))$ denotes the probability of false alarm achieved by $m$ cooperating SUs and the sensing time $\hat\tau(b)$.
2) Average Collision Amount: The average collision amount is defined as the ratio of the overall length of the transmission duration that results in collision with the PUs to the overall length of the transmission duration. The upper bound of the average collision amount $\mathcal H_C$ is given by
$$\mathcal H_C = \frac{(L - \eta - \hat\tau(1))(1 - \bar P_d)}{\mathcal D}.$$
B. Energy Efficiency Criteria
We define the Successful transmission oVer Collision (SVC) criterion to measure the energy efficiency of a CRN; it reflects the ratio between meaningful energy consumption and energy waste. Denote the SVC criterion by $\mathcal E^{SVC}$, defined as the ratio between the overall successful transmission duration and the overall collision duration. The upper bound of $\mathcal E^{SVC}$ can be expressed as follows [19]:
$$\mathcal E^{SVC} = \frac{\mathcal D - (L - \eta - \hat\tau(1))}{(L - \eta - \hat\tau(1))(1 - \bar P_d)}. \tag{18}$$
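The bounds above are closed-form once $\hat\tau(1)$, $\hat\tau(\bar b)$, $\bar b$ and $P_f(M; \hat\tau(1))$ are known. The Python sketch below simply evaluates them and checks that (18) equals $\mathcal H_B / \mathcal H_C$ computed from the same $\mathcal D$. The numerical inputs are hypothetical placeholders, not results from the paper, and `svc_bounds` is an illustrative name.

```python
from typing import NamedTuple


class SvcBounds(NamedTuple):
    H_B: float    # upper bound on the average successful transmission amount
    H_C: float    # average collision amount
    E_svc: float  # SVC criterion, eq. (18)


def svc_bounds(L: float, eta: float, tau_1: float, tau_bbar: float, b_bar: float,
               p00: float, pf_M_tau1: float, pd_bar: float) -> SvcBounds:
    """Evaluate the Section VI expressions for given sensing times and parameters.

    tau_1 = tau_hat(1), tau_bbar = tau_hat(b_bar), pf_M_tau1 = P_f(M; tau_hat(1))."""
    tx = L - eta - tau_1                                   # L - eta - tau_hat(1)
    L_bar = 1 + b_bar / (1 - p00 * (1 - pf_M_tau1))        # average transmission-period length
    D = tx * L_bar + tau_1 - tau_bbar
    H_B = 1 - tx / D
    H_C = tx * (1 - pd_bar) / D
    E_svc = (D - tx) / (tx * (1 - pd_bar))
    return SvcBounds(H_B, H_C, E_svc)


res = svc_bounds(L=100.0, eta=10.0, tau_1=0.4, tau_bbar=0.6, b_bar=0.75,
                 p00=0.8, pf_M_tau1=1e-3, pd_bar=0.9)
print(res)
```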
VII. NUMERICAL RESULTS
In this section, simulation results are presented to show the impact of the punishment parameter $e_w$ on the energy efficiency of the CRN. We consider $N = 10$ primary channels and $M = 8$ SUs. The frame duration is chosen to be $L = 100$ ms, and the duration for sensing scheduling and result fusion is $\eta = 0.1L$. The signal model is the same as in [3], where the low-SNR region is considered and the sampling rate is $f_s = 4$ MHz. The required detection probability is set to $\bar P_d = 0.9$, and the parameters related to the reward function are set to $B = 10$, $e_t = 1$ and $c = 10$.
Fig. 1. Performance of energy efficiency by introducing punishment $e_w$: (a) simulation vs. upper bound (energy efficiency vs. step; curves: Upper Bound, Simulation); (b) tuning $e_w$ vs. tuning $B$ (energy efficiency vs. $\gamma$; curves: Change ew, Change B).
In Fig. 1(a), we compare the upper bound on the efficiency criterion $\mathcal E^{SVC}$ derived above with the value achieved in simulation, under the transition probabilities $p_{00} = 0.8$, $p_{10} = 0.7$ and SNR $\gamma = -14$ dB. We vary the punishment parameter $e_w$ from 30 to 550 with a step size of 40. Note that, in practice, the sensing duration cannot use up the whole frame; otherwise, no transmission would ever be performed, even when the sensing result is idle. Hence, $e_w$ cannot be too large, so that this situation is avoided for all belief values. It can be seen from the figure that the difference between the upper bound and the simulation results is small (within 1%). Moreover, the optimal value of $e_w$ is achieved at the boundary of the range that keeps the model practical. Fig. 1(a) confirms that, by appropriately choosing the value of $e_w$, better energy efficiency can be achieved.
We consider two different schemes in Fig. 1(b): 1) adjusting the punishment $e_w$, which is introduced in our work; 2) adjusting the reward for successful transmission $B$ while setting $e_w = 0$, which is the traditional design. The maximum values of $\mathcal E^{SVC}$ of the two schemes are compared under different SNR values ranging from $-24$ dB to $-8$ dB. Fig. 1(b) shows that the first scheme leads to better energy efficiency than the second one, and that the improvement is large in the low-SNR region and small in the high-SNR region. The reason is that by tuning $e_w$, the BS achieves better sensing performance and makes cautious transmission decisions to avoid collisions and perform more successful transmissions, whereas by tuning $B$, collisions are not taken into account. In the low-SNR region, the sensing performance is poor; as a result, even if the BS tries to gain higher throughput by tuning $B$, the throughput is limited by the poor sensing performance and the energy efficiency is low. When the SNR is high, only a very short sensing duration is needed to maximize the corresponding value function, and the sensing performance of the two schemes becomes similar. Fig. 1(b) reveals that, by introducing the punishment parameter, higher energy efficiency can be obtained compared to the traditional design.
VIII. CONCLUSION
In this paper, several problems related to energy-efficient CRN design are studied in the framework of POMDP. For the myopic CSS problem, which is an NP-hard combinatorial optimization problem, we provide an analytical framework and obtain the solution analytically. The optimality of the myopic CSS is proved for the case of two SUs and two channels with a fixed sensing duration. We also study the structure of the optimal sensing duration and establish the condition for the optimality of the myopic sensing duration. Having derived an upper bound on the performance of the myopic policy, we show in simulations that, by appropriately selecting the punishment parameter, energy efficiency can be improved compared to the traditional design.
The CSS problem can be further explored in the future using our framework. In this paper, only the case of assigning all SUs to sense one channel is discussed. The optimal solution when $M$ increases and condition (C0) is no longer satisfied remains an open question. Another interesting direction is to investigate the shaping of the reward function: instead of the simple linear transformation adopted in this paper, more delicate reward functions that bring higher energy efficiency to CRNs may exist.
REFERENCES
[1] J. Palicot, “Cognitive Radio: An Enabling Technology for the Green Radio Communications Concept,” in Proc. International Conference on Wireless Communications and Mobile Computing, Jun. 2009.
[2] K. B. Letaief and W. Zhang, “Cooperative Spectrum Sensing,” in Cognitive Wireless Communication Networks, Springer, pp. 115-138, Oct. 2007.
[3] Y. C. Liang, Y. Zeng, E. C. Y. Peh, and A. T. Hoang, “Sensing-Throughput Tradeoff for Cognitive Radio Networks,” IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326-1337, Apr. 2008.
[4] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized Cognitive MAC for Opportunistic Spectrum Access in Ad Hoc Networks: A POMDP Framework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589-600, Apr. 2007.
[5] D. E. Dominici, “The Inverse of the Cumulative Standard Normal Probability Function,” Integral Transforms and Special Functions, vol. 14, no. 4, pp. 281-292, Aug. 2003.
[6] K. Murota, “Discrete Convex Analysis,” Mathematical Programming, Springer, vol. 83, no. 1-3, pp. 313-371, Jan. 1998.
[7] K. Murota, “Recent Developments in Discrete Convex Analysis,” Research Trends in Combinatorial Optimization, pp. 219-260, Nov. 2008.
[8] K. Liu, Q. Zhao, and B. Krishnamachari, “Dynamic Multichannel Access With Imperfect Channel State Detection,” IEEE Trans. Signal Process., vol. 58, no. 5, pp. 2795-2808, Apr. 2010.
[9] R. Smallwood and E. Sondik, “The optimal control of partially observable Markov processes over a finite horizon,” Operations Research, pp. 1071-1088, 1971.
[10] A. T. Hoang, Y. C. Liang, D. T. C. Wong, Y. Zeng, and R. Zhang, “Opportunistic Spectrum Access for Energy-constrained Cognitive Radios,” IEEE Trans. Wireless Commun., vol. 8, no. 3, pp. 1206-1211, Mar. 2009.
[11] A. Y. Ng, D. Harada, and S. Russell, “Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping,” in Proc. of the Sixteenth International Conference on Machine Learning, Jun. 1999.
[12] Y. Chen, Q. Zhao, and A. Swami, “Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors,” IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2053-2071, 2008.
[13] S. H. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, “Optimality of Myopic Sensing in Multi-Channel Opportunistic Access,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040-4050, 2009.
[14] Y. J. Choi, Y. Xin, and S. Rangarajan, “Overhead-Throughput Tradeoff in Cooperative Cognitive Radio Networks,” in Proc. of IEEE WCNC, pp. 1-6, Apr. 2009.
[15] E. C. Y. Peh, Y. C. Liang, Y. L. Guan, and Y. Zeng, “Optimization of Cooperative Sensing in Cognitive Radio Networks: A Sensing-Throughput Tradeoff View,” IEEE Trans. Vehicular Technology, vol. 58, no. 9, pp. 5294-5299, Nov. 2009.
[16] R. Fan and H. Jiang, “Optimal Multi-Channel Cooperative Sensing in Cognitive Radio Networks,” IEEE Trans. Wireless Commun., vol. 9, no. 3, pp. 1128-1138, Mar. 2010.
[17] C. Song and Q. Zhang, “Cooperative Spectrum Sensing with Multi-Channel Coordination in Cognitive Radio Networks,” in Proc. of IEEE ICC 2010, pp. 1-5, Jul. 2010.
[18] Y. Wu and D. H. K. Tsang, “Distributed Power Allocation Algorithm for Spectrum Sharing Cognitive Radio Networks with QoS Guarantee,” in Proc. of IEEE INFOCOM 2009, pp. 981-989, Jun. 2009.
[19] T. Zhang and D. H. K. Tsang, “Optimal Cooperative Sensing Scheduling for Energy-Efficient Cognitive Radio Networks,” Technical Report, Aug. 2010, http://eez058.ece.ust.hk/publication/OptScheduling.pdf