OVERHEAD IN COMMUNICATION SYSTEMS
AS THE COST OF CONSTRAINTS
A Dissertation
Submitted to the Graduate School
of the University of Notre Dame
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
by
Brian P. Dunn
J. Nicholas Laneman, Director
Graduate Program in Electrical Engineering
Notre Dame, Indiana
December 2010
© Copyright by
Brian Dunn
2010
All Rights Reserved
OVERHEAD IN COMMUNICATION SYSTEMS
AS THE COST OF CONSTRAINTS
Abstract
by
Brian P. Dunn
This dissertation develops a perspective for studying overhead in communication systems that contrasts with the traditional viewpoint that overhead is the “non-data” portion of transmissions. By viewing overhead as the cost of constraints
imposed on a system, information-theoretic techniques can be used to obtain fundamental limits on system performance. In principle, protocol overhead in practical implementations can then be benchmarked against these fundamental limits
in order to identify opportunities for improvement.
We examine three sources of overhead that have been studied in both information theory and networking using different models and metrics. For multi-access
communication systems, we compute constrained capacity regions for two binary
additive channels with feedback and develop inner and outer bounds on the capacity region of the packet collision channel with feedback that appear to be tight
numerically. We develop bounds on the protocol overhead required to meet an
average delay constraint and then use these bounds to characterize rate-delay
tradeoffs for communicating a bursty source over a noisy channel. Finally, we
study information-theoretic security in timing channels and show that non-zero
secrecy rates can be achieved over the wiretap timing channel using a deterministic
encoder.
CONTENTS

FIGURES

ACKNOWLEDGMENTS

CHAPTER 1: INTRODUCTION
  1.1 Overhead in Communication Systems
  1.2 Motivating Example
  1.3 Outline of the Dissertation

CHAPTER 2: BACKGROUND
  2.1 Protocol Overhead in Current Systems
    2.1.1 Protocol Overhead at Network and Host Layers
    2.1.2 Protocol Overhead at PHY and MAC Layers
    2.1.3 Non-Data Viewpoint on Protocol Overhead
  2.2 Related Work
    2.2.1 Contention & Multi-Access
    2.2.2 Channel Training & Estimation
    2.2.3 Control Information & Network State
  2.3 Motivation & Topics
    2.3.1 An Unconsummated Union
    2.3.2 Dissertation Topics
  2.4 Chapter Summary

CHAPTER 3: A PERSPECTIVE ON COMMUNICATION OVERHEAD
  3.1 Overhead Cost
  3.2 Illustrative Example
    3.2.1 Channel Coding with Unconstrained Inputs
    3.2.2 Channel Coding with Constrained Inputs

CHAPTER 4: MULTI-ACCESS COMMUNICATION
  4.1 Background
  4.2 System Model & Definitions
    4.2.1 General System Model
    4.2.2 Function and Variable Definitions
  4.3 Overhead in Multiple Access Channels
    4.3.1 MAC without Feedback
    4.3.2 MAC with Perfect Feedback
    4.3.3 MAC with Generalized Feedback
  4.4 Binary Additive Channels with Feedback
    4.4.1 Binary Additive MAC with Feedback
    4.4.2 Binary Additive Noisy MAC with Feedback
  4.5 Packet Collision Channel with Feedback
    4.5.1 Channel Model
    4.5.2 Capacity Region
    4.5.3 Distribution Constraints
  4.6 Summary

CHAPTER 5: DELAY CONSTRAINTS
  5.1 Rate-Distortion Preliminaries
    5.1.1 System Model
  5.2 Overhead for a Discrete-Time Bursty Source
    5.2.1 System Model
    5.2.2 Protocol Overhead for Slotted Arrivals
    5.2.3 Bernoulli and Poisson Comparisons
  5.3 Bursty Sources over Noisy Channels
    5.3.1 System Model
    5.3.2 Outer Bounds on Rate-Delay Tradeoff
    5.3.3 Inner Bounds on Rate-Delay Tradeoff
    5.3.4 Stability Region

CHAPTER 6: SECRECY IN TIMING CHANNELS
  6.1 Motivation
  6.2 System Model
    6.2.1 Channel Model
    6.2.2 Information Spectrum Methods
    6.2.3 Codes, Capacity, and Secrecy Capacity
  6.3 Stochastic vs. Deterministic Coding for Secrecy
  6.4 Secrecy Rates for Parallel Queues
    6.4.1 Secrecy with Stochastic Encoding
    6.4.2 Secrecy with Deterministic Encoding
  6.5 Discussion

CHAPTER 7: CONTRIBUTIONS AND FUTURE WORK
  7.1 Contributions
  7.2 Future Work
    7.2.1 Overhead of Multiple Constraints
    7.2.2 Constraints at Multiple Layers
    7.2.3 Design of Practical Protocols

APPENDIX A: NOTATION & DEFINITIONS
  A.1 Notation
  A.2 Definitions

APPENDIX B: PROOFS FOR CHAPTER 4
  B.1 Proof of Theorem 4.3
  B.2 Proof of Theorem 4.4
  B.3 Proof of Theorem 4.5

APPENDIX C: PROOFS FOR CHAPTER 5
  C.1 Proof of Lemma 5.1
  C.2 Proof of Theorem 5.1
  C.3 Proof of Theorem 5.2

APPENDIX D: PROOFS FOR CHAPTER 6
  D.1 Proof of Lemma 6.1
  D.2 Proof of Proposition 6.2

BIBLIOGRAPHY
FIGURES

1.1 Layered protocol architecture.
1.2 An arbitrary network of mobile users.
2.1 The seven layer OSI model for network protocol design.
2.2 Data encapsulation and the structure of packet headers in TCP/IP.
2.3 Probability distribution of IP packets.
2.4 Structure of an IEEE 802.3 Ethernet packet.
2.5 Structure of an IEEE 802.11b Wi-Fi packet.
2.6 The hidden node problem in wireless networks.
2.7 Transmission of a Wi-Fi acknowledgment (ACK) packet.
2.8 Comparison of 802.11b and 802.11g protocol overhead duration.
3.1 The binary symmetric channel (BSC).
3.2 Repetition coding for the BSC.
3.3 Overhead cost of repetition coding for the BSC.
4.1 Multiple access channel with feedback.
4.2 Capacity region of the BA-MAC-FB.
4.3 Dependence balance bounds for the BAN-MAC-FB.
4.4 Bounds on the capacity region of the BAN-MAC-FB.
4.5 Rate regimes for the packet collision channel with feedback.
4.6 Capacity region of the packet collision channel with feedback.
4.7 Symmetric capacity of the packet collision channel with feedback.
5.1 Bounds on protocol overhead for discrete arrivals.
5.2 Bounds on protocol overhead for discrete and continuous arrivals.
5.3 Bursty source over a noisy channel.
5.4 Bounds on rate-delay tradeoff.
5.5 Fixed-to-variable length source coding.
6.1 Block diagram of the wiretap timing channel.
6.2 Achievable secrecy rates for parallel queues.
ACKNOWLEDGMENTS
First and foremost I would like to thank my advisor, Dr. J. Nicholas Laneman, for his support and guidance throughout my graduate career. It has been a
pleasure to work with him for the past seven years and I look forward to future
collaborations after my time at Notre Dame has passed.
I would like to thank my defense committee members, Dr. Matthieu Bloch,
Dr. Tom Fuja, and Dr. Martin Haenggi, both for laying the technical foundation
several years ago for me to pursue this dissertation and for taking the time to
serve on my committee. In particular, I would like to thank Dr. Bloch for the
many discussions we had over long lunches and many trips to Starbuck’s.
A special thanks is due to multiple generations of the Laneman research group
who I have enjoyed working and developing friendships with during my time
at Notre Dame: Wenyi Zhang, Deqiang Chen, Shiva Kotagiri, Michael Dickens,
Glenn Bradford, Utsaw, Ebrahim, Zhanwei, and the next generation of students,
along with two of our visitors, Harold Sneessens and Ioannis Krikidis. I would
also like to thank Neil Dodson, who has taught me more about engineering than
most of the classes I took.
I would like to thank all of my friends and fellow graduate students throughout
my time here. In particular, I would like to thank, in order of completion, Dr. Jeff
Bean, Dr. Dane Wheeler, Dr. Tomas Estrada, and Dr. Aaron Prager. Finally, I
would like to thank my parents, Pat and Carol, and my siblings and their spouses,
Chris, Andrea, Jenny, and Jeremiah (Drs. Dunn, Dunn, Dunn, and White), for
all of their support and encouragement.
CHAPTER 1
INTRODUCTION
Wireless communication has become an integral part of our society. To a
large extent, the performance of many point-to-point communication systems is
approaching fundamental limits on transmission capacity. In order to meet the
needs of more demanding users and applications, we must therefore turn to other
directions for improving system performance.
1.1 Overhead in Communication Systems
Fundamental limits on overhead in communication systems could provide clear
insight on the performance of protocols in practical systems and could help to identify opportunities for improvements in their design. In this dissertation we consider
overhead to be the cost of meeting particular constraints imposed on system design, which is in contrast to the traditional notion that overhead is the “non-data”
portion of a protocol. Through this perspective, we develop information-theoretic
bounds on overhead in communication systems.
First we consider multi-access systems and show how standard capacity results for multiple access channels fit within our framework. We then compute the
constrained capacity regions for two binary additive multiple access channels with
feedback under the constraint that both users' codebooks are generated according to a common distribution. Finally, we introduce the K-bit packet collision
channel with feedback and completely characterize its capacity region. For the
packet collision channel we show that the common distribution constraint reduces
throughput by a factor of two.
We investigate the overhead required to transmit a bursty source over a noisy
channel. We develop new bounds on the information required to encode start
and stop times for discrete time sources under an average delay constraint. We
also obtain bounds on rate-delay tradeoffs for communicating bursty sources over
noisy channels with feedback and point to areas for improvement.
Finally, we study information-theoretic security in timing channels. Using
results on the capacity of timing channels, we consider a model in which data
security is provided at the Network and Data Link Layers by exploiting the timing
of packets that enter parallel queuing systems. We obtain achievable secrecy
rates for stochastic encoding and determine the secrecy capacity exactly for a
deterministic encoder.
This work is carried out with the goal of developing fundamental limits on overhead from the abstracted viewpoint that overhead is the cost of system constraints.
In the next section we motivate the need for this perspective by considering the
complexities involved with a complete analysis of overhead in communication systems.
1.2 Motivating Example
By examining the overhead in Ethernet, Wi-Fi, and TCP/IP protocols in
Chapter 2, we show that a significant portion of overall Internet traffic is due to
protocol overhead. A natural question is whether protocols can be designed to offer
better performance? Could the same system support twice as much information
using different protocols? Surprisingly, the framework needed to answer this question does not exist, and fundamental limits on overhead are largely unknown.
Even without fundamental limits with which to compare the performance of
current protocols, it is easy to see that the efficiencies of current protocols can be
improved. Historically, the notion of layering has been used to simplify the design
of communication systems, and within this context we identify several sources of
inefficiency. Figure 1.1 illustrates a prevalent layering abstraction used to represent the relationship between different protocol functions in modern wireless
networks [1]. Lower level layers, such as the Physical (PHY), Link / Medium
Access Control (MAC), and Network Layers, manage interactions beginning with
the physical medium and perform functions such as error control coding, physical
addressing, access control, and routing, respectively. Higher level layers, such as
the Transport and Application Layers, deal more with application data and manage the presentation of this information to lower level layers in a generic format
that is independent of any specific application.
A primary source of overhead arises from the generic design of each layer to
meet the performance requirements of various applications or physical media. Additional overhead required for one system may be unnecessary in another system
with less stringent performance requirements. Also, related functions are often
implemented at different layers, resulting in duplication of overhead, such as Data
Link Layer retransmissions of complete packets or frames when only a small portion of bits are lost during transmission over the wireless medium. Finally, because
a primary purpose of layering is to encapsulate information from one layer to the
next, each layer can add additional overhead on top of overhead from other layers.
The associated nesting of protocol overhead can ultimately consume a significant
portion of network resources.
Figure 1.1: Layered protocol architecture, in which two users' protocol stacks (Application, Transport, Network, Link/MAC, and Physical Layers) interact over the wireless medium through functions such as compression, flow control, routing, reliability, and channel coding.
Modern protocols do not strictly adhere to the functional layering depicted
in Figure 1.1 and instead take advantage of many cross-layer interactions for improved performance. However, the inefficiencies illustrated within this framework
carry over to the protocols used in practical systems today. The concept of layering is also useful for understanding why fundamental limits on overhead do not
currently exist.
The first challenge is that information is not quantified using the same metrics
at all layers. Bits of data at higher layers may not be directly comparable to bits
of information at the Physical Layer. Performance is also quantified differently
depending upon the layer; data throughput at the Network Layer is not the same
as the information-theoretic notion of capacity used at the Physical Layer. In
terms of information, these differences are well understood and metrics clearly
defined. But what is considered overhead at one layer is often not considered overhead at the next layer, and thus it becomes difficult to define overhead in an operationally significant way. Even if overhead at different layers is quantified using a comparable metric, it is not clear that less overhead actually improves the overall efficiency of the communication system. In particular, reducing overhead at one layer can increase overhead at another.

Figure 1.2: An arbitrary network of mobile users, where the arrows depict an active communication link.
These issues are illustrated clearly by the following specific example in which
we consider a network of mobile devices communicating with one another, such
as that shown in Figure 1.2. If users frequently join and leave the network, or
terminals are only active for short bursts of time, a particular routing protocol may
require regular updates about the current state of the network, such as the physical
network topology, the set of active users, or the average traffic of individual users.
A more efficient routing protocol may have a lower overhead cost by reducing
the number of control packets transmitted, and in turn using less detailed information about the network state to perform routing. However, when new users
join the network or current users become active, the savings in overhead offered
by the improved routing protocol may be negated by an increase in the number
of transmission attempts required to successfully send a packet due to increased
collisions at the Physical Layer.
Consider an alternative situation in which a new user transmits infrequent
bursts of long duration. A MAC Layer that employs carrier sensing will wait for
the wireless medium to become available before beginning transmission. Packets
from the waiting user may be delivered successfully soon after the channel becomes
available, but the average time taken to resolve any collisions may be longer than
if another layer provides additional coordination. For example, a more advanced
routing algorithm that uses additional protocol overhead could gather detailed
information about the network before making routing decisions in order to reduce
MAC Layer collisions and retransmissions.
The main point of this example is to illustrate the fact that it can be difficult
or inappropriate to compare the overhead costs of different protocols. Doing so
directly may not account for the associated benefits experienced by other layers in
the network. Thus, even if it is possible to reduce the overhead of a specific protocol, minimizing overhead at one layer may not result in improved performance
overall.
All of the issues discussed in this section demonstrate the need for a better
understanding of overhead in communication systems and serve as the motivation
for this dissertation. A key insight we build from is that a more abstract definition of overhead is required to facilitate meaningful analysis. The perspective we
develop in this dissertation is that overhead can be viewed as costs of constraints
imposed on a system. This viewpoint allows us to use information-theoretic tools
to precisely quantify fundamental limits on overhead in communication systems.
Our study of overhead for different types of constraints is then carried out from
within this perspective, but we recognize that a comprehensive study of overhead
in communication systems goes well beyond what is considered in this dissertation.
In particular, we do not consider a large body of work from the data networking
community that evaluates the overhead of various specific protocols, extending
most recently to sensor and ad hoc networks.
1.3 Outline of the Dissertation
This dissertation continues as follows. Chapter 2 motivates the need for a
better understanding of overhead, summarizes and reviews related work, and discusses the motivation behind the dissertation. Chapter 3 presents the perspective
on overhead that forms the basis of this dissertation — that overhead can be
viewed as the cost of constraints imposed on a system. The main results of the
dissertation are then developed in Chapters 4, 5, and 6. Chapter 4 considers
constraints in multi-access communication systems and shows how standard capacity results for multiple access channels fit within the framework discussed in
Chapter 3. Chapter 5 considers delay constraints and develops bounds on overhead and the associated rate-delay tradeoffs for communicating a bursty source
over a noisy channel. Chapter 6 considers security constraints through a novel
application of timing channels in which messages are securely encoded on the
interarrival times of packets that enter parallel queues. Finally, Chapter 7 summarizes our contributions and gives directions for future research.
CHAPTER 2
BACKGROUND
The goal of this chapter is to adequately motivate the study of protocol overhead in communication systems that is pursued in this dissertation, summarize
and review related work, and discuss the motivation behind the dissertation.
Section 2.1 discusses protocol overhead in Internet traffic by considering protocol overhead in TCP/IP, Ethernet, and Wi-Fi protocols. Section 2.2 reviews how
information theory has been used to study some of the forms of protocol overhead that are identified. Finally, Section 2.3 discusses some of the higher level
motivation for this dissertation.
2.1 Protocol Overhead in Current Systems
There are a variety of sources of protocol overhead in communication systems.
A significant source of protocol overhead is information used by the protocol itself.
Protocol information may be used to convey information about the source and
destination of a message or may contain error control bits used to determine if
errors have occurred during transmission, request retransmission when errors have
occurred, or simply to delineate the beginning and end of a message. Other sources
of protocol overhead include physical layer preambles, retransmissions due to failed
communication attempts, and time spent arbitrating for the physical medium in
a multi-access system. In this section we discuss the different sources of protocol
overhead in a communication system by examining the sequence of protocols that
are used to translate a computer user's data into the signal that is received by another terminal in the network.

Figure 2.1: The seven layer Open Systems Interconnection model (OSI model), comprising the Application, Presentation, Session, Transport, Network, Data Link / MAC, and Physical Layers, provides a useful framework for network protocol design.
The concept of layering, or modular design, is central to all modern communication networks. By partitioning the design of a communication system into a
sequence of hierarchal black boxes, each layer can be designed independently of the others. The seven layer Open Systems Interconnection model (OSI model)
for network architecture shown in Figure 2.1 provides a useful abstraction for network protocol design, although the design of many communication systems does
not strictly adhere to the OSI model [2, 3].
As a user’s data is passed down the communication protocol stack, each layer
adds additional control and identification information to the user’s data. In order
to understand how each layer contributes to the overall protocol overhead we first
consider the protocol information in TCP/IP protocols, which correspond to the
Host and Network Layers in the OSI model, and then examine protocol overhead
at the MAC and PHY Layers by considering Ethernet and Wi-Fi protocols.
Figure 2.2: Data encapsulation and the structure of packet headers in TCP/IP.
2.1.1 Protocol Overhead at Network and Host Layers
The TCP/IP Internet Protocol Suite originally defined in [4, 5] specifies a
collection of networking protocols that enable computers to communicate over
a network. TCP/IP roughly corresponds to the functionality provided by the
Host and Network Layers in the seven layer OSI model. TCP/IP provides end-to-end connectivity for application data (or data from a higher layer protocol)
by specifying how source data should be packetized, addressed, routed, and finally reassembled at the destination. TCP/IP is designed to be agnostic of the
underlying physical medium that is used for communication.
As discussed in Section 1.2, a primary purpose of layering is to encapsulate
data as it moves from one layer to the next. Protocol information and user data
are encapsulated and treated identically by each subsequent layer in the protocol
stack. The encapsulation process for transmitting data using TCP/IP is shown in
Figure 2.2, along with the associated protocol overhead.

Figure 2.3: Probability distribution for packets encoded using the Internet Protocol (IP).

The combined protocol information for a data packet encapsulated with TCP and IPv4 can be classified
as:
• 12 bytes of addressing information,
• 24 bytes of control information about the protocol itself, such as the version
number, mode of operation, packet length, or quality of service requested,
and
• 8 bytes of error control checksums.
Therefore, an IP packet that is passed on to Layer 2 contains 44 bytes of protocol
information.
Because the protocol information for an IP packet is a fixed number of bytes,
the proportion of protocol information to user information can be computed from
the length of the data packet. The probability distribution for IP packet sizes
observed in one IEEE study is shown in Figure 2.3 [6]. The average packet length
computed using this distribution is about 412 bytes, 44 of which are protocol
information. Therefore, by the time a user’s data reaches the Data Link / MAC
Layer, over 10 % of the packet is actually protocol information.
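As a quick numerical check on the figures above, the short Python sketch below recomputes the header total and the average overhead fraction; the 412 byte average packet length is the value quoted from the cited distribution, not a value computed here.

```python
# Per-packet protocol information for TCP + IPv4 (bytes), following the
# classification above: addressing, protocol control, and checksums.
ADDRESSING = 12
CONTROL = 24
CHECKSUMS = 8
HEADER_BYTES = ADDRESSING + CONTROL + CHECKSUMS  # 44 bytes of protocol information

AVG_PACKET_BYTES = 412  # average IP packet length reported for the cited distribution

overhead_fraction = HEADER_BYTES / AVG_PACKET_BYTES
print(f"TCP/IP protocol information per packet: {HEADER_BYTES} bytes")
print(f"Average overhead fraction: {overhead_fraction:.1%}")  # about 10.7 %
```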
2.1.2 Protocol Overhead at PHY and MAC Layers
Protocol overhead in wired and wireless communication networks often serves
different purposes. The additional complexities of the wireless medium, such as
its broadcast nature and varying channel characteristics, require additional protocol overhead for coordinating transmissions and resolving conflicts between users.
Applications that require wireless communications are often more dynamic, with
users frequently joining and leaving networks. On the other hand, virtually all
commercially deployed wireless networks rely upon some underlying wired network, such as for communication among distributed base stations or radio towers.
It is therefore important to understand the role of protocol overhead in both wired
and wireless systems.
Because higher level layers interact with each other through the exchange of
packetized data, the only source of protocol overhead from these layers is the additional protocol information included in packet headers and trailers. In addition to
protocol information, at the PHY and MAC Layer we must consider other sources
of protocol overhead related to how signals are sent over the underlying physical
medium and the way in which multiple users share that medium.
Two example PHY/MAC protocols in use today are Ethernet over twisted
pair copper wire and IEEE 802.11 Wi-Fi. The Ethernet protocol is used for local
area networks (LANs) that connect devices within local geographic areas, such as
within a home, office, building, or campus. Wi-Fi refers to the complementary
wireless LAN (WLAN) protocols used for similar purposes to connect devices
wirelessly.
Figure 2.4: Structure of an IEEE 802.3 Ethernet packet with the length of each component shown in bytes: Destination (6), Source (6), Type (2), Payload (46-1500), and FCS (4).
Wired Networks
The Ethernet protocol standardized in IEEE 802.3 [7] specifies the signaling
format for the Physical and MAC Layers in the OSI networking model. A typical
802.3 Ethernet packet is shown in Figure 2.4, where each packet consists of:
• 12 bytes for addressing,
• 2 bytes for the packet type,
• 46 to 1500 bytes for the payload, and
• 4 bytes for the cyclic redundancy check (CRC).
The smallest packet is 64 bytes, consisting of a 46 byte payload and 18 bytes
of protocol information. The largest packet has the same 18 bytes of protocol
information, but a 1500 byte payload. Thus, in terms of the non-data portion
of an Ethernet packet, protocol information in Ethernet packets can range from
just over 1 % of the overall packet to about 28 % of the overall packet, depending
upon the size of the payload.
In order to determine the impact of this protocol information on the overall
network throughput, the distribution for IP packet size in Figure 2.3 can be used
to compute that the average amount of protocol information in Ethernet packets
is 12.3 %. However, there are additional costs beyond protocol information that
are associated with using the Ethernet protocol relative to a protocol designed for
use over a dedicated point-to-point link.
In addition to 18 bytes of protocol information, the Ethernet specification also
requires an 8 byte synchronization preamble and a 12 byte inter-frame gap to
provide an opportunity for network hubs to inject packets from other sources.
Thus, even for a fully utilized Ethernet link with no collisions, there is an additional 20 bytes of protocol overhead that is not considered to be part of the
packet, but nonetheless is required for the transmission of a packet. Taking this
additional protocol overhead into account increases the average per packet protocol overhead to approximately 22 % for a 412 byte IP packet before considering
the protocol overhead associated with packet collisions and retransmissions. The
Ethernet MAC Layer uses carrier sense multiple access with collision detection
(CSMA/CD), which further contributes to the overall protocol overhead.
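The per-packet arithmetic above can be captured in a few lines of Python. The sketch below computes the Ethernet overhead fraction for several payload sizes, with and without the preamble and inter-frame gap; note that the 12.3 % and 22 % averages quoted in the text additionally weight these per-packet figures by the IP packet-size distribution, which is not reproduced here.

```python
# IEEE 802.3 per-packet protocol overhead (bytes), per the structure above.
HEADER_TRAILER = 18      # destination, source, type, and CRC
PREAMBLE_PLUS_GAP = 20   # synchronization preamble + inter-frame gap

def ethernet_overhead(payload_bytes, include_gap=False):
    """Fraction of transmitted bytes that is protocol overhead."""
    overhead = HEADER_TRAILER + (PREAMBLE_PLUS_GAP if include_gap else 0)
    return overhead / (payload_bytes + overhead)

for payload in (46, 412, 1500):
    print(f"{payload:4d} byte payload: "
          f"{ethernet_overhead(payload):.1%} (frame only), "
          f"{ethernet_overhead(payload, include_gap=True):.1%} (with preamble/gap)")
```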
Wireless Networks
The Wi-Fi protocol standardized in IEEE 802.11 [8] specifies the signaling
format for the MAC and PHY Layers in the OSI networking model. The additional
challenges associated with communicating over a wireless channel significantly
impacts protocol design at the MAC and PHY Layers. In this section we first
discuss some of the additional functionality provided by the IEEE 802.11 MAC
protocol relative to Ethernet, and then compute the protocol overhead for IEEE
802.11b Wi-Fi.
The structure of an IEEE 802.11b Wi-Fi packet is shown in Figure 2.5. Because
the Physical Layer preamble and header are transmitted at a data rate that is
independent of the data rate used to transmit the MAC packet data unit, the
protocol overhead is specified in terms of the signaling duration in microseconds
rather than in terms of bytes.

Figure 2.5: Structure of an IEEE 802.11b Wi-Fi packet: PHY preamble and header (72 µs + 24 µs), MAC header (28 bytes), IP packet (46-1500 bytes), and FCS (4 bytes).

Although newer Wi-Fi protocols such as 802.11g reduce the total signaling duration of the PHY preamble and header to 20 µs from
96 µs, in general, if an 802.11b-only device is present on an 802.11g network, the
longer signaling format will be used.
Based upon the maximum PHY data rate of 11 Mbps for an 802.11b transmission, it takes approximately 420 µs to transmit a 412 byte IP packet. Transmission of the IP packet data takes 300 µs, whereas 120 µs is used for transmission of
MAC and PHY protocol overhead. Therefore, on average approximately 71 % of
an 802.11b packet transmission is used for communication of the end user’s data.
The IEEE 802.11 medium access control (MAC) Layer uses carrier sense multiple access with collision avoidance (CSMA/CA) for channel access, compared
to carrier sense multiple access with collision detection (CSMA/CD) mentioned
earlier for Ethernet. There are two main reasons for this difference. First, in comparison to wireline drivers in wired networks, it is not currently feasible to design
low-cost wireless transceivers that can detect a transmission from a third party
while simultaneously transmitting. Second, as illustrated by Figure 2.6, even if
no collision is detected at the transmitting node, a collision can still occur at the
receiving node, a situation referred to as the hidden node problem [9]. This is due
to the fact that the received signal strength depends upon the receiver’s distance
from the transmitter, in addition to many other factors in the surrounding environment, making it difficult for the transmitting node to accurately determine the
received signal without explicit information fed back from the destination.

Figure 2.6: The hidden node problem in wireless networks occurs when two users can communicate with a central access point, but are unable to receive each other's transmissions.
To partially overcome these challenges, CSMA/CA is used to sense whether or
not the medium is busy before beginning a transmission and thus avoid collisions
when possible [10]. To account for the inevitable collisions that will still occur,
a positive acknowledgement by the receiver is required for all packets that are
received successfully. These acknowledgements can increase the overall bandwidth
that is required to support a given data rate [11].
In order to account for the additional protocol overhead associated with CSMA/CA,
consider the best case scenario of a single terminal communicating with an access point in an interference-free environment such that collisions are entirely
avoided. As shown in Figure 2.7, the distributed coordination function (DCF) of
the 802.11b MAC requires a minimum backoff time between consecutive transmissions of 50 µs to allow for transmission and reception of an acknowledgment
packet from the previous data frame. Therefore, a 420 µs PHY transmission can
be sent at most every 470 µs, reducing the theoretical maximum efficiency of the
802.11b MAC and PHY protocols to just under 64 % for a 412 byte IP packet.

Figure 2.7: Transmission of a Wi-Fi acknowledgment packet during the mandatory backoff period between consecutive data transmissions (a minimum of 50 µs in 802.11b and 28 µs in 802.11g).
Combining the analysis of protocol overhead in TCP, IP, and Wi-Fi, Figure 2.8
shows the protocol overhead for TCP/IP packet transmissions using 802.11b and
802.11g at Physical Layer data rates of 11 Mbps and 54 Mbps, respectively. Because of significantly higher Physical Layer data rates and less time spent on the
inter-frame backoff and Physical Layer preamble transmission, 802.11g greatly
outperforms 802.11b on a throughput basis. However, during the time spent on
transmission of protocol overhead and the backoff period, 802.11g could send an
additional 400 bytes of user information at 54 Mbps, compared to an additional
278 bytes of information for 802.11b at 11 Mbps.
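The 802.11b timing budget above is easy to verify numerically. The sketch below uses only the durations quoted in the text (about 300 µs of payload at 11 Mbps, 120 µs of PHY and MAC overhead, and a 50 µs minimum backoff), so it reflects the same best-case, collision-free assumptions.

```python
# Best-case 802.11b timing for a 412 byte IP packet, using the durations above.
PAYLOAD_BITS = 412 * 8
PHY_RATE_MBPS = 11.0

payload_us = PAYLOAD_BITS / PHY_RATE_MBPS     # ~300 µs of IP packet data
phy_mac_overhead_us = 120.0                   # PHY preamble/header + MAC header + FCS
backoff_us = 50.0                             # minimum DCF backoff in 802.11b

packet_us = payload_us + phy_mac_overhead_us  # ~420 µs on the air
cycle_us = packet_us + backoff_us             # ~470 µs between packet starts

print(f"Packet duration: {packet_us:.0f} µs")
print(f"Share of the packet carrying user data: {payload_us / packet_us:.0%}")  # ~71 %
print(f"Efficiency including backoff: {payload_us / cycle_us:.0%}")             # ~64 %
```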
2.1.3 Non-Data Viewpoint on Protocol Overhead
The set of TCP/IP networking protocols, together with the IEEE 802.3 Ethernet or IEEE 802.11 Wi-Fi protocols previously discussed, comprise an entire
communication protocol stack. In order to compute the efficiency of these protocols, it was convenient to express protocol information as the number of data
bytes that could be sent in the same amount of time over an error-free bit pipe
that operates at the maximum Physical Layer data rate. This approach was a
natural extension to the way that the inter-frame gap in Ethernet was defined in
terms of bytes, rather than in terms of time.

Figure 2.8: Comparison of signaling duration for TCP/IP packets sent using IEEE 802.11b and 802.11g with Physical Layer data rates of 11 Mbps and 54 Mbps, respectively; illustrated to scale.

However, it is somewhat awkward to
quantify protocol overhead for the Physical Layer preamble and backoff window
in 802.11b and 802.11g using this “non-data” viewpoint, because quantifying the protocol overhead in bytes (as if it were protocol information) requires
specifying the associated data rate.
There are additional drawbacks to quantifying overhead using the non-data
viewpoint. In addition to the challenges associated with quantifying Physical
Layer transmissions that occur at a data rate other than the data rate used for
payload transmissions, the non-data viewpoint on overhead does not account for
the fact that even an optimal protocol may still require some amount of protocol
information to function properly. For example, some form of addressing scheme
is required in virtually all data networks, and so address information cannot be
entirely eliminated. However, the non-data viewpoint does not allow us to distinguish between the number of bytes a protocol uses for addressing information and
the number of bytes actually required for the most efficient representation of that
information.
Furthermore, because data encoded using a non-systematic channel code cannot be separated into a “data” portion and an “overhead” portion [12], the non-data viewpoint on protocol overhead fails to capture the inefficiency associated
with sub-optimal channel coding. Even for a systematic channel code, it seems
inappropriate to consider error-control bits that are required for reliable decoding
to be protocol overhead. Yet as we saw when comparing 802.11b and 802.11g,
the most significant gains in 802.11g are due to substantial improvements in the
maximum Physical Layer data rate. Quantifying this inefficiency falls within the
domain of information theory, and so in the next section we discuss how various
theoretical work has looked at some of the forms of protocol overhead that were
identified in this section.
2.2 Related Work
In Section 2.1 we looked at several protocols that are currently in widespread
use and discussed how protocol overhead in these protocols can be quantified using
the non-data viewpoint. In this section we review a variety of work related to
sources of protocol overhead in TCP/IP and Wi-Fi protocols that were identified
in the previous section. We focus on protocol control information contained in
packet headers (network state information), the Physical Layer preamble (channel
estimation), and the backoff time between Physical Layer transmissions required
by CSMA/CD and CSMA/CA in Ethernet and Wi-Fi (multi-access).
2.2.1 Contention & Multi-Access
A scenario in which multiple users communicate to a central receiver over a
shared AWGN channel is examined in [13]. The authors use the intuition that
the per-user capacity of the AWGN multiple access channel with single user decoding is based on the number of active users to draw parallels with classical
processor-sharing systems [14]. These insights lead to more rigorous analysis in
which tradeoffs are derived between system loading and the per-message-length
error exponent. The authors also show that the limiting form of the service demand for each user and the service rate offered by the receiver correspond to the
average entropy of each message and the mutual information between the input
and output of the channel, respectively. As mentioned in [15], there are also similarities between this work and the IS-99 [16] link-level data protocol of the IS-95
cellular standard, which was published around the same time.
A similar model is considered in [17], in which a random subset of many bursty
users share a common channel for communication to a central receiver. By using
information-theoretic capacity limits, bounds on the tradeoffs between average
service rate, available bandwidth, and transmitter power are obtained. [13] goes
on to explicitly characterize the relationship between achievable error probability
and message coding block length through error exponent analysis, but in doing so
does not retain the additional dimension of dynamic power allocation.
The collision resolution approach to multi-access communications is based
upon a collision model in which overlapping, or colliding, packets cannot be completely recovered by the receiver. This idea originally came about through the
development of ALOHA [18], for which the analysis centered around computing
the maximum achievable throughput assuming all users have an unlimited reservoir of messages to send and instantaneous noiseless feedback is available to inform
transmitters if packets are successfully received. These algorithms have also been
studied in the context of bursty message sources, in which case the metric of interest is the stability region, i.e., the set of arrival rates for which each user’s queue
remains finite and the system is stable [19–22].
It is interesting to note that under one model in which colliding packets are
completely lost, feedback is not necessary to ensure error free communications.
This fact was demonstrated in [23] through examination of the collision channel
without feedback, along with a similar formulation given in [24], in which each
user’s transmission times correspond to a unique, deterministic, protocol sequence.
This fact contrasts the way in which the protocols discussed in Section 2.1.2 are
designed, suggesting that the protocol overhead in Wi-Fi due to the transmission
of acknowledgement packets may not be explicitly required.
One critical difference between the channel access method used in Wi-Fi and
the models considered in [23] and [24] is that transmission times for Wi-Fi users
form a stochastic process that is based on the sequence of collisions and transmission times for other users. However, [25] explores the idea of error-free communications over a collision channel without feedback under the assumption of
equal user rates, with the fundamental difference that users do not have unique
predetermined protocol sequences and instead transmissions occur according to a
stochastic process.
In the limit of a large number of users, the asymptotic throughput for symmetric users of equal rate with deterministic protocol sequences is found to be 1/e
in [23]. The maximum throughput of slotted ALOHA is also 1/e, but ALOHA has
the additional requirement that feedback is available to transmitting users [26, 27].
Stochastic protocol sequences only achieve a throughput of 1/2e, which is the
same as the maximum throughput of unslotted ALOHA [18]. A more thorough
discussion on the various models used to study multi-access systems can be found
in [15, 28], and the different ways that multi-access system performance can be
quantified is discussed in [29].
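For context, the 1/e and 1/2e limits quoted above coincide with the classical maximum throughputs of slotted and pure (unslotted) ALOHA. The sketch below evaluates the textbook throughput formulas S = G e^{-G} and S = G e^{-2G} at their maximizing loads; it is a numerical illustration of those standard formulas, not of the specific models in [23] or [25].

```python
import math

def slotted_aloha_throughput(G):
    """Classical slotted ALOHA throughput at offered load G (packets per slot)."""
    return G * math.exp(-G)

def unslotted_aloha_throughput(G):
    """Classical pure (unslotted) ALOHA throughput at offered load G."""
    return G * math.exp(-2 * G)

# The maxima occur at G = 1 and G = 1/2, giving 1/e and 1/(2e), respectively.
print(f"Slotted ALOHA peak:   {slotted_aloha_throughput(1.0):.4f} (1/e  = {1 / math.e:.4f})")
print(f"Unslotted ALOHA peak: {unslotted_aloha_throughput(0.5):.4f} (1/2e = {1 / (2 * math.e):.4f})")
```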
2.2.2 Channel Training & Estimation
In Section 2.1.2 we saw that the training sequence preamble sent for channel
estimation before data transmission can represent over 35 % of the protocol overhead in a Wi-Fi packet transmission. For many practical systems channel training
and estimation is done by using pilot tones or by sending predetermined training
symbols prior to the transmission of a packet. Because the receiver knows the
pilot tone or training sequence and the transmit power, it is able to estimate the
gain of the channel for the duration of a coherence time. However, the data sequence itself could also be used for channel training and there is no fundamental
necessity for separating the operations of channel training and data transmission.
Other work characterizes the capacity of systems with no state information at
either the transmitter or receiver [30, 31], demonstrating that channel estimation
is not always necessary. This leads to the question of when it is appropriate to
devote resources to channel estimation as opposed to communicating without accurate channel state information. Researchers have begun to address this question
by investigating tradeoffs between channel training and data transmission. Building on a general framework for state-dependent networks with side information
and partial state recovery from [32], the authors of [33, 34] develop a model for
analyzing the tradeoff between achievable information rates and the fidelity with
which a noisy channel can be estimated.
In [34], a constrained channel coding approach is used to develop the capacity-distortion function that characterizes fundamental limits on joint communication
and channel estimation. Among many possible multi-terminal extensions of this
framework, a two-hop version is explored in [35], where the authors obtain upper
and lower bounds on the end-to-end capacity subject to a distortion constraint on
estimating the channel state. The framework in [34] is a useful first step. However,
correlated fading processes [36] may need to be incorporated into the model in
order to capture the tradeoff between channel estimation and data transmission
as the channel gain varies slowly over time relative to the duration of a codeword
transmission.
2.2.3 Control Information & Network State
In Section 2.1.1 we saw that the majority of protocol information in TCP/IP
is related to estimation and communication of the current state of the source,
channel, or network. This state information can consist of knowledge about the
data rate of messages generated by a given user (source state), status of fading
between source and destination (channel state), or the set of currently active users,
their locations, and associations with infrastructure nodes (network state). The
term side information is used to refer to the availability of such information at
the encoder, decoder, or both [37]. State or side information can be modeled
by specification of a complete probability distribution, moments of a probability
distribution, or realizations of random variables.
For a given network configuration and routing protocol the number of bits required to address a packet and represent the network topology can be determined.
If the network is static, then once every node becomes fully aware of the network
configuration this portion of the routing protocol overhead becomes negligible. If
the network topology is dynamically changing, each node must be continually updated with information about these changes. A variety of work seeks to quantify
the protocol overhead of specific routing protocols, including [38–41], but without
a more abstract model the intuition that can be gained is limited.
[42] considers the entropy of the optimal routing path as a function of a random adjacency matrix describing the network topology and the authors provide
bounds on the length of codewords needed to convey information about optimal
routes from an omniscient node to other nodes. [43] studies the amount of information required to keep track of set membership in a two level hierarchal network
along with cluster-head addresses. This work is expanded to general hierarchal
networks with proactive routing in [44], and [45] incorporates the addressing protocol overhead of reactive routing protocols as a function of traffic patterns (node
mobility). [46] develops a more general treatment of routing protocol overhead by
formulating the problem using a rate-distortion approach. However, the analysis
is limited to very specific assumptions about node mobility, and only geographic
routing is considered.
In the next section we discuss some of the challenges associated with information theory and communication networks that served as the initial motivation for
this dissertation.
2.3 Motivation & Topics
2.3.1 An Unconsummated Union
Information theory and communication networks have each played a unique
role in the development of current communication systems. Information theory
provides a mathematical framework within which the fundamental limits of communication systems can be examined [47–49]. Information theory establishes and
justifies several metrics that are fundamental to communications, such as how
to characterize information rates and channel capacity in meaningful and useful
ways. Perhaps most importantly, information theory goes on to precisely quantify
these metrics for a variety of models, creating a well-defined target for designers
to work towards, and also motivates efficient architectures for implementation.
Where information theory falls short of providing insights for the design of
practical systems capable of connecting many distributed users, networking steps
in to fill this gap. The field of communication networks provides the engineering
tools required to address a host of more practical issues that do not fit neatly
into a block diagram, or for which quantification of the associated performance
limits is intractable [1]. Changes in network topology, establishing connections
between users, and deciding how best to route blocks of data through the network
are all dealt with in networking.
It is clear that communication technology would not be where it is today
without significant contributions from both the information theory and networking communities. Why then, in the words of Ephremides and Hajek [15], has
“information theory not yet made a comparable mark in the field of communication networks, the sister field and natural extension of communication theory,
that is today, and is likely to remain for many years, the center of activity and
attention in most information technology areas?” They continue:
The principal reason for this failure is twofold. First, by focusing
on the classical point-to-point, source–channel–destination model of
communication, information theory has ignored the bursty nature of
real sources. Early on there seemed to be no point in considering
the idle periods of source silence or inactivity. However, in networks,
source burstiness is the central phenomenon that underlies the process
of resource sharing for communication. Secondly, by focusing on the
asymptotic limits of the tradeoff between accuracy and rate of communication, information theory ignored the role of delay as a parameter
that may affect this tradeoff. In networking, delay is a fundamental
quantity, not only as a performance measure, but also as a parameter
that may control and affect the fundamental limits of the rate-accuracy
tradeoff.
And so it seems that the two disciplines are at odds: information theory focuses on
characterizing asymptotic limits of the rate-accuracy tradeoff by explicitly ignoring
delay, whereas networking attempts to explicitly characterize the role of burstiness
and the delay as a fundamental parameter that affects the rate-accuracy tradeoff.
Despite these differences, many important insights in networking have in fact
come from information theorists. However, several areas that directly relate to networking and come from information-theoretic origins have not adequately treated
the role of delay and burstiness, and have had equally limited impact on networking. Ephremides and Hajek [15] refer to multiuser information theory [50] and
multiuser detection [51] in this context.
2.3.2 Dissertation Topics
The most interesting and challenging problems in communication systems today are at the intersection of information theory and networking. Observations
from the protocols we studied throughout this chapter suggest a stronger relationship between protocol overhead and many of the challenges discussed in [15] that
have become the primary motivation for this dissertation. The viewpoint on protocol information in [52] originally spurred our thinking about protocol overhead
as the cost of constraints, and the issues discussed in [15] strongly influence the
direction we take with this idea in the dissertation.
Timing Information
Timing information plays a role in all three of the main chapters of the dissertation. The problems considered in Chapter 5 and Chapter 6 are closely connected
to source and channel coding duals for timing information [53]. In Chapter 5
source burstiness generates additional timing information that we characterize
as a lossy source coding problem, whereas in Chapter 6 timing information in a
queueing system forms a related noisy channel coding problem.
Feedback and ARQ
In Chapter 2 we saw that ARQ packets, and more generally feedback information, comprise a significant portion of protocol overhead observed in Wi-Fi
protocols, and in [15] it was said that the role of feedback and ARQ in communication systems requires more attention. Feedback is central to Chapter 4 where
we consider packet collision channels with acknowledgement and collision feedback
from an information-theoretic perspective. In Chapter 5 ARQ is used to develop
bounds for the source coding problem with timing information mentioned above.
Encryption and Security
Covert channels and timing channels were discussed in [15], and security was
also mentioned. The model we consider in Chapter 6 incorporates elements from
both covert and timing channels. Although this chapter did not consider Application Layer sources of protocol overhead, encryption and security protocol overhead
also consume additional resources in communication systems.
2.4 Chapter Summary
In this chapter we characterized the protocol overhead in TCP/IP, Ethernet,
and Wi-Fi protocols. The non-data viewpoint on protocol overhead was useful for quantifying certain types of protocol overhead, such as the protocol control information contained in packet headers, but it falls short of identifying the
most significant opportunities for improvement. In the next chapter we show how
information-theoretic tools can be used to refine the non-data viewpoint on protocol overhead and present the perspective on protocol overhead in communication
systems that is used throughout the remainder of this dissertation.
CHAPTER 3
A PERSPECTIVE ON COMMUNICATION OVERHEAD
This chapter develops a perspective on communication overhead that forms
the basis of the dissertation. Our objective is to demonstrate that by viewing
overhead as the cost of constraints imposed on communication systems, existing
tools can be used to quantify overhead in an operationally significant way. The
explicit characterization of overhead cost can then be used to better understand
the overhead present in practical protocols, such as those considered in Chapter 2.
Section 3.1 briefly discusses what is needed to develop a quantitative understanding of overhead in communication systems and gives an operational definition of
overhead cost. Section 3.2 uses the existing framework for channel coding with
cost constraints to relate the operational definition of overhead cost to a mathematical definition, and also gives a simple example showing how overhead cost
can be computed.
3.1 Overhead Cost
Because overhead can reduce the efficiency of a protocol, it is often considered
a cost on the system. But it is rarely the case that the efficiency of a protocol
can be improved simply by replacing “overhead bits” with “data bits.” Consider
a protocol that encodes a user’s messages onto n-bit packets for transmission over
a noisy channel. Some of the drawbacks associated with considering overhead to
be the “non-data” portion of a packet were illustrated in Section 2.1.3.
First, non-data bits may be explicitly required to decode data bits. For example, a portion of the n-bit packet may be used to specify the rate of a forward
error correction (FEC) code used to encode the user’s message onto the remaining
bits. Without knowledge of the FEC code’s rate, the decoder does not know the
size of the message that was sent and the message cannot be decoded.
Second, by replacing non-data bits with additional data bits, the probability
of error may increase. For example, if a systematic FEC code is used to transmit
k data bits, replacing the error-control bits with additional data bits will increase
the probability of error for the original k data bits.
Finally, it is often meaningless to explicitly distinguish data and non-data
bits. For example, many systematic FEC codes have non-systematic equivalents
that provide identical error-control performance. However, from the non-data bits
viewpoint on overhead, the non-systematic code has less overhead, even though
the systematic code may offer the advantage of easier decoding.
The conclusion we draw from these observations is that defining what portion of a protocol is “data” versus what portion is “overhead” may not be the
most relevant distinction. If the purpose of distinguishing overhead from data is
to understand what gains may be realized by an improved protocol with a lower
“overhead cost,” it seems preferable to define overhead explicitly as such. Accordingly, we consider overhead cost to be the reduction in system performance that
results from a constraint on the design of a protocol or system.
The remainder of this section establishes an operational definition for the overhead cost of a system constraint as the difference between baseline system performance and constrained system performance. In Section 3.2, we will relate the
operational definition of overhead cost given in this section to mathematical optimization problems that can be explicitly computed using established information-theoretic coding theorems. Establishing this connection enables us to quantify
overhead cost in an operationally significant way.
Let S be the set of all possible protocols for a given communication system and
define a function f (ω) that assigns a performance metric to each protocol ω ∈ S.
Define the baseline system performance P given by
P = max_{ω∈S} f(ω).    (3.1)
Now consider a subset of protocols SΓ ⊂ S that meet an additional system constraint Γ. Define the constrained system performance PΓ as
PΓ = max_{ω∈SΓ} f(ω).    (3.2)
Using (3.1) and (3.2), we give an operational definition for overhead cost.
Definition 3.1. The overhead cost of a system constraint SΓ ⊂ S is given by
O = P − PΓ,    (3.3)
where P is the unconstrained system performance (3.1) and PΓ is the performance
of the same system subject to the constraint Γ (3.2).
The definitions given in this section are meant only to convey the general concept of overhead as the reduction in performance that results from a constraint
on system design. The definitions for system performance and constrained system performance given in (3.1) and (3.2), respectively, are specified using scalar
performance functions, such as the rate of reliable communication that is achievable with a given protocol. However, in problems with multiple users, it is useful
to consider vector performance functions such as the region of rate pairs that is
achievable with a given protocol. In this case the subtraction in Definition 3.1
should be interpreted as the set difference, and the overhead cost will be given in
terms of an overhead region instead of a scalar quantity.
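As a toy illustration of Definition 3.1 (a sketch with hypothetical protocol names and performance values, not drawn from any system studied here), the overhead cost is simply the gap between the unconstrained and constrained maximizations:

    # Toy illustration of Definition 3.1; protocol names and values are hypothetical.
    protocols = {
        "proto_a": 0.95,  # f(omega): e.g., an achievable rate in bits per channel use
        "proto_b": 0.80,
        "proto_c": 0.60,
    }
    # Subset of protocols that also meet an extra constraint Gamma
    # (e.g., "every packet must carry a checksum").
    constrained = {"proto_b", "proto_c"}

    P = max(protocols.values())                                          # baseline performance, (3.1)
    P_gamma = max(v for k, v in protocols.items() if k in constrained)   # constrained performance, (3.2)
    print("overhead cost O =", P - P_gamma)                              # (3.3)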
Next, we show how information-theoretic coding theorems can be used to explicitly compute the overhead cost of a system constraint.
3.2 Illustrative Example
In this section we give a simple example to illustrate how the conceptual framework established in the previous section can be used to compute overhead cost and
show how it differs from the non-data viewpoint on overhead. We use the channel
coding theorem to relate baseline system performance to a mutual information
maximization, and use the framework for channel coding with cost constraints to
relate constrained system performance to a constrained mutual information maximization. The notation used in this section and the remainder of the dissertation
is given in Appendix A.
3.2.1 Channel Coding with Unconstrained Inputs
Consider the problem of channel coding for a discrete memoryless channel
pY|X (y|x) with input X ∈ X and output Y ∈ Y (see, e.g., [48, 49] for a complete
definition of the channel coding problem). The capacity of this channel is given
by
C = max_{pX(x)} I(X; Y),    (3.4)
where pX,Y (x, y) = pX (x)pY|X (y|x) and the average mutual information can be
computed as
I(X; Y) = E[I(X; Y)],    (3.5)

where

I(X; Y) := log [ pX,Y(X, Y) / (pX(X) pY(Y)) ]    (3.6)

is the mutual information, which is treated as a random variable throughout the
thesis.
Within the context of system performance defined in (3.1), the optimization
space S is the set of all single-letter distributions pX (x) on X, and the system
performance for a given pX (x) ∈ S and fixed pY|X (y|x) is given by the average
mutual information defined in (3.5). The operational significance of this particular
baseline system performance is established by the channel coding theorem [48, 49].
3.2.2 Channel Coding with Constrained Inputs
Consider the related problem of channel coding with a cost constraint on inputs to the channel (see, e.g., [48, Section 7.3], [54, Section 3.6] for a complete
description of channel coding with cost constraints). Let cn : Xn → R be a cost
function that assigns a cost cn (xn ) to each xn ∈ Xn . In the framework of channel
coding with cost constraints, codewords belonging to a code Cn = (u^n_1, . . . , u^n_M)
must satisfy

(1/n) cn(u^n_i) ≤ Γ,    i = 1, 2, . . . , M,    (3.7)

for all n, where Γ is an arbitrary constant. Define the set of all distributions pXn(x^n) on input processes {X^n}_{n=1}^∞ that satisfy the cost constraint Γ as

SΓ = { pXn(x^n) : P[ (1/n) cn(X^n) ≤ Γ ] = 1 for all n = 1, 2, . . . }.    (3.8)
The channel capacity with cost constraint Γ is then given by
CΓ = max_{pXn ∈ SΓ} p-liminf_{n→∞} (1/n) I(X^n; Y^n),    (3.9)

where

(1/n) I(X^n; Y^n) := (1/n) log [ pXn,Yn(X^n, Y^n) / (pXn(X^n) pYn(Y^n)) ]    (3.10)

is the mutual information rate, and the limit infimum in probability of the mutual
information rate is defined as [54]

p-liminf_{n→∞} (1/n) I(X^n; Y^n) := sup { β : lim_{n→∞} P[ (1/n) I(X^n; Y^n) < β ] = 0 }.    (3.11)
Because the limit infimum in probability of the mutual information rate is
upper bounded by the average mutual information and because SΓ ⊂ S, the maximization in (3.9) cannot be greater than the corresponding maximization in (3.4).
The channel capacity with cost constraint Γ must therefore be less than or equal
to the unconstrained capacity given by (3.4). We call the difference between the
unconstrained capacity C and constrained capacity CΓ the overhead of channel
coding with cost constraints.
Compare this definition of overhead to the non-data viewpoint used in Section 2.1
to compute overhead in Ethernet packets. One of the sources of overhead was a
cyclic redundancy check (CRC) that is appended to the end of every packet to
serve as an error detection mechanism.

Figure 3.1: The binary symmetric channel with crossover probability p.

By defining the cost function
cn(x^n) = { n,   x^n satisfies CRC;   ∞,   x^n does not satisfy CRC,    (3.12)
and considering Γ = 1, (3.9) can be used to compute the data rates that are
achievable under the Ethernet protocol’s constraint that packets include a CRC.
As a simplified example, consider communication over the binary symmetric
channel (BSC), with crossover probability p, shown in Figure 3.1. The capacity
of the BSC is given by
C(p) = 1 − h(p),    (3.13)
where h(p) denotes the binary entropy function defined in Appendix A.
In order to model redundancy that has been added by a higher layer, such
as the addition of CRC bits in Ethernet, assume that the encoder must operate
subject to a repetition coding constraint in which each pair of consecutive inputs to the channel must be two identical symbols. Two uses of the BSC with the same input symbol are equivalent to a single use of the binary symmetric erasure channel (BSEC) shown in Figure 3.2.

Figure 3.2: Illustration of two uses of a binary symmetric channel with crossover probability p under a repetition coding constraint as a single use of the binary symmetric erasure channel.

By symmetry of the BSEC, a uniform input distribution is
optimal and (3.9) can be computed as

CΓ(p) = (1/2) I(X_1^2; Y_1^2)    (3.14)
      = (1/2) h(3)( 1/2 − p(1 − p), 1/2 − p(1 − p) ) − (1/2) h(3)( p^2, (1 − p)^2 ),    (3.15)

where h(3)(p1, p2) is defined in Appendix A.
The overhead cost of a repetition coding constraint for the BSC as a function
of the crossover probability p is then given by
C(p) − CΓ(p) = 1 − h(p) − (1/2) h(3)( 1/2 − p(1 − p), 1/2 − p(1 − p) ) + (1/2) h(3)( p^2, (1 − p)^2 ).
The baseline performance C(p) and the constrained performance CΓ (p) are shown
in Figure 3.3. Note that the overhead cost shown in the figure goes to zero as
p approaches 0.5, which is in contrast to the fixed overhead cost of 0.5 bits per
channel use for all p under the non-data viewpoint that was discussed in Chapter 2.
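These closed-form expressions can be checked numerically. The following Python sketch (our own helper code, assuming the formulas (3.13) and (3.15) as reconstructed above) evaluates C(p), CΓ(p), and the overhead cost at a few crossover probabilities:

    import numpy as np

    def h(p):
        """Binary entropy in bits."""
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def h3(p1, p2):
        """h^(3)(p1, p2): entropy of the distribution (p1, p2, 1 - p1 - p2) in bits."""
        probs = np.clip(np.array([p1, p2, 1 - p1 - p2]), 1e-12, 1)
        return float(-(probs * np.log2(probs)).sum())

    def C(p):
        """Unconstrained BSC capacity, (3.13)."""
        return float(1 - h(p))

    def C_rep(p):
        """BSC capacity under the repetition coding constraint, (3.15)."""
        return 0.5 * h3(0.5 - p * (1 - p), 0.5 - p * (1 - p)) - 0.5 * h3(p**2, (1 - p)**2)

    for p in [0.0, 0.1, 0.25, 0.4, 0.5]:
        print(f"p={p:.2f}  C={C(p):.4f}  C_rep={C_rep(p):.4f}  overhead={C(p) - C_rep(p):.4f}")

At p = 0 the overhead cost equals 0.5 bits per channel use and it decays to zero as p approaches 0.5, consistent with Figure 3.3.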
Figure 3.3: Illustration of the overhead cost of a repetition coding constraint for the binary symmetric channel with crossover probability p. From the non-data viewpoint, the overhead cost is 0.5 bits per channel use for all p.

The optimization in (3.9) is given as a general formula for channel capacity with multi-letter cost constraints, which is required for a multi-letter constraint
such as CRC in Ethernet packets. However, the channel capacity of stationary
memoryless channels with additive cost constraints satisfying
cn(x^n) = Σ_{i=1}^{n} c(x_i)    (3.16)

can be given more simply as

CΓ = max_{pX ∈ SΓ} I(X; Y),    (3.17)

where

SΓ := { pX(x) : E[c(X)] ≤ Γ }.    (3.18)
This simplification still encompasses a wide variety of relevant cost constraints,
such as an average power constraint on inputs to a channel.
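As a hedged illustration of the single-letter formulation (3.17)-(3.18), the sketch below (our own code, not from the dissertation) searches over input distributions for a BSC under a hypothetical per-symbol cost c(0) = 0, c(1) = 1; the function names and the cost function are ours, chosen only for illustration:

    import numpy as np

    def h(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def mutual_information_bsc(p1, p):
        """I(X;Y) for a BSC(p) with P[X = 1] = p1."""
        py1 = p1 * (1 - p) + (1 - p1) * p  # P[Y = 1]
        return h(py1) - h(p)

    def constrained_capacity(p, gamma, grid=10001):
        """C_Gamma of (3.17)-(3.18) with cost c(0) = 0, c(1) = 1, i.e. E[c(X)] = p1 <= gamma."""
        p1s = np.linspace(0.0, min(gamma, 1.0), grid)
        return max(mutual_information_bsc(p1, p) for p1 in p1s)

    p = 0.1
    for gamma in [0.1, 0.25, 0.5]:
        print(f"Gamma={gamma:.2f}  C_Gamma={constrained_capacity(p, gamma):.4f}  C={1 - h(p):.4f}")

For Γ ≥ 1/2 the constraint is inactive and the search recovers the unconstrained capacity 1 − h(p).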
The reduction in capacity resulting from an average power constraint is not
typically thought of as “overhead.” However, just as CRC overhead in Ethernet
packets and repetition coding for the BSC reduce system performance, an average
power constraint reduces the set of distributions over which we can optimize in
(3.17) and correspondingly reduces the amount of information that can be reliably
sent over the channel. These examples illustrate the connection between the viewpoint for studying overhead established in this chapter and sources of overhead
found in many practical systems.
CHAPTER 4
MULTI-ACCESS COMMUNICATION
This chapter examines multiple access channels within the framework for
studying overhead that was discussed in the previous chapter. General background information on multiple access channels is given in Section 4.1. The system model we consider and other definitions are given in Section 4.2. Section 4.3
reviews known results for the capacity region of various multiple access channels
with and without feedback from within our framework. Section 4.4 computes the
constrained capacity region of two binary additive multiple access channels with
feedback. Finally, in Section 4.5 we introduce the K-bit packet collision channel with degraded feedback and develop inner and outer bounds on the capacity
region. Numerical evaluation suggests that the bounds are tight and therefore correspond with the capacity region. For the packet collision channel we also define
the asymptotic packet throughput rate and show that for this channel the cost
of a common distribution constraint is a loss in packet throughput by a factor of
two.
4.1 Background
In Chapter 2 we observed that a significant source of protocol overhead in
wireless protocols has to do with how multiple users communicate over a common
channel. Unlike point-to-point channels for which the central problem is how to
deal with channel noise, the primary technical focus in multi-access is how to share
a single channel among a small set of active users that are typically drawn from a
larger population. A detailed review of the ways in which multi-access has been
studied was given in [28], where the focus was on the collision resolution approach
to multi-access and multi-access information theory.
As discussed in Section 2.2.1, the collision resolution approach to multi-access
focuses on packet collision models in which multiple users transmit packets of
information to a common destination and colliding packets are either partially
or completely lost at the receiver. One example of how a packet collision model
has been studied in information theory is the collision channel without feedback
from [23], where the authors show that codeword sets can be chosen such that
they are mutually non-interfering. Assignment of codewords to users requires
that each user be able to uniquely identify itself and have at least partial
knowledge of the number of active users in the system. However, in a system
for which this information is available, frequency or time division multiple access
(FDMA/TDMA) techniques may be viable alternatives.
Unique user identification in packet collision channels could be used to resolve
collisions simply by waiting a certain number of slots that is determined by each
user’s identity such that full channel utilization is possible [28]. A common approach used in contention resolution to preclude such coordination is the modeling
assumption that there are an infinite number of users in the system and that each
new packet arrives at a new user with an empty queue. Because the systems for
which such multi-access techniques are relevant often involve a frequently changing set of low-rate users, the insights gained from models that use the infinite user
assumption still bear some relevance to practical systems.
In this chapter we consider a constraint that can be applied to information the-
40
oretic multi-access models to preclude complete time orthogonalization of transmitted signals, modeling the fact that such coordination is not always possible.
Specifically, we consider three different multiple access channels with feedback and
characterize their capacity regions under the constraint that the same probability
distribution is used to generate each user’s codebooks in a random coding argument [48, 49]. The channel models we consider build upon work from multi-access
information theory, which we now briefly review.
It is well known that feedback can enlarge the capacity region of multiple
access channels with feedback (MAC-FB) [55], which is in contrast to the case for
point-to-point memoryless channels [56]. The way in which feedback can increase
the capacity of some multiple access channels is to make the inputs of each user
dependent [57]. However, the way in which this dependency is generated and used
for more efficient encoding over multiple access channels with feedback makes the
complete characterization of the information capacity region a challenging task.
An achievable rate region for full, or noiseless, feedback was found by Cover
and Leung [58] using block Markov superposition coding and list decoding. The
complete capacity region for a class of discrete memoryless multiple access channels with feedback, which includes the channel from [55], was later given in [59].
The proof shows that in channels for which at least one of the inputs, say X1 , is
completely determined by the output Y and the other input, X2 , the lower bound
given by the Cover–Leung achievable rate region is tight.
An achievable rate region for multiple access channels with feedback signals
that differ from the channel output, or with generalized feedback, was given by
Carleial [60]. An interesting result from [60] is that the Cover–Leung region is
still achievable if only one of the two users receives the feedback signal. However,
although [60] gives an expression for an achievable rate region, it is given in
terms of 17 rate bounds and is typically challenging to compute. The generalized
feedback model is important because in addition to modeling feedback signals that
differ from the output of the channel, it can be used to model feedback signals
that are degraded versions of the channel output or are corrupted by noise. In
this sense, the degraded feedback signals considered in this chapter are a special
case of generalized feedback.
Later, Hekstra and Willems [61] introduced the dependence balance bound
for the single output two-way channel. Their motivation was to quantify the notion that transmitters cannot “consume” more dependence than what they have
“produced” through previous transmissions. In general, the resulting additional
constraint on the set of allowable input distributions gives a tighter outer bound
than what is possible from cut-set bounds [62]. Although in [63] it was shown that
the capacity of the two-user AWGN MAC with feedback is given by the cut-set
bound, it was recently shown in [64] that this is not true for the binary additive
MAC with feedback, the discrete input analog to the AWGN MAC with feedback.
Thus, dependence balance bounds are an important tool for evaluating the information capacity of multiple access channels with degraded or noisy feedback. The
general concept behind the dependence balance bound in [61] is also used in [62]
to develop outer bounds on the capacity of common-output discrete memoryless
networks, and in some cases these bounds are shown to be tight.
Even with an additional constraint on input distributions resulting from dependence balance bounds, it is often still challenging to actually compute the
associated rate region. In [65] it was shown that a uniform binary auxiliary random variable is sufficient to compute the maximum sum rate point in the capacity
region of the binary additive multiple access channel with feedback. A characterization of the asymmetric rate points that are achievable using a uniform binary
auxiliary random variable was given in [66], where simple feedback strategies for
this channel were developed.
4.2 System Model & Definitions
This section summarizes the system model used throughout the chapter and
defines several functions and variables that will be used to characterize the various
capacity regions we study.
4.2.1 General System Model
The general model for a two-user multiple access channel with feedback used
in this chapter is shown in Figure 4.1. Although we consider channels for which
YF ≠ Y, the feedback signals for all of the channels considered in this chapter are degraded versions of the main channel output Y and therefore satisfy
p(Y, YF |X1, X2) = p(Y|X1, X2) p(YF |Y), such that (X1, X2) → Y → YF forms a
Markov chain.
Define a two-user discrete memoryless multiple access channel with generalized
symmetric feedback, input alphabets X1 and X2 , and output alphabets Y and YF
by its probability transition function pY,YF |X1 ,X2 (y, yF |x1 , x2 ) for all (x1 , x2 , y, yF ) ∈
X1 × X2 × Y × YF .
An (n, M1, M2, εn)-code for the MAC with generalized symmetric feedback
consists of the following:
• message sets M1 = {1, . . . , M1 } and M2 = {1, . . . , M2 }, from which random
variables M1 and M2 are drawn uniformly;
Figure 4.1: Block diagram for the two-user multiple access channel with generalized feedback.
• two sequences of encoding functions ϕ1,i and ϕ2,i for i = 1, . . . , n with
ϕ1,i : M1 × YF^{i−1} → X1,    i = 1, . . . , n,
ϕ2,i : M2 × YF^{i−1} → X2,    i = 1, . . . , n,
used to encode and transmit messages M1 and M2 , respectively, through n
channel uses; and
• a decoding function ψn : Yn → M1 × M2 that, upon observation of n channel
outputs, selects the correct message pair with average probability of error
εn := P[ψn(Y^n) ≠ (M1, M2)].
A rate pair (R1, R2) is achievable for a MAC with generalized symmetric feedback if there exists a sequence of (n, M1(n), M2(n), εn)-codes satisfying

liminf_{n→∞} (1/n) log M1(n) ≥ R1,    liminf_{n→∞} (1/n) log M2(n) ≥ R2,

and

limsup_{n→∞} εn = 0.
The capacity region C of the MAC with generalized symmetric feedback is the
closure of the set of all achievable rate pairs (R1 , R2 ). The symmetric capacity
Csym of the MAC with generalized symmetric feedback is the maximum R ≥ 0
such that (R, R) ∈ C.
In this chapter we will also investigate the impact of transmission with a common distribution for both users as given by the constraint
pX1|S(x|s) = pX2|S(x|s),    s = 1, . . . , |S|,    (4.1)
on input distributions for computing capacity regions.
4.2.2 Function and Variable Definitions
This section presents variable definitions and properties of composite functions
that are used throughout the chapter. Specifically, when we develop outer bounds
on the capacity region of various multiple access channels, the functions φ(·) and
f (·, ·) will be used to parameterize functions of variables that depend on the
auxiliary random variable S (i.e., uS , u1S , and u2S ) by functions of variables that
do not depend on S (i.e., u, u1 , and u2 ).
The function
φ(x) = { (1 − √(1 − 2x))/2,   0 ≤ x ≤ 1/2;
         (1 − √(2x − 1))/2,   1/2 < x ≤ 1.    (4.2)
was introduced in [65] to characterize the capacity region of channels similar to
those studied in this chapter. It was also shown in [65] that the composite function
h(φ(x)) is concave in x for 0 ≤ x ≤ 1 and symmetric about x = 1/2, where h(·)
is the binary entropy function defined in Appendix A. Furthermore,
φ(2x(1 − x)) = min(x, 1 − x)    (4.3)
and
h(φ(2x(1 − x))) = h(x)    (4.4)
hold for 0 ≤ x ≤ 1.
For x ∈ [0, 1/2] and y ∈ [0, 1/2] the function f (x, y) is defined as
f(x, y) = φ(x) + φ(y) − 2φ(x)φ(y)    (4.5)
        = (1 − √((1 − 2x)(1 − 2y)))/2.    (4.6)
It was shown in [64] that f (x, y) is jointly convex in (x, y) for x ∈ [0, 1/2], y ∈
[0, 1/2].
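A minimal numerical sketch (our own code, not part of the dissertation) implements φ(·) and f(·, ·) and spot-checks the identities (4.3)-(4.6):

    import numpy as np

    def phi(x):
        """The function phi(x) defined in (4.2)."""
        x = np.asarray(x, dtype=float)
        return np.where(x <= 0.5,
                        (1 - np.sqrt(np.clip(1 - 2 * x, 0, None))) / 2,
                        (1 - np.sqrt(np.clip(2 * x - 1, 0, None))) / 2)

    def f(x, y):
        """The function f(x, y) defined in (4.5)-(4.6), for x, y in [0, 1/2]."""
        return (1 - np.sqrt((1 - 2 * x) * (1 - 2 * y))) / 2

    def h(p):
        """Binary entropy in bits."""
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    xs = np.linspace(0.0, 1.0, 11)
    assert np.allclose(phi(2 * xs * (1 - xs)), np.minimum(xs, 1 - xs))           # (4.3)
    assert np.allclose(h(phi(2 * xs * (1 - xs))), h(xs))                          # (4.4)
    xy = np.linspace(0.0, 0.5, 6)
    assert np.allclose(f(xy, 0.3), phi(xy) + phi(0.3) - 2 * phi(xy) * phi(0.3))   # (4.5) = (4.6)
    print("phi/f identity checks passed")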
The rate regions we develop will involve the auxiliary random variables S,
T1 , and T2 , where S is used to single letterize multi-letter mutual information
expressions, and T1 and T2 indicate whether or not the first user and second user
are transmitting, respectively. For convenience, pS (s), pT1 |S (t1 |S), pT2 |S (t2 |S) can
be described by the following variables:
ps = Pr[S = s],    (4.7)
q1s = Pr[T1 = 0|S = s],    (4.8)
q2s = Pr[T2 = 0|S = s],    (4.9)
for s = 1, . . . , |S|. Also, let
us = q1s + q2s − 2q1s q2s,    u = Σ_s ps us,    (4.10)
u1s = q1s(1 − q1s),    u1 = Σ_s ps u1s,    (4.11)
u2s = q2s(1 − q2s),    u2 = Σ_s ps u2s,    (4.12)

and

q1 = Σ_s ps q1s,    q2 = Σ_s ps q2s.    (4.13)
Note that because 0 ≤ qis ≤ 1 for i = 1, 2, s = 1, . . . , |S| we have that u, u1 , and
u2 all lie within the range [0, 1/4].
Finally, we use a bar over an arbitrary variable x to denote the quantity 1 − x, such that x̄ = 1 − x.
4.3 Overhead in Multiple Access Channels
It is natural to think of overhead in a multi-access communication system as the
loss of performance associated with the fact that uncoordinated users communicate
over a common channel. In this section we review known capacity results for
multiple access channels from within the context of the framework for studying
overhead that was established in Chapter 3. We consider multiple access channels
with and without feedback and illustrate that in both cases the capacity region
can be expressed in the same form as the capacity region for coordinated users
with additional constraints to account for the lack of coordination.
For the classical multiple access channel given in Section 4.2.1, or the multiple
access channel with "uncoordinated users," the appropriate baseline for comparison is the performance of an equivalent system in which users are fully coordinated.
The capacity region for such a baseline system is the capacity of the corresponding
point-to-point channel with two inputs X1 and X2 , which is given by
C = ∪_{p(x1,x2)} { (R1, R2) : R1 + R2 ≤ I(X1, X2; Y) }.    (4.14)
Because the capacity of a memoryless point-to-point channel does not increase
with feedback, (4.14) also represents the capacity region of coordinated users with
feedback.
The capacity region for coordinated users is given as the union of a single sum
rate bound taken over all joint distributions on X1 × X2, and so the users' inputs
may be arbitrarily correlated. As we shall see, the capacity region of the multiple
access channel with uncoordinated users can be computed by appropriately:
• constraining the set of distributions over which the union is computed;
• introducing individual rate constraints on R1 and R2 ; and
• additionally constraining the sum rate term.
Depending upon the particular model considered (with or without feedback, perfect or generalized feedback) each of these three constraints may take a different
form and result in a different reduction of the capacity region relative to (4.14).
As we review known results for the capacity region of multiple access channels in
the next few sections we emphasize the form that each of these constraints takes
in order to highlight the connection to the framework given in Chapter 3.
4.3.1 MAC without Feedback
The capacity region of the discrete memoryless multiple access channel without
feedback is given by the closure of the set of (R1 , R2 ) satisfying
R1 ≤ I(X1; Y|X2, S),    (4.15)
R2 ≤ I(X2; Y|X1, S),    (4.16)
R1 + R2 ≤ I(X1, X2; Y|S),    (4.17)

for some choice of

p(s, x1, x2, y) = p(s) p(x1|s) p(x2|s) p(y|x1, x2),    (4.18)

where it is sufficient to consider |S| ≤ 4 [49].
Relative to the capacity region of coordinated users given by (4.14), the capacity region of the MAC with uncoordinated users introduces three additional
constraints. First, the set of distributions over which the capacity region must be
computed is restricted to those for which X1 and X2 are conditionally independent
given the auxiliary random variable S. Second, individual rate constraints on R1
and R2 are required. Finally, the sum rate bound is reduced to I(X1 , X2 ; Y|S).
The fact that I(X1 , X2 ; Y|S) is upper bounded by the sum rate bound in (4.14) follows from the fact that conditioning reduces entropy and the Markov relationship
S←
→ (X1 , X2 ) ←
→ Y, as shown by the following chain of inequalities:
I(X1 , X2 ; Y|S) = H(Y|S) − H(Y|X1 , X2 , S)
(4.19)
≤ H(Y) − H(Y|X1 , X2 )
(4.20)
= I(X1 , X2 ; Y).
(4.21)
4.3.2 MAC with Perfect Feedback
Unlike multiple access channels without feedback, there is currently no closed
form (i.e., single letter) expression for the capacity region of a general discrete
memoryless multiple access channel with perfect feedback. The bounds we consider in this chapter are an inner bound on the capacity region given by Cover
and Leung [58], and an outer bound given by Hekstra and Willems [61].
The Cover-Leung achievable rate region for a MAC with perfect feedback
(MAC-FB) [58] is given by the closure of
RCL = ∪_{p(s,x1,x2,y)∈PCL} { (R1, R2) : R1 < I(X1; Y|X2, S),
                                        R2 < I(X2; Y|X1, S),
                                        R1 + R2 < I(X1, X2; Y) },    (4.22)

where PCL is the set of all joint distributions on S, X1, X2, and Y of the form

p(s, x1, x2, y) = p(s) p(x1|s) p(x2|s) p(y|x1, x2),    (4.23)

and it is sufficient to consider |S| ≤ min(|X1||X2| + 1, |Y| + 2).
Note that the set of joint distributions for the RCL bound and the individual
rate constraints on R1 and R2 are identical to those for the MAC without feedback.
However, the RCL bound’s sum rate constraint is identical to the corresponding
bound for coordinated users given in (4.14).
The simplest outer bound on the capacity region of the MAC with feedback is
the cut-set bound (see, e.g., [49, Theorem 14.10.1]), which is given by
RCS = ∪_{p(x1,x2,y)∈PCS} { (R1, R2) : R1 ≤ I(X1; Y|X2),
                                      R2 ≤ I(X2; Y|X1),
                                      R1 + R2 ≤ I(X1, X2; Y) },    (4.24)

where PCS is the set of joint distributions on X1, X2, and Y of the form

p(x1, x2, y) = p(x1, x2) p(y|x1, x2).    (4.25)
Note that relative to coordinated transmission, the cut-set bound only introduces
individual rate constraints on R1 and R2 and does not improve upon the sum
rate bound or restrict the set of joint distributions over which the union must be
computed. In (4.25) X1 and X2 can be arbitrarily correlated, but for a specific
MAC with feedback it may not be possible to generate such arbitrary dependence
between the inputs, which is why the cut-set bound is often loose.
In contrast to the cut-set bound which allows for arbitrarily correlated X1
and X2 in (4.25), dependence balance outer bounds seek to limit the form of the
dependence between the channel inputs to that which the channel can actually
support. The motivation behind dependence balance constraints is the notion that
a channel cannot "consume" more dependence than it has "produced" through
previous transmissions. We give three forms of dependence balance outer bounds
for the MAC with feedback, all of which were originally given in [67], [61].
The first is the standard dependence balance bound RDB which seeks to limit
the structure of the dependence between X1 and X2 by imposing an additional
constraint on the set of allowable input distributions, where RDB is given by
RDB = ∪_{p(s,x1,x2,y)∈PDB} { (R1, R2) : R1 ≤ I(X1; Y|X2, S),
                                        R2 ≤ I(X2; Y|X1, S),
                                        R1 + R2 ≤ I(X1, X2; Y) },    (4.26)

where PDB is the set of joint distributions on S, X1, X2, and Y of the form

p(s, x1, x2, y) = p(s) p(x1, x2|s) p(y|x1, x2)    (4.27)

that satisfy the dependence balance constraint

I(X1; X2|S) ≤ I(X1; X2|Y, S),    (4.28)

where |S| ≤ min(|X1||X2| + 1, |Y| + 2). The dependence balance bound improves
upon the cut-set bound through the additional constraint on joint distributions
given by (4.28).
Although the dependence balance constraint (4.28) reduces the set of distributions over which the rate bounds in (4.26) must be evaluated, it is still often
difficult to explicitly characterize the resulting rate region. A “genie aided” parallel channel extension for the dependence balance bound can be used to further
limit the set of p(x1 , x2 |s) that must be considered, at the expense of enlarging
the resulting rate region through additional “leak terms” that appear in the rate
bounds.
Consider a channel p+ (z|x1 , x2 , y, s) connected in parallel to the MAC with
feedback such that Z is available to the decoder and to both encoders. A new
dependence balance bound can be obtained by replacing Y in RDB with the pair
(Y, Z) and modifying the set of joint distributions accordingly. The bound can
then be strengthened by including the “Shannon” rate constraints that reflect that
the codes must still perform correctly without the parallel channel. Accordingly,
for any parallel channel p+ (z|x1 , x2 , y, s), the capacity region of the MAC with
feedback is contained in
RDB-PC = ∪_{p(s,x1,x2,y)∈PDB-PC} { (R1, R2) : R1 ≤ min[ I(X1; Y|X2), I(X1; Y, Z|X2, S) ],
                                              R2 ≤ min[ I(X2; Y|X1), I(X2; Y, Z|X1, S) ],
                                              R1 + R2 ≤ min[ I(X1, X2; Y), I(X1, X2; Y, Z|S) ] },    (4.29)
where PDB-PC is the set of joint distributions on S, X1 , X2 , Y, and Z of the form
p(s, x1, x2, y, z) = p(s) p(x1, x2|s) p(y|x1, x2) p+(z|x1, x2, y, s)    (4.30)

such that for all s

p+(z|x1, x2, y, s) = F(pX1,X2|S(x1, x2|s))    (4.31)

and

I(X1; X2|S) ≤ I(X1; X2|Y, Z, S),    (4.32)
where |S| ≤ |X1 ||X2 | + 3.
Note that the parallel channel in (4.31) can vary with each value of S, in
which case the resulting bound is called an adaptive parallel channel extension
bound. Although an adaptive parallel channel can result in tighter bounds than
the corresponding fixed parallel channel, for certain channels the additional complexity precludes a closed form characterization of the resulting rate region. It
is important to note that the parallel channel is chosen ahead of time and each
choice of parallel channel results in a new dependence balance bound. Thus multiple bounds from different parallel channels can be combined to produce a tighter
outer bound on the capacity region.
Although a parallel channel extension based dependence balance bound can
result in additional leak terms relative to the standard dependence balance bound,
both bounds are constrained versions of the baseline capacity region given by
(4.14). Therefore, both bounds fit within the framework established in Chapter 3.
4.3.3 MAC with Generalized Feedback
In Section 4.5 we introduce the packet collision channel with feedback and
characterize its capacity region. For this channel the feedback signal is a degraded
version of the channel output and thus the Cover-Leung achievable rate region
does not directly apply. Instead we make use of an achievable rate region for
the multiple access channel with generalized feedback given by Carleial [60]. The
complete form of the Carleial achievable rate region for the special case of identical
feedback to each user is given by (B.16) in Appendix B.2.
The cut-set and dependence balance outer bounds from the previous section
directly apply to any MAC with degraded feedback. However, it is easy to improve
upon the dependence balance bound by explicitly incorporating the feedback signal into the dependence balance constraint. A dependence balance bound for the
multiple access channel with generalized feedback was also given in [68], where it
was used to derive outer bounds for interference channels with generalized feedback. The corresponding proof was given in [69], where the focus was on Gaussian
interference networks. We give a parallel channel extension to the dependence
balance bound for the MAC with generalized feedback, which, although it follows
immediately from [61] and [69], appears to be new.
Theorem 4.1. For any parallel channel extension p+ (z|x1 , x2 , y, yF1 , yF2 , s), the
capacity region of the multiple access channel with generalized feedback signals YF1
and YF2 to users 1 and 2, respectively, is contained in the region RDB-G , where
RDB-G = ∪_{p(s,x1,x2,y,yF1,yF2)∈PDB-G} { (R1, R2) : R1 ≤ min[ I(X1; Y|X2), I(X1; Y, Z|X2, S) ],
                                                    R2 ≤ min[ I(X2; Y|X1), I(X2; Y, Z|X1, S) ],
                                                    R1 + R2 ≤ min[ I(X1, X2; Y), I(X1, X2; Y, Z|S) ] },    (4.33)
where PDB-G is the set of joint distributions on S, X1 , X2 , Y, YF1 , YF2 , and Z of
the form
p(s, x1, x2, y, yF1, yF2, z) = p(s) p(x1, x2|s) p(y, yF1, yF2|x1, x2) p+(z|x1, x2, y, yF1, yF2, s)    (4.34)
such that for all s
p+ (z|x1 , x2 , y, yF1 , yF2 , s) = F (pX1 ,X2 |S (x1 , x2 |s))
(4.35)
I(X1 ; X2 |S) ≤ I(X1 ; X2 |YF1 , YF2 , Z, S),
(4.36)
and
where |S| ≤ |X1 ||X2 | + 3.
Proof: The proof uses standard converse techniques. It follows from [69, Section
14.3] by replacing Y with the pair (Y, Z) and including the additional Shannon
rate constraints, as was done in [61] for the single-output two-way channel.
In this section we studied the general form of the capacity region for coordinated and uncoordinated users and discussed the corresponding constraints on the
capacity region. In the following two sections we illustrate the application of our
framework for studying overhead in multi-access systems by computing baseline
and constrained capacity regions for two types of binary additive multiple access
channels with feedback in Section 4.4 and for the packet collision channel with
feedback in Section 4.5.
4.4 Binary Additive Channels with Feedback
The previous section reviewed existing results for the capacity region of multiple access channels within the context of the framework for studying overhead as
the cost of constraints given in Chapter 3. The baseline system for comparison was
the multiple access channel with “coordinated” users, where the corresponding capacity region was given by the equivalent single-user capacity of a point-to-point
channel.
In addition to considering the overhead associated with the lack of coordination
among users, we can study the reduction of the capacity region that results from
additional constraints on the joint input distribution. To illustrate this fact, in
this section we compute baseline and constrained capacity regions for two binary
additive multiple access channels with feedback under a constraint that each user’s
codebook is generated using an identical distribution.
4.4.1 Binary Additive MAC with Feedback
The output of the binary additive multiple access channel with feedback studied in this section is given by
Y = X1 + X2 ,
(4.37)
where in the context of the general channel model given in Section 4.2.1 YF =
Y. The binary additive MAC with feedback (BA-MAC-FB) is perhaps the best
example of a multiple access channel with feedback for which the capacity is
known. The capacity region of the binary additive MAC without feedback was
originally determined by Liao [70]. The binary additive MAC with feedback was
first studied by van der Meulen [71], where the symmetric rate pair (0.79113,
0.79113) was shown to be achievable. In [55], the BA-MAC-FB was used to show
that the capacity region of a multiple access channel can increase with feedback.
Cover and Leung showed that the maximum symmetric rate point (or symmetric
capacity) of the BA-MAC-FB lies strictly below the "total cooperation line," i.e.,
below the boundary of rate points that are achievable when both users know each
other’s messages ahead of time [58].
In [65] Willems showed that van der Meulen’s achievable rate pair lies on the
boundary of the capacity region, therefore determining the symmetric capacity
exactly. It was also demonstrated that the van der Meulen rate pair can be
achieved with a uniformly distributed binary auxiliary random variable. Willems
also showed in [59] that the Cover-Leung achievable rate region is tight for the
class of multiple access channels with feedback for which one input, say X1 , can be
given as a deterministic function of the other input, X2 , and the channel output, Y.
Because the BA-MAC-FB falls into this class of channels, the capacity region of
the BA-MAC-FB is given by the Cover-Leung rate region. More recently, Tandon
and Ulukus showed that all of the asymmetric rate points in the capacity region
can also be achieved using a uniform binary auxiliary random variable and gave
a closed form characterization of the corresponding capacity region [64].
Capacity Region
Using results from [64] we can determine the capacity region of the BA-MAC-FB with common distributions and obtain a simple closed form characterization
of the region, both of which are given by the following theorem.
Theorem 4.2. The capacity region of the BA-MAC-FB with common distributions is given by
CΓ = ∪_{0≤w≤1/2} { (R1, R2) : R1 ≤ h(φ(w)),
                              R2 ≤ h(φ(w)),
                              R1 + R2 ≤ h(w) + 1 − w }.    (4.38)
Proof: The Cover-Leung rate region for the BA-MAC-FB is the closure of
RCL = ∪_{p(s,x1,x2,y)∈PCL} { (R1, R2) : R1 < H(X1|S),
                                        R2 < H(X2|S),
                                        R1 + R2 < H(Y) },    (4.39)
where PCL was defined in Section 4.3.2. The following closed form characterization
of the capacity region of the BA-MAC-FB was given in [64]:
C = ∪_{(u1,u2)∈P} { (R1, R2) : R1 ≤ h(φ(2u1)),
                               R2 ≤ h(φ(2u2)),
                               R1 + R2 ≤ h(f(2u1, 2u2)) + 1 − f(2u1, 2u2) },    (4.40)
where the set P is defined as
P = { (u1, u2) : 0 ≤ u1 ≤ 1/4, 0 ≤ u2 ≤ 1/4 }.    (4.41)
Because pX1 |S (x|s) = pX2 |S (x|s) for common distributions we have that u1s = u2s
for all s and therefore u1 = u2. For x ≤ 1/2 we have that f(x, x) = x. Letting
w = 2u1 = 2u2 in (4.40) gives the desired result.
Symmetric Capacity
Using Theorem 4.2 the symmetric capacity of the BA-MAC-FB with common
distributions can be computed as
Csym = max_{0≤w≤1/2} min{ h(φ(w)), (h(w) + 1 − w)/2 }.    (4.42)
The value of w that maximizes the minimization in (4.42) is given by the unique
solution to
h(φ(w)) = (h(w) + 1 − w)/2    (4.43)
over 0 ≤ w ≤ 1/2. Evaluating (4.43) numerically leads to w∗ = 0.36236, which
corresponds to Csym = 0.79113 in (4.42). Note that Csym = 0.79113 coincides
with the symmetric capacity of the BA-MAC-FB originally found in [65]; therefore,
coding with common distributions can achieve all symmetric rate points in the capacity region of the BA-MAC-FB.

Figure 4.2: Capacity region of the binary additive MAC with feedback.
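The fixed point in (4.43) can be reproduced with a few lines of code. The bisection sketch below (our own, assuming the reconstruction of (4.42)-(4.43) above) recovers w∗ ≈ 0.36236 and Csym ≈ 0.79113:

    import numpy as np

    def h(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

    def phi(x):
        """phi(x) from (4.2); only the branch for x <= 1/2 is needed here."""
        return (1 - np.sqrt(1 - 2 * x)) / 2

    def g(w):
        # g(w) = h(phi(w)) - (h(w) + 1 - w)/2; its root in (0, 1/2) is w* in (4.43).
        return h(phi(w)) - (h(w) + 1 - w) / 2

    lo, hi = 1e-6, 0.5        # g(lo) < 0 and g(hi) > 0, so bisection converges to the root
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    w_star = (lo + hi) / 2
    print(f"w* = {w_star:.5f}, C_sym = {h(phi(w_star)):.5f}")  # approx 0.36236 and 0.79113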
Discussion
The capacity region of the BA-MAC-FB with and without common distributions is shown in Figure 4.2, along with the capacity region of the BA-MAC
without feedback for reference. It is clear from Figure 4.2 that coding with common distributions can take advantage of feedback to enlarge the capacity region of
the binary additive MAC. Using Theorem 4.2 we showed that for symmetric rate
points coding with common distributions can take full advantage of the feedback
link. However, for asymmetric rate pairs the gains that are possible with feedback
are not as significant as they are for unconstrained coding.
In the next section we consider a channel with a non-zero cost to coding with
common distributions at all rates for which feedback enlarges the capacity region.
4.4.2 Binary Additive Noisy MAC with Feedback
The binary additive noisy MAC with feedback (BAN-MAC-FB) extends the BA-MAC-FB by adding uniform binary noise N to the channel output, so that

Y = X1 + X2 + N.    (4.44)
The capacity region of the BAN-MAC-FB is not known. Kramer [62] showed that
the Cover-Leung bound on the symmetric capacity is strictly loose. The tightest
known outer bound for the capacity region was presented in [64]. Although the unconstrained capacity region is unknown, the following theorem gives a closed form
characterization of the constrained capacity region under a common distribution
constraint.
Theorem 4.3. The capacity region of the BAN-MAC-FB with common distributions is given by
CΓ = ∪_{0≤w≤1/2} { (R1, R2) : R1 ≤ (1/2) h(φ(w)),
                              R2 ≤ (1/2) h(φ(w)),
                              R1 + R2 ≤ h( (1 − w)/2 ) }.    (4.45)
The complete proof of Theorem 4.3 is given in Appendix B.1. The proof involves computing two parallel channel extension dependence balance bounds R(1)DB,Γ and R(2)DB,Γ and showing that their intersection is equivalent to the Cover-Leung achievable rate region evaluated over distributions that satisfy the common distribution constraint. R(1)DB,Γ, R(2)DB,Γ, and their unconstrained counterparts R(1)DB and R(2)DB are shown in Figure 4.3, along with the capacity region of the binary additive noisy MAC without feedback for comparison.

Figure 4.3: Dependence balance bounds for the BAN-MAC-FB.
Using Theorem 4.3 we obtain a closed form expression for the symmetric capacity of the BAN-MAC-FB with common distributions and show that it corresponds
to the maximum symmetric rate point of the unconstrained Cover-Leung achievable rate region. The symmetric capacity with common distributions is given as
Csym = max_{0≤w≤1/2} min{ (1/2) h(φ(w)), (1/2) h( (1 − w)/2 ) }.    (4.46)
Noting that h(φ(w)) is strictly increasing for w ∈ [0, 1/2] and h((1 − w)/2) is
strictly decreasing for w ∈ [0, 1/2], the w∗ that maximizes (4.46) is given by the
unique solution to

(1/2) h(φ(w)) = (1/2) h( (1 − w)/2 )    (4.47)

over w ∈ [0, 1/2]. Solving (4.47) we have that w∗ = √2 − 1 and

Csym = (1/2) h(1/√2) ≈ 0.43621,    (4.48)
which coincides with the maximum symmetric rate point of the Cover-Leung
achievable rate region for the BAN-MAC-FB.
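A short numerical check (our own sketch) confirms that w∗ = √2 − 1 balances the two terms in (4.46) and yields Csym = (1/2) h(1/√2) ≈ 0.43621:

    import numpy as np

    def h(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

    def phi(x):
        return (1 - np.sqrt(1 - 2 * x)) / 2  # adequate for x in [0, 1/2]

    w_star = np.sqrt(2) - 1
    lhs = 0.5 * h(phi(w_star))       # (1/2) h(phi(w*))
    rhs = 0.5 * h((1 - w_star) / 2)  # (1/2) h((1 - w*)/2)
    c_sym = 0.5 * h(1 / np.sqrt(2))  # (4.48)

    print(f"w* = {w_star:.5f}")
    print(f"(1/2)h(phi(w*)) = {lhs:.5f}, (1/2)h((1-w*)/2) = {rhs:.5f}")  # both equal C_sym
    print(f"C_sym = {c_sym:.5f}")    # approx 0.43621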
The common distribution capacity region of the BAN-MAC-FB is shown in
Figure 4.4, along with the best known inner and outer bounds for the unconstrained capacity region and the capacity region without feedback. Note that
although R(1)DB,Γ and R(2)DB,Γ in Figure 4.3, and therefore the common distribution capacity region in Figure 4.4, appear to be linear, by computing the parametric derivatives dR1/dw and dR2/dw of the expression for capacity given by Theorem 4.3 we see that they are not.
Although we showed that the Cover-Leung achievable rate region for unconstrained coding coincides with the common distribution capacity region at the
symmetric rate point (0.43621, 0.43621), from Figure 4.4 it is clear that the Cover-Leung bound is strictly larger than the common distribution capacity region for
all of the asymmetric rate pairs for which feedback increases capacity. However,
in [62] Kramer used directed information to show that the symmetric rate point
(0.43879, 0.43879) is achievable and thus the Cover-Leung bound is loose at this
point. Therefore, there is a non-zero cost to transmission with common distributions at all rate points for which feedback increases capacity, which is in contrast
to the case for the BAN-MAC-FB's noiseless counterpart, the BA-MAC-FB considered in Section 4.4.1. Also note that in Figure 4.4 the capacity region without feedback is for both unconstrained coding and transmission with common distributions.

Figure 4.4: Bounds on the capacity region of the binary additive noisy MAC with feedback.
4.5 Packet Collision Channel with Feedback
The multiple access channel model shown in Figure 4.1 is typically used to
study multi-user transmission at the bit level. However, practical communication
systems often rely on multi-user arbitration at the packet level. In this section
we study such packet-level arbitration from an information-theoretic viewpoint
by considering communication over a packetized collision channel with degraded
feedback.
We find inner and outer bounds on the capacity region of the K-bit packet
collision channel with collision and acknowledgement feedback. When evaluated
numerically these bounds appear to be tight, which suggests that they coincide
with the capacity region. For coding with a common distribution constraint we
determine the capacity region exactly. We give closed form characterizations of
the symmetric capacity for unconstrained and constrained coding and show that a
binary uniform auxiliary random variable is sufficient to obtain all of the symmetric rate points in the capacity region. Finally, we determine the asymptotic packet
throughput as K → ∞ and show that communication with common distributions
reduces throughput by a factor of two.
4.5.1 Channel Model
In multi-access systems information about the success or failure of packet
transmissions is often provided to the transmitters via a feedback link in order to facilitate more reliable communication. In this section we consider an
information-theoretic model for packetized channel access with “0, 1, c” feedback
that represents if the channel was idle, a packet was successfully received, or a
collision occurred in each time slot.
For every channel use of the packet collision channel with feedback each user
either transmits a K-bit packet X̃i ∈ {1, . . . , 2^K}, i = 1, 2, or remains silent. Let
T1 and T2 be random variables indicating whether user 1 and user 2 transmit in
a given time slot, such that each user’s channel input is given by X1 = T1 X̃1 and
X2 = T2 X̃2 , respectively. The channel output Y is given by
Y = { 0,    T1 = 0, T2 = 0;
      X1,   T1 = 1, T2 = 0;
      X2,   T1 = 0, T2 = 1;
      c,    T1 = 1, T2 = 1,    (4.49)
and the feedback signal YF is
YF = T1 + T2 ,
(4.50)
where YF = 0 if neither user transmits, YF = 1 if exactly one user transmits, and
YF = 2 or “c” for collision if both users transmit. Note that because the feedback
signal is a degraded version of the main channel output this channel model is a
special case of degraded feedback.
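The channel law (4.49)-(4.50) is simple to simulate. The sketch below (our own code; it assumes the packet alphabet {1, . . . , 2^K} used above) produces one channel use and the corresponding "0, 1, c" feedback:

    import random

    K = 4  # packet length in bits

    def channel(t1, t2, x1_tilde, x2_tilde):
        """One use of the K-bit packet collision channel with feedback, per (4.49)-(4.50).
        t1, t2 indicate whether each user transmits; x*_tilde are the K-bit packets."""
        if t1 == 0 and t2 == 0:
            y = 0            # idle slot
        elif t1 == 1 and t2 == 0:
            y = x1_tilde     # user 1's packet is received
        elif t1 == 0 and t2 == 1:
            y = x2_tilde     # user 2's packet is received
        else:
            y = "c"          # collision
        y_f = t1 + t2        # feedback: 0 = idle, 1 = success, 2 (= "c") = collision
        return y, y_f

    # Example slot: both users draw a packet and independently decide whether to transmit.
    x1 = random.randint(1, 2**K)
    x2 = random.randint(1, 2**K)
    t1, t2 = random.randint(0, 1), random.randint(0, 1)
    print(channel(t1, t2, x1, x2))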
4.5.2 Capacity Region
This section presents inner and outer bounds for the capacity region of the
packet collision channel with feedback. The bounds appear to be tight for all
rates when computed numerically, although we have been unable to show this
analytically. We compare these bounds to the capacity region without feedback
and to the capacity region with feedback and a common distribution constraint.
First, we obtain an inner bound by computing the Carleial achievable rate
region for a particular distribution and find a closed form characterization of this
bound, as given by the following theorem.
Theorem 4.4. All rate pairs (R1, R2) ∈ C are achievable over the K-bit packet collision channel with feedback, where
C = ∪_{(u1,u2)∈P} { (R1, R2) : R1 ≤ h(φ1) + K φ̄1 φ̄2,
                               R2 ≤ h(φ2) + K φ̄1 φ̄2,
                               R1 + R2 ≤ h(3)(φ1 φ̄2, φ̄1 φ2) + K(1 − φ1 φ̄2 − φ̄1 φ2) },    (4.51)

and the set P is defined as

P = { (u1, u2) : 0 ≤ u1 ≤ 1/4, 0 ≤ u2 ≤ 1/4 },    (4.52)

and we have defined φ1 = φ(2u1), φ2 = φ(2u2), φ̄1 = 1 − φ1, and φ̄2 = 1 − φ2.
The proof of Theorem 4.4 is given in Appendix B.2 and consists of two parts.
First we compute the Carleial achievable rate region RC for a particular distribution and show that all rates (R1 , R2 ) ∈ RC are achievable, where
RC = ∪_{0≤q1s≤1, 0≤q2s≤1, s∈{1,...,|S|}} { (R1, R2) :
     R1 ≤ Σ_s ps [ h(q1s) + K q̄1s q2s ],
     R2 ≤ Σ_s ps [ h(q2s) + K q1s q̄2s ],    (4.53)
     R1 + R2 ≤ Σ_s ps [ h(3)(q1s q2s, q̄1s q̄2s) + K(q1s + q2s − 2 q1s q2s) ] }.
Then we derive a closed-form characterization of RC and show that it is equivalent
to C as given in (4.51).
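As a numerical sanity check on the reconstruction of (4.51) given above (our own code, not from the dissertation), the sketch below sweeps (φ1, φ2) over [0, 1/2]² for K = 2 and verifies that the sum-rate bound of the region numerically approaches log2(2^K + 2), i.e., the total cooperation line discussed next:

    import numpy as np

    def h3(a, b):
        """h^(3)(a, b): entropy (bits) of the distribution (a, b, 1 - a - b)."""
        probs = np.clip(np.array([a, b, 1.0 - a - b]), 1e-12, 1.0)
        return float(-(probs * np.log2(probs)).sum())

    def sum_rate(phi1, phi2, K):
        """Sum-rate bound of (4.51) for one (phi1, phi2) pair, phi_i = phi(2u_i)."""
        x, y = phi1 * (1 - phi2), (1 - phi1) * phi2
        return h3(x, y) + K * (1 - x - y)

    K = 2
    grid = np.linspace(0.0, 0.5, 201)
    best = max(sum_rate(a, b, K) for a in grid for b in grid)
    print(f"max sum rate over the region: {best:.4f}")
    print(f"total cooperation line:       {np.log2(2**K + 2):.4f}")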
A plot of the rate region given by (4.51) for K = 2 is shown in Figure 4.5.
Because cooperating users can achieve the cardinality bound for this channel,
the total cooperation line is the set of (R1 , R2 ) satisfying R1 + R2 = log2 |Y| =
log2(2^K + 2). Note that the rate region depicted in Figure 4.5 is co-linear with
the total cooperation region along a portion of the total cooperation line, which
we characterize in the following lemma.

Figure 4.5: Illustration of the three rate regimes that require different forms of time sharing to achieve capacity over the packet collision channel with feedback.
Lemma 4.1. All (R1 , R2 ) ∈ RTC are achievable and on the total cooperation line,
where
RTC = { (R1, R2) : log2(2^K + 2) − h(q∗) − K q∗² ≤ R1 ≤ h(q∗) + K q∗²,
        R2 = log2(2^K + 2) − R1 },    (4.54)

and

q∗ = (1 + √((2^K − 2)/(2^K + 2)))/2.    (4.55)

Proof:
Because R1 + R2 = log2(2^K + 2) for (R1, R2) ∈ RTC, it follows that all
rate pairs specified by (4.54) lie on the total cooperation line. To show that all
(R1 , R2 ) ∈ RTC are achievable we use the form of the Carleial achievable rate
region given in (4.53). Let |S| = 1. Using the fact that
h(3) (a, b, c) ≤ h(b) + 1 − b,
(4.56)
the sum rate term in (4.53) can be upper bounded as
R1 + R2 ≤ h(3)(q1 q2, q̄1 q̄2) + K(q1 + q2 − 2q1 q2)    (4.57)
        ≤ h(u) + 1 + (K − 1)u,    (4.58)
where we have defined u = q1 + q2 − 2q1 q2 . Now (4.58) is uniquely maximized by
u = u∗ , where
u∗ = 1/(1 + 2^{1−K}),    (4.59)
resulting in a maximum value of
h(u∗) + 1 + (K − 1)u∗ = log2(2^K + 2).    (4.60)
Note that (4.56) is met with equality only if a = c; therefore, the inequality in
(4.58) is strict unless q1 q2 = q 1 q 2 . This implies that q1 + q2 = 1, and by defining
q = q1 = 1 − q2 we have
q = (1 ± √(2u∗ − 1))/2.    (4.61)
Evaluating (4.53) with |S| = 1 for the two choices of q given by (4.61) and time
sharing between codebooks yields the desired result.
The proof of Lemma 4.1 required non-uniform time sharing between two distributions to achieve all points on the total cooperation line. Uniform time sharing
between the two distributions given implicitly by (4.61) yields the symmetric capacity of the channel. This can be seen by computing (4.53) for a distribution
specified by p1 = p2 = 1/2 and
q11 = 1 − q12 = q ∗ ,
(4.62)
q22 = 1 − q21 = q ∗ ,
(4.63)
with q ∗ defined in (4.55). The resulting achievable rate region is given by
{ (R1, R2) : R1 ≤ h(q∗) + K( 1/2 − 1/(2^K + 2) ),    (4.64)
             R2 ≤ h(q∗) + K( 1/2 − 1/(2^K + 2) ),    (4.65)
             R1 + R2 ≤ log2(2^K + 2) }.    (4.66)
Note that for finite K we have
h(q∗) + K( 1/2 − 1/(2^K + 2) ) > log2(2^K + 2)/2,    (4.67)
and so
Csym = log2(2^K + 2)/2.    (4.68)
Additionally, the fact that the inequality in (4.67) is strict implies that uniform
time sharing can achieve certain points on the total cooperation line aside from the
maximum symmetric rate point. The different time-sharing regions are illustrated
in Figure 4.5, with explicit points shown to indicate the boundary between each.
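The quantities appearing in the proof of Lemma 4.1 and in (4.64)-(4.68) are easy to evaluate. The sketch below (our own code) computes u∗ and q∗, checks (4.60), and compares the per-user rate of the uniform time-sharing scheme against Csym for several packet lengths:

    import numpy as np

    def h(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

    for K in [1, 2, 4, 8]:
        u_star = 1 / (1 + 2.0**(1 - K))                  # (4.59)
        q_star = (1 + np.sqrt(2 * u_star - 1)) / 2       # (4.61), "+" root
        total = h(u_star) + 1 + (K - 1) * u_star         # (4.60): equals log2(2^K + 2)
        r_sym = h(q_star) + K * (0.5 - 1 / (2**K + 2))   # per-user rate in (4.64)-(4.65)
        c_sym = np.log2(2**K + 2) / 2                    # (4.68)
        print(f"K={K}: total={total:.4f}  log2(2^K+2)={np.log2(2**K + 2):.4f}  "
              f"r_sym={r_sym:.4f} >= C_sym={c_sym:.4f}")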
In the next theorem we give an outer bound to the capacity region of the packet
collision channel with feedback.
Theorem 4.5. The capacity region of the K-bit packet collision channel with
feedback is contained in the region given by
C̄ = ∪_{0≤q1s≤1, 0≤q2s≤1, s∈{1,...,|S|}} { (R1, R2) :
     R1 ≤ Σ_s ps [ h(q1s) + K q̄1s q2s ],
     R2 ≤ Σ_s ps [ h(q2s) + K q1s q̄2s ],    (4.69)
     R1 + R2 ≤ h(3)(a, d) + K(q1 + q2 − 2a) },
where
a = Σ_s ps q1s q2s,    (4.70)
d = Σ_s ps q̄1s q̄2s,    (4.71)
and |S| ≤ 5.
The complete proof of Theorem 4.5 is given in Appendix B.3. The proof relies on a parallel channel extension dependence balance bound for the multiple
access channel with generalized feedback that was given earlier in Theorem 4.1.
The dependence balance bound is given as the intersection of pentagons generated by all choices of joint distributions p(s, x1 , x2 , y, yF ) that satisfy a dependence balance constraint. We show that the pentagon generated by each
p(s, x1 , x2 , y, yF ) ∈ PDB is contained in a pentagon generated by the corresponding distribution p∗ (s, x1 , x2 , y, yF ) with X̃1 and X̃2 normalized to be uniform and
independent. By showing that the resulting normalized distribution also satisfies
the dependence balance constraint, it follows that it is sufficient to consider X1
and X2 conditionally independent given S. Theorem 4.5 follows by computing the
corresponding intersection of rate regions for conditionally independent X1 and
X2 .
The inner and outer bounds given in Theorem 4.4 and Theorem 4.5 differ in
their sum rate terms for |S| > 1. The sum rate constraint for the achievable rate
region given in Theorem 4.4 is H(Y|S), and for the outer bound in Theorem 4.5
it is H(Y). The key inequality that is needed to show the two regions coincide is
h(3)( Σ_s ps q1s q2s, Σ_s ps q̄1s q̄2s ) ≤ h(3)(φ1 φ̄2, φ̄1 φ2),    (4.72)
which does not hold for all q1s , q2s ∈ [0, 1]. If the feasible space of (q1s , q2s ) pairs
can be restricted to those for which (4.72) is valid then the achievable rate region
given by Theorem 4.4 coincides with the capacity region of the channel.
Although we have not been able to show that the outer bound in Theorem 4.5 is
tight analytically, the region obtained by evaluating (4.69) numerically for |S| = 3
does coincide with the achievable rate region given in (4.51) for all (R1 , R2 ). Therefore, we conjecture that the bound given in Theorem 4.4 is tight and corresponds
to the capacity region of the channel. In the next section we investigate the impact of a common distribution constraint on the capacity region and characterize
the packet throughput rate as K → ∞ for both unconstrained coding and coding
with a common distribution constraint.
4.5.3 Distribution Constraints
Theorem 4.4 and Theorem 4.5 can be used to find the capacity region under
a common distribution constraint, as given by the following corollary.
Corollary 4.1. The capacity region of the packet collision channel with feedback
and a common distribution constraint is given by CΓ , where
CΓ = { (R1, R2) : R1 ≤ 1 + K/4, R2 ≤ 1 + K/4,
      R1 + R2 ≤ (K + 3)/2 }.    (4.73)
Proof: Let qs = q1s = q2s for s = 1, . . . , |S|. The sum rate term in (4.69) can be
bounded as follows:
R1 + R2 ≤ h(3)( Σ_s ps qs², Σ_s ps (1 − qs)², Σ_s ps 2qs(1 − qs) ) + K Σ_s ps 2qs(1 − qs)
        ≤ h(u) + 1 + (K − 1)u
        ≤ (K + 3)/2,    (4.74)
and similarly,
R1 ≤ Σ_s ps [ h(qs) + K qs(1 − qs) ]    (4.75)
   ≤ 1 + Ku/2    (4.76)
   ≤ 1 + K/4,    (4.77)
and

R2 ≤ 1 + K/4,    (4.78)

where we have used the fact that u = Σ_s ps 2qs(1 − qs) ≤ 1/2. Combining the above
inequalities shows that rate pairs outside of CΓ cannot be achieved.

Figure 4.6: Capacity region of the packet collision channel with feedback.

Computing (4.53) for |S| = 1 gives the achievable rate region CΓ, where
CΓ = { (R1, R2) : R1 ≤ h(q) + Kq(1 − q), R2 ≤ h(q) + Kq(1 − q),
      R1 + R2 ≤ h(3)( q², (1 − q)², 2q(1 − q) ) + K·2q(1 − q) }    (4.79)
for any 0 ≤ q ≤ 1. Note that all three inequalities in (4.79) are simultaneously
maximized for q = 1/2. Evaluating (4.79) for this value of q gives the desired
result.
Figure 4.6 shows the capacity region with and without common distributions
for several values of K. Note that for K = 1 common distributions achieve all
points on the total cooperation line, however as K increases the cost of common
distributions relative to unconstrained transmission grows.
From Corollary 4.1, the symmetric capacity of the packet collision channel
74
with feedback is given by
Csym =
K +3
.
4
(4.80)
Define the packet throughput rate of a K-bit multiple access channel as
Csym
.
K→∞ K
T = lim
(4.81)
From (4.68), we have that for unconstrained transmission
log2 (2K + 2)
1
= ,
K→∞
2K
2
T = lim
(4.82)
whereas for common distributions
K +3
1
= .
K→∞ 4K
4
T = lim
(4.83)
Therefore, the asymptotic cost to a common distribution constraint for the packet
collision channel with feedback is a loss in packet throughput by a factor of two.
A plot of symmetric capacity illustrating how the gap grows with K is shown
in Figure 4.7, where the packet throughput rate T corresponds to the slope for
asymptotically large K.
4.6 Summary
This chapter illustrated how our viewpoint can be used to characterize overhead in multi-access systems. From the viewpoint of overhead we established in
Chapter 3, the reduction in the capacity region due to the additional constraint
on encoders represents the protocol overhead associated with multi-access communications for the channel models considered in this chapter.
75
5.5
5
Symmetric Capacity (bits)
4.5
4
Overhead
Cost
Unconstrained
3.5
3
2.5
Common
Distribution
Constraint
2
1.5
1
1
2
3
4
5
6
7
Packet Length, K (bits)
8
9
10
Figure 4.7: Symmetric capacity of the packet collision channel with
feedback.
We considered an information-theoretic model for a two-user packet collision
channel with feedback. Users were precluded from performing TDMA by imposing the constraint that their codebooks be drawn from a common distribution.
The use of a common distribution plays a role during codebook generation in an
associated random coding argument, but the actual codebook that each user uses
for communication is unique. However, unlike for the collision channel without
feedback [23], by generating codewords according to a common distribution we ensure that there is no essential connection between the structure of the codewords
themselves and the users to which they are assigned.
Structured codeword design that depends on unique user identification may
be reasonable for a communication system with two users, but the multi-access
systems for which a packet collision model is most relevant often consist of a small
76
number of active users that are frequently changing. An arbitrary assignment of
codebooks to users more closely resembles the multi-access protocols that we studied in Chapter 2, where user identification information was contained in packet
headers and treated the same as data by the Physical and MAC Layers.
Packet throughput in the packet collision channel with feedback was reduced
by a factor of two by a common distribution constraint. This suggests that multiaccess protocols that do not make use of unique user identification can incur a
significant cost in terms of packet throughput.
77
CHAPTER 5
DELAY CONSTRAINTS
To a large extent, information theory has not focused on treatment of delay in
communication systems. This can be attributed to the fact that for many communications systems we are able to meet or get adequately close to fundamental
asymptotic limits while using finite block length codes. However, as discussed
in Section 2.3, a complete treatment of the way in which delay is connected to
information transmission within an information-theoretic context could help us to
better understand fundamental limits in more complex network-oriented settings.
There are three characteristics of communication systems that will be central
to a more general understanding of overhead and delay. The first is channel noise,
on which information theory has provided significant insights in the past. The
second is the role of feedback, which spans information theory and networking.
The third is packetization or discretization in time or another dimension. Thus,
we focus on better understanding the costs of meeting delay constraints and the
associated overhead within the context of these three characteristics.
In this chapter we explore several extensions and applications of [52]. In the
first application we consider a discrete time counterpart that can be used to model
discretization at multiple layers in the protocol stack. At higher layers, such as
the Application Layer, it models packetization in network settings. Moving down
the protocol stack to the MAC and Physical Layers, it allows us to consider the
slotted nature of many channel access protocols, and discrete-time channels that
are often used in practice due to implementation constraints.
78
For the second application, we consider a scenario in which a bursty source is
to be communicated error-free over a noisy point-to-point communication channel
both with and without feedback. Here we find that the finite bandwidth constraint of the point-to-point link results in additional queuing delay that is not
well accounted for in the established rate-distortion theorems. We then present
refinements to account for this bandwidth constraint and arrive at some improved
bounds on delay-overhead trade-offs.
This chapter is organized as follows. Section 5.1 reviews some of the ratedistortion tools and notation used throughout this chapter. Section 5.2 investigates overhead and source burstiness from a rate-distortion viewpoint. Finally,
Section 5.3 develops bounds on rate-delay tradeoffs that are based upon fundamental bounds on overhead.
5.1 Rate-Distortion Preliminaries
This section develops the rate-distortion definitions and tools that will be used
throughout this chapter. The formulation was originally used in [52] to establish
a lower bound on protocol overhead for a delay constraint. After first describing
the system model we discuss the relevant theorems from the literature.
5.1.1 System Model
Assume messages arrive at a network node according to a homogeneous Poisson
process of rate λ [72]. Messages are combined and encoded for transmission over
an arbitrary network to a destination node. A networking protocol is used that
delivers messages within an average delay constraint d.
For groups of N consecutive messages, denote the message arrival times at
79
the source node by KN = (K1 , K2 , . . . , KN ), and the corresponding delivery times
at the destination node as K̂N = (K̂1 , K̂2 , . . . , K̂N ). Define the delay for the nth
message as Dn := K̂n − Kn , with corresponding expectation dn := E [Dn ], and the
P
average delay for a group of N messages as 1/N N
n=1 E [Dn ]. Let PN (d) denote
the set of joint probability measures on KN and K̂N that:
• Have marginal distribution for KN satisfying the arrival process model;
• Result in Dn ≥ 0 ∀n with probability 1; and,
P
• Satisfy the average delay constraint 1/N N
n=1 E [Dn ] ≤ d.
For any joint distribution on a pair of random vectors we can compute the mutual
information I(KN ; K̂N ). For any network or protocol, this mutual information is
induced by the distributions on message arrival and delivery times. This mutual
information represents timing information sent over the channel and can be used
to develop fundamental bounds on protocol overhead.
To motivate the idea that I(KN ; K̂N ) relates to the amount of information being
sent to the destination about message arrival times, we summarize the perspective
introduced in [52]. A key observation is that if messages are delivered within some
average delay d, the destination is able to form a (perhaps noisy) estimate of the
sequence of message arrival times. If the destination can form a better estimate
of the message arrival times than it could by guessing randomly accordingly to
the marginal distribution of the message arrival process, then there is information
being communicated to the destination about message arrival times at the source.
This statement can be made more precise by introducing a rate-distortion formulation, which yields a mathematical characterization of the minimum amount of
timing information that is sent to the destination. For this purpose, define the
80
N -th order rate-distortion function as
1
I(KN ; K̂N ),
PN (d) N
RN (d) := inf
(5.1)
and the corresponding rate-distortion function as
R(d) := lim inf RN (d).
N →∞
(5.2)
Although rate-distortion theorems establish the relationship between (5.2) and
the operational definition of the rate-distortion function for source coding [49],
there is no rate distortion theorem that tells us the rate-distortion function as
defined in (5.2) is a lower bound on the rate required to achieve a delay less than
d, the reason being that encoding of timing information does not fit directly into
typical source coding frameworks. However, we can still leverage existing ratedistortion theorems by considering the appropriate representation of the arrival
and delivery times.
For instance, a stationary ergodic representation of the arrival times can be
formed by instead considering the sequence of interarrival times. However, under
this representation the distortion is no longer additive and thus standard block
and finite sliding-block rate-distortion converse arguments do not directly apply.
The solution is to consider process definitions of rate-distortion functions and
source coding theorems established in [73–77]. Instead of computing the distortion
between pairs of arrival or interarrival times, the process definition viewpoint gives
an alternative rate-distortion formulation that involves a single minimization over
all random processes instead of over distributions of random vectors.
Consider the random process N(t) defined as the counting process that rep-
81
resents the number of message arrivals up to time t. Using this representation,
define the distortion measure
ρ(N(t), N̂(t)) :=
N̂(t) − N(t), N̂(t) ≤ N(t),
∞,
else,
(5.3)
and the average distortion per unit time
1
ρT (N[0,T ] , N̂[0,T ] ) :=
T
Z
T
ρ(N(t), N̂(t)).
(5.4)
0
The corresponding rate-distortion functions are then defined analogously to (5.1)
and (5.2). Under this alternative representation the converse given in [78, Theorem
7.2.5] applies, resulting in a lower bound on the information required to meet the
delay constraint. Note that although the process viewpoint for rate-distortion
functions can provide mathematical precision, the standard definitions are more
useful computationally.
From a practical standpoint, although achievability of the rate-distortion function is shown in [75, 76], the fact that delay increases linearly with block length
suggests rates arbitrarily close to R(d) with average delay d are not actually
achievable with block coding techniques. Although the arrival sequence could
be reconstructed close to within an average distortion of d, the reconstruction
times would not correspond to the actual protocol delays induced by the network. It would therefore be interesting to consider the application of sliding block
source coding techniques [76, 79, 80], or for transmission over a noisy channel,
sliding-block joint source/noisy-channel techniques [81]. These ideas represent an
interesting area for future research.
In the following section we consider a discrete time Bernoulli arrival process,
82
and build from the definitions and notation established in this section for consistency.
5.2 Overhead for a Discrete-Time Bursty Source
Communication systems typically exhibit some form of underlying cost to
achieving small delays or latency. Given a particular protocol we can sometimes
bound the relationship between delay and other system parameters. In contrast
to the model studied in [52], practical communication links are often slotted in
nature, and messages only arrive at discrete time instances. It can be the case
that the destination recovers timing information in a lossless fashion. This would
be impossible for a continuous arrival process, because it would require an infinite
amount of protocol information to be sent over the network.
In this section we investigate the amount of protocol information required for
a communication network to meet an average delay constraint for the delivery
of messages that arrive according to a Bernoulli random process. We obtain a
lower bound on this overhead as a function of the arrival rate and average delay.
Our model is a discrete-time analog of the Poisson arrival process considered by
Gallager, and we demonstrate that in the limit as slot duration goes to zero,
Gallager’s bound is recovered.
Section 5.2.2 gives the problem formulation, followed by the derivation of the
first-order rate-distortion function, which is a lower bound on protocol overhead.
Section 5.2.3 plots the overhead for slotted arrivals for various slot durations and
illustrate that our bound converges to the one for continuous arrivals given in [52].
83
5.2.1 System Model
We consider a discrete time communication system in which messages are
independently generated in each slot with probability p, and messages must be
sent to the destination with average delay (taken over all messages) less than or
equal to d. Let the discrete-time stochastic process X = X0 , X1 , ... be a sequence
of i.i.d. Bernoulli random variables with parameter (0 < p ≤ 1). Assume that for
k = 0, 1, . . ., if Xk = 1, then a message M is generated during the kth slot according
to the discrete distribution PM [m]. For the nth such Xk = 1 (1 ≤ n ≤ N ) denote
the corresponding arrival time as Kn = k, and the message delivery time as K̂n (1 ≤
n ≤ N ). Note that although we maintain the notation established in Section 5.1
for a Poisson arrival process, in this section all arrival and reconstruction times
are discrete.
Because message arrival times form a Bernoulli process, the message interarrival times (Kn+1 − Kn ) are i.i.d. geometric random variables with parameter
(0 < p ≤ 1), i.e. for n = 1, . . . , N − 1, (Kn+1 − Kn ) ∼ (1 − p)(k−1) p, k = 1, 2, . . ..
For notational convenience in Theorem 5.1, we have defined slot indexing starting
from zero, so the arrival time for the first message follows a shifted geometric
distribution of PK1 [k] = p(1 − p)k .
5.2.2 Protocol Overhead for Slotted Arrivals
This section gives one of the main contributions of this chapter. We develop a
lower bound on protocol overhead due to the transfer of timing information about
messages arriving according to a Bernoulli random process. In order to prove this
result, we will require the following lemma.
Lemma 5.1. For messages that arrive according to the Bernoulli process described
84
in Section 5.2.1 and the N -th order rate-distortion functions defined by (5.1), it
holds that
R1 (d) ≤ RN (d),
∀N.
(5.5)
Proof: The proof is conceptually similar to [52, Theorem 3], but for completeness
we give our own version in Appendix C.1.
From Lemma 5.1 and the definition for R(d) given in (5.2), we see that R1 (d)
is a lower bound on R(d), and therefore a lower bound on the protocol overhead
that is somehow communicated to the destination. We now compute R1 (d).
Theorem 5.1. For Bernoulli arrivals with parameter p, the first-order rate-distortion
function R1 (d) for timing information about message arrivals is given by
R1 (d) = sup
ν≥0
− νd + νk0 − (k0 + 1) log p − pk0 +1 − 1 log p−(k0 +1) − 1
(5.6)
ν
(e − 1)p
ν + pk0 [log p − νp]
k0 +1
+p
log
−
,
p
eν − 1
where p := 1 − p and
k0 (p, ν) :=
log
eν −1
eν −p
log p
+
− 1 .
(5.7)
Proof: The proof is given in Appendix C.2.
For sufficiently small d, k0 = 0 and (5.6) becomes
1−p
R1 (d) = sup (1 − p) log(1 − e ) − log p −
log(1 − p) − νd .
p
ν≥0
−ν
85
(5.8)
1
10
Decreasing arrival rate
Overhead (nats/slot)
0
10
-1
10
-2
10
-1
0
10
1
10
10
2
10
Average delay (slots)
Figure 5.1: Lower bounds on protocol overhead for Bernoulli arrivals
with parameter p ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
In this regime the optimal ν is given by ν = log[(d + 1 − p)/d], which leads to
R1 (d) = (1 − p) log
1−p
d+1−p
−
1−p
log(1 − p) − log p.
p
(5.9)
Figure 5.1 shows (5.6) evaluated numerically for a range of p. Because the
arrival process is disrete, as d → 0, R1 (d) approaches the entropy rate of the
arrival process. Note that this contrasts the continuous case in which the protocol
information grows without bound as d → 0.
5.2.3 Bernoulli and Poisson Comparisons
Although the discussion has so far been for a discrete arrival process with delay
measured in slots, for sufficiently small time intervals a Bernoulli process can serve
86
as a good approximation to a continuos-time Poisson process. Accordingly, one
would expect that as the slot duration becomes small, the two rate-distortion
functions would converge. In [52], Gallager considered a model similar to ours,
but for messages that arrive according to a Poisson process of rate λ. Because
of the Poisson process assumption, the message interarrival times and the arrival
time for the first message are i.i.d. exponential random variables with parameter
λ. Gallager derived the first-order rate-distortion function given by
R1 (d) = − log(1 − e−λd ),
(5.10)
and showed it is a lower bound on protocol overhead. Using the normalization
p = λT for various slot times, Figure 5.2 illustrates that our bound on protocol
overhead for discrete arrivals converges to the lower bound on protocol overhead
for the continuous case.
In the next section we combine the concept of rate-distortion bounds on overhead with a noisy channel and investigate the additional channel bandwidth that
is required to meet a delay constraint.
5.3 Bursty Sources over Noisy Channels
The second application we consider that is motivated by [52] investigates the
impact of source burstiness on bandwidth requirements for point-to-point links
with channel errors. Specifically, as in the previous section we consider an average
delay constraint, but instead of a general noise-free network we consider point-topoint communication over a channel that introduces errors in the transmissions.
We focus on the binary erasure channel (BEC) with feedback and examine the
bandwidth required to meet an average delay constraint, where bandwidth is
87
1
Overhead (nats/slot)
10
Continuous arrivals
0
10
Discrete arrivals
for decreasing T
-1
10
-2
10
-1
0
10
10
Average Delay (s)
Figure 5.2: Lower bounds on protocol overhead for discrete and continuous arrivals of average rate λ = 0.9 messages/second.
defined as the number of channel uses per second. Our results suggest that if an
application requires low average delay, transmission over the BEC with feedback
may require more protocol overhead than what is required for an error-free network
with the same average delay, such as that considered in [52]. However, the stability
region is the same in both cases, indicating that for sufficiently large average delays
any additional protocol overhead becomes negligible.
5.3.1 System Model
Consider the communication system depicted in Figure 5.3. Messages M1 , M2 , . . .
arrive at the encoder at times K1 , K2 , ... and are encoded and sent to the destination over a BEC with erasure probability . Let Mi = (Mi , Ki ) denote the ith
message-arrival time pair. Every 1/R seconds the encoder outputs a channel sym-
88
Bursty
Source
(M, K)
Sequential
Encoder
X
Y
BEC
Sequential
Decoder
(M̂, K̂)
Destination
Figure 5.3: Block diagram for the transmission of a bursty source
over a binary erasure channel with feedback.
bol X ∈ {0, 1}, and the decoder receives Y ∈ {0, 1, e} according to the memoryless
channel law
Y=
X, with probability 1 − ,
e,
(5.11)
with probability .
Upon transmission of a channel symbol the output of the BEC is immediately
available to the decoder and, via a noise-free feedback link, also to the encoder.
The corresponding channel capacity is therefore R(1 − ) bits/second and can be
achieved by sequential retransmission of uniformly distributed bits until each is
successfully received.1
The decoder uses the sequence of received symbols Y1 , Y2 , . . . to form an estimate M̂1 , M̂2 , . . . of the message sequence. It is assumed that the decoder reconstructs each message symbol as soon as possible, and these decoding times are
denoted as K̂1 , K̂2 , .... The delay for message Mi is a random variable given by
Di = K̂i − Ki > 0, and the expected delay averaged over all messages is defined as
k
1X
d := lim
E [Di ] ,
k→∞ k
i=1
(5.12)
when the limit exists.
We now give some required definitions.
1
This fact and other properties of the binary erasure channel with immediate error-free
feedback are further discussed in [15, Section VIII].
89
Definition 5.1. A bursty source S is characterized by the following:
• A continuous-time counting process N(t) specifying the number of messages
generated up to and including time t.
• A message alphabet M that is finite or countable, and an associated distribution on message symbols PM (m), m ∈ M.
The definition of an encoder-decoder can be made similar to the standard
form if we consider the input to the encoder to be a super-symbol that is a collection of all messages received over the last 1/R seconds. Message ordering must
be preserved at the decoder, as the encoder also needs partial knowledge of the
timing information. We compute the delay between message arrivals and their
reconstructions, so it is natural to include exact timing information in the ‘supersymbols.’ This also gives the encoder sufficient information to determine message
orderings.
Definition 5.2. A super-symbol U of rate R for the source S is a collection of
messages and their corresponding arrival times over a duration of 1/R seconds
corresponding to the time interval between two consecutive channel uses. Note
that if there are no message arrivals during a given time interval, then the corresponding super-symbol is U = ∅. If the alphabet for a message-arrival time pair
is M = M × R+ , then the alphabet for U is U = ∅ ∪ M∗ , where X∗ is defined as
S
k
X∗ := ∞
k=1 X .
A generic sequential encoder for the BEC with feedback outputs a channel
symbol at each time interval that is a function of all previous message supersymbols and all previous received channel symbols. The corresponding generic
sequential decoder outputs message reconstructions as soon as possible, but has
90
access to the entire history of received channel symbols each time it forms a
reconstruction.
Definition 5.3. A sequential encoder-decoder (F, G, R) for the sequence of supersymbols U1 , U2 , . . . of rate R consists of
• A sequence of encoder mappings
Fi : Ui × Yi−1 → {0, 1},
i = 1, 2, . . .
(5.13)
• A sequence of decoder mappings
Gi : {0, 1, }i → U,
i = 1, 2, . . .
(5.14)
Note that in (5.13) and (5.14) i is a time index, not the block length. Again,
the output of the encoder at time t = i/R is a function of all messages that arrive
up to time t and all previously received channel symbols, i.e., Xi = Fi (Ui1 , Y1i−1 ),
and the output of the decoder at time t = i/R is a function of all channel symbols
received up to that time, i.e., Ũi = Gi (Y1i ). The definitions allow the encoder
and decoder’s memory to go to infinity, but they do not concern delay. Note that
the output of Gi is not a reconstruction of Ui specifically, but is an arbitrary
super-symbol containing a set of message reconstructions. Furthermore, the set of
message reconstructions contained in Ũi need not be the same set as that contained
in any particular Ui .
The key difference between the definition of a sequential encoder-decoder given
here and the standard information-theoretic encoder-decoder definition is that
here encoders and decoders are indexed by time but not by message arrivals. One
purpose of defining super-symbols is to allow the encoding and decoding functions
91
to be indexed by time and super-symbol arrivals. This problem does not arise in
the standard information-theoretic context because time indexing and message
arrival indexing are equivalent. Note that the latter scenario corresponds to a
‘backlogged’ queue, where the encoder always has messages to send. However, if
the queue is not backlogged the Physical Layer does not automatically differentiate
between meaningful packet data and source idle times when the encoder has no
useful (message) information to send.
Definition 5.4. A rate-delay pair (R, d) is said to be achievable if there exists an
(F, G, R) sequential encoder-decoder for which
k
1X
E [Di ] ≤ d,
lim sup
k→∞ k i=1
(5.15)
and M̂k = Mk with probability 1, for k = 1, 2, . . .. The rate-delay region for a
source-channel pair is the closure of the set of achievable rate-delay pairs (R, d).
Definition 5.5. A rate R is said to be achievable if for some d < ∞ the ratedelay pair (R, d) is achievable. The stability region for a source-channel pair is
the closure of the set of achievable rates.
Throughout the remainder of the section, we focus on a particular source
model that highlights the issue of burstiness while keeping the analysis tractable.
Specifically, we assume the source S satisfies:
• Messages are memoryless uniform bits, i.e., Mk ∈ {0, 1} are i.i.d. with
PM (0) = PM (1) = 1/2.
• Message arrivals are Poisson, i.e., N(t) is the counting process for a homogeneous Poisson process with rate parameter λ. Therefore, the number of mes-
92
sages in any two disjoint intervals of 1/R seconds, N(t) = N(t)−N(t − 1/R),
are independent Poisson random variables with parameter λ.
5.3.2 Outer Bounds on Rate-Delay Tradeoff
In this section we give two independent outer bounds on the achievable ratedelay region, one using information theory and the other based on queueing theory.
Despite the difference in tools, each of the bounds are active in some rate-delay
regime for any value of .
Rate-Distortion Bound
Consider the mutual information between N message-arrival time pairs at
the encoder and their corresponding reconstructions at the decoder given by
I(MN ; M̂N ) = I(MN , KN ; M̂N , K̂N ). Because messages and arrival times are mutually independent, we have
I(MN , KN ; M̂N , K̂N ) ≥ I(MN ; M̂N ) + I(KN ; K̂N ).
(5.16)
The channel can support at most 1 − bits per channel use on average. Following
[52], the number of channel uses per second R required for reliable communication
over the BEC is therefore bounded by
1
1
N
N
N
N
I(M ; M̂ ) + inf I(K ; K̂ ) ,
R≥
lim inf
PN
1 − N →∞ N
(5.17)
where the infimum is over the set of joint distributions PN satisfying the average
delay constraint given in (5.12) and the conditions on the marginal distributions
from Section 5.1.
93
The first term in the argument of the lim inf in (5.17) is the average mutual rate
between messages and their reconstructions and therefore I(MN ; M̂N )/N = λ = 1.
Using the main result from [52], which states that
R(d) ≥ log2 (1 − e−λd ),
(5.18)
leads to the following outer bound on the achievable rate-delay region over the
erasure channel with feedback:
R≥
1 λ − log2 (1 − e−λd ) .
1−
(5.19)
Genie-Aided Bound
Consider an encoder that retransmits messages as they arrive, uncoded, until
each is successfully received, and sends an arbitrary channel symbol when there
are no messages to send. In our model, the decoder would have no way of determining which successfully received channel symbols correspond to valid messages
and which do not contain any information about the message sequence. However, if the decoder has access to side information indicating the position of valid
messages within the sequence of received channel symbols, it can losslessly reconstruct each message. Furthermore, as mentioned earlier, symbol retransmission
achieves capacity for the BEC with feedback while minimizing average delay. Accordingly, the genie-aided encoder-decoder does at least as well as any sequential
encoder-decoder, so the genie-aided rate-delay region outer bounds the (non-aided)
rate-delay region. It is interesting to note that the information supplied by the
genie to the destination represents the additional protocol information required
for decoding due to source burstiness.
94
Observe that the genie-aided encoder’s message queue can be modeled as an
M/G/1 queue with Poisson arrivals of rate λ and geometrically distributed service
times with mean service rate R(1 − )/. Because of the slotted channel model,
unlike for an M/G/1, a message arriving to an empty queue must wait until
the next channel slot to begin transmission (service). In fact, regardless of the
queue length, because of the slotted channel, each message waits an average of
1/2R seconds before entering the encoder’s queue. Therefore, the expected delay
for the genie-aided encoder-decoder is the sum of the average slotted delay and
the expected time spent passing through an M/G/1 queue. Using the PollaczekKhinchin (P-K) formula [1] to evaluate the latter expectation, we arrive at the
following bound on the achievable rate-delay region:
d≥
1
λ(1 + )
+
+
.
2R(1 − ) [R(1 − ) − λ] R(1 − ) 2R
(5.20)
Outer Bound Comparison
The P-K formula gives the expected throughput-delay tradeoff for a packetized
model. However, it fails to capture the transfer of protocol information beyond
what is contained in messages. For example, the encoder must somehow inform
the decoder of when a valid message is being sent over the channel. This can
be seen by observing that for certain average delays (or correspondingly channel
uses per second), the Gallager bound is strictly tighter than the P-K bound. For
simplicity, consider = 1 and λ = 1. The P-K bound in (5.20) simplifies to
d ≥ 1/[2(R − 1)] and (5.19) becomes R ≥ 1 − log2 (1 − e−d ). By comparing both
√
bounds we see that for values of average delay satisfying (1 − e−d )d > 1/ 2, i.e.,
d ∈ (0.206, 1.68), the Gallager bound is strictly tighter than the P-K bound. This
can be seen graphically in Figure 5.4.
95
Rate R (channel uses/s)
10
Inner Bounds
N=1
N=2
Genie-Aided
Bound
1
0.1
N=4
N=8
Gallager Bound
N = 16
1
10
Expected Delay (s)
100
Figure 5.4: Outer bounds (dashed) on the rate-delay tradeoff and
achievable regions (solid) for λ = 1 and an erasure probability of
= 0.1.
5.3.3 Inner Bounds on Rate-Delay Tradeoff
In this section we give an inner bound on the achievable rate-delay region by
considering a family of simple protocols that are practically motivated. We also
show that as intuition suggests, the stability region is {λ : λ < R(1 − )}.
The particular encoding strategy maps a fixed number of messages N to a
variable number of channel symbols B. The encoder waits to accumulate N messages, then sends a start bit of ’1’ to indicate the beginning of a transmission
followed by each of the N messages sequentially. A block diagram for the fixed
source sample-variable rate encoder (FV) is depicted in Figure 5.5, showing the encoder’s architecture as a Message Queue cascaded with a Transmit Buffer.
The
number of messages in the Message Queue (besides the start bit of ‘1’ in the darker
box) is denoted as QM , and the number of symbols in the transmit buffer is QT .
96
Message Queue
Transmit Buffer
QT = 0
0
X
1
M
QT > 0
QM = 2
QT = 1
Figure 5.5: Fixed-to-variable length source coding strategy.
The block size N is a design parameter that is chosen in advance and known to
both the encoder and decoder. The encoding operation of the FV coder of block
size N is described as follows:
1. As messages arrive they enter the Message Queue sequentially.
2. If QM ≥ N , the first N messages in the Message Queue, in addition to the
start bit of ‘1’, are transferred to the Transmit Buffer.
3. Every t = i/R seconds, the encoder must output a channel symbol. If the
Transmit Buffer is empty (QT = 0), it outputs a ‘0’, otherwise it continuously retransmits the rightmost symbol in the Transmit Buffer until it is
successfully received by the decoder. Upon successful transmission, the bit
is removed from the Transmit Buffer.
The corresponding decoder operates as follows:
1. Wait until a start bit of ‘1’ is successfully received.
2. Declare the next N channel symbols that are successfully received to be the
next N message estimates.
3. Go to step 1.
97
Note that as long as QM and QT remain bounded, the decoder will be able to
perfectly reconstruct the message sequence in some finite amount of time. We
examine the performance of FV coders for fixed N , in addition to the infimum
over all N , i.e., over the class of all FV coders for N = 1, 2, . . ..
If the encoder always has messages to send (is backlogged), the block size N
determines the amount of protocol overhead per message. N can be chosen large
in order to minimize this overhead, but in doing so the encoder must wait longer
to accumulate a sufficient number of messages, thereby increasing message delays.
Consequently, there is a tradeoff between the required data rate and the average
delay experienced by messages. The performance of FV codes for several block
sizes are shown in Figure 5.4, along with the ‘Gallager’ and ‘Genie-Aided’ outer
bounds.
5.3.4 Stability Region
Heuristically, as R → λ+ the queue lengths become large, and the Transmit
Buffer is rarely empty. In this case B & N + 1, the encoder is always sending
useful information, and
N +1
FV channel uses
.
→
E
source symbol
N
(5.21)
To approximate d as R → λ+ , consider R∗ . λ[(N + 1)/N ] as the average service
time of an M/D/1 queue, and apply the P-K formula [1] to obtain
R→1+
d ≈
N +1
1
1
+
+
,
2(N R − N − 1) 2N R 2R
98
(5.22)
where for simplicity we take λ = 1. For any practical scheme N must be finite,
but to compute the closure of achievable rates with FV coding we must consider
performance as N → ∞. Taking the limit of (5.22) as N → ∞ gives an asymptotically achievable R for large N with an average delay of
R→1+
lim d ≈
N →∞
1
1
+
,
2(R − 1) 2R
(5.23)
as long as the limit exists. Thus it seems that for any R > λ, N can be chosen
sufficiently large such that the expected delay remains bounded and R is (by
definition) achievable. This statement is made more formally in the following
theorem.
Theorem 5.2. The closure of the set of achievable rates under FV coding is
cl (RFV ) = {R : R ≥ λ/(1 − )}.
(5.24)
Proof: The proof is given in Appendix C.3.
In this section we examined the transmission of a bursty source over the binary
erasure channel with feedback subject to an expected delay constraint. Inner and
outer bounds on the achievable rate-delay region were obtained, and the stability
region was found to be identical to that for the typical packetized model. Our
results suggest that, although the stability region is not reduced, there is unavoidable overhead associated with the lossy encoding of queue idle times, something
that is often overlooked.
The protocols used in this section are inherently limited to the BEC with
feedback, and therefore inner bounds for more general noisy channels will require
alternative techniques. However, our results illustrate that even for the binary
99
erasure channel with feedback, satisfying a delay constraint over a noisy channel
may require additional protocol overhead beyond what is necessary in a noise-free
network.
100
CHAPTER 6
SECRECY IN TIMING CHANNELS
In this chapter we investigate the idea of providing information-theoretic security at the Network Layer by exploiting the timing information resulting from
queueing of packets between a source, an intended receiver, and other users in a
network. Specifically, we consider the secure transmission of messages by encoding
them onto the interarrival timing of packets that enter parallel queues.
We leverage recent results on the secrecy capacity of arbitrary wiretap channels
to obtain achievable secrecy rates. We also show that equivalent secrecy rates can
be achieved using a deterministic encoding strategy, which provides an example
contrasting the fact that for many memoryless channels a stochastic encoder is
required to achieve non-zero secrecy rates.
This chapter is organized as follows. Section 6.1 gives a high level introduction
to the chapter and Section 6.2 describes the specific system model we consider.
Section 6.3 discusses the need for stochastic encoding to achieve secure communications. Section 6.4 gives the main capacity results of the paper for stochastic and
deterministic encoders. Finally, Section 6.5 gives conclusions and other comments.
6.1 Motivation
In a network, security is often provided by higher-level layers using encryption
that relies on the intended receiver having access to information that other users
do not, such as a secret key or some form of common randomness. More recently,
101
physical-layer techniques for providing secure communication have emerged, stemming from the wiretap channel initially studied in [82], where differences between
the signals seen by an intended receiver and any third parties can be exploited to
provide security.
In this chapter we consider secure communication over the wiretap timing
channel in which a single source node in a network encodes information using the
time at which packets are injected into two parallel queues. Messages are reliably
decoded by the intended receiver using the interdeparture times of packets leaving
the first queue, while a wiretapper does not obtain any information about messages
from its observation of the interdeparture times at the second queue.
Despite the architectural simplicity of this model, it exhibits several interesting
characteristics. In certain regimes the full capacity of the channel is achievable in
secrecy, even when the capacity of the wiretapper’s channel is non-zero. Therefore, in some circumstances there is no cost in terms of capacity to meeting an
additional constraint on secrecy. Additionally, in order to highlight the interesting
characteristics of the wiretap timing channel, we show that stochastic encoding is
required to achieve non-zero secrecy rates over discrete memoryless channels. The
channel model we considered provides a counterexample to this usual notion that
a stochastic encoder is required to provide information-theoretic security.
It is interesting to note the similarities between the wiretap timing channel
we consider and work on covert communication over timing channels [83]. In
general, covert communication refers to sending information by exploiting some
system phenomenon that was not explicitly intended for communications [84]. For
example, using the timing of packets or affecting the service time of processes in
a multithreaded environment to convey information from one process to another.
102
Such communication is usually itself considered an abuse of the system, and the
goal is to prevent it from occurring. Any level of security this scenario affords is
contingent upon ignorance of the benevolent parties involved. Once the method of
covert communication is discovered it can often be prevented, although sometimes
at the expense of reduced system performance, such as increased delay or decreased
throughput.
In contrast to the canonical covert timing channel, timing information can be
used for secure message exchange over the wiretap timing channel even if all of
the involved parties are explicitly aware that communication is taking place. At a
minimum, this requires that the wiretapper does not receive the exact same signal
as the intended receiver. Despite this fundamental difference, the similarities
between covert timing channels and the model considered in this chapter are
interesting to note.
The main contributions of this chapter are:
• Achievable secrecy rates obtained using information-spectrum methods and
stochastic encoding;
• A necessary and sufficient condition for a deterministic encoder to achieve
non-zero secrecy rates;
• A secrecy rate region achievable using deterministic coding.
The next section presents the system model used throughout the chapter.
6.2 System Model
6.2.1 Channel Model
We consider secure communication through a simple network of two parallel
single-server queues as depicted in Figure 6.1. Information is encoded using the
103
Intended Receiver's Queue
µ1
(D1 , . . . , Dn )
(A1 , . . . , An )
µ2
(E1 , . . . , En )
Wiretapper's Queue
Figure 6.1: Secure communication over parallel queues.
arrival times for packets that enter both the top and bottom queues simultaneously, called the main queue and the wiretap queue, respectively. The intended
receiver observes the sequence of packet departure times from the main queue,
which has an average service rate of µ1 packets/s; the wiretapper observes the
sequence of packet departure times from the wiretap queue, which has an average
service rate of µ2 packets/s. The service rates of both queues are deterministic
and known to everyone.
It is assumed that packets do not carry any information or, equivalently, that
they contain some deterministic data, such as a source identifier. The generalization to information-bearing packets (possibly corrupted with noise) is straightforward from [85, Section IV] and standard results on the secrecy capacity of discrete
memoryless channels. The equivalent discrete-index channel model takes a vector
of n non-negative interarrival times as input, denoted An := (A1 , . . . , An ), such
P
that the kth packet enters both queues at time ki=1 Ai . Similarly, the output of
each channel is a vector of n non-negative packet interdeparture times, denoted as
Dn and En for the outputs of the main queue and wiretap queue, respectively, and
104
such that the departure time for the kth packet at the main (wiretap) channel is
Pk
Pk
i=1 Di (
i=1 Ei ).
The service time for the ith packet in the main (wiretap) queue is denoted
as Si (Ti ). The service times are mutually independent of each other, and of An ,
Di−1 , and Ei−1 . The waiting time of the ith packet is the time elapsed between the
(i − 1)th departure and the ith arrival, and is denoted Wi (Vi ). If the ith packet
arrives before service of the (i−1)th packet completes, then Wi = 0 (Vi = 0). More
explicitly, by letting [x]+ := max(0, x), the ith waiting times can be expressed as
"
Wi =
i
X
j=1
Aj −
i−1
X
#+
Dj
,
Vi =
j=1
" i
X
j=1
Aj −
i−1
X
#+
Ej
,
(6.1)
j=1
which are deterministic, causal, time-varying functions of Ai and Di−1 (Ei−1 ), and
have memory. This definition of waiting times times enables us to write single
letter expressions for the interdeparture times,
Di = Wi + Si
and Ei = Vi + Ti ,
(6.2)
which lead to a Markov relationship that will be used later:
Di ←
→ Wi ←
→ (An , Di−1 ) ←
→ (An , Di−1 , Ei−1 ) ←
→ (An , Ei−1 ) ←
→ Vi ←
→ Ei .
(6.3)
In contrast to “traditional” channel models, the dynamics of the single-server
queue are distinctive in that the resulting discrete-time channel: (i) has memory,
(ii) has outputs that depend on inputs in a non-linear fashion, and (iii) is nonstationary. Because of these characteristics, it is useful to exploit informationspectrum methods [54], which are now briefly reviewed to present our notation.
105
6.2.2 Information Spectrum Methods
Consider random variables X ∈ X and Y ∈ Y, where X and Y are continuous
alphabets. Let x and y denote sample values for X and Y, and let pX,Y (x, y),
pX (x), and pY (y) denote the joint and marginal densities, respectively. The mutual
information between the random variables X and Y is the random variable [86–88]
I(X; Y) := log
pX,Y (X, Y)
.
pX (X)pY (Y)
(6.4)
We refer to the expectation of the mutual information random variable taken over
the joint distribution pX,Y (x, y) as the average mutual information, and denote it
by
I(X; Y) := E [I(X; Y)] =
XX
pX,Y (x, y) log
x∈X y∈Y
pX,Y (x, y)
.
pX (x)py (y)
(6.5)
n ∞
n
n
Given two sequences of random variables {Xn }∞
n=1 and {Y }n=1 , where X ∈ X
and Y n ∈ Yn , the mutual information spectrum and rate-mutual information spectrum are defined as the probability distribution of the random variables I(Xn ; Y n )
and n1 I(Xn ; Y n ), respectively. Also, the spectral sup-mutual information rate and
spectral inf-mutual information rate are defined as [54]
1
1
n
n
n
n
I(X ; Y ) < β = 0
p-lim inf I(X ; Y ) := sup β : lim P
n→∞
n
n
n→∞
(6.6)
1
1
n
n
n
n
p-lim sup I(X ; Y ) := inf α : lim P
I(X ; Y ) > α = 0 ,
n→∞
n
n
n→∞
(6.7)
and
respectively. Although the p-lim inf and p-lim sup in (6.6) and (6.7) are always
well-defined, they are often difficult to compute. However, previous work on the
106
capacity of single-server queues [85] has shown that, for certain input processes
and service distributions, it is possible to bound these quantities, and in some cases
determine them exactly. As we will see in Section 6.4, these results enable us to
develop meaningful insights for the wiretap timing channel under consideration.
6.2.3 Codes, Capacity, and Secrecy Capacity
In this section we give definitions for reliable communication over a queueing
channel that are consistent with those given in [85], and then we extend these
definitions to incorporate a secrecy constraint.
Definition 6.1. An (n, Mn , Tn , n ) timing code consists of the following:
• a message set Mn = {1, . . . , Mn }, from which a random variable M is drawn
uniformly;
• an encoding function ϕn : Mn → An that maps messages onto codewords,
each of which is a vector of n non-negative interarrival times (a1 , . . . , an ),
P
where the kth arrival occurs at time ki=1 ai ;
• a decoding function ψn : Dn → Mn that, upon observation of all n departures
from the queue, selects a codeword with average probability of error n :=
P [ψn (Dn ) 6= M]; and
• the nth departure from the queue occurs on average no later than Tn ;
where the probability and expectation are computed with respect to the uniform
message choice and the queue distribution. The rate of the timing code is defined
as rn :=
1
Tn
log Mn .
Definition 6.2. A rate R is achievable if there exists a sequence of (n, Mn , Tn , n )
107
timing codes such that
lim inf rn ≥ Rs ,
n→∞
and
lim n = 0.
n→∞
The capacity, denoted C, is the supremum of the set of achievable secrecy
rates.
Definition 6.3. A rate R is achievable at output rate λ if it is achievable using a
sequence of (n, Mn , n/λ, n ) timing codes. The capacity at output rate λ, denoted
C(λ), is the supremum of the set of rates that are achievable at output rate λ.
Using these definitions, it was shown in [85] that the capacity of a single server
queue with average service rate µ can be computed as
C = sup C(λ).
(6.8)
λ<µ
After giving the corresponding definitions for communication subject to a secrecy
constraint, we will show that the secrecy capacity of wiretap timing channels can
be computed in a way analogous to (6.8).
Definition 6.4. An (n, Mn , Tn , n , δn ) wiretap timing code consists of an (n, Mn , Tn , n )
timing code where:
• the encoding function ϕn : Mn → An may be stochastic, in which case it is
characterized by a transition probability pAn |M (an |m);
• the average probability of error n and the average time of the nth departure are computed with respect to the uniform message choice, the queue
distribution, and the stochastic encoding function;
• the level of secrecy with respect to the output of the wiretap queue En is
characterized by δn := n1 I(M; En ); and
108
• the nth interarrival time satisfies
"
n−1
1 X
an = Tn − −
ai
µ i=1
#+
,
(6.9)
such that the nth arrival occurs exactly at time Tn − 1/µ.
If the encoding function ϕn is deterministic, then the wiretap timing code is
called a deterministic wiretap timing code. Also, note that the constraint on an
given in (6.9) is a technical condition that simplifies some of the proofs, but it
does not impact the resulting secrecy rates.
Definition 6.5. A secrecy rate Rs is achievable if there exists a sequence of
(n, Mn , Tn , n , δn ) wiretap timing codes such that
lim inf rn ≥ Rs ,
n→∞
lim n = 0,
n→∞
and
lim δn = 0.
n→∞
The secrecy capacity, denoted Cs , is the supremum of the set of achievable
secrecy rates.
Definition 6.6. A rate R is achievable in secrecy at output rate λ if it is achievable in secrecy using a sequence of (n, Mn , n/λ, n , δn ) wiretap timing codes. The
secrecy capacity at output rate λ, denoted Cs (λ), is the supremum of the set of
rates that are achievable in secrecy at output rate λ.
6.3 Stochastic vs. Deterministic Coding for Secrecy
In this section we demonstrate the connection between the need for stochastic
encoding to achieve secrecy and the structure of the channel. These results help
109
illustrate the characteristics of the wiretap timing channel that are different from
those of discrete memoryless channels.
Proposition 6.1. Consider an arbitrary channel (An , pEn |An , En ). For any >
0 and for sufficiently large n, a deterministic encoder that satisfies the secrecy
constraint limn→∞ n1 I(M; En ) = 0 must satisfy
1
I(An ; En ) ≤ .
n
(6.10)
Furthermore, codes generated according to a random coding argument for use in
a deterministic encoder cannot satisfy the secrecy constraint when used for communication over a discrete memoryless channel.
Proof: In order to satisfy the secrecy constraint, for any > 0 and for sufficiently
large n, the inequality n1 I(M; En ) ≤ must be satisfied. Using the chain rule for
mutual information to expand I(M; En ) and the Markov chain M ←
→ An ←
→ En
we have
n ≥ I(M; En ) = I(An ; En ) + I(M; En |An ) − I(An ; En |M)
(6.11)
= I(An ; En ) − I(An ; En |M)
(6.12)
= I(An ; En ) − H(An |M) + H(An |M, En ).
(6.13)
Using the non-negativity of average entropy and the fact that H(An |M) = 0 for a
deterministic encoder leads to (6.10).
Consider a discrete memoryless channel (A, pE|A , E) over which a user communicates with a (2nR , n) code Cn . For rates below capacity it is clear that the
mutual information between the input and output of the channel is proportional
to the blocklength n. In the following lemma we show that even for rates beyond
110
capacity, that is R > I(A; E), if the code Cn stems from an ensemble of codes
generated randomly according to some distribution pA , with overwhelming probability, Cn is such that the mutual information between codewords An and channel
outputs En is on the order of n I(A; E). In other words, the mutual information
scales linearly with the block length.
Lemma 6.1. Consider a discrete memoryless channel (A, pE|A , E). Let Cn be the
random variable representing a (2nR , n) code with rate R > I(A; E) and codewords
generated i.i.d. according to a distribution pA . Then, for any > 0,
lim PCn[I(An ; En |Cn ) ≥ n(I(A; E) − )] = 1.
n→∞
(6.14)
The proof of Lemma 6.1 leverages results on channel resolvability [54, 89]
and is given in Appendix D.1. The second part of Proposition 6.1 follows from
Lemma 6.1 and the necessary condition for a deterministic encoder to guarantee
secrecy given in (6.10).
In the next two sections we determine secrecy rates for the wiretap timing channel using stochastic and deterministic encoders. One consequence of
Proposition 6.1 is that, in general, the binning structure used in wiretap codes
is required to guarantee secrecy over discrete memoryless channels. We show that,
in contrast to the case for discrete memoryless channels, non-zero secrecy rates
can be achieved over the wiretap timing channel using deterministic encoding.
6.4 Secrecy Rates for Parallel Queues
This section presents the main capacity results of the chapter. First we show
that, analogous to reliable communication over timing channels, secrecy capacity
111
is equal to the supremum of the secrecy capacity at output rate λ for λ < µ1 . Then
we determine achievable secrecy rates for stochastic encoding. Finally, we show
that the same set of secrecy rates are achievable using a deterministic encoding
strategy, an achievable rate region that we believe corresponds to the secrecy
capacity with deterministic encoding.
We focus on the continuous time queues studied in [85], however the results
can be extended to both of the discrete time queuing models considered in [90]
and [91]. Throughout this section we assume that queues are initially empty. As
shown in [85] for the continuous-time case and [90] for the discrete-time case, this
assumption is not critical and timing capacities with queues initially in equilibrium
are no more than timing capacities with queues initially empty.
6.4.1 Secrecy with Stochastic Encoding
Proposition 6.2. The secrecy capacity of parallel ·/G/1 queues satisfies
Cs = sup Cs (λ),
(6.15)
λ<µ1
where Cs (λ) is parameterized by µ1 and µ2 , which are the service rates of the main
and wiretap queues, respectively.
Proof:
The proof is given in Appendix D.2 and follows from [85, Appendix II]
with modifications to account for the secrecy constraint.
112
e−1 µ1
Rate (nats/s)
Rs (µ2 )
Overhead
Cost
C(λ) = λ log
µ1
λ
µ1
e−1 µ1
µ2
Service Rate of Eavesdropper’s Queue, µ2
Figure 6.2: Achievable secrecy rates for parallel queues along with
C(λ), the timing capacity of the main channel at output rate λ, shown
for reference. The service rate of the main queue is µ1 packets/s, and
the service rate of the wiretap queue is µ2 packets/s.
Theorem 6.1. The secrecy capacity of parallel ·/M/1 queues satisfies
C = e−1 µ1 ,
µ2 /µ1 < e−1 ,
s
Cs ≥ µ2 log µµ21 , e−1 ≤ µ2 /µ1 < 1,
C = 0,
µ2 /µ1 > 1,
s
(6.16)
where µ1 and µ2 are the service rates of the main and wiretap queues, respectively,
and the capacity is given in nats/s.
Proof: From Proposition 6.2 and an expression for the secrecy capacity of an arbitrary wiretap channel given in [92] (also in [93] for discrete channels) in nats per
channel use, the secrecy capacity of the wiretap timing channel can be expressed
113
in nats per second as
1
1
n
n
n
n
Cs = nmax
λ p-lim inf I(U ; D ) − p-lim sup I(U ; E ) ,
{U ,An }∞
n
n
n→∞
n→∞
n=1
(6.17)
n
where {Un , An }∞
→ An ←
→ Dn En and λ is the average number
n=1 is subject to U ←
of channel symbols received1 by the intended receiver per second under the input
process {Un , An }∞
n=1 . Since all rates satisfying
1
1
n
n
n
n
Rs < max
λ p-lim inf I(A ; D ) − p-lim sup I(A ; E )
{An }∞
n
n
n→∞
n→∞
n=1
(6.18)
are achievable, a lower bound on secrecy capacity can be obtained by computing
(6.18) for a specific input process {An }∞
n=1 . Restricting ourselves to ·/M/1 queues
driven by a homogenous Poisson process of rate λ we have [85]
h
µ1 i +
1
I(An ; Dn ) = log
n
λ
(6.19)
h
µ2 i+
1
p-lim sup I(An ; En ) = log
,
n
λ
n→∞
(6.20)
p-lim inf
n→∞
and
where the [·]+ operators ensure that (6.19) and (6.20) are zero for λ > µ1 and
λ > µ2 , respectively.
Using (6.19) and (6.20) in (6.18) along with Burke’s output theorem for the
1
See [85, Appendix I] for a discussion on the important distinction between normalizing by
the average rate at which channel symbols are received versus the average rate at which they
are sent.
114
M/M/1 queue we have that all rates satisfying
Rs <
λ log µ1 , λ < µ2 < µ1
µ2
(6.21)
λ log µ1 , µ2 < λ < µ1
λ
for any value of λ < µ1 are achievable. In order to maximize the corresponding
bound on Cs , first note that λ log µλ takes its maximum value of e−1 µ at λ = e−1 µ.
Thus, for µ2 /µ1 < e−1 we can choose λ = e−1 µ1 and obtain a maximum in
(6.19) and minimum in (6.20) simultaneously. This corresponds to the timing
capacity of the the main channel and thus in this regime the bound is tight. For
e−1 < µ2 /µ1 < 1, (6.21) is tightest for λ = µ2 , where we have Cs ≥ µ2 log µµ12 , as
desired.
The bound on Cs given by Theorem 6.1 is shown in Figure 6.2 as a function
of µ2 /µ1 . Interestingly, since (6.19) is also the timing capacity of the main channel without secrecy, for µ2 /µ1 < e−1 there is no cost of meeting the additional
constraint of communicating in secrecy.
For all values of µ2 /µ1 the bound is tightest for a value of λ ≥ µ2 , in which case
the wiretap queue becomes unstable. This suggests trying a strategy of simply
“overloading” the wiretap queue by using an input process with average arrival
rate exceeding the queue’s average service time. At least intuitively, one might
expect that such a condition would result in asymptotic independence between An
and En , and thus provide some level of secrecy. Furthermore, if the asymptotic
independence of the channel’s input and output is sufficient to guarantee secrecy
on its own, a stochastic encoder would not be required.
115
6.4.2 Secrecy with Deterministic Encoding
The intuition developed at the end of the previous section is confirmed by the
following theorem, which states that a secrecy rate region identical to the one given
in Theorem 6.1 is achievable using a deterministic encoder that “overloads” the
wiretap queue. Although Theorem 6.1 only applies to ·/M/1 queues, the bound
in Theorem 6.2 holds for a generic service distribution.
In Section 6.3 we highlighted the connection between the channel characteristics and the need for stochastic encoding. For a stochastic encoder, the conditional
entropy H(An |M) is non-zero, and thus it is possible to satisfy the constraint given
by (6.10) in Lemma 6.1 without requiring that the mutual information over the
wiretap queue goes to zero. Stochastic encoding introduces memory in the channel and, in some sense, provides an additional degree of freedom whereby we do
not have to rely exclusively on the channel for secrecy.
In the following theorem we obtain the secrecy capacity for deterministic
encoding and show that it is identical to the achievable secrecy rates given by
Theorem 6.1, which are for a stochastic encoder. This suggests that for the wiretap timing channel the additional flexibility provided by stochastic encoding may
not be explicitly required.
Theorem 6.2. The secrecy capacity of parallel ·/G/1 queues using a deterministic
encoder satisfies
C = e−1 µ1 ,
µ2 /µ1 < e−1 ,
s
Cs ≥ µ2 log µµ21 , e−1 ≤ µ2 /µ1 < 1,
C = 0,
1 ≤ µ2 /µ1 ,
s
(6.22)
where µ1 is the average service rate of the main queue, µ2 is the average service
rate of the wiretap queue, and the capacity is given in nats/s.
116
Proof:
The timing capacity of the ·/G/1 queue with service rate µ1 (using a
deterministic encoder) satisfies [85]
C(λ) ≥ λ log
µ1
.
λ
(6.23)
Taking the supremum of (6.23) over λ > µ2 gives the desired form of the result.
We only need to show that a sequence of codes exists for which the condition
λ > µ2 is sufficient to guarantee n1 I(M; En ) → 0 as n → ∞, i.e., that overloading
the wiretapper’s queue can guarantee secrecy.
By the data processing inequality and the Markov relationship M ←
→ An ←
→
En , it is sufficient to show that we can construct a sequence of (n, Mn , n/λ, n )
deterministic timing codes with λ > µ2 , such that limn→∞ n1 I(An ; En ) = 0. We
make use of an equality from [85],
n
n
I(A ; E ) =
n
X
i=1
I(Vi ; Vi + Ti ) − D(PEi ,··· ,En ||
n
Y
PEi ),
(6.24)
i=1
which follows from the Markov chain in (6.3) and the chain rule for average mutual
information. By the non-negativity of D(·||·) and the single letter expression
Ei = Vi + Ti in (6.2), for any code we have that
n
1X
1
I(An ; En ) ≤
I(Vi ; Ei ).
n
n i=1
(6.25)
Next we show that a code can be constructed such that Vi converges in probp
ability to 0, denoted Vi →
− 0, and that for such a code the right side of (6.25)
117
converges to zero. From the definition of Vi in (6.1) we have that
"
#
i
i−1
1X
1X
P [Vi > ] = P
.
Aj −
Aj −
Ej >
Ej > = P
i j=1
i j=1
i
j=1
j=1
i
X
#
i−1
X
"
(6.26)
Noting that Ej ≥ Tj for all j we have
"
P
i
1X
i
j=1
Aj −
i−1
1X
i
#
Ej >
j=1
"
i
1X
≤P
i
i
j=1
Because the sequence Tj are i.i.d. service times,
to its mean
Aj >
1
i
i−1
1X
i
Pi−1
j=1
#
Tj +
j=1
.
i
(6.27)
Tj converges almost surely
1
.
µ2
Therefore, for any > 0, ν > 0, there exists an i0 (ν, ) such that
P
i
for all i ≥ i0 (ν, ) the probability that i−1
j=1 Tj is less than µ2 − i is less than ν.
Intuitively, because λ > µ2 , the gap between the sum of the arrival times and
the wiretapper’s service times in (6.27) can only increase. We show this explicitly
by concatenating a sequence of well-chosen timing codes to form a new code that
satisfies
P
"M N +k−1
X
Tj <
j=1
MX
N +k
j=1
#
Aj ≤ ν,
∀k ≥ 1.
Choose N = i0 (ν, ) as the length of a timing code CN . Let δ :=
and choose =
δ
2
(6.28)
1
µ2
−
1
λ
>0
such that
#
i
iδ
−
≤ ν,
P
Tj <
µ2
2
j=1
" i−1
X
∀i ≥ N.
(6.29)
By the definition of a timing code, the expected arrival of the nth packet averaged
over all codewords is at most
n
.
λ
Therefore, for at least half of the codewords,
the time of the nth packet arrival is less than nλ . By discarding codewords whose
P
N
last arrival exceeds nλ , we guarantee that N
j=1 Aj ≤ λ while incurring only a
118
negligible loss in rate. From the definition of δ it follows that 1/λ = 1/µ2 − δ and
we have
N
X
j=1
Aj ≤
Now let M ∈ N be such that M N δ ≥
N
− N δ.
µ2
2N
.
λ
(6.30)
Consider a new timing code CM N
obtained by concatenating M times the previous length N code. The new code is
of length M N and thus we have
MN
X
j=1
Aj ≤
MN
MNδ MNδ
−
−
.
µ2
2
2
(6.31)
Therefore, if we concatenate an additional timing code, we have
MX
N +k
j=1
MNδ
MN
−
≤ (M N + k)
Aj ≤
µ2
2
1
δ
−
µ2 2
,
∀k ∈ {1, . . . , N }, (6.32)
where the first inequality follows from our choice of M such that
N
λ
≤
MNδ
.
2
By
combining (6.29) and (6.32) we have that for any ν > 0,
P
"M N +k−1
X
Tj <
j=1
MX
N +k
j=1
#
Aj ≤ ν,
∀k ∈ {1, . . . , N }.
(6.33)
We can repeat the above concatenation argument to construct a sequence of Aj
such that for any ν > 0, (6.28) is met.
By combining (6.26), (6.27), and (6.28), we have constructed a code such that
p
p
Vi →
− 0. Therefore, pEi |Vi (ei |Vi ) →
− pEi |Vi (ei |0) and pEi (ei ) → pEi |Vi (ei |0) for all e,
from which it follows that
I(Ei ; Vi ) := log
pEi |Vi (Ei |Vi ) p
→
− 0.
pEi (Ei )
119
(6.34)
Because the information density (not normalized) converges in probability to zero,
P
the average mutual information rate n1 ni=1 I(Ei ; Vi ) converges to zero (see the
proof of Lemma 1 in [92]). Using this fact in (6.25) we have that
1
I(An ; En ) = 0,
n→∞ n
lim
(6.35)
and therefore the secrecy condition is met. Finally, we can append a single packet
to the constructed code such that (6.35) still holds and the constraint given in
(6.9) of Definition 6.4 is satisfied. For any λ > µ2 we have therefore constructed
a deterministic timing code at output rate λ that can be used for secure communication at rate λ log µλ1 , concluding the proof.
6.5 Discussion
In this chapter we considered a technique for providing information-theoretic
security at the network layer by exploiting randomness in timing information
that results from the queueing of packets between a source, an intended receiver,
and other users in a network. Despite the architectural simplicity of the model
we considered, basic queue configurations can be used as a rough approximation
for all the phenomena that induce delay in a communications network, and the
wiretap timing channel exhibits several interesting characteristics.
For certain values of queue service rates, the full capacity of the channel is
achievable in secrecy, even when the capacity of the wiretap channel is non-zero.
This means that, in some circumstances, meeting an additional secrecy constraint
incurs no loss of reliable communication rate. Furthermore, non-zero secrecy rates
are achievable over the wiretap timing channel using a deterministic encoder.
120
We considered the usual notion of weak secrecy, but it was recently shown that
secrecy capacity is identical under a variety of other secrecy metrics [92]. There is
reason to believe that for stationary memoryless channels this equivalence extends
to the usual notion of strong secrecy [94, 95]. A similar relationship may also hold
for the timing channels considered in this chapter; however, at present we do not
know if that is the case.
From an information-theoretic perspective, this chapter demonstrates how concepts typically applied to physical layer security can be used to gain insight to
security problems at the network layer. From a practical point of view, if there are
any benefits to be gained by exploiting timing information for security, protocols
could be deployed on systems that are already in use.
121
CHAPTER 7
CONTRIBUTIONS AND FUTURE WORK
This chapter summarizes the main contributions of the dissertation and gives
direction for future research. Section 7.1 summarizes our work and gives some
concluding remarks. Section 7.2 discusses a variety of directions for future research
relating to our viewpoint on overhead.
7.1 Contributions
In this dissertation we developed the perspective that overhead can be viewed
as the cost of constraints imposed on a communication system. We developed
this abstracted notion of overhead in order to allow for meaningful analysis that
is consistent with the notion that overhead is a cost on the system. Operationally,
these costs represented a reduction in system performance by constraining the
feasible set for information theoretic optimization problems.
The conceptual framework we use for studying overhead was presented in
Chapter 3. In the subsequent core chapters of the dissertation we used this framework to examine three sources of overhead that are relevant to active research
in wireless communications today: multi-user channel access (Chapter 4), delayconstrained communication (Chapter 5), and security in data networks (Chapter 6).
In Chapter 4 we considered multi-access communication systems, where we
began by reviewing existing capacity results for multiple access channels within
our overhead framework. The novel contributions of this chapter are:
122
• A new outer bound on the capacity region of the MAC with generalized
feedback that was based upon a parallel channel extension to the dependence
balance bound;
• Computation of the capacity region of the binary additive MAC with feedback and the binary additive noisy MAC with feedback under a common
distribution constraint;
• Introduction of the K-bit packet collision channel with feedback and characterization of inner and outer bounds on its capacity region for general K
which we believe are tight.
In Chapter 5 we studied protocol overhead for delay-constrained communications. The chapter builds upon Gallager’s work on protocol information [52],
which is what initially motivated our thinking for this dissertation. The main
contributions of this chapter are:
• A lower bound on protocol overhead for a discrete time arrival process that
is analogous to Gallager’s results for Poisson arrivals;
• An inner bound on the rate-delay tradeoff for communicating a bursty source
over a binary erasure channel with feedback that uses Gallager’s formulation
for protocol information;
• Various other inner and outer bounds on the rate-delay tradeoff that are
asymptotically tight for large delays.
The results on protocol overhead for a discrete time arrival process were presented
in [96], and the work on rate-delay tradeoffs for communicating a bursty source
over a noisy channel were presented in [97].
Finally, in Chapter 6 we studied information theoretic security in timing channels. The main contributions of this chapter are:
123
• Development of a novel model for secure communication over timing channels that exhibits several interesting properties;
• Inner and outer bounds on the secrecy capacity of timing channels with
stochastic encoders;
• An inner bound on the secrecy capacity for deterministic encoders which we
believe to be tight.
The model developed and studied in this chapter was first presented in [98], along
with inner and outer bounds on its secrecy capacity.
In the next section we discuss several opportunities for future work that we
identified throughout the course of this research.
7.2 Future Work
The focus of this dissertation was on developing fundamental limits on overhead for communication systems. The perspective on overhead given in this dissertation establishes the groundwork for many interesting extensions, some of which
we review here.
7.2.1 Overhead of Multiple Constraints
As highlighted by the motivating example given in Section 1.2, multiple protocols at separate layers in a communication system may interact to contribute
additional overhead to the system. Although multiple constraints can be viewed
from within the perspective given in this dissertation, translating the cost of multiple constraints into engineering guidelines for designing protocols that can interact with one another more efficiently could provide significant guidance for the
computer and data networking communities.
124
7.2.2 Constraints at Multiple Layers
Another class of problems for which a better understanding of overhead would
be useful for system designers relates to evaluating performance tradeoffs between
allocating overhead to different layers in a communication system. For example, in
fading environments there are interesting tradeoffs associated with the interaction
between improved error protection at the Physical Layer and increased retransmissions at the MAC Layer [99, 100]. Also for fading channels, an interesting
problem is the allocation of overhead bits between the source coder for enhanced
resolution of the source, and the channel coder for improved reliability during
transmission [101].
7.2.3 Design of Practical Protocols
The natural continuation of this work is to use the fundamental bounds developed in this dissertation as a guide for comparing existing protocols and designing
new ones. One example of where our work points to opportunities for improved
protocol performance relates to the use of timing information for the transmission
of information. From a security perspective, the achievable secrecy rates that we
establish for our queuing model are not sufficient to encrypt all Physical Layer
data. However, it would be interesting to consider designing protocols that use this
small amount of secure information for the exchange of keys used for encryption
at higher level layers, such as those discussed in [102, Chapter 7].
125
APPENDIX A
NOTATION & DEFINITIONS
This chapter summarizes notation used throughout the dissertation and gives
definitions for several information theoretic quantities.
A.1 Notation
Random variables and their sample values are denoted using a special font, e.g.,
a random variable X and an arbitrary sample value x. The probability distribution
of a random variable X is denoted as pX (x), or where the subscript is cumbersome
just p(x). Alphabets, and more generally sets, are denoted using a calligraphic
font, e.g., X, and the cardinality of a set X is denoted |X|.
The standard information theoretic quantities of (ensemble average) entropy,
mutual information, and divergence are denoted H(·), I(·; ·), and D(·), respectively,
which we will define in the next section. Chapter 6 makes use of information
spectrum theory, where quantities such as the mutual information random variable
are denoted similar to any other random variable, e.g., I(·; ·).
A.2 Definitions
Definition A.1. The entropy H(X) of a discrete random variable X is defined as
H(X) = −
X
p(x) log p(x).
x∈X
126
(A.1)
If (X, Y) ∼ p(x, y), the conditional entropy H(Y|X) is defined as
H(Y|X) =
X
x∈X
p(x)H(Y|X = x) = −
X
p(x)
x∈X
X
p(y|x) log p(y|x).
(A.2)
y∈Y
Definition A.2. For any two random variables X and Y with (X, Y) ∼ p(x, y),
the mutual information I(X; Y) is defined as
I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
(A.3)
Definition A.3. For a random variable X with |X| = k and p(xi ) = pi , i =
1, . . . , k, the binary entropy function h(k) (p1 , . . . , pk ) is defined as
(k)
h (p1 , . . . , pk ) = −
where pi ≥ 0, i = 1, . . . , k and
Pk
i=1
k
X
pi log2 pi ,
(A.4)
i=1
pi = 1. For shorthand notation only k − 1
arguments may be specified, in which case
h(k) (p1 , . . . , pk−1 ) = h(k) p1 , . . . , pk−1 , 1 −
k−1
X
!
pi
.
(A.5)
i=1
We denote h(2) (p, 1 − p) simply as h(p), and for a binary random variable X define
h(X) := h(Pr[X-0]).
127
APPENDIX B
PROOFS FOR CHAPTER 4
B.1 Proof of Theorem 4.3
Proof: (Achievability) A characterization of the Cover-Leung achievable rate region for the BAN-MAC-FB RCL was given in [64], where
(
RCL =
[
(u1 ,u2 )∈PCL
1
(R1 , R2 ) : R1 ≤ h(φ(2u1 )),
2
1
R2 ≤ h(φ(2u2 )),
2
)
1 − f (2u1 , 2u2
R1 + R2 ≤ h
2
(B.1)
and
PCL =
1
1
(u1 , u2 ) : 0 ≤ u1 ≤ , 0 ≤ u2 ≤
4
4
.
(B.2)
Letting w = 2u1 = 2u2 we have that all rates in the region
(
RCL,Γ =
[
w∈[0,1/2]
1
(R1 , R2 ) : R1 ≤ h(φ(w)),
2
1
R2 ≤ h(φ(w)),
2
)
1−w
R1 + R2 ≤ h
2
(B.3)
are achievable for the BAN-MAC-FB using common distributions.
Proof: (Converse) Consider two parallel channels Z = X1 and Z = X2 that result
in dependence balance bounds RDB1 and RDB2 , respectively. For both of these
128
channels the right side of the dependence balance constraint given by (4.32) is 0.
Therefore, I(X1 ; X2 |S) = 0 and we only need to consider conditionally independent
X1 and X2 with p(x1 , x2 |s) = p(x1 |s)p(x2 |s). RDB1 and RDB2 can then be given as
[
RDB1 =
p(x1 |s)p(x2 |s)
n
(R1 , R2 ) : R1 ≤ min [I(X1 ; Y|X2 ), H(X1 |S)] ,
1
R2 ≤ H(X2 |S),
2
(B.4)
o
R1 + R2 ≤ I(X1 , X2 ; Y) ,
and
[
RDB2 =
p(x1 |s)p(x2 |s)
n
1
(R1 , R2 ) : R1 ≤ H(X1 |S),
2
R2 ≤ min [I(X2 ; Y|X1 ), H(X2 |S)] ,
o
R1 + R2 ≤ I(X1 , X2 ; Y) ,
(B.5)
respectively, where in both bounds |S| ≤ |X1 ||X2 | + 3. In [64] it was shown that
a binary auxiliary random variable is sufficient to achieve all points in RDB1 and
RDB2 and an alternative characterization was given. Specifically, it was shown
that the capacity region of the BAN-MAC-FB is contained in the region RDB =
(1)
(2)
RDB ∩ RDB , where
(
(1)
RDB
=
[
(u1 ,u2 ,u)∈PDB
1
(R1 , R2 ) : R1 ≤ min h(φ(2u1 )), h(u) ,
2
1
R2 ≤ h(φ(2u2 )),
2
)
1−u
R1 + R2 ≤ h
,
2
129
(B.6)
and
(
(2)
RDB
[
=
(u1 ,u2 ,u)∈PDB
1
(R1 , R2 ) : R1 ≤ h(φ(2u1 )),
2
1
R2 ≤ min h(φ(2u2 )), h(u) ,
2
)
1−u
R1 + R2 ≤ h
,
2
(B.7)
and the set PDB is defined as
PDB
1
1
= (u1 , u2 , u) : 0 ≤ u1 ≤ , 0 ≤ u2 ≤ , f (2u1 , 2u2 ) ≤ u ≤ 1 − (u1 + u2 ) .
4
4
(B.8)
Because pX1 |S (x|s) = pX2 |S (x|s) we have that u1s = u2s for all s and therefore
u1 = u2 . Let w := 2u1 = 2u2 . For x ≤ 1/4 we have that f (x, x) = x and therefore
PDB,Γ is given as
PDB,Γ
1
= (w, u) : 0 ≤ w ≤ , w ≤ u ≤ 1 − w .
2
(B.9)
We now have that the common distribution capacity region of the BAN-MAC-FB
(1)
(2)
is contained in the region RDB,Γ = RDB,Γ ∩ RDB,Γ where
(
(1)
RDB,Γ
=
[
(w,u)∈PDB,Γ
1
(R1 , R2 ) : R1 ≤ min h(φ(w)), h(u) ,
2
1
R2 ≤ h(φ(w)),
2
)
1−u
R1 + R2 ≤ h
,
2
130
(B.10)
and
(
(2)
RDB,Γ
[
=
(w,u)∈PDB,Γ
1
(R1 , R2 ) : R1 ≤ h(φ(w)),
2
1
R2 ≤ min h(φ(w)), h(u) ,
2
)
1−u
R1 + R2 ≤ h
,
2
(1)
(2)
(1)
(B.11)
(2)
where PDB,Γ is defined in (B.9). RDB , RDB , RDB,Γ , and RDB,Γ are all shown
in Figure 4.3, along with the capacity region of the binary erasure noisy MAC
without feedback for comparison.
Next we show that RDB,Γ ≡ RCL,Γ and therefore the Cover-Leung bound is
(1)
(2)
tight. Consider weakened versions of RDB,Γ and RDB,Γ given by
(
(1)∗
RDB,Γ =
[
(w,u)∈PDB,Γ
(R1 , R2 ) : R1 ≤ h(φ(w)),
1
R2 ≤ h(φ(w)),
2
)
1−u
R1 + R2 ≤ h
,
2
(B.12)
and
(
(2)∗
RDB,Γ =
[
(w,u)∈PDB,Γ
1
(R1 , R2 ) : R1 ≤ h(φ(w)),
2
R2 ≤ h(φ(w)),
)
1−u
R1 + R2 ≤ h
.
2
(1)∗
(B.13)
(2)∗
Note that the sum rate terms in RDB,Γ and RDB,Γ are monotone decreasing in u
and therefore u = w uniquely maximizes both of the weakened bounds. Taking
131
(1)∗
(2)∗
the intersection of RDB,Γ and RDB,Γ yields
(
[
RDB,Γ =
0≤w≤1/2
1
(R1 , R2 ) : R1 ≤ h(φ(w)),
2
1
R2 ≤ h(φ(w)),
2
)
1−w
R1 + R2 ≤ h
,
2
(B.14)
and therefore RCL,Γ ≡ RDB,Γ , proving the desired result.
B.2 Proof of Theorem 4.4
Proof: An achievable rate region for the multiple access channel with generalized
feedback was given in [60]. Letting Z = (S, U1 , V1 , W1 , U2 , V2 , W2 , X1 , X2 , Y, YF ),
we have that for any
p(z) =p(s)p(u1 |s)p(v1 |s)p(w1 |s)p(u2 |s)p(v2 |s)p(w2 |s)
· δ(x1 |s, u1 , v1 , w1 , w2 )δ(x2 |s, u2 , v2 , w1 , w2 )
(B.15)
· p(y, yF |x1 , x2 ),
all (R1 , R2 ) = (R10 + R11 , R20 + R21 ) satisfying
R10 < I(U1 ; YF |S, W1 , W2 , U2 , V2 ),
(B.16a)
R20 < I(U2 ; YF |S, W1 , W2 , U1 , V1 ),
(B.16b)
R10 < I(U1 ; Y|S, W1 , W2 , U2 , V1 , V2 ) + I(W1 ; Y|S, W2 ),
(B.16c)
R20 < I(U2 ; Y|S, W1 , W2 , U1 , V1 , V2 ) + I(W2 ; Y|S, W1 ),
(B.16d)
R11 < I(V1 ; Y|S, W1 , W2 , U1 , U2 , V2 ),
(B.16e)
R22 < I(V2 ; Y|S, W1 , W2 , U1 , U2 , V1 ),
(B.16f)
132
R10 + R20 < I(U1 , U2 ; Y|S, W1 , W2 , V1 , V2 ) + I(W1 , W2 ; Y|S),
(B.16g)
R10 + R11 < I(U1 , V1 ; Y|S, W1 , W2 , U2 , V2 ) + I(W1 ; Y|S, W2 ),
(B.16h)
R10 + R22 < I(U1 , V2 ; Y|S, W1 , W2 , U2 , V1 ) + I(W1 ; Y|S, W2 ),
(B.16i)
R20 + R11 < I(U2 , V1 ; Y|S, W1 , W2 , U1 , V2 ) + I(W2 ; Y|S, W1 ),
(B.16j)
R20 + R22 < I(U2 , V2 ; Y|S, W1 , W2 , U1 , V1 ) + I(W2 ; Y|S, W1 ),
(B.16k)
R11 + R22 < I(V1 , V2 ; Y|S, W1 , W2 , U1 , U2 ),
(B.16l)
R10 + R20 + R11 < I(U1 , U2 , V1 ; Y|S, W1 , W2 , V2 ) + I(W1 , W2 ; Y|S),
(B.16m)
R10 + R20 + R22 < I(U1 , U2 , V2 ; Y|S, W1 , W2 , V1 ) + I(W1 , W2 ; Y|S),
(B.16n)
R10 + R11 + R22 < I(U1 , V1 , V2 ; Y|S, W1 , W2 , U2 ) + I(W1 ; Y|S, W2 ),
(B.16o)
R20 + R11 + R22 < I(U2 , V1 , V2 ; Y|S, W1 , W2 , U1 ) + I(W2 ; Y|S, W1 ),
(B.16p)
R10 + R11 + R20 + R22 < I(W1 , W2 , U1 , U2 , V1 , V2 ; Y|S),
(B.16q)
are achievable over the multiple access channel with generalized feedback, where
we have specialized the results from [60] to the case of a common feedback signal
YF to both users, and δ(·) denotes a degenerate probability distribution.
Let |W1 | and |W2 | both be equal to one, such that W1 and W2 are deterministic.
Choose p(v1 |s) and p(v2 |s) uniformly distributed on {1, . . . , 2K }. Let V1 = X̃1
and V2 = X̃2 . Define T1 = U1 and T2 = U2 , such that p(u1 |s) and p(u2 |s) are
binary random variables with conditional distributions characterized by q1s and
q2s , respectively, as defined in (4.8) and (4.9). Also, let f1 (S, U1 , V1 , W1 , W2 ) =
U1 V1 and f2 (S, U2 , V2 , W1 , W2 ) = U2 V2 such that X1 = U1 V1 = X̃1 T1 and X2 =
U2 V2 = X̃2 T2 . We now compute (B.16) for this particular choice of distribution.
133
Starting from (B.16a) we have
R10 < I(T1 ; YF |S, T2 , X̃2 )
= H(YF |S, T2 ) − H(YF |S, T1 , T2 )
X
= H(T1 |S) :=
ps H(T1 |S-s)
(B.17)
(B.18)
(B.19)
s
=
X
ps h(q1s ),
(B.20)
s
where (B.18) follows from the definition of mutual information and the fact that
YF is independent of X̃1 and X̃2 , and (B.19) follows from the definition of YF . By
computing (B.16b) similarly we have that
R20 <
X
ps h(q2s ).
(B.21)
s
Starting from (B.16c) we have
R10 < I(T1 ; Y|S, T2 , X̃1 , X̃2 )
(B.22)
= H(T1 |S, X̃1 , X2 )
X
=
ps h(q1s ),
(B.23)
(B.24)
s
where (B.23) follows because T1 is a deterministic function of Y and X2 , and (B.24)
follows from the fact that T1 is conditionally independent of X̃1 and X2 given S.
Computing (B.16d) similarly we have that
R20 <
X
ps h(q2s ).
s
134
(B.25)
Starting from (B.16e) we have
R11 < I(X̃1 ; Y|S, T1 , T2 , X̃2 )
(B.26)
= H(Y|S, T1 , X2 )
X
=
ps H(Y|S-s, T1 -1, X2 -0) Pr[T1 -1, X2 -0|S-s]
(B.27)
(B.28)
s
=K
X
ps q 1s q2s ,
(B.29)
s
where (B.27) follows because H(Y|S, X1 , X2 ) = 0, (B.28) follows because
H(Y|S-s, T1 -t1 , X2 -x2 ) = 0,
(t1 , x2 ) 6= (1, 0),
(B.30)
and in (B.29) we used the fact that T1 and T2 are conditionally independent given
S. By computing (B.16f) similarly we have that
R22 < K
X
ps q1s q 2s .
(B.31)
s
Starting from (B.16g) we have
R10 + R20 < I(T1 , T2 ; Y|S, X̃1 , X̃2 )
(B.32)
= H(Y|S, X̃1 , X̃2 )
X n
=
ps H(Y|S-s, X̃1 , X̃2 , X̃1 - X̃2 ) Pr[X̃1 - X̃2 |S-s]
s
=
X
s
o
+ H(Y|S-s, X̃1 , X̃2 , X̃1 =
6 X̃2 ) Pr[X̃1 =
6 X̃2 |S-s]
ps 2−K h(3) (q1s q2s , q 1s q 2s )
(B.33)
(B.34)
(B.35)
+ (1 − 2−K )h(4) (q1s q2s , q 1s q2s , q1s q 2s , q 1 q 2 ) ,
where we have used the fact that X̃1 and X̃2 are independent uniform random
135
variables over an alphabet of cardinality 2K . Starting from (B.16h) we have
R10 + R11 < I(T1 , X̃1 ; Y|S, T2 , X̃2 )
(B.36)
= H(Y|S, X2 )
X n
=
ps H(Y|S-s, X2 -0) Pr[X2 -0|S-s]
(B.37)
(B.38)
s
o
+ H(Y|S-s, X2 6=0) Pr[X2 6=0|S-s]
o
n
X
=
ps h(q1s ) + Kq 1s q2s + h(q1s )q 2s
(B.39)
s
=
X
ps h(q1s ) + Kq 1s q2s .
(B.40)
s
Computing (B.16i), (B.16j), and (B.16k) similarly we have
R10 + R22 <
X
ps h(q1s ) + Kq1s q 2s ,
(B.41)
ps h(q2s ) + Kq 1s q2s ,
(B.42)
ps h(q2s ) + Kq1s q 2s .
(B.43)
s
R20 + R11 <
X
s
R20 + R22 <
X
s
Starting from (B.16l) we have
R11 + R22 < I(X̃1 , X̃2 ; Y|S, T1 , T2 )
(B.44)
= H(Y|S, T1 , T2 )
X n
=
ps H(Y|S-s, T1 -0, T2 -1) Pr[T1 -0, T2 -1|S-s]
s
=
X
+ H(Y|S-s, T1 -1, T2 -0) Pr[T1 -1, T2 -0|S-s]
ps K(q 1s q2s + q1s q 2s ) .
s
136
(B.45)
o
(B.46)
(B.47)
Starting from (B.16m) we have
R10 + R20 + R11 < I(T1 , T2 , X̃1 ; Y|S, X̃2 )
(B.48)
= H(Y|S, X̃2 )
(B.49)
K
=
X
ps
s
=
X
s
2
X
H(Y|S-s, X̃2 -x)
(B.50)
x=1
h
ps h(4) q1s q2s , q 1s q 2s , q1s q 2s + 2−K q 1s q2s , 1 − 2−K q 1s q2s
i
+ 1 − 2−K q 1s q2s log2 (2K − 1) .
(B.51)
Computing (B.16n) similarly we have
R10 + R20 + R22 <
X
s
h
ps h(4) q1s q2s , q 1s q 2s , q 1s q2s + 2−K q1s q 2s , 1 − 2−K q1s q 2s
i
+ 1 − 2−K q1s q 2s log2 (2K − 1) .
(B.52)
Starting from (B.16o) we have
R10 + R11 + R22 < I(T1 , X̃1 , X̃2 ; Y|S, T2 )
(B.53)
= H(Y|S, T2 )
X n
=
ps H(Y|S-s, T2 -0) Pr[T2 -0|S-s]
s
=
X
+ H(Y|S-s, T2 -1) Pr[T2 -1|S-s]
(B.54)
o
ps h(q1s ) + K(q 1s q2s + q1s q 2s ) .
(B.55)
(B.56)
s
Computing (B.16p) similarly we have
R20 + R11 + R22 <
X
ps h(q2s ) + K(q 1s q2s + q1s q 2s ) .
s
137
(B.57)
And finally, computing (B.16q) we have
R10 + R11 + R20 + R22 <
X
s
ps h(3) (q1s q2s , q 1s q 2s ) + K(q1s + q2s − 2q1s q2s ) . (B.58)
Note that (B.20), (B.21), (B.29), and (B.31) together imply (B.24), (B.25),
(B.40), (B.41), (B.42), (B.43), (B.47), (B.56), and (B.57), and so the latter inequalities are redundant. Therefore, the set of all (R1 , R2 ) = (R10 + R11 , R20 + R22 )
satisfying
R11 < K
X
s
s
R20 <
X
R22 < K
X
R10 + R20 <
X
R10 <
X
ps h(q1s ),
ps h(q2s ),
ps q 1s q2s ,
(B.59)
ps q1s q 2s ,
(B.60)
s
s
ps 2−K h(3) (q1s q2s , q 1s q 2s )
s
−K
(B.61)
(4)
+ (1 − 2 )h (q1s q2s , q 1s q2s , q1s q 2s , q 1 q 2 ) ,
h
X
ps h(4) q1s q2s , q 1s q 2s , q1s q 2s + 2−K q 1s q2s , 1 − 2−K q 1s q2s
R10 + R20 + R11 <
i
s
(B.62)
+ 1 − 2−K q 1s q2s log2 (2K − 1)
X h
ps h(4) q1s q2s , q 1s q 2s , q 1s q2s + 2−K q1s q 2s , 1 − 2−K q1s q 2s
R10 + R20 + R22 <
i
s
−K
K
+ 1−2
q1s q 2s log2 (2 − 1)
(B.63)
X R10 + R11 + R20 + R22 <
ps h(3) (q1s q2s , q 1s q 2s ) + K(q1s + q2s − 2q1s q2s )
s
(B.64)
are achievable.
By choosing
R11 = K
X
ps q 1s q2s ,
R22 = K
s
X
s
138
ps q1s q 2s ,
(B.65)
(B.64) leads to
R10 + R20 <
X
ps h(3) (q1s q2s , q 1s q 2s ).
(B.66)
s
From the fact that
(4)
(3)
h (a, b, c, d) = h (a, b, c + d) + Pr[c + d]h
c
c+d
≥ h(3) (a, b, c + d)
(B.67)
(B.68)
it follows that the right side of (B.61) satisfies
X
s
ps 2−K h(3) (q1s q2s , q 1s q 2s ) + (1 − 2−K )h(4) (q1s q2s , q 1s q2s , q1s q 2s , q 1 q 2 )
≥
X
(B.69)
(3)
ps h (q1s q2s , q 1s q 2s ),
s
and therefore, for the choice of R11 and R22 given in (B.65), (B.61) is redundant.
Using (B.67) to expand the h(4) (·) term in (B.62) and (B.63) we find that they
are redundant if the inequality
−K
K − 1−2
q1s q 2s
q1s q 2s
h
log2 (2 − 1) < 1 +
q 1s q2s
q 1s q2s + q1s q 2s
K
(B.70)
holds for all 0 ≤ qis ≤ 1, i = 1, 2, which can be verified numerically. We now have
the simplified achievable rate region given by the set of (R1 , R2 ) satisfying
R1 ≤
X
R2 ≤
X
R1 + R2 <
X
ps h(q1s ) + Kq 1s q2s ,
(B.71)
ps h(q2s ) + Kq1s q 2s ,
(B.72)
ps h(3) (q1s q2s , q 1s q 2s ) + K(q1s + q2s − 2q1s q2s )
(B.73)
s
s
s
139
for some choice of 0 ≤ qis ≤ 1, i = 1, 2, s = 1, . . . , |S|. Next we derive upper
bounds on the terms that comprise (B.71)–(B.73) in terms of u1 and u2 , and then
give distributions that achieve these bounds.
Using (4.4), (4.3), and the concavity of entropy we have
X
ps h(q1s ) =
X
=
X
s
s
ps h(φ(2u1s (1 − u1s )))
(B.74)
ps h(φ(2u1s ))
(B.75)
s
≤ h(φ(2u1 )),
(B.76)
and similarly,
X
s
ps h(q2s ) ≤ h(φ(2u2 )).
(B.77)
Solving for q1s and q2s in (4.11) and (4.12) gives
q1s =
1±
p
1 − 2(2u1s )
2
and q2s =
1±
p
1 − 2(2u2s )
,
2
(B.78)
respectively, and therefore
n
o
n
o
q1s ∈ φ(2u1s ), 1 − φ(2u1s )
and q2s ∈ φ(2u2s ), 1 − φ(2u2s ) .
(B.79)
Because of the fact that 0 ≤ φ(x) ≤ 1/2 for 0 ≤ x ≤ 1 we have
φ(2uis ) ≤ 1 − φ(2uis ),
i = 1, 2.
(B.80)
Also, in order to bound the second term in (B.71) we will need the following
lemma.
140
Lemma B.1. The function
g(x, y) = (1 − φ(x))(1 − φ(y))
(B.81)
= 1 − φ(x) − φ(y) + φ(x)φ(y)
√
√
(1 + 1 − 2x)(1 + 1 − 2y)
=
4
(B.82)
(B.83)
is jointly concave in (x, y) over x ∈ [0, 1/2), y ∈ [0, 1/2).
Proof:
A function g(x, y) is jointly concave in (x, y) if the function −g(x, y)
is jointly convex in (x, y). Showing that a function is jointly convex in (x, y) is
equivalent to showing that its Hessian matrix is positive semi-definite [103]. The
Hessian matrix H of −g(x, y) is given by
H=
√
1+ 1−2y
4(1−2x)3/2
−1√
4 1−2x 1−2y
√
−1√
√
4 1−2x 1−2y
.
√
1+ 1−2x
4(1−2y)3/2
(B.84)
Mathematica can be used to verify that the two eigenvalues of H are non-negative
for x ∈ [0, 1/2), y ∈ [0, 1/2), which is equivalent to showing that H is positive
semi-definite, completing the proof.
Returning to (B.71), the second term can be upper bounded using (B.80),
141
which leads to
K
X
s
ps q 1s q2s ≤ K
X
=K
X
s
ps (1 − φ(2u1s ))(1 − φ(2u2s ))
(B.85)
ps g(2u1s , 2u2s )
(B.86)
s
!
≤ Kg
X
ps 2u1s ,
s
X
ps 2u2s
(B.87)
s
= Kg(2u1 , 2u2 )
(B.88)
= K(1 − φ(2u1 ))(1 − φ(2u2 )),
(B.89)
where (B.87) follows from application of Lemma B.1 and Jensen’s inequality. Similarly,
K
X
s
ps q1s q 2s ≤ (1 − φ(2u1 ))(1 − φ(2u2 )).
(B.90)
Next we bound each component of the sum rate term (B.73). Define us =
q1s + q2s − 2q1s q2s , along with φis = φ(2uis ) and φi = φ(2ui ), i = 1, 2. From (B.79)
it follows that
o
n
us ∈ φ1s φ2s + φ1s φ2s , φ1s φ2s + φ1s φ2s
n
o
= f (2u1s , 2u2s ), 1 − f (2u1s , 2u2s ) ,
(B.91)
(B.92)
where f (x, y) was defined in (4.5). Furthermore, because f (x, y) ≤ 1/2 for x ∈
[0, 1/2], y ∈ [0, 1/2], we have
us ≤ 1 − f (2u1s , 2u2s )
142
(B.93)
for all s. The second term in (B.73) can then be upper bounded as:
K
X
s
ps (q1s + q2s − 2q1s q2s ) = K
X
≤K
X
ps us
(B.94)
ps 1 − f (2u1s , 2u2s )
(B.95)
s
"s
≤K 1−f
!#
X
ps 2u1s ,
X
s
ps 2u2s
(B.96)
s
= K 1 − f (2u1 , 2u2 )
(B.97)
= K(1 − φ1 − φ2 + 2φ1 φ2 ),
(B.98)
where (B.95) follows from (B.93), and in (B.96) we have used Jensen’s inequality
and the convexity of f (x, y) over x ∈ [0, 1/2], y ∈ [0, 1/2], which was shown in
[64]. In order to bound the first term in (B.73) the following two lemmas will be
required.
Lemma B.2. The following inequality holds for all 0 ≤ x ≤ 1/2 and 0 ≤ y ≤ 1/2:
h(3) (xy, (1 − x)(1 − y)) ≤ h(3) ((1 − x)y, x(1 − y)).
(B.99)
Proof: The validity of (B.99) was verified using Mathematica.
Lemma B.3. The function
f2 (x, y) = h(3) ((1 − φ(x))φ(y), φ(x)(1 − φ(y)))
(B.100)
is jointly concave in (x, y) over 0 ≤ x ≤ 1/2, 0 ≤ y ≤ 1/2.
Proof:
Mathematica can be used to show that the eigenvalues of the Hessian
matrix of −f2 (x, y) are non-negative for 0 ≤ x ≤ 1/2, 0 ≤ y ≤ 1/2. This
143
fact implies that the Hessian matrix of −f2 (x, y) is positive semi-definite and the
function −f2 (x, y) is convex. Therefore f2 (x, y) is concave, completing the proof.
Each component of the sum in the first term of (B.73) can be bounded as
follows:
h(3) (q1s q2s , q 1s q 2s )
(B.101)
= max h(3) (φ1s φ2s , φ1s φ2s ), h(3) (φ1s φ2s , φ1s φ2s )
(B.102)
= h(3) (φ1s φ2s , φ1s φ2s ),
(B.103)
h(3) (q1s q2s , q 1s q 2s ) ≤
max
q1s ∈{φ1s ,1−φ1s }
q2s ∈{φ2s ,1−φ2s }
where (B.103) follows from Lemma B.2. Using (B.103) in (B.73) and applying
Lemma B.3 we have
X
s
ps h(3) (q1s q2s , q 1s q 2s ) ≤
X
=
X
ps h(3) (φ1s φ2s , φ1s φ2s )
(B.104)
ps f2 (2u1s , 2u2s )
(B.105)
s
s
!
≤ f2
X
ps 2u1s ,
X
s
ps 2u2s
(B.106)
s
= f2 (2u1 , 2u2 )
(B.107)
= h(3) (φ1 φ2 , φ1 φ2 ).
(B.108)
By combining (B.76), (B.77), (B.89), (B.90), (B.98), and (B.108), an outer
bound on the achievable rate region specified by (B.71)–(B.73) is given by R,
144
where
R=
[
(u1 ,u2 )∈P
n
(R1 , R2 ) : R1 ≤ h(φ1 ) + Kφ1 φ2 ,
R2 ≤ h(φ2 ) + Kφ1 φ2 ,
(B.109)
o
R1 + R2 ≤ h(3) (φ1 φ2 , φ1 φ2 ) + K(1 − φ1 φ2 + φ1 φ2 ) ,
and the set P is defined as
1
1
P = (u1 , u2 ) : 0 ≤ u1 ≤ , 0 ≤ u2 ≤
.
4
4
(B.110)
We now specify input distributions for every (u1 , u2 ) ∈ P that achieve the
outer bound given in (B.109). Let (u1 , u2 ) be an arbitrary pair in P. Define
ua = max{u1 , u2 } and ub = min{u1 , u2 } if u1 6= u2 and ua = ub = u1 = u2
otherwise. Consider the pair of fixed input distributions
(q1 , q2 ) = (φ(2ua ), 1 − φ(2ub )) and (q1 , q2 ) = (1 − φ(2ub ), φ(2ua )).
(B.111)
By computing (B.71)–(B.73) for both of the input distributions in (B.111) we have
that all rates in R1 and R2 are achievable, where
n
R1 = (R1 , R2 ) : R1 ≤ h(φa ) + Kφa φb ,
R2 ≤ h(φb ) + Kφa φb ,
o
R1 + R2 ≤ h (φa φb , φa φb ) + K(1 − φa − φb + 2φa φb ) ,
(3)
145
(B.112)
and
n
R2 = (R1 , R2 ) : R1 ≤ h(φb ) + Kφa φb ,
R2 ≤ h(φa ) + Kφa φb ,
(B.113)
o
R1 + R2 ≤ h (φa φb , φa φb ) + K(1 − φa − φb + 2φa φb ) ,
(3)
where we have defined φa = φ(2ua ) and φb = φ(2ub ). By time sharing between the
two input distributions given in (B.111), all (R1 , R2 ) ∈ co(R1 ∪ R2 ) are achievable,
where co(·) denotes the convex hull operation. Noting that
h(φa ) + Kφa φb ≥ h(φb ) + Kφa φb
(B.114)
leads to the fact that
R≡
[
(u1 ,u2 )∈P
co R1 (u1 , u2 ) ∪ R2 (u1 , u2 ) ,
(B.115)
completing the proof.
B.3 Proof of Theorem 4.5
Proof:
where
(Theorem 4.5) Define X̃1 and X̃2 such that X1 = T1 X̃1 and X2 = T2 X̃2 ,
0, Xi = 0,
Ti =
1, X =
0,
i 6
(B.116)
i = 1, 2. Choose a parallel channel Z = T1 in Theorem 4.1.
For any p(s, x1 , x2 , y, yF ) we can construct a corresponding normalized distri-
146
bution p∗ (s, x1 , x2 , y, yF ) by normalizing X̃1 and X̃2 such that they are uniformly
distributed and conditionally independent of each other and of (T1 , T2 ) given S.
For each such p(s, x1 , x2 , y, yF ) denote the mutual information computed with respect to the corresponding p∗ (s, x1 , x2 , y, yF ) by Ip∗(· ; ·). Note that by normalizing
over X̃1 and X̃2 we have that
pS,T1 ,T2 (s, t1 , t2 ) = p∗S,T1 ,T2 (s, t1 , t2 ),
for all s, t1 , t2 .
(B.117)
However, T1 and T2 may still be arbitrarily correlated with each other under
p∗ (s, x1 , x2 , y, yF ), which makes computation of (4.33) challenging.
We now compute the mutual information terms that comprise (4.33). Starting
with the parallel channel term for R1 we have
I(X1 ; Y, T1 |X2 , S) = H(T1 |X2 , S) + H(Y|T1 , X2 , S)
(B.118)
= H(T1 |T2 , X̃2 , S) + H(X̃1 |X2 -0, S) Pr[T1 -1, X2 -0]
(B.119)
≤ H(T1 |T2 , S) + K Pr[T1 -1, T2 -0]
(B.120)
= Ip∗ (X1 ; Y, T1 |X2 , S),
(B.121)
where (B.119) follows from the fact that H(Y|T1 -t1 , X2 -x2 , S) = 0 for (t1 , x2 ) 6= (1, 0),
(B.120) follows because the entropy of a random variable is upper bounded by the
logarithm of its cardinality and the fact that conditioning reduces entropy, and
(B.121) follows from the fact that under p∗ (s, x1 , x2 , y, yF ) X̃2 is uniformly distributed over {1, . . . , 2K } and conditionally independent of T2 given S. Computing
I(X2 ; Y, T1 |X1 , S) similarly we have that
I(X2 ; Y, T1 |X1 , S) ≤ Ip∗ (X2 ; Y, T1 |X1 , S)
147
(B.122)
holds for every p(s, x1 , x2 , y, yF ).
Computing the corresponding Shannon rate constraints for an arbitrary distribution p(s, x1 , x2 , y, yF ) gives
I(X1 ; Y|X2 ) = H(Y|X2 )
(B.123)
K
= q2 H(Y|T2 -0) +
2
X
pX2 (x2 )H(Y|X2 -x2 )
(B.124)
x2 =1
n
o
= q2 h(T1 |T2 -0) + H(X̃1 |T1 -1, T2 -0) Pr[T1 -1|T2 -0]
K
+
2
X
x2 =1
(B.125)
pX2 (x2 )h(T1 |X2 -x2 )
≤ q2 h(T1 |T2 -0) + K + h(q1 )
(B.126)
= Ip∗ (X1 ; Y|X2 ),
(B.127)
where q2 is defined in (4.13) and (B.125) follows from the concavity of the binary
entropy function and application of Jensen’s inequality. Computing I(X2 ; Y|X1 )
similarly gives that
I(X2 ; Y|X1 ) ≤ Ip∗ (X2 ; Y|X1 )
for all p(s, x1 , x2 , y, yF ).
148
(B.128)
Turning to the sum rate constraint in (4.33) we have
I(X1 , X2 ; Y) = H(Y)
!
= h(3)
X
ps q1s q2s ,
X
s
ps q 1s q 2s
s
+ H(Y|y ∈ {1, . . . , 2K })
X
s
ps (q1s + q2s − 2q1s q2s )
!
≤ h(3)
X
ps q1s q2s ,
X
s
ps q 1s q 2s
s
= Ip∗ (X1 , X2 ; Y),
+ K(q1s + q2s − 2q1s q2s )
(B.129)
and
I(X1 , X2 ; Y, T1 |S) = H(T1 |S) + H(Y|T1 , S)
X n
ps h(q1s ) + q1s h(T2 |T1 -0, S-s) + H(X̃2 |T1 -0, T2 -1, S-s)
=
s
o
+ q 1s h(T2 |T1 -1, S-s) + H(X̃1 |T2 -0, T1 -1, S-s)
X n
ps h(q1s ) + q1s h(T2 |T1 -0, S-s) + K
≤
s
o
+ q 1s h(T2 |T1 -1, S-s) + K
X =K+
ps h(q1s ) + q1s h(T2 |T1 -1, S-s) + q 1s h(T2 |T1 -1, S-s)
s
= Ip∗ (X1 , X2 ; Y, T1 |S).
(B.130)
Combining (B.121), (B.122), (B.127), (B.128), (B.129), and (B.130), it is clear
that the rate region generated by every p(s, x1 , x2 , y, yF ) is contained in the rate
region generated by the corresponding p∗ (s, x1 , x2 , y, yF ). Computing the dependence balance constraint given in (4.36) for an arbitrary p(s, x1 , x2 , y, yF ) ∈ PDB
149
we have
0 ≤ I(X1 ; X2 |YF , T1 , S) − I(X1 ; X2 |S)
(B.131)
= H(X2 |YF , T1 , S) − H(X2 |X1 , YF , S) − H(X2 |S) + H(X2 |X1 , S)
(B.132)
= I(X2 ; YF |X1 , S) − I(X2 ; YF , T1 |S)
(B.133)
= H(YF |X1 , S) + H(YF |X2 , S) − H(YF , T1 |S).
(B.134)
Each term in (B.134) can be computed as
K
"
H(YF |X1 , S) =
X
s
ps q1s h(T2 |T1 -0, S-s) +
H(YF |X2 , S) =
s
x-1
#
pX1 |S (x|s)h(T2 |X1 -x, S-s) ,
(B.135)
#
K
"
X
2
X
ps q2s h(T1 |T2 -0, S-s) +
2
X
x-1
pX2 |S (x|s)h(T1 |X2 -x, S-s) ,
(B.136)
H(YF , T1 |S) =
X
s
h
i
ps h(q1s ) + q1s h(T2 |T1 -0, S-s) + q 1s h(T2 |T1 -1, S-s) . (B.137)
Using (B.135), (B.136), and (B.137) in (B.134) and simplifying we have
(
0≤
X
≤
X
s
s
ps
K
2
h
X
i
pX1 |S (x|s)h(T2 |X1 -x, S-s) + pX2 |S (x|s)h(T1 |X2 -x, S-s)
)
x-1
(B.138)
+ q2s h(T1 |T2 -0, S-s) − h(q1s ) − q 1s h(T2 |T1 -1, S-s)
h
ps q2s h(T1 |T2 -0, S-s) − h(q1s ) − q 1s h(T2 |T1 -1, S-s)
i
+ h(T1 |S-s) + h(T2 |S-s)
= Ip∗ (X1 ; X2 |YF , T1 , S) − Ip∗ (X1 ; X2 |S),
150
(B.139)
(B.140)
where in (B.139) we have used the concavity of entropy and Jensen’s inequality.
From (B.140) it follows that for every p(s, x1 , x2 , y, yF ) ∈ PDB , the corresponding
normalized distribution p∗ (s, x1 , x2 , y, yF ) also satisfies the dependence balance
constraint and therefore p∗ (s, x1 , x2 , y, yF ) ∈ PDB . This shows that when computing (4.33) it is sufficient to consider p(s, x1 , x2 , y, yF ) with X̃1 and X̃2 uniformly
distributed and conditionally independent of each other and of (T1 , T2 ) given S.
Denote the set of such distributions that also satisfy the dependence balance constraint as PDB∗ .
Computing the right side of the dependence balance constraint given by (4.36)
for an arbitrary p(s, x1 , x2 , y, yF ) ∈ PDB∗ we have
I(X1 ; X2 |YF , T1 , S) = H(X2 |YF , T1 , S) − H(X2 |X̃1 , YF , T1 , S) = 0,
(B.141)
where the first equality follows from the definition of X̃1 and the second equality
follows from the the fact that
X̃1 — (T1 , YF , S) — (T2 , X̃2 )
(B.142)
forms a Markov chain for any p(s, x1 , x2 , y, yF ) ∈ PDB∗ . Therefore,
I(X1 ; X2 |S) = 0
(B.143)
and it is sufficient to consider conditionally independent X1 and X2 when computing the rate region given in the dependence balance bound. Computing each term
in (4.33) for conditionally independent X1 and X2 leads to the outer bound given
in Theorem 4.5, completing the proof.
151
APPENDIX C
PROOFS FOR CHAPTER 5
C.1 Proof of Lemma 5.1
From the definition of PN (d), any PKN ,K̂N (kN , k̂N ) ∈ PN (d) satisfies the following properties:
• the corresponding marginal distribution for KN satisfies the Bernoulli arrival
process assumption,
• Dn := K̂n − Kn ≥ 0 ∀n with probability 1, and,
P
• 1/N N
n=1 dn ≤ d, where dn := E [Dn ].
Define Un and Vn as
Un := Kn − Kn−1 ,
(C.1)
Vn := K̂n − Kn−1 ,
(C.2)
for n = 2, . . . , N . Note that from the definition of dn , we have dn = E [Vn − Un ].
Therefore, for any joint distribution PKN ,K̂N (kN , k̂N ) ∈ PN (d), the corresponding
distribution on Un and Vn satisfies PUn ,Vn (un , vn ) ∈ P1 (d) for n = 2, . . . , N . Now
consider any joint distribution PKN ,K̂N (kN , k̂N ) ∈ PN (d), and the following chain
152
of (in)equalities:
(a)
I(KN ; K̂N ) = H(KN ) − H(KN |K̂N )
(b)
= H(K1 ) +
N
X
n=2
"
H(Kn |K1n−1 ) − H(K1 |K̂N ) +
(c)
≥ H(K1 ) − H(K1 |K̂1 ) +
(d)
= I(K1 ; K̂1 ) +
(e)
≥ I(K1 ; K̂1 ) +
(f )
= I(K1 ; K̂1 ) +
N h
X
n=2
n=2
N
X
n=2
#
H(Kn |K̂N Kn−1
)
1
i
H(Kn |Kn−1 ) − H(Kn |Kn−1 K̂n )
N h
X
n=2
N
X
N
X
H(Kn |Kn−1 ) − H(Un |Kn−1 K̂n Vn )
i
[H(Un ) − H(Un |Vn )]
I(Un ; Vn )
n=2
(g)
≥
N
X
R1 (dn )
n=1
(h)
≥ N R1
N
1 X
dn
N n=1
!
(i)
≥ N R1 (d),
where each are justified as follows:
(a) definition of mutual information,
(b) chain rule,
(c) conditioning cannot increase entropy; by the Bernoulli arrival process assumption, when conditioned on Kn−1 , Kn is independent of K1n−2 ,
(d) definition of mutual information; Un is a translation of Kn when conditioned
on Kn−1 ,
(e) the conditional entropy of Kn given Kn−1 is the same as the entropy of Un ;
conditioning cannot increase entropy,
153
(f) definition of mutual information,
(g) using definition of R1 (d) and PUn ,Vn (un , vn ) ∈ P1 (d) ∀n,
(h) convexity of R1 (d),
P
(i) using 1/N N
n=1 dn ≤ d and the fact that R1 (d) is non-increasing in d.
Because the choice of PKN ,K̂N (kN , k̂N ) ∈ PN (d) was arbitrary, we have
1
I(KN ; K̂N ) ≥ R1 (d),
PN (d) N
inf
(C.3)
and therefore RN (d) ≥ R1 (d), concluding the proof.
C.2 Proof of Theorem 5.1
The computation of R1 (d) is similar to the continuous case [52]; however there
are differences due to the discrete nature of the problem that are sufficient to
preclude a closed form analytic expression. Instead, we arrive at an analytic
expression as the supremum over a Lagrange multiplier for the delay constraint,
ν, and then solve for it numerically.
Following optimization techniques from [48], we find a sequence of Lagrange
multipliers ψk satisfying
X
k
ψk e−νd(k;k̂) ≤ 1,
∀k̂
(C.4)
and a probability distribution PK̂ [k̂] satisfying
PK [k] = ψk
X
PK̂ [k̂]e−νd(k;k̂) ,
k̂
∀k.
(C.5)
As a Corollary to [48, Theorem 9.4.1], if, for a given sequence ψk satisfying (C.4),
154
there exists a valid probability distribution PK̂ [k̂] satisfying (C.5) for that ψk , and
(C.4) is met with equality for all k̂ with PK̂ [k̂] > 0, then R1 (d) is given by
"
R1 (d) = sup
X
ν≥0
k
#
ψk
− νd ,
PK [k] log
PK [k]
(C.6)
where ν is the Lagrange multiplier for the delay constraint.
Using the distortion measure
d(k; k̂) :=
k̂ − k
∞
for k̂ ≥ k,
(C.7)
∀k̂ ≥ 0
(C.8)
else,
(C.4) and (C.5) become
e
−ν k̂
k̂
X
k=0
and
ψk eνk ≤ 1,
∞
e−νk PK [k] X
PK̂ [k̂]e−ν k̂ ,
=
ψk
k̂=k
∀k ≥ 0,
(C.9)
respectively.
For reasons similar to those in [52], we conjecture that there exists an integer
k0 such that PK̂ [k̂] = 0 for k̂ < k0 and PK̂ [k̂] > 0 for k̂ ≥ k0 . Since for all k̂ with
PK̂ [k̂] > 0 (C.8) must be met with equality, the difference between the left hand
side of (C.8) for any two values of k̂ ≥ k0 must be 0. Choosing two consecutive
values of k̂ ≥ k0 gives
ψk = 1 − e−ν ,
for k ≥ k0 + 1.
(C.10)
In a similar manner, using these values of ψk in (C.9), along with PK [k] = p(1−p)k ,
155
we find
PK̂ [k̂] =
p(1 − p)k̂ (eν + p − 1)
,
eν − 1
k̂ ≥ k0 + 1.
Using the fact that PK̂ [k̂] is a probability distribution and therefore
we have
log
k0 (p, ν) =
eν −1
eν +p−1
P
k̂
PK̂ [k̂] = 1
+
− 1 ,
log(1 − p)
(C.11)
(C.12)
from which it follows that
PK̂ [k0 ] = 1 −
eν + p − 1
(1 − p)(k0 +1) .
eν − 1
(C.13)
Using (C.11) and (C.13) in (C.9) we obtain
ψk =
p(1 − p)k eν(k0 −k)
,
1 − (1 − p)(k0 +1)
for k ≤ k0 .
(C.14)
Next, we show that this ψk satisfies (C.8) for all k̂, and that (C.8) is met with
equality for k̂ ≥ k0 . By construction, (C.8) is met with equality for k̂ ≥ k0 + 1.
For k̂ = k0 , equality is verified by using (C.14) in (C.8). A sufficient condition for
(C.8) to be satisfied for k̂ < k0 is that the left hand side of (C.8) is non-decreasing
for all k̂ < k0 . Taking the difference for consecutive values of k̂ leads to the
sufficient condition
#
1 − (1 − p)(k̂+2)
,
ν ≤ log
1 − (1 − p)(k̂+1)
"
∀k̂ < k0 .
(C.15)
Note that (C.15) is met with equality for k̂ equal to the argument of the ceiling
function in k0 (p, ν), i.e.,
log
k̂ =
eν −1
eν +p−1
log(1 − p)
156
− 1.
(C.16)
Since the right hand side of (C.15) is monotone decreasing in k̂ over k̂ ≥ 0, and
k0 − 1 is strictly less than (C.16), the inequality is satisfied for all k̂ < k0 , as
required. Therefore, the unique1 optimal sequence ψk is the one given by (C.10)
P
and (C.14). Computing k PK [k] log ψk /PK [k] in (C.6) with this sequence of ψk
leads to (5.6), thereby completing the proof.
C.3 Proof of Theorem 5.2
Let the size of the Message Queue when transmission of the (k − 1)th block
begins be Q̃M,k . E [Bk ] is largest when the Message Queue is completely empty
(Q̃M,k = 0) as transmission of the previous message block begins. Therefore,
h
i
E [Bk ] ≤ E Bk |Q̃M,k = 0 , leading to
k
k
i
1X
1X h
lim
E [Bi ] ≤ lim
E Bi+1 |Q̃M,i = 0
k→∞ k
k→∞ k
i=1
i=1
h
i
= E B|Q̃M = 0 ,
(C.17)
(C.18)
where we drop the subscripts in (C.18) because conditioned on Q̃M,k = 0, Bk is
independent of k. This bound is useful because (C.18) is just a function of the
time it takes to accumulate N messages relative to the time it takes to transmit
the N + 1 channel symbols from the previous block.
1
Uniqueness follows from the strict convexity of
details.
157
P
k
PK [k] log
ψk
PK [k] .
See [48, p. 460] for
Let TN represent the time it takes for N messages to arrive. Using (C.18) gives
i
λ h
E B|Q̃M = 0
(C.19)
N λ
N +1
N +1
=
E B Q̃M = 0, TN ≤
P TN ≤
N
R
R
) (C.20)
∞
X
i
−
1
i
i
−
1
i
+
E B Q̃M = 0,
< TN ≤
P
< TN ≤
.
R
R
R
R
i=N +2
R∗ ≤
E [B] is just the codeword length (N + 1) plus the number of time instances that
QT = 0, which is a deterministic function of TN . Using the conditioning on TN ,
(C.20) can be computed as
λ
R∗ ≤
N
(
)
∞
X
N +1
i−1
i
(N + 1)P TN ≤
+
iP
< TN ≤
.
R
R
R
i=N +2
(C.21)
Using the substitution k = i − N − 1 and separating the sum into two terms leads
to
∞
N +1
N +1
k+N +1
N +1X
k+N
R /λ ≤
P TN ≤
< TN ≤
+
P
N
R
N k=1
R
R
∞
k+N
k+N +1
1 X
kP
< TN ≤
.
+
N k=1
R
R
∗
(C.22)
Let Ai = {(i − 1)/R < TN ≤ i/R}. The event {TN > (N + 1)/R} is partitioned2
by the set {Ai }, so
∞
X
k+N
k+N +1
N +1
< TN ≤
= P TN >
.
P
R
R
R
k=1
2
(C.23)
A non-empty set A with elements
Ai is said to partition B if B = ∪i Ai and ∩i6=j Ai = ∅.
P
If {Ai } partitions B, then [104] i P [Ai ] = P [∪i Ai ] = P [B].
158
Using (C.23), the first two terms of (C.22) can be combined. Also, the probability
in the last term in (C.22) can be computed directly using the probability mass
function for a Poisson random variable. After exchanging the order of summations
(C.22) becomes
N −1
∞
N +1
1 X (λ/R)i X −λ(k+N )/R R /λ ≤
+
ke
(k + N )i − e−λ/R (k + N + 1)i
N
N i=0
i! k=1
∗
(C.24)
=: R∗ /λ.
(C.25)
To show that the infinite sum in (C.24) converges for any N < ∞ it is sufficient
to show that
∞
X
ke−k(λ/R) (k + N + 1)N
(C.26)
k=1
converges. Since each term in (C.26) is positive, the nth root test for convergence [105] can be applied. Accordingly,
lim
k→∞
q
k
ke−k(λ/R) (k + N + 1)N = e−λ/R lim
k→∞
√
k
k(k + N + 1)N/k
(C.27)
= e−λ/R
(C.28)
< 1,
(C.29)
so (C.26) converges for all N < ∞, and the infinite sum in (C.24) converges.
159
Finally,
lim
N →∞
R∗
N −1
λ(N + 1)
λ X (λ/R)i
+
×
= lim
N →∞
N
N i=0
i!
)
(∞
X
i
−λ/R
i
−λ(k+N )/R
,
(k + N ) − e
(k + N + 1)
ke
(C.30)
k=1
which equals λ for R > λ. Any rate R satisfying R > R∗ also satisfies R > R∗ ,
and is achievable. Therefore, the closure of the set of achievable rates for all FV
codes is cl (RFV ) = {R : R ≥ λ}.
160
APPENDIX D
PROOFS FOR CHAPTER 6
D.1 Proof of Lemma 6.1
Proof: Let > 0 and fix a distribution pA . Let R > 0 be a rate to be specified
later. Generate a (2nR , n) code Cn by generating the codeword symbols i.i.d.
according to pA . For a given code Cn , let An be the random variable representing
the choice of a codeword an (i) with i uniformly distributed on {1, . . . , 2nR }, and
let En be the corresponding channel output. In other words,
pAn |Cn (an ) =
pEn |Cn (en ) =
1
d2nR e
,
1
X
d2nR e
an ∈C
n
pEn |An (en |an ),
∀an ∈ Cn ,
(D.1)
∀en ∈ En .
(D.2)
For a given code Cn , the mutual information between An and En is given by
I(An ; En |Cn ) = H(En |Cn ) − H(En |An Cn ).
(D.3)
Note that we introduce an explicit conditioning on Cn to distinguish the distribution induced by Cn from the distributions pAn and pEn defined as
n
p (x ) =
An
n
Y
pA (xi ),
i=1
n
pEn (e ) =
n
X Y
xn ∈Cn
i=1
pE|A (ei |xi )pA (xi ),
161
∀xn ∈ An ,
(D.4)
∀en ∈ En .
(D.5)
Next we show that if R > I(A; E), then
lim PCn[H(En |Cn ) ≥ n(H(E) − )] = 1.
(D.6)
n→∞
By [106, Lemma 19], it holds that
n
ECn V pEn |Cn ; pEn ≤ c 2− 2 (R−I(A;E)−δ()) ,
(D.7)
where c > 0 is a constant and lim→0 δ() = 0. Therefore, by Markov’s inequality,
h √ i
√
PCn V pEn |Cn ; pEn > 2− n ≤ 2 n ECn V pEn |Cn ; pEn
√
≤ c2
n− n
(R−I(A;E)−δ())
2
.
(D.8)
(D.9)
In particular, if R > I(A; E) + δ(), there exists γ > 0 such that
h √
√ i
PCn V pEn |Cn ; pEn > 2− n ≤ c 2 n−γ n .
(D.10)
By [107, Lemma 2.7], for any Cn ,
n
|H(E |Cn ) − nH(E)| ≤ V pEn |Cn ; pEn log
162
|E|n
.
V pEn |Cn ; pEn
(D.11)
n
n
Since the function x → x log |E|x is increasing for x ∈ (0, |E|2 ), we obtain
h
√ i
√
PCn |H(En |Cn ) − nH(E)| > (n log |E| + n)e− n
#
"
√ −√n
|E|n
> (n log |E| + n)2
≤ PCn V pEn |Cn ; pEn log V pEn |Cn ; pEn
h √ i
= PCn V pEn |Cn ; pEn > 2− n
√
≤ c2
n−γ n
(D.12)
(D.13)
(D.14)
and
lim PCn[H(En |Cn ) ≤ n(H(E) − )] = 0,
n→∞
(D.15)
which is equivalent to (D.6).
Returning to (D.3) and using the fact that conditioning does not increase
entropy leads to
I(An ; En |Cn ) = H(En |Cn ) − H(En |An Cn )
(D.16)
≥ H(En |Cn ) − H(En |An )
(D.17)
= H(En |Cn ) − nH(E|A).
(D.18)
Therefore,
PCn[I(An ; En |Cn ) ≥ n(I(A; E) − )] ≥ PCn[H(En |Cn ) − nH(E|A) ≥ n(I(A; E) − )]
(D.19)
= PCn[H(En |Cn ) ≥ n(H(E) − )] .
163
(D.20)
Combining (D.6) and (D.20) we have
lim PCn[I(An ; En |Cn ) ≥ n(I(A; E) − )] = 1,
n→∞
(D.21)
concluding the proof.
D.2 Proof of Proposition 6.2
Proof:
The proof follows [85, Appendix II] with some modifications; further
details can be found therein. Only the details of additional steps needed to show
that the secrecy condition is still met are included here for brevity.
From the definition of secrecy capacity at output rate λ we have Cs ≥ Cs (λ)
for all λ < µ1 . Thus, we only need to prove the converse statement, which we show
by contradiction. We assume the secrecy capacity exceeds the secrecy capacity
at output rate λ by at least some positive constant α for all λ < µ1 . From this,
we construct a sequence of (n, Mn , n/λ∗ , n , δn ) wiretap timing codes with rate
exceeding Cs − α, therefore contradicting the assumption that Cs > Cs (λ∗ ) + α.
Begin by selecting a sequence of (n, Mn , Tn , n , δn ) wiretap timing codes such
that δn → 0 and the conditions in [85] are satisfied. Define l = lim supn→∞ n/T ,
where, because average interdeparture times cannot be less than average service
times, we know that l ≤ µ1 . There are two cases to consider: l > 0 and l = 0.
For l > 0, a new sequence of codes is constructed by taking a particular subsequence of codes from those originally selected such that, in the subsequence,
n/Tn is sufficiently close to l. For this subsequence observe that there exist
(n, Mn , n/λ∗1 , n , δn ) codes with δn → 0 and whose rate satisfies rn > Cs − α,
resulting in a contradiction.
For l = 0, the original (n, Mn , Tn , n , δn ) sequence of codes with n/Tn → 0 is
164
used to construct a new sequence of (m, Mm , m/λ∗2 , Tn + m , δm ) codes, where for
any γ > 0 m = Tn /(1/λ∗2 − µ−1
1 − γ) and for sufficiently large n, m → 0. The new
sequence of codes is constructed by appending (m − n) (deterministic) arrivals at
time t = 0 to the front of the n interarrival times from the original code shifted
in time by a constant of (m − n)(µ−1
1 + γ).
Denote a codeword from the new code as Âm such that Âm−n is a vector of
zeros and Âm
m−n+1 is a vector of the n time shifted interarrival times from the
original codeword. Because the first (m − n) elements of Âm are deterministic, we
have
m
I(Âm ; Em ) = I(Âm
m−n+1 ; E ).
(D.22)
Although the codeword is declared in error if the first (m − n) symbols do not exit
the main queue before time t = (m−n)(µ−1
1 +γ), a similar condition at the wiretap
m
queue is not sufficient to guarantee secrecy. However, note that I(Âm
m−n+1 ; E )
can be equivalently obtained by transmitting the original codeword An through
a degraded version of the wiretap queue; whereby, before entering the wiretap
queue, packets are sent through a prefix channel that introduces a random delay
corresponding to the departure time of the (m − n)th packet. Therefore,
m
n
m
I(Âm
m−n+1 ; E ) ≤ I(A ; E ),
(D.23)
and δm → 0 as desired.
The choice of λ∗2 can be made such that (by construction) the new sequence of
codes satisfies rm > Cs − α, resulting in a contradiction and concluding the proof.
165
BIBLIOGRAPHY
[1] D. P. Bertsekas and R. G. Gallager, Data Networks, 2nd ed. Prentice Hall,
1991.
[2] J. Day and H. Zimmermann, “The OSI Reference Model,” Proc. IEEE,
vol. 71, no. 12, pp. 1334–1340, Dec. 1983.
[3] H. Zimmermann, “OSI Reference Model – The ISO Model of Architecture
for Open Systems Interconnection,” IEEE Trans. on Comm., vol. 28, no. 4,
pp. 425–432, Apr. 1980.
[4] J. Postel, “Internet Protocol,” RFC 791 (Standard), Internet Engineering
Task Force, Sep. 1981, updated by RFC 1349. [Online]. Available:
http://www.ietf.org/rfc/rfc791.txt
[5] ——, “Transmission Control Protocol,” RFC 793 (Standard), Internet
Engineering Task Force, Sep. 1981, updated by RFCs 1122, 3168. [Online].
Available: http://www.ietf.org/rfc/rfc793.txt
[6] Wireless LAN Business Unit, “Low Power Advantage of 802.11a/g vs.
802.11b,” Texas Instruments, Tech. Rep., Dec. 2003. [Online]. Available:
http://focus.ti.com/pdfs/bcg/80211 wp lowpower.pdf
[7] IEEE Std. 802.3-2008, “Carrier Sense Multiple Access with Collision Detection (CMSA/CD) Access Method and Physical Layer Specifications,” 2008.
[8] IEEE Std. 802.11-2007, “Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications,” 2007.
[9] F. Tobagi and L. Kleinrock, “Packet Switching in Radio Channels: Part
II–The Hidden Terminal Problem in Carrier Sense Multiple-Access and the
Busy-Tone Solution,” IEEE Trans. on Comm., vol. 23, no. 12, pp. 1417–
1433, Dec. 1975.
166
[10] L. Kleinrock and F. Tobagi, “Packet Switching in Radio Channels: Part
I–Carrier Sense Multiple-Access Modes and Their Throughput-Delay Characteristics,” IEEE Trans. on Comm., vol. 23, no. 12, pp. 1400–1416, Dec.
1975.
[11] G. Bianchi, “Performance Analysis of the IEEE 802.11 Distributed Coordination Function,” IEEE J. Select. Areas Commun., vol. 18, no. 3, pp.
535–547, Mar. 2000.
[12] S. Lin and D. J. Costello, Error Control Coding, 2nd ed.
Apr. 2004.
Prentice Hall,
[13] I. E. Telatar and R. G. Gallager, “Combining Queueing Theory with Information Theory for Multiaccess,” IEEE J. Sel. Areas Commun., vol. 13,
no. 6, pp. 963–969, Aug. 1995.
[14] F. Kelly, Reversibility and Stochastic Networks. New York: Wiley,
1979. [Online]. Available: http://www.statslab.cam.ac.uk/∼frank/BOOKS/
kelly book.html
[15] A. Ephremides and B. Hajek, “Information Theory and Communication
Networks: An Unconsummated Union,” IEEE Trans. Inform. Theory,
vol. 44, no. 6, pp. 2416–2434, Oct. 1998.
[16] Telecommunications Industry Association Std. TIA/EIA/IS-99, “Data Services Option Standard for Wideband Spread Spectrum Digital Cellular System,” Jul. 1995.
[17] J. Sennott and L. Sennott, “A Queueing Model for Analysis of a Bursty
Multiple-Access Communication Channel,” IEEE Trans. Inform. Theory,
vol. 27, no. 3, pp. 317–321, May 1981.
[18] N. Abramson, “The ALOHA System: Another Alternative for Computer
Communications,” in Proc. AFIPS Joint Comp. Conf., Houston, Texas,
Nov. 1970, pp. 281–285.
[19] B. Tsybakov and V. Mikhailov, “Ergodicity of Slotted ALOHA System,”
Probl. Inform. Transm., vol. 15, no. 4, pp. 301–312, 1979.
167
[20] R. Rao and A. Ephremides, “On the Stability of Interacting Queues in a
Multiple-Access System,” IEEE Trans. Inform. Theory, vol. 34, no. 5, pp.
918–930, Sep. 1988.
[21] V. Anantharam, “The Stability Region of the Finite-User Slotted ALOHA
Protocol,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pp. 535–540, May
1991.
[22] W. Luo and A. Ephremides, “Stability of N Interacting Queues in RandomAccess Systems,” IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1579–1587,
Jul. 1999.
[23] J. L. Massey and P. Mathys, “The Collision Channel Without Feedback,”
IEEE Trans. Inform. Theory, vol. 31, no. 2, Mar. 1985.
[24] B. S. Tsybakov and N. B. Likhanov, “Packet Switching in a Channel Without Feedback,” Probl. Inform. Transm., vol. 19, no. 2, pp. 69–84, Apr.–Jun.
1983.
[25] J. Huber and A. Shah, “Simple Asynchronous Multiplex System for Unidirectional Low-Data-Rate Transmission,” IEEE Trans. on Comm., vol. 23,
no. 6, pp. 675–679, Jun. 1975.
[26] L. G. Roberts, “Dynamic Allocation of Satellite Capacity through Packet
Reservation,” in Proc. AFIPS Nat. Computer Conf. New York, NY, USA:
ACM, 1973, pp. 711–716.
[27] N. Abramson, “Packet Switching with Satellites,” in Proc. AFIPS Nat.
Computer Conf. New York: ACM, 1973, pp. 695–702.
[28] R. G. Gallager, “A Perspective on Multiaccess Channels,” IEEE Trans.
Inform. Theory, vol. 31, no. 2, pp. 124–142, Mar. 1985.
[29] J. Luo and A. Ephremides, “On the Throughput, Capacity, and Stability
Regions of Random Multiple Access,” IEEE Trans. Inform. Theory, vol. 52,
no. 6, pp. 2593–2607, Jun. 2006.
168
[30] L. Zheng and D. N. C. Tse, “Communication on the Grassmann Manifold: A
Geometric Approach to the Noncoherent Multiple-Antenna Channel,” IEEE
Trans. Inform. Theory, vol. 48, no. 2, pp. 359–383, Feb. 2002.
[31] I. C. Abou-Faycal, M. D. Trott, and S. Shamai (Shitz), “The Capacity
of Discrete-Time Memoryless Rayleigh-Fading Channels,” IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1290–1301, May 2001.
[32] S. Kotagiri, “State-Dependent Networks with Side Information and Partial State Recovery,” Ph.D. Dissertation, University of Notre Dame, Notre
Dame, IN, Dec. 2007.
[33] S. Vedantam, W. Zhang, and U. Mitra, “Joint Channel Estimation and Data
Transmission: Achievable Rates,” in Information Theory Workshop, 2007.
ITW ’07. IEEE, Sep. 2007, pp. 499–504.
[34] W. Zhang, S. Vedantam, and U. Mitra, “A Constrained Channel Coding
Approach to Joint Communication and Channel Estimation,” in Proc. IEEE
Int. Symp. Information Theory (ISIT), Jul. 2008, pp. 930–934.
[35] S. Vedantam, W. Zhang, and U. Mitra, “Joint Communication and Channel
Estimation: The Two Hop Case,” in Proc. Allerton Conf. Communications,
Control, and Computing, Monticello, IL, Sep. 2008.
[36] W. Zhang, “The Role of Channel Correlation in Fading Communication
Channels,” Ph.D. Dissertation, University of Notre Dame, Notre Dame, IN,
Jul. 2006.
[37] V. Aggarwal and A. Sabharwal, “Slotted Gaussian Multiple Access Channel:
Stable Throughput Region and Role of Side Information,” in EURASIP
Journal on Wireless Communications and Networking, Mar. 2008.
[38] W. Jiang, Y. Zhu, and Z. Zhang, “Routing Overhead Minimization in Large-Scale Wireless Mesh Networks,” in Proc. Vehicular Technology Conf. (VTC),
2007, pp. 1270–1274.
[39] M. Naserian, K. E. Tepe, and M. Tarique, “Routing Overhead Analysis for
Reactive Routing Protocols in Wireless Ad-Hoc Networks,” in Proc. IEEE
Int. Conf. Wireless and Mobile Comp. (WiMob), vol. 3, 2005, pp. 87–92.
[40] Z. Tao and G. Wu, “An Analytical Study on Routing Overhead of Two-Level
Cluster-Based Routing Protocols for Mobile Ad-Hoc Networks,” in Proc.
IEEE Int. Performance, Computing, and Communications Conf. (IPCCC),
2006, p. 8.
[41] Z. Ye and Y. Hua, “Networking by Parallel Relays: Diversity, Lifetime and
Routing Overhead,” in Proc. Asilomar Conf. Signals, Systems, and Computers, vol. 2, 2004, pp. 1302–1306.
[42] R. C. Timo and L. W. Hanlen, “MANETs: Routing Overhead and Reliability,” in Proc. Vehicular Technology Conf. (VTC), vol. 3, 2006, pp. 1107–
1111.
[43] N. Zhou and A. A. Abouzeid, “Information-Theoretic Lower Bounds on the
Routing Overhead in Mobile Ad-Hoc Networks,” in Proc. IEEE Int. Symp.
Information Theory (ISIT), Jun. 2003, p. 455.
[44] ——, “Routing in Ad-Hoc Networks: A Theoretical Framework with Practical Implications,” in Proc. IEEE INFOCOM, vol. 2, 2005, pp. 1240–1251.
[45] N. Zhou, H. Wu, and A. A. Abouzeid, “The Impact of Traffic Patterns on
the Overhead of Reactive Routing Protocols,” IEEE J. Sel. Areas Commun.,
vol. 23, no. 3, pp. 547–560, 2005.
[46] N. Bisnik and A. A. Abouzeid, “On The Capacity Deficit of Mobile
Wireless Ad Hoc Networks: A Rate Distortion Formulation,” IEEE Trans.
Inform. Theory, Mar. 2007, submitted for publication. [Online]. Available:
http://arxiv.org/abs/cs/0703050
[47] C. E. Shannon, “A Mathematical Theory of Communication,” Bell Syst.
Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[48] R. G. Gallager, Information Theory and Reliable Communication. New York: John Wiley & Sons, Inc., 1968.
[49] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.
[50] A. El Gamal and T. Cover, “Multiple User Information Theory,” Proceedings
of the IEEE, vol. 68, no. 12, pp. 1466–1483, Dec. 1980.
[51] S. Verdú, Multiuser Detection. Cambridge, UK: Cambridge University Press, 1998.
[52] R. G. Gallager, “Basic Limits on Protocol Information in Data Communication Networks,” IEEE Trans. Inform. Theory, vol. 22, no. 4, pp. 385–398,
Jul. 1976.
[53] T. Coleman, N. Kiyavash, and V. Subramanian, “The Rate-Distortion Function of a Poisson Process with a Queueing Distortion Measure,” in Data
Compression Conference, Mar. 2008, pp. 63–72.
[54] T. S. Han, Information Spectrum Methods in Information Theory. Berlin:
Springer, 2003.
[55] N. Gaarder and J. Wolf, “The Capacity Region of a Multiple-Access Discrete
Memoryless Channel Can Increase with Feedback (Corresp.),” IEEE Trans.
Inform. Theory, vol. 21, no. 1, pp. 100–102, Jan. 1975.
[56] C. E. Shannon, “The Zero Error Capacity of a Noisy Channel,” IEEE Trans.
Inform. Theory, vol. 2, no. 3, pp. 8–19, Sep. 1956.
[57] M. Gastpar and G. Kramer, “On Cooperation Via Noisy Feedback,” in Proc.
Int. Zurich Seminar on Communications (IZS), 2006, pp. 146–149.
[58] T. M. Cover and C. S. K. Leung, “An Achievable Rate Region for the
Multiple-Access Channel with Feedback,” IEEE Trans. Inform. Theory,
vol. 27, no. 3, pp. 292–298, May 1981.
[59] F. Willems, “The Feedback Capacity Region of a Class of Discrete Memoryless Multiple Access Channels (Corresp.),” IEEE Trans. Inform. Theory,
vol. 28, no. 1, pp. 93–95, Jan. 1982.
[60] A. B. Carleial, “Multiple-Access Channels with Different Generalized Feedback Signals,” IEEE Trans. Inform. Theory, vol. 28, no. 6, pp. 841–850,
Nov. 1982.
[61] A. P. Hekstra and F. M. J. Willems, “Dependence Balance Bounds for
Single-Output Two-Way Channels,” IEEE Trans. Inform. Theory, vol. 35,
no. 1, pp. 44–53, Jan. 1989.
[62] G. Kramer, “Capacity Results for the Discrete Memoryless Network,” IEEE
Trans. Inform. Theory, vol. 49, no. 1, pp. 4–21, Jan. 2003.
[63] L. H. Ozarow, “The Capacity of the White Gaussian Multiple Access Channel with Feedback,” IEEE Trans. Inform. Theory, vol. 30, no. 4, pp. 623–629,
Jul. 1984.
[64] R. Tandon and S. Ulukus, “Outer Bounds for Multiple-Access Channels
With Feedback Using Dependence Balance,” IEEE Trans. Inform. Theory,
vol. 55, no. 10, pp. 4494–4507, Oct. 2009.
[65] F. Willems, “On Multiple Access Channels with Feedback (Corresp.),” IEEE
Trans. Inform. Theory, vol. 30, no. 6, pp. 842–845, Nov. 1984.
[66] G. Kramer, “Feedback Strategies for a Class of Two-User Multiple-Access
Channels,” IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2054–2059, Sep.
1999.
[67] A. P. Hekstra, “Dependence Balance Outer Bounds for the Equal Output
Two-Way Channel and the Multiple Access Channel with Feedback,” Master’s thesis, Eindhoven University of Technology, Information and Communication Theory Group, May 1985.
[68] R. Tandon and S. Ulukus, “Capacity Bounds for the Gaussian Interference
Channel with Transmitter Cooperation,” in Proc. IEEE Information Theory
Workshop (ITW), Jun. 2009, pp. 301–305.
[69] ——, “Dependence Balance Based Outer Bounds for Gaussian Networks
with Cooperation and Feedback,” IEEE Trans. Inform. Theory, 2008,
submitted for publication. [Online]. Available: http://arxiv.org/abs/0812.1857
[70] H. H.-J. Liao, “Multiple Access Channels,” Ph.D. dissertation, University
of Hawaii, Honolulu, HI, Sep. 1972.
[71] E. C. van der Meulen, “The Discrete Memoryless Channel with Two Senders
and One Receiver,” in Proc. IEEE Int. Symp. Information Theory (ISIT),
Tsahkadsor, Armenian S.S.R., 1971, pp. 103–135.
[72] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point
Processes. New York, NY: Springer, 2008.
[73] J. A. McFadden, “The Entropy of a Point Process,” SIAM J. Appl. Math.,
vol. 13, no. 4, pp. 988–994, Dec. 1965.
[74] I. Rubin, “Information Rates for Poisson Sequences,” IEEE Trans. Inform.
Theory, vol. 19, no. 3, pp. 283–294, May 1973.
[75] ——, “Information Rates and Data-Compression Schemes for Poisson Processes,” IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 200–210, Mar.
1974.
[76] R. Gray, D. Neuhoff, and J. Omura, “Process Definitions of Distortion-Rate
Functions and Source Coding Theorems,” IEEE Trans. Inform. Theory,
vol. 21, no. 5, pp. 524–532, Sep. 1975.
[77] R. M. Gray, Entropy and Information Theory. New York: Springer-Verlag,
1990.
[78] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression, T. Kailath, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[79] D. L. Neuhoff, “Source Coding and Distance Measures on Random Processes,” Ph.D. dissertation, Stanford University, Calif., 1974.
[80] R. M. Gray, D. L. Neuhoff, and D. S. Ornstein, “Nonblock Source Coding
with a Fidelity Criterion,” Ann. Probab., vol. 3, no. 3, pp. 478–491, 1975.
[81] R. Gray and D. Ornstein, “Sliding-Block Joint Source/Noisy-Channel Coding Theorems,” IEEE Trans. Inform. Theory, vol. 22, no. 6, pp. 682–690,
Nov. 1976.
[82] A. D. Wyner, “The Wire-Tap Channel,” Bell Syst. Tech. J., vol. 54, no. 8,
pp. 1355–1387, Oct. 1975.
[83] J. Giles and B. Hajek, “An Information-Theoretic and Game-Theoretic
Study of Timing Channels,” IEEE Trans. Inform. Theory, vol. 48, no. 9,
pp. 2455–2477, Sep. 2002.
[84] V. D. Gligor, A Guide to Understanding Covert Channel Analysis of Trusted
Systems. Fort George G. Meade, MD: The Center; U.S. G.P.O., 1994, no. NCSC-TG-030.
[85] V. Anantharam and S. Verdú, “Bits Through Queues,” IEEE Trans. Inform.
Theory, vol. 42, no. 1, pp. 4–18, Jan. 1996.
[86] C. E. Shannon, “Certain Results in Coding Theory for Noisy Channels,”
Inform. Contr., vol. 1, pp. 6–25, Sep. 1957.
[87] R. M. Fano, Transmission of Information: A Statistical Theory of Communication. New York: M.I.T. Press, 1961.
[88] M. S. Pinsker, Information and Information Stability of Random Variables
and Processes. San Francisco: Holden-Day, 1964.
[89] T. S. Han and S. Verdú, “Approximation Theory of Output Statistics,”
IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 752–772, May 1993.
[90] A. Bedekar and M. Azizoğlu, “The Information-Theoretic Capacity of
Discrete-Time Queues,” IEEE Trans. Inform. Theory, vol. 44, no. 2, pp.
446–461, Mar. 1998.
[91] J. Thomas, “On the Shannon Capacity of Discrete Time Queues,” in Proc.
IEEE Int. Symp. Information Theory (ISIT), Jun. 1997, p. 333.
[92] M. Bloch and J. N. Laneman, “On the Secrecy Capacity of Arbitrary Wiretap Channels,” in Proc. Allerton Conf. Communications, Control, and Computing, Monticello, IL, Oct. 2008.
[93] M. Hayashi, “General Nonasymptotic and Asymptotic Formulas in Channel
Resolvability and Identification Capacity and their Application to the Wiretap Channel,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1562–1575,
Apr. 2006.
[94] U. Maurer and S. Wolf, “Information-Theoretic Key Agreement: From Weak
to Strong Secrecy for Free,” in Advances in Cryptology - Eurocrypt 2000.
Springer-Verlag, 2000, pp. 351–368, Lecture Notes in Computer Science.
[95] I. Csiszár, “Almost Independence and Secrecy Capacity,” Probl. Inform.
Transm., vol. 32, no. 1, pp. 40–47, Jan.–Mar. 1996.
[96] B. P. Dunn and J. N. Laneman, “Basic Limits on Protocol Information in
Slotted Communication Networks,” in Proc. IEEE Int. Symp. Information
Theory (ISIT), Toronto, Canada, Jul. 2008, pp. 2302–2306. [Online].
Available: http://www.nd.edu/~jnl/pubs/isit2008.pdf
[97] ——, “Rate-Delay Tradeoffs for Communicating a Bursty Source over
an Erasure Channel with Feedback,” in Proc. Int. Zurich Seminar on
Communications (IZS), Zurich, Switzerland, Mar. 2008. [Online]. Available:
http://www.nd.edu/~jnl/pubs/izs2008.pdf
[98] B. P. Dunn, M. Bloch, and J. N. Laneman, “Secure Bits Through Queues,”
in Proc. IEEE Information Theory Workshop (ITW), Volos, Greece, Jun.
2009. [Online]. Available: http://www.nd.edu/~jnl/pubs/itw2009a.pdf
[99] D. Tuninetti and G. Caire, “The Throughput of Some Wireless Multiaccess
Systems,” IEEE Trans. Inform. Theory, vol. 48, no. 10, pp. 2773–2785, Oct.
2002.
[100] P. Wu and N. Jindal, “Coding Versus ARQ in Fading Channels: How
reliable should the PHY be?” IEEE Trans. on Comm., Mar. 2010, submitted
for publication. [Online]. Available: http://arxiv.org/abs/0904.0226v2
[101] B. Hochwald and K. Zeger, “Tradeoff Between Source and Channel Coding,”
IEEE Trans. Inform. Theory, vol. 43, no. 5, pp. 1412–1424, Sep. 1997.
[102] M. Bloch and J. Barros, Physical-Layer Security: From Information Theory
to Security Engineering, 2010, in preparation.
[103] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge Univ. Press, 2005.
[104] H. Stark and J. W. Woods, Probability, Random Processes, and Estimation
Theory for Engineers. Englewood Cliffs, NJ: Prentice Hall, 1994.
[105] G. B. Thomas and R. L. Finney, Calculus and Analytic Geometry. Reading,
MA: Addison-Wesley, 1998.
[106] P. W. Cuff, “Communication in Networks for Coordinating Behavior,” Ph.D.
dissertation, Princeton University, Jul. 2009.
[107] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete
Memoryless Systems. New York: Academic Press, 1981.