Ad-Hoc Networks at Global Scale - University of California San Diego

INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING, AND COMMUNICATIONS (ICNC) 2013
Ad-Hoc Networks at Global Scale
Rene L. Cruz, Fellow, IEEE
Abstract—We argue that scalability in ad-hoc networks can
be achieved by re-defining the functionality for the information
transport system itself, where the functionality is driven by a
new type of communication paradigm inherent in information
dissemination applications. In particular, among the entire population of generated messages, each user wants only the messages that are personally “most interesting” to her to be delivered; we call this “star-to-one” communication.
In this paper we consider a “Zipf product form” model for
message preferences, and propose some decentralized algorithms
for message forwarding based on this model. We discuss some
simulation results for these algorithms, which suggest that it is
possible for the users to efficiently obtain the messages that are of
most interest to them. Essentially, the amount of “work” required
of each user, on the average, is proportional to the desired number
of messages to be received by each user, and is independent of
the number of users and the number of messages in the network.
Index Terms—Ad-hoc networks, wireless networking, peer-to-peer networks, data-centric networks, social networking, Zipf distribution
I. SCALING PROBLEMS IN WIRELESS AD-HOC NETWORKS
The capability for ad-hoc wireless device-to-device communication between ubiquitous mobile devices such as
smart-phones and laptop computers opens up the possibility of
exciting new types of networking applications. From a practical technological point of view, the capabilities of short range
communication devices over unlicensed links are improving
exponentially, in terms of performance measures such as cost,
power consumption, throughput, and spectral efficiency. The
throughput available on a short range wireless communication
channel operating on an unlicensed band is often far greater
than what is available over a long distance wired communication channel, and moreover it is “free” to the user. It
is thus a compelling engineering problem to design a self-contained wireless network whose communication resources
consist exclusively of short range unlicensed communication
links. This has led to many proposals for wireless ad-hoc
network designs (e.g. see [4] and the references therein).
From an application point of view, it is often desirable to
have geographically global connectivity, as well as a very large
population of nodes/users. Unfortunately, a widely observed problem shared by today's ad-hoc wireless network designs is that they typically do not scale well with the network size, due to excessive protocol overhead. As a result, an ad-hoc wireless network, or indeed any type of wireless network associated with mobile phones today, has a geographic span limited to a small area or region; in order to support global connectivity, the wireless network is connected hierarchically to an internetwork, e.g. the Internet, with mostly wired communication resources. However, this approach is not self-contained, because it relies heavily on a reliable, ubiquitous global communication infrastructure.
R. L. Cruz is with the Dept. of Electrical and Computer Engineering, Univ. of California, San Diego, La Jolla CA 92093-0407
In this context, the Internet could certainly be argued to
be a ubiquitous and reliable communication infrastructure.
Indeed, one of the key design principles behind the Internet
is distributed control, so that it provides reliable communication even with link and node failures. However, practically
speaking, at a high level, the control of Internet infrastructure
is becoming increasingly centralized, and hence vulnerable to
certain types of attacks. More and more, the dissemination of
information occurs primarily through search engines, social
networking systems, and public messaging systems, which are
centralized systems controlled by only a few entities, each
with dominant market positions. The range of possible threats
includes not only denial-of-service attacks from malicious organizations, but also organized disruptions intended to improperly
influence public opinion, as well as censorship. In addition,
users are increasingly becoming wary of sending, inadvertently
sharing, and intentionally storing sensitive personal information on centralized infrastructure which can be monitored and
mined, and sense a gradual erosion of privacy as an inevitable
price to be paid for utilizing essential information services.
II. ENABLING OPERATION AT GLOBAL SCALE
In view of these considerations, we are interested in the
problem of the design of a self-contained wireless ad-hoc
network, which can scale globally to a very large number
of users. Given the lack of scalability inherent in current designs for wireless ad-hoc networks, we propose to relax and/or change the requirements for such a network, in such a way that meaningful communication can still take place at a global scale. For example, at any given instant, the communication graph is usually disconnected at a global scale because nodes are not close enough to communicate directly with each other. However, by exploiting the mobility of the nodes, as well as continuing advances in digital storage technology, it can be argued that the communication graph is indeed richly connected in a broader sense, if we allow messages to be temporarily stored and forwarded (à la “sneaker-net”), and allow message delays of up to several hours, or even days or weeks. This type of relaxation
has been considered before with “Delay Tolerant Network”
architecture proposals (e.g. see [3]). We propose a number
of other important practical relaxations and features in the
following which we believe can enable operation of ad-hoc
networks at a global scale.
A. Service Model: Star-1 instead of 1-1 or 1-Star
Most communication networks today are designed with an
infrastructure-centric point of view, and have an underlying
service model where communication is supported between
any pair of network nodes, by forwarding packets through
an appropriately chosen route. We call this the one-to-one
(1-1) service model. In this model, the nodes are labelled
with “addresses” and delivery of arbitrarily defined data is
supported between any two nodes. The infrastructure-centric
approach is in some sense data-agnostic, and the infrastructure
is carefully managed to support arbitrary types of data.
In contrast, instead of the 1-1 service model, we propose a different service model in which each user wishes to receive the set of messages she is “most interested in”, according to an underlying preference function that ranks all of the messages from the personal perspective of the user. We shall model this in terms of a “happiness function” for each user that depends on the set of messages she acquires. Generally, we are interested in finding a distributed system protocol such that the total system happiness is maximized, subject to constraints on resource utilization for each user. We call this the “Star-to-One” (⋆-1) service model.
In our service model, each message is not necessarily
labelled with the address from which it originated. Indeed,
the concept of a node address is outside the scope of the
proposed model. Thus, in some sense, the model supports
anonymous communication, as well as the more traditional
type of communication in which the origin node is explicitly
identified. It is also possible that messages may be authored
by several nodes, who may cooperate to create a message.
In the proposed protocol model, messages are not necessarily
labeled with either source or destination addresses, but may
contain labels describing the context of the message and/or a
description of its content.
The ⋆-1 service model we consider here is closely related to, but distinct from, the “Interest-Casting” considered in [5] and [2], which could be thought of as a “1-⋆” service model. In that model, the network should deliver messages generated by a node to all “interested users” within a certain period of time. The ⋆-1 model we consider here is more data-centric, and is defined by receiver preferences rather than by the declared preferences of senders, as in the 1-⋆ model. As a consequence, we believe the ⋆-1 model considered here is more resilient to “spam” than the 1-⋆ model.
B. Network Model: Infrastructure-Agnostic, Pairwise Interactions
We propose a network model where there is a large population of nodes, whose size may vary with time. We wish to minimize the assumptions made on the underlying communication infrastructure, so that the framework may be as general as possible. We shall simply assume that communication between nodes occurs in a pairwise manner. Multiple pairs of nodes may communicate concurrently, though we do not know in advance which pairs or in what order. Although we also allow a single node to communicate with multiple other nodes concurrently, we shall assume that each pairwise interaction between nodes may proceed independently of the other pairwise interactions.
For a given pairwise communication session between two nodes, we envision that nominally, the communication may take place over an unlicensed wireless communication channel. However, given that we want the framework to be infrastructure-agnostic, this is not necessary. For example, the communication could be wireless but over a licensed channel owned by a service provider, or could be a wired Internet connection. Indeed, such generality allows use of the Internet as a “link layer” within the framework we envision, while still allowing the framework to be “self-contained”, as discussed earlier. This enables our framework to be deployed before support for wireless peer-to-peer communication becomes ubiquitous on mobile devices. It also allows fixed nodes with wired connections (e.g. desktop computers) to be part of the information dissemination framework we envision.
C. Application Level Overlay Network
An important feature of this model is that it allows the framework to be deployed as an application-level overlay within existing mobile communication devices, and other computational devices, that are already in wide use today. In general, it does not require a new communication protocol stack to be deployed within the operating system kernels of the constituent nodes. Rather, it merely benefits from standardization of how commonly available infrastructure can be configured to support the pairwise communication interactions that define the communication model for the framework.
D. Data-centric message forwarding
In general, the two nodes involved in a pairwise communication interaction may not have control over how long the communication session can last, and the protocol should be designed with this in mind. In particular, the important decisions center on which messages, if any, should be exchanged with the peer, and in what order. This is why we describe our framework as data-centric. In a sense, this message selection problem is akin to the design of a flow control policy, as well as a message forwarding policy, inasmuch as the associated algorithms determine which messages get delivered, and which paths the messages follow. Next, we describe an abstract mathematical model for messages and user preferences.
III. DATA MODEL
We shall assume there exists a common namespace $\mathcal{L}$ which is used to label the messages. For example, $\mathcal{L}$ could be the space of sequences of alphanumeric characters up to a given length. We assume that each message has a unique label $l \in \mathcal{L}$ which we use to identify the message. In practice, the uniqueness property can be achieved with high probability in a distributed manner if the authors of messages use randomization to select part of the label, and the cardinality of the namespace $\mathcal{L}$ is sufficiently large.
A. Preference and Ranking Functions
We assume that there are $M$ messages in the system. Initially we assume that $M$ is constant with time, but later we consider a model where the number of messages in the system at time $t$, $M(t)$, may increase with time. We make a one-to-one correspondence between messages and their labels, so we let $\mathcal{M}(t) \subset \mathcal{L}$ be the set of labels of messages which exist at time $t$. We consider an arbitrary user node, indexed by the integer $i$, and wish to describe user $i$'s preferences among all $M$ messages. The preference function $P_i(\cdot)$ is such that $P_i(1) \in \mathcal{M}$ is user $i$'s most favored message, $P_i(2) \in \mathcal{M}$ is user $i$'s second most favored message, and so on; $P_i(M)$ is the label of user $i$'s least favorite message. Note that this model is an abstract one: user $i$ may not explicitly be aware of the preference function $P_i(\cdot)$, because she may not have seen or examined the set of all messages. However, it is assumed that the answer to any query to user $i$, when given knowledge about the messages, will be consistent with the underlying preference function. For example, suppose user $i$ is asked which message she likes more, $l_A$ or $l_B$. User $i$ will answer that she prefers the message with label $l_A$ if and only if $R_i(l_A) < R_i(l_B)$, where $R_i$ is the inverse function of $P_i(\cdot)$, i.e. $R_i(x) = y$ if $P_i(y) = x$. We call $R_i(\cdot)$ the ranking function for user $i$; user $i$'s rank of message $l \in \mathcal{M}$ is $R_i(l)$, where the rank is $r$ if $l$ is the $r$-th most preferred message.
At a given time $t$, a user may only be aware of a subset of messages $S(t) \subset \mathcal{M}$. We define the ranking function $R_i^t(\cdot)$ for user $i$ at time $t$ to be the ranking of only the known messages $S(t)$ at time $t$. For example, $R_i^t(l) = 1$ implies that $l \in S(t)$ and $R_i(m) > R_i(l)$ for all messages $m \in S(t)$ with $m \neq l$.
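To make the relationship between $P_i$, $R_i$ and $R_i^t$ concrete, here is a small Python sketch. It assumes a preference function is given as a list of labels with the most favored first (so `P[0]` plays the role of $P_i(1)$); the helper names are hypothetical illustrations, not part of the paper.

```python
from typing import Dict, List, Sequence, Set


def ranking_from_preference(P: Sequence[str]) -> Dict[str, int]:
    """Invert a preference list: P[y-1] = P_i(y), so R_i(P[y-1]) = y."""
    return {label: y for y, label in enumerate(P, start=1)}


def prefers(R: Dict[str, int], l_a: str, l_b: str) -> bool:
    """Answer 'do you like l_a more than l_b?' consistently with R_i."""
    return R[l_a] < R[l_b]


def ranking_at_time(R: Dict[str, int], known: Set[str]) -> Dict[str, int]:
    """R_i^t: rank only the known subset S(t), preserving the order given by R_i."""
    ordered: List[str] = sorted(known, key=R.__getitem__)
    return {label: r for r, label in enumerate(ordered, start=1)}
```

For example, with `P = ["b", "d", "a", "c"]`, message `"a"` has global rank 3, but among the known set `{"a", "c"}` it has rank 1 under $R_i^t$.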
We consider a statistical model for the preference function of a given user, which is consistent with the so-called Zipf distribution [1]. First, we assume there exists a global ranking function $R(\cdot)$ for all messages, which in general is not initially known to the users of the network. For this ranking function, we denote by $l_1$ the most popular message, and in general $l_r$ is the $r$-th most favored message. Thus $R(l_r) = r$ for $r = 1, 2, \ldots, M$.
Conceptually, we can generate the preferences for user $i$ by placing $M = |\mathcal{M}|$ balls into an urn, where the volume of the ball corresponding to message $l_r$ is $r^{-Z}$. Each time we draw a ball from the urn, we do so randomly in accordance with the volumes of the balls, so that the probability of picking a ball is proportional to its volume. The probability that message $l_r$ is the most favored message of user $i$ is $r^{-Z}/K$, where
$$K = \sum_{r=1}^{M} r^{-Z}$$
is a normalizing constant and $Z \ge 1$ is the Zipf parameter. To determine the second most favored message of user $i$, the first randomly chosen ball is left outside the urn (no “replacement”), and we again choose a ball at random from those remaining; the identity of this ball corresponds to the second most favored message. The process continues until only one ball is left in the urn, which corresponds to the least favored message of user $i$. In the initial model we consider, the preferences of distinct users are statistically independent and identically distributed. We call this model the “Zipf product form” model for user preferences.
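A direct, minimal Python sketch of the urn construction above, assuming message $l_r$ is represented by the integer $r$ and using `random.Random.choices` for each volume-weighted draw. It follows the description literally (one draw per ball, no replacement), so it costs $O(M^2)$ time per user; it is meant to illustrate the model rather than to be efficient.

```python
import random
from typing import List


def zipf_urn_preference(M: int, Z: float, rng: random.Random) -> List[int]:
    """Draw one user's preference list from the 'Zipf product form' urn.

    Ball r (standing for message l_r) has volume r**-Z. Balls are drawn one at a
    time without replacement, each with probability proportional to its volume;
    the draw order is the user's preference order (element 0 = most favored).
    """
    remaining = list(range(1, M + 1))
    volumes = [r ** -Z for r in remaining]
    preference: List[int] = []
    while remaining:
        # Volume-weighted choice among the balls still in the urn.
        idx = rng.choices(range(len(remaining)), weights=volumes)[0]
        preference.append(remaining.pop(idx))
        volumes.pop(idx)
    return preference
```

Calling `zipf_urn_preference(10, 2.25, random.Random(42))` returns a permutation of 1..10 in which small values of $r$ tend to appear early; independent calls give independent, identically distributed users, as in the product-form model.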
B. Weighting Functions
Note that message $l_1$ is the most popular message in the model, in a statistical sense, but it may not be the favorite of a given user $i$. Indeed, message $l_1$ is not even guaranteed to be the most popular message among all users in an empirical sense, though it likely is. To define popularity concretely, we define a weighting function $W_i(\cdot)$ for each user $i$. We require $W_i(r)$ to be non-negative and non-increasing in $r$, such that
$$\sum_{r=1}^{M} W_i(r) = 1.$$
We interpret $W_i(r)$ as the weight that user $i$ assigns to her message with rank $r$. Thus, the weight assigned by user $i$ to an arbitrary message $l$ at time $t$ is $W_i(R_i^t(l))$. An example of such a weighting function is $W_i(r) = 1/k$ for $r \in [1, k]$ and $W_i(r) = 0$ otherwise. This corresponds to uniform weighting among the $k$ most favored items, for some constant $k$. Exponentially decreasing weighting functions are also possible.
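The two example weighting functions can be sketched as plain Python lists, with entry `r - 1` holding $W(r)$. The uniform top-$k$ form follows the text directly; the exponential form $W(r) \propto \rho^{r-1}$ with a hypothetical decay parameter `rho` is just one possible instance of the "exponentially decreasing" family the text mentions. The checker tests the three stated constraints.

```python
from typing import List


def uniform_top_k(M: int, k: int) -> List[float]:
    """W(r) = 1/k for r = 1..k and 0 otherwise (assumes k <= M)."""
    return [1.0 / k if r <= k else 0.0 for r in range(1, M + 1)]


def exponential_weights(M: int, rho: float) -> List[float]:
    """Hypothetical exponential weighting W(r) proportional to rho**(r-1), 0 < rho < 1,
    normalized so the weights over r = 1..M sum to 1."""
    raw = [rho ** (r - 1) for r in range(1, M + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def is_valid_weighting(W: List[float], tol: float = 1e-9) -> bool:
    """Check the constraints: non-negative, non-increasing in r, summing to 1."""
    non_increasing = all(a >= b for a, b in zip(W, W[1:]))
    return all(w >= 0 for w in W) and non_increasing and abs(sum(W) - 1.0) < tol
```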
C. Happiness and Global Popularity
Let $C_i(t) \subset \mathcal{M}(t)$ be the set of all messages that are cached (stored) by user $i$ at time $t$. We define the happiness of user $i$ at time $t$ to be the sum
$$H_i(t) = \sum_{l \in C_i(t)} W_i(R_i(l)).$$
Due to the normalization constraint we placed on the weighting function $W_i(\cdot)$, it follows that $0 \le H_i(t) \le 1$. Note here that the ranking function $R_i(\cdot)$ for each user is defined relative to the entire universe of messages at time $t$, $\mathcal{M}(t)$, and may not be known to the user at time $t$. The normalized total system happiness at time $t$ is defined to be
$$H(t) = \frac{1}{N} \sum_{i \in \mathcal{N}} H_i(t),$$
where $N = |\mathcal{N}|$ is the number of nodes. Note that $0 \le H(t) \le 1$. Our ultimate aim is to maximize the normalized total system happiness, subject to constraints on resource use by each user.
We shall see that in order for users to discover the messages they most prefer, it is useful to estimate the “popularity” of each message. This is because, on average, users will prefer messages that are more popular over messages that are less popular. To define popularity, suppose that there are $N$ nodes in the network, indexed by the set $\mathcal{N}$. We define the total popularity of a message $l$ to be
$$T(l) = \sum_{i \in \mathcal{N}} W_i(R_i(l)).$$
In the algorithm we present shortly, users estimate the global popularity $T(l)$ of each message and attempt to download the most popular messages first.
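The quantities $H_i(t)$, $H(t)$ and $T(l)$ can be computed directly whenever the full rankings are known, as they are in a simulation (real users would not know $R_i$ globally). A sketch, assuming each user's ranking is a dict from label to rank and, for brevity, that all users share one weighting function `W`:

```python
from typing import Callable, Dict, List, Set

Weight = Callable[[int], float]


def happiness(cache: Set[str], rank: Dict[str, int], W: Weight) -> float:
    """H_i(t): sum of W_i(R_i(l)) over the messages l cached by user i."""
    return sum(W(rank[l]) for l in cache)


def system_happiness(caches: List[Set[str]], ranks: List[Dict[str, int]], W: Weight) -> float:
    """H(t): average of H_i(t) over the N users."""
    N = len(caches)
    return sum(happiness(c, r, W) for c, r in zip(caches, ranks)) / N


def total_popularity(label: str, ranks: List[Dict[str, int]], W: Weight) -> float:
    """T(l): sum over all users of W_i(R_i(l))."""
    return sum(W(r[label]) for r in ranks)
```

Because each user's weights sum to 1, summing $T(l)$ over all messages gives exactly $N$, which is a handy sanity check in simulations.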
IV. SYSTEM LEVEL MODELS
Now that we have defined the service model, the network model, and the data message model, we propose a simple system level problem formulation, as well as an algorithm, for information dissemination networks. To help gain insight, we first consider a static problem where the number of messages $M$, the number of users $N$, and the user preferences $P_i(\cdot)$ are constant with time. We are interested in the case where $N$ and $M$ are very large.
A. Static Case: Cooperative Information Dissemination
We suppose that initially, each message is replicated $c - 1$ times, where $c \ge 1$, and that the $c$ copies of each message (including the original) are randomly distributed to the $N$ users according to a uniform distribution. Thus, initially each user is in possession of a random set of messages. The average number of messages at each node initially is thus equal to $cM/N$.
Note that $c$ is a parameter of the system that describes how widely each message is distributed initially. For example, one possible scenario covered by our model is the case where each user generates one message (i.e. $M = N$) and then “pushes” the message to $c - 1$ other random users in the network. It is also possible that the messages come from an external source and are randomly pushed to users, so that $M$ may be independent of $N$. Later we can consider the case where different messages have differing amounts of initial promotion, i.e. a different value of $c$ for each message.
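The initial placement can be sketched as follows. The text only says the copies are distributed uniformly at random; this sketch additionally assumes the $c$ copies of a message go to $c$ distinct users (sampling without replacement, so $c \le N$), which is a labeled assumption rather than a detail from the paper.

```python
import random
from typing import List, Set


def initial_placement(N: int, M: int, c: int, rng: random.Random) -> List[Set[int]]:
    """Give each of the M messages to c users chosen uniformly at random.

    Assumption: the c copies go to c distinct users (requires c <= N).
    Returns held[u] = set of message indices initially held by user u.
    """
    held: List[Set[int]] = [set() for _ in range(N)]
    for message in range(M):
        for user in rng.sample(range(N), c):
            held[user].add(message)
    return held
```

Since exactly $cM$ copies are handed out among $N$ users, the mean load per user is $cM/N$ regardless of the random draws; individual loads fluctuate around it.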
B. A Distributed Algorithm for Information Dissemination
Next, we describe a simple distributed algorithm whose
objective is to maximize total system happiness, for the static
message model above. What we describe here is a slight adaptation of an algorithm proposed in [7]. Each user $i$ maintains an ordered list of size $k_i$, consisting of the labels of the $k_i$ most favored messages, in order of the user's preference. This list is called the local cache, since it depends on the user's (local) preferences. Initially, the local cache of each user holds all of the messages initially present at the node, and the user sorts the cache according to her own preferences. If the initial number of messages for user $i$ (with mean $cM/N$) is larger than the size of her cache, $k_i$, then the least favored messages are discarded, so that the cache of user $i$ holds the $k_i$ most preferred messages among those initially distributed to her.
Each user also maintains a global list of messages she has
heard about, but does not necessarily have in her possession.
For each item in the global list, an estimated total popularity
value for that item is maintained, where the initial total
estimated popularity value for each item is zero. In order to
cope with the large number of possible messages, the estimated
popularity values for each item in the global list, as well as
the global list itself, can be implemented with a hash table.
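The per-user state described above can be sketched in Python as follows, assuming labels are strings and that a dict `rank` (a hypothetical stand-in for the user's ability to compare messages she holds) gives her preference order. A `collections.defaultdict(float)` provides the hash-table behavior the text mentions, with a never-seen label reading as the initial estimate of zero.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List


def init_local_cache(initial: Iterable[str], rank: Dict[str, int], k: int) -> List[str]:
    """Local cache: the k most preferred of the initially held messages, most favored first.
    Less favored messages beyond the cache size k are discarded."""
    return sorted(set(initial), key=rank.__getitem__)[:k]


def new_global_list() -> DefaultDict[str, float]:
    """Global list as a hash table: label -> estimated total popularity.
    Looking up a label not heard of before yields 0.0 (the initial estimate)."""
    return defaultdict(float)
```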
The algorithm proceeds as follows. For simplicity, we describe the algorithm in terms of synchronized time steps, called iterations, but the algorithm can easily be made asynchronous. In each time step, each user $i$ may poll another random user $j$ in the system. Several pairwise interactions (or “polls”) may occur concurrently in the same time step; here we describe the actions of a single pairwise interaction.
When user $i$ polls user $j$ at time $t$, user $j$ replies with the ordered content of her local cache, as well as the weighting function preferred by user $j$. Equivalently, the polled user $j$ shares the values of $W_j(R_j^t(l))$ for each message with label $l$ in user $j$'s local cache. When the polling user $i$ receives this information, the estimated total popularities of the items in user $j$'s local cache are updated within user $i$'s global list of total popularity values. In particular, the estimated total popularity value for an item $l$ is incremented by the value $W_j(R_j^t(l))$, which can be determined from user $j$'s reply. Note that the ranking function $R_j^t(\cdot)$ here is relative to user $j$'s knowledge of messages up to the current time $t$, which is not necessarily the same as the ranking function $R_j(\cdot)$ for user $j$ assuming knowledge of all messages in the system.
After learning about the contents of user j’s local cache,
user i then determines which messages, if any, are in user j’s
local cache that user i has not seen before. Among this set
of “new” messages, user i determines which message has the
highest total popularity estimate, according to user i’s own
global list. User i then requests that message from user j.
Optionally, other messages contained in user j’s local cache
may also be requested, in order of user i’s ranking in terms of
total popularity. Here we assume that user j grants any such
request and sends the requested message(s) to user i.
After a user polls another user, and has received the requested message from that user, she examines that message
and compares it to each item in her local cache. The local
cache is then updated in accordance with the user's preferences,
possibly adding the new message to the cache and causing the
least favorite message in the local cache to be removed from
the local cache. If a user receives multiple messages, then this
process is repeated for each new message received. This then
completes the operation of the system in the current time step.
In the next time step, the entire process is repeated.
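One pairwise interaction can be sketched as a single-threaded Python function. This is an illustration under stated assumptions, not the author's implementation: the hypothetical `User` class uses a `rank` dict as an oracle for the user examining a message, a `seen` set implements "messages user $i$ has not seen before", only the single most popular new message is requested (the optional extra requests are omitted), and user $j$ always grants the request. Because $j$'s cache holds her top $k_j$ among everything she has examined, a message's position in that cache equals $R_j^t(l)$.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List, Optional, Set


@dataclass
class User:
    rank: Dict[str, int]            # hypothetical oracle for R_i(l); smaller = preferred
    k: int                          # local cache size k_i
    weight: Callable[[int], float]  # weighting function W_i(r)
    cache: List[str] = field(default_factory=list)  # local cache, most favored first
    seen: Set[str] = field(default_factory=set)     # messages already examined
    popularity: DefaultDict[str, float] = field(
        default_factory=lambda: defaultdict(float))  # global list of estimates

    def receive(self, label: str) -> None:
        """Examine a downloaded message; keep only the k most preferred in the cache."""
        self.seen.add(label)
        if label not in self.cache:
            self.cache.append(label)
            self.cache.sort(key=self.rank.__getitem__)
            del self.cache[self.k:]  # evict the least favored if over capacity


def poll(i: User, j: User) -> Optional[str]:
    """User i polls user j; returns the label i downloads, or None if nothing is new."""
    # j's reply: each cached label with W_j(R_j^t(l)), where R_j^t(l) = cache position.
    reply = [(l, j.weight(pos)) for pos, l in enumerate(j.cache, start=1)]
    for l, w in reply:
        i.popularity[l] += w  # increment i's estimate of T(l)
    new = [l for l, _ in reply if l not in i.seen]
    if not new:
        return None
    best = max(new, key=lambda l: i.popularity[l])  # highest estimated popularity
    i.receive(best)  # assumes j grants the request
    return best
```

Note that, following the text literally, repeated polls of the same peer add that peer's weights again; the estimates are therefore relative scores used for ordering, not exact values of $T(l)$.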
C. Simulation Results
We ran several Monte Carlo simulation experiments for the
above model. In all of our experiments, we set N = M , so that
the number of messages and users are equal. We assume that
each user's local cache size is identical, with $k_i = K = 100$
messages for all users. In each system iteration, we assume
that each user polls another user in the network, chosen at
random. We set the initial replication factor to be c = 30 in
all simulations we report here. We assumed that each node
downloads at most one message in each iteration.
We assume that each user is interested in obtaining the top $G = K/2 = 50$ messages, according to the user's preference function. Among the top G messages, each user gives the messages equal priority. To model this, we set the weighting function $W_i(\cdot)$ to be the same for every user, namely $W_i(r) = 1/G$ for $r = 1, 2, \ldots, G$ and $W_i(r) = 0$ otherwise. Thus, each user's happiness attains its maximum value of 1 if and only if that user has downloaded her top G messages among the entire universe of messages.
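To make the experimental setup concrete, here is a compact, scaled-down Monte Carlo sketch of it: $M = N$, uniform weighting on the top G, one poll and at most one download per user per iteration, and initial messages counted as downloads. Several details are assumptions not fixed by the text: the $c$ copies go to $c$ distinct users, polls within an iteration are applied one after another rather than simultaneously, and preferences are sampled with the "exponential clock" form of the urn (message $l_r$ gets an Exp(1) time scaled by $r^Z$; sorting by time is equivalent to successive volume-weighted draws without replacement and avoids the $O(M^2)$ cost of literal draws). It is not the author's simulator and is not expected to reproduce the exact numbers reported in this section.

```python
import random
from typing import List, Tuple


def zipf_ranks(M: int, Z: float, rng: random.Random) -> List[int]:
    """One user's ranks R_i(m) for messages m = 0..M-1 (message m stands for l_{m+1}).
    Exponential-clock urn: message m arrives at time E * (m+1)**Z, E ~ Exp(1)."""
    times = [rng.expovariate(1.0) * (m + 1) ** Z for m in range(M)]
    rank = [0] * M
    for r, m in enumerate(sorted(range(M), key=times.__getitem__), start=1):
        rank[m] = r
    return rank


def simulate(N: int, Z: float, K: int, c: int, G: int, iterations: int,
             seed: int = 0) -> Tuple[float, float]:
    """Return (normalized system happiness H, mean downloads per node / K), with M = N."""
    rng = random.Random(seed)
    M = N
    ranks = [zipf_ranks(M, Z, rng) for _ in range(N)]

    def W(r: int) -> float:  # uniform weight on each user's top G messages
        return 1.0 / G if r <= G else 0.0

    held = [set() for _ in range(N)]
    for m in range(M):  # c copies to c distinct random users (assumption)
        for u in rng.sample(range(N), c):
            held[u].add(m)
    caches = [sorted(h, key=ranks[u].__getitem__)[:K] for u, h in enumerate(held)]
    seen = [set(h) for h in held]
    downloads = [len(h) for h in held]  # initial messages count as downloads
    pop = [dict() for _ in range(N)]    # global list: label -> estimated T(l)
    for _ in range(iterations):
        for i in range(N):              # polls applied sequentially (simplification)
            j = rng.randrange(N - 1)
            j += j >= i                 # uniform over users other than i
            for pos, l in enumerate(caches[j], start=1):
                pop[i][l] = pop[i].get(l, 0.0) + W(pos)
            new = [l for l in caches[j] if l not in seen[i]]
            if new:                     # download at most one message per iteration
                best = max(new, key=pop[i].__getitem__)
                seen[i].add(best)
                downloads[i] += 1
                caches[i] = sorted(caches[i] + [best], key=ranks[i].__getitem__)[:K]
    H = sum(W(ranks[i][l]) for i in range(N) for l in caches[i]) / N
    return H, sum(downloads) / (N * K)
```

For a quick run, `simulate(N=200, Z=2.25, K=20, c=6, G=10, iterations=60)` keeps the ratios $c/K$ and $G/K$ of the paper's setting; the full setting (N = M = 1000, K = 100, c = 30, G = 50) also works but is slow in pure Python. Since a cache only ever swaps in a message the user ranks higher, H is non-decreasing in the number of iterations.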
Figure 1 shows the results of a single simulation run, with
N = M = 1000 and the Zipf parameter Z set to 2.25.
We found only a small percentage variation in the results
across different simulation runs for the same parameter values.
This suggests that when we let the network size N grow
to infinity, the normalized quantities of interest approach a
deterministic limit. We also ran the same simulation scenarios
with N and M both doubled, and the corresponding normalized results remained essentially the same.
This suggests that the simulation results are indicative of the
performance when N and M are scaled to infinity, leaving all
other parameters the same.
Fig. 1. Typical behavior of a simulation run. Zipf parameter Z = 2.25. (Top: normalized number of messages downloaded per node versus iteration count. Bottom: normalized total system happiness versus iteration count per node.)
The bottom graph in the figure shows the normalized total system happiness versus the iteration count per node. It is apparent that the total system happiness quickly approaches the optimal value of 1: the normalized total system happiness reaches 97% after roughly 350 iterations per node. As the number of iterations increases beyond 350, for this particular simulation run, the normalized system happiness converges to approximately 0.985, which is strictly less than 1. After this convergence, the nodes stop exchanging messages, in accordance with the algorithm specification. Intuitively, in this case a few users with unusual preferences are unable to find the messages they most prefer, because those messages are globally unpopular and became “extinct” from the system in the early stages of the algorithm execution.
We conjecture that if messages are constantly re-introduced to the system at a non-zero rate, then the total system happiness could in fact converge to 100%, i.e. each user would receive all of her top G = 50 messages. From initial experimentation with the parameters of the original model, it appears that the stable value of the normalized total system happiness can be made arbitrarily close to 1 by making the local cache size K and the initial replication factor c sufficiently large.
The top graph in the figure shows the average number of
messages downloaded per node versus the iteration count. In
order to normalize this quantity, we divide it by the cache
size K = 100. From the graph it is apparent that each node
downloads roughly 160 messages by the time the total system
happiness reaches 97%. Recall that each time a message is
downloaded by a user, the user needs to examine the message
and compare it to all messages in her local cache. Thus,
this quantity in some sense measures the total amount of
“work” that is done by each user. The graph starts from an initial number of messages per local cache of c = 30 (since M = N, the mean cM/N equals c), corresponding to a normalized value of 0.3. This is because we count the initial distribution of messages towards the downloaded message count. Note that the slope of the graph is equal to 1 until the iteration count reaches 70, which is when all of the local caches fill to capacity, since during this time each node always downloads exactly one message per iteration.
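As a quick arithmetic check on the two features of the top graph just described, take M = N, c = 30 and K = 100, and treat the average initial load $cM/N$ as if it held at every node (a simplification, since individual initial loads are random):

```latex
\frac{cM/N}{K} = \frac{30}{100} = 0.3,
\qquad
\frac{K - cM/N}{1\ \text{message per iteration}} = \frac{100 - 30}{1} = 70\ \text{iterations}.
```

This matches the normalized starting value of 0.3 and the point at iteration 70 where the caches fill and the one-message-per-iteration growth stops.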
We expect that the algorithm will tend to operate more
efficiently as the Zipf parameter Z is increased. To examine
this more closely, we repeated the simulation scenario above
for values of Z ranging from Z = 1 to Z = 3.5. (Larger
values of Z lead to machine-dependent numerical underflow
issues). For each value of Z we measured the required number
of iterations per node until the normalized system happiness
reaches 97%. From the figure it is apparent that this value is
roughly 350 when Z = 2.25. As Z varies over this range,
the number of required iterations decreased from roughly 1400
(Z = 1) to 150 (Z = 3.5). With smaller values of Z, the
system exhibited more “randomness”, which was expected. We
also measured the required number of messages downloaded
(normalized again by the local cache size K) in order to reach
97% happiness, and this quantity ranged from about 3.0 (for
Z = 1) to about 1.3 (for Z = 3.5). This range for Z is thought
to be a “practical” one, based on empirical measurements
for known shared media types. We believe these experiments
suggest that the proposed algorithm can be made to operate
efficiently in practice.
D. Discussion
In practice, we expect that new messages in the system will
be constantly introduced, and user preferences may change
over time. In this scenario, we would like the algorithm to
“track” the changes in the system, so that at all times, each
user will retrieve the most interesting messages for that user.
The algorithm above can be modified for this case by adjusting
the calculation for global popularity estimation. In particular,
the global popularity estimates may be multiplied by a decay
factor η < 1 after each iteration. The constant η can be chosen
appropriately to match the time constant of change in the
system. We leave the details of this for future work.
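A minimal sketch of the decay modification just described, assuming the global list is a plain dict from label to estimate and that decay is applied once per iteration. After n iterations an old contribution has been scaled by $\eta^n$, so the estimates have an effective memory of roughly $1/(1-\eta)$ iterations, which is one way to relate η to the time constant of change in the system. The `prune_below` threshold is a hypothetical addition, not in the text, that keeps the hash table from growing without bound as messages age out.

```python
from typing import Dict


def decay_popularity(estimates: Dict[str, float], eta: float, prune_below: float = 0.0) -> None:
    """Multiply every estimated total popularity by eta (0 < eta < 1), in place.

    prune_below (hypothetical): entries whose decayed value falls below it are dropped.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    for label in list(estimates):  # copy keys so entries can be deleted while iterating
        estimates[label] *= eta
        if estimates[label] < prune_below:
            del estimates[label]
```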
Here we have assumed that each polled user complies with
any request for message downloading. In general, the user cost
associated with uploading messages to another user may be
non-negligible, and may need to be factored into the model
(e.g. see [6]). In practice, users may wish to store globally
popular messages, even if they do not rank high enough
to justify keeping them in the local cache. Each pairwise
interaction between nodes may in general support
“queries” about specific messages. By keeping a cache of
globally popular messages, each user is in a better position to
negotiate the terms of a possible transfer of messages between
the pair of nodes. This is an interesting area for future study.
REFERENCES
[1] L. A. Adamic, “Zipf, power-law, Pareto - a ranking tutorial,” http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
[2] G. Costantino, F. Martinelli, and P. Santi, “Privacy-preserving interest-casting in opportunistic networks,” in Proc. IEEE Wireless Communications and Networking Conference (WCNC), 2012.
[3] K. Fall, “A delay tolerant network architecture for challenged internets,” in Proc. ACM SIGCOMM, August 2003.
[4] Y.-C. Hu, D. Johnson, and A. Perrig, “SEAD: Secure efficient distance vector routing for mobile wireless ad hoc networks,” Ad Hoc Networks, vol. 1, Elsevier, 2003.
[5] A. Mei, G. Morabito, P. Santi, and J. Stefa, “Social-aware stateless forwarding in pocket switched networks,” in Proc. IEEE INFOCOM, 2011.
[6] T. Ning, Z. Yang, X. Xie, and H. Wu, “Incentive-aware data dissemination in delay-tolerant mobile networks,” in Proc. IEEE SECON, 2011.
[7] E. Poyraz and R. L. Cruz, “Distributed information dissemination,” in Proc. CISS 2012, Princeton University, March 2012.