INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING, AND COMMUNICATIONS (ICNC) 2013

Ad-Hoc Networks at Global Scale

Rene L. Cruz, Fellow, IEEE

Abstract—We argue that scalability in ad-hoc networks can be achieved by redefining the functionality of the information transport system itself, where the functionality is driven by a new type of communication paradigm inherent in information dissemination applications. In particular, among the entire population of generated messages, each user desires only that the personally “most interesting” messages are delivered to them; we call this “star-to-one” communication. In the paper we consider a “Zipf product form” model for message preferences, and propose some decentralized algorithms for message forwarding based on this model. We discuss some simulation results for these algorithms, which suggest that it is possible for the users to efficiently obtain the messages that are of most interest to them. Essentially, the amount of “work” required of each user, on the average, is proportional to the desired number of messages to be received by each user, and is independent of the number of users and the number of messages in the network.

Index Terms—Ad-hoc networks, wireless networking, peer-to-peer networks, data-centric networks, social networking, Zipf distribution

I. SCALING PROBLEMS IN WIRELESS AD-HOC NETWORKS

The capability for ad-hoc wireless device-to-device communication between ubiquitous mobile devices such as smart-phones and laptop computers opens up the possibility of exciting new types of networking applications. From a practical technological point of view, the capabilities of short range communication devices over unlicensed links are improving exponentially, in terms of performance measures such as cost, power consumption, throughput, and spectral efficiency.
The throughput available on a short range wireless communication channel operating on an unlicensed band is often far greater than what is available over a long distance wired communication channel, and moreover it is “free” to the user. It is thus a compelling engineering problem to design a self-contained wireless network whose communication resources consist exclusively of short range unlicensed communication links. This has led to many proposals for wireless ad-hoc network designs (e.g. see [4] and the references therein).

(R. L. Cruz is with the Dept. of Electrical and Computer Engineering, Univ. of California, San Diego, La Jolla, CA 92093-0407.)

From an application point of view, it is often desirable to have geographically global connectivity, as well as a very large population of nodes/users. Unfortunately, a widely observed problem shared by ad-hoc wireless network designs today is that they typically do not scale well with the network size, due to excessive protocol overhead. As a result, an ad-hoc wireless network, or even any type of wireless network associated with mobile phones today, has a geographic span which is limited to a small area or region. In order to support global connectivity, the wireless network is connected hierarchically to an internetwork, e.g. the Internet, with mostly wired communication resources. On the other hand, this approach is not self-contained, because it is heavily reliant on a ubiquitous global communication infrastructure that is reliable.

In this context, the Internet could certainly be argued to be a ubiquitous and reliable communication infrastructure. Indeed, one of the key design principles behind the Internet is distributed control, so that it provides reliable communication even with link and node failures. However, practically speaking, at a high level, the control of Internet infrastructure is becoming increasingly centralized, and hence vulnerable to certain types of attacks.
More and more, the dissemination of information occurs primarily through search engines, social networking systems, and public messaging systems, which are centralized systems controlled by only a few entities, each with dominant market positions. The range of possible threats includes not only denial of service attacks from malicious organizations, but organized disruptions intended to improperly influence public opinion, as well as censorship. In addition, users are increasingly becoming wary of sending, inadvertently sharing, and intentionally storing sensitive personal information on centralized infrastructure which can be monitored and mined, and sense a gradual erosion of privacy as an inevitable price to be paid for utilizing essential information services.

II. ENABLING OPERATION AT GLOBAL SCALE

In view of these considerations, we are interested in the problem of the design of a self-contained wireless ad-hoc network, which can scale globally to a very large number of users. Given the lack of scalability inherent in current designs for wireless ad-hoc networks, we propose to relax and/or change the requirements for such a network, in a way that meaningful communication can still take place at a global scale. For example, at any given instant of time, the communication graph is usually disconnected at a global scale because nodes are not close enough to communicate directly with each other. However, by exploiting the mobility of the nodes, as well as continuing advances in technology for digital storage, it can be argued that the communication graph is indeed richly connected in a broader sense, if we allow the messages to be temporarily stored and forwarded (à la “sneaker-net”), and allow for message delay to be up to several hours, or even days or weeks. This type of relaxation has been considered before with “Delay Tolerant Network” architecture proposals (e.g. see [3]).
We propose a number of other important practical relaxations and features in the following which we believe can enable operation of ad-hoc networks at a global scale.

A. Service Model: Star-1 instead of 1-1 or 1-Star

Most communication networks today are designed with an infrastructure-centric point of view, and have an underlying service model where communication is supported between any pair of network nodes, by forwarding packets through an appropriately chosen route. We call this the one-to-one (1-1) service model. In this model, the nodes are labelled with “addresses” and delivery of arbitrarily defined data is supported between any two nodes. The infrastructure-centric approach is in some sense data-agnostic, and the infrastructure is carefully managed to support arbitrary types of data.

In contrast, instead of the 1-1 service model for communication, we propose a different service model where each user is interested in receiving the set of messages she is “most interested in”, according to an underlying preference function that ranks all of the messages from the personal perspective of the user. We shall model this in terms of a “happiness function” for each user that depends on the set of messages they acquire. Generally, we are interested in finding a distributed system protocol so that the total system happiness is maximized, subject to constraints on resource utilization for each user. We call this the “Star-to-One” (⋆-1) service model.

In our service model, each message is not necessarily labelled with the address from which it originated. Indeed, the concept of a node address is outside the scope of the proposed model. Thus, in some sense, the model supports anonymous communication, as well as the more traditional type of communication in which the origin node is explicitly identified. It is also possible that messages may be authored by several nodes, who may cooperate to create a message.
In the proposed protocol model, messages are not necessarily labeled with either source or destination addresses, but may contain labels describing the context of the message and/or a description of its content.

The ⋆-1 service model we consider here is closely related to, but distinct from, the “Interest-Casting” considered in [5] and [2], which could be thought of as a “1-⋆” service model. In this model, the network should deliver messages generated by a node to all “interested users” within a certain period of time. The ⋆-1 model we consider here is more data-centric and defined by receiver preferences, rather than declared preferences of senders under the 1-⋆ model. As a consequence, we believe the ⋆-1 model considered here is more resilient to “spam” than the 1-⋆ model.

B. Network Model: Infrastructure-Agnostic, Pairwise Interactions

We propose a network model where there is a large population of nodes, whose size may be varying with time. We wish to minimize the assumptions made on the underlying communication infrastructure, so that the framework may be as general as possible. We shall simply assume that the communication between nodes may occur in a pairwise manner. Multiple pairs of nodes may communicate concurrently, though we do not know in advance which pairs or in what order. Though we also allow communication between a given single node and multiple other nodes to occur concurrently, we shall assume each pairwise interaction between nodes may proceed independently of other pairwise interactions.

For a given pairwise communication session between two nodes, we envision that nominally, the communication may take place over an unlicensed wireless communication channel. However, given that we want the framework to be infrastructure-agnostic, this is not necessary. For example, the communication could be wireless but over a licensed channel owned by a service provider, or could be a wired Internet connection. Indeed, such generality allows use of the Internet as a “link layer” within the framework we envision, but still allows our envisioned framework to be “self-contained” as discussed earlier. This enables our framework to be deployed before support for wireless peer-to-peer communication becomes ubiquitous on mobile devices. It also allows fixed nodes with wired connections (e.g. desktop computers) to be a part of the information dissemination framework we envision.

C. Application Level Overlay Network

An important feature of this model is that it allows the framework to be deployed as an application-level overlay within existing mobile communication devices, and other computational devices, that are already in wide use today. In general, it does not require a new communication protocol stack to be deployed within the operating system kernels of the constituent nodes. Rather, it merely benefits from standardization of how commonly available infrastructure can be configured to support the pairwise communication interactions that define the communication model for the framework.

D. Data-Centric Message Forwarding

In general, the two nodes involved in a pairwise communication interaction may not have control over how long the communication session can last. Therefore, generally, it will make sense to design the protocol accordingly. In particular, the important decisions to be made center around which messages, if any, should be exchanged with the peer, and in what order. This is why we describe our framework as data-centric. In a sense, this data message selection problem is akin to the design of a flow control policy, as well as a message forwarding policy, inasmuch as the associated algorithms determine which messages get delivered, and which paths the messages follow.

Next, we describe an abstract mathematical model for messages and user preferences.

III. DATA MODEL

We shall assume there exists a common namespace $\mathcal{L}$ which is used to label the messages. For example, $\mathcal{L}$ could be the space of sequences of alpha-numeric characters up to a given length. We assume that each message has a unique label $l \in \mathcal{L}$ which we use to identify the message. In practice, the uniqueness property can be achieved with high probability in a distributed manner if the authors of messages use randomization to select a subset of the label, and the cardinality of the namespace $\mathcal{L}$ is sufficiently large.

A. Preference and Ranking Functions

We assume that there are $M$ messages in the system. Initially here we assume that $M$ is constant with time, but later we consider a model where the number of messages in the system at time $t$, $M(t)$, may increase with time. We make a one-to-one correspondence between messages and their labels, so we let $\mathcal{M}(t) \subset \mathcal{L}$ be the set of labels of messages which exist at time $t$.

We consider an arbitrary user node, indexed by the integer $i$, and wish to describe user $i$’s preferences among all the $M$ messages. The preference function $P_i(\cdot)$ is such that $P_i(1) \in \mathcal{M}$ is user $i$’s most favored message, $P_i(2) \in \mathcal{M}$ is user $i$’s second most favored message, etc. Message $P_i(M)$ is the label of user $i$’s least favorite message. Note that this model is an abstract one: user $i$ may not explicitly be aware of the preference function $P_i(\cdot)$ because she may not have seen or examined the set of all messages. However, it is assumed that the answer to any query to user $i$, when given knowledge about the messages, will be consistent with the underlying preference function. For example, suppose user $i$ is asked: which item do you like more, $l_A$ or $l_B$? In this case user $i$ will answer that she likes the message with label $l_A$ better if and only if $R_i(l_A) < R_i(l_B)$, where $R_i$ is the inverse function of $P_i(\cdot)$, i.e. $R_i(x) = y$ if $P_i(y) = x$.
We call the function $R_i(\cdot)$ the ranking function for user $i$. Note that user $i$’s rank of message $l \in \mathcal{M}$ is $R_i(l)$, where the rank is $r$ if it is the $r$-th most preferred message. At a given time $t$, a user may only be aware of a subset of messages $S(t) \subset \mathcal{M}$. We define the ranking function $R_i^t(\cdot)$ for user $i$ at time $t$ to be the ranking of only the known messages $S(t)$ at time $t$. For example, $R_i^t(l) = 1$ implies that $l \in S(t)$ and $R_i(m) > R_i(l)$ for all other messages $m \in S(t)$.

We consider a statistical model for the preference function of a given user, which is consistent with the so-called Zipf distribution [1]. First, we assume there exists a global ranking function $R(\cdot)$ for all messages, which in general is not initially known to the users of the network. For this ranking function $R(\cdot)$, we denote by $l_1$ the most popular message, and in general $l_r$ is the $r$-th most favored message. Thus $R(l_r) = r$ for $r = 1, 2, \ldots, M$.

Conceptually, we can generate the preferences for user $i$ by placing $M = |\mathcal{M}|$ balls into an urn, where the volume of the ball corresponding to message $l_r$ is $r^{-Z}$. Each time we draw a ball from the urn, we do so randomly in accordance with the volume of the balls, so that the probability of picking a ball is proportional to its volume. The probability that message $l_r$ is the most favored message of user $i$ is $r^{-Z}/K$, where

$$K = \sum_{r=1}^{M} r^{-Z}$$

is a normalizing constant and $Z \ge 1$ is the Zipf parameter. To determine the second most favored message of user $i$, the first randomly chosen ball is left outside the urn (no “replacement”), and we again choose a ball at random from those remaining; the identity of that ball corresponds to the second most favored message. The process continues until there is only one ball left in the urn, which corresponds to the least favored message of user $i$. The initial model we consider is where the user preferences for distinct users are statistically independent and identically distributed.
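The urn construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: messages are identified by their global rank $r = 1, \ldots, M$, and the function names `sample_preferences` and `ranking_from_preferences` are hypothetical, not taken from the paper.

```python
import random


def sample_preferences(num_messages, zipf_z, rng):
    """Draw one user's preference order P_i with the urn model.

    Messages are identified by their global rank r = 1..M (an assumption
    made for this sketch). Ball r has volume r**(-Z); balls are drawn
    without replacement with probability proportional to volume.
    """
    remaining = list(range(1, num_messages + 1))
    volumes = [r ** (-zipf_z) for r in remaining]
    order = []
    while remaining:
        # Pick the index of one remaining ball, weighted by its volume.
        idx = rng.choices(range(len(remaining)), weights=volumes, k=1)[0]
        order.append(remaining.pop(idx))
        volumes.pop(idx)
    return order  # order[0] is P_i(1), order[-1] is P_i(M)


def ranking_from_preferences(preference):
    """Invert P_i to obtain the ranking function R_i as a dict."""
    return {label: rank for rank, label in enumerate(preference, start=1)}


rng = random.Random(0)
P_i = sample_preferences(10, 2.25, rng)
R_i = ranking_from_preferences(P_i)
```

Calling `sample_preferences` once per user with independent draws gives statistically independent, identically distributed preference orders. The loop is quadratic in the number of messages, which is adequate for a small illustration but not tuned for large $M$.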
We call this model the “Zipf-product form” model for user preferences.

B. Weighting Functions

Note that message $l_1$ is the most popular message in the model, in a statistical sense, but it may not be the most favorite for a given user $i$. Indeed message $l_1$ is not even guaranteed to be the most popular message among all users, in an empirical sense, though it likely is. To define popularity concretely, we define a weighting function, $W_i(\cdot)$, for each user $i$. We require $W_i(r)$ to be non-negative and non-increasing in $r$, such that

$$\sum_{r=1}^{M} W_i(r) = 1 .$$

We interpret $W_i(r)$ as the weight that user $i$ assigns to her message with rank $r$. Thus, the weight assigned by user $i$ to an arbitrary message $l$ at time $t$ is $W_i(R_i^t(l))$. An example of such a weighting function $W_i(\cdot)$ is $W_i(r) = 1/k$ for $r \in [1, k]$ and $W_i(r) = 0$ otherwise. This would correspond to uniform weighting among the $k$ most favored items, for some constant $k$. Exponentially decreasing weighting functions are also possible.

C. Happiness and Global Popularity

Let $C_i(t) \subset \mathcal{M}(t)$ be the set of all messages that are cached (stored) by user $i$ at time $t$. We define the happiness of user $i$ at time $t$ to be the sum

$$H_i(t) = \sum_{l \in C_i(t)} W_i(R_i(l)) .$$

Due to the normalization constraint we placed on the weighting function $W_i(\cdot)$, it follows that $0 \le H_i(t) \le 1$. Note here that the ranking function $R_i(\cdot)$ for each user is defined relative to the entire universe of messages at time $t$, $\mathcal{M}(t)$, and may not be known to each user at time $t$. The normalized total system happiness at time $t$ is defined to be

$$H(t) = \frac{1}{N} \sum_{i \in \mathcal{N}} H_i(t) ,$$

where $N$ is the number of nodes in the network and $\mathcal{N}$ is the set that indexes them. Note that $0 \le H(t) \le 1$. Our ultimate aim is to maximize the normalized total system happiness, subject to constraints on resource use by each user.

We shall see that in order for users to discover the messages they most prefer, it is useful to estimate the “popularity” of each message. This is because on the average, users will prefer messages that are more popular over messages that are less popular.
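As a small worked illustration of the definitions above, the following Python sketch evaluates the uniform top-$k$ weighting function, the per-user happiness $H_i(t)$ and the normalized system happiness $H(t)$. The helper names and the toy data are assumptions made for the example; rankings are represented as dictionaries from label to rank.

```python
def uniform_top_k_weight(k):
    """Return W(r) = 1/k for 1 <= r <= k and 0 otherwise."""
    def weight(rank):
        return 1.0 / k if 1 <= rank <= k else 0.0
    return weight


def happiness(cache, ranking, weight):
    """H_i(t): sum of W_i(R_i(l)) over the labels l held in the cache."""
    return sum(weight(ranking[label]) for label in cache)


def normalized_system_happiness(caches, rankings, weights):
    """H(t): average of the per-user happiness values."""
    total = sum(happiness(c, r, w) for c, r, w in zip(caches, rankings, weights))
    return total / len(caches)


# Toy example: four messages, one user who weights her top two equally.
ranking = {"a": 1, "b": 2, "c": 3, "d": 4}
W = uniform_top_k_weight(2)
h_partial = happiness({"a", "c"}, ranking, W)   # holds one of her top two
h_full = happiness({"a", "b"}, ranking, W)      # holds both of her top two
h_system = normalized_system_happiness(
    [{"a", "c"}, {"a", "b"}], [ranking, ranking], [W, W]
)
```

With this weighting a user reaches happiness 1 exactly when her cache contains all of her $k$ most favored messages, so `h_partial` is 0.5 and `h_full` is 1.0.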
To define popularity, suppose that there are $N$ nodes in the network, indexed by the set $\mathcal{N}$. We define the total popularity of a message $l$ to be $T(l)$, where

$$T(l) = \sum_{i \in \mathcal{N}} W_i(R_i(l)) .$$

In the algorithm we shall present shortly, users estimate the global popularity $T(l)$ for each message and attempt to download the most popular messages first.

IV. SYSTEM LEVEL MODELS

Now that we have defined the service model, the network model, and the data message model, we propose a simple system level problem formulation as well as an algorithm for information dissemination networks. In order to help gain insight, we first consider a static problem where the number of messages $M$, the number of users $N$, and the user preferences $P_i(\cdot)$ are constant with time. We are interested in the case where $N$ and $M$ are very large.

A. Static Case: Cooperative Information Dissemination

We suppose that initially, each message is replicated $c - 1$ times, where $c \ge 1$, and that the $c$ copies of each message (including the original) are randomly distributed to the $N$ users, according to a uniform distribution. Thus, initially each user is in possession of a random set of messages. The average number of messages at each node initially is thus equal to $cM/N$. Note that $c$ is a parameter of the system that describes how widely each message is distributed initially. For example, one possible scenario covered by our model is the case where each user generates one message (i.e. $M = N$), and then “pushes” the message to $c - 1$ other random users in the network. It is also possible that the messages come from an external source, and are randomly pushed to users, so that $M$ may be independent of $N$. Later we can consider the case where different messages can have differing amounts of initial promotion, i.e. a different value of $c$ for each message.

B. A Distributed Algorithm for Information Dissemination

Next, we describe a simple distributed algorithm whose objective is to maximize total system happiness, for the static message model above. What we describe here is a slight adaptation of the algorithm proposed in [7].

Each user $i$ maintains an ordered list of size $k_i$, consisting of the labels of the $k_i$ most favored messages, in order of the user’s preference. This list is called the local cache, since it depends on the user’s (local) preferences. Initially, the local cache of each user holds all of the messages initially present at the node, and the user sorts the cache according to her own preferences. If the initial number of messages (with mean $cM/N$) for user $i$ is larger than the size of her cache, $k_i$, then the least favored messages are discarded, and the cache for user $i$ holds the $k_i$ most preferred messages among those initially distributed to her.

Each user also maintains a global list of messages she has heard about, but does not necessarily have in her possession. For each item in the global list, an estimated total popularity value for that item is maintained, where the initial estimated total popularity value for each item is zero. In order to cope with the large number of possible messages, the estimated popularity values for each item in the global list, as well as the global list itself, can be implemented with a hash table.

The algorithm proceeds as follows. For simplicity we describe the algorithm in terms of synchronized time steps, called iterations, but the algorithm can easily be made asynchronous. In each time step, each user $i$ may poll another random user $j$ in the system. Several concurrent pairwise interactions (or “polls”) may occur in a single iteration of the algorithm, but here we describe the actions of a single pairwise interaction.
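The quantities each user works with can be made concrete with a short Python sketch. It computes the exact total popularity $T(l)$ from full rankings (something no single user can do in the real system, where $T(l)$ is only estimated) and builds an initial local cache. The function names and the three-user toy data are assumptions for illustration only.

```python
def total_popularity(label, rankings, weights):
    """T(l): sum over all users i of W_i(R_i(l))."""
    return sum(w(r[label]) for r, w in zip(rankings, weights))


def initial_local_cache(initial_labels, ranking, cache_size):
    """Sort the initially held messages by preference, keep the best k_i."""
    return sorted(initial_labels, key=lambda l: ranking[l])[:cache_size]


def top2(rank):
    """Uniform weighting over the two most favored messages."""
    return 0.5 if rank <= 2 else 0.0


# Three users with different full rankings of messages a, b, c.
rankings = [
    {"a": 1, "b": 2, "c": 3},
    {"b": 1, "a": 2, "c": 3},
    {"c": 1, "a": 2, "b": 3},
]
weights = [top2, top2, top2]

T = {l: total_popularity(l, rankings, weights) for l in ("a", "b", "c")}
cache = initial_local_cache(["c", "a", "b"], rankings[0], cache_size=2)

# The global list of estimated popularities starts empty; a Python dict
# plays the role of the hash table mentioned in the text, with a missing
# key standing for an estimate of zero.
popularity_estimates = {}
```

Because every weighting function sums to one, the values of $T(l)$ over all messages sum to the number of users, which gives a quick sanity check on any implementation.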
When user $i$ polls user $j$ at time $t$, user $j$ replies with the ordered content of user $j$’s local cache, as well as the weighting function preferred by user $j$. Equivalently, the polled user $j$ shares the values of $W_j(R_j^t(l))$ for each message with label $l$ in user $j$’s local cache. When the polling user $i$ receives this information, the estimated total popularities for the items in user $j$’s local cache are updated within user $i$’s global list of total popularity values. In particular, the estimated total popularity value for an item $l$ is incremented by the value $W_j(R_j^t(l))$, which can be determined from user $j$’s reply. Note that the ranking function $R_j^t(\cdot)$ here is relative to user $j$’s knowledge of messages up to the current time $t$, which is not necessarily the same as the ranking function $R_j(\cdot)$ for user $j$ assuming knowledge of all messages in the system.

After learning about the contents of user $j$’s local cache, user $i$ then determines which messages, if any, are in user $j$’s local cache that user $i$ has not seen before. Among this set of “new” messages, user $i$ determines which message has the highest total popularity estimate, according to user $i$’s own global list. User $i$ then requests that message from user $j$. Optionally, other messages contained in user $j$’s local cache may also be requested, in order of user $i$’s ranking in terms of total popularity. Here we assume that user $j$ grants any such request and sends the requested message(s) to user $i$.

After a user polls another user, and has received the requested message from that user, she examines that message and compares it to each item in her local cache. The local cache is then updated in accordance with the user’s preferences, possibly adding the new message to the cache and causing the least favorite message in the local cache to be removed. If a user receives multiple messages, then this process is repeated for each new message received.
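A single pairwise interaction as described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `User` class, its field names and the `poll` function are hypothetical; a message counts as “seen” once the user has held it; exactly one message is requested per poll; and the position of a label in the ordered cache is used as its rank $R_j^t(l)$, which holds as long as the cache always keeps the user's best known messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class User:
    ranking: Dict[str, int]             # R_i, consulted only to compare messages
    weight: Callable[[int], float]      # W_i
    cache_size: int                     # k_i
    cache: List[str] = field(default_factory=list)   # most preferred first
    seen: Set[str] = field(default_factory=set)
    popularity: Dict[str, float] = field(default_factory=dict)

    def receive(self, label: str) -> None:
        """Examine a message and update the ordered local cache."""
        self.seen.add(label)
        self.cache.append(label)
        self.cache.sort(key=lambda l: self.ranking[l])
        del self.cache[self.cache_size:]    # evict the least favored overflow

    def reply(self) -> Dict[str, float]:
        """Answer a poll with W_j(R_j^t(l)) for every cached label l."""
        return {l: self.weight(pos) for pos, l in enumerate(self.cache, start=1)}


def poll(i: User, j: User) -> Optional[str]:
    """User i polls user j; returns the label downloaded, if any."""
    reply = j.reply()
    for label, w in reply.items():
        i.popularity[label] = i.popularity.get(label, 0.0) + w
    new = [l for l in reply if l not in i.seen]
    if not new:
        return None
    best = max(new, key=lambda l: i.popularity[l])
    i.receive(best)     # j is assumed to grant the request
    return best


def top2(rank: int) -> float:
    return 0.5 if rank <= 2 else 0.0


alice = User({"a": 1, "b": 2, "c": 3, "d": 4}, top2, cache_size=2)
bob = User({"b": 1, "a": 2, "c": 3, "d": 4}, top2, cache_size=2)
for label in ("c", "d"):
    alice.receive(label)
for label in ("b", "a"):
    bob.receive(label)

first = poll(alice, bob)
```

In this toy run both of Bob's cached messages are new to Alice and tie on estimated popularity, so the first one listed is requested; a real deployment would need an explicit tie-breaking rule, which the text does not specify.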
This completes the operation of the system in the current time step. In the next time step, the entire process is repeated.

C. Simulation Results

We ran several Monte Carlo simulation experiments for the above model. In all of our experiments, we set $N = M$, so that the numbers of messages and users are equal. We assume that each user’s local cache size $k_i$ is identical, with $k_i = K = 100$ messages for all users. In each system iteration, we assume that each user polls another user in the network, chosen at random. We set the initial replication factor to be $c = 30$ in all simulations we report here. We assumed that each node downloads at most one message in each iteration.

We assume that each user is interested in obtaining the top $G = K/2 = 50$ messages, according to that user’s preference function. Among the top $G$ messages, each user gives the messages equal priority. To model this, we set the weighting function $W_i(\cdot)$ for each user to be the same, namely $W_i(r) = 1/G$ for $r = 1, 2, \ldots, G$ and $W_i(r) = 0$ otherwise. Thus, each user’s total happiness is maximized with a value of 1 if and only if the top $G$ messages among the entire universe of messages, according to that user’s preferences, are downloaded by that user.

Figure 1 shows the results of a single simulation run, with $N = M = 1000$ and the Zipf parameter $Z$ set to 2.25. We found only a small percentage variation in the results across different simulation runs for the same parameter values. This suggests that when we let the network size $N$ grow to infinity, the normalized quantities of interest approach a deterministic limit. We also ran the same simulation scenarios with the values of $N$ and $M$ both doubled, and the corresponding normalized results remained essentially the same. This suggests that the simulation results are indicative of the performance when $N$ and $M$ are scaled to infinity, leaving all other parameters the same.

Fig. 1. Typical behavior of simulation run. Zipf parameter Z = 2.25.

The bottom graph in the figure shows the normalized total system happiness versus the iteration count per node. It is apparent that the total system happiness quickly approaches the optimal value of 1. In the figure, the normalized total system happiness has a value of 97% after roughly 350 iterations per node. When the number of system iterations increases beyond 350, for this particular simulation run, the normalized system happiness converges to approximately 0.985, which is strictly less than 1. After this convergence, the nodes stop exchanging messages, in accordance with the algorithm specification. Intuitively, in this case a few users with unusual preferences are unable to find the messages they most prefer because those messages are globally unpopular, and became “extinct” from the system in the early stages of the algorithm execution. It is believed that if messages are constantly re-introduced to the system at a non-zero rate, then the total system happiness could in fact converge to 100%, i.e. each user receives all of her top $G = 50$ messages. From initial experimentation with the parameters of the original model, it is apparent that the stable value of the normalized total system happiness can be made arbitrarily close to 1 by setting the local cache size $K$ and initial replication factor $c$ sufficiently large.

The top graph in the figure shows the average number of messages downloaded per node versus the iteration count. In order to normalize this quantity, we divide it by the cache size $K = 100$. From the graph it is apparent that each node downloads roughly 160 messages by the time the total system happiness reaches 97%. Recall that each time a message is downloaded by a user, the user needs to examine the message and compare it to all messages in her local cache.
Thus, this quantity in some sense measures the total amount of “work” that is done by each user. In the graph, it is apparent that the initial number of messages held in each local cache is $c = 30$, corresponding to a normalized value of 0.3. This is because we count the initial distribution of messages towards the downloaded message count. Note that the slope of the graph is equal to 1 until the iteration count reaches 70, which is when all of the local caches fill to capacity, since during this time each node will always download exactly one message per iteration.

We expect that the algorithm will tend to operate more efficiently as the Zipf parameter $Z$ is increased. To examine this more closely, we repeated the simulation scenario above for values of $Z$ ranging from $Z = 1$ to $Z = 3.5$. (Larger values of $Z$ lead to machine-dependent numerical underflow issues.) For each value of $Z$ we measured the required number of iterations per node until the normalized system happiness reaches 97%. From the figure it is apparent that this value is roughly 350 when $Z = 2.25$. As $Z$ varies over this range, the number of required iterations changes from roughly 1400 ($Z = 1$) to 150 ($Z = 3.5$). With smaller values of $Z$, the system exhibited more “randomness”, which was expected. We also measured the required number of messages downloaded (normalized again by the local cache size $K$) in order to reach 97% happiness, and this quantity ranged from about 3.0 (for $Z = 1$) to about 1.3 (for $Z = 3.5$). This range for $Z$ is thought to be a “practical” one, based on empirical measurements for known shared media types. We believe these experiments suggest that the proposed algorithm can be made to operate efficiently in practice.

D. Discussion

In practice, we expect that new messages in the system will be constantly introduced, and user preferences may change over time.
In this scenario, we would like the algorithm to “track” the changes in the system, so that at all times, each user will retrieve the most interesting messages for that user. The algorithm above can be modified for this case by adjusting the calculation for global popularity estimation. In particular, the global popularity estimates may be multiplied by a decay factor $\eta < 1$ after each iteration. The constant $\eta$ can be chosen appropriately to match the time constant of change in the system. We leave the details of this for future work.

Here we have assumed that each polled user complies with any request for message downloading. In general, the user cost associated with uploading messages to another user may be non-negligible, and may need to be factored into the model (e.g. see [6]).

In practice, users may wish to store globally popular messages, even if they do not rank high enough to justify keeping them in the local cache. Each pairwise interaction between nodes may in general support “queries” about specific messages. By keeping a cache of globally popular messages, each user is in a better position to negotiate the terms of a possible transfer of messages between the pair of nodes. This is an interesting area for future study.

REFERENCES

[1] Lada A. Adamic, Zipf, Power-law, Pareto - a ranking tutorial, http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
[2] G. Costantino, F. Martinelli, and P. Santi, Privacy-Preserving Interest-Casting in Opportunistic Networks, Proc. IEEE Wireless Communications and Networking Conference (WCNC), 2012.
[3] K. Fall, A Delay Tolerant Network Architecture for Challenged Internets, ACM SIGCOMM, August 2003.
[4] Yih-Chun Hu, David Johnson, and Adrian Perrig, SEAD: secure efficient distance vector routing for mobile wireless ad hoc networks, Ad Hoc Networks, vol. 1, 2003, Elsevier.
[5] A. Mei, G. Marabito, P. Santi, J. Stefa, Social-Aware Stateless Forwarding in Pocket Switched Networks, IEEE Infocom, 2011.
[6] T. Ning, Z. Yang, X. Xie, and H. Wu, Incentive-Aware Data Dissemination in Delay-Tolerant Mobile Networks, IEEE SECON, 2011.
[7] Efecan Poyraz and R. L. Cruz, Distributed Information Dissemination, CISS 2012, Princeton University, March 2012.