PACK: Prediction-Based Cloud Bandwidth and Cost Reduction System

Abstract:
In this paper, we present PACK (Predictive ACKs), a novel end-to-end traffic redundancy elimination (TRE) system designed for cloud computing customers. Cloud-based TRE needs to apply a judicious use of cloud resources so that the bandwidth cost reduction, combined with the additional cost of TRE computation and storage, is optimized. PACK's main advantage is its capability of offloading the cloud-server TRE effort to end-clients, thus minimizing the processing costs induced by the TRE algorithm. Unlike previous solutions, PACK does not require the server to continuously maintain clients' status. This makes PACK very suitable for pervasive computation environments that combine client mobility and server migration to maintain cloud elasticity. PACK is based on a novel TRE technique that allows the client to use newly received chunks to identify previously received chunk chains, which in turn can be used as reliable predictors of future transmitted chunks. We present a fully functional PACK implementation, transparent to all TCP-based applications and network devices. Finally, we analyze PACK's benefits for cloud users, using traffic traces from various sources.

Introduction:
Cloud customers pay only for their actual use of computing resources, storage, and bandwidth, according to their changing needs, utilizing the cloud's scalable and elastic computational capabilities. In particular, data transfer costs (i.e., bandwidth) are an important issue when trying to minimize costs. Consequently, cloud customers, applying a judicious use of the cloud's resources, are motivated to use various traffic reduction techniques, in particular traffic redundancy elimination (TRE), for reducing bandwidth costs. Traffic redundancy stems from common end-users' activities, such as repeatedly accessing, downloading, uploading (i.e., backup), distributing, and modifying the same or similar information items (documents, data, Web, and video). TRE is used to eliminate the transmission of redundant content and, therefore, to significantly reduce the network cost. In most common TRE solutions, both the sender and the receiver examine and compare signatures of data chunks, parsed according to the data content, prior to their transmission. When redundant chunks are detected, the sender replaces the transmission of each redundant chunk with its strong signature. Commercial TRE solutions are popular in enterprise networks, and involve the deployment of two or more proprietary-protocol, state-synchronized middle-boxes at both the intranet entry points of data centers and branch offices, eliminating repetitive traffic between them (e.g., Cisco, Riverbed, Quantum, Juniper, Blue Coat, Expand Networks, and F5). While proprietary middle-boxes are popular point solutions within enterprises, they are not as attractive in a cloud environment. Cloud providers cannot benefit from a technology whose goal is to reduce customer bandwidth bills, and thus are not likely to invest in one. The rise of "on-demand" workspaces, meeting rooms, and work-from-home solutions detaches workers from their offices. In such a dynamic work environment, fixed-point solutions that require a client-side and a server-side middle-box pair become ineffective. On the other hand, cloud-side elasticity motivates work distribution among servers and migration among data centers.
Therefore, it is commonly agreed that a universal, software-based, end-to-end TRE is crucial in today's pervasive environment. This enables the use of a standard protocol stack and makes TRE within end-to-end secured traffic (e.g., SSL) possible. Current end-to-end TRE solutions are sender-based. In the case where the cloud server is the sender, these solutions require that the server continuously maintain clients' status. We show here that cloud elasticity calls for a new TRE solution. First, cloud load balancing and power optimizations may lead to a server-side process and data migration environment, in which TRE solutions that require full synchronization between the server and the client are hard to accomplish or may lose efficiency due to lost synchronization. Second, the popularity of rich media that consume high bandwidth motivates content distribution network (CDN) solutions, in which the service point for fixed and mobile users may change dynamically according to the relative service point locations and loads. Clearly, a TRE solution that puts most of its computational effort on the cloud side may turn out to be less cost-effective than one that leverages the combined client-side capabilities. Given an end-to-end solution, we have found through our experiments that sender-based end-to-end TRE solutions add a considerable load to the servers, which may eradicate the cloud cost savings addressed by the TRE in the first place. Our experiments further show that current end-to-end solutions also suffer from the requirement to maintain end-to-end synchronization, which may result in degraded TRE efficiency. In this paper, we present a novel receiver-based end-to-end TRE solution that relies on the power of predictions to eliminate redundant traffic between the cloud and its end-users. In this solution, each receiver observes the incoming stream and tries to match its chunks with a previously received chunk chain or a chunk chain of a local file. Using the long-term chunk metadata kept locally, the receiver sends to the server predictions that include chunks' signatures and easy-to-verify hints of the sender's future data. The sender first examines the hint and performs the TRE operation only on a hint match. The purpose of this procedure is to avoid the expensive TRE computation at the sender side in the absence of traffic redundancy. When redundancy is detected, the sender sends to the receiver only the ACKs to the predictions, instead of sending the data. On the receiver side, we propose a new computationally lightweight chunking (fingerprinting) scheme termed PACK chunking. PACK chunking is a new alternative to the Rabin fingerprinting traditionally used by RE applications. Our experiments show that this approach can reach data processing speeds over 3 Gb/s, at least 20% faster than Rabin fingerprinting.
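To make the chunking step concrete, the following is a minimal sketch, in Java (the project's implementation language), of content-defined chunking in the spirit of PACK chunking: a cheap rotate-and-XOR rolling hash over a small sliding window, used in place of Rabin fingerprinting. The window size, mask, and chunk-size bounds are illustrative assumptions, not the parameters of the actual system.

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative content-defined chunker using a rotate-XOR rolling hash. */
    public class SimpleChunker {
        private static final int WINDOW = 48;      // sliding-window size (assumption)
        private static final long MASK = 0x1FFF;   // 13 bits set -> ~8 KB average chunk (assumption)
        private static final int MIN_CHUNK = 2 * 1024;
        private static final int MAX_CHUNK = 64 * 1024;

        /** Returns chunk end offsets (exclusive) for the given byte stream. */
        public static List<Integer> boundaries(byte[] data) {
            List<Integer> cuts = new ArrayList<Integer>();
            long hash = 0;
            int chunkStart = 0;
            for (int i = 0; i < data.length; i++) {
                // Slide the window: rotate, mix in the new byte, drop the byte leaving the window.
                hash = Long.rotateLeft(hash, 1) ^ (data[i] & 0xFFL);
                if (i >= WINDOW) {
                    hash ^= Long.rotateLeft(data[i - WINDOW] & 0xFFL, WINDOW);
                }
                int len = i - chunkStart + 1;
                if ((len >= MIN_CHUNK && (hash & MASK) == MASK) || len >= MAX_CHUNK) {
                    cuts.add(i + 1);               // declare a boundary after position i
                    chunkStart = i + 1;
                }
            }
            if (chunkStart < data.length) {
                cuts.add(data.length);             // trailing partial chunk
            }
            return cuts;
        }
    }

Because a boundary depends only on the bytes inside the sliding window, a local edit to the stream disturbs only nearby boundaries, so the receiver can keep matching previously stored chunk chains further downstream.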
Literature Review:

Finding Similar Files in a Large File System
This paper presents a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time for finding all groups of similar files, even for as little as 25% similarity, is on the order of 500 MB to 1 GB an hour. The amount of similarity and several other customized parameters can be determined by the user at a post-processing stage, which is very fast. Sif can also be used to very quickly identify all files similar to a query file using a preprocessed index. Applications of sif can be found in file management, information collecting (to remove duplicates), program reuse, file synchronization, data compression, and maybe even plagiarism detection. In query mode, a given file, called the query file, is compared to a large set of files that have already been "fingerprinted." The collection of all fingerprints, denoted All Fingers, is maintained in one large file. Each fingerprint must be associated with the file it came from; this is done by maintaining the names of all files that were fingerprinted and associating with each fingerprint the number of the corresponding file. The first step is to generate the set Query Fingers of all fingerprints for the query file. The next step is to look in All Fingers and compare all fingerprints there to those of Query Fingers. Searching a set is one of the most basic data structure problems, and there are many ways to handle it; the most common techniques use hashing or tree structures. In this case, one can also sort both sets and intersect them. The authors found, however, that a simple solution using multi-pattern matching was just as effective. The fingerprints in All Fingers are stored as they are obtained, without any other structure, one fingerprint together with its file number per line. The file All Fingers is then searched using Query Fingers as the set of patterns. The output of the search is the list of all fingerprints common to Query Fingers and All Fingers. As long as the set All Fingers is no more than a few megabytes, this search is very effective. Comparing all files against all other files is more complicated. It turns out that providing a good way to view the output is one of the major difficulties: the output is a set of sets, or a hypergraph, with some similarity relationships among the elements, and hypergraphs are very hard to view. The authors discuss one approach that is an extension of the one-against-all paradigm, and therefore quite intuitive to view; they experimented with other, more complicated approaches, one of which is mentioned briefly at the end of the paper. The input to the problem is now a directory (or several directories), which is traversed recursively, checking all the files below it, and a threshold number T indicating how similar the files should be. The first stage is identical to the preprocessing: all files are scanned, all fingerprints are recorded in the fingerprint file All Fingers, and the names, checksums, and sizes of all files are recorded in another file. It is useful in this stage to separate files by type (e.g., text, binary, compressed) using a program such as Essence [HS93], so that files of different types are not compared. The current implementation uses only two types, text files and non-text files.
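The query-file mode described above reduces to intersecting two fingerprint sets. As a hedged illustration (sif itself searches a flat fingerprint file with multi-pattern string matching; the class and method names below are invented for this sketch), a hash-based intersection that also counts matches per indexed file looks like this:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    /** Hypothetical sketch: match a query file's fingerprints against an index. */
    public class FingerprintIndex {
        // fingerprint value -> numbers of the files it occurred in
        private final Map<Long, Set<Integer>> allFingers = new HashMap<Long, Set<Integer>>();

        public void add(long fingerprint, int fileNumber) {
            Set<Integer> files = allFingers.get(fingerprint);
            if (files == null) {
                files = new HashSet<Integer>();
                allFingers.put(fingerprint, files);
            }
            files.add(fileNumber);
        }

        /** Count, per indexed file, how many of the query's fingerprints it shares. */
        public Map<Integer, Integer> matchCounts(List<Long> queryFingers) {
            Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
            for (long fp : queryFingers) {
                Set<Integer> files = allFingers.get(fp);
                if (files == null) continue;      // fingerprint unique to the query file
                for (int file : files) {
                    Integer c = counts.get(file);
                    counts.put(file, c == null ? 1 : c + 1);
                }
            }
            return counts;
        }
    }

A post-processing step can then keep only the files whose match count, relative to the number of query fingerprints, exceeds the user's similarity threshold T.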
A Low-Bandwidth Network File System
Efficient remote file access would often be desirable over low-bandwidth networks, particularly when high latency makes remote login sessions unresponsive. Rather than run interactive programs such as editors remotely, users could run the programs locally and manipulate remote files through the file system. To do so, however, would require a network file system that consumes less bandwidth than most current file systems. This paper presents LBFS, a network file system designed for low-bandwidth networks. LBFS exploits similarities between files, or between versions of the same file, to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. Using this technique in conjunction with conventional compression and caching, LBFS consumes over an order of magnitude less bandwidth than traditional network file systems on common workloads. LBFS is designed to save bandwidth while providing traditional file system semantics. In particular, LBFS provides close-to-open consistency: after a client has written and closed a file, another client opening the same file will always see the new contents. Moreover, once a file is successfully written and closed, the data resides safely at the server. These semantics are similar to those of AFS. Other work exploring relaxed consistency semantics may apply to LBFS, but the authors wished to build a file system that could directly substitute for a widely accepted network file system in use today. To save bandwidth, LBFS uses a large, persistent file cache at the client. LBFS assumes clients will have enough cache to contain a user's entire working set of files (a reasonable assumption given the capacities of cheap IDE disks today). With such aggressive caching, most client-server communication is solely for the purpose of maintaining consistency. When a user modifies a file, the client must transmit the changes to the server (since in this model the client might crash or be cut off from the network). Similarly, when a client reads a file last modified by a different client, the server must send it the latest version of the file. LBFS reduces bandwidth requirements further by exploiting similarities between files. When possible, it reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network. Of course, not all applications can benefit from this technique. A worst-case scenario is when applications encrypt files on disk, since two different encryptions of the same file have no commonality whatsoever. Nonetheless, LBFS provides significant bandwidth reduction for common workloads. On both the client and server, LBFS must index a set of files to recognize data chunks it can avoid sending over the network. To save chunk transfers, LBFS relies on the collision-resistant properties of the SHA-1 hash function. The probability of two inputs to SHA-1 producing the same output is far lower than the probability of hardware bit errors. Thus, LBFS follows the widely accepted practice of assuming no hash collisions. If the client and server both have data chunks producing the same SHA-1 hash, they assume the two are really the same chunk and avoid transferring its contents over the network. The central challenge in indexing file chunks to identify commonality is keeping the index a reasonable size while dealing with shifting offsets. As an example, one could index the hashes of all aligned 8-kB data blocks in files. To transfer a file, the sender would transmit only the hashes of the file's blocks, and the receiver would request only blocks not already in its database. Unfortunately, a single byte inserted at the start of a large file would shift all the block boundaries, change the hashes of all the file's blocks, and thereby thwart any potential bandwidth savings.
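As a small, hedged illustration of the signature side of this design, the JDK's standard MessageDigest API can compute the SHA-1 hash that LBFS uses as a chunk's identity (the helper class below is ours, not LBFS code):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Arrays;

    /** Hypothetical helper: SHA-1 signatures as chunk identities. */
    public class ChunkSignature {
        /** Hash one chunk; equal signatures are treated as equal chunks. */
        public static byte[] sha1(byte[] data, int offset, int length) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                md.update(data, offset, length);
                return md.digest();               // 20-byte signature
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError("every JDK must provide SHA-1", e);
            }
        }

        public static boolean sameChunk(byte[] sigA, byte[] sigB) {
            // Following LBFS, assume no SHA-1 collisions between distinct chunks.
            return Arrays.equals(sigA, sigB);
        }
    }

A receiver that already holds a chunk with the same signature can splice in its local copy instead of requesting the bytes, which is exactly the transfer LBFS avoids.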
Redundancy in Network Traffic: Findings and Implications
A large amount of popular content is transferred repeatedly across network links in the Internet. In recent years, protocol-independent redundancy elimination, which can remove duplicate strings from within arbitrary network flows, has emerged as a powerful technique to improve the efficiency of network links in the face of repeated data. Many vendors offer such redundancy elimination middle-boxes to improve the effective bandwidth of enterprise, data center, and ISP links alike. This paper conducts a large-scale trace-driven study of protocol-independent redundancy elimination mechanisms, driven by several terabytes of packet payload traces collected at 12 distinct network locations, including the access link of a large US-based university and those of 11 enterprise networks of different sizes. Based on extensive analysis, the authors present a number of findings on the benefits and fundamental design issues in redundancy elimination systems. Two of the key findings are: (1) a new redundancy elimination algorithm based on winnowing that outperforms the widely used Rabin fingerprint-based algorithm by 5%-10% on most traces and by as much as 35% on some traces; and (2) a surprising finding that 75%-90% of the middle-box's bandwidth savings in the enterprise traces is due to redundant byte strings from within each client's own traffic, implying that pushing redundancy elimination capability to the end hosts, i.e., an end-to-end redundancy elimination solution, could obtain most of the middle-box's bandwidth savings. Redundancy elimination middle-boxes are being widely deployed to improve the effective bandwidth of network access links of enterprises and data centers alike, and to improve link loads in small ISP networks. Driven by the high effectiveness of these systems, recent efforts have also considered how to make them first-class network entities. For instance, protocol-independent redundancy elimination deployed on a wider scale across multiple network routers enables an IP-layer redundancy elimination service; this expands the benefits of redundancy elimination to multiple end-to-end applications and could also enable new routing protocols. Other work has considered how to modify Web applications by introducing data markers that help in-network redundancy elimination mechanisms identify and remove redundancy more effectively. Ditto similarly proposes application-independent caching at nodes in city-wide wireless mesh networks to improve the throughput of data transfers. The shortcoming of the MODP approach is that the fingerprints are chosen based on a global property, i.e., fingerprints have to take certain predetermined values to be chosen. While this results in the desired fraction of fingerprints being chosen across a large packet store, on a per-packet basis the number of fingerprints chosen can differ significantly and is sometimes not enough. In order to guarantee that an adequate number of fingerprints is chosen uniformly from each packet, a local technique such as winnowing is essential. Winnowing was originally introduced in the context of identifying similar documents and can be easily adapted to identifying redundancy in network traffic.
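To make the MODP/winnowing contrast concrete, here is a hedged Java sketch of winnowing's local selection rule: rather than keeping fingerprints whose value matches a globally fixed pattern (as MODP does), keep the maximum fingerprint in every sliding window of w positions, which guarantees at least one selection per window. Names and parameters are illustrative.

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    /** Illustrative winnowing selector over per-position fingerprints. */
    public class Winnower {
        /**
         * hashes[i] is the fingerprint of the shingle starting at position i.
         * Returns the selected positions: the maximum in every window of size w
         * (rightmost on ties), deduplicated across overlapping windows.
         */
        public static List<Integer> select(long[] hashes, int w) {
            Set<Integer> picked = new LinkedHashSet<Integer>();
            for (int start = 0; start + w <= hashes.length; start++) {
                int best = start;
                for (int j = start + 1; j < start + w; j++) {
                    if (hashes[j] >= hashes[best]) best = j;
                }
                picked.add(best);     // every window contributes a fingerprint
            }
            return new ArrayList<Integer>(picked);
        }
    }

This per-window guarantee is precisely what MODP lacks on a per-packet basis.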
Packet Caches on Routers: The Implications of Universal Redundant Traffic Elimination
Many past systems have explored how to eliminate redundant transfers from network links and improve network efficiency. Several of these systems operate at the application layer, while the more recent systems operate on individual packets. A common aspect of these systems is that they apply to localized settings, e.g., at stub network access links. This paper explores the benefits of deploying packet-level redundant content elimination as a universal primitive on all Internet routers. Such a universal deployment would immediately reduce link loads everywhere. However, the authors argue that far more significant network-wide benefits can be derived by redesigning network routing protocols to leverage the universal deployment. They develop "redundancy-aware" intra- and inter-domain routing algorithms and show that these enable better traffic engineering, reduce link usage costs, and enhance ISPs' responsiveness to traffic variations. In particular, employing redundancy elimination across redundancy-aware routes can lower intra- and inter-domain link loads by 10%-50%. The current software router implementation can run at OC-48 speeds. The paper considers the benefits of deploying packet-level redundant content elimination as a primitive IP-layer service across the entire Internet. It starts with the assumption that all future routers will have the ability to strip redundant content from network packets on the fly, by comparing packet contents against recently forwarded packets stored in a cache. Routers immediately downstream can reconstruct full packets from their own cache. Applying this technology at every link would provide immediate performance benefits by reducing the overall load on the network. It also enables new functionality: for example, it simplifies application-layer multicast by eliminating the need to be careful about duplicate transmissions. "Inter-domain" traffic is the set of all packets traversing the ISP whose destinations are routable only through peers of the ISP. The authors consider two approaches: local and cooperative. The local approach applies to an ISP selecting its next-hop ASes in BGP, as well as the particular exit point(s) into the chosen next hop. In this approach, an ISP aggregates its inter-domain traffic over a selected set of next hops and the corresponding exit points so as to aggregate potentially redundant traffic onto a small number of network links. Thus, the ISP can significantly reduce the impact that inter-domain traffic imposes on its internal and peering links. To compute routes in this manner, the ISP must track (1) the amount of redundant content that is common to different destination prefixes and (2) the route announcements from peers to the destinations. For simplicity, the cooperative approach is described for the case where just two ISPs coordinate their inter-domain route selection; it can be extended to multiple ISPs. The approach works as follows: rather than compute inter-domain routes in isolation, each ISP tries to aggregate the redundant content in inter-domain traffic together with the redundant content in its intra-domain traffic, so as to bring down the overall utilization of all participating networks. Thus, the key difference from the intra-domain routing formulation is that the input graph used by either ISP is the union of the topologies of the two networks and the peering links. The inputs to an ISP's linear program are its intra-domain redundancy profiles and the inter-domain profile for traffic between ingresses and egresses in the neighbor. The output of an ISP's formulation includes its intra-domain routes and a list of exit points for traffic destined to egresses in the neighbor (and how to split inter-domain traffic across the exit points).
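To make "redundancy-aware" routing more concrete, here is a rough, stylized linear program of our own construction (not the paper's exact formulation). Let demand $d$ have volume $v_d$, let $x_{d,e} \in [0,1]$ be the fraction of $d$ routed over link $e$ (with flow conservation enforced per demand), let $c_e$ be the capacity of link $e$, and let $s_{d,d'}$ be the measured volume of content shared between demands $d$ and $d'$ (the redundancy profile). The ISP minimizes the worst-case link utilization $U$:

\[
\begin{aligned}
\min_{x,\,y,\,U}\;\; & U \\
\text{s.t.}\;\; & \sum_{d} v_d\, x_{d,e} \;-\; \sum_{d \neq d'} s_{d,d'}\, y_{d,d',e} \;\le\; U\, c_e \qquad \forall\, e,\\
& y_{d,d',e} \;\le\; x_{d,e}, \qquad y_{d,d',e} \;\le\; x_{d',e} \qquad \forall\, d \neq d',\; e,\\
& x \text{ satisfies per-demand flow conservation}, \qquad x,\, y \in [0,1].
\end{aligned}
\]

Shared bytes $s_{d,d'}$ are subtracted from a link's load only when both demands traverse that link (the optimizer pushes $y_{d,d',e}$ up to $\min(x_{d,e}, x_{d',e})$), so the optimum naturally steers mutually redundant demands onto common links. That is the intuition behind the 10%-50% load reductions reported above.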
SmartRE: An Architecture for Coordinated Network-Wide Redundancy Elimination
Application-independent redundancy elimination (RE), i.e., identifying and removing repeated content from network transfers, has been used with great success for improving network performance on enterprise access links. Recently, there is growing interest in supporting RE as a network-wide service. Such a network-wide RE service benefits ISPs by reducing link loads and increasing the effective network capacity to better accommodate the increasing number of bandwidth-intensive applications. Further, a network-wide RE service democratizes the benefits of RE to all end-to-end traffic and improves application performance by increasing throughput and reducing latencies. While the vision of a network-wide RE service is appealing, realizing it in practice is challenging. In particular, extending single-vantage-point RE solutions designed for enterprise access links to the network-wide case is inefficient and/or requires modifying routing policies. This paper presents SmartRE, a practical and efficient architecture for network-wide RE. The authors show that SmartRE enables more effective utilization of the available resources at network devices, and thus magnifies the overall benefits of network-wide RE. They prototype their algorithms using Click and test the framework extensively using several real and synthetic traces. In SmartRE, a packet can potentially be reconstructed, or decoded, several hops downstream from the location where it was compressed, or encoded. In this respect, SmartRE represents a significant departure from the packet-level RE designs proposed in prior solutions, where each compressed packet is reconstructed at the immediate downstream router. Further, SmartRE uses a network-wide coordinated approach to intelligently allocate encoding and decoding responsibilities across network elements. The discussion focuses on the design of three key elements: ingress nodes, interior nodes, and a central configuration module. Ingress and interior nodes maintain caches storing a subset of the packets they observe. Ingress nodes encode packets: they search for redundant content in incoming packets and encode them with respect to previously seen packets. In this sense, the role of an ingress node is identical in the naive hop-by-hop approach and in SmartRE. The key difference between the hop-by-hop approach and SmartRE is in the design of interior nodes. First, interior elements need not store all packets in their packet cache; they store only a subset, as specified by a caching manifest produced by the configuration module. Second, they have no encoding responsibilities. Interior nodes only decode packets, i.e., expand encoded regions specified by the ingresses using packets in their local packet cache. The configuration module computes the caching manifests to optimize the ISP's objective(s) while operating within the memory and packet-processing constraints of the network elements. Similar to other proposals for centralized network management, it is assumed that this module resides at the network operations center (NOC) and has access to the network's traffic matrix, routing policies, and the resource configurations of the network elements.
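As a hedged sketch of the interior-node role (the data structures and field names below are invented for illustration and do not reproduce SmartRE's actual shim format), decoding amounts to splicing cached bytes back into the encoded packet:

    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical interior-node decoder: expands regions encoded upstream. */
    public class InteriorNode {
        /** One encoded region: "at dstOffset, insert len bytes taken from
         *  offset srcOffset of cached packet srcId". */
        public static class Region {
            final int dstOffset; final long srcId; final int srcOffset; final int len;
            Region(int dstOffset, long srcId, int srcOffset, int len) {
                this.dstOffset = dstOffset; this.srcId = srcId;
                this.srcOffset = srcOffset; this.len = len;
            }
        }

        // Only packets named in this node's caching manifest are stored here.
        private final Map<Long, byte[]> cache = new HashMap<Long, byte[]>();

        public void store(long packetId, byte[] payload) {
            cache.put(packetId, payload);
        }

        /** Rebuild the payload from literal bytes plus cached regions.
         *  Regions must be sorted by dstOffset; totalLen is the decoded size. */
        public byte[] decode(byte[] literals, Region[] regions, int totalLen) {
            byte[] out = new byte[totalLen];
            int litPos = 0, outPos = 0;
            for (Region r : regions) {
                int litChunk = r.dstOffset - outPos;   // literals preceding this region
                System.arraycopy(literals, litPos, out, outPos, litChunk);
                litPos += litChunk; outPos += litChunk;
                byte[] src = cache.get(r.srcId);
                if (src == null) {
                    throw new IllegalStateException("region not covered by this node's manifest");
                }
                System.arraycopy(src, r.srcOffset, out, outPos, r.len);
                outPos += r.len;
            }
            System.arraycopy(literals, litPos, out, outPos, literals.length - litPos);
            return out;
        }
    }

An interior node stores, per its caching manifest, only a subset of passing packets, so a given encoded region is decodable only at the node the configuration module assigned to it.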
Understanding and Exploiting Network Traffic Redundancy
The Internet carries a vast amount and a wide range of content. Some of this content is more popular, and accessed more frequently, than the rest. The popularity of content can be quite ephemeral, e.g., a Web flash crowd, or much more long-lived. A direct consequence of this skew in popularity is that, at any time, a fraction of the information carried over the Internet is redundant. First, the authors study the fundamental properties of the redundancy in the information carried over the Internet, with a focus on network edges. They collect traffic traces at two network edge locations: a large university's access link serving roughly 50,000 users, and a tier-1 ISP network link connected to a large data center. They conduct several analyses over this data: What fraction of bytes is redundant? At what frequency do strings of bytes repeat across different packets? What is the overlap in the information accessed by distinct groups of end-users? Second, they leverage these measurement observations in the design of a family of mechanisms for eliminating redundancy in network traffic and improving overall network performance. The proposed mechanisms can improve the available capacity of single network links as well as balance load across multiple network links. The goal is to understand how to build mechanisms that leverage redundancy in network packets to improve the overall capacity of the network. The effectiveness of such approaches obviously depends on how redundant network traffic really is. Clearly, if only a small fraction of traffic is redundant, then the cost of deploying such mechanisms may not measure up to the observed benefits. Even if network traffic is heavily redundant, in order to build optimal mechanisms that exploit the redundancy to the fullest, one must understand several lower-level properties of the redundant information, e.g., how groups of users share redundant bytes among each other. With these questions in mind, the authors perform several "macroscopic" analyses over the above traces, which help quantify the extent of redundancy, as well as several "microscopic" analyses, which help in understanding the underlying nature and causes of redundancy.

Wide-Area Network Acceleration for the Developing World
Wide-area network (WAN) accelerators operate by compressing redundant network traffic from point-to-point communications, enabling higher effective bandwidth. Unfortunately, while network bandwidth is scarce and expensive in the developing world, current WAN accelerators are designed for enterprise use and are a poor fit in these environments. This paper presents Wanax, a WAN accelerator designed for developing-world deployments. It uses a novel multi-resolution chunking (MRC) scheme that provides high compression rates and high disk performance for a variety of content, while using much less memory than existing approaches. Wanax exploits the design of MRC to perform intelligent load shedding to maximize throughput when running on resource-limited shared platforms. Finally, Wanax exploits the mesh network environments being deployed in the developing world, instead of just the star topologies common in enterprise branch offices.
The key contributions are: (1) a novel multi-resolution chunking (MRC) technique, which provides high compression rates and high disk performance across workloads while having a small memory footprint; (2) an intelligent load-shedding technique that exploits MRC to maximize effective bandwidth by adjusting disk and WAN usage as appropriate; and (3) a mesh peering protocol that exploits higher-speed local peers when possible, instead of fetching only over slow WAN links. The combination of these design techniques makes it possible to achieve high effective bandwidth even with resource-limited shared machines. Motivated by the challenges in the developing world, Wanax is designed around four goals: (1) maximize compression, (2) minimize disk seeks, (3) minimize memory pressure, and (4) exploit local resources, such as the mesh peers above. Wanax works by compressing redundant traffic between a pair of servers: one near the clients, called R-Wanax, and one closer to the content, called S-Wanax. For developing regions, the S-Wanax is likely to be placed where bandwidth is cheaper. For example, in Africa, where Internet connectivity is often backhauled to Europe via slow and expensive satellite links, the S-Wanax may reside in Europe.
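A hedged sketch of the multi-resolution idea (the structure and parameters are illustrative, not Wanax's exact scheme): perform one fingerprinting pass, and let positions that match more mask bits also close chunks at coarser levels, so every coarse chunk is exactly a concatenation of finer chunks.

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative multi-resolution chunking: one pass, several granularities. */
    public class MultiResolutionChunker {
        /**
         * bits[level] is the number of low fingerprint bits that must be set for
         * a boundary at that level; bits[] must be ascending, so higher-level
         * boundaries are a subset of the lower levels' boundaries.
         */
        public static List<List<Integer>> boundaries(long[] fingerprints, int[] bits) {
            List<List<Integer>> perLevel = new ArrayList<List<Integer>>();
            for (int level = 0; level < bits.length; level++) {
                perLevel.add(new ArrayList<Integer>());
            }
            for (int i = 0; i < fingerprints.length; i++) {
                for (int level = 0; level < bits.length; level++) {
                    long mask = (1L << bits[level]) - 1;
                    if ((fingerprints[i] & mask) == mask) {
                        perLevel.get(level).add(i + 1);  // boundary after position i
                    } else {
                        break;   // fails here, so it fails all coarser levels too
                    }
                }
            }
            return perLevel;
        }
    }

Because coarse boundaries always align with fine ones, a cache miss on a large chunk can be retried on its smaller sub-chunks without rescanning or rehashing the data, which is what enables the intelligent load shedding described above.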
Existing System:
Several commercial TRE solutions have combined sender-based TRE ideas with the algorithmic and implementation approaches of middle-box solutions, along with protocol-specific optimizations; in particular, they show how to get away with a three-way handshake between the sender and the receiver if full state synchronization is maintained. Traffic redundancy stems from common end-users' activities, such as repeatedly accessing, downloading, uploading (i.e., backup), distributing, and modifying the same or similar information items (documents, data, Web, and video). TRE is used to eliminate the transmission of redundant content and, therefore, to significantly reduce the network cost. In most common TRE solutions, both the sender and the receiver examine and compare signatures of data chunks, parsed according to the data content, prior to their transmission. Current end-to-end solutions also suffer from the requirement to maintain end-to-end synchronization, which may result in degraded TRE efficiency. When redundant chunks are detected, the sender replaces the transmission of each redundant chunk with its strong signature. Cloud providers cannot benefit from a technology whose goal is to reduce customer bandwidth bills, and thus are not likely to invest in one. Commercial TRE solutions are popular in enterprise networks, and involve the deployment of two or more proprietary-protocol, state-synchronized middle-boxes at the intranet entry points of data centers. The rise of "on-demand" workspaces, meeting rooms, and work-from-home solutions detaches workers from their offices. In such a dynamic work environment, fixed-point solutions that require a client-side and a server-side middle-box pair become ineffective.

Disadvantages:
Cloud load balancing and power optimizations may lead to a server-side process and data migration environment, in which TRE solutions that require full synchronization between the server and the client are hard to accomplish or may lose efficiency due to lost synchronization.

Proposed System:
PACK's main advantage is its capability of offloading the cloud-server TRE effort to end-clients, thus minimizing the processing costs induced by the TRE algorithm. Unlike previous solutions, PACK does not require the server to continuously maintain clients' status. This makes PACK very suitable for pervasive computation environments that combine client mobility and server migration to maintain cloud elasticity. PACK is based on a novel TRE technique that allows the client to use newly received chunks to identify previously received chunk chains, which in turn can be used as reliable predictors of future transmitted chunks. In this paper, we present a fully functional PACK implementation, transparent to all TCP-based applications and network devices. We present a novel receiver-based end-to-end TRE solution that relies on the power of predictions to eliminate redundant traffic between the cloud and its end-users. In this solution, each receiver observes the incoming stream and tries to match its chunks with a previously received chunk chain or a chunk chain of a local file. Using the long-term chunk metadata kept locally, the receiver sends to the server predictions that include chunks' signatures and easy-to-verify hints of the sender's future data. On the receiver side, we propose a new computationally lightweight chunking (fingerprinting) scheme termed PACK chunking. PACK chunking is a new alternative to the Rabin fingerprinting traditionally used by RE applications.

Advantages:
This solution achieves 30% redundancy elimination without significantly affecting the computational effort of the sender, resulting in a 20% reduction of the overall cost to the cloud customer.

Hardware Requirements:
Processor Type : Pentium IV
Speed : 2.4 GHz
RAM : 256 MB
Hard disk : 20 GB
Keyboard : 101/102 Standard Keys
Mouse : Scroll Mouse

Software Requirements:
• Operating System : Windows 7
• Programming Package : NetBeans IDE 7.3.1
• Coding Language : JDK 1.7

System Architecture:

Modules:
1. User Connect with Bandwidth and Storage Cost to Cloud
2. Upload File
3. Cloud Segment Processing
4. Read & Download File
5. Delete File from Cloud

User Connect with Bandwidth and Storage Cost to Cloud:
The data owner wants to connect to the cloud server, upload his file, and then access that file anywhere, anytime. To do so, he connects to the cloud server with an associated bandwidth and storage cost. Bandwidth and storage costs are charged for uploading and accessing his files; depending on the uploaded files, his bandwidth and storage costs are reduced.

Upload File:
In computer networks, to upload is to send data from an end user to a remote system, such as a cloud server or another client, with the intent that the remote system store a copy of the data being transferred, or to initiate such a process. Where the connection to the remote computers is via a dial-up connection, the transfer time required to download locally and then upload again could increase from seconds to hours or days. Here, the user uploads his file to the cloud. The cloud maintains a chunk store, which holds the chunks and their signatures.

Cloud Segment Processing:
Upon the arrival of new data, the cloud computes the respective signature for each chunk and looks for a match in its local chunk store. If the chunk's signature is found, the cloud determines whether it is part of a formerly received chain, using the chunks' metadata. If so, the cloud sends the sender a prediction for several of the next expected chain chunks. The prediction carries a starting point in the byte stream (i.e., an offset) and the identity of several subsequent chunks (the PRED command). Upon a successful prediction, the user responds with a PRED-ACK confirmation message. Once the PRED-ACK message is received and processed, the receiver copies the corresponding data from the chunk store to its TCP input buffers, placing it according to the corresponding sequence numbers. At this point, the receiver sends a normal TCP ACK with the next expected TCP sequence number. In case the prediction is false, or one or more predicted chunks have already been sent, the sender continues with normal operation, e.g., sending the raw data, without sending a PRED-ACK message.
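A hedged sketch of the sender side of this exchange, reusing the ChunkSignature helper sketched earlier (the message fields follow the description above: offset, length, a cheap hint, and the strong chunk signature; the class names and the specific hint choice are our own illustration, not PACK's wire format):

    import java.util.Arrays;

    /** Hypothetical PRED handling at the sender, per the description above. */
    public class PredictionHandler {
        /** One prediction: "starting at offset, the next len bytes hash to signature". */
        public static class Pred {
            final long offset; final int len;
            final byte hint;            // cheap check, e.g., one sampled byte
            final byte[] signature;     // strong SHA-1 signature of the chunk
            Pred(long offset, int len, byte hint, byte[] signature) {
                this.offset = offset; this.len = len;
                this.hint = hint; this.signature = signature;
            }
        }

        /**
         * Returns true (reply with PRED-ACK instead of data) only if the cheap
         * hint matches first and then the strong signature matches. The hint
         * spares the sender the SHA-1 cost when there is no redundancy.
         */
        public static boolean ack(Pred p, byte[] outgoing, long bufferBase) {
            int start = (int) (p.offset - bufferBase);
            if (start < 0 || start + p.len > outgoing.length) return false;
            byte hint = outgoing[start + p.len / 2];   // illustrative hint choice
            if (hint != p.hint) return false;          // cheap mismatch: send raw data
            byte[] sig = ChunkSignature.sha1(outgoing, start, p.len);
            return Arrays.equals(sig, p.signature);    // full match: send PRED-ACK
        }
    }

Only when the one-byte hint matches does the sender pay for the SHA-1 computation; on a full match it answers with a PRED-ACK instead of the raw chunk, which is where the bandwidth saving comes from.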
Read & Download File:
First, the user sends a filename as a request to the cloud. The server matches this filename and fetches the file. The server then encrypts the file using the user's public key and sends the ciphertext to the user. Finally, the user decrypts the ciphertext to recover the original file and downloads it.

Delete File from Cloud:
When the user wants to delete his file, he sends the filename as a request to the cloud. The server matches this filename, fetches the file, and deletes it together with its chunks. The server calculates the deleted file's bandwidth and cost and credits them back to the user's bandwidth and cost.

Data Flow Diagram:
[Figure: data flow from the user's Upload, Read/Download, and Delete requests through the PACK algorithm, which splits the file into chunks and computes their SHA-1 signatures; the chunks are uploaded to the cloud, handled by Cloud Segment Processing, and maintained in the chunk store.]

References:
U. Manber, "Finding similar files in a large file system," in Proc. USENIX Winter Tech. Conf., 1994, pp. 1–10.
A. Muthitacharoen, B. Chen, and D. Mazières, "A low-bandwidth network file system," in Proc. SOSP, 2001, pp. 174–187.
A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee, "Redundancy in network traffic: Findings and implications," in Proc. SIGMETRICS, 2009, pp. 37–48.
A. Anand, A. Gupta, A. Akella, S. Seshan, and S. Shenker, "Packet caches on routers: The implications of universal redundant traffic elimination," in Proc. SIGCOMM, 2008, pp. 219–230.
A. Anand, V. Sekar, and A. Akella, "SmartRE: An architecture for coordinated network-wide redundancy elimination," in Proc. SIGCOMM, 2009, vol. 39, pp. 87–98.
A. Gupta, A. Akella, S. Seshan, S. Shenker, and J. Wang, "Understanding and exploiting network traffic redundancy," UW-Madison, Madison, WI, USA, Tech. Rep. 1592, Apr. 2007.