PACK: Prediction-Based Cloud Bandwidth and Cost Reduction System
Abstract:
This paper presents PACK (Predictive ACKs), a novel end-to-end traffic redundancy elimination (TRE) system designed for cloud computing customers. Cloud-based TRE needs to apply a judicious use of cloud resources so that the bandwidth cost reduction, combined with the additional cost of TRE computation and storage, is optimized. PACK's main advantage is its capability of offloading the cloud-server TRE effort to end-clients, thus minimizing the processing costs induced by the TRE algorithm. Unlike previous solutions, PACK does not require the server to continuously maintain clients' status. This makes PACK very suitable for pervasive computation environments that combine client mobility and server migration to maintain cloud elasticity. PACK is based on a novel TRE technique that allows the client to use newly received chunks to identify previously received chunk chains, which in turn can be used as reliable predictors of future transmitted chunks. We present a fully functional PACK implementation, transparent to all TCP-based applications and network devices. Finally, we analyze PACK's benefits for cloud users, using traffic traces from various sources.
Introduction:
Cloud customers pay only for the actual use of computing resources, storage, and bandwidth, according to their changing needs, utilizing the cloud's scalable and elastic computational capabilities. In particular, data transfer costs (i.e., bandwidth) are an important issue when trying to minimize costs. Consequently, cloud customers, applying a judicious use of the cloud's resources, are motivated to use various traffic reduction techniques, in particular traffic redundancy elimination (TRE), for reducing bandwidth costs. Traffic redundancy stems from common end-users' activities, such as repeatedly accessing, downloading, uploading (i.e., backup), distributing, and modifying the same or similar information items (documents, data, Web, and video). TRE is used to eliminate the transmission of redundant content and, therefore, to significantly reduce the network cost. In most common TRE solutions, both the sender and the receiver examine and compare signatures of data chunks, parsed according to the data content, prior to their transmission. When redundant chunks are detected, the sender replaces the transmission of each redundant chunk with its strong signature. Commercial TRE solutions are popular in enterprise networks and involve the deployment of two or more proprietary-protocol, state-synchronized middle-boxes at both the intranet entry points of data centers and branch offices, eliminating repetitive traffic between them (e.g., Cisco, Riverbed, Quantum, Juniper, Blue Coat, Expand Networks, and F5).

While proprietary middle-boxes are popular point solutions within enterprises, they are not as attractive in a cloud environment. Cloud providers cannot benefit from a technology whose goal is to reduce customer bandwidth bills, and thus are not likely to invest in one. The rise of "on-demand" workspaces, meeting rooms, and work-from-home solutions detaches workers from their offices. In such a dynamic work environment, fixed-point solutions that require a client-side and a server-side middle-box pair become ineffective. On the other hand, cloud-side elasticity motivates work distribution among servers and migration among data centers. Therefore, it is commonly agreed that a universal, software-based, end-to-end TRE is crucial in today's pervasive environment. This enables the use of a standard protocol stack and makes TRE within end-to-end secured traffic (e.g., SSL) possible.

Current end-to-end TRE solutions are sender-based. In the case where the cloud server is the sender, these solutions require that the server continuously maintain clients' status. We show here that cloud elasticity calls for a new TRE solution. First, cloud load balancing and power optimizations may lead to a server-side process and data migration environment, in which TRE solutions that require full synchronization between the server and the client are hard to accomplish or may lose efficiency due to lost synchronization. Second, the popularity of rich media that consume high bandwidth motivates content distribution network (CDN) solutions, in which the service point for fixed and mobile users may change dynamically according to the relative service point locations and loads. Clearly, a TRE solution that puts most of its computational effort on the cloud side may turn out to be less cost-effective than one that leverages the combined client-side capabilities. Given an end-to-end solution, we have found through our experiments that sender-based end-to-end TRE solutions add a considerable load to the servers, which may eradicate the cloud cost savings addressed by the TRE in the first place. Our experiments further show that current end-to-end solutions also suffer from the requirement to maintain end-to-end synchronization, which may result in degraded TRE efficiency.

In this paper, we present a novel receiver-based end-to-end TRE solution that relies on the power of predictions to eliminate redundant traffic between the cloud and its end-users. In this solution, each receiver observes the incoming stream and tries to match its chunks with a previously received chunk chain or a chunk chain of a local file. Using the long-term chunk metadata kept locally, the receiver sends to the server predictions that include chunks' signatures and easy-to-verify hints of the sender's future data. The sender first examines the hint and performs the TRE operation only on a hint match. The purpose of this procedure is to avoid the expensive TRE computation at the sender side in the absence of traffic redundancy. When redundancy is detected, the sender sends to the receiver only the ACKs to the predictions, instead of sending the data. On the receiver side, we propose a new computationally lightweight chunking (fingerprinting) scheme termed PACK chunking. PACK chunking is a new alternative to the Rabin fingerprinting traditionally used by RE applications. Our experiments show that this approach can reach data processing speeds of over 3 Gb/s, at least 20% faster than Rabin fingerprinting.
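To illustrate, here is a minimal sketch of a content-defined chunker in the spirit of PACK chunking, written in Java to match the implementation environment used later in this document. The shift-and-XOR rolling hash and the 13-bit boundary mask are illustrative assumptions, not the published PACK parameters.

    import java.util.ArrayList;
    import java.util.List;

    public class PackStyleChunker {
        // Assumed illustrative parameter: a chunk boundary is declared when
        // the low 13 bits of the rolling value match the mask (~8 kB chunks).
        private static final long MASK = 0x1FFF;

        public static List<byte[]> chunk(byte[] data) {
            List<byte[]> chunks = new ArrayList<byte[]>();
            long hash = 0;
            int start = 0;
            for (int i = 0; i < data.length; i++) {
                // Lightweight shift-XOR rolling update over the byte stream.
                hash = (hash << 1) ^ (data[i] & 0xFF);
                if ((hash & MASK) == MASK) {       // content-defined boundary
                    chunks.add(copy(data, start, i + 1));
                    start = i + 1;
                    hash = 0;
                }
            }
            if (start < data.length) {
                chunks.add(copy(data, start, data.length)); // trailing chunk
            }
            return chunks;
        }

        private static byte[] copy(byte[] src, int from, int to) {
            byte[] out = new byte[to - from];
            System.arraycopy(src, from, out, 0, out.length);
            return out;
        }
    }

Because boundaries depend only on local content, an insertion early in a file shifts at most the chunks around the edit, which is what makes chunk signatures reusable across file versions.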
Literature Review:
Finding Similar Files In A Large File System
This paper presents a tool, called sif, for finding all similar files in a large file system. Files are considered similar if they have a significant number of common pieces, even if they are very different otherwise. For example, one file may be contained, possibly with some changes, in another file, or a file may be a reorganization of another file. The running time for finding all groups of similar files, even for as little as 25% similarity, is on the order of 500 MB to 1 GB per hour. The amount of similarity and several other customized parameters can be determined by the user at a post-processing stage, which is very fast. Sif can also be used to very quickly identify all files similar to a query file using a preprocessed index. Applications of sif can be found in file management, information collecting (to remove duplicates), program reuse, file synchronization, data compression, and perhaps even plagiarism detection.

In the query mode, a given file, called the query file, is compared to a large set of files that have already been "fingerprinted." The collection of all fingerprints, denoted All Fingers, is maintained in one large file. Each fingerprint must be associated with the file it came from; the authors do this by maintaining the names of all files that were fingerprinted and associating with each fingerprint the number of the corresponding file. The first step is to generate the set Query Fingers of all fingerprints for the query file. All Fingers must then be searched and all fingerprints there compared to those of Query Fingers. Searching a set is one of the most basic data structure problems, and there are many ways to handle it; the most common techniques use hashing or tree structures. In this case, one can also sort both sets and intersect them. However, the authors found that a simple solution using multi-pattern matching was just as effective: the fingerprints in All Fingers are stored as they are obtained, without any other structure, one fingerprint together with its file number per line, and the file All Fingers is searched using Query Fingers as the set of patterns. The output of the search is the list of all fingerprints common to Query Fingers and All Fingers. As long as the set All Fingers is no more than a few megabytes, this search is very effective.

Comparing all files against all other files is more complicated. It turns out that providing a good way to view the output is one of the major difficulties. The output is a set of sets, or a hypergraph, with some similarity relationships among the elements, and hypergraphs are very hard to view. The paper discusses one approach that is an extension of the one-against-all paradigm and is therefore quite intuitive to view; the authors also experimented with other, more complicated approaches, one of which is mentioned briefly at the end. The input to the problem is now a directory (or several directories), which is traversed recursively, checking all the files below it, and a threshold number T indicating how similar the files should be. The first stage is identical to the preprocessing: all files are scanned, all fingerprints are recorded in the fingerprint file All Fingers, and the names, checksums, and sizes of all files are recorded in another file. It is useful in this stage to separate files by type (e.g., text, binary, compressed) using a program such as Essence [HS93], so that files of different types are not compared. The current implementation uses only two types: text files and non-text files.
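To make the query step concrete, the following is a minimal sketch, under assumed data layouts (64-bit fingerprints held in memory), of intersecting a query file's fingerprints with the fingerprint store; the sort-and-merge strategy here stands in for sif's multi-pattern matching.

    import java.util.Arrays;

    public class FingerprintIntersect {
        // Returns the fingerprints common to the query file and the store.
        // Sorts both input arrays in place, then merges with two pointers.
        public static long[] common(long[] queryFingers, long[] allFingers) {
            Arrays.sort(queryFingers);
            Arrays.sort(allFingers);
            long[] hits = new long[Math.min(queryFingers.length, allFingers.length)];
            int i = 0, j = 0, n = 0;
            while (i < queryFingers.length && j < allFingers.length) {
                if (queryFingers[i] == allFingers[j]) {
                    hits[n++] = queryFingers[i];   // shared fingerprint found
                    i++; j++;
                } else if (queryFingers[i] < allFingers[j]) {
                    i++;
                } else {
                    j++;
                }
            }
            return Arrays.copyOf(hits, n);
        }
    }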
A Low-bandwidth Network File System
Efficient remote file access would often be desirable over such networks, particularly when high latency makes remote login sessions unresponsive. Rather than run interactive programs such as editors remotely, users could run the programs locally and manipulate remote files through the file system. To do so, however, would require a network file system that consumes less bandwidth than most current file systems. This paper presents LBFS, a network file system designed for low-bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. Using this technique in conjunction with conventional compression and caching, LBFS consumes over an order of magnitude less bandwidth than traditional network file systems on common workloads.

LBFS is designed to save bandwidth while providing traditional file system semantics. In particular, LBFS provides close-to-open consistency: after a client has written and closed a file, another client opening the same file will always see the new contents. Moreover, once a file is successfully written and closed, the data resides safely at the server. These semantics are similar to those of AFS. Other work exploring relaxed consistency semantics may apply to LBFS, but the authors wished to build a file system that could directly substitute for a widely accepted network file system in use today. To save bandwidth, LBFS uses a large, persistent file cache at the client. LBFS assumes clients will have enough cache to contain a user's entire working set of files (a reasonable assumption given the capacities of cheap IDE disks today). With such aggressive caching, most client–server communication is solely for the purpose of maintaining consistency. When a user modifies a file, the client must transmit the changes to the server (since in this model the client might crash or be cut off from the network). Similarly, when a client reads a file last modified by a different client, the server must send it the latest version of the file.

LBFS reduces bandwidth requirements further by exploiting similarities between files. When possible, it reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network. Of course, not all applications can benefit from this technique. A worst-case scenario is when applications encrypt files on disk, since two different encryptions of the same file have no commonality whatsoever. Nonetheless, LBFS provides significant bandwidth reduction for common workloads. On both the client and server, LBFS must index a set of files to recognize data chunks it can avoid sending over the network. To save chunk transfers, LBFS relies on the collision-resistant properties of the SHA-1 hash function. The probability of two inputs to SHA-1 producing the same output is far lower than the probability of hardware bit errors. Thus, LBFS follows the widely accepted practice of assuming no hash collisions. If the client and server both have data chunks producing the same SHA-1 hash, they assume the two are really the same chunk and avoid transferring its contents over the network. The central challenge in indexing file chunks to identify commonality is keeping the index a reasonable size while dealing with shifting offsets. As an example, one could index the hashes of all aligned 8-kB data blocks in files. To transfer a file, the sender would transmit only hashes of the file's blocks, and the receiver would request only blocks not already in its database. Unfortunately, a single byte inserted at the start of a large file would shift all the block boundaries, change the hashes of all the file's blocks, and thereby thwart any potential bandwidth savings.
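The following is a minimal sketch of the kind of SHA-1-keyed chunk index described above; class and method names are illustrative. Paired with content-defined chunk boundaries (as in the chunker sketched earlier), such an index avoids the shifting-offset problem that defeats fixed 8-kB blocks.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HashMap;
    import java.util.Map;

    public class ChunkIndex {
        // Maps a SHA-1 digest (as hex) to the chunk's bytes.
        private final Map<String, byte[]> store = new HashMap<String, byte[]>();

        // Returns true if the chunk was already known, i.e., its transfer
        // over the network could have been avoided.
        public boolean putAndCheck(byte[] chunk) throws NoSuchAlgorithmException {
            String key = sha1Hex(chunk);
            if (store.containsKey(key)) {
                return true;               // duplicate: assume identical content
            }
            store.put(key, chunk);
            return false;
        }

        private static String sha1Hex(byte[] data) throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
    }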
Redundancy in Network Traffic: Findings and Implications
A large amount of popular content is transferred repeatedly across network links in the Internet. In recent years, protocol-independent redundancy elimination, which can remove duplicate strings from within arbitrary network flows, has emerged as a powerful technique to improve the efficiency of network links in the face of repeated data. Many vendors offer such redundancy elimination middle-boxes to improve the effective bandwidth of enterprise, data center, and ISP links alike. This paper conducts a large-scale trace-driven study of protocol-independent redundancy elimination mechanisms, driven by several terabytes of packet payload traces collected at 12 distinct network locations, including the access link of a large US-based university and of 11 enterprise networks of different sizes. Based on extensive analysis, the authors present a number of findings on the benefits and fundamental design issues in redundancy elimination systems. Two of the key findings are: (1) a new redundancy elimination algorithm based on winnowing that outperforms the widely used Rabin fingerprint-based algorithm by 5-10% on most traces and by as much as 35% on some traces; and (2) a surprising finding that 75-90% of a middle-box's bandwidth savings in the enterprise traces is due to redundant byte-strings from within each client's own traffic, implying that pushing redundancy elimination capability to the end hosts, i.e., an end-to-end redundancy elimination solution, could obtain most of the middle-box's bandwidth savings.

Redundancy elimination middle-boxes are being widely deployed to improve the effective bandwidth of network access links of enterprises and data centers alike, and to improve link loads in small ISP networks. Driven by the high effectiveness of these systems, recent efforts have also considered how to make them first-class network entities. For instance, protocol-independent redundancy elimination can be deployed on a wider scale across multiple network routers, enabling an IP-layer redundancy elimination service. This expands the benefits of redundancy elimination to multiple end-to-end applications and could also enable new routing protocols. Other work has considered how to modify Web applications by introducing data markers that help in-network redundancy elimination mechanisms identify and remove redundancy more effectively. Ditto similarly proposes to use application-independent caching at nodes in city-wide wireless mesh networks to improve the throughput of data transfers. The shortcoming of the MODP approach is that the fingerprints are chosen based on a global property, i.e., fingerprints have to take certain predetermined values to be chosen. While this results in the desired fraction of fingerprints being chosen across a large packet store, on a per-packet basis the number of fingerprints chosen can differ significantly and is sometimes not enough. To guarantee that an adequate number of fingerprints are chosen uniformly from each packet, a local technique such as winnowing is essential. Winnowing was likewise introduced in the context of identifying similar documents and can be easily adapted to identifying redundancy in network traffic.
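As a sketch of the local selection property discussed above: within every window of w consecutive rolling hashes, the rightmost minimum is chosen as a fingerprint position, guaranteeing at least one fingerprint per window. The parameters and method names are illustrative and follow the winnowing idea rather than any particular middle-box implementation.

    import java.util.ArrayList;
    import java.util.List;

    public class Winnowing {
        // Selects fingerprint positions from a sequence of rolling hashes:
        // in each window of size w, the rightmost minimum hash is chosen,
        // so at least one fingerprint is taken per w consecutive positions.
        public static List<Integer> select(long[] hashes, int w) {
            List<Integer> picks = new ArrayList<Integer>();
            int lastPick = -1;
            for (int end = w - 1; end < hashes.length; end++) {
                int min = end - w + 1;
                for (int k = min + 1; k <= end; k++) {
                    if (hashes[k] <= hashes[min]) {
                        min = k;           // rightmost minimum on ties
                    }
                }
                if (min != lastPick) {     // avoid re-picking the same position
                    picks.add(min);
                    lastPick = min;
                }
            }
            return picks;
        }
    }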
Packet Caches on Routers: The Implications of Universal Redundant Traffic Elimination
Many past systems have explored how to eliminate redundant transfers from network links and improve network efficiency. Several of these systems operate at the application layer, while the more recent systems operate on individual packets. A common aspect of these systems is that they apply to localized settings, e.g., at stub network access links. This paper explores the benefits of deploying packet-level redundant content elimination as a universal primitive on all Internet routers. Such a universal deployment would immediately reduce link loads everywhere. However, the authors argue that far more significant network-wide benefits can be derived by redesigning network routing protocols to leverage the universal deployment. They develop "redundancy-aware" intra- and inter-domain routing algorithms and show that these enable better traffic engineering, reduce link usage costs, and enhance ISPs' responsiveness to traffic variations. In particular, employing redundancy elimination across redundancy-aware routes can lower intra- and inter-domain link loads by 10-50%. Their current software router implementation can run at OC48 speeds.

The paper considers the benefits of deploying packet-level redundant content elimination as a primitive IP-layer service across the entire Internet. It starts with the assumption that all future routers will have the ability to strip redundant content from network packets on the fly, by comparing packet contents against recently forwarded packets stored in a cache. Routers immediately downstream can reconstruct full packets from their own cache. Applying this technology at every link would provide immediate performance benefits by reducing the overall load on the network. It also enables new functionality: for example, it simplifies application-layer multicast by eliminating the need to be careful about duplicate transmissions.

The authors consider as "inter-domain" traffic the set of all packets traversing the ISP whose destinations are routable only through peers of the ISP, and they consider two approaches: local and cooperative. The local approach applies to an ISP selecting its next-hop ASes in BGP, as well as the particular exit point(s) into the chosen next hop. In this approach, an ISP aggregates its inter-domain traffic over a selected set of next hops and the corresponding exit points so as to aggregate potentially redundant traffic onto a small number of network links. Thus, the ISP can significantly reduce the impact that inter-domain traffic imposes on its internal and peering links. To compute routes in this manner, the ISP must track (1) the amount of redundant content that is common to different destination prefixes and (2) the route announcements from peers to the destinations. For simplicity, the cooperative approach is described for just two ISPs coordinating their inter-domain route selection; it can be extended to multiple ISPs. It works as follows: rather than compute inter-domain routes in isolation, each ISP tries to aggregate the redundant content in inter-domain traffic together with the redundant content in its intra-domain traffic, so as to bring down the overall utilization of all participating networks. Thus, the key difference from the intra-domain routing formulation is that the input graph used by either ISP is the union of the topologies of the two networks and the peering links. The inputs to an ISP's linear program are its intra-domain redundancy profiles and the inter-domain profile for traffic between ingresses and egresses in the neighbor. The outputs of an ISP's formulation include its intra-domain routes and a list of exit points for traffic destined to egresses in the neighbor (and how to split inter-domain traffic across the exit points).
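To make the on-the-fly encode/decode idea concrete, here is a minimal sketch, with hypothetical names, of the per-link packet cache such routers would keep: the upstream node matches fingerprinted regions against the cache and replaces them with (packetId, offset, length) pointers, while the downstream node expands pointers from its synchronized copy.

    import java.util.HashMap;
    import java.util.Map;

    public class PacketCache {
        // Where a fingerprinted region was last observed.
        public static class Hit {
            public final long packetId;
            public final int offset;
            Hit(long packetId, int offset) {
                this.packetId = packetId;
                this.offset = offset;
            }
        }

        // Both ends of a link keep the same cache, so the upstream router can
        // replace a matched region with a (packetId, offset, length) pointer
        // and the downstream router can expand it from its own copy.
        private final Map<Long, Hit> index = new HashMap<Long, Hit>();
        private final Map<Long, byte[]> payloads = new HashMap<Long, byte[]>();

        // Record a forwarded packet and the fingerprints sampled from it.
        public void remember(long packetId, byte[] payload, long[] fps, int[] offsets) {
            payloads.put(packetId, payload);
            for (int i = 0; i < fps.length; i++) {
                index.put(fps[i], new Hit(packetId, offsets[i]));
            }
        }

        // Encoding side: does this fingerprint match a cached region?
        public Hit lookup(long fingerprint) {
            return index.get(fingerprint);
        }

        // Decoding side: expand a pointer back into raw bytes.
        public byte[] expand(long packetId, int offset, int length) {
            byte[] src = payloads.get(packetId);
            byte[] out = new byte[length];
            System.arraycopy(src, offset, out, 0, length);
            return out;
        }
    }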
SmartRE: An Architecture for Coordinated Network-wide Redundancy Elimination
Application-independent redundancy elimination (RE), or identifying and removing repeated content from network transfers, has been used with great success for improving network performance on enterprise access links. Recently, there has been growing interest in supporting RE as a network-wide service. Such a network-wide RE service benefits ISPs by reducing link loads and increasing the effective network capacity to better accommodate the increasing number of bandwidth-intensive applications. Further, a network-wide RE service democratizes the benefits of RE to all end-to-end traffic and improves application performance by increasing throughput and reducing latencies. While the vision of a network-wide RE service is appealing, realizing it in practice is challenging. In particular, extending single-vantage-point RE solutions designed for enterprise access links to the network-wide case is inefficient and/or requires modifying routing policies. This paper presents SmartRE, a practical and efficient architecture for network-wide RE. The authors show that SmartRE can enable more effective utilization of the available resources at network devices, and thus can magnify the overall benefits of network-wide RE. They prototype their algorithms using Click and test the framework extensively using several real and synthetic traces.

In SmartRE, a packet can potentially be reconstructed or decoded several hops downstream from the location where it was compressed or encoded. In this respect, SmartRE represents a significant departure from packet-level RE designs proposed in prior solutions, where each compressed packet is reconstructed at the immediate downstream router. Further, SmartRE uses a network-wide coordinated approach for intelligently allocating encoding and decoding responsibilities across network elements. The discussion focuses on the design of three key elements: ingress nodes, interior nodes, and a central configuration module. Ingress and interior nodes maintain caches storing a subset of the packets they observe. Ingress nodes encode packets: they search for redundant content in incoming packets and encode them with respect to previously seen packets. In this sense, the role of an ingress node is identical in the naive hop-by-hop approach and in SmartRE. The key difference between the hop-by-hop approach and SmartRE is in the design of interior nodes. First, interior elements need not store all packets in their packet cache; they only store a subset, as specified by a caching manifest produced by the configuration module. Second, they have no encoding responsibilities: interior nodes only decode packets, i.e., expand encoded regions specified by the ingresses using packets in their local packet cache. The configuration module computes the caching manifests to optimize the ISP's objective(s), while operating within the memory and packet-processing constraints of network elements. Similar to other proposals for centralized network management, the authors assume that this module resides at the network operations center (NOC) and has access to the network's traffic matrix, routing policies, and the resource configurations of the network elements.
Understanding and Exploiting Network Traffic Redundancy
The Internet carries a vast amount and a wide range of content. Some of this content is more popular, and accessed more frequently, than the rest. The popularity of content can be quite ephemeral (e.g., a Web flash crowd) or much more permanent (e.g., a long-lived banner). A direct consequence of this skew in popularity is that, at any time, a fraction of the information carried over the Internet is redundant. First, the authors study the fundamental properties of the redundancy in the information carried over the Internet, with a focus on network edges. They collect traffic traces at two network edge locations: a large university's access link serving roughly 50,000 users, and a tier-1 ISP network link connected to a large data center. They conduct several analyses over this data: What fraction of bytes is redundant? At what frequency do strings of bytes repeat across different packets? What is the overlap in the information accessed by distinct groups of end-users? Second, they leverage these measurement observations in the design of a family of mechanisms for eliminating redundancy in network traffic and improving overall network performance. The proposed mechanisms can improve the available capacity of single network links as well as balance load across multiple network links.

The goal is to understand how one can build mechanisms that leverage redundancy in network packets to improve the overall capacity of the network. The effectiveness of such approaches obviously depends on how redundant network traffic really is. Clearly, if only a small fraction of traffic is redundant, then the cost of deploying such mechanisms may not measure up to the observed benefits. Even if network traffic is heavily redundant, in order to build optimal mechanisms that exploit the redundancy to the fullest, one must understand several lower-level properties of the redundant information, e.g., how groups of users share redundant bytes among each other. With these questions in mind, the authors perform several "macroscopic" analyses over the above traces, which help quantify the extent of redundancy, as well as several "microscopic" analyses, which help understand the underlying nature and causes of redundancy.
Wide-area Network Acceleration for the Developing World
Wide-area network (WAN) accelerators operate by compressing redundant network traffic from point-to-point communications, enabling higher effective bandwidth. Unfortunately, while network bandwidth is scarce and expensive in the developing world, current WAN accelerators are designed for enterprise use and are a poor fit in these environments. This paper presents Wanax, a WAN accelerator designed for developing-world deployments. It uses a novel multi-resolution chunking (MRC) scheme that provides high compression rates and high disk performance for a variety of content, while using much less memory than existing approaches. Wanax exploits the design of MRC to perform intelligent load shedding to maximize throughput when running on resource-limited shared platforms. Finally, Wanax exploits the mesh network environments being deployed in the developing world, instead of just the star topologies common in enterprise branch offices. Its contributions are: (1) a novel multi-resolution chunking (MRC) technique, which provides high compression rates and high disk performance across workloads while having a small memory footprint; (2) an intelligent load shedding technique that exploits MRC to maximize effective bandwidth by adjusting disk and WAN usage as appropriate; and (3) a mesh peering protocol that exploits higher-speed local peers when possible, instead of fetching only over slow WAN links. The combination of these design techniques makes it possible to achieve high effective bandwidth even with resource-limited shared machines.

Motivated by the challenges in the developing world, Wanax is designed around goals that include (1) maximizing compression, (2) minimizing disk seeks, and (3) minimizing memory pressure. Wanax works by compressing redundant traffic between a pair of servers: one near the clients, called an R-Wanax, and one closer to the content, called an S-Wanax. For developing regions, the S-Wanax is likely to be placed where bandwidth is cheaper. For example, in Africa, where Internet connectivity is often backhauled to Europe via slow and expensive satellite links, the S-Wanax may reside in Europe.
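A minimal sketch of the multi-resolution idea, assuming the same shift-XOR rolling hash as earlier and illustrative nested masks: a single pass yields aligned chunk boundaries at several average sizes, so Wanax-style trade-offs between compression (small chunks) and disk/memory pressure (large chunks) can be made per workload.

    import java.util.ArrayList;
    import java.util.List;

    public class MultiResolutionChunker {
        // Nested masks (illustrative, ~2 kB / 8 kB / 32 kB averages): every
        // boundary at a coarser level is also a boundary at each finer level,
        // so chunk trees stay aligned across resolutions.
        private static final long[] MASKS = { 0x7FF, 0x1FFF, 0x7FFF };

        // Returns, per resolution level, the boundary offsets found in one
        // pass of the shift-XOR rolling hash.
        public static List<List<Integer>> boundaries(byte[] data) {
            List<List<Integer>> perLevel = new ArrayList<List<Integer>>();
            for (int l = 0; l < MASKS.length; l++) {
                perLevel.add(new ArrayList<Integer>());
            }
            long hash = 0;
            for (int i = 0; i < data.length; i++) {
                hash = (hash << 1) ^ (data[i] & 0xFF);
                for (int l = 0; l < MASKS.length; l++) {
                    if ((hash & MASKS[l]) == MASKS[l]) {
                        perLevel.get(l).add(i + 1);  // boundary after byte i
                    }
                }
            }
            return perLevel;
        }
    }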
Existing System:
• Several commercial TRE solutions have combined the sender-based TRE ideas with the algorithmic and implementation approach, along with protocol-specific optimizations for middle-box solutions. In particular, they show how to get away with the three-way handshake between the sender and the receiver if full state synchronization is maintained.
• Traffic redundancy stems from common end-users' activities, such as repeatedly accessing, downloading, uploading (i.e., backup), distributing, and modifying the same or similar information items (documents, data, Web, and video).
• TRE is used to eliminate the transmission of redundant content and, therefore, to significantly reduce the network cost.
• In most common TRE solutions, both the sender and the receiver examine and compare signatures of data chunks, parsed according to the data content, prior to their transmission.
• Current end-to-end solutions also suffer from the requirement to maintain end-to-end synchronization, which may result in degraded TRE efficiency.
• When redundant chunks are detected, the sender replaces the transmission of each redundant chunk with its strong signature.
• Cloud providers cannot benefit from a technology whose goal is to reduce customer bandwidth bills, and thus are not likely to invest in one.
• Commercial TRE solutions are popular in enterprise networks and involve the deployment of two or more proprietary-protocol, state-synchronized middle-boxes at the intranet entry points of data centers and branch offices.
• The rise of "on-demand" workspaces, meeting rooms, and work-from-home solutions detaches workers from their offices. In such a dynamic work environment, fixed-point solutions that require a client-side and a server-side middle-box pair become ineffective.

Disadvantages: Cloud load balancing and power optimizations may lead to a server-side process and data migration environment, in which TRE solutions that require full synchronization between the server and the client are hard to accomplish or may lose efficiency due to lost synchronization.
Proposed System:
• PACK's main advantage is its capability of offloading the cloud-server TRE effort to end-clients, thus minimizing the processing costs induced by the TRE algorithm.
• Unlike previous solutions, PACK does not require the server to continuously maintain clients' status. This makes PACK very suitable for pervasive computation environments that combine client mobility and server migration to maintain cloud elasticity.
• PACK is based on a novel TRE technique that allows the client to use newly received chunks to identify previously received chunk chains, which in turn can be used as reliable predictors of future transmitted chunks.
• This paper presents a fully functional PACK implementation, transparent to all TCP-based applications and network devices.
• We present a novel receiver-based end-to-end TRE solution that relies on the power of predictions to eliminate redundant traffic between the cloud and its end-users. In this solution, each receiver observes the incoming stream and tries to match its chunks with a previously received chunk chain or a chunk chain of a local file. Using the long-term chunk metadata kept locally, the receiver sends to the server predictions that include chunks' signatures and easy-to-verify hints of the sender's future data. On the receiver side, we propose a new computationally lightweight chunking (fingerprinting) scheme termed PACK chunking, a new alternative to the Rabin fingerprinting traditionally used by RE applications.

Advantages: This solution achieves 30% redundancy elimination without significantly affecting the computational effort of the sender, resulting in a 20% reduction of the overall cost to the cloud customer.
Hardware Requirements:
• Processor Type : Pentium IV
• Speed : 2.4 GHz
• RAM : 256 MB
• Hard Disk : 20 GB
• Keyboard : 101/102 Standard Keys
• Mouse : Scroll Mouse
Software Requirements:
• Operating System : Windows 7
• Programming Package : NetBeans IDE 7.3.1
• Coding Language : JDK 1.7
System Architecture:
Modules:
1. User Connect with Bandwidth and Storage Cost to Cloud
2. Upload File
3. Cloud Segment Processing
4. Read & Download File
5. Delete File from Cloud
User Connect with Bandwidth and Storage Cost to Cloud:
• The data owner wants to connect to the cloud server, upload his file, and then access that file anywhere, at any time.
• To do so, he connects to the cloud server with an associated bandwidth and storage cost. Bandwidth and storage costs are incurred for uploading and subsequently accessing his files.
• Depending on the uploaded files, his bandwidth and storage costs are reduced, as illustrated in the sketch below.
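As a simple illustration of the pay-per-use accounting implied above, this sketch computes a customer's bill from metered transfer and storage; the rates and names are hypothetical.

    public class UsageBill {
        // Hypothetical pay-per-use rates (per GB).
        private static final double BW_RATE_PER_GB = 0.09;
        private static final double STORAGE_RATE_PER_GB = 0.02;

        // Total cost = bandwidth charge + storage charge. TRE lowers the
        // transferredGb term by suppressing redundant transfers.
        public static double cost(double transferredGb, double storedGb) {
            return transferredGb * BW_RATE_PER_GB + storedGb * STORAGE_RATE_PER_GB;
        }
    }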
Upload File:
• In computer networks, to upload can refer to the sending of data from an end user to a remote system, such as a cloud server or another client, with the intent that the remote system should store a copy of the data being transferred, or to the initiation of such a process.
• Where the connection to the remote computer is via a dial-up connection, the transfer time required to download locally and then upload again could increase from seconds to hours or days.
• Here, the user uploads his file to the cloud. The cloud maintains a chunk store, which holds the chunks and their signatures, as sketched below.
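A minimal sketch of this upload path, reusing the illustrative PackStyleChunker and ChunkIndex classes sketched earlier; all names are assumptions rather than PACK's actual interfaces.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class UploadFile {
        // Splits a local file into content-defined chunks and places each
        // chunk, keyed by its SHA-1 signature, into the cloud's chunk store.
        public static void upload(String path, ChunkIndex cloudStore) throws Exception {
            byte[] data = Files.readAllBytes(Paths.get(path));
            List<byte[]> chunks = PackStyleChunker.chunk(data);
            for (byte[] chunk : chunks) {
                if (!cloudStore.putAndCheck(chunk)) {
                    // New chunk: in a real deployment, this is where the
                    // bytes would actually be transmitted to the cloud.
                }
                // Duplicate chunks are not re-sent: this is exactly the
                // redundancy that TRE eliminates.
            }
        }
    }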
Cloud Segment Processing:
• Upon the arrival of new data, the cloud computes the respective signature for each chunk and looks for a match in its local chunk store.
• If the chunk's signature is found, the cloud determines whether it is part of a formerly received chain, using the chunks' metadata. If so, the cloud sends the sender a prediction for the next several expected chunks of the chain.
• The prediction carries a starting point in the byte stream (i.e., an offset) and the identity of several subsequent chunks (PRED command).
• Upon a successful prediction, the user responds with a PRED-ACK confirmation message. Once the PRED-ACK message is received and processed, the receiver copies the corresponding data from the chunk store to its TCP input buffers, placing it according to the corresponding sequence numbers.
• At this point, the receiver sends a normal TCP ACK with the next expected TCP sequence number. In case the prediction is false, or one or more predicted chunks have already been sent, the sender continues with normal operation, e.g., sending the raw data, without sending a PRED-ACK message. A sketch of this exchange follows the list.
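Below is a minimal sketch of the sender-side decision just described. The message fields and the one-byte hint check are illustrative stand-ins for PACK's wire format; the point is that the cheap hint comparison gates the expensive SHA-1 computation.

    import java.util.Arrays;

    public class PredictionHandler {
        // A receiver-issued prediction: "starting at this offset, I expect
        // length bytes whose SHA-1 signature is signature; here is a cheap
        // hint (e.g., one byte sampled from the predicted data) to pre-check."
        public static class Pred {
            public final long offset;
            public final int length;
            public final byte[] signature;
            public final byte hint;
            public Pred(long offset, int length, byte[] signature, byte hint) {
                this.offset = offset;
                this.length = length;
                this.signature = signature;
                this.hint = hint;
            }
        }

        // Returns true if the sender should answer with PRED-ACK (and
        // suppress the data); false means normal raw transmission.
        public static boolean matches(Pred p, byte[] outgoing, int at) throws Exception {
            if (at + p.length > outgoing.length) {
                return false;                      // not enough data to match
            }
            // Cheap hint check first: avoids SHA-1 work when there is no
            // redundancy, which is the common case.
            if (outgoing[at] != p.hint) {
                return false;
            }
            // Hint matched: now pay for the full signature comparison.
            byte[] actual = sha1(Arrays.copyOfRange(outgoing, at, at + p.length));
            return Arrays.equals(actual, p.signature);
        }

        private static byte[] sha1(byte[] d) throws Exception {
            return java.security.MessageDigest.getInstance("SHA-1").digest(d);
        }
    }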
Read & Download File:
• First, the user sends a filename as a request to the cloud. The server matches this filename and retrieves the file.
• The server encrypts the file using the user's public key and sends the ciphertext to the user.
• Finally, the user decrypts the ciphertext to obtain the original file.
• The file is then downloaded. A sketch of the encryption step follows.
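A minimal sketch of the public-key step, assuming an RSA key pair and a payload small enough for a single RSA block; a production system would normally encrypt the file with a symmetric key and wrap only that key with RSA (hybrid encryption), which goes beyond what the module text states.

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import javax.crypto.Cipher;

    public class DownloadCrypto {
        // Encrypt with the user's public key; only the matching private key
        // can recover the plaintext.
        public static byte[] encryptForUser(byte[] plain, java.security.PublicKey pub)
                throws Exception {
            Cipher c = Cipher.getInstance("RSA");
            c.init(Cipher.ENCRYPT_MODE, pub);
            return c.doFinal(plain);     // payload must fit one RSA block
        }

        public static byte[] decrypt(byte[] cipherText, java.security.PrivateKey priv)
                throws Exception {
            Cipher c = Cipher.getInstance("RSA");
            c.init(Cipher.DECRYPT_MODE, priv);
            return c.doFinal(cipherText);
        }

        // Round-trip demonstration with a freshly generated key pair.
        public static void main(String[] args) throws Exception {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            KeyPair kp = g.generateKeyPair();
            byte[] ct = encryptForUser("small file".getBytes("UTF-8"), kp.getPublic());
            System.out.println(new String(decrypt(ct, kp.getPrivate()), "UTF-8"));
        }
    }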
Delete File from Cloud:
• When the user wants to delete his file, he sends the filename as a request to the cloud. The server matches this filename and retrieves the file.
• The server then deletes the file and its chunks.
• The server calculates the deleted file's bandwidth and storage cost, and credits them back to the user's balance.
Data Flow Diagram:
[Figure: data flow diagram. Upload path: Upload File -> PACK algorithm -> split the file into many chunks and compute each chunk's SHA-1 signature -> Cloud Segment Processing -> upload the chunks to the cloud, which maintains them in the chunk store. The Read, Download, and Delete operations then work against this chunk store.]
References:
• U. Manber, "Finding similar files in a large file system," in Proc. USENIX Winter Tech. Conf., 1994, pp. 1–10.
• A. Muthitacharoen, B. Chen, and D. Mazières, "A low-bandwidth network file system," in Proc. SOSP, 2001, pp. 174–187.
• A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee, "Redundancy in network traffic: Findings and implications," in Proc. SIGMETRICS, 2009, pp. 37–48.
• A. Anand, A. Gupta, A. Akella, S. Seshan, and S. Shenker, "Packet caches on routers: The implications of universal redundant traffic elimination," in Proc. SIGCOMM, 2008, pp. 219–230.
• A. Anand, V. Sekar, and A. Akella, "SmartRE: An architecture for coordinated network-wide redundancy elimination," in Proc. SIGCOMM, 2009, vol. 39, pp. 87–98.
• A. Gupta, A. Akella, S. Seshan, S. Shenker, and J. Wang, "Understanding and exploiting network traffic redundancy," UW-Madison, Madison, WI, USA, Tech. Rep. 1592, Apr. 2007.