Opening Up the Sky:
A Comparison of Performance-Enhancing Features in
SkyDrive and Dropbox
Herman Slatman
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]
ABSTRACT
Cloud storage services are increasing in popularity and using a
growing amount of bandwidth on the Internet. Insight into how
much traffic is generated is needed for a number of reasons.
Cloud storage providers are interested in serving their clients
efficiently and effectively, and they want to know how their
product is performing and how they can improve their service.
Internet Service Providers need an indication of the amount of
traffic generated by cloud storage. Lastly, users of cloud storage
services might want to know how their favorite service
performs. At the moment not much is known about the
performance of different cloud storage providers, but this paper
aims at getting a thorough understanding of those services and
their impact on the Internet. This paper focuses on Microsoft
SkyDrive, as this is the second most popular cloud storage
service [1] and because it has been neatly integrated in the
Microsoft Windows operating system.
Microsoft SkyDrive will be compared to Dropbox in terms of
performance-enhancing features. As shown in [1], Dropbox
storage servers are all located in the United States, which is not
an optimal solution for clients spread around the world. Also,
the way SkyDrive manages and transfers its files will be
analyzed to assess whether SkyDrive has deployed more
efficient synchronization strategies than Dropbox.
This research contributes to getting to know which technologies
the state-of-the-art cloud storage services have or have not
deployed to increase performance and to gain a thorough
understanding of the performance of Microsoft SkyDrive
compared to Dropbox’s.
Keywords
Cloud Storage, Performance, SkyDrive
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
18th Twente Student Conference on IT, January 25, 2013, Enschede, The
Netherlands.
Copyright 2013, University of Twente, Faculty of Electrical Engineering,
Mathematics and Computer Science.

1. INTRODUCTION
Recent developments show an increased interest in the use of
cloud storage services. Both individuals and enterprises are
increasingly making use of cloud storage services, like
Dropbox, Google Drive and Microsoft SkyDrive, to store and
share files with great ease. Those cloud storage services already
generate quite some traffic on the Internet - an educated guess 1
puts the total amount of traffic generated by uploading files to
Dropbox at about 54Gbps - and it is to be expected that
the amount of traffic due to cloud storage services will further
increase in the future. To maintain the quality of the Internet in
terms of available bandwidth and latency, predicting the impact
cloud storage services have and will have on the Internet is
important. To gain a better understanding of the impact of cloud
storage services, knowledge of the internals and the
performance of those services is necessary. Not much is known
about the internals of cloud storage services, but [1] gives
great insight into the internals of Dropbox, which is shown to be the
most popular cloud storage provider.
The goal of this paper is to get a thorough understanding of the
Microsoft SkyDrive service, its internals and, specifically, the
performance of the aforementioned service. The main reasons
Microsoft SkyDrive has been chosen as the research topic are
that it is the second largest service [1] in terms of traffic
generated and that it has the potential to grow substantially
in the near future. The latter is because SkyDrive can be
accessed via the Web, because several client applications are
available for different operating systems and because it has
been neatly integrated in the Microsoft Windows operating
system.
To say something about the performance of SkyDrive, a
comparison against Dropbox will be performed. The main
research question reads as follows:

How does SkyDrive compare to Dropbox, in terms of the
presence of performance-enhancing features?

The main research question is focused on generated traffic and
the efficiency with which SkyDrive handles files on its service.
To answer this research question, the research is split up in
three parts, answering the following questions:

(1) How are the administration and transfer of files controlled
in SkyDrive?
(2) How are the servers in SkyDrive distributed over the
world?
(3) Does SkyDrive deploy specific technologies to enhance
its performance, and how do these compare to Dropbox?

1
http://www.extremetech.com/computing/129183-how-big-is-the-cloud, accessed on 03-10-2012
The first question was included to gain an understanding of the
operation of SkyDrive. Information gained from this was used
to set up the experiments for the two remaining research
questions. Together these questions give insight into the
performance of the Microsoft SkyDrive service compared to
Dropbox in terms of features.
An overview of SkyDrive will be given in Section 2.
Section 3 describes the methodology used to conduct this
research. In Section 4 technologies that enhance the
performance of cloud services are introduced. Section 5
compares SkyDrive and Dropbox. Section 6
introduces related work. Lastly, Section 7 summarizes the
conclusions of this paper.
2. A BIRD’S EYE VIEW OF SKYDRIVE
SkyDrive was initially released by Microsoft in 2007 and has
since then been known under a few different names. At the time
of writing, it offers 7 GB of storage for free to new users,
whereas early users could opt-in for a free 25GB if they had
used the service before the 22nd of April of 2012.
Client applications are available for Windows Vista and
Windows 7, which can be used to integrate SkyDrive inside
those operating systems. In Windows 8, Microsoft’s newest
iteration of the operating system, SkyDrive has been
integrated natively. Client applications are also available for the
OS X, iOS, Windows Phone and Android operating systems,
covering a broad spectrum of devices. This paper focuses on
the desktop client for Windows 7.
A web interface to the SkyDrive service is also available, which
is built on HTML5 technologies 2. Amongst other things, it
supports email-integration, integration with Microsoft Office
and it features Office Web Apps, in which users can create,
view and edit documents right in the browser and store them on
SkyDrive.
Users can log in to the service with their Microsoft Account,
which is used in all other services provided by Microsoft. Files
on the service can be shared with other people that have a
Microsoft Account, but it is also possible to share files on social
networks, such as Twitter, LinkedIn and Facebook. SkyDrive
maintains an Access Control List (ACL) for every file and
folder 3, which is used to grant users the privileges needed to
execute the associated operations on a file. It is possible, for
example, to create a URL for a file that allows
everyone to read the file, but not change it. It is also
possible to require users to log in before access is granted.
On the 15th of November 2012, Microsoft introduced selective
sync, enabling users to control which files are being
synchronized amongst their devices. Updates to the SkyDrive
applications for Windows Phone 8 and Android were also rolled
out. According to Microsoft, on November the 15th the amount
of SkyDrive storage had doubled since the introduction of the
desktop and mobile applications on April the 22nd of 2012.
3. METHODOLOGY
Active and passive measurements have been carried out to
assess the performance of SkyDrive and to compare it with
Dropbox. These included uploading files to the SkyDrive
servers and measurements to determine the location of the
servers.
Before conducting any active or passive experiments, a lab
environment suitable for those experiments was set up. This lab
environment consisted of a host PC running Debian
GNU/Linux version 6.0, kernel 2.6.32-5-amd, on which
Wireshark, a popular packet sniffer and network protocol
analyzer, was installed. Windows 7 was installed as a virtual
machine. On this virtual machine, the SkyDrive client
application was installed, together with Charles, a web
debugging proxy. Charles is a shareware application that allows
for setting up a local proxy to capture, for example, all data that
is sent via SSL/TLS encrypted connections.
The setup described above allowed for capturing and analyzing
all traffic that was exchanged during the various experiments,
including the encrypted traffic.
3.1 File Administration and Transfers
Files differing in size and containing random text were
uploaded to SkyDrive to determine the way SkyDrive handles
file administration and transfers. At first these uploads were
analyzed only using Wireshark, which showed all transfers and
administration of files were carried over encrypted connections.
Charles was used to gain a more thorough understanding of the
information sent over those encrypted connections. Hostnames
used in the service were recorded, together with the corresponding
IP-addresses and the functionality they provide.
3.2 Distribution of Servers
Active measurements were performed to assess the
geographical distribution of servers in the SkyDrive service.
This was done in two consecutive steps. The first step was to
find out what hostnames the SkyDrive application in the lab
environment would connect to. Wireshark was used to analyze
the relevant Internet traffic and it then showed some of the
hostnames that SkyDrive connects to. Some online
investigation showed more hostnames 4,5 to incorporate in this
research.
The second step involved setting up a test bed of PlanetLab
servers spread over the world. On those machines the
traceroute and dig commands were executed against the
hostnames found during the first step, to determine whether the
hostnames would always resolve to the same IP-address. The
IP-addresses that resulted from this step were all queried against
the databases on MaxMind.com and Route.IM to get
information on their geographical location.
The results gained from querying those two websites were not
taken for granted though, as research [5] shows these GeoIP
services are not always precise, especially on the city-level.
Instead, the results of the queries on those websites have been
complemented by traceroute timings, to further establish the
outcomes.
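The resolution step described above can be sketched in Python. This is only an illustration and not the tooling used in this research, which ran dig and traceroute on the test bed machines. The function names distinct_ipv4 and same_everywhere, and the injectable resolver parameter, are assumptions introduced for this example.

```python
import socket


def distinct_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = resolver(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in infos})


def same_everywhere(results):
    """results maps a vantage point name to the addresses observed there.

    Returns True when every vantage point observed the same set of addresses.
    """
    observed = {frozenset(addresses) for addresses in results.values()}
    return len(observed) <= 1
```

Running distinct_ipv4 on several machines and feeding the outcomes to same_everywhere would indicate whether a hostname resolves identically from every location. The resolver argument only exists so that the function can be exercised without network access.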
2
http://bit.ly/SD-Modern-Web, Introducing SkyDrive for the
modern web, built using HTML5, accessed on 29-10-2012
3
http://bit.ly/Rebuilding-Permissions, Designing app-centric
sharing for SkyDrive, accessed on 07-11-2012
4
http://bit.ly/Upload-Issues-For-ISP, Microsoft Answers,
accessed on 29-10-2012
5
http://bit.ly/Low-Bandwidth-Areas, Microsoft Answers,
accessed on 29-10-2012
3.3 Comparison With Dropbox
The comparison between Dropbox and SkyDrive has been
based both on a literature survey and active measurements. The
literature survey was performed first to create an understanding
of features that in general improve the performance of cloud
services. Google Scholar was used primarily to search for
relevant sources. The starting points were the very generic terms
cloud storage and cloud service. More terms were then
introduced in the search queries, for example performance and
infrastructure.
Active measurements were then conducted to determine if the
features that were found during the literature survey are present
in SkyDrive. This involved uploading a series of different files
that were carefully crafted in order to ensure the features would
be exploited when they were present in the service. The files
that were uploaded as part of these measurements are described
in Section 5.3 and can also be found in Table 3.
Another part of the comparison is the assessment of the
popularity of SkyDrive compared to Dropbox. This is not part
of the research questions, but was included to be able to say
something about the usage of the service. The dataset that was
analyzed as part of this was produced by capturing flow data
from a building on the campus of the University of Twente, in
which 982 unique IP-addresses were present. These
IP-addresses are assigned statically. The number of unique
IP-addresses that connected to a storage server in the SkyDrive
service was recorded. This was compared with the number of
IP-addresses that connected to a Dropbox storage server. Also, the
amount of traffic generated in flows was captured.
4. CLOUD STORAGE TECHNOLOGIES
A literature survey was conducted to gain an understanding of
what features affect the performance of a cloud service, and
more specifically, a cloud storage service. A selection of those
features has been made and they are discussed and elaborated
on in the following subsections.
4.1 Data Deduplication
Many users store a lot of files in the cloud nowadays. It is
perfectly possible that some files are uploaded to a cloud storage
facility by two or more different users, or that a file is stored
more than once by a single user. This could be the case for
an e-book for example; it is then unnecessary to save more than
one copy of the e-book in the storage service. This kind of
administration is known as data deduplication, in this case,
server-side data deduplication [3], [4]. Data deduplication
allows for less Internet traffic to be generated, as files will not
have to be uploaded when they are already present on the cloud
storage facility. In this paper, only client-side data deduplication
will be considered. Client-side data deduplication can be
implemented by creating a mechanism that checks if a file is
already stored on the service, and only uploads a file when it is
not already present. This saves bandwidth, as files will not be
uploaded unnecessarily.
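The check-before-upload mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions: StorageService is a hypothetical in-memory stand-in for a storage facility, and the use of SHA-256 as the content hash is a choice made for this example, not something reported for any of the services discussed.

```python
import hashlib


class StorageService:
    """Hypothetical in-memory stand-in for a cloud storage facility."""

    def __init__(self):
        self.blobs = {}          # content hash -> file content
        self.bytes_received = 0  # upload traffic seen by the service

    def has(self, digest):
        return digest in self.blobs

    def upload(self, digest, data):
        self.blobs[digest] = data
        self.bytes_received += len(data)


def sync_file(service, data):
    """Upload data only when the service does not already hold it.

    Returns True when the content was actually uploaded.
    """
    digest = hashlib.sha256(data).hexdigest()
    if service.has(digest):
        return False  # duplicate: only the hash had to be exchanged
    service.upload(digest, data)
    return True
```

In this sketch a second call to sync_file with the same content leaves bytes_received unchanged, which is the bandwidth saving that client-side deduplication aims for.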
4.2 Delta Updates
When a file is created to be stored on a cloud storage facility, it
can in general be assumed that all bytes have to be transferred
over the Internet. In general, files will change over time and
those changes have to be synchronized to the cloud storage
facility.
Cloud storage providers can implement a feature called delta
updates, with which it becomes possible to upload a chunk of
data that has been changed, while leaving the unchanged
chunks of data untouched [1]. An example of an algorithm that
can be used to implement delta updates is the rsync algorithm
[7]. Less Internet traffic is generated when delta updates are
implemented in a cloud storage service, as there is no need to
upload an entire file when only a small part is changed.
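The idea of uploading only changed chunks can be illustrated with a simplified sketch. It compares fixed-size chunks at fixed offsets, which is an assumption made to keep the example short; it is not the rsync algorithm, which uses a rolling checksum so that matches are also found after bytes are inserted in the middle of a file. The names CHUNK_SIZE, chunk_hashes and changed_chunks are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def chunk_hashes(data, size=CHUNK_SIZE):
    """Hash every fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def changed_chunks(old, new, size=CHUNK_SIZE):
    """Return (index, chunk) pairs of new that differ from old.

    Only these chunks would have to be uploaded; unchanged chunks
    are left untouched on the storage facility.
    """
    old_hashes = chunk_hashes(old, size)
    delta = []
    for index, offset in enumerate(range(0, len(new), size)):
        chunk = new[offset:offset + size]
        digest = hashlib.sha256(chunk).digest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            delta.append((index, chunk))
    return delta
```

For example, appending four bytes to a twelve-byte file yields a single changed chunk, so only those four bytes would be transferred instead of all sixteen.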
4.3 Data Compression
Data compression is the act of encoding data in such a way that
the encoded data takes fewer bytes to store the same information
that is present in the original data [4]. When data compression
is deployed on the client side of a cloud storage service, files
that are exchanged with the cloud storage facility are
compressed before they are sent over the Internet. This allows
for less Internet traffic to be generated as, in general, files can
indeed be compressed. RFC2616 describes the HTTP 1.1
specification [2], which includes a section on compression of
files sent via HTTP. Compression is in fact in widespread 6 use
by websites, saving their users bandwidth and time.
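A short sketch can illustrate how much compression depends on the content being compressed. It uses zlib from the Python standard library; the helper compression_ratio and the sample inputs are assumptions for this example and say nothing about the algorithm any particular service uses.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive, natural-language-like text compresses very well ...
text = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 1000
# ... whereas random bytes of the same length hardly compress at all.
noise = os.urandom(len(text))

text_ratio = compression_ratio(text)
noise_ratio = compression_ratio(noise)
```

Here text_ratio is far below 1, while noise_ratio stays at roughly 1, because random data contains no redundancy for the compressor to remove.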
4.4 Server Distribution
In general, services on the web perform faster and more
efficiently when the client connecting to such a service is close
to the server [8]. Services that are used on a worldwide scale
should therefore, ideally, deploy servers distributed all over the
world, to guarantee good performance and a quick response for
all users, wherever they are. This is no different in cloud storage
services, in which a large amount of data has to be uploaded and
downloaded, and therefore server distribution is an important part of the
performance of those services.
4.5 Storage Protocol
The storage protocol is at the heart of a cloud storage
service, and can therefore severely impact the performance of
the service [4]. At a high level, the protocols may be
implemented in an Application Programming Interface (API).
Several options are available, such as Web- and File-based
APIs. Figure 1 shows a diagram with some of the available
options categorized on access method, and some technologies
corresponding to those options. The most popular APIs are
REST and SOAP, which are employed by Amazon S3 and
Windows Azure for example. The APIs provide for ways to
connect to services via a specific interface and specify how
systems have to communicate with each other, including how
data is exchanged between each entity and how data is saved on
the cloud storage servers. Other APIs include Block-based
access to cloud storage.
Another part of the storage protocol is the transport protocol
that is used to transfer the files from a client to the storage
servers. An example is the TCP/IP stack of protocols,
which also carries HTTP to power the Web. Dropbox, for
example, uses the HTTP and HTTPS application layer protocols
to transfer its files [1]. The use of HTTP(S) introduces
round-trip times, as messages are acknowledged upon receipt. The
duration of those round-trip times also influences the
performance of cloud storage services.
6
http://w3techs.com/technologies/details/ce-compression/all/all
Figure 1: Cloud storage access methods showing Web-, File-
and Block-based APIs and others. Figure taken and slightly
adapted from [4].
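The influence of round-trip time on transfers, noted in Section 4.5, can be made concrete with a rough model of TCP slow start. The sketch below is an idealization under stated assumptions: no packet loss, no receive-window limit, a segment size of 1460 bytes and an initial window of 10 segments. These parameter values and the function names are assumptions for illustration, not measurements from this research.

```python
import math


def slow_start_round_trips(size_bytes, mss=1460, initial_window=10):
    """Round trips needed to send size_bytes on a new connection while
    the congestion window doubles every round trip (idealized model)."""
    segments = math.ceil(size_bytes / mss)
    window, sent, rounds = initial_window, 0, 0
    while sent < segments:
        sent += window   # one window of segments per round trip
        window *= 2      # slow start doubles the window
        rounds += 1
    return rounds


def transfer_time_lower_bound(size_bytes, rtt_seconds, **kwargs):
    """Lower bound on transfer time: round trips multiplied by the RTT."""
    return slow_start_round_trips(size_bytes, **kwargs) * rtt_seconds
```

Under these assumptions a 1,000,000-byte transfer needs 7 round trips, so the lower bound grows linearly with the round-trip time between client and server.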
5. SKYDRIVE VS. DROPBOX
This Section compares SkyDrive and Dropbox. Subsection 5.1
describes SkyDrive internals. In subsection 5.2 the geographical
distribution of servers in SkyDrive will be assessed. In
subsection 5.3 a comparison of data deduplication, delta
updates and data compression is performed. In Section 5.4 the
popularity of both services is featured. Lastly, subsection 5.5
shows how SkyDrive stacks up against Dropbox in a conclusive
summary. It also discusses the results and introduces future
work.
5.1 SkyDrive In-Depth
This section describes technical details of SkyDrive that were of
interest during the research and is in its totality an answer to the
first research question. Knowledge about the internals of
SkyDrive was needed to set up experiments for the other two
research questions.
Users are identified by a 16-character identifier. This identifier
is also used for identifying every single file or folder that is
stored on the service. When used as an identifier for files and
folders, a numerical suffix is added to identify the right entity.
An example is B222AADFECF84486!1514, where the
exclamation mark separates the user- and file-identifier.
The application stores a local database in which file metadata
are kept. This metadata includes filename, client-identifier,
file-identifier and a 32-character hash-value. When a file is added, it
is assigned a provisional file-identifier, which looks like
#b18dd088-9f1f-4bb9-aba1-1206. The file is then uploaded to
the storage server and, as soon as the upload is finished, the file
is assigned a final file-identifier, which looks like the one
described in the previous paragraph.
When a file gets altered, its hash value is checked. When this
value is not the same as the one that is present in the database,
the file is uploaded to the storage server again.
Files stored using the native application on Windows are split
up in blocks. The maximum block-size is set in a configuration
file (ClientPolicy.ini), which can be found in the local
application data folder. The currently assigned block-size is
1MB. The SkyDrive application periodically checks online
whether the policies that are set in ClientPolicy.ini have to be
updated, so the block-size might be subject to changes.
The transfer of files is carried out via HTTPS. Analyzing the
headers of the packets using Charles showed that SkyDrive uses
special headers that are defined in Microsoft’s Background
Intelligent Transfer Service (BITS) 7. BITS defines new headers
on top of the standard HTTP headers. In BITS new sessions are
started for every file that has to be uploaded via a Create-Session
packet. Files are uploaded in Fragment packets, which
contain information on the part of the file that is being uploaded
and the data itself. The Fragment packets contain the blocks that
were described in the previous paragraph and, as such, are
around 1MB in size in the SkyDrive service. Although
SkyDrive uses the BITS headers, it does not seem to run on the
BITS protocol. Connections to the storage servers use remote
port 443, and data is sent encrypted over the network.
Connections to the storage server are closed a little after the file
transfer is completed.
A continuous connection is present whenever the SkyDrive
application is running. This connection also uses remote port
443. It periodically polls a notification server for notifications
the application has subscribed to. These notifications include
the amount of disk space used, the disk space quota and
information on files that have been uploaded or updated.
When the SkyDrive application is started, authentication is
performed via login.live.com, based on a Windows Live ID.
After successfully authenticating, the application registers itself
for notifications on act-3.blu.mesh.com. Notifications are sent
by a host suffixed with wns.windows.com. Storage operations
are all performed against a host suffixed with storage.msn.com,
except in the case of storage via the web interface, which is
performed against hosts suffixed with storage.live.com. Other
hostnames associated with the web interface have been omitted
for brevity.
Table 1 shows the hostnames that are in use by SkyDrive,
together with services that are provided by those hostnames.
Table 1: Hostnames and their use in SkyDrive
Hostname                        Service
login.live.com                  Authentication
*.mesh.com                      Notification subscription
*.wns.windows.com               Notifications
skydrivesync.policies.live.net  Client Policy updates
skyapi.live.net                 API functions
ssw.live.com                    Debug/Statistics
*.storage.msn.com               Storage
*.storage.live.com              Storage via web
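Two of the details described in this subsection, the structure of file identifiers and the splitting of files into blocks, can be sketched as follows. The function names are hypothetical, and interpreting 1MB as 1,000,000 bytes is an assumption; the exact block-size in bytes was not established. The sketch only computes byte ranges and does not reproduce the BITS headers themselves.

```python
def split_identifier(identifier):
    """Split an identifier such as 'B222AADFECF84486!1514' into the
    user-identifier and the numerical suffix of the file or folder."""
    user_id, _, suffix = identifier.partition("!")
    return user_id, suffix


def fragment_ranges(file_size, block_size=1_000_000):
    """Yield inclusive (first_byte, last_byte) ranges covering a file,
    one range per block of at most block_size bytes."""
    for start in range(0, file_size, block_size):
        yield start, min(start + block_size, file_size) - 1
```

For a file of 1,866,358 bytes this yields two ranges, (0, 999999) and (1000000, 1866357), which would correspond to two fragments under the assumed block-size.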
5.2 Server Distribution
Table 2 shows the hostnames that are in use by SkyDrive to
store files. The client application runs uploads to exactly one of
those hostnames; the one that is used can change over time
though, as the hostname that should be used by the service is
explicitly stated in the ClientPolicy.ini file. Every hostname is
associated with at least two distinct IP-addresses. Together with
the option to change the storage server at runtime due to a
ClientPolicy.ini update, this indicates load balancing is
performed in the service. All hostnames have been traced to the
United States, using MaxMind.com data, Route.IM data and by
running traceroute. Most of them resolve to the state of
Washington, whereas two IP-addresses were traced to the state of
California. The region the hosts behind dm1.storage.msn.com
originate from could not be resolved on MaxMind.com, but
response times on Route.IM suggest that they are closer to
California than Washington.
7
http://bit.ly/Microsoft-BITS, Microsoft TechNet, accessed on
14-01-2013.
Table 2: Hostnames, associated IP-addresses and locations
for storage servers in SkyDrive
Hostname(s)            IP address       Ctry.  Rgn.
by1.storage.msn.com    65.54.191.46     US     WA
by2.storage.msn.com    65.54.191.47     US     WA
blu1.storage.msn.com   65.55.195.238    US     WA
blu2.storage.msn.com   65.55.195.239    US     WA
dm1.storage.msn.com    157.55.246.46    US     -
                       157.55.246.47    US     -
                       157.55.241.174   US     -
                       157.55.241.175   US     -
                       207.46.0.174     US     CA
                       207.46.0.175     US     CA
sn2.storage.msn.com
As the above table shows, all storage servers are located in the
United States. This means files from all over the world need to
be sent there to be stored. As SkyDrive uses TCP at the
transport layer, and closes the connection to the storage server
after a file transfer is completed, this might cripple performance
for users that are not close to the United States. This is because
TCP employs a slow-start mechanism: every new connection
needs a number of round trips to reach full speed, so performance
is affected by the round-trip time between the client application and
storage server.
5.3 Technology Comparison
Our experiments showed that SkyDrive does not employ data
deduplication, delta updates or data compression. The latter
can be established from inspecting Figure 2. The file sizes on
the horizontal axis correspond to the file sizes in Table 3. The files
all contained a specific number of paragraphs of ‘Lorem Ipsum’
- text that is often used as dummy text on websites when
designing page layouts 8 - according to the number that is
present in their filename. ‘Regular’ text was used instead of
random data because of the possible data
compression on files in the service: with random data
the compression rate might well be 0%, which is not the
case when regular text is used. Files were built in a modular
manner to exploit the features of data deduplication and delta
updates. The graph shows a linear increase in upload traffic
when the size of the file that is being uploaded increases. The
amount of upload traffic is bigger than the file size. This
overhead contains the information needed to administer the
upload of the file. The absence of data compression can be
concluded from the fact that the number of bytes uploaded is
bigger than the number of bytes the files consist of. The
slope of the line in Figure 2 is about equal to
1.006, whereas employment of data compression would have
shown a slope smaller than 1.0. In Dropbox, according to
[1], data compression is present. This can also be established
from inspecting Figure 2.
8
http://www.lipsum.com/
To discover whether SkyDrive employs delta updates, an
experiment was set up that consisted of four stages:
Stage 1 – LI6000.txt, containing 6000 paragraphs of ‘Lorem
Ipsum’, was uploaded. This resulted in approximately 3.7
megabytes being uploaded to the SkyDrive storage server.
Stage 2 – The contents of LI6000.txt were copied, appended to
the original LI6000.txt and saved, basically doubling the size of
the file. This resulted in 7.5 megabytes getting transferred to the
SkyDrive storage server, which is about equal to the file size of
LI12000.txt.
Stage 3 – The resulting file from Stage 2 was again appended
with 6000 paragraphs of ‘Lorem Ipsum’ and saved. The file
now contains 18000 paragraphs. This resulted in 11.2
megabytes being sent to the SkyDrive storage server, which is
about equal to the file size of LI18000.txt.
Stage 4 – Consisted of cutting off the last 15000 paragraphs
from the Stage 3 file and saving, resulting in 1.9 megabytes
being sent to the SkyDrive storage server, which is about equal
to the file size of LI3000.txt.
The above experiment shows SkyDrive does not use delta
updates. The same measurement was performed with Dropbox
as the storage service.
Figure 3 shows the results of both measurements. From the
figure it can be concluded that Dropbox does indeed employ
delta updates, as the amount of upload traffic does not double
when doubling the amount of data in the file and that SkyDrive
does not employ delta updates, as every single byte is sent when
a file is changed.
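The difference between re-uploading a whole file and uploading only the changes can be worked out for the four stages of the experiment. The stage sizes below are the file sizes from Table 3 that the stages are said to be about equal to. The "ideal delta" column is a hypothetical lower bound that ignores protocol overhead and metadata; it is not a measurement of Dropbox or of any other service.

```python
# File size after each stage, in bytes, taken from Table 3
# (LI6000.txt, LI12000.txt, LI18000.txt, LI3000.txt).
STAGE_SIZES = [3_732_718, 7_465_438, 11_198_158, 1_866_358]


def full_upload_bytes(sizes):
    """Payload per stage when the whole file is sent after every change."""
    return list(sizes)


def ideal_delta_bytes(sizes):
    """Payload per stage if only appended bytes were sent.

    Truncating a file sends no payload in this idealized model.
    """
    previous, sent = 0, []
    for size in sizes:
        sent.append(max(size - previous, 0))
        previous = size
    return sent


full_total = sum(full_upload_bytes(STAGE_SIZES))
delta_total = sum(ideal_delta_bytes(STAGE_SIZES))
```

In this model the full re-uploads add up to 24,262,672 bytes, whereas an ideal delta scheme would send 11,198,158 bytes, less than half.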
Table 3: Files and their size as used during measurements
Filename       File size (Bytes)   File size Rounded (MB)
LI3000.txt     1.866.358           1.9
LI6000.txt     3.732.718           3.7
LI9000.txt     5.599.089           5.6
LI12000.txt    7.465.438           7.5
LI15000.txt    9.331.798           9.3
LI18000.txt    11.198.158          11.2
LI21000.txt    13.064.518          13.1
LI24000.txt    14.930.878          14.9
LI27000.txt    16.797.238          16.8
LI30000.txt    18.663.598          18.7
Figure 2: Upload Traffic observed when uploading the
‘Lorem Ipsum’ files to SkyDrive.
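The slope of a line such as the one in Figure 2 can be estimated with an ordinary least-squares fit of upload traffic against file size. The sketch below shows the calculation only. The file sizes are taken from Table 3, but the upload values are synthetic placeholders generated from an assumed overhead model (0.6% proportional overhead plus a fixed 20,000 bytes per upload); they are not the measurements of this research.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# First five file sizes from Table 3, in bytes.
FILE_SIZES = [1_866_358, 3_732_718, 5_599_089, 7_465_438, 9_331_798]

# Synthetic placeholder values, NOT measured data (see the note above).
SYNTHETIC_UPLOADS = [1.006 * size + 20_000 for size in FILE_SIZES]

slope = least_squares_slope(FILE_SIZES, SYNTHETIC_UPLOADS)
```

A slope above 1.0 means more bytes are sent than the files contain, which is the argument used in Section 5.3 for the absence of compression; a service that compresses text would show a slope below 1.0.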
The absence of client-side data deduplication in the SkyDrive
service has been established by uploading LI3000.txt to the
storage servers five times, each time to a different folder.
Analysis of the traffic generated showed that the file was
uploaded to the storage servers in its entirety each time. From
this it can be concluded that SkyDrive does not keep track of
files that are already present on the storage servers for a specific
user and so does not perform client-side data deduplication to
save upload bandwidth. Dropbox does employ client-side data
deduplication on a per-user basis. The results of this experiment
are shown in Figure 4. Note that the (compressed) bytes are
uploaded to Dropbox only during the first upload, whereas the entire
file is sent uncompressed to SkyDrive every time.
Figure 3: Amount of uploaded bytes under common file
operations, e.g. appending and deleting text.
Figure 4: Upload traffic observed when uploading
LI3000.txt to five different folders.
Figure 5: Number of unique IP-addresses connecting to a
SkyDrive or Dropbox storage server during two different
timespans
5.4 Service Popularity
The popularity of the SkyDrive service was measured by
monitoring the unique IP-addresses that connected to a storage
server each day, in a building on the campus of the University
of Twente. This gives a good indication of popularity, as clients
would only connect to a storage server when they upload a file.
The same was performed with Dropbox as storage service. The
top part of Figure 5 shows the measurement for the period from
the 1st of June till the 5th of July. It shows more unique
IP-addresses connecting to a Dropbox server than there are unique
IP-addresses connecting to a SkyDrive storage server. The
bottom part of Figure 5 shows the number of unique
IP-addresses that connect to a storage server in the SkyDrive and
Dropbox service in the period spanning from September the
19th till October the 22nd. The graph shows SkyDrive is roughly
at 1/6th of the unique IP-addresses of Dropbox. For SkyDrive, a
decrease of about 10.7% was observed between the two periods. The
number of IP-addresses connecting to a Dropbox storage server remained
fairly stable: a decrease of 2.3% was observed.
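Counting unique IP-addresses per day from flow data can be sketched as follows. The record layout (day, client address, server address) and the function names are assumptions made for this example; the actual flow export format used on the campus network is not described here.

```python
from collections import defaultdict


def unique_clients_per_day(flows, storage_servers):
    """Count distinct client addresses per day that contacted a storage server.

    flows is an iterable of (day, client_ip, server_ip) tuples.
    """
    clients = defaultdict(set)
    for day, client_ip, server_ip in flows:
        if server_ip in storage_servers:
            clients[day].add(client_ip)
    return {day: len(ips) for day, ips in clients.items()}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0
```

Repeated flows from the same client on the same day count once, and flows to other servers are ignored, so the result reflects how many clients used the service rather than how much they transferred.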
Figure 6: Amount of traffic generated during two different
timespans
The amount of traffic generated during flows was also
measured. Figure 6 shows the sum of downloaded and
uploaded MB to SkyDrive and Dropbox storage servers. Two
different timespans were used again, and they roughly
correspond to the timespans in Figure 5. The amount of traffic
generated by interacting with SkyDrive storage servers during
the September/October timespan has increased by
approximately 178.8% compared to the measurements from
June. This includes both up- and downloaded bytes. The
amount of traffic generated by interacting with Dropbox storage
servers decreased by approximately 14.0% during that same
timespan.
5.5 Discussion
Table 4 shows the described features and briefly summarizes
the findings of Sections 5.2 and 5.3. As written before,
SkyDrive does not employ client-side data deduplication, delta
updates nor data compression, as opposed to Dropbox. Reasons
for this, and these are conjectures only, could include that the
development team of SkyDrive was under the impression the
current state of the Internet provides for enough bandwidth to
handle the operation of the service in its current form. Also,
Microsoft owns an infrastructure that provides a lot of storage
space and bandwidth and can therefore offer SkyDrive in its
current form. In contrast, there is Dropbox, which has to
squeeze out every single bit of bandwidth as it has to pay for the
rent of storage space and the amount of bandwidth uploaded to
these servers to Amazon S3, which indicates why Dropbox
employs various technologies to reduce the amount of
bandwidth generated and bytes stored.
Neither SkyDrive nor Dropbox distributes user data
geographically on a worldwide scale, as both services store
files in the United States. A reason for this could be that
Microsoft’s infrastructure is based there and that Microsoft
felt no need to distribute the data geographically. As
explained, the distance packets have to travel influences the
time they take to arrive, and thus the speed with which files
can be uploaded to the service.
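The effect of distance can be made concrete with a rough back-of-the-envelope sketch. It assumes signals travel through fiber at about 200,000 km/s, ignores routing detours and queuing, and uses the common approximation that a single TCP connection cannot exceed one window of data per round trip. The distances and the 64 KiB window below are hypothetical values, not measurements from this research.

```python
SPEED_IN_FIBER_KM_S = 200_000.0  # approximate signal speed in optical fiber


def min_rtt_seconds(distance_km: float) -> float:
    """Lower bound on the round-trip time over a straight fiber path."""
    return 2.0 * distance_km / SPEED_IN_FIBER_KM_S


def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on single-connection throughput: one window per round trip."""
    return window_bytes / rtt_s


WINDOW = 64 * 1024  # assumed window size in bytes

# Hypothetical client-to-server distances in km.
for distance in (500.0, 6000.0):
    rtt = min_rtt_seconds(distance)
    bound = max_throughput_bytes_per_s(WINDOW, rtt)
    print(f"{distance:.0f} km: RTT >= {rtt * 1000:.0f} ms, "
          f"throughput <= {bound / 1e6:.2f} MB/s")
```

Under these assumptions the bound shrinks in proportion to the distance, which is why a nearby server can serve a client faster than one on another continent.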
Table 4: Technologies and their presence in SkyDrive and
Dropbox

                            Cloud Storage Provider
Technology                  SkyDrive      Dropbox
Client-Side Data Dedupl.    No            Yes
Delta Updates               No            Yes
Data Compression            No            Yes
Server Distribution         US            US
Storage Protocol            Via HTTPS     Via HTTPS

Future work in this field could be conducted on other providers
of cloud storage, to determine whether other technologies have
been deployed to enhance the performance of those services.
The use of the SkyDrive web interface could also be
investigated, to get a more thorough understanding of the
service and how it performs compared to Dropbox. Finally, the
way clients use cloud storage services could be investigated,
to gain a better understanding of typical usage patterns.
6. RELATED WORK
As written before, [1] provides a thorough understanding of
the Dropbox service and clearly discusses its performance. The
present paper examines SkyDrive instead, which [1] shows to be
the second most popular cloud storage provider.
Another paper on the performance of cloud storage is [3],
which discusses Dropbox alongside three other cloud storage
providers. Performance was measured while making and
restoring an online backup. The methodology is very similar to
the one used in this research, but this research examines the
SkyDrive service. In addition, we address several features
that enhance the performance of cloud storage providers.
The optimization of cloud storage systems is discussed in [6].
That paper describes which factors influence the performance
of cloud storage systems, as well as current issues with
existing services. These factors were used to understand the
performance of Microsoft SkyDrive and to compare it
effectively with Dropbox.
In [4], [9] and [10], several features that make cloud storage
services perform more efficiently are introduced and discussed.
These features were used in the literature survey for the
performance comparison between SkyDrive and Dropbox.
7. CONCLUSIONS
We have performed an analysis of the SkyDrive application to
gain an understanding of its internals. We have established
that the service stores files over HTTPS using headers
available in the Microsoft BITS service, and that it maintains a
local database of the files stored online. As soon as a file is
changed, it is sent encrypted to the storage server. This
answers the question of how SkyDrive administers and handles
its files.
The measurements have also shown that the storage servers,
which handle most of the traffic in the SkyDrive service, are
all located in the United States. This does not differ from
Dropbox, however, as neither service employs the strategy that
Content Delivery Networks use to increase upload and
download speeds by deploying servers close to clients. This
answers the question of how the servers in the SkyDrive
service are distributed over the world.
Experiments conducted during this research have shown that
SkyDrive does not employ client-side data deduplication, data
compression or delta updates, whereas Dropbox does. This
answers the third research question.
From these three findings we conclude that the Microsoft
SkyDrive service is inferior to Dropbox in terms of the presence
of performance-enhancing features. The distribution of storage
servers in SkyDrive is set up in the same way as in Dropbox.
However, none of the performance-enhancing features that are
available in Dropbox are available in SkyDrive. As a result,
SkyDrive uses considerably more bandwidth than necessary.
8. ACKNOWLEDGEMENTS
This paper has been written as part of the ‘Broadband for All’
track of the Bachelorreferaat course at the University of
Twente, which is supervised by the ‘Design and Analysis of
Communication Systems’-group (DACS) of the University of
Twente. I would like to thank my supervisor, I. Drago, for his
continuing support and insights during my work.
9. REFERENCES
[1] Drago, I., Mellia, M., Munafò, M.M., Sperotto, A., Sadre,
R. and Pras, A. 2012. Inside Dropbox: Understanding
Personal Cloud Storage Services. In Proceedings of the
12th ACM SIGCOMM Conference on Internet
Measurement. IMC ’12. Pages 481-494. DOI=
http://dx.doi.org/10.1145/2398776.2398827
[2] Fielding, R., et al. 1999. RFC2616 - Hypertext Transfer
Protocol – HTTP/1.1. Available on
http://www.ietf.org/rfc/rfc2616.txt
[3] Hu, W., Yang, T., Matthews, J.N. 2010. The good, the bad
and the ugly of consumer cloud storage. ACM SIGOPS
Operating Systems Review, Vol 44, Issue 3, July 2010,
pages 110-115.
DOI=http://dx.doi.org/10.1145/1842733.1842751
[4] Jones, M. T. 2010. Anatomy of a cloud storage
infrastructure. IBM developerWorks. Available on
http://www.ibm.com/developerworks/cloud/library/cl-cloudstorage/.
Also available as PDF.
[5] Poese, I., Uhlig, S., Kaafar, M.A., Donnet, B., Gueye, B.
2011. IP geolocation databases: unreliable? ACM
SIGCOMM Computer Communication Review, Vol 41,
Issue 2, April 2011, pages 53-56. DOI=
http://dx.doi.org/10.1145/1971162.1971171
[6] Spillner, J., Müller, J., Schill, A. 2012. Creating optimal
cloud storage systems. Future Generation Computer
Systems. 16 June 2012.
DOI=http://dx.doi.org/10.1016/j.future.2012.06.004
[7] Tridgell, A., Mackerras, P. 1996. The rsync algorithm.
Joint Computer Science Technical Report Series,
TR-CS-96-05.
[8] Vakali, A., Pallis, G. 2003. Content delivery networks:
status and trends. IEEE Internet Computing, Vol 7, Issue 6,
Nov-Dec 2003, pages 68-74. DOI=
http://dx.doi.org/10.1109/MIC.2003.1250586
[9] Wang, L., et al. 2010. Cloud Computing: a Perspective
Study. New Generation Computing, Vol 28, Issue 2, April
2010, pages 137-146.
DOI=http://dx.doi.org/10.1007/s00354-008-0081-5
[10] Zeng, W., Zhao, Y., Ou, K., Song, W. 2009. Research on
cloud storage architecture and key technologies.
Proceedings ICIS ’09, pages 1044-1048. DOI=
http://dx.doi.org/10.1145/1655925.1656114