Opening Up the Sky: A Comparison of Performance-Enhancing Features in SkyDrive and Dropbox

Herman Slatman
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands

ABSTRACT
Cloud storage services are increasing in popularity and use a growing amount of bandwidth on the Internet. Insight into how much traffic they generate is needed for a number of reasons. Cloud storage providers are interested in serving their clients efficiently and effectively, and they want to know how their product is performing and how they can improve their service. Internet Service Providers need an indication of the amount of traffic generated by cloud storage. Lastly, users of cloud storage services might want to know how their favorite service performs. At the moment not much is known about the performance of different cloud storage providers; this paper aims at getting a thorough understanding of those services and their impact on the Internet. This paper focuses on Microsoft SkyDrive, as this is the second most popular cloud storage service [1] and because it has been neatly integrated in the Microsoft Windows operating system. Microsoft SkyDrive will be compared to Dropbox in terms of performance-enhancing features. As shown in [1], Dropbox storage servers are all located in the United States, which is not an optimal solution for clients spread around the world. Also, the way SkyDrive manages and transfers its files will be analyzed to assess whether SkyDrive has deployed more efficient synchronization strategies than Dropbox. This research contributes to knowing which technologies state-of-the-art cloud storage services have or have not deployed to increase performance, and to a thorough understanding of the performance of Microsoft SkyDrive compared to that of Dropbox.
Keywords
Cloud Storage, Performance, SkyDrive

1. INTRODUCTION
Recent developments show an increased interest in the use of cloud storage services. Both individuals and enterprises are increasingly making use of cloud storage services, like Dropbox, Google Drive and Microsoft SkyDrive, to store and share files with great ease. Those cloud storage services already generate quite some traffic on the Internet: an educated guess (footnote 1) puts the total amount of traffic generated by uploading files to Dropbox at about 54 Gbps, and it is to be expected that the amount of traffic due to cloud storage services will further increase in the future. To maintain the quality of the Internet in terms of available bandwidth and latency, predicting the impact cloud storage services have and will have on the Internet is important.

To gain a better understanding of the impact of cloud storage services, having knowledge of the internals and the performance of those services is necessary. Not much is known about the internals of cloud storage services, but [1] gives great insight into the internals of Dropbox, which is shown to be the most popular cloud storage provider. The goal of this paper is to get a thorough understanding of the Microsoft SkyDrive service, its internals and, specifically, the performance of the aforementioned service. The main reasons Microsoft SkyDrive has been chosen as the research topic are that it is the second largest service [1] in terms of traffic generated and that it has the potential to grow substantially in the near future. The latter is because SkyDrive can be accessed via the Web, because several client applications are available for different operating systems and because it has been neatly integrated in the Microsoft Windows operating system. To say something about the performance of SkyDrive, a comparison against Dropbox will be performed. The main research question reads the following:

How does SkyDrive compare to Dropbox, in terms of the presence of performance-enhancing features?

The main research question is focused on generated traffic and the efficiency with which SkyDrive handles files on its service. To answer this research question, the research is split up in three parts, answering the following questions:

How are the administration and transfer of files controlled in SkyDrive?
How are the servers in SkyDrive distributed over the world?
Does SkyDrive deploy specific technologies to enhance its performance, and how do these compare to Dropbox?

The first question was included to gain an understanding of the operation of SkyDrive. Information gained from this was used to set up the experiments for the two remaining research questions. Together these questions give an insight into the performance of the Microsoft SkyDrive service compared to Dropbox in terms of features.

An overview of SkyDrive will be given in Section 2. Section 3 describes the methodology used to conduct this research. In Section 4 technologies that enhance the performance of cloud services are introduced. Section 5 compares SkyDrive and Dropbox. Section 6 introduces related work. Section 7, lastly, summarizes the conclusions of this paper.

1 http://www.extremetech.com/computing/129183-how-big-isthe-cloud, accessed on 03-10-2012

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
18th Twente Student Conference on IT, January 25, 2013, Enschede, The Netherlands.
Copyright 2013, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

2.
A BIRD’S EYE VIEW OF SKYDRIVE SkyDrive was initially released by Microsoft in 2007 and has since then been known under a few different names. At the time of writing, it offers 7 GB of storage for free to new users, whereas early users could opt-in for a free 25GB if they had used the service before the 22nd of April of 2012. Client applications are available for Windows Vista and Windows 7, which can be used to integrate SkyDrive inside those operating systems. In Windows 8, Microsoft’s newest iteration of the operating system, the SkyDrive has been integrated natively. Client applications are also available for the OS X, iOS, Windows Phone and Android operating systems, covering a broad spectrum of devices. This paper focusses on the desktop client for Windows 7. A web interface to the SkyDrive service is also available, which is built on HTML5 technologies 2. Amongst other things, it supports email-integration, integration with Microsoft Office and it features Office Web Apps, in which users can create, view and edit documents right in the browser and store them on SkyDrive. Users can login to the service with their Microsoft Account which is used in all other services provided by Microsoft. Files on the service can be shared with other people that have a Microsoft Account, but it is also possible to share files on social networks, such as Twitter, LinkedIn and Facebook. SkyDrive maintains an Access Control List (ACL) for every file and folder 3, which is used to grant users the privileges needed to execute the associated operations on a file. It is possible, for example, to create an URL for a file which has the property that everyone is allowed to read the file, but not change it. It is also possible to mandate to be logged in before access is granted. On the 15th of November 2012, Microsoft introduced selective sync, enabling users to control which files are being synchronized amongst their devices. 
Updates to the SkyDrive applications for Windows Phone 8 and Android were also rolled out. According to Microsoft, on November the 15th the amount of SkyDrive storage had doubled since the introduction of the desktop and mobile applications on April the 22nd of 2012. 3. METHODOLOGY Active and passive measurements have been carried out to assess the performance of SkyDrive and to compare it with Dropbox. These included uploading files to the SkyDrive servers and measurements to determine the location of the servers. Before conducting any active or passive experiments, a lab environment suitable for those experiments was setup. This lab environment consisted of a host PC running Debian GNU/Linux version 6.0, kernel 2.6.32-5-amd, on which Wireshark, a popular packet sniffer and network protocol analyzer, was installed. Windows 7 was installed as a virtual machine. On this virtual machine, the SkyDrive client application was installed, together with Charles, a web debugging proxy. Charles is a shareware application that allows for setting up a local proxy to capture, for example, all data that is sent via SSL/TLS encrypted connections. The setup described above allowed for capturing and analyzing all traffic that was exchanged during the various experiments, including the encrypted traffic. 3.1 File Administration and Transfers Files differing in size and containing random text were uploaded to SkyDrive to determine the way SkyDrive handles file administration and transfers. At first these uploads were analyzed only using Wireshark, which showed all transfers and administration of files were carried over encrypted connections. Charles was used to gain a more thorough understanding of the information sent over those encrypted connections. Hostnames used in the service were recorded and the corresponding IPaddresses were added together with the functionality they provide. 
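The bookkeeping described above (hostnames, their IP-addresses and the functionality they provide) could be sketched as follows. This is an illustration only and not the tooling used in this research: the function names `resolve_ipv4` and `build_host_table`, the injectable `resolve` parameter and the record layout are assumptions made for the example.

```python
import socket
from typing import Callable, Dict, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted IPv4 addresses a hostname currently resolves to."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return sorted(addresses)


def build_host_table(
    services: Dict[str, str],
    resolve: Callable[[str], List[str]] = resolve_ipv4,
) -> List[dict]:
    """Combine each hostname with its addresses and the service it provides."""
    table = []
    for hostname, service in services.items():
        try:
            addresses = resolve(hostname)
        except OSError:
            # Unresolvable names are kept in the table, with no addresses.
            addresses = []
        table.append(
            {"hostname": hostname, "service": service, "addresses": addresses}
        )
    return table
```

Passing the resolver as a parameter keeps the sketch usable offline, for example with addresses that were recorded earlier from a packet capture.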
3.2 Distribution of Servers
Active measurements were performed to assess the geographical distribution of servers in the SkyDrive service. This was done in two consecutive steps. The first step was to find out what hostnames the SkyDrive application in the lab environment would connect to. Wireshark was used to analyze the relevant Internet traffic and it then showed some of the hostnames that SkyDrive connects to. Some online investigation showed more hostnames (footnotes 4 and 5) to incorporate in this research. The second step involved setting up a test bed of Planet-Lab servers spread over the world. On those machines the traceroute and dig commands were executed against the hostnames found during the first step, to determine whether the hostnames would always resolve to the same IP. The IP-addresses that resulted from this step were all queried against the databases on MaxMind.com and Route.IM to get information on their geographical location. The results gained from querying those two websites were not taken for granted though, as research [5] shows these GeoIP services are not always precise, especially on the city-level. Instead, the results of the queries on those websites have been complemented by traceroute timings, to further establish the outcomes.

2 http://bit.ly/SD-Modern-Web, Introducing SkyDrive for the modern web, built using HTML5, accessed on 29-10-2012
3 http://bit.ly/Rebuilding-Permissions, Designing app-centric sharing for SkyDrive, accessed on 07-11-2012
4 http://bit.ly/Upload-Issues-For-ISP, Microsoft Answers, accessed on 29-10-2012
5 http://bit.ly/Low-Bandwidth-Areas, Microsoft Answers, accessed on 29-10-2012

3.3 Comparison With Dropbox
The comparison between Dropbox and SkyDrive has been based both on a literature survey and active measurements. The literature survey was performed first to create an understanding of features that in general improve the performance of cloud services. Google Scholar was used primarily to search for relevant sources.
Starting point were the very generic terms cloud storage and cloud service. Then some more terms were introduced in the search queries: for example performance and infrastructure. Active measurements were then conducted to determine if the features that were found during the literature survey are present in SkyDrive. This involved uploading a series of different files that were carefully crafted in order to ensure the features would be exploited when they were present in the service. The files that were uploaded as part of these measurements are described in Section 5.3 and can also be found in Table 3. Another part of the comparison is the assessment of the popularity of SkyDrive compared to Dropbox. This is not part of the research questions, but was included to be able to say something about the usage of the service. The dataset that was analyzed as part of this was produced by capturing flow data from a building on the campus of the University of Twente, in which 982 unique IP-addresses were present. These IPaddresses are assigned statically. The number of unique IPaddresses that connected to a storage server in the SkyDrive service was recorded. This was put against the number of IPaddresses that connected to a Dropbox storage server. Also, the amount of traffic generated in flows was captured. 4. CLOUD STORAGE TECHNOLOGIES A literature survey was conducted to gain an understanding of what features affect the performance of a cloud service, and more specifically, a cloud storage service. A selection of those features has been made and they are discussed and elaborated on in the following subsections. 4.1 Data Deduplication Many users store a lot of files in the cloud nowadays. It is perfectly possible some files are uploaded to a cloud storage facility by two or more different users or that it is being stored twice or more times by a single user. 
This could be the case for an e-book for example; it is then unnecessary to save more than one copy of the e-book in the storage service. This kind of administration is known as data deduplication, in this case, server-side data deduplication [3], [4]. Data deduplication allows for less Internet traffic to be generated, as files will not have to be uploaded when they are already present on the cloud storage facility. In this paper, only client side data deduplication will be considered. Client-side data deduplication can be implemented by creating a mechanism that checks if a file is already stored on the service, and only uploads a file when it is not already present. This saves bandwidth, as files will not be uploaded unnecessarily. 4.2 Delta Updates When a file is created to be stored on a cloud storage facility, it can in general be assumed that all bytes have to be transferred over the Internet. In general, files will change over time and those changes have to be synchronized to the cloud storage facility. Cloud storage providers can implement a feature called delta updates, with which it becomes possible to upload a chunk of data that has been changed, while leaving the unchanged chunks of data untouched [1]. An example of an algorithm that can be used to implement delta updates is the rsync algorithm [7]. Less Internet traffic is generated when delta updates are implemented in a cloud storage service, as there is no need to upload an entire file when only a small part is changed. 4.3 Data Compression Data compression is the act of encoding data in such a way that the encoded data takes less bytes to store the same information that is present in the original data [4]. When data compression is deployed on the client side of a cloud storage service, files that are exchanged with the cloud storage facility are compressed before they are sent over the Internet. This allows for less Internet traffic to be generated as, in general, files can indeed be compressed. 
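How much can be saved depends strongly on the content. The sketch below illustrates this with Python's standard `zlib` module as a stand-in compressor; this choice is an assumption for the example, as is the helper name `compression_ratio` and the sample input. It compares repetitive text with random bytes of the same length.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, level)) / len(data)


# Repetitive natural-language text versus random bytes of equal length.
text = b"Cloud storage services are increasing in popularity. " * 2000
noise = os.urandom(len(text))

print(f"repetitive text: {compression_ratio(text):.3f}")
print(f"random bytes:    {compression_ratio(noise):.3f}")
```

The text sample shrinks to a small fraction of its size, whereas the random sample does not shrink at all, which is why measurements that probe for compression should use compressible input.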
RFC2616 describes the HTTP 1.1 specification [2], which includes a section on compression of files sent via HTTP. Compression is in fact in widespread use by websites (footnote 6), saving their users bandwidth and time.

4.4 Server Distribution
In general, services on the web perform faster and more efficiently when the client connecting to such a service is close to the server [8]. Services that are used on a world-wide scale should therefore, ideally, deploy servers distributed all over the world, to guarantee good performance and a quick response for all users, wherever they are located. This is no different in cloud storage services, in which a large amount of data has to be uploaded and downloaded, and therefore server distribution is an important part of the performance of those services.

4.5 Storage Protocol
The storage protocol is at the heart of a cloud storage service and can therefore severely impact the performance of the service [4]. At a high level, the protocols may be implemented in an Application Programming Interface (API). Several options are available, such as Web- and File-based APIs. Figure 1 shows a diagram with some of the available options categorized on access method, and some technologies corresponding to those options. The most popular APIs are REST and SOAP, which are employed by Amazon S3 and Windows Azure for example. The APIs provide ways to connect to services via a specific interface and specify how systems have to communicate with each other, including how data is exchanged between each entity and how data is saved on the cloud storage servers. Other APIs include Block-based access to cloud storage. Another part of the storage protocol is the transport protocol that is used to transfer the files from a client to the storage servers. An example is of course the TCP/IP stack of protocols, on top of which HTTP runs to power the Web. Dropbox, for example, uses the HTTP and HTTPS application layer protocols to transfer its files [1].
The use of HTTP(S) introduces round-trip times, as messages are acknowledged upon receipt. The duration of those round-trip times also influences the performance of cloud storage services.

6 http://w3techs.com/technologies/details/ce-compression/all/all

Figure 1: Cloud storage access methods showing Web-, File- and Block-based APIs and others. Figure taken and slightly adapted from [4].

5. SKYDRIVE VS. DROPBOX
This Section compares SkyDrive and Dropbox. Subsection 5.1 describes SkyDrive internals. In subsection 5.2 the geographical distribution of servers in SkyDrive will be assessed. In subsection 5.3 a comparison of data deduplication, delta updates and data compression is performed. In subsection 5.4 the popularity of both services is featured. Lastly, subsection 5.5 shows how SkyDrive stacks up against Dropbox in a conclusive summary. It also discusses the results and introduces future work.

5.1 SkyDrive In-Depth
This section describes technical details of SkyDrive that were of interest during the research and is in its totality an answer to the first research question. Knowledge about the internals of SkyDrive was needed to set up experiments for the other two research questions. Table 1 shows the hostnames that are in use by SkyDrive, together with services that are provided by those hostnames.

Users are identified by a 16-character identifier. This identifier is also used for identifying every single file or folder that is stored on the service. When used as an identifier for files and folders, a numerical suffix is added to identify the right entity. An example is B222AADFECF84486!1514, where the exclamation mark separates the user- and file-identifier.

The application stores a local database in which file metadata are kept. This metadata includes filename, client-identifier, file-identifier and a 32-character hash-value. When a file is added, it is assigned a provisional file-identifier. These look like #b18dd088-9f1f-4bb9-aba1-1206. The file is then uploaded to the storage server and, as soon as the upload is finished, the file is assigned a final file-identifier, which looks like the one described in the previous paragraph. When a file gets altered, its hash value is checked. When this value is not the same as the one that is present in the database, the file is uploaded to the storage server again.

Files stored using the native application on Windows are split up in blocks. The maximum block-size is set in a configuration file (ClientPolicy.ini), which can be found in the local application data folder. The currently assigned block-size is 1MB. The SkyDrive application periodically checks online whether the policies that are set in ClientPolicy.ini have to be updated, so the block-size might be subject to changes.

The transfer of files is carried out via HTTPS. Analyzing the headers of the packets using Charles showed that SkyDrive uses special headers that are defined in Microsoft's Background Intelligent Transfer Service (BITS) (footnote 7). BITS defines new headers on top of the standard HTTP headers. In BITS new sessions are started for every file that has to be uploaded via a CreateSession packet. Files are uploaded in Fragment packets, which contain information on the part of the file that is being uploaded and the data itself. The Fragment packets contain the blocks that were described in the previous paragraph and, as such, are around 1MB in size in the SkyDrive service. Although SkyDrive uses the BITS headers, it does not seem to run on the BITS protocol. Connections to the storage servers use remote port 443, and data is sent encrypted over the network. Connections to the storage server are closed a little after the file transfer is completed.

A continuous connection is present whenever the SkyDrive application is running. This connection also uses remote port 443. It periodically polls a notification server for notifications the application has subscribed to. These notifications include the amount of disk space used, the disk space quota and information on files that have been uploaded or updated.

When the SkyDrive application is started, authentication is performed via login.live.com, based on a Windows Live ID. After successfully authenticating, the application registers itself for notifications on act-3.blu.mesh.com. Notifications are sent by a host suffixed with wns.windows.com. Storage operations are all performed against a host suffixed with storage.msn.com, except in the case of storage via the web interface, which are performed against hosts suffixed with storage.live.com. Other hostnames associated with the web interface have been omitted for brevity.

Table 1: Hostnames and their use in SkyDrive
Hostname                          Service
login.live.com                    Authentication
*.mesh.com                        Notification subscription
*.wns.windows.com                 Notifications
skydrivesync.policies.live.net    Client Policy updates
skyapi.live.net                   API functions
ssw.live.com                      Debug/Statistics
*.storage.msn.com                 Storage
*.storage.live.com                Storage via web

5.2 Server Distribution
Table 2 shows the hostnames that are in use by SkyDrive to store files. The client application runs uploads to exactly one of those hostnames; the one that is used can change over time though, as the hostname that should be used by the service is explicitly stated in the ClientPolicy.ini file. Every hostname is associated with at least two distinct IP-addresses. Together with the option to change the storage server at runtime due to a ClientPolicy.ini update, this indicates load balancing is performed in the service.

All hostnames have been traced to the United States, using MaxMind.com data, Route.IM data and by running traceroute. Most of them resolve to the state of Washington, whereas two IP-addresses were traced to the state of California.
The region the hosts behind dm1.storage.msn.com originate from could not be resolved on MaxMind.com, but response times on Route.IM suggest that they are closer to California than Washington.

7 http://bit.ly/Microsoft-BITS, Microsoft TechNet, accessed on 14-01-2013.

Table 2: Hostnames, associated IP-addresses and locations for storage servers in SkyDrive
Hostname(s)              IP address        Ctry.   Rgn.
by1.storage.msn.com      65.54.191.46      US      WA
by2.storage.msn.com      65.54.191.47      US      WA
blu1.storage.msn.com     65.55.195.238     US      WA
blu2.storage.msn.com     65.55.195.239     US      WA
dm1.storage.msn.com      157.55.246.46     US      -
                         157.55.246.47     US      -
                         157.55.241.174    US      -
                         157.55.241.175    US      -
sn2.storage.msn.com      207.46.0.174      US      CA
                         207.46.0.175      US      CA

As the above table shows, all storage servers are located in the United States. This means files from all over the world need to be sent there to be stored. As SkyDrive uses TCP at the transport layer, and closes the connection to the storage server after a file transfer is completed, this might cripple performance for users that are not close to the United States. This is because TCP employs a slow-start mechanism: every new connection needs a number of round trips before it reaches its full sending rate, so performance is affected by the round-trip time between the client application and storage server.

5.3 Technology Comparison
Our experiments showed that SkyDrive does not employ data deduplication, delta updates or data compression. The latter can be established from inspecting Figure 2. The file sizes on the horizontal axis correspond to the file sizes in Table 3. The files all contained a specific number of paragraphs of 'Lorem Ipsum' (text that is often used as dummy text on websites when designing page layouts, see footnote 8), according to the number that is present in their filename. The reason 'regular' text and no random data was inside the files is the possible data compression on files in the service. When random data is inside, the compression rate might well be 0%, which is not the case when regular text is used. Files were built in a modular manner to exploit the features of data deduplication and delta updates.

The graph shows a linear increase in upload traffic when the size of the file that is being uploaded increases. The amount of upload traffic is bigger than the file size. This overhead contains the information needed to administer the upload of the file. The absence of data compression can be concluded from the fact that the number of bytes uploaded is bigger than the number of bytes the files consist of. The slope of the line in Figure 2 is about equal to 1.006, whereas employment of data compression would have shown a slope smaller than 1.0. In Dropbox, according to [1], data compression is present. This can also be established from inspecting Figure 2.

To discover whether SkyDrive employs delta updates an experiment was set up that consisted of four stages:

Stage 1: LI6000.txt, containing 6000 paragraphs of 'Lorem Ipsum', was uploaded. This resulted in approximately 3.7 megabytes being uploaded to the SkyDrive storage server.
Stage 2: The contents of LI6000.txt were copied, appended to the original LI6000.txt and saved, basically doubling the size of the file. This resulted in 7.5 megabytes getting transferred to the SkyDrive storage server, which is about equal to the file size of LI12000.txt.
Stage 3: The resulting file from Stage 2 was again appended with 6000 paragraphs of 'Lorem Ipsum' and saved. The file now contains 18000 paragraphs. This resulted in 11.2 megabytes being sent to the SkyDrive storage server, which is about equal to the file size of LI18000.txt.
Stage 4: The last 15000 paragraphs were cut off from the Stage 3 file and the file was saved, resulting in 1.9 megabytes being sent to the SkyDrive storage server, which is about equal to the file size of LI3000.txt.

8 http://www.lipsum.com/

The above experiment shows SkyDrive does not use delta updates.
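For contrast, the sketch below shows what a block-based delta update could upload when data is appended to a file. It is a simplified illustration and not the algorithm of either service: it assumes fixed-size blocks compared by SHA-256 digest, whereas the rsync algorithm [7] uses rolling checksums that also cope with insertions in the middle of a file. The names `block_digests` and `changed_blocks` and the tiny 4-byte block size are assumptions chosen for the example.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int) -> List[int]:
    """Return indices of blocks in `new` that differ from `old` or did not exist."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or digest != old_digests[index]
    ]


original = b"AAAABBBBCCCC"          # three blocks of 4 bytes
appended = original + b"DDDDEEEE"   # two extra blocks appended

to_upload = changed_blocks(original, appended, block_size=4)
print(to_upload)
```

In this toy case only the two appended blocks are selected, while the three unchanged blocks are skipped. A service without delta updates would resend all five blocks, which matches the behavior observed in the stages above.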
The same measurement was performed with Dropbox as the storage service. Figure 3 shows the results of both measurements. From the figure it can be concluded that Dropbox does indeed employ delta updates, as the amount of upload traffic does not double when doubling the amount of data in the file, and that SkyDrive does not employ delta updates, as every single byte is sent when a file is changed.

Table 3: Files and their size as used during measurements
Filename       File size (Bytes)   File size Rounded (MB)
LI3000.txt     1.866.358           1.9
LI6000.txt     3.732.718           3.7
LI9000.txt     5.599.089           5.6
LI12000.txt    7.465.438           7.5
LI15000.txt    9.331.798           9.3
LI18000.txt    11.198.158          11.2
LI21000.txt    13.064.518          13.1
LI24000.txt    14.930.878          14.9
LI27000.txt    16.797.238          16.8
LI30000.txt    18.663.598          18.7

Figure 2: Upload Traffic observed when uploading the 'Lorem Ipsum' files to SkyDrive.

The absence of client-side data deduplication in the SkyDrive service has been established by uploading LI3000.txt to the storage servers five times, each time to a different folder. Analysis of the traffic generated showed that the file was uploaded to the storage servers in its entirety each time. From this it can be concluded that SkyDrive does not keep track of files that are already present on the storage servers for a specific user and so does not perform client-side data deduplication to save upload bandwidth. Dropbox does employ client-side data deduplication on a per-user basis. The results of this experiment are shown in Figure 4. Note that only during the first upload the (compressed) bytes are uploaded to Dropbox, whereas the entire file is sent uncompressed every time to SkyDrive.

Figure 3: Amount of uploaded bytes under common file operations, e.g. appending and deleting text.

Figure 4: Upload traffic observed when uploading LI3000.txt to five different folders.

Figure 5: Number of unique IP-addresses connecting to a SkyDrive or Dropbox storage server during two different timespans
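The per-user check that the five-folder experiment probes for can be sketched as follows. This is a hypothetical illustration and not the protocol of either service: the class name `DedupClient`, the in-memory dictionaries standing in for the storage server, and the use of SHA-256 are all assumptions. A real client would ask the server whether it already holds content with a given hash.

```python
import hashlib
from typing import Dict


class DedupClient:
    """Toy client that uploads content only if its hash is not yet known."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}   # content hash -> content ("server")
        self.paths: Dict[str, str] = {}      # path -> content hash
        self.bytes_uploaded = 0

    def put(self, path: str, content: bytes) -> bool:
        """Store content under path; return True if bytes had to be uploaded."""
        digest = hashlib.sha256(content).hexdigest()
        uploaded = digest not in self.stored
        if uploaded:
            self.stored[digest] = content
            self.bytes_uploaded += len(content)
        # The path is always recorded, even when no bytes are transferred.
        self.paths[path] = digest
        return uploaded


client = DedupClient()
payload = b"Lorem ipsum dolor sit amet" * 100
results = [client.put(f"folder{i}/LI3000.txt", payload) for i in range(1, 6)]
print(results, client.bytes_uploaded)
```

With such a check in place, only the first of the five uploads transfers the file contents; the other four only record a new path for content that is already stored.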
5.4 Service Popularity
The popularity of the SkyDrive service was measured by monitoring the unique IP-addresses that connected to a storage server each day, in a building on the campus of the University of Twente. This gives a good indication of popularity, as clients would only connect to a storage server when they upload a file. The same was performed with Dropbox as storage service. The top part of Figure 5 shows the measurement for the period from the 1st of June till the 5th of July. It shows more unique IP-addresses connecting to a Dropbox server than there are unique IP-addresses connecting to a SkyDrive storage server.

The bottom part of Figure 5 shows the number of unique IP-addresses that connect to a storage server in the SkyDrive and Dropbox service in the period spanning from September the 19th till October the 22nd. The graph shows that the number of unique IP-addresses for SkyDrive is roughly 1/6th of that for Dropbox. For SkyDrive, a decrease of about 10.7% was observed. The number of IP-addresses connecting to a Dropbox storage server remained fairly stable: a decrease of 2.3% was observed.

Figure 6: Amount of traffic generated during two different timespans

The amount of traffic generated during flows was also measured. Figure 6 shows the sum of downloaded and uploaded MB to SkyDrive and Dropbox storage servers. Two different timespans were used again, and they roughly correspond to the timespans in Figure 5. The amount of traffic generated by interacting with SkyDrive storage servers during the September/October timespan increased by approximately 178.8% compared to the measurements from June. This includes both up- and downloaded bytes. The amount of traffic generated by interacting with Dropbox storage servers decreased by approximately 14.0% during that same timespan.

5.5 Discussion
Table 4 shows the described features and briefly summarizes the findings of Sections 5.2 and 5.3.
As written before, SkyDrive does not employ client-side data deduplication, delta updates or data compression, as opposed to Dropbox. Reasons for this, and these are conjectures only, could include that the development team of SkyDrive was under the impression that the current state of the Internet provides enough bandwidth to handle the operation of the service in its current form. Also, Microsoft owns an infrastructure that provides a lot of storage space and bandwidth and can therefore offer SkyDrive in its current form. In contrast, Dropbox has to pay Amazon S3 for the storage space it rents and for the bandwidth used to upload to those servers, which may explain why Dropbox employs various technologies to reduce the amount of traffic generated and the number of bytes stored.

Neither SkyDrive nor Dropbox employs geographical distribution of the user's data on a world-wide scale, as both services store files in the United States. In the case of SkyDrive, a reason for this could be that Microsoft's infrastructure is based there, and they felt no need to distribute the data geographically. As explained, the distance packets have to travel influences the speed with which this happens, and thus influences the speed with which files can be uploaded to the service.

Table 4: Technologies and their presence in SkyDrive and Dropbox
                            Cloud Storage Provider
Technology                  SkyDrive      Dropbox
Client-Side Data Dedupl.    No            Yes
Delta Updates               No            Yes
Data Compression            No            Yes
Server Distribution         US            US
Storage Protocol            Via HTTPS     Via HTTPS

Future work in this field could be conducted on other providers of cloud storage, to determine whether other technologies have been deployed to enhance performance of those services. The usage of the web interface to SkyDrive could be investigated also, to get a more thorough understanding of the service and how it performs compared to Dropbox.
Also, the way cloud storage services are being utilized by clients could be investigated, to gain a better understanding of the typical usage of cloud storage services.

6. RELATED WORK

As written before, [1] provides a thorough understanding of the Dropbox service and clearly discusses its performance. In this paper SkyDrive, which [1] shows to be the second most popular cloud storage provider, has been researched.

Another paper on the performance of cloud storage is [3]. In that paper Dropbox is discussed amongst three other cloud storage providers. The performance was measured while making and restoring an online backup. The methodology is very similar to the one in this research, but this research examines the SkyDrive service. Also, we address some features that enhance the performance of cloud storage providers.

A paper in which the optimization of cloud storage systems is discussed is [6]. It describes which factors influence the performance of cloud storage systems and the current issues of existing services. These were used to understand the performance of Microsoft SkyDrive and to compare it effectively with Dropbox.

In [4], [9] and [10] a couple of features that make cloud storage services perform more efficiently are introduced and discussed. These features were used in the literature survey on the performance comparison between SkyDrive and Dropbox.

7. CONCLUSIONS

We have performed an analysis of the SkyDrive application to gain an understanding of its internals. We have established that the service stores files over HTTPS, using headers available in the Microsoft BITS service, and maintains a local database of files stored online. As soon as a file is changed, the file is sent encrypted to the storage server. This answers the question of how SkyDrive administers and handles its files.
The measurements have also shown that the storage servers, which handle most of the traffic in the SkyDrive service, are all located in the United States. This does not differ from Dropbox, however, as neither service employs the strategy that Content Delivery Networks use to increase upload and download speeds by deploying servers close to clients. This answers the question of how the servers in the SkyDrive service are distributed over the world.

Experiments conducted during this research have shown that SkyDrive does not employ client-side data deduplication, data compression or delta updates, as opposed to Dropbox. This answers the third research question.

From these three findings we conclude that the Microsoft SkyDrive service is inferior to Dropbox in terms of the presence of performance-enhancing features. The distribution of storage servers in SkyDrive is set up in the same way as in Dropbox. However, none of the performance-enhancing features that are available in Dropbox are available in SkyDrive. This results in a considerable amount of bandwidth being wasted by SkyDrive.

8. ACKNOWLEDGEMENTS

This paper has been written as part of the 'Broadband for All' track of the Bachelorreferaat course at the University of Twente, which is supervised by the 'Design and Analysis of Communication Systems' group (DACS) of the University of Twente. I would like to thank my supervisor, I. Drago, for his continuing support and insights during my work.

9. REFERENCES

[1] Drago, I., Mellia, M., Munafò, M.M., Sperotto, A., Sadre, R. and Pras, A. 2012. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement, IMC '12, pages 481-494. DOI=http://dx.doi.org/10.1145/2398776.2398827
[2] Fielding, R., et al. 1999. RFC2616 - Hypertext Transfer Protocol -- HTTP/1.1. Available on http://www.ietf.org/rfc/rfc2616.txt
[3] Hu, W., Yang, T., Matthews, J.N. 2010. The good, the bad and the ugly of consumer cloud storage. ACM SIGOPS Operating Systems Review, Vol 44, Issue 3, July 2010, pages 110-115. DOI=http://dx.doi.org/10.1145/1842733.1842751
[4] Jones, M.T. 2010. Anatomy of a cloud storage infrastructure. IBM developerWorks. Available on http://www.ibm.com/developerworks/cloud/library/cl-cloudstorage/. Also available as PDF.
[5] Poese, I., Uhlig, S., Kaafar, M.A., Donnet, B., Gueye, B. 2011. IP geolocation databases: unreliable? ACM SIGCOMM Computer Communication Review, Vol 41, Issue 2, April 2011, pages 53-56. DOI=http://dx.doi.org/10.1145/1971162.1971171
[6] Spillner, J., Müller, J., Schill, A. 2012. Creating optimal cloud storage systems. Future Generation Computer Systems. 16 June 2012. DOI=http://dx.doi.org/10.1016/j.future.2012.06.004
[7] Tridgell, A., Mackerras, P. 1996. The rsync algorithm. Joint Computer Science Technical Report Series, TR-CS-96-05.
[8] Vakali, A., Pallis, G. 2003. Content delivery networks: status and trends. IEEE Internet Computing, Vol 7, Issue 6, Nov-Dec 2003, pages 68-74. DOI=http://dx.doi.org/10.1109/MIC.2003.1250586
[9] Wang, L., et al. 2010. Cloud Computing: a Perspective Study. New Generation Computing, Vol 28, Issue 2, April 2010, pages 137-146. DOI=http://dx.doi.org/10.1007/s00354-008-0081-5
[10] Zeng, W., Zhao, Y., Ou, K., Song, W. 2009. Research on cloud storage architecture and key technologies. Proceedings ICIS '09, pages 1044-1048. DOI=http://dx.doi.org/10.1145/1655925.1656114