Doctoral Thesis Proposal

Measuring and understanding performance in the mobile Internet

Arash Molavi Kakhki
College of Computer and Information Science, Northeastern University
[email protected]

March 5, 2016

Abstract

In the past decade, mobile Internet access has gone from 2G, with limited capacity and reliability, to 4G technologies with high-speed Internet access. Mobile phones have also evolved from simple voice and text devices to smartphones and tablets with ample resources, which has dramatically improved user experience and the popularity of smart devices. The combination of fast Internet access and powerful mobile devices has contributed to a rapid shift from desktop computers to smart devices for everyday online activities, and hence an ever-growing demand for fast and reliable mobile connectivity.

The increase in mobile Internet usage brings its own challenges. For example, ISPs are having a difficult time keeping up with the growing demand and have often relied more on traffic management policies and resource allocation optimization than on expanding their capacity. In order to make advancements in mobile networks, we need tools that give us more visibility into these networks, so we can understand their performance and the unique properties that differ from fixed-line networks.

In my thesis, I will provide tools, measurement techniques, and results to improve visibility into mobile network performance from two different perspectives: a) understanding mobile network operator policies and b) understanding transport layer performance as seen from mobile clients. First, I develop techniques and tools that shed light on mobile network operator policies without requiring any special access to devices or ISPs. This is important because mobile networks are not transparent about their traffic management policies, and the impact of these policies on users' experience is poorly understood, largely because mobile networks and devices are locked down and opaque; my work demonstrates that it is possible to overcome this limitation. Next, I propose to apply my differentiation detection technique to conduct a case study of a recent instance of controversial differentiation: T-Mobile's new service BingeOn. Finally, I propose to investigate Google's new experimental transport protocol, QUIC, and perform an extensive analysis of how HTTP/S traffic over QUIC compares to TCP-based protocols in a mobile network. This work can help improve QUIC to address the unique properties of the mobile environment.

Contents

1 Introduction 1
2 Identifying Traffic Differentiation in Mobile Networks 2
  2.1 Intro and background 2
  2.2 Methodology 3
  2.3 Validation 6
  2.4 Evaluation 7
  2.5 Measurement study 10
    2.5.1 Challenges in operational networks 10
  2.6 Differentiation results 12
3 T-Mobile BingeOn 13
4 QUIC 15
5 Other work 18
  5.1 Iolaus: Securing Online Content Rating Systems [1] 18
  5.2 Designing and Testing a Market for Browsing Data 18
6 Plan for completion of the research 19

1 Introduction

During the last decade, mobile Internet access has evolved from 2G mobile voice technology, with very limited capacity and reliability, to 4G technologies which provide high-speed Internet access, and 5G, with promises of "much greater throughput, much lower latency, ultra-high reliability, much higher connectivity density, and higher mobility range" [2], is on the horizon. Mobile phones have also evolved from simple voice and text devices to smartphones and tablets with high-capacity CPUs, ample memory, powerful GPUs, and large, high-resolution screens, which have dramatically improved user experience and the popularity of smart devices.

This combination of fast mobile Internet access and powerful mobile devices, along with hundreds of thousands of applications designed specifically for smart devices, is rapidly changing the proportion of mobile to fixed-line Internet traffic. More and more users are using their phones for online activities, from simple browsing and banking to more resource-hungry tasks such as gaming and media streaming, which were traditionally tied to computers connected to fixed-line networks. Cisco has recently reported an estimated 74% global mobile data traffic growth in 2015 and forecasts 30.6 exabytes per month of mobile data traffic by 2020, i.e., a five-fold increase in less than 5 years [3].

While mobile and fixed-line networks are very similar and share many networking protocols and infrastructure, this increase in mobile Internet usage brings its own challenges at various layers. To name a few: mobile Internet service providers need to invest in updating their infrastructure to accommodate the rapidly growing demand; many Internet protocols were designed decades ago with no consideration of mobile networks, do not perform well in mobile environments, and require constant tuning, or even the introduction and implementation of new protocols that fit mobile networks better; and there is a lack of transparency by mobile providers about their traffic management policies, along with limited resources for end users and policy makers to monitor ISPs' practices. The fact that mobile networks and devices are locked and opaque only adds to the complexity of these challenges.

This thesis posits that an effective way to address these challenges is to design and develop tools that can provide visibility into mobile networks without requiring any special access to user devices or ISPs. Such tools will help us understand mobile network performance, crowdsource network measurements, improve current protocols to address unique properties of the mobile environment, and provide transparency into mobile networks' traffic management policies and their effect on user experience.

In my thesis, I will provide tools, measurement techniques, and results to improve visibility into mobile network performance from two different perspectives: a) understanding mobile network operator policies and b) understanding transport layer performance as seen from mobile clients.

First, I look into traffic differentiation and content modification in mobile networks by designing and implementing a system which can detect such activities by ISPs. Traffic differentiation—giving better (or worse) performance to certain classes of Internet traffic—is a well-known but poorly understood traffic management policy. This is largely because mobile networks and devices are locked down and opaque.
In this chapter of my thesis, I present the design, implementation, and evaluation of the first system and mobile app for identifying traffic differentiation for arbitrary applications in the mobile environment (i.e., wireless networks such as cellular and WiFi, used by smartphones and tablets). The key idea is to use a VPN proxy to record and replay the network traffic generated by arbitrary applications, and compare it with the network behavior when replaying this traffic outside of an encrypted tunnel. This work also performs the first known testbed experiments with actual commercial shaping devices to validate my system design and demonstrate how it outperforms previous work for detecting differentiation.

Second, I propose to apply my differentiation techniques to conduct a case study of a recent instance of controversial differentiation: T-Mobile's new service BingeOn, which promises zero-rating for selected media streaming applications. I will investigate how T-Mobile treats different applications' network traffic. I also propose to further extend this investigation and study the effect of T-Mobile's zero-rating policies on users' quality of experience.

Third, I propose to investigate Google's new experimental transport protocol, QUIC, and perform an extensive analysis of how HTTP/S traffic over QUIC compares to TCP. This includes benchmarking QUIC's end-to-end performance in both mobile and fixed-line networks under various network conditions and comparing it head-to-head with HTTP/S over TCP, and looking into QUIC's possible shortcomings and proposing solutions to mitigate them. I will also investigate whether intermediation, i.e., a QUIC proxy, can help end-to-end performance.

Finally, the last chapter of my thesis will include two other projects that I worked on during my PhD. The first project introduces Iolaus, a system I designed and implemented for securing online content rating systems. It uses two novel techniques: (a) weighing ratings to defend against multiple identity attacks and (b) relative ratings to mitigate the effect of "bought" ratings. The second project is "Designing and Testing a Market for Browsing Data", where I prototype and evaluate an information market that provides privacy and revenue to users while preserving and sometimes improving their Web performance.

2 Identifying Traffic Differentiation in Mobile Networks

2.1 Intro and background

The rise in popularity of bandwidth-hungry applications (e.g., Netflix) coupled with resource-constrained networks (e.g., mobile providers) has reignited discussions about how different applications' network traffic is treated (or mistreated) by ISPs. One commonly discussed approach to managing scarce network resources is traffic differentiation—giving better (or worse) performance to certain classes of network traffic—using devices that selectively act on network traffic (e.g., Sandvine [4], Cisco, and others). Differentiation can enforce policies ranging from protecting the network from bandwidth-hungry applications, to limiting services that compete with those offered by the network provider (e.g., providers that sell voice/video services in addition to IP connectivity). Recent events have raised the profile of traffic differentiation and network neutrality issues in the wired setting, with Comcast and Netflix engaged in public disputes over congestion on network interconnects [5–7].
The situation is more extreme in the mobile setting, where historically regulatory agencies imposed few restrictions on how a mobile provider manages its network, with subscribers and regulators generally having a limited (or no) ability to understand and hold providers accountable for management policies. (As I discuss throughout this proposal, FCC rules that took effect in June 2015 [8] now prohibit many forms of traffic differentiation in the US.) Despite the importance of this issue, the impacts of traffic differentiation on specific applications—and the Internet as a whole—are poorly understood. Previous efforts attempt to detect traffic differentiation (e.g., [9–12]), but are limited to specific types of application traffic (e.g., peer-to-peer traffic [9]), or focus on shaping behaviors on general classes of traffic as opposed to specific applications [11, 12].

Addressing these limitations is challenging, due to the wide variety of applications (often closed-source) that users may wish to test for differentiation, coupled with the closed nature of traffic shapers. As a result, external observers struggle to detect whether such shapers exist in networks, what exactly these devices do, and what their impact is on traffic. What little we do know is concerning. Glasnost [9] and others [10, 11, 13] found ISPs throttling specific types of Internet traffic, Cogent admitted to differentiating Netflix traffic [14], and recent work by Google and T-Mobile [15] indicated that unintentional interactions between traffic shaping devices have been known to degrade performance.

This proposal tackles three key challenges that have held back prior work on measuring traffic differentiation: (1) the inability to test arbitrary classes of applications, (2) a lack of understanding of how traffic shaping devices work in practice, and (3) the limited ability to measure mobile networks (e.g., cellular and WiFi) from end-user devices such as smartphones and tablets. Designing methods that can measure traffic differentiation in mobile networks (in addition to well-studied fixed-line networks) presents unique challenges, as the approaches must work with highly variable underlying network performance and within the constraints of mobile operating systems. By addressing these challenges, we can provide tools that empower average users to identify differentiation in mobile networks, and use data gathered from these tools to understand differentiation in practice.

There is an ongoing debate about whether the network should be neutral with respect to the packets it carries [16]; i.e., not discriminate against or otherwise alter traffic except for lawful purposes (e.g., blocking illegal content). Proponents such as President Obama argue for an open Internet [17] as essential to its continued success, while some ISPs argue for shaping to manage network resources [9, 13]. The regulatory framework surrounding traffic differentiation is rapidly changing; the FCC recently introduced new rules that prohibit many forms of differentiation [8]. In this environment, it is important to monitor traffic differentiation in practice, both to inform regulators and to enforce policies.

2.2 Methodology

I use a trace record-replay methodology to reliably detect traffic differentiation for arbitrary applications in the mobile environment. I provide an overview in Figure 1. I record a packet trace from a target application (Fig. 1(a)),
extract bidirectional application-layer payloads, and use this to generate a transcript of messages for the client and server to replay (Fig. 1(b)). To test for differentiation, the client and server coordinate to replay these transcripts, both in plaintext and through an encrypted channel (using a VPN tunnel) where payloads are hidden from any shapers on the path (Fig. 1(c)). Finally, I use validated statistical tests to identify whether the application traffic being tested was subject to differentiation (Fig. 1(d)).

Figure 1: System overview: (a) Client connects to the VPN, which records traffic between the client and app server(s). (b) Parser produces transcripts for replaying. (c) Client and replay server use transcripts to replay the trace, once in plaintext and once through a VPN tunnel. The replay server records packet traces for each replay. (d) Analyzer uses the traces to detect differentiation.

a) Recording the trace

I need a way to enable end users to capture traces of mobile application traffic without requiring modifications to the OS. To address this challenge, I leverage the fact that all major mobile operating systems natively support VPN connections, and use the Meddle VPN [18] to facilitate trace recording. Figure 1(a) illustrates how a client connects to the Meddle VPN, which relays traffic between the client and destination server(s). In addition, the Meddle VPN collects packet traces that I use to generate replay transcripts. When possible, I record traffic using the same network being tested for differentiation, because applications may behave differently based on the network type; e.g., a streaming video app may select a higher bitrate on WiFi.

My methodology is extensible to arbitrary applications—even closed-source and proprietary ones—to ensure it works even as the landscape of popular applications, the network protocols they use, and their impact on mobile network resources changes over time. Prior work focused on specific applications (e.g., BitTorrent [9]) and manually emulated application flows. This approach, however, does not scale to large numbers of applications and may even be infeasible when the target application uses proprietary, or otherwise closed, protocols. (The Glasnost paper describes an unevaluated tool to support replaying of arbitrary applications, but I was unsuccessful in using it, even with the help of a Glasnost coauthor.)

b) Creating the replay script

After recording a packet trace, my system processes it to extract the application-layer payloads as client-to-server and server-to-client byte streams. I then create a replay transcript that captures the behavior of the application's traffic, including port numbers, dependencies between application-layer messages (if they exist), and any timing properties of the application (e.g., fixed-rate multimedia). (This approach is similar to Cui et al.'s [19]; however, my approach perfectly preserves application payloads, to ensure traffic is classified by shapers, and timing, to prevent false positives when detecting shaping.) Finally, I use each endpoint's TCP or UDP stack to replay the traffic as specified in the transcript. Figure 1(b) shows how I create two objects with the necessary information for the client and server to replay the traffic from the recorded trace.

Logical dependencies: For TCP traffic, I preserve application-layer dependencies using the implicit happens-before relationship exposed by application-layer communication in TCP. More specifically, I extract two unidirectional byte streams s_AB and s_BA for each pair of communicating hosts A and B in the recorded trace. For each sequence of bytes in s_AB, I identify the bytes in s_BA that preceded them in the trace. When replaying, host A sends bytes in s_AB to host B only after it has received the preceding bytes in s_BA from host B. I enforce analogous constraints from B to A.
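To make this concrete, the following is a minimal sketch of how a replay endpoint could enforce the happens-before constraint; the data structures and names are hypothetical simplifications of the description above, not the actual implementation:

# Hypothetical sketch: enforcing the happens-before constraint for a TCP replay.
# Each transcript entry holds a payload from s_AB and the number of bytes of
# s_BA that must have been received before that payload may be sent.
from dataclasses import dataclass

@dataclass
class SendEntry:
    payload: bytes        # next chunk of s_AB to transmit
    wait_for_bytes: int   # bytes of s_BA that preceded this chunk in the recorded trace

def replay_endpoint(sock, transcript):
    received = 0
    for entry in transcript:
        # Block until all preceding bytes of the peer's stream have arrived.
        while received < entry.wait_for_bytes:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed before dependency was satisfied")
            received += len(chunk)
        sock.sendall(entry.payload)

The other endpoint runs the same loop over its own transcript, which yields the analogous constraint from B to A.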
For UDP traffic, I do not enforce logical dependencies because the transport layer does not explicitly impose them.

Timing dependencies: A second challenge is ensuring the replayed traffic maintains the inter-message timing features of the initial trace. For example, streaming video apps commonly download and buffer video content in chunks instead of buffering the entire video, typically to reduce data consumption in case of viewer abandonment and to save energy by allowing the radio to sleep between bursts. Other content servers use packet pacing to minimize packet loss and thus improve flow completion times. My system preserves the timings between packets for both TCP and UDP traffic to capture such behavior when replaying (this feature can be disabled when timing is not intrinsic to the application). Preserving timing is a key feature of my approach that can have a significant impact on detecting differentiation; in short, it prevents us from detecting shaping that does not impact application-perceived performance. Specifically, for each stream of bytes (UDP or TCP) sent by a host A, I annotate each byte with its time offset from the first byte in the stream in the recorded trace. If the i-th byte from A was sent at t ms from the start of the trace, I ensure that byte i is not sent before t ms have elapsed in the replay. For TCP, this constraint is enforced after enforcing the happens-before relationship.

In the case of UDP, I do not retransmit lost packets. This closely matches the behavior of real-time applications such as VoIP and live-streaming video (which often tolerate packet losses), but does not work well for more complicated protocols such as BitTorrent's µTP [20] or Google's QUIC [21]. I will investigate how to incorporate this behavior in future work.

c) Replaying the trace

After steps 1 and 2, I replay the recorded traffic on a target network to test my null hypothesis that there is no differentiation in the network. This test requires two samples of network performance: one that exposes the replay to differentiation (if any), and a control that is not exposed to differentiation. A key challenge is how to conduct the control trial. Initially, I followed prior work [9] and randomized the port numbers and payload, while maintaining all other features of the replayed traffic. However, using my testbed containing commercial traffic shapers, I found that some shapers will by default label "random" traffic with high port numbers as peer-to-peer traffic, a common target of differentiation. Thus, this approach is unreliable for generating control trials, because one can reasonably expect it to be shaped. Instead, I re-use the Meddle VPN tunnel to conduct control trials. By sending all recorded traffic over an encrypted IPSec tunnel, I preserve all application behavior while simultaneously preventing any DPI-based shaper from differentiating based on payload. Thus, each replay test consists of replaying the trace twice, once in plaintext (exposed trial) and once over the VPN (control trial), as depicted in Figure 1(c).
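The timing constraint described above amounts to send-side pacing. The sketch below illustrates the idea with a hypothetical (offset, payload) transcript format; in the real system, for TCP this pacing is applied after the happens-before check:

# Hypothetical sketch: pacing replayed sends to preserve recorded inter-packet timing.
import time

def paced_send(sock, timed_transcript):
    """timed_transcript: list of (offset_seconds, payload) pairs, where offset_seconds
    is the payload's offset from the first byte of the stream in the recorded trace."""
    start = time.monotonic()
    for offset, payload in timed_transcript:
        # Never send a byte earlier than it was sent in the recorded trace.
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)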
To detect differentiation in noisy environments, I run multiple back-to-back tests (both control and exposed). Note that I compare exposed and control trials with each other, not with the original recorded trace.

d) Detecting differentiation

After running the exposed and control trials, I compare performance metrics (throughput, loss, and delay) to detect differentiation (Fig. 1(d)). I focus on these metrics because traffic shaping policies typically involve bandwidth rate-limiting (reflected in throughput differences), and may have impacts on delay and loss depending on the queuing/shaping discipline. I base my analysis on server-side packet captures to compute throughput and RTT values for TCP, because I cannot rely on collecting network-level traces on mobile clients. For UDP applications, I use jitter as a delay metric and measure it at the client at the application layer (observing inter-packet timings).

A key challenge is how to automatically detect when differences between two traces are caused by differentiation, instead of signal strength, congestion, or other confounding factors. In Section 2.4, I find that previous techniques to identify differentiation are inaccurate when tested against commercial packet shaping devices with varying packet loss in my testbed. I describe a novel area test approach to compare traces that has perfect accuracy with no loss, and greater than 70% accuracy under high loss (Fig. 4(b)).

2.3 Validation

I validate my methodology using my replay system and a testbed containing two commercial shaping devices. First, I verify that my approach captures salient features of the recorded traffic. Next, I validate that my replays trigger differentiation and identify the relevant features used for classification.

Record/Replay similarity My replay script replicates the original trace, including payloads, port numbers, and inter-packet timing. I now show that the replay traffic characteristics are essentially identical to the recorded traffic. Figure 2 shows the results for a TCP application (YouTube (a)) and a UDP application (Skype (b)). As discussed above, preserving inter-packet timings is important to produce a replay that closely resembles the original trace (otherwise, I may claim that differentiation exists when the application would never experience it).

Figure 2: Plots showing cumulative bytes transferred over time, with and without preserving packet timings, for TCP (YouTube) and UDP (Skype) applications. By preserving inter-packet timings, my replay closely resembles the original traffic generated by each app.

Traffic shapers detect replayed traffic Our group acquired traffic shaping products from two different vendors and integrated them into a testbed for validating whether replays trigger differentiation. The testbed consists of a client connected to a router that sits behind a traffic shaper, which exchanges traffic with a gateway server that I control. The gateway presents the illusion (to the packet shaper) that it routes traffic to and from the public Internet. I configure the replay server to listen on arbitrary IP addresses on the gateway, giving us the ability to preserve original server IP addresses in replays (by spoofing them inside my testbed).
I describe below some of my key findings from this testbed. I verified that for a variety of popular mobile applications, including YouTube, Netflix, Skype, Spotify, and Google Hangouts, replay traffic is classified as the application that was recorded.

Figure 3: Regions considered for determining detection accuracy, defined by the shaping rate relative to the replay transfer rate: Region 1 (below the application's average throughput), Region 2 (between average and max), and Region 3 (above max). If the shaping rate is less than the application's average throughput, I should always detect differentiation. Likewise, if the shaping rate is greater than peak throughput, I should never detect differentiation. In the middle region, it is possible to detect differentiation, but the performance impact depends on the application.

2.4 Evaluation

I present the first study that uses ground-truth information to compare the effectiveness of various techniques for detecting differentiation, and propose a new test to address limitations of prior work. I find that previous approaches for detecting differentiation are inaccurate when tested against ground-truth data, and characterize under what circumstances these approaches yield correct information about traffic shaping as perceived by applications.

When determining the accuracy of detection, I separately consider three scenarios regarding the shaping rate and a given trace (shown in Figure 3):

• Region 1. If the shaping rate is less than the average throughput of the recorded traffic, a traffic differentiation detector should always identify differentiation because the shaper is guaranteed to affect the time it takes to transfer data in the trace.

• Region 3. If the shaping rate is greater than the peak throughput of the recorded trace, a differentiation detector should never identify differentiation because the shaper will not impact the application's throughput (as discussed earlier, I focus on shaping that will actually impact the application).

• Region 2. Finally, if the shaping rate is between the average and peak throughput of the recorded trace, the application may achieve the same average throughput but a lower peak throughput. In this case, it is possible to detect differentiation, but the impact of this differentiation will depend on the application. For example, a non-interactive download (e.g., file transfers for app updates) may be sensitive only to changes in average throughput (but not peak transfer rates), while a real-time app such as video conferencing may experience lower QoE for lower peak rates.

Given these cases, there is no single definition of accuracy that covers all applications in region 2. Instead, I show that I can configure detection techniques to consistently detect differentiation or consistently not detect differentiation in this region, depending on the application. I explore approaches used in Glasnost [9] and NetPolice [10], and propose a new technique that has high accuracy in regions 1 and 3, and reliably does not detect differentiation in region 2.

1-Glasnost: Maximum throughput test. Glasnost [9] does not preserve the inter-packet timing when replaying traffic, and identifies differentiation if the maximum throughput for control and exposed flows differs by more than a threshold. I expect this will always detect differentiation in region 1, but might generate false positives in regions 2 and 3 (because Glasnost may send traffic at a higher rate than the recorded trace).

2-NetPolice: Two-sample KS Test.
As discussed in NetPolice [10], looking at a single summary statistic (e.g., the maximum) to detect differentiation can be misleading. The issue is that two samples of a metric (e.g., throughput) may have the same mean, median, or maximum, but vastly different distributions; a differentiation detection approach should be robust to this. To address this, NetPolice uses the two-sample Kolmogorov–Smirnov (KS) test, which compares two empirical distribution functions (i.e., empirical CDFs) using the maximum distance between the two empirical CDF sample sets, for a confidence level α. To validate the KS Test result, NetPolice uses a resampling method as follows: randomly select half of the samples from each of the two original input distributions, apply the KS Test on the two sample subsets, and repeat this r times. If the results of more than β% of the r tests agree with the original test, they conclude that the original KS Test statistic is valid.

3-My approach: Area Test. The KS Test only considers the difference between distributions along the y-axis, even if the difference along the x-axis (in this case, throughput) is small. In my experiments, I found this makes the test very sensitive to small changes in performance not due to differentiation, which can lead to inaccurate results or to labeling valid tests as invalid. To address this, I propose an Area Test that accounts for the degree of differentiation detected: I find the area, a, between the two CDF curves and normalize it by the minimum of the peak throughputs of the two distributions, w. This test concludes there is differentiation if the KS Test detects differentiation and the normalized area between the curves is greater than a threshold t.

I next evaluate the three differentiation tests in my lab testbed, which consists of a commercial traffic shaper, a replay client, and a replay server. I vary the shaping rate using the commercial device to shape each application to throughput values in regions 1–3, and emulate noisy packet loss in my tests using the Linux Traffic Control (tc) and Network Emulation (netem) tools to add bursty packet loss according to the Gilbert-Elliott (GE) model [22]. The criteria under which I evaluate the three differentiation tests are as follows:

• Overall accuracy: Using the taxonomy in Figure 3, a statistical test should always detect differentiation in region 1, and never detect differentiation in region 3. I define accuracy as the fraction of samples for which the test correctly identifies the presence or absence of differentiation in these regions.

• Resilience to noise: Differences between two empirical CDFs could be explained by a variety of reasons, including random noise [11]. I need a test that retains high accuracy in the face of the large latencies and packet loss that occur in the mobile environment. When evaluating and calibrating my statistical tests, I take this into account by simulating packet loss.

• Data consumption: To account for network variations over time, I run multiple iterations of control and exposed trials back to back. I merge all control trials into a single distribution, and similarly merge all exposed trials. As I show below, this can improve accuracy compared to running a single trial, but at the cost of longer/more expensive tests. When picking a statistical test, I want the one that yields the highest accuracy with the smallest data consumption.
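As a concrete illustration of the decision rule, the sketch below computes the two-sample KS test (via SciPy) and the normalized area between the two empirical throughput CDFs; the significance level and threshold values here are placeholders, not the calibrated values used in my system:

# Sketch of the Area Test: KS test plus normalized area between empirical CDFs.
# alpha and t are illustrative placeholders.
import numpy as np
from scipy.stats import ks_2samp

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, grid, side="right") / len(sample)

def area_test(exposed_tput, control_tput, alpha=0.05, t=0.1):
    ks_stat, p_value = ks_2samp(exposed_tput, control_tput)
    ks_detects = p_value < alpha  # KS test rejects "no differentiation"

    grid = np.union1d(exposed_tput, control_tput)
    a = np.trapz(np.abs(ecdf(exposed_tput, grid) - ecdf(control_tput, grid)), grid)
    w = min(np.max(exposed_tput), np.max(control_tput))  # min of peak throughputs
    normalized_area = a / w

    return ks_detects and normalized_area > t

Normalizing by the smaller of the two peak throughputs makes the area comparable across applications with very different absolute rates; the threshold t is what is calibrated against the testbed measurements described below.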
Below are detailed results of my evaluation.

             KS Test        Area Test      Glasnost
App          R1     R3      R1     R3      R1     R3
Netflix      100    100     100    100     100    0
YouTube      100    100     100    100     100    0
Hangout      100    100     100    100     100    0
Skype        100    100     100    100     90     0

Table 1: Shaping detection accuracy for different apps, in regions 1 and 3 (denoted by R1 and R3). I find that the KS Test (NetPolice) and Area Test have similarly high accuracy in both regions (1 and 3), but Glasnost performs poorly in region 3.

             Area Test    KS Test    Glasnost
Netflix      28           65         100
YouTube      0            67         100
Hangout      0            40         100
Skype        10           55         92

Table 2: Percent of tests identified as differentiation in region 2 for the three detection methods. Glasnost consistently identifies region 2 as differentiation, whereas the Area Test is more likely to not detect differentiation.

Summary results for low noise. I first consider the accuracy of detecting differentiation under low-loss scenarios. Table 1 presents the accuracy results for four popular apps in regions 1 and 3. I find that the KS Test and Area Test have similarly high accuracy in both regions 1 and 3, but Glasnost performs poorly in region 3 because it will detect differentiation even if the shaping rate is above the maximum. I explore region 2 behavior later in this section; until then I focus on accuracy results for regions 1 and 3 and the two KS-based tests (as they do not identify differentiation in region 3).

Impact of noise. Properties of the network unrelated to shaping (e.g., congestion-induced packet loss) may impact performance and cause false positives for differentiation detectors. In Figure 4 I examine the impact of loss on accuracy for one TCP application (Netflix) and one UDP application (Skype). I find that both variants of the KS Test have high accuracy in the face of congestion, but the Area Test is more resilient to moderate and high loss for Skype.

Impact of number of replays. One way to account for noise is to use additional replays to average out variations from transient noise in a single replay. To evaluate the impact of additional replays, I varied the number of replays combined to detect differentiation under high loss and plotted the detection accuracy in Fig. 4(b). (Additional replays did not help when there is low loss.) In the case of moderate or high loss, increasing the number of replays improves accuracy, particularly for a short YouTube trace. In all cases, the Area Test is more accurate than (or equal to) the KS Test. Importantly, the accuracy does not significantly improve past 2 or 3 replays. In my deployment, I use 2 tests (to minimize data consumption).

Figure 4: Area Test performance under loss: (a) accuracy against loss (low, moderate, and high loss) and (b) the effect of combining replays on accuracy under high loss.

Region 2 detection results. Recall that my goal is to identify shaping that impinges on the network resources required by an application. However, in region 2, shaping may or may not impact the application's performance, which makes identifying shaping with actual application impact challenging. I report detection results for the three tests in region 2 in Table 2. I find that Glasnost consistently identifies differentiation, the Area Test consistently does not detect differentiation, and the KS Test is inconsistent.
When testing an application that is sensitive to shaping of its peak throughput, the Glasnost method may be preferable, whereas the Area Test is preferred when testing applications that can tolerate shaping above their average throughput. When detecting differentiation, I conservatively avoid the potential for false positives and thus do not report differentiation for region 2. I therefore use the Area Test in my implementation, because it will more reliably achieve this result (Table 2). Determining which apps are affected by region 2 differentiation is a topic of future work.

2.5 Measurement study

I then use my system to identify cases of differentiation through a measurement study of production networks. For each network, I focus on detection for a set of popular apps that I expect might be subject to differentiation: streaming audio and video (high bandwidth) and voice/video calling (competes with the carrier's line of business). The data from these experiments was initially collected between January and May 2015. I then repeated the experiments in August 2015, after the new FCC rules prohibiting differentiation took effect.

2.5.1 Challenges in operational networks

In this section I discuss several challenges that I encountered when attempting to identify differentiation in mobile networks. To the best of my knowledge, I am the first to identify these issues for detecting differentiation and to discuss workarounds that address them.

Per-client management. To replay traffic for multiple simultaneous users, my replay server needs to be able to map each received packet to exactly one app replay. A simple solution is to put this information in the first few bytes of each flow between client and server. Unfortunately, as shown earlier, this can disrupt classification and lead to false negatives. Instead, I use side-channel connections, out of band from the replay, to supply information regarding each replay run.

NAT behavior. I also use the side-channel to identify which clients will connect from which IPs and ports. For networks without NAT devices, this works well. However, many mobile networks use NAT, meaning I cannot use the side-channel to identify the IP and port that a replay connection will use (because they may be modified by the NAT). For such networks, each replay server can reliably support only one active client per ISP and application. While this has not been an issue in my initial app release, I am investigating other workarounds. Moreover, to scale the system, I can use multiple replay servers and a load balancer. The load balancer can be configured to assign clients with the same IP address to different servers.

"Translucent" HTTP proxies. It is well known that ISPs use transparent proxies on Web traffic [23]. My system works as intended if a proxy simply relays exactly the same traffic sent by a replay client. In practice, I found examples of "translucent" proxies that issue the same HTTP requests as the client but change connection keep-alive behavior; e.g., the client uses a persistent connection in the recorded trace and the proxy does not. To handle this behavior, my system must be able to map HTTP requests from unexpected connections to the same replay. Another challenge with HTTP proxies is that the proxy may have a different public IP address from other non-HTTP traffic (including the side-channel), which means the server will not be able to map the HTTP connections to the client based on IP address. In these cases the server replies to the HTTP request with a special message.
When the client receives this message, it restarts the replay. This time the client adds a custom header (X-) with the client's ID to HTTP requests, so the server can match HTTP connections with different IP addresses to the corresponding clients. The server then ignores this extra header for the remainder of the replay. I have also observed that some mobile providers exhibit such behavior for other TCP and UDP ports, meaning I cannot rely on an X- header to convey side-channel information. In these cases I can identify users based on the ISP's IP subnet, which means I can only support one active client per replay server and ISP, regardless of the application they are replaying.

Content-modifying proxies and transcoders. Recent reports and previous work highlight cases of devices in ISPs that modify customer traffic for reasons such as tracking [24], caching [23], security [25], and reducing bandwidth [26]. I also saw several cases of content-modifying proxies. For example, I saw Sprint modifying HTTP headers, Verizon inserting global tracking cookies, and Boost transcoding YouTube videos. In my replay methodology, I assumed that packets exchanged between the client and server would not be modified in flight, and I trivially detect modification because packet payloads do not match recorded traces. In the case of header manipulation, I use an edit-distance-based approach to match modified content to the recorded content it corresponds to. While my system can tolerate moderate amounts of modification (e.g., HTTP header manipulation/substitution), if the content is modified drastically, e.g., transcoding an image to reduce its size, it is difficult to detect shaping because the data from my control and exposed trials do not match. In my current system, I simply notify the user that there is content modification but do not attempt to identify shaping. I leave a more detailed analysis of this behavior to future work.

Caching. My replay system assumes content is served from the replay server and not from an ISP's cache. I detect the latter case by identifying cases where the client receives data that was not sent by the server. In the few cases where I observed this behavior, the ISPs were caching small static objects (e.g., thumbnail images) which were not the dominant portion of the replay traffic (e.g., when streaming audio) and had negligible effect on my statistical analysis. (Boost was one exception, which I discuss later.) Going forward, I expect this behavior to be less prominent due to the increased use of encrypted protocols, e.g., HTTP/2 and QUIC. In such cases, an ISP may detect and shape application traffic using SNI for classification, but it cannot modify or cache content.

ISP: Verizon, T-Mobile, AT&T, Sprint, Boost, BlackWireless, H2O, SimpleMobile, NET10
YT: m, f, m/p, m, 60%, 37%*, 36%, p
NF: m, f, m/p, m, 45%*, p
SF: m, f, m/p, m, 65%*, p
SK: -   VB: -   HO: -

Table 3: Shaping detection results per ISP in my dataset, for six popular apps: YouTube (YT), Netflix (NF), Spotify (SF), Skype (SK), Viber (VB), and Google Hangout (HO). When shaping occurs, the table shows the difference in average throughput (%) I detected. A dash (-) indicates no differentiation, (f) means IP addresses changed for each connection, (p) means a "translucent" proxy changed connection behavior from the original app behavior, and (m) indicates that a middlebox modified content in flight between client and server.
*For the H2O network, replays with random payload have better performance than VPN and exposed replays, indicating a policy that favors non-video HTTP over VPN and streaming video.

2.6 Differentiation results

In this section, I present results from running my Differentiation Detector Android app on popular mobile networks. This dataset consists of 4,786 replay tests, covering traces from six popular apps. I collected test results from most major cellular providers and MVNOs in the US; further, I gathered measurements from four international networks. My app supports both VPN traffic and random payloads (but not random ports) as control traffic. I use the latter only if a device does not support VPN connectivity or the cellular provider blocks or differentiates against VPN traffic. After collecting data, I run my analysis to detect differentiation; the results for US carriers are presented in Table 3. In other networks such as Idea (India), JazzTel (Spain), Three (UK), and T-Mobile (Netherlands), my results, based on a subset of the traces (the traces that users selected), indicated no differentiation.

My key finding is that my approach successfully detects differentiation in three mobile networks (BlackWireless, H2O, and SimpleMobile), with the impact of shaping resulting in up to a 65% difference in average throughput. These shaping policies all apply to YouTube (not surprising given its impact on networks), but not always to Netflix and Spotify. I did not identify differentiation for UDP traffic in any of the carriers. Interestingly, H2O consistently gives better performance to port 80 traffic with random payloads, indicating a policy that gives relatively worse performance to VPN traffic and streaming audio/video. I tested these networks again in August 2015 and did not observe such differentiation. I speculate that these networks ceased their shaping practices in part due to the new FCC rules barring this behavior, effective in June 2015. I contacted the ISPs I tested for comment on the behavior I observed in their networks. At the time of publication, only one ISP had responded (Sprint) and did not wish to comment.

While attempting to detect differentiation, I identified other interesting behavior that prevented us from identifying shaping. For example, Boost transcodes YouTube video to a fraction of the original quality, and then caches the result for future requests. Other networks use proxies to change TCP behavior. I found such devices to be pervasive in mobile networks.

Figure 5: Average throughput against RTT for (a) Netflix (participating in BingeOn) and (b) YouTube (not participating in BingeOn).

3 T-Mobile BingeOn

The first proposed component of my thesis is to apply my differentiation detection technique to conduct a case study of a recent instance of controversial differentiation: T-Mobile's new service. On November 15th, 2015, T-Mobile USA announced this new service, Binge On, which allows customers with qualifying plans to stream video from a selected number of content providers for free [27]. T-Mobile claims that Binge On works by detecting video streaming and optimizing the video quality for smartphone screens, which will provide a DVD-quality experience (typically 480p or better) [27]. The service has been advertised as a win for customers, as video streaming is the most data-consuming activity users perform on their phones and can quickly burn through their monthly data quota.
On the other hand, many network neutrality advocates argue that T-Mobile is not clear about how exactly Binge On detects and optimizes users' video traffic, and that its practice could be violating the FCC's newly imposed Open Internet order. T-Mobile has also been criticized for enabling Binge On by default for all users. T-Mobile gives users the option to opt out, but many argue that Binge On should be disabled by default and users should decide if they want to opt in. The Electronic Frontier Foundation (EFF) recently published a post [28] where they download video streams and non-video files using T-Mobile's network with Binge On enabled and show that Binge On does not optimize videos, but simply throttles download rates to 1.5Mbps and relies on the content providers' adaptive bitrate to fall back to a lower quality. EFF also reports that T-Mobile throttles video streams from non-participating providers (e.g., YouTube) as well, meaning users are stuck with 1.5Mbps and lower quality, while they still have to pay for the data.

My differentiation detector system, which I described in Section 2, gives us the unique ability to replay any application's traffic, including video streaming, through T-Mobile's network and investigate Binge On in greater detail. The advantage of my approach compared to others like EFF's is that I control the server side as well and can gather much more information about the connection. It also allows us to make arbitrary modifications to the traffic from both ends of the communication to do things like reverse engineering T-Mobile's detection schemes. This system works for any application and there is little overhead in preparing replays.

Through my preliminary tests using my differentiation detector system on T-Mobile's network, I can confirm EFF's finding that T-Mobile throttles all video streaming, regardless of whether it participates in Binge On. According to my results, the exact rate limit imposed by Binge On is 1.6Mbps. I have also found that, unlike what T-Mobile claims [27], Binge On's zero-rating does not apply to all tethered traffic: only data sent over USB tethering is zero-rated, and data sent through the device's WiFi hotspot is counted against users' quota.

I used my differentiation detector app to replay traffic belonging to a video streaming application that is part of Binge On (Netflix) and a non-participating application (YouTube). The app can replay traffic in three modes: 1) in the clear (NoVPN), which allows T-Mobile to detect the application being replayed, 2) through a VPN tunnel (VPN), and 3) with randomized payload (Random). The VPN and Random modes evade traffic classification by T-Mobile (or result in wrong classification) and serve as baselines. Figure 5 shows average throughput against average round-trip time for Netflix and YouTube, and includes data points for Binge On enabled/disabled and un-tethered/tethered (WiFi hotspot) traffic. While un-tethered Binge On traffic is capped at 1.6Mbps for NoVPN replays for both YouTube and Netflix, Random and VPN traffic are able to evade T-Mobile's detection and achieve higher rates. I also found that when traffic is throttled, the number of lost packets is significantly higher, suggesting that T-Mobile's throttling is simple policing, where packets are not queued (or queues are very small) and are dropped when traffic goes above the target rate.

For this project I am proposing to investigate the following:

• Fully characterize Binge On's behavior using my differentiation detector system.
I will be testing at least three video services participating in Binge On and three non-participating apps. The goal is to investigate whether T-Mobile is doing pure shaping/policing, or whether other mechanisms or "optimizations" are in place. For this I will be looking at many traffic traces of replays through T-Mobile's network with and without Binge On, and comparing statistics such as throughput, RTT, and retransmission rate to determine the network policies.

• Understand the implementation and accuracy of T-Mobile's data-consumption accounting under Binge On, and how non-participating applications' traffic is treated and counted. According to my initial findings, I expect to see T-Mobile counting traffic belonging to non-participating services against users' quota while still throttling it, which is a clear violation of the FCC Open Internet order.

• Investigate and quantify the effect of Binge On's policies on users' experience. To achieve this goal, I will implement a tool to stream video against existing app servers and log streaming statistics on the quality of experience, including time for the video to start playing, buffering time, video quality, amount of video loaded in a certain time, and number of rebuffers.

• Reverse engineer T-Mobile's detection mechanism and explore ways to exploit the classifier so that non-participating traffic is detected as zero-rated traffic. Reverse engineering the classifier requires exhaustive tests where I make incremental modifications to traffic and observe the effect. While I do not have access to T-Mobile's classifiers, I can infer their classification result by observing how the traffic is treated. For example, looking at the average throughput, if it is at Binge On's limit, i.e., 1.6Mbps, and the data usage does not go up, I can infer that the traffic has been classified as video streaming. Once I reverse engineer the classification, I will build a tool to masquerade any traffic as Binge On participating traffic, and hence have arbitrary traffic zero-rated. There are a number of ways to achieve this. One way is to deploy two simple proxies: one running locally on the user's machine to modify the traffic to exploit T-Mobile's classification, and one outside T-Mobile's network to reverse the changes applied by the first proxy before forwarding the traffic to the final destination.

4 QUIC

The second proposed component of my thesis is to investigate Google's new experimental transport protocol. In 2013, Google introduced a new UDP-based experimental transport protocol called QUIC, "aiming to solve a number of transport and application layer problems experienced by modern web applications, while requiring little or no change from application writers" [29]. Google has been developing the protocol ever since and working towards its standardization (Internet drafts were recently published [30, 31]). QUIC is now available on all Google services and in the Google Chrome browser; more than 50% of requests from the Chrome browser to Google servers are served with QUIC, and Google is continuing to ramp up QUIC traffic [32]. So far QUIC has been used only for web traffic and can be thought of as SPDY (or HTTP/2) on top of UDP. As illustrated in Figure 6, QUIC does not follow the traditional network layering and sits between the existing transport and application layers. This is mostly because QUIC is implemented on top of UDP. Unlike TCP, UDP does not provide guarantees such as reliability and acknowledgments, so QUIC needs to implement these guarantees itself, hence spanning multiple layers.

Figure 6: QUIC spanning multiple layers (figure taken from [33]).
Google, whose servers and products are the sole clients of QUIC at scale, has reported that its data shows positive results so far and that QUIC has been delivering real performance improvements over TCP. Google claims a 3% improvement in mean page load time with QUIC on Google Search, a full second shaved off the Google Search page load time for the slowest 1% of connections, and 30% fewer YouTube rebuffers when watching videos over QUIC [32]. Google attributes these performance boosts to QUIC's lower-latency connection establishment (0-RTT), reduced head-of-line blocking, improved congestion control, and better loss recovery.

Google points out many reasons for why it is developing QUIC and lists its benefits as: 0-RTT connection establishment, reduced head-of-line blocking, Forward Error Correction (FEC), reduced bandwidth consumption and packet count, privacy assurances comparable to TLS, better congestion control, and better loss recovery. However, the two main reasons behind Google's push for QUIC seem to be the following:

1. TCP is part of the operating system stack, which makes experimenting with and changing it challenging. While modifications to TCP are plausible, they will not be widely deployed until the kernel changes are in place, which can take decades to propagate. On the other hand, QUIC is a self-contained protocol, which allows modification and innovation without TCP's limitations [34].

2. QUIC is fully encrypted, including the protocol headers, which makes meddling by middleboxes almost impossible. This is of particular interest to content providers such as Google, who frequently suffer from performance degradation due to middlebox interference or misconfiguration. As a result, Google tries to encrypt as much of the payload and control structure as feasible [34].

While Google's reports are very promising and indicate QUIC's great benefits and potential, it is still an experimental protocol and needs to go through extensive benchmarking before turning into a standard. However, it has been only a couple of years since QUIC's inception, and there are not many studies of QUIC's performance. This work aims to fill this gap and provide a comprehensive study of the protocol, with more focus on performance in mobile networks. Below are the main questions I will be focusing on throughout this study:

• Benchmarking QUIC under different network conditions. I will be using network emulators and tools including Linux's Traffic Control [35], Network Emulation [36], and Dummynet [37] to understand QUIC's performance across a wide range of bandwidth limitations, delays, packet losses and reordering, and queue configurations.

• Head-to-head comparison against TCP to shed light on when QUIC outperforms TCP. For this I will be running back-to-back tests where a web page is downloaded using QUIC and then TCP, using different devices and in different network conditions, and I will investigate metrics including page load time (PLT) and CPU and memory usage. For every scenario, multiple back-to-back experiments are run to account for transient network variations such as network congestion. My initial results show that QUIC's performance depends on network conditions and that, depending on the circumstances, it can perform better or worse than HTTPS. Figure 7 shows results for a head-to-head comparison of PLTs for SPDY over QUIC vs. TCP for different bandwidths, loss rates, and object sizes.
As the figure suggests, when loss is low, QUIC performs worse than HTTPS, especially for larger objects and high bandwidth. Upon investigating QUIC's source code, I found that this is because QUIC has hard-coded caps on congestion control parameters, due to limitations such as Linux UDP buffer sizes. This results in QUIC not achieving more than 20Mbps and underperforming when the available bandwidth is higher. However, as loss increases, QUIC seems to handle loss better and outperforms TCP.

• Understanding the effects of QUIC's implementation in user space. QUIC is implemented in user space, hence resource contention could be a concern, especially on mobile devices. I will deploy QUIC on mobile devices to understand the performance in this environment. This includes using Cronet, the networking stack of Chromium packaged as a library for use on mobile [38], to fetch content using QUIC, and also using Selenium to run automated tests with a real browser on a mobile device to evaluate the end-to-end performance observed by the end user.

• Instrumenting QUIC to study the effect of its various transport optimizations. QUIC integrates a few optimizations: Proportional Rate Reduction (PRR) in its congestion control mechanisms, Tail Loss Probe and packet pacing to deal with loss, and classic TCP algorithms such as Hybrid Slow Start and Forward RTO. However, it is not clear if these optimizations work as expected, or even whether they conflict with each other and cause harm. To investigate these optimizations, I will compile the QUIC client and server from source and instrument them to get access to the inner workings of the protocol. For example, my initial results show that QUIC may suffer greatly when packet reordering happens, because reordered packets cause a high false positive rate in QUIC's loss detection mechanism, which results in the connection getting stuck in the PRR state.

• Investigating whether middlebox solutions can help boost QUIC's end-to-end performance for end users. Many ISPs are against QUIC's full encryption, which renders their middleboxes useless. They argue that if QUIC made part of the control traffic visible, ISPs could take measures to improve performance for their customers. Google claims that if it is presented with evidence that ISPs can actually help QUIC, it will consider changing the protocol accordingly. However, it is not clear whether the ISPs are correct, and if so, under what circumstances and exactly what information they need to help QUIC. To answer these questions, I will implement a QUIC proxy and experiment with how a node in the middle with visibility into the protocol can help.

• Crowdsource QUIC measurements. In order to collect data on QUIC's performance and how it compares to TCP from many networks and locations, I will integrate some of the tests mentioned above into Mobilyzer [39], which provides a scalable, efficient, and controllable open platform for network measurement from mobile devices.

Figure 7: SPDY over QUIC vs. TCP in the presence of loss.

5 Other work

5.1 Iolaus: Securing Online Content Rating Systems [1]

Many content providers, such as YouTube and Yelp, provide suggestions by leveraging the aggregated ratings of their users. However, the popularity and economic incentives of these sites are leading to increasing levels of malicious activity, including multiple identity (Sybil) attacks and the "buying" of ratings from users.
5 Other work

5.1 Iolaus: Securing Online Content Rating Systems [1]

Content providers such as YouTube and Yelp generally provide suggestions by leveraging the aggregated ratings of their users. However, the popularity and economic incentives of these sites are leading to increasing levels of malicious activity, including multiple-identity (Sybil) attacks and the "buying" of ratings from users. In this work I present Iolaus [1], a system that leverages the underlying social network of online content rating systems to defend against such attacks. Iolaus uses two novel techniques: (a) weighing ratings to defend against multiple-identity attacks and (b) relative ratings to mitigate the effect of "bought" ratings. An evaluation of Iolaus using microbenchmarks, synthetic data, and real-world content rating data demonstrates that Iolaus outperforms existing approaches and serves as a practical defense against multiple-identity and rating-buying attacks.

5.2 Designing and Testing a Market for Browsing Data

Browsing privacy solutions face an uphill battle to deployment. Many operate counter to the economic objectives of popular online services (e.g., by completely blocking ads) and do not provide enough incentive for users, who may suffer performance degradation when deploying them. In this work, I take a step towards realizing a system for online privacy that is mutually beneficial to users and online advertisers: an information market. This system not only maintains economic viability for online services but also provides users with financial compensation to encourage participation. I prototype and evaluate an information market that provides privacy and revenue to users while preserving, and sometimes improving, their Web performance. I evaluate the feasibility of the market via a one-month field study with 63 users and find that users are indeed willing to sell their browsing information. I also use Web traces of millions of users to drive a simulation study that evaluates the system at scale. I find that the system can be profitable to both users and online advertisers.

6 Plan for completion of the research

Table 4 presents my plan for completion of the research.

Timeline               Work                                                        Progress
WWW 2013 (published)   Iolaus: Securing Online Content Rating Systems              Completed
IMC 2015 (published)   Identifying Traffic Differentiation in Mobile Networks      Completed
PETS 2016 (submitted)  Designing and Testing a Market for Browsing Data            Completed
Spring 2016            T-Mobile BingeOn: performance characterization (1 month),   Ongoing
                       QoE measurements (2 weeks), reverse engineering (1 month),
                       subverting BingeOn (2 weeks)
Summer and Fall 2016   QUIC: benchmarking and comparison against TCP (1 month),    Ongoing
                       instrumentation (2 months), QUIC proxy (2 months),
                       crowd-sourced measurements (2 weeks)
Fall 2016              Thesis writing

Table 4: Plan for completion of my research

References

[1] A. M. Kakhki, C. Kliman-Silver, and A. Mislove, "Iolaus: Securing online content rating systems," in Proceedings of the Twenty-Second International World Wide Web Conference (WWW'13), Rio de Janeiro, Brazil, May 2013.

[2] (2015, March) NGMN 5G white paper. [Online]. Available: https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf

[3] "Cisco visual networking index: Global mobile data traffic forecast update, 2015–2020 white paper."

[4] "Sandvine - intelligent broadband networks," http://www.sandvine.com.

[5] S. Higginbotham, "The Netflix–Comcast agreement isn't a network neutrality violation, but it is a problem," http://gigaom.com/2014/02/23/the-netflix-comcast-agreement-isnt-a-network-neutrality-violation-but-it-is-a-problem/, February 2014.

[6] Measurement Lab Consortium, "ISP interconnection and its impact on consumer internet performance," http://www.measurementlab.net/blog/2014_interconnection_report, October 2014.

[7] M. Luckie, A. Dhamdhere, D. Clark, B. Huffaker, and k. claffy, "Challenges in inferring internet interdomain congestion," in Proc. of IMC, 2014.
[8] FCC, "Protecting and promoting the open internet," https://www.federalregister.gov/articles/2015/04/13/2015-07841/protecting-and-promoting-the-open-internet, April 2015.

[9] M. Dischinger, M. Marcon, S. Guha, K. P. Gummadi, R. Mahajan, and S. Saroiu, "Glasnost: Enabling end users to detect traffic differentiation," in Proc. of USENIX NSDI, 2010.

[10] Y. Zhang, Z. M. Mao, and M. Zhang, "Detecting traffic differentiation in backbone ISPs with NetPolice," in Proc. of IMC, 2009.

[11] M. B. Tariq, M. Motiwala, N. Feamster, and M. Ammar, "Detecting network neutrality violations with causal inference," in Proc. of CoNEXT, 2009.

[12] Z. Zhang, O. Mara, and K. Argyraki, "Network neutrality inference," in Proc. of ACM SIGCOMM, 2014.

[13] F. Sarkar, "Prevention of bandwidth abuse of a communications system," US Patent App. 14/025,213, Jan. 9, 2014. [Online]. Available: https://www.google.com/patents/US20140010082

[14] D. Rayburn, "Cogent now admits they slowed down Netflix's traffic, creating a fast lane & slow lane," http://blog.streamingmedia.com/2014/11/cogent-now-admits-slowed-netflixs-traffic-creating-fast-lane-slow-lane.html, November 2014.

[15] J. Hui, K. Lau, A. Jain, A. Terzis, and J. Smith, "How YouTube performance is improved in T-Mobile network," http://velocityconf.com/velocity2014/public/schedule/detail/35350.

[16] D. Clark, "Network neutrality: Words of power and 800-pound gorillas," International Journal of Communication, 2007.

[17] B. Obama, "Net neutrality: President Obama's plan for a free and open internet," http://www.whitehouse.gov/net-neutrality#section-read-the-presidents-statement.

[18] A. Rao, J. Sherry, A. Legout, W. Dabbous, A. Krishnamurthy, and D. Choffnes, "Meddle: Middleboxes for increased transparency and control of mobile traffic," in Proc. of CoNEXT 2012 Student Workshop, 2012.

[19] W. Cui, V. Paxson, N. Weaver, and R. H. Katz, "Protocol-independent adaptive replay of application dialog," in Proc. of NDSS, 2006. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=153197

[20] A. Norberg, "uTorrent transport protocol," http://www.bittorrent.org/beps/bep_0029.html.

[21] "Experimenting with QUIC," http://blog.chromium.org/2013/06/experimenting-with-quic.html.

[22] G. Hasslinger and O. Hohlfeld, "The Gilbert-Elliott model for packet loss in real time services on the Internet," 2008.

[23] X. Xu, Y. Jiang, T. Flach, E. Katz-Bassett, D. Choffnes, and R. Govindan, "Investigating transparent web proxies in cellular networks," in Proc. of PAM, 2015.

[24] J. Mayer, "How Verizon's advertising header works," http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/.

[25] C. Reis, S. D. Gribble, T. Kohno, and N. C. Weaver, "Detecting in-flight page changes with web tripwires," in Proc. of USENIX NSDI, 2008.

[26] N. Weaver, C. Kreibich, M. Dam, and V. Paxson, "Here be web proxies," in Proc. of PAM, 2014.

[27] "T-Mobile Binge On," http://www.t-mobile.com/offer/binge-on-streaming-video.html.

[28] "EFF confirms: T-Mobile's Binge On optimization is just throttling, applies indiscriminately to all video," https://www.eff.org/deeplinks/2016/01/eff-confirms-t-mobiles-bingeon-optimization-just-throttling-applies.

[29] "QUIC at 10,000 feet," https://docs.google.com/document/d/1gY9-YNDNAB1eip-RTPbqphgySwSNSDHLq9D5Bty4FSU.

[30] "QUIC: A UDP-based secure and reliable transport for HTTP/2," https://tools.ietf.org/html/draft-tsvwg-quic-protocol-02.
[31] "QUIC loss recovery and congestion control," https://tools.ietf.org/html/draft-tsvwg-quic-loss-recovery-01.

[32] "A QUIC update on Google's experimental transport," http://blog.chromium.org/2015/04/a-quic-update-on-googles-experimental.html.

[33] "IETF93 QUIC BarBoF: Protocol overview," https://docs.google.com/presentation/d/15e1bLKYeN56GL1oTJSF9OZiUsI-rcxisLo9dEyDkWQs/edit#slide=id.g99041b54d_0_525.

[34] "QUIC: Design document and specification rationale," https://docs.google.com/document/d/1RNHkx_VvKWyWg6Lr8SZ-saqsQx7rFV-ev2jRFUoVD34/edit.

[35] "Linux traffic control," http://linux.die.net/man/8/tc.

[36] "Linux network emulation (netem)," http://www.linuxfoundation.org/collaborate/workgroups/networking/netem.

[37] "Dummynet," http://info.iet.unipi.it/~luigi/dummynet/.

[38] "Cronet," https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet/.

[39] A. Nikravesh, H. Yao, S. Xu, D. R. Choffnes, and Z. M. Mao, "Mobilyzer: An open platform for controllable mobile network measurements," in Proc. of MobiSys, 2015.