
Doctoral Thesis Proposal
Measuring and understanding performance in the mobile
Internet
Arash Molavi Kakhki
College of Computer and Information Science
Northeastern University
[email protected]
March 5, 2016
Abstract
In the past decade, mobile Internet access has gone from 2G, with limited capacity and reliability, to 4G technologies with high-speed Internet access. Mobile phones have also evolved from simple voice and text devices to smartphones and tablets with ample resources, which have dramatically improved the user experience and the popularity of smart devices. The combination of fast Internet access and powerful mobile devices has contributed to a rapid shift from desktop computers to smart devices for everyday online activities, and hence an ever-growing demand for fast and reliable mobile connectivity. This increase in mobile Internet usage brings its own challenges. For example, ISPs are having a difficult time keeping up with the growing demand and have often relied more on traffic management policies and resource allocation optimization rather than expanding their capacity.
In order to make advancements in mobile networks, we need tools that give us more visibility into these networks, so we can understand their performance and the unique properties that distinguish them from fixed-line networks. In my thesis, I will provide tools, measurement techniques, and
results to improve visibility into mobile network performance from two different perspectives:
(a) understanding mobile network operator policies and (b) understanding transport-layer performance as seen from mobile clients.
First, I develop techniques and tools that shed light on mobile network operator policies
without requiring any special access to devices or ISPs. This is important because mobile
networks are not transparent about their traffic management policies, and the impact of these
policies on users’ experience is poorly understood. This is largely because mobile networks
and devices are locked down and opaque, and my work demonstrates that it is possible to
overcome this limitation. Next, I propose to apply my differentiation detection technique to conduct a case study of a recent instance of controversial differentiation: T-Mobile’s new service BingeOn. Finally, I propose to investigate Google’s new experimental transport protocol, QUIC, and perform an extensive analysis of how HTTP/S traffic over QUIC compares to TCP-based protocols in a mobile network. This work can help improve QUIC to address the unique properties of the mobile environment.
Contents

1 Introduction
2 Identifying Traffic Differentiation in Mobile Networks
  2.1 Intro and background
  2.2 Methodology
  2.3 Validation
  2.4 Evaluation
  2.5 Measurement study
      2.5.1 Challenges in operational networks
  2.6 Differentiation results
3 T-Mobile BingeOn
4 QUIC
5 Other work
  5.1 Iolaus: Securing Online Content Rating Systems [1]
  5.2 Designing and Testing a Market for Browsing Data
6 Plan for completion of the research
1 Introduction
During the last decade, mobile Internet access has evolved from 2G mobile voice technology, with very limited capacity and reliability, to 4G technologies that provide high-speed Internet access, and 5G, with promises of "much greater throughput, much lower latency, ultra-high reliability, much higher connectivity density, and higher mobility range" [2], is on the horizon. Mobile phones have also evolved from simple voice and text devices to smartphones and tablets with high-capacity CPUs, ample memory, powerful GPUs, and large, high-resolution screens, which have dramatically improved the user experience and the popularity of smart devices. This combination of fast mobile Internet access and powerful mobile devices, along with hundreds of thousands of applications designed specifically for smart devices, is rapidly changing the proportion of mobile to fixed-line Internet traffic. More and more users are using their phones for online activities ranging from simple browsing and banking to more resource-hungry tasks such as gaming and media streaming, which were traditionally tied to computers connected to fixed-line networks. Cisco has recently reported an estimated 74% growth in global mobile data traffic in 2015 and forecasts 30.6 exabytes per month of mobile data traffic by 2020, i.e., a five-fold increase in less than five years [3].
While mobile and fixed-line networks are very similar and share many networking protocols and infrastructure, this increase in mobile Internet usage brings its own challenges at various layers. To name a few: mobile Internet service providers need to invest in updating their infrastructure to accommodate the rapidly growing demand; many Internet protocols were designed decades ago with no consideration of mobile networks, do not perform well in mobile environments, and need constant tuning, or even the introduction and implementation of new protocols that fit mobile networks better; and there is a lack of transparency by mobile providers about their traffic management policies, and limited resources for end users and policy makers to monitor ISPs’ practices.
The fact that mobile networks and devices are locked and opaque only adds to the complexity of
these challenges.
This thesis posits that an effective way to address these challenges is to design and develop tools that can provide visibility into mobile networks without requiring any special access to user devices or ISPs. Such tools will help us understand mobile network performance, crowdsource network measurements, improve current protocols to address the unique properties of the mobile environment, and provide transparency into mobile networks’ traffic management policies and their effect on user experience.
In my thesis, I will provide tools and measurement techniques and results to improve visibility
into mobile network performance from two different perspectives: (a) understanding mobile network operator policies and (b) understanding transport-layer performance as seen from mobile clients.
First, I look into traffic differentiation and content modification in mobile networks by designing
and implementing a system which can detect such activities by ISPs. Traffic differentiation—giving
better (or worse) performance to certain classes of Internet traffic—is a well-known but poorly understood traffic management policy. This is largely because mobile networks and devices are locked
down and opaque. In this chapter of my thesis, I present the design, implementation, and evaluation
of the first system and mobile app for identifying traffic differentiation for arbitrary applications
in the mobile environment (i.e., wireless networks such as cellular and WiFi, used by smartphones
and tablets). The key idea is to use a VPN proxy to record and replay the network traffic generated by arbitrary applications, and compare it with the network behavior when replaying this traffic
outside of an encrypted tunnel. This work also performs the first known testbed experiments with
actual commercial shaping devices to validate my system design and demonstrate how it outperforms
previous work for detecting differentiation.
Second, I propose to apply my differentiation techniques to conduct a case study of a recent
instance of controversial differentiation: T-Mobile’s new service BingeOn, which promises zero-rating
for selected media streaming applications. I will investigate how T-Mobile treats different applications’ network traffic. I also propose to further extend this investigation and study the effect of
T-Mobile’s zero-rating policies on users’ quality of experience.
Third, I propose to investigate Google’s new experimental transport protocol, QUIC, and perform
an extensive analysis of how HTTP/S traffic over QUIC compares to TCP. This includes benchmarking QUIC’s end-to-end performance in both mobile and fixed-line networks under various network conditions and comparing it head-to-head with HTTP/S over TCP, as well as looking into QUIC’s possible shortcomings and proposing solutions to mitigate them. I will also investigate whether intermediation, i.e., a QUIC proxy, can help end-to-end performance.
Finally, the last chapter of my thesis will include two other projects that I worked on during my
PhD. The first project introduces Iolaus, a system I designed and implemented for securing online
content rating systems. It uses two novel techniques: (a) weighing ratings to defend against multiple
identity attacks and (b) relative ratings to mitigate the effect of "bought" ratings. Second project is
"Designing and Testing a Market for Browsing Data", where I prototype and evaluate an information
market that provides privacy and revenue to users while preserving and sometimes improving their
Web performance.
2 Identifying Traffic Differentiation in Mobile Networks
2.1 Intro and background
The rise in popularity of bandwidth-hungry applications (e.g., Netflix) coupled with resource-constrained
networks (e.g., mobile providers) has reignited discussions about how different applications’ network traffic is treated (or mistreated) by ISPs. One commonly discussed approach to managing
scarce network resources is traffic differentiation—giving better (or worse) performance to certain
classes of network traffic—using devices that selectively act on network traffic (e.g., Sandvine [4],
Cisco, and others). Differentiation can enforce policies ranging from protecting the network from
bandwidth-hungry applications, to limiting services that compete with those offered by the network
provider (e.g., providers that sell voice/video services in addition to IP connectivity).
Recent events have raised the profile of traffic differentiation and network neutrality issues in
the wired setting, with Comcast and Netflix engaged in public disputes over congestion on network
interconnects [5–7].
The situation is more extreme in the mobile setting, where historically regulatory agencies imposed few restrictions on how a mobile provider manages its network, with subscribers and regulators
generally having a limited (or no) ability to understand and hold providers accountable for management policies.1
Despite the importance of this issue, the impacts of traffic differentiation on specific applications—
and the Internet as a whole—are poorly understood.

1. As I discuss throughout this proposal, FCC rules that took effect in June 2015 [8] now prohibit many forms of traffic differentiation in the US.

Previous efforts attempt to detect traffic
differentiation (e.g., [9–12]), but are limited to specific types of application traffic (e.g., peer-to-peer
traffic [9]), or focus on shaping behaviors on general classes of traffic as opposed to specific applications
[11, 12]. Addressing these limitations is challenging, due to the wide variety of applications (often
closed-source) that users may wish to test for differentiation, coupled with the closed nature of traffic
shapers. As a result, external observers struggle to detect if such shapers exist in networks, what
exactly these devices do, and what is their impact on traffic. What little we do know is concerning.
Glasnost [9] and others [10, 11, 13] found ISPs throttling specific types of Internet traffic, Cogent admitted to differentiating Netflix traffic [14], and recent work by Google and T-Mobile [15]
indicated that unintentional interactions between traffic shaping devices have been known to degrade performance.
This proposal tackles three key challenges that have held back prior work on measuring traffic
differentiation: (1) the inability to test arbitrary classes of applications, (2) a lack of understanding of
how traffic shaping devices work in practice, and (3) the limited ability to measure mobile networks
(e.g., cellular and WiFi) from end-user devices such as smartphones and tablets. Designing methods
that can measure traffic differentiation in mobile networks (in addition to well-studied fixed-line
networks) presents unique challenges, as the approaches must work with highly variable underlying
network performance and within the constraints of mobile operating systems. By addressing these
challenges, we can provide tools that empower average users to identify differentiation in mobile
networks, and use data gathered from these tools to understand differentiation in practice.
There is an ongoing debate about whether the network should be neutral with respect to the
packets it carries [16]; i.e., not discriminate against or otherwise alter traffic except for lawful purposes
(e.g., blocking illegal content). Proponents such as President Obama argue for an open Internet [17] as
essential to its continued success, while some ISPs argue for shaping to manage network resources [9,
13].
The regulatory framework surrounding traffic differentiation is rapidly changing; the FCC recently
introduced new rules that prohibit many forms of differentiation [8]. In this environment, it is
important to monitor traffic differentiation in practice, both to inform regulators and enforce policies.
2.2 Methodology
I use a trace record-replay methodology to reliably detect traffic differentiation for arbitrary applications in the mobile environment. I provide an overview in Figure 1. I record a packet trace
from a target application (Fig. 1(a)), extract bidirectional application-layer payloads, and use this
to generate a transcript of messages for the client and server to replay (Fig. 1(b)). To test for differentiation, the client and server coordinate to replay these transcripts, both in plaintext and through
an encrypted channel (using a VPN tunnel) where payloads are hidden from any shapers on the path
(Fig. 1(c)). Finally, I use validated statistical tests to identify whether the application traffic being
tested was subject to differentiation (Fig. 1(d)).
a) Recording the trace I need a way to enable end users to capture traces of mobile application
traffic without requiring modifications to the OS. To address this challenge, I leverage the fact that
all major mobile operating systems natively support VPN connections, and use the Meddle VPN [18]
to facilitate trace recording. Figure 1(a) illustrates how a client connects to the Meddle VPN, which
relays traffic between the client and destination server(s). In addition, the Meddle VPN collects
packet traces that I use to generate replay transcripts. When possible, I record traffic using the same
network being tested for differentiation because applications may behave differently based on the network type, e.g., a streaming video app may select a higher bitrate on WiFi.

Figure 1: System overview: (a) The client connects to the VPN, which records traffic between the client and app server(s). (b) The parser produces transcripts for replaying. (c) The client and replay server use the transcripts to replay the trace, once in plaintext and once through a VPN tunnel. The replay server records packet traces for each replay. (d) The analyzer uses the traces to detect differentiation.
My methodology is extensible to arbitrary applications—even closed source and proprietary
ones—to ensure it works even as the landscape of popular applications, the network protocols they
use, and their impact on mobile network resources changes over time. Prior work focused on specific
applications (e.g., BitTorrent [9]) and manually emulated application flows. This approach, however,
does not scale to large numbers of applications and may even be infeasible when the target application
uses proprietary, or otherwise closed, protocols.2
b) Creating the replay script After recording a packet trace, my system processes it to extract
the application-layer payloads as client-to-server and server-to-client byte streams. I then create
a replay transcript that captures the behavior of the application’s traffic, including port numbers,
dependencies between application-layer messages (if they exist) and any timing properties of the
application (e.g., fixed-rate multimedia).3 Finally, I use each endpoint’s TCP or UDP stack to replay
the traffic as specified in the transcript. Figure 1(b) shows how I create two objects with the necessary
information for the client and server to replay the traffic from the recorded trace.
Logical dependencies: For TCP traffic, I preserve application-layer dependencies using the implicit
happens-before relationship exposed by application-layer communication in TCP. More specifically, I
extract two unidirectional byte streams sAB and sBA for each pair of communicating hosts A and B
in the recorded trace. For each sequence of bytes in sAB , I identify the bytes in sBA that preceded
them in the trace. When replaying, host A sends bytes in sAB to host B only after it has received the
preceding bytes in sBA from host B. I enforce analogous constraints from B to A. For UDP traffic,
I do not enforce logical dependencies because the transport layer does not explicitly impose them.
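To make the happens-before bookkeeping concrete, the following minimal Python sketch annotates each recorded message with the number of opposite-direction bytes that must be received before it may be sent. The Message class and build_transcript helper are illustrative names under assumed data structures, not the actual replay implementation.

```python
# Illustrative sketch of the happens-before annotation described above;
# names and data structures are hypothetical, not the real replay code.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Message:
    direction: str   # "A->B" or "B->A"
    payload: bytes

def build_transcript(messages: List[Message]) -> List[Tuple[str, bytes, int]]:
    """Return (sender, payload, bytes_required) tuples: a sender may only
    transmit its payload after receiving bytes_required bytes from the peer."""
    bytes_ab = 0  # cumulative A->B bytes seen so far in the recorded trace
    bytes_ba = 0  # cumulative B->A bytes seen so far in the recorded trace
    transcript = []
    for m in messages:  # messages in recorded order
        if m.direction == "A->B":
            transcript.append(("A", m.payload, bytes_ba))
            bytes_ab += len(m.payload)
        else:
            transcript.append(("B", m.payload, bytes_ab))
            bytes_ba += len(m.payload)
    return transcript
```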
2. The Glasnost paper describes an (unevaluated) tool to support replaying of arbitrary applications, but I was unsuccessful in using it, even with the help of a Glasnost coauthor.

3. This approach is similar to Cui et al.’s [19]; however, my approach perfectly preserves application payloads (to ensure traffic is classified by shapers) and timing (to prevent false positives when detecting shaping).

Timing dependencies: A second challenge is ensuring the replayed traffic maintains the inter-message timing features of the initial trace. For example, streaming video apps commonly download and buffer video content in chunks instead of buffering the entire video, typically to reduce data consumption in the case of viewer abandonment and to save energy by allowing the radio to sleep between bursts. Other content servers use packet pacing to minimize packet loss and thus improve flow completion times.
My system preserves the timings between packets for both TCP and UDP traffic to capture such
behavior when replaying (this feature can be disabled when timing is not intrinsic to the application).
Preserving timing is a key feature of my approach that can have a significant impact on detecting differentiation. In short, it prevents us from detecting shaping that does not impact application-perceived performance.
Specifically, for each stream of bytes (UDP or TCP) sent by a host A, I annotate each byte with its time offset from the first byte in the stream in the recorded trace. If the ith byte of A was sent at t
ms from the start of the trace, I ensure that byte i is not sent before t ms have elapsed in the replay.
For TCP, this constraint is enforced after enforcing the happens-before relationship.
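As an illustration of the timing constraint, a replay client could enforce the recorded offsets roughly as follows; the chunk format and helper name are hypothetical, and the real client additionally waits for the happens-before bytes described above.

```python
import time

def send_with_timing(sock, chunks):
    """Replay (offset_ms, payload) chunks, never sending a chunk before its
    recorded offset from the start of the stream has elapsed. Sketch only."""
    start = time.monotonic()
    for offset_ms, payload in chunks:
        wait = offset_ms / 1000.0 - (time.monotonic() - start)
        if wait > 0:
            time.sleep(wait)  # hold back until the recorded offset is reached
        sock.sendall(payload)
```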
In the case of UDP, I do not retransmit lost packets. This closely matches the behavior of real-time applications such as VoIP and live-streaming video (which often tolerate packet losses), but does
not work well for more complicated protocols such as BitTorrent’s µTP [20] or Google’s QUIC [21].
I will investigate how to incorporate this behavior in future work.
c) Replaying the trace After steps 1 and 2, I replay the recorded traffic on a target network to
test my null hypothesis that there is no differentiation in the network. This test requires two samples
of network performance: one that exposes the replay to differentiation (if any), and a control that is
not exposed to differentiation.
A key challenge is how to conduct the control trial. Initially, I followed prior work [9] and
randomized the port numbers and payload, while maintaining all other features of the replayed
traffic. However, using my testbed containing commercial traffic shapers, I found that some shapers
will by default label “random” traffic with high port numbers as peer-to-peer traffic, a common
target of differentiation. Thus, this approach is unreliable for generating control trials because one can
reasonably expect it to be shaped.
Instead, I re-use the Meddle VPN tunnel to conduct control trials. By sending all recorded traffic
over an encrypted IPSec tunnel, I preserve all application behavior while simultaneously preventing
any DPI-based shaper from differentiating based on payload. Thus, each replay test consists of
replaying the trace twice, once in plaintext (exposed trial), and once over the VPN (control trial),
depicted in Figure 1(c). To detect differentiation in noisy environments, I run multiple back-to-back
tests (both control and exposed). Note that I compare exposed and control trials with each other
and not the original recorded trace.
d) Detecting differentiation
After running the exposed and control trials, I compare performance metrics (throughput, loss, and delay) to detect differentiation (Fig. 1(d)). I focus on
these metrics because traffic shaping policies typically involve bandwidth rate-limiting (reflected in
throughput differences), and may have impacts on delay and loss depending on the queuing/shaping
discipline. I base my analysis on server-side packet captures to compute throughput and RTT values
for TCP, because I cannot rely on collecting network-level traces on mobile clients. For UDP applications, I use jitter as a delay metric and measure it at the client at the application layer (observing
inter-packet timings).
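As a rough illustration, jitter can be estimated from application-layer arrival timestamps alone, for example as the mean absolute change between consecutive inter-arrival gaps; this estimator is an assumption for illustration and may differ from the one used by my analyzer.

```python
def interarrival_jitter(arrival_times_ms):
    """Estimate jitter as the mean absolute difference between consecutive
    inter-arrival gaps (illustrative definition, computed client-side)."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    if len(gaps) < 2:
        return 0.0
    return sum(abs(g2 - g1) for g1, g2 in zip(gaps, gaps[1:])) / (len(gaps) - 1)
```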
A key challenge is how to automatically detect when differences between two traces are caused by
differentiation, instead of signal strength, congestion, or other confounding factors. In Section 2.4, I
find that previous techniques to identify differentiation are inaccurate when tested against commercial packet shaping devices with varying packet loss in my testbed. I describe a novel Area Test approach to compare traces that has perfect accuracy with no loss, and greater than 70% accuracy under high loss (Fig. 4(b)).

Figure 2: Plots showing bytes transferred over time, with and without preserving packet timings, for TCP (YouTube) and UDP (Skype) applications. By preserving inter-packet timings, my replay closely resembles the original traffic generated by each app.
2.3 Validation
I validate my methodology using my replay system and a testbed comprised of two commercial
shaping devices. First, I verify that my approach captures salient features of the recorded traffic.
Next, I validate that my replays trigger differentiation and identify the relevant features used for
classification.
Record/Replay similarity
My replay script replicates the original trace, including payloads,
port numbers, and inter-packet timing. I now show that the replay traffic characteristics are essentially identical to the recorded traffic. Figure 2 shows the results for a TCP application (YouTube
(a)) and a UDP application (Skype (b)). As discussed above, preserving inter-packet timings is important to produce a replay that closely resembles the original trace (otherwise, I may claim that
differentiation exists when the application would never experience it).
Traffic shapers detect replayed traffic
Our group acquired traffic shaping products from two
different vendors and integrated them into a testbed for validating whether replays trigger differentiation. The testbed consists of a client connected to a router that sits behind a traffic shaper, which
exchanges traffic with a gateway server that I control. The gateway presents the illusion (to the
packet shaper) that it routes traffic to and from the public Internet. I configure the replay server to
listen on arbitrary IP addresses on the gateway, giving us the ability to preserve original server IP
addresses in replays (by spoofing them inside my testbed). I describe below some of my key findings
from this testbed. I verified that for a variety of popular mobile applications, including YouTube,
Netflix, Skype, Spotify, and Google Hangouts, replay traffic is classified as the application that was
recorded.
Figure 3: Regions considered for determining detection accuracy, based on where the shaping rate falls relative to the replay transfer rate: Region 1 (shaping rate below the application’s average throughput), Region 2 (between the average and the maximum), and Region 3 (above the maximum). If the shaping rate is less than the application’s average throughput, I should always detect differentiation. Likewise, if the shaping rate is greater than peak throughput, I should never detect differentiation. In the middle region, it is possible to detect differentiation, but the performance impact depends on the application.
2.4 Evaluation
I present the first study that uses ground-truth information to compare the effectiveness of various
techniques for detecting differentiation, and propose a new test to address limitations of prior work. I
find that previous approaches for detecting differentiation are inaccurate when tested against ground-truth data, and characterize under what circumstances these approaches yield correct information
about traffic shaping as perceived by applications.
When determining the accuracy of detection, I separately consider three scenarios regarding the
shaping rate and a given trace (shown in Figure 3):
• Region 1. If the shaping rate is less than the average throughput of the recorded traffic, a traffic
differentiation detector should always identify differentiation because the shaper is guaranteed
to affect the time it takes to transfer data in the trace.
• Region 3. If the shaping rate is greater than the peak throughput of the recorded trace, a
differentiation detector should never identify differentiation because the shaper will not impact
the application’s throughput (as discussed earlier, I focus on shaping that will actually impact
the application).
• Region 2. Finally, if the shaping rate is between the average and peak throughput of the recorded
trace, the application may achieve the same average throughput but a lower peak throughput.
In this case, it is possible to detect differentiation, but the impact of this differentiation will
depend on the application. For example, a non-interactive download (e.g., file transfers for app
updates) may be sensitive only to changes in average throughput (but not peak transfer rates),
while a real-time app such as video-conferencing may experience lower QoE for lower peak rates.
Given these cases, there is no single definition of accuracy that covers all applications in region
2. Instead, I show that I can configure detection techniques to consistently detect differentiation
or consistently not detect differentiation in this region, depending on the application.
I explore approaches used in Glasnost [9] and NetPolice [10], and propose a new technique that
has high accuracy in regions 1 and 3, and reliably does not detect differentiation in region 2.
1-Glasnost: Maximum throughput test.
Glasnost [9] does not preserve the inter-packet
timing when replaying traffic, and identifies differentiation if the maximum throughput for control
and exposed flows differ by more than a threshold. I expect this will always detect differentiation in
region 1, but might generate false positives in regions 2 and 3 (because Glasnost may send traffic at
a higher rate than the recorded trace).
2-NetPolice: Two-sample KS Test.
As discussed in NetPolice [10], looking at a single
summary statistic (e.g., maximum) to detect differentiation can be misleading. The issue is that
two distributions of a metric (e.g., throughput) may have the same mean, median, or maximum, but still be vastly different. A differentiation detection approach should be robust to this. To
address this, NetPolice uses the two-sample Kolmogorov–Smirnov (KS) test, which compares two
empirical distribution functions (i.e., empirical CDFs) using the maximum distance between two
empirical CDF sample sets for a confidence interval α. To validate the KS Test result, NetPolice uses
a resampling method as follows: randomly select half of the samples from each of the two original
input distributions and apply the KS Test on the two sample subsets, and repeat this r times. If the
results of more than β% of the r tests agree with the original test, they conclude that the original
KS Test statistic is valid.
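A minimal version of this test and its resampling validation could look like the following sketch; the significance level, resampling count, and agreement threshold are illustrative placeholders rather than NetPolice’s exact parameters.

```python
import random
from scipy.stats import ks_2samp

def ks_with_resampling(exposed, control, alpha=0.05, r=100, beta=0.95):
    """Two-sample KS test plus resampling validation (sketch).
    Returns (differentiation_detected, result_is_valid)."""
    _, pvalue = ks_2samp(exposed, control)
    detected = pvalue < alpha
    agree = 0
    for _ in range(r):
        # Re-run the test on random halves of each sample set.
        sub_e = random.sample(list(exposed), len(exposed) // 2)
        sub_c = random.sample(list(control), len(control) // 2)
        _, p = ks_2samp(sub_e, sub_c)
        agree += ((p < alpha) == detected)
    return detected, (agree / r) >= beta
```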
3-My approach: Area Test.
The KS Test only considers the difference between distributions
along the y-axis even if the difference in the x-axis (in this case, throughput) is small. In my
experiments, I found this makes the test very sensitive to small changes in performance not due to
differentiation, which can lead to inaccurate results or labeling valid tests as invalid. To address
this, I propose an Area Test that accounts for the degree of differentiation detected: I find the area,
a, between the two CDF curves and normalize it by the minimum of peak throughputs for each
distribution, w. This test concludes there is differentiation if the KS Test detects differentiation and
the normalized area between the curves is greater than a threshold t.
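The Area Test itself can be sketched as follows; the threshold value shown is a placeholder, and in practice it is calibrated as part of the evaluation.

```python
import numpy as np

def area_test(exposed, control, ks_detected, threshold=0.1):
    """Sketch of the Area Test: area between the two empirical throughput
    CDFs, normalized by the smaller of the two peak throughputs."""
    exposed, control = np.asarray(exposed), np.asarray(control)
    xs = np.sort(np.concatenate([exposed, control]))
    cdf_e = np.searchsorted(np.sort(exposed), xs, side="right") / len(exposed)
    cdf_c = np.searchsorted(np.sort(control), xs, side="right") / len(control)
    area = np.trapz(np.abs(cdf_e - cdf_c), xs)  # integrate |CDF difference|
    normalized = area / min(exposed.max(), control.max())
    return ks_detected and normalized > threshold
```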
I next evaluate the three differentiation tests in our lab testbed, which consists of a commercial
traffic shaper, a replay client, and a replay server. I vary the shaping rate using the commercial
device to shape each application to throughput values in regions 1-3 and emulate noisy packet loss
in my tests using the Linux Traffic Control (tc) and Network Emulation (netem) tools to add bursty
packet loss according to the Gilbert-Elliott (GE) model [22]. The criteria under which I evaluate the
three differentiation tests are as follows:
• Overall accuracy: Using the taxonomy in Figure 3, a statistical test should always detect differentiation in region 1, and never detect differentiation in region 3. I define accuracy as the
fraction of samples for which the test correctly identifies the presence or absence of differentiation in these regions.
• Resilience to noise: Differences between two empirical CDFs could be explained by a variety
of reasons, including random noise [11]. I need a test that retains high accuracy in the face
of large latencies and packet loss that occur in the mobile environment. When evaluating and
calibrating my statistical tests, I take this into account by simulating packet loss.
• Data consumption: To account for network variations over time, I run multiple iterations of
control and exposed trials back to back. I merge all control trials into a single distribution, and
similarly merge all exposed trials. As I show below, this can improve accuracy compared to
running a single trial, but at the cost of longer/more expensive tests. When picking a statistical
test, I want the one that yields the highest accuracy with the smallest data consumption.
Below are detailed results of my evaluation:
App       KS Test (R1 / R3)   Area Test (R1 / R3)   Glasnost (R1 / R3)
Netflix   100 / 100           100 / 100             100 / 0
YouTube   100 / 100           100 / 100             100 / 0
Hangout   100 / 100           100 / 100             100 / 0
Skype     100 / 100           100 / 100             90 / 0

Table 1: Shaping detection accuracy (%) for different apps, in regions 1 and 3 (denoted by R1 and R3). I find that the KS Test (NetPolice) and Area Test have similarly high accuracy in both regions (1 and 3), but Glasnost performs poorly in region 3.
App       Area Test   KS Test   Glasnost
Netflix   28          65        100
YouTube   0           67        100
Hangout   0           40        100
Skype     10          55        92

Table 2: Percent of tests identified as differentiation in region 2 for the three detection methods. Glasnost consistently identifies region 2 as differentiation, whereas the Area Test is more likely to not detect differentiation.
Summary results for low noise.
I first consider the accuracy of detecting differentiation under
low loss scenarios. Table 1 presents the accuracy results for four popular apps in regions 1 and 3.
I find that the KS Test and Area Test have similarly high accuracy in both regions 1 and 3, but
Glasnost performs poorly in region 3 because it will detect differentiation, even if the shaping rate is
above the maximum. I explore region 2 behavior later in this section; until then I focus on accuracy
results for regions 1 and 3 and the two KS Tests (as they do not identify differentiation in region 3).
Impact of noise.
Properties of the network unrelated to shaping (e.g., congestion-induced packet
loss) may impact performance and cause false positives for differentiation detectors. In Figure 4 I
examine the impact of loss on accuracy for one TCP application (Netflix) and one UDP application
(Skype). I find that both variants of the KS Test have high accuracy in the face of congestion, but
the Area Test is more resilient to moderate and high loss for Skype.
Impact of number of replays.
One way to account for noise is to use additional replays to
average out any variations from transient noise in a single replay. To evaluate the impact of additional
replays, I varied the number of replays combined to detect differentiation under high loss and plotted
the detection accuracy in Fig. 4(b).4 In the case of moderate or high loss, increasing the number of
replays improves accuracy, particularly for a short YouTube trace. In all cases, the Area Test is more
accurate than (or equal to) the KS Test. Importantly, the accuracy does not significantly improve
past 2 or 3 replays. In my deployment, I use 2 tests (to minimize data consumption).
Region 2 detection results.
Recall that my goal is to identify shaping that impinges on
the network resources required by an application. However, in region 2, shaping may or may not
impact the application’s performance, which makes identifying shaping with actual application impact
challenging. I report detection results for the three tests in region 2 in Table 2. I find that Glasnost
consistently identifies differentiation, the Area Test consistently does not detect differentiation, and the KS Test is inconsistent. When testing an application that is sensitive to shaping of its peak throughput, the Glasnost method may be preferable, whereas the Area Test is preferred when testing applications that can tolerate shaping above their average throughput.

4. Additional replays did not help when there is low loss.

Figure 4: Detection accuracy under loss for the Area Test and KS Test: (a) accuracy against loss; (b) effect of combining replays on accuracy under high loss.
When detecting differentiation, I conservatively avoid the potential for false positives and thus
do not report differentiation for region 2. I therefore use the Area Test in my implementation because
it will more reliably achieve this result (Table 2). Determining which apps are affected by region 2
differentiation is a topic of future work.
2.5 Measurement study
I then use my system to identify cases of differentiation through a measurement study of production
networks. For each network, I focus on detection for a set of popular apps that I expect might
be subject to differentiation: streaming audio and video (high bandwidth) and voice/video calling
(competes with carrier’s line of business). The data from these experiments was initially collected
between January and May 2015. I then repeated the experiments in August 2015, after the new FCC
rules prohibiting differentiation took effect.
2.5.1 Challenges in operational networks
In this section I discuss several challenges that I encountered when attempting to identify differentiation in mobile networks. To the best of my knowledge, I am the first to identify these issues for
detecting differentiation and discuss workarounds that address them.
Per-client management.
To replay traffic for multiple simultaneous users, my replay server
needs to be able to map each received packet to exactly one app replay. A simple solution is to
put this information in the first few bytes of each flow between client and server. Unfortunately, as
shown earlier, this can disrupt classification and lead to false negatives. Instead, I use side-channel
connections out of band from the replay to supply information regarding each replay run.
NAT behavior.
I also use the side-channel to identify which clients will connect from which IPs
and ports. For networks without NAT devices, this works well. However, many mobile networks use
NAT, meaning I cannot use the side-channel to identify the IP and port that a replay connection will
use (because they may be modified by the NAT). For such networks, each replay server can reliably
support only one active client per ISP and application. While this has not been an issue in my initial
app release, I am investigating other work-arounds. Moreover, to scale the system, I can use multiple
replay servers and a load balancer. The load balancer can be configured to assign clients with the
same IP address to different servers.
“Translucent” HTTP proxies.
It is well known that ISPs use transparent proxies on Web
traffic [23]. My system works as intended if a proxy simply relays exactly the same traffic sent by
a replay client. In practice, I found examples of “translucent” proxies that issue the same HTTP
requests as the client but change connection keep-alive behavior; e.g., the client uses a persistent
connection in the recorded trace and the proxy does not. To handle this behavior, my system must
be able to map HTTP requests from unexpected connections to the same replay.
Another challenge with HTTP proxies is that the proxy may have a different public IP address
from other non-HTTP traffic (including the side-channel), which means the server will not be able
to map the HTTP connections to the client based on IP address. In these cases the server replies
to the HTTP request with a special message. When the client receives this message, it restarts the
replay. This time the client adds a custom header (X-) to HTTP requests with the client’s ID, so the
server can match HTTP connections with different IP addresses to the corresponding clients. The
server then ignores this extra header for the remainder of the replay. I have also observed that some
mobile providers exhibit such behavior for other TCP and UDP ports, meaning I cannot rely on an
X- header to convey side-channel information. In these cases I can identify users based on the ISP’s IP
subnet, which means I can only support one active client per replay server and ISP, regardless of the
application they are replaying.
Content-modifying proxies and transcoders.
Recent reports and previous work highlight
cases of devices in ISPs that modify customer traffic for reasons such as tracking [24], caching [23],
security [25], and reducing bandwidth [26]. I also saw several cases of content-modifying proxies.
For example, I saw Sprint modifying HTTP headers, Verizon inserting global tracking cookies, and
Boost transcoding YouTube videos.
In my replay methodology, I assumed that packets exchanged between the client and server would
not be modified in flight, and I trivially detect modification because packet payloads do not match
recorded traces. In the case of header manipulation, I use an edit-distance based approach to match
modified content to the recorded content it corresponds to. While my system can tolerate moderate
amounts of modification (e.g., HTTP header manipulation/substitution), if the content is modified
drastically, e.g., transcoding an image to reduce its size, it is difficult to detect shaping because the
data from my control and exposed trials do not match. In my current system, I simply notify the user
that there is content modification but do not attempt to identify shaping. I leave a more detailed
analysis of this behavior to future work.
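One simple way to perform the matching of modified content is to compare each received payload against the recorded payloads with a similarity score and pick the best match; the sketch below uses difflib’s ratio as a stand-in for an edit-distance metric and is illustrative only.

```python
import difflib
from typing import List, Tuple

def match_recorded_content(received: bytes, recorded: List[bytes]) -> Tuple[bytes, float]:
    """Return the recorded payload most similar to the received one, plus the
    similarity score (1.0 means identical). Illustrative matcher only."""
    def similarity(a: bytes, b: bytes) -> float:
        return difflib.SequenceMatcher(None, a, b).ratio()
    best = max(recorded, key=lambda candidate: similarity(received, candidate))
    return best, similarity(received, best)
```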
Caching.
My replay system assumes content is served from the replay server and not an ISP’s
cache. I detect the latter case by identifying cases where the client receives data that was not sent
by the server. In the few cases where I observed this behavior, the ISPs were caching small static
objects (e.g., thumbnail images) which were not the dominant portion of the replay traffic (e.g., when
streaming audio) and had negligible effect on my statistical analysis.5 Going forward, I expect this
behavior to be less prominent due to the increased use of encrypted protocols, e.g., HTTP/2 and
QUIC. In such cases, an ISP may detect and shape application traffic using SNI for classification,
but it cannot modify or cache content.
5. Boost was one exception, which I discuss later.
Table 3: Shaping detection results per ISP in my dataset (Verizon, T-Mobile, AT&T, Sprint, Boost, BlackWireless, H2O, SimpleMobile, and NET10), for six popular apps: YouTube (YT), Netflix (NF), Spotify (SF), Skype (SK), Viber (VB), and Google Hangout (HO). When shaping occurs, the table shows the difference in average throughput (%) I detected: YouTube is shaped by BlackWireless (60%), H2O (37%*), and SimpleMobile (36%), and H2O additionally shapes Netflix (45%*) and Spotify (65%*); no differentiation was detected for Skype, Viber, or Hangout in any network. A dash (-) indicates no differentiation, (f) means IP addresses changed for each connection, (p) means a “translucent” proxy changed connection behavior from the original app behavior, and (m) indicates that a middlebox modified content in flight between client and server. *For the H2O network, replays with random payload have better performance than VPN and exposed replays, indicating a policy that favors non-video HTTP over VPN and streaming video.
2.6 Differentiation results
In this section, I present results from running my Differentiation Detector Android app on popular
mobile networks. This dataset consists of 4,786 replay tests, covering traces from six popular apps. I
collected test results from most major cellular providers and MVNOs in the US; further, I gathered
measurements from four international networks.
My app supports both VPN traffic and random payloads (but not random ports) as control traffic.
I use the latter only if a device does not support VPN connectivity or the cellular provider blocks or
differentiates against VPN traffic. After collecting data, I run my analysis to detect differentiation;
the results for US carriers are presented in Table 3. In other networks such as Idea (India), JazzTel
(Spain), Three (UK), and T-Mobile (Netherlands), my results based on a subset of the traces (the
traces that users selected) indicated no differentiation.
My key finding is that my approach successfully detects differentiation in three mobile networks
(BlackWireless, H2O, and SimpleMobile), with the impact of shaping resulting in up to 65% difference
in average throughput. These shaping policies all apply to YouTube (not surprising given its impact
on networks), but not always to Netflix and Spotify. I did not identify differentiation for UDP traffic
in any of the carriers. Interestingly, H2O consistently gives better performance to port 80 traffic
with random payloads, indicating a policy that gives relatively worse performance to VPN traffic
and streaming audio/video. I tested these networks again in August 2015 and did not observe such
differentiation. I speculate that these networks ceased their shaping practices in part due to new
FCC rules barring this behavior, effective in June 2015.
I contacted the ISPs I tested for comment on the behavior I observed in their networks. At the
time of publication, only one ISP had responded (Sprint) and did not wish to comment.
While attempting to detect differentiation, I identified other interesting behavior that prevented
us from identifying shaping. For example, Boost transcodes YouTube video to a fraction of the
original quality, and then caches the result for future requests. Other networks use proxies to change
TCP behavior. I found such devices to be pervasive in mobile networks.
Figure 5: Average throughput against RTT for Netflix (participating in BingeOn) and YouTube (not participating in BingeOn).
3 T-Mobile BingeOn
The first proposed component of my thesis is to apply my differentiation detection technique to conduct a case study of a recent instance of controversial differentiation: T-Mobile’s new service. On November 15th, 2015, T-Mobile USA announced a new service, Binge On, which allows customers with qualifying plans to stream video from a selected number of content providers for free [27]. T-Mobile claims that Binge On works by detecting video streaming and optimizing the video quality for smartphone screens, providing a DVD-quality experience (typically 480p or better) [27]. The service has been advertised as a win for customers, as video streaming is the most data-consuming activity users perform on their phones and can quickly burn through their monthly data bucket. On the other hand, many network neutrality advocates argue that T-Mobile is not clear about how exactly Binge On detects and optimizes users’ video traffic, and that its practice could violate the FCC’s newly imposed Open Internet Order. T-Mobile has also been criticized for enabling Binge On by default for all users. T-Mobile gives users the option to opt out, but many argue that Binge On should be disabled by default and users should decide if they want to opt in. The Electronic Frontier Foundation (EFF) recently published a post [28] in which they download video streams and non-video files over T-Mobile’s network with Binge On enabled and show that Binge On does not optimize videos, but simply throttles download rates to 1.5Mbps and relies on the content providers’ adaptive bitrate to fall back to a lower quality. EFF also reports that T-Mobile throttles video streams from non-participating providers (e.g., YouTube) as well, meaning users are stuck with 1.5Mbps and lower quality, while they still have to pay for the data.
My differentiation detector system, which I described in Section 2, gives us the unique ability to replay any application’s traffic, including video streaming, through T-Mobile’s network and investigate Binge On in greater detail. The advantage of my approach compared to others like EFF’s is that I control the server side as well and can gather much more information about the connection. It also allows us to make arbitrary modifications to the traffic from both ends of the communication, to do things like reverse engineering T-Mobile’s detection schemes. This system works for any application and there is little overhead in preparing replays.

Through my preliminary tests using my differentiation detector system on T-Mobile’s network, I can confirm EFF’s finding that T-Mobile throttles all video streaming, regardless of whether the provider participates in Binge On or not. According to my results, the exact rate limit imposed by Binge On is 1.6Mbps. I have also found that, unlike what T-Mobile claims [27], Binge On’s zero-rating does not apply to all tethered traffic. Only data tethered over USB is zero-rated; data through the device’s WiFi hotspot is counted against users’ quota.
I used my differentiation detector app to replay traffic belonging to a video streaming application that is part of Binge On (Netflix) and a non-participating application (YouTube). The app can replay traffic in three modes: 1) in the clear (NoVPN), which allows T-Mobile to detect the application being replayed, 2) through a VPN tunnel (VPN), and 3) with randomized payload (Random). The VPN and Random modes evade traffic classification by T-Mobile (or result in wrong classification) and serve as baselines. Figure 5 shows average throughput against average round-trip time for Netflix and YouTube and includes data points for Binge On enabled/disabled and un-tethered/tethered (WiFi hotspot) traffic. While un-tethered Binge On traffic is capped at 1.6Mbps for NoVPN replays for both YouTube and Netflix, Random and VPN traffic are able to evade T-Mobile’s detection and achieve higher rates. I also found that when traffic is throttled, the number of lost packets is significantly higher, suggesting that T-Mobile’s throttling is simple policing, where packets are not queued (or queues are very small) and are dropped when traffic exceeds the target rate.
For this project, I propose to investigate the following:
• Fully characterize Binge On’s behavior using my differentiation detector system. I will test at least three video services participating in Binge On and three non-participating apps. The goal is to investigate whether T-Mobile is doing pure shaping/policing, or whether other mechanisms or "optimizations" are in place. For this I will examine many traffic traces of replays through T-Mobile’s network with and without Binge On, and compare statistics such as throughput, RTT, and retransmission rate to determine the network policies.

• Understand the implementation and accuracy of T-Mobile’s data-consumption accounting under Binge On and how non-participating applications’ traffic is treated and counted. According to my initial findings, I expect to see T-Mobile counting the traffic belonging to non-participating services against users’ quota while still throttling it, which would be a clear violation of the FCC’s Open Internet Order.
• Investigate and quantify the effect of Binge On’s policies on users’ experience. To achieve this goal, I will implement a tool to stream video against existing app servers and log statistics on the quality of experience, including time for the video to start playing, buffering time, video quality, amount of video loaded in a certain time, and number of rebuffers.
• Reverse engineer T-Mobile’s detection mechanism and explore ways to exploit the classifier so that non-participating traffic is detected as zero-rated traffic. Reverse engineering the classifier requires exhaustive tests where I make incremental modifications to traffic and observe the effect. While I do not have access to T-Mobile’s classifiers, I can infer their classification result by observing how the traffic is treated. For example, looking at the average throughput, if it is at Binge On’s limit, i.e., 1.6Mbps, and the data usage does not go up, we can infer that the traffic has been classified as video streaming. Once I reverse engineer the classification, I will build a tool to masquerade any traffic as Binge On participating traffic, and hence have arbitrary traffic zero-rated. There are a number of ways to achieve this. One way is to deploy two simple proxies: one local on the user’s machine to modify the traffic to exploit T-Mobile’s classification, and one outside T-Mobile’s network to reverse the changes applied by the first proxy before forwarding the traffic to the final destination (a minimal illustration of this idea follows this list).
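The two-proxy idea could be prototyped along the following lines; this is a hypothetical sketch for plain HTTP only, and the header names, the placeholder host, and the assumption that the classifier keys on the Host header are unverified illustrations, not T-Mobile’s actual behavior.

```python
# Hypothetical sketch of the two-proxy rewriting idea (HTTP only). The header
# names, placeholder host, and classification assumption are illustrative.
def rewrite_outbound(request: bytes) -> bytes:
    """Local proxy (on the user's device): stash the real Host header and
    present a placeholder host that the on-path classifier might treat as
    participating video traffic."""
    head, sep, body = request.partition(b"\r\n\r\n")
    out = []
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            out.append(b"X-Original-Host:" + line.split(b":", 1)[1])
            out.append(b"Host: video.example.com")  # placeholder host
        else:
            out.append(line)
    return b"\r\n".join(out) + sep + body

def restore_inbound(request: bytes) -> bytes:
    """Remote proxy (outside the carrier's network): undo the rewrite before
    forwarding the request to its real destination."""
    head, sep, body = request.partition(b"\r\n\r\n")
    out = []
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            continue  # drop the placeholder Host inserted by the local proxy
        if line.startswith(b"X-Original-Host:"):
            out.append(b"Host:" + line.split(b":", 1)[1])
        else:
            out.append(line)
    return b"\r\n".join(out) + sep + body
```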
4 QUIC
The second proposed component of my thesis is to investigate Google’s new experimental transport
protocol. In 2013, Google introduced a new UDP-based experimental transport protocol called QUIC, "aiming to solve a number of transport and application layer problems experienced by modern web applications, while requiring little or no change from application writers" [29]. Google has been developing the protocol ever since and working towards its standardization (Internet drafts were recently published [30, 31]). QUIC is now available on all Google services and in the Google Chrome browser, more than 50% of requests from the Chrome browser to Google servers are served over QUIC, and Google is continuing to ramp up QUIC traffic [32].
So far QUIC has been used only for web traffic and can be thought of as SPDY (or HTTP/2) on top of UDP. As illustrated in Figure 6, QUIC does not follow the traditional network layering and sits between the existing transport and application layers. This is mostly because QUIC is implemented on top of UDP. Unlike TCP, UDP does not provide guarantees such as reliability and acknowledgements, so QUIC needs to implement these guarantees itself, hence spanning multiple layers.
Google, whose servers and products are the sole clients of QUIC at scale, has reported that their
data shows positive results so far and QUIC has been delivering real performance improvements
over TCP. Google claims a 3% improvement in mean page load time with QUIC on Google Search,
shaving a full second off the Google Search page load time for the slowest 1% of connections, and 30%
fewer YouTube rebuffers when watching videos over QUIC [32]. Google attributes these performance
boosts to QUIC’s lower-latency connection establishment (0-RTT), reduced head-of-line-blocking,
improved congestion control, and better loss recovery.
Google points out many reasons why it is developing QUIC and lists its benefits as: 0-RTT connection establishment, reduced head-of-line blocking, Forward Error Correction (FEC), reduced bandwidth consumption and packet count, privacy assurances comparable to TLS, better congestion control, and better loss recovery.

Figure 6: QUIC spanning over multiple layers (figure taken from [33]).

However, the two main reasons behind Google’s push for QUIC seem to be the following:
1. TCP is part of the operating system stack, which makes experimenting with and changing it challenging. While modifications to TCP are plausible, they will not be widely deployed until the kernel changes are in place, which can take decades to propagate. On the other hand, QUIC is a self-contained protocol, which allows modification and innovation without TCP’s limitations [34].
2. QUIC is fully encrypted, including the protocol headers, which makes meddling by middleboxes almost impossible. This is of particular interest to content providers such as Google, who frequently suffer from performance degradation due to middlebox interference or misconfiguration. As a result, Google tries to encrypt as much of the payload and control structure as feasible [34].
While Google’s reports are very promising and indicate QUIC’s great benefits and potential, it is still an experimental protocol and needs to go through extensive benchmarking before turning into a standard. However, it has been only a couple of years since QUIC’s inception, and there are not many
studies on QUIC’s performance. This work aims to fill this gap and provide a comprehensive study
of the protocol, with more focus on performance in mobile networks. Below are the main questions
I will be focusing on throughout this study:
• Benchmarking QUIC under different network conditions.
I will be using network
emulators and tools including Linux’s Traffic Control [35], Network Emulation [36], and Dummynet [37] to understand QUIC’s performance across a wide range of bandwidth limitations,
delays, packet losses and reordering, and queues.
• Head-to-head comparison against TCP to shed light on when QUIC outperforms
TCP.
For this I will run back-to-back tests where a web page is downloaded using QUIC and then TCP, on different devices and under different network conditions, and investigate metrics including page load time (PLT) and CPU and memory usage (a minimal sketch of such a PLT measurement appears after this list). For every scenario,
multiple back-to-back experiments are run to account for transient network variations such as
network congestion. My initial results show that QUIC’s performance depends on network
conditions and, depending on the circumstances, it could perform better or worse compared to
HTTPS. Figure 7 shows results for head-to-head comparison of PLTs for SPDY over QUIC vs.
TCP for different bandwidth, loss rates, and object sizes. As the figure suggests, when loss
is low, QUIC performs worse than HTTPS, especially for larger objects and high bandwidth.
Upon investigating QUIC’s source code, I found that this is because QUIC has hard-coded caps on parameters such as congestion control, due to limitations such as Linux UDP buffer sizes. This results in QUIC not achieving more than 20Mbps and underperforming when available bandwidth is higher. However, as loss increases, QUIC seems to handle loss better and outperforms TCP.
• Understanding the effects of QUIC's implementation in user space. QUIC is implemented in user space, so resource contention could be a concern, especially on mobile devices. I will deploy QUIC on mobile devices to understand its performance in this environment. This includes using Cronet, the networking stack of Chromium packaged as a library for use on mobile [38], to fetch content over QUIC, as well as using Selenium to run automated tests with a real browser on a mobile device to evaluate the end-to-end performance observed by the end user.
• Instrumenting QUIC to study the effect of its various transport optimizations. QUIC integrates several optimizations: Proportional Rate Reduction (PRR) in its congestion control, Tail Loss Probe and packet pacing to deal with loss, and classic TCP algorithms such as Hybrid Slow Start and Forward RTO (F-RTO). However, it is not clear whether these optimizations work as expected, or whether they conflict with each other and cause harm. To investigate them, I will compile the QUIC client and server from source and instrument them to gain access to the inner workings of the protocol. For example, my initial results show that QUIC may suffer greatly when packet reordering occurs: reordered packets cause a high false-positive rate in QUIC's loss detection mechanism, which results in the connection getting stuck in the PRR state (a toy illustration of this effect appears after this list).
• Investigating whether middlebox solutions can help boost QUIC's end-to-end performance for end users. Many ISPs are against QUIC's full encryption because it renders their middleboxes useless. They argue that if QUIC made part of the control traffic visible, ISPs could take measures to improve performance for their customers. Google claims that if it is presented with evidence that ISPs can actually help QUIC, it will consider changing the protocol accordingly. However, it is not clear whether ISPs are correct, and if so, under what circumstances and exactly what information they would need to help QUIC. To answer these questions, I will implement a QUIC proxy and experiment with how a node in the middle with visibility into the protocol can help (a minimal relay sketch appears after this list).
• Crowd-sourcing QUIC measurements. In order to collect data on QUIC's performance, and how it compares to TCP, from many networks and locations, I will integrate some of the tests mentioned above into Mobilyzer [39], which provides a scalable, efficient, and controllable open platform for network measurements from mobile devices.
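To make the benchmarking concrete, the following is a minimal sketch, assuming a Linux test machine with root access and a placeholder interface name (eth0), of how one emulated network profile could be configured with tc and netem before a test run; the actual experiments sweep many combinations of these parameters.

import subprocess

IFACE = "eth0"  # assumed network interface on the test machine

def run(cmd):
    # helper that prints and executes a tc command
    print("+", cmd)
    subprocess.check_call(cmd.split())

def clear_profile():
    # remove any existing qdisc; ignore the error if none is installed
    subprocess.call(f"tc qdisc del dev {IFACE} root".split())

def set_profile(delay_ms=50, jitter_ms=10, loss_pct=1.0, rate_kbit=5000):
    clear_profile()
    # netem at the root adds delay, jitter, and random loss
    run(f"tc qdisc add dev {IFACE} root handle 1:0 netem "
        f"delay {delay_ms}ms {jitter_ms}ms loss {loss_pct}%")
    # a token bucket filter chained under netem caps the bandwidth
    run(f"tc qdisc add dev {IFACE} parent 1:1 handle 10: tbf "
        f"rate {rate_kbit}kbit buffer 32000 limit 30000")

if __name__ == "__main__":
    set_profile(delay_ms=100, loss_pct=2.0, rate_kbit=10000)
    # ... run one back-to-back QUIC/TCP page load here ...
    clear_profile()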
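The back-to-back PLT comparison can be scripted along the lines below. This is a simplified sketch that assumes a Selenium-driven desktop Chrome and a hypothetical test URL; it relies on Chrome's --enable-quic, --origin-to-force-quic-on, and --disable-quic flags and on the Navigation Timing API. A real experiment repeats each configuration many times and also records CPU and memory usage.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

URL = "https://www.example.com/"      # hypothetical test page
QUIC_HOST = "www.example.com:443"     # origin to force onto QUIC

def page_load_time(extra_flags):
    opts = Options()
    for flag in extra_flags:
        opts.add_argument(flag)
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(URL)
        # Navigation Timing: elapsed milliseconds from navigation start to the load event
        return driver.execute_script(
            "return performance.timing.loadEventEnd - "
            "performance.timing.navigationStart;")
    finally:
        driver.quit()

quic_plt = page_load_time(["--enable-quic",
                           f"--origin-to-force-quic-on={QUIC_HOST}"])
tcp_plt = page_load_time(["--disable-quic"])
print(f"PLT over QUIC: {quic_plt} ms, over TCP/HTTPS: {tcp_plt} ms")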
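To illustrate why reordering can inflate loss detection, here is a toy model (not QUIC's actual code) of threshold-based loss detection: a packet is declared lost once packets sent a few positions later have been acknowledged, analogous to TCP's three duplicate ACKs. Under heavy reordering, packets that were in fact delivered get declared lost, which is the effect I plan to quantify with the instrumented client.

REORDER_THRESHOLD = 3  # assumed threshold; purely illustrative

def spurious_losses(ack_order):
    """ack_order: packet numbers in the order their ACKs arrive;
    every packet in the list was actually delivered."""
    highest_acked = -1
    pending = set(ack_order)       # packets whose ACKs have not arrived yet
    declared_lost = set()
    for pkt in ack_order:
        pending.discard(pkt)
        highest_acked = max(highest_acked, pkt)
        for p in list(pending):
            # p is still unacknowledged although much later packets are
            if highest_acked >= p + REORDER_THRESHOLD:
                declared_lost.add(p)
                pending.discard(p)
    return declared_lost

# Packet 2 arrives (and is ACKed) after 3..6: it is falsely declared lost.
print(spurious_losses([0, 1, 3, 4, 5, 6, 2]))   # -> {2}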
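As a starting point for the proxy experiments, the sketch below is a minimal UDP relay that forwards QUIC datagrams between a client and an upstream server (the address quic-server.example is a placeholder). Because QUIC payloads are encrypted, such a node can only observe or act on whatever the protocol chooses to expose, which is exactly the question under study.

import select
import socket

LISTEN = ("0.0.0.0", 8443)              # address the client is pointed at
SERVER = ("quic-server.example", 443)   # placeholder upstream QUIC server

def relay():
    client_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client_side.bind(LISTEN)
    server_side = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client_addr = None
    while True:
        readable, _, _ = select.select([client_side, server_side], [], [])
        for sock in readable:
            data, addr = sock.recvfrom(65535)
            if sock is client_side:
                client_addr = addr                      # remember the client
                server_side.sendto(data, SERVER)        # forward to the server
            elif client_addr is not None:
                client_side.sendto(data, client_addr)   # forward back to client

if __name__ == "__main__":
    relay()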
Figure 7: SPDY over QUIC vs. TCP in the presence of loss.
5 Other work

5.1 Iolaus: Securing Online Content Rating Systems [1]
Many content providers, such as YouTube and Yelp, provide suggestions by leveraging the aggregated ratings of their users. However, the popularity and economic incentives of these sites are leading to increasing levels of malicious activity, including multiple-identity (Sybil) attacks and the "buying" of ratings from users. In this chapter I present Iolaus [1], a system that leverages the underlying social network of online content rating systems to defend against such attacks. Iolaus uses two novel techniques: (a) weighing ratings to defend against multiple-identity attacks and (b) relative ratings to mitigate the effect of "bought" ratings. An evaluation of Iolaus using microbenchmarks, synthetic data, and real-world content rating data demonstrates that Iolaus outperforms existing approaches and serves as a practical defense against multiple-identity and rating-buying attacks.
5.2 Designing and Testing a Market for Browsing Data
Browsing privacy solutions face an uphill battle to deployment. Many operate counter to the economic objectives of popular online services (e.g., by completely blocking ads) and do not provide enough incentive for users, who may suffer performance degradation, to deploy them. In this study, I take a step towards realizing a system for online privacy that is mutually beneficial to users and online advertisers: an information market. This system not only maintains economic viability for online services, but also provides users with financial compensation to encourage them to participate. I prototype and evaluate an information market that provides privacy and revenue to users while preserving, and sometimes improving, their Web performance. I evaluate the feasibility of the market via a one-month field study with 63 users and find that users are indeed willing to sell their browsing information. I also use Web traces of millions of users to drive a simulation study that evaluates the system at scale. I find that the system can indeed be profitable to both users and online advertisers.
6 Plan for completion of the research
Table 4 presents my plan for completion of the research.
Timeline                 Work                                                      Progress
WWW 2013 (published)     Iolaus: Securing Online Content Rating Systems           Completed
IMC 2015 (published)     Identifying Traffic Differentiation in Mobile Networks   Completed
PETS 2016 (submitted)    Designing and Testing a Market for Browsing Data         Completed
Spring 2016              T-Mobile BingeOn:                                        Ongoing
                           - Performance characterization (1 month)
                           - QoE measurements (2 weeks)
                           - Reverse engineering (1 month)
                           - Subverting BingeOn (2 weeks)
Summer and Fall 2016     QUIC:                                                    Ongoing
                           - Benchmarking and comparison against TCP (1 month)
                           - Instrumentation (2 months)
                           - QUIC proxy (2 months)
                           - Collecting crowd-sourced measurements (2 weeks)
Fall 2016                Thesis writing

Table 4: Plan for completion of my research
References
[1] A. M. Kakhki, C. Kliman-Silver, and A. Mislove, “Iolaus: Securing online content rating systems,” in Proceedings
of the Twenty-Second International World Wide Web Conference (WWW’13), Rio de Janeiro, Brazil, May 2013.
[2] “NGMN 5G white paper,” March 2015. [Online]. Available:
https://www.ngmn.org/fileadmin/ngmn/content/downloads/Technical/2015/NGMN_5G_White_Paper_V1_0.pdf
[3] “Cisco visual networking index: Global mobile data traffic forecast update, 2015–2020 white paper.”
[4] “Sandvine - intelligent broadband networks,” http://www.sandvine.com.
[5] S. Higginbotham, “The Netflix–Comcast agreement isn’t a network neutrality violation, but it is a problem,”
http://gigaom.com/2014/02/23/
the-netflix-comcast-agreement-isnt-a-network-neutrality-violation-but-it-is-a-problem/, February 2014.
[6] Measurement Lab Consortium, “ISP interconnection and its impact on consumer Internet performance,”
http://www.measurementlab.net/blog/2014_interconnection_report, October 2014.
[7] M. Luckie, A. Dhamdhere, D. Clark, B. Huffaker, and k. claffy, “Challenges in inferring internet interdomain
congestion,” in Proc. of IMC, 2014.
[8] FCC, “Protecting and promoting the open internet,”
https://www.federalregister.gov/articles/2015/04/13/2015-07841/protecting-and-promoting-the-open-internet,
April 2015.
[9] M. Dischinger, M. Marcon, S. Guha, K. P. Gummadi, R. Mahajan, and S. Saroiu, “Glasnost: Enabling end users
to detect traffic differentiation,” in Proc. of USENIX NSDI, 2010.
[10] Y. Zhang, Z. M. Mao, and M. Zhang, “Detecting Traffic Differentiation in Backbone ISPs with NetPolice,” in
Proc. of IMC, 2009.
[11] M. B. Tariq, M. Motiwala, N. Feamster, and M. Ammar, “Detecting network neutrality violations with causal
inference,” in CoNEXT, 2009.
[12] Z. Zhang, O. Mara, and K. Argyraki, “Network neutrality inference,” in Proc. of ACM SIGCOMM, 2014.
[13] F. Sarkar, “Prevention of bandwidth abuse of a communications system,” US Patent App.
14/025,213, Jan. 9, 2014. [Online]. Available: https://www.google.com/patents/US20140010082
[14] D. Rayburn, “Cogent now admits they slowed down netflix’s traffic, creating a fast lane & slow lane,” http://
blog.streamingmedia.com/2014/11/cogent-now-admits-slowed-netflixs-traffic-creating-fast-lane-slow-lane.html,
November 2014.
[15] J. Hui, K. Lau, A. Jain, A. Terzis, and J. Smith, “How YouTube performance is improved in T-Mobile network,”
http://velocityconf.com/velocity2014/public/schedule/detail/35350.
[16] D. Clark, “Network neutrality: Words of power and 800-pound gorillas,” International Journal of
Communication, 2007.
[17] B. Obama, “Net neutrality: President Obama’s plan for a free and open internet,”
http://www.whitehouse.gov/net-neutrality#section-read-the-presidents-statement.
[18] A. Rao, J. Sherry, A. Legout, W. Dabbout, A. Krishnamurthy, and D. Choffnes, “Meddle: Middleboxes for
increased transparency and control of mobile traffic,” in Proc. of CoNEXT 2012 Student Workshop, 2012.
[19] W. Cui, V. Paxson, N. Weaver, and R. H. Katz, “Protocol-independent adaptive replay of application dialog,” in
Proc. of NDSS, 2006. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=153197
[20] A. Norberg, “uTorrent transport protocol,” http://www.bittorrent.org/beps/bep_0029.html.
[21] “Experimenting with QUIC,” http://blog.chromium.org/2013/06/experimenting-with-quic.html.
[22] G. Hasslinger and O. Hohlfeld, “The Gilbert-Elliott model for packet loss in real time services on the Internet,”
2008.
[23] X. Xu, Y. Jiang, T. Flach, E. Katz-Bassett, D. Choffnes, and R. Govindan, “Investigating transparent web
proxies in cellular networks,” in Proc. PAM, 2015.
[24] J. Mayer, “How verizon’s advertising header works,”
http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/.
[25] C. Reis, S. D. Gribble, T. Kohno, and N. C. Weaver, “Detecting In-Flight Page Changes with Web Tripwires,”
in Proc. of USENIX NSDI, 2008.
[26] N. Weaver, C. Kreibich, M. Dam, and V. Paxson, “Here Be Web Proxies,” in Proc. PAM, 2014.
[27] “http://www.t-mobile.com/offer/binge-on-streaming-video.html.”
[28] “EFF Confirms: T-Mobile’s Binge On Optimization is Just Throttling, Applies Indiscriminately to All Video,”
https://www.eff.org/deeplinks/2016/01/eff-confirms-t-mobiles-bingeon-optimization-just-throttling-applies.
[29] “QUIC at 10,000 feet,”
https://docs.google.com/document/d/1gY9-YNDNAB1eip-RTPbqphgySwSNSDHLq9D5Bty4FSU.
[30] “QUIC: A UDP-Based Secure and Reliable Transport for HTTP/2,”
https://tools.ietf.org/html/draft-tsvwg-quic-protocol-02.
[31] “QUIC Loss Recovery And Congestion Control,” https://tools.ietf.org/html/draft-tsvwg-quic-loss-recovery-01.
[32] http://blog.chromium.org/2015/04/a-quic-update-on-googles-experimental.html.
[33] “IETF 93 QUIC BarBoF: Protocol overview,” https://docs.google.com/presentation/d/
15e1bLKYeN56GL1oTJSF9OZiUsI-rcxisLo9dEyDkWQs/edit#slide=id.g99041b54d_0_525.
[34] “QUIC: Design document and specification rationale,”
https://docs.google.com/document/d/1RNHkx_VvKWyWg6Lr8SZ-saqsQx7rFV-ev2jRFUoVD34/edit.
[35] “Linux traffic control,” http://linux.die.net/man/8/tc.
[36] “Linux network emulation,” http://www.linuxfoundation.org/collaborate/workgroups/networking/netem.
[37] “Dummynet,” http://info.iet.unipi.it/~luigi/dummynet/.
[38] “Cronet,” https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet/.
[39] A. Nikravesh, H. Yao, S. Xu, D. R. Choffnes, and Z. M. Mao, “Mobilyzer: An open platform for controllable
mobile network measurements,” in Proc. of MobiSys, 2015.