
Non Cooperative Path Characterization using Packet
Spacing Techniques
Supriyo Chakraborty, Bikramjit Walia and D. Manjunath
Department of Electrical Engineering, IIT-Bombay, Mumbai, INDIA
supriyo, bikram, [email protected]
Abstract- Non cooperative network path measurements assume that there is no cooperation from the other end of a path. Such methods have to cleverly exploit standard protocol options to initiate probing traffic. Performing measurements in this framework for the ‘reverse path’ is particularly challenging. We describe the design of iPathmeter2, a non cooperative tool to measure capacity and available bandwidth on the reverse path. The probing traffic of iPathmeter2 consists of a chirp of packet-pairs with appropriate spacing. The spacing between the ACKs from the measuring host and the size of the advertised receive window help shape the transmitting pattern from the remote host to the measuring node. Thus, iPathmeter2 adapts the existing cooperative packet spacing techniques for reverse path, non-cooperative measurements. A utilization based estimator for the available bandwidth is also described. iPathmeter2 is validated by performing measurements under controlled conditions. Results from live tests on the Internet are also reported.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on December 3, 2008 at 03:39 from IEEE Xplore. Restrictions apply.
I. INTRODUCTION
End-to-end bandwidth estimation on a network path is useful in a network measurement and monitoring tool. Example applications of path bandwidth estimates include peer-to-peer applications, service level monitors and dynamic server selection [1]. Two bandwidth related metrics are usually defined for a network path: bottleneck capacity and available bandwidth. In this paper we describe a non-cooperative technique to obtain these performance measures for a path from only one end of the path. An obvious application of such a single ended measurement method would be in monitoring Internet service quality by an end user, e.g., bandwidth received by the node from a popular web server, where the user typically does not have access to the far-end of any Internet path. Further, we remark that for most users the service quality on a few paths in the Internet will dominate the Internet service quality that it sees. Hence by measuring the path characteristics on these paths, the user can quantify its Internet service quality.
An important requirement of a path bandwidth estimation technique is that it be able to measure forward and reverse path characteristics from the measuring node. Most techniques available in the literature can be easily extended to obtain the non-cooperative forward path characteristics while reverse path measurements require clever techniques. We describe one such technique in this paper. Since we adapt existing techniques for reverse path, non-cooperative measurement, a brief survey of the cooperative measurement techniques is given in the next section. We discuss generic issues in the design of non cooperative estimators in Section III and the design issues that need to be addressed when developing such a tool for the public Internet in Section IV. These design issues are used in iPathmeter2 and Section V describes its basic design. Some experimental results are presented in Section VI.
II. COOPERATIVE BANDWIDTH ESTIMATION
Many estimators are available to estimate the bandwidth metrics [1]-[11]. Almost all of these estimators work in a cooperative framework which requires access to both ends of the path being measured. These estimators work as follows: sender transmits probing packets according to a specified pattern and the receiver node timestamps these packets and obtains the deviation from the pattern. These deviations are used in the bandwidth estimation. There are four basic schemes of packet transmission patterns that are used by the above estimators: (1) packet pair dispersion, (2) variable packet size probing, (3) self induced congestion and (4) train of packet pairs.
In packet pair dispersion (PPD) based bandwidth estimators, two packets (a packet-pair) are transmitted back-to-back to cause them to queue together at the bottleneck link. If the links were rate-based servers then, when the packets arrive at the destination, the dispersion will be the same as the dispersion when they exit the bottleneck link. Thus if L is the length of the probing packet and the dispersion observed is D, the path capacity C can be estimated by $C = L/D$. A modification of the basic packet pair technique is packet train probing which sends multiple back to back packets. The dispersion of the packet train is shown to be asymptotically equal to the available capacity even in the presence of cross traffic. Pathrate [4], IGI [8], Cprobe [1] and Spruce [11] use this technique.
In variable packet size (VPS) probing, the capacity of each hop along a path is measured by exploiting the fact that the number of links that a packet traverses on a path can be limited by the TTL field in the IP packet header. On reception of a packet with an expired TTL, a router responds with an ICMP error message back to the sender node. By varying the TTL field within a packet, the minimum RTT (Round Trip Time) for each hop along the path is obtained as a function of the packet size. The capacity of each link on the path is obtained from the estimate of the link capacity on the preceding link and a plot of the minimum RTT (for packets returned from the receiver side of the link) against the packet size. Pathchar [2], Clink [3] and Pchar [10] use this technique.
In the third technique to estimate the bottleneck capacity and available bandwidth, a binary search kind of approach is used. The goal here is to build up a queue of the probing packets at the bottleneck link, thereby causing a ‘self induced congestion’ (SIC) at the link, and infer its bandwidth from the ‘queueing signature’. By varying the packet rate in the packet-train at the source, the available bandwidth can be estimated as the rate for which the queue length begins to increase. The interpacket spacing corresponds to the transmission rate. Pathload [9] and Pathchirp [6] are examples of this approach.
A fourth approach uses a train of packet-pairs (TOPP) [5] in which the spacing between the packets in the train is progressively decreased. When the spacing becomes less than the service time of the bottleneck link, the second probing packet is queued at the bottleneck link and the spacing between the packets at the output of the link starts to increase. Thus the packet spacings in the received train can be used as an estimator of the available bandwidth.
Observe from the above discussion that, except for Pathchar and Clink, the other bandwidth estimators are designed for a cooperative framework. Although Pathchar and Clink do not assume a cooperative framework, they depend on the prompt generation of ICMP messages, which, in the days of DoS attacks, is not a reasonable assumption. Further, Pathchar and Clink can only measure forward path characteristics.
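The packet-pair relation surveyed in this section, capacity equals packet length divided by observed dispersion, can be illustrated with a minimal sketch. The function name and the sample numbers (a 1500-byte packet pair arriving 6 ms apart) are assumptions made for this example and do not come from any of the cited tools.

```python
def capacity_from_dispersion(packet_bits: float, dispersion_s: float) -> float:
    """Return the packet-pair capacity estimate C = L / D in bits per second."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_bits / dispersion_s


# Hypothetical sample: two 1500-byte packets observed 6 ms apart at the receiver.
L_BITS = 1500 * 8
estimate = capacity_from_dispersion(L_BITS, 0.006)
print(f"{estimate / 1e6:.2f} Mbps")
```

In practice a single pair is noisy because cross traffic can stretch or compress the dispersion, which is why the tools above rely on many samples.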
Fig. 1. S1, S2 and S3 are the exponentially decreasing spacings between the ACKs sent from the measuring node. When this spacing is to be S4, there is no packet to ACK at the local host. The ACK that should have been sent according to the algorithm but could not be sent is shown as a dotted line. Thus, long chirps cannot be initiated from the far-end of the path.
III. NON COOPERATIVE BANDWIDTH ESTIMATION
A non-cooperative tool has to cleverly exploit standard protocol options to initiate probing traffic in the network according to a pattern specified by the measurement algorithm, e.g., like that used by Pathload or Pathchirp. Note though that this pattern has to be initiated in the direction in which the measurements are to be carried out, i.e., for reverse path characterization, the required pattern should be initiated at the far end. We exploit the features of TCP and the fact that at the other end of a path, there will likely be open public TCP ports, e.g., HTTP and FTP, in the design of a non-cooperative, reverse path bandwidth estimator. The TCP feature that we exploit is as follows. Recall that the TCP sending window is affected by the acknowledgments (ACKs) from the receiver and by the advertised ‘receiver window’ (rwin). The rate at which ACKs are sent by the receiver (the spacing between them) and the size of rwin are used to shape the incoming traffic as per the requirement of the measuring algorithm.
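To make the shaping mechanism concrete, the sketch below models a window-limited TCP sender in the standard way: it may have at most min(cwnd, rwin) unacknowledged bytes outstanding, so each ACK and each advertised rwin determines how much new data is released. The function name and the 1500-byte segment size are assumptions made for this illustration.

```python
def sendable_bytes(cwnd: int, rwin: int, in_flight: int) -> int:
    """Bytes a window-limited TCP sender may still transmit."""
    return max(0, min(cwnd, rwin) - in_flight)


MSS = 1500  # assumed segment size in bytes

# Everything has been ACKed and one segment is advertised: one packet is released.
one_segment = sendable_bytes(cwnd=4 * MSS, rwin=MSS, in_flight=0)

# Doubling the advertised window releases two back-to-back segments.
two_segments = sendable_bytes(cwnd=4 * MSS, rwin=2 * MSS, in_flight=0)

# With the window full of unacknowledged data the sender stays quiet
# until the measuring host chooses to send the next ACK.
blocked = sendable_bytes(cwnd=4 * MSS, rwin=2 * MSS, in_flight=2 * MSS)

print(one_segment, two_segments, blocked)
```

The measuring host therefore controls both when the remote host transmits (by timing its ACKs) and how much it transmits (through rwin), as long as rwin stays below the sender's cwnd.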
Fig. 2. On the left side is the buffering phase. On the right is the probing phase. The ACKs sent in the probing phase correspond to the packets captured during the buffering phase. Here S1, S2, S3 and S4 are the exponentially distributed spacing between the packets sent from the local host to the remote host. As the spacing at the local host decreases (the rate of probing increases), the packets from the remote host would start getting queued.
0-7803-8924-7/05/$20.00 ©2005 IEEE.
IV. DESIGN ISSUES
It is possible to emulate any of the cooperative algorithms in the non cooperative framework. For illustration, consider the possible use of packet chirps as in Pathchirp where the receiver could send exponentially spaced ACKs with rwin equal to MTU sized packets. Then, one would expect that the remote host would send data packets with interpacket spacing similar to the ACK spacing and hence emulate a chirp from the far-end. In our experiments, we could initiate a chirp (exponentially spaced packets) from the far-end on the local network but we could not achieve it over the WAN. The problem here is that because of the WAN path delays, the receiver may not be able to send a sufficient number of ACKs with the required spacing. This is because a sufficient number of packets are not yet received to allow the sending of ACKs at a high rate. See Fig. 1 for an illustration. To overcome this, the following two phase approach was attempted.
1) Buffering Phase: We first ‘buffer’ a sufficient number of packets by not acknowledging them. These will be ACKed in the probing phase as per the requirements of the measuring algorithm. After establishing the TCP connection with the remote host, for the first few ACKs, we gradually increase rwin. As rwin is doubled, the remote host responds with two back to back packets. The last of the two packets is acknowledged with rwin equal to four times the original rwin. The remote host now responds with four back to back packets. These four packets are ‘buffered’, to be ACKed in the probing phase (as explained below). It is important to ensure that the number of packets so buffered does not cause retransmissions by the remote host as this would reduce cwnd. See Fig. 2 for illustration of these events.
2) Probing Phase: The measuring node transmits ACKs for the buffered packets with spacings and rwin values that would cause the remote host to transmit new packets resembling a packet chirp. This is illustrated in Fig. 2.
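The two phases above can be sketched as a small, idealized model: each doubling of rwin releases a burst of that many segments, the last burst is left un-ACKed, and the probing phase can only send as many spaced ACKs as there are buffered packets. The function names and the 1500-byte segment size are assumptions for this illustration; the model ignores cwnd limits and the host-specific behavior discussed in the rest of this section.

```python
from typing import List, Tuple

MSS = 1500  # assumed MTU-sized segment, in bytes


def buffering_schedule(doublings: int, mss: int = MSS) -> List[Tuple[int, int]]:
    """Advertised rwin and idealized burst size for each buffering-phase ACK."""
    schedule = []
    for step in range(doublings + 1):
        rwin = mss * 2 ** step
        schedule.append((rwin, rwin // mss))  # (rwin in bytes, packets released)
    return schedule


def can_sustain_chirp(buffered_packets: int, acks_needed: int) -> bool:
    """A chirp of ACKs needs one buffered, un-ACKed packet per ACK."""
    return buffered_packets >= acks_needed


schedule = buffering_schedule(doublings=2)
buffered = schedule[-1][1]  # the final burst is held back for the probing phase
print(schedule)             # [(1500, 1), (3000, 2), (6000, 4)]
print(can_sustain_chirp(buffered, acks_needed=4))  # True
```

Without the buffering phase only one packet would be available to acknowledge, which is the situation shown in Fig. 1.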
While experimenting with the above two phase approach, we found that our probing phase did not perform as expected. This was because even when the rwin and (our estimate of) cwnd were such as to allow these transmissions, many hosts would not transmit new packets till all their previous transmissions were acknowledged. This, we believe, is due to the use of Nagle’s algorithm [12] on the remote host.
We also encountered the following unexpected behavior with respect to a sender’s response to rwin. Many hosts would send two packets of size equal to half of rwin for each ACK sent. This behavior was seen for all values of rwin. However if the ACK spacing were reduced, we received packets equal to rwin, or equal to the path MTU when rwin was greater than the path MTU. The reason for this is not clear.
The above experience leads us to the following outline of the final design. The packets from the remote host should be shaped so that they resemble a “chirp of packet pairs” (COPP), where the spacing between the packets in each packet-pair is successively reduced. This chirp pattern that should be initiated by the remote host has similarities to both TOPP and Pathchirp. We reiterate that ours is a non-cooperative tool and the challenge is to be able to initiate a packet pair from the remote host at any time within the experiment by exploiting the current TCP implementation. We have successfully achieved this in iPathmeter2.
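A chirp of packet pairs can be pictured as a sequence of pairs whose intra-pair gap shrinks geometrically, so that each pair probes at a higher rate than the one before. The sketch below is only an illustration: the spread factor `gamma`, the initial gap, the helper names and the 0.8 Mbps available bandwidth are assumed values and are not parameters reported for iPathmeter2.

```python
from typing import List


def copp_probe_rates(packet_bits: int, first_gap_s: float, gamma: float, pairs: int) -> List[float]:
    """Probing rate L / s_k (bits/s) of each pair, where s_k = s_1 / gamma**(k - 1)."""
    return [packet_bits / (first_gap_s / gamma ** k) for k in range(pairs)]


def first_pair_above(rates: List[float], available_bps: float) -> int:
    """Index of the first pair probing faster than the available bandwidth, or -1."""
    for k, rate in enumerate(rates):
        if rate > available_bps:
            return k
    return -1


rates = copp_probe_rates(packet_bits=12000, first_gap_s=0.048, gamma=2.0, pairs=4)
print([round(r / 1e6, 2) for r in rates])            # rates in Mbps
print(first_pair_above(rates, available_bps=0.8e6))  # first pair expected to queue
```

Pairs from that index onward are the ones whose second packet would be expected to queue behind the first at the bottleneck link.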
V. IPATHMETER2: DESIGN AND IMPLEMENTATION
We make the following reasonable assumptions in the design of iPathmeter2.
1) An HTTP daemon is running on a remote host at the far-end of the path.
2) A file of size that will sustain the burst of probe packets is available for download via HTTP at the remote host.
3) Like in Clink and Pathchar we also assume that the ACKs do not experience any congestion.
The implementation details are as follows:
1) Initialize by setting up firewall rules in the INPUT chain of Iptables to block incoming probe packets from reaching the kernel TCP stack directly. This will prevent the kernel TCP from sending ACKs for these probe packets. Note that the probe packets come as HTTP packets from the remote host with destination port the same as that used by iPathmeter2.
2) We then bind to a randomly chosen port on the local host, establish a TCP session with the remote host using the 3-way handshake and then initiate an HTTP connection with it.
3) The two-phase measurement described in the previous section is then initiated.
The code for iPathmeter2 uses two independent processes: one to send appropriately spaced ACKs and the other to receive the packets from the remote host. Clearly, neither of these should be blocked because the ACK spacings have to be correct and the received packets should be timestamped accurately. Thus we need to poll the NIC for received packets in a non-blocking manner. In iPathmeter2, this is achieved using a system independent library called Libpcap that provides a portable framework for low-level network monitoring. We use it to poll the NIC regularly to capture the probe packets that will be sent by the remote host. This is explained in detail below.
The Buffering Phase: Recall that in this phase we accumulate packets by not acknowledging them. This is achieved as follows. Initially, send an ACK with rwin = 1500 bytes. The remote host responds with two back-to-back packets of 750 bytes each. Do a cumulative acknowledgment by sending an ACK for the second packet, but changing rwin to four times its original value to account for the fact that it can now accept four MTU sized packets. The remote host responds with a series of four back-to-back packets. We however observed deviation in the number of back-to-back packets received from different sites. This may be attributed to various burst mitigation techniques adopted in the TCP implementations [13]. On some of the Linux implementations a variable MAXBURST is used to achieve the same, and it is normally set to a value of 3. We must also mention here that this does not affect our experiment, because we require only about three packets in our buffer to be able to continue with the probing phase. The buffering phase is illustrated in Fig. 3.
The Probing Phase: In this phase we do cumulative ACKing for the packets. In the Buffering Phase, the sender's cwnd increases to 3*MSS or more. In this phase we ensure that the cwnd does not decrease in size. Since the amount of data received is always the minimum of cwnd and rwin we use rwin to control the data being received. At the start of this phase the last packet of the packet train accumulated in the previous phase is acknowledged and rwin = 1500 is advertised. This opens up the cwnd at the sender side and it sends us a packet of size rwin = 1500.
We use duplicate ACKs (DUPACKs) to generate an ACK-pair which in turn will generate a corresponding packet-pair from the remote host. Here we exploit the property that TCP enters into a Fast Retransmit state only on the reception of 3 DUPACKs [14]. By using only one DUPACK, we ensure that the remote host does not enter into a Fast Retransmit state. This eliminates the possibility of a decrease in cwnd. The DUPACK sent contains rwin increased to twice its current value causing the remote host to send a new packet. The two packets classify as a ‘packet pair’. From our assumptions, these packet pairs are injected into the network at the same rate at which their respective ACKs are sent. The above two data packets (forming the packet pair) are received by the measuring host and timestamped. The last of these two packets
is used for generating the next ACK pair, and correspondingly the next packet pair from the remote host. The time difference between the original ACK and the duplicate ACK in the new iteration is decreased and thus the rate of probing is increased. This is illustrated in Fig. 3.
Fig. 3. iPathmeter2: The buffering phase is shown on the left and the probing phase on the right. Si are the spacings between the ACK pairs at the measuring node and Ri are the spacings between the corresponding packet-pairs as received at the measuring node. S3 is less than the service time on the bottleneck link of the path; hence R3 > S3.
As we decrease the spacing between the packet-pairs, it can happen that this spacing corresponds to a rate of probing greater than the available bandwidth on the path. In this case, the second packet in the packet-pair gets queued up behind the first packet on the bottleneck link and the spacing between them is increased at the output. Thus the spacing between this packet pair as observed at the local host will be more than what it was between the corresponding ACK pair.
The experiment is repeated a number of times. After collecting a sufficient number of samples, the TCP session with the remote host is closed. The data collected is fed to the inference engine described below.
Estimator: As can be seen from above, there is no correlation between the packet pairs corresponding to the different ACK pairs and packets from different packet pairs do not queue up at the bottleneck link together. Hence we cannot use the inference engine of SIC based bandwidth estimators.
The inference engine of iPathmeter2 estimates both the capacity and the available bandwidth of the path. We consider capacity first. Assume that the remote host initiates $N$ chirps of $J$ packet-pairs and that each packet has $L$ bits. At the measuring node, iPathmeter2 obtains the packet-pair spacings for all the packet-pairs in the chirp. Recall that each packet-pair in a chirp corresponds to a different probing rate. Let $d_{j,n}$ denote the spacing at the measuring host for the $j$-th packet-pair in the $n$-th chirp. Define $D_j = \min_{n=1,2,3,\ldots,N} d_{j,n}$, i.e., the minimum spacing at the local-end of the reverse path for the $j$-th packet pair in a chirp. It is easy to see that if for packet-pair $j$ the spacing is less than the transmission time on the bottleneck link, $D_j$ will approach the packet transmission time on the bottleneck link, i.e., $D_j$ approaches a constant for increasing $j$ (the packet-pair spacing is decreasing with increasing $j$). Denote this constant by $D$. The bottleneck capacity is then estimated as $C = L/D$.
The available bandwidth is estimated by considering the fraction of packets that are queued at the bottleneck link. For this we use the estimate of the capacity from above. The local host sends ACK pairs to the remote host with spacing of $2L/C$ (corresponding to a probing rate of $C/2$), which in turn will generate the packet pairs from the remote host with the above spacing. In each pair, if the second packet is queued at the bottleneck link, the packet spacing will be more than the $2L/C$ that we started with. Thus we can estimate the number of packets that are queued. The utilization of the bottleneck link, $U$, is estimated as
$$U = \frac{N_{\text{Packets queued}}}{N_{\text{Packets queued}} + N_{\text{Packets not queued}}}$$
We will call this the utilization based estimate (UBE) of the available bandwidth. An estimator similar to that of Spruce [11] is also used in iPathmeter2. Here the ACKs from the measuring node are spaced at $L/C$, corresponding to a probing rate of $C$, and the spacings between the received packets are used in Eqn. 2 of [11]. This estimator is denoted by ESS.
VI. EXPERIMENTAL RESULTS
We first show the results from experiments under controlled conditions to validate iPathmeter2. The testbed setup was as shown in Fig. 4. A cross traffic of UDP packets generated as a Poisson process of a specified rate is introduced on the path being measured. A sample trace from this experiment is shown in Fig. 5 with timestamps obtained on the respective machines using tcpdump.
Fig. 4. Testbed setup. All the WAN links are of 2 Mbps.
All the links are of capacity of 2 Mbps. The utilization of all the links is low, i.e., there is very little cross traffic on this network path. Table I shows the estimates of the capacity and of the available bandwidth (using both the UBE and ESS). The available bandwidth estimates from Pathchirp are also shown. Pathchirp estimates are provided by taking an average of all the per chirp estimates from Pathchirp [6].
We now report some results from measurements on the Internet. iPathmeter2 was used to estimate bandwidth which the IIT-Bombay network receives from web servers. Several web servers were probed and estimates for capacity and available bandwidth are as provided in the Table II. A comparison is
[Fig. 5: timing diagram between LOCAL HOST (202.141.154.131) and REMOTE HOST (202.41.97.50)]
TABLE I
BANDWIDTH ESTIMATES UNDER CONTROLLED CONDITIONS
0.50 1.72 1.58 1.25 1.51 1.26 1.46 1.32
TABLE II
Web Server        Capacity (Mbps)    Available bandwidth (Mbps)
Yahoo             1.98               1.53
Rediff            1.98               1.18
Nokia             2.015              1.23
Shockwave.com     1.99               0.62
HPSR.com          2.01               1.03
Stanford.edu      1.96               1.16
Infosys.com       0.04               0.016
Finally, we remark that iPathmeter2 is a network friendly tool, and does not significantly alter the network load.
ACKNOWLEDGMENTS
The authors thank K. Anil Kumar for many perceptive remarks and pointers to literature.
REFERENCES
[1] R. Carter and M. Crovella, “Server selection using dynamic path characterization in wide-area networks,” in Proc. of IEEE INFOCOM, Kobe, Japan, Apr. 1997, pp. 1014-1021.
[2] V. Jacobson. (1997, Apr.) Pathchar: A tool to infer characteristics of internet paths. [Online]. Available: ftp://ftp.ee.lbl.gov/pathchar/
[3] A. B. Downey, “Using pathchar to estimate internet link characteristics,” in Proc. of ACM SIGCOMM, Sept. 1999, pp. 222-223.
[4] C. Dovrolis, P. Ramanathan, and D. Moore, “What do packet dispersion techniques measure?” in Proc. of IEEE INFOCOM, Apr. 2001, pp. 905-914.
[5] B. Melander, M. Bjorkman, and P. Gunningberg, “A new end-to-end probing and analysis method for estimating bandwidth bottlenecks,” in Proc. of IEEE GLOBECOM, San Francisco, CA, USA, Nov. 2000.
[6] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell, “pathchirp: Efficient available bandwidth estimation for network paths,” in Proc. of Passive and Active Measurements (PAM) Workshop, Apr. 2003.
[7] (2m, Dec.) Caida. [Online]. Available: http://www.caida.org/tools/taxonomy
[8] N. Hu and P. Steenkiste, “Evaluation and characterization of available bandwidth probing techniques,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 6, pp. 879-894, Aug. 2003.
[9] M. Jain and C. Dovrolis, “End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput,” in Proc. of ACM SIGCOMM, Aug. 2002, pp. 295-308.
[10] B. A. Mah. (1999, Feb.) pchar: a tool for measuring internet path characteristics. [Online]. Available: http://www.employees.org/~bmah/Software/pchar/
[11] J. Strauss, D. Katabi, and F. Kaashoek, “A measurement study of available bandwidth estimation tools,” in Proc. of ACM SIGCOMM, 2003, pp. 39-44.
[12] J. Nagle, “Congestion control in IP/TCP internetworks,” RFC 896, Jan. 1984.
[13] S. Floyd, “Highspeed TCP for large congestion windows,” RFC 3649, Dec. 2003.
[14] W. Stevens, “TCP slow start, congestion avoidance, fast retransmit, and fast recovery algorithms,” RFC 2001, Jan. 1997.
Fig. 5. Timing diagram of data flow between the local host and remote host. The two clocks are not synchronized. At the local host, times are referenced to the transmission of the first SYN packet while at the remote the times are referenced to the reception of this SYN packet. The times shown are in minutes.
not possible as Pathchirp and other tools discussed above all
work in cooperative setups.
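The capacity and utilization based estimators described for the inference engine can be sketched as follows. This is an illustration under stated assumptions, not the iPathmeter2 code: the function names, the 5% tolerance used to decide that a pair was queued, the use of the smallest D_j as the constant D, the conversion of U into an available bandwidth as C(1 - U), and all sample spacings are hypothetical.

```python
from typing import List, Sequence


def min_spacings(d: Sequence[Sequence[float]]) -> List[float]:
    """D_j = min over chirps n of d[n][j], for each packet-pair index j."""
    return [min(chirp[j] for chirp in d) for j in range(len(d[0]))]


def capacity_estimate(packet_bits: float, d: Sequence[Sequence[float]]) -> float:
    """C = L / D, taking D as the smallest D_j (assumed to be the plateau value)."""
    return packet_bits / min(min_spacings(d))


def utilization(sent_spacing: float, received: Sequence[float], tol: float = 0.05) -> float:
    """U = queued / (queued + not queued).

    A pair is counted as queued when its received spacing exceeds the sent
    spacing by more than the fraction `tol` (an assumed threshold).
    """
    queued = sum(1 for r in received if r > sent_spacing * (1.0 + tol))
    return queued / len(received)


L_BITS = 1500 * 8
# Hypothetical spacings in seconds: 2 chirps of 4 packet-pairs each.
d = [[0.0120, 0.0080, 0.0061, 0.0060],
     [0.0121, 0.0079, 0.0060, 0.0062]]
C = capacity_estimate(L_BITS, d)

sent = 2 * L_BITS / C  # ACK-pair spacing 2L/C, i.e. probing at C/2
received = [0.0120, 0.0150, 0.0121, 0.0119, 0.0160]  # hypothetical samples
U = utilization(sent, received)
available = C * (1.0 - U)  # assumption: available bandwidth taken as C(1 - U)
print(f"C = {C / 1e6:.2f} Mbps, U = {U:.2f}, A = {available / 1e6:.2f} Mbps")
```

Taking the minimum over chirps filters out samples inflated by cross traffic, which is the role D_j plays in the text; a production estimator would also need to detect the plateau in D_j rather than simply take the smallest value.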
VII. DISCUSSION AND FUTURE WORK
There were some observations of unexpected behavior from our testing of iPathmeter2 on the Internet. Not all hosts in the Internet behaved as expected with respect to ACK spacing and varying rwin. When rwin = 1500 is advertised, some of the servers respond with MTU sized packets while others send two back-to-back packets of size 750 bytes each. iPathmeter2 responds to such behavior adaptively. More surprisingly, a few of the web servers when probed respond with different size packets for each ACK pair. The methods that we develop must work in the real world with a variety of possible non standard TCP implementations. iPathmeter2 has been designed with the same objective. With extensive testing we expect to be able to discover more unexpected behaviors and adapt to them.
1.98