Internet Stream Size Distributions

Nevil Brownlee
CAIDA, SDSC, UC San Diego and
The University of Auckland, New Zealand
[email protected]

kc claffy
CAIDA, SDSC, UC San Diego
[email protected]
ABSTRACT
We present and discuss stream size and lifetime distributions for
web and non-web TCP traffic on a campus OC12 link at UC San
Diego. The distributions are stable over long periods, and show that
on this link only 3% of the streams last longer than one minute, and
that only about 0.5% of them are bigger than 100 kBytes. Although
there are large streams (elephants) on this link, the bulk of its traffic
is composed of many small streams (mice).
1. INTRODUCTION
Many studies of network stream size distributions have been published. Downey [1] presented a model that explains the distribution
of file sizes found both in computer systems and in the World Wide
Web. Zhang and Qiu [2] observe that “Internet traffic is now dominated by mice, i.e. small objects 10-20 kB in size; the average web
document is only around 30 kB,” but in contrast report that “the
majority of the packets and bytes belong to elephants.”
In this paper we present more detailed measurements of stream size
and lifetime distributions, allowing us to comment on the stability
of the distributions from minute to minute over periods of an hour
or more.
2. MEASURING DISTRIBUTIONS
We observe streams in real time using the methodology described
by Brownlee & Murray [3]. We use a NeTraMet meter, modified
to observe streams within a flow, to collect data on stream size and
lifetime distributions. For this study we use a ruleset (meter configuration file) that produces separate flows for various kinds of traffic, in particular web, non-web TCP and UDP. Each stream within
a flow is monitored by the NeTraMet meter. When a stream times
out, the meter knows its size in packets, size in bytes, and (active)
lifetime in microseconds. Each stream’s data is used to add a point
to its flow’s size and lifetime distributions.
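The per-stream bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not NeTraMet code: the class name LogHistogram, the hook on_stream_timeout, and the choice of four decade-wide bins are assumptions made for the example. The 40-byte and 400-kilobyte limits are the ones the paper uses for its stream size plots.

```python
import math


class LogHistogram:
    """Counts values in logarithmically spaced bins, plus one overflow bin."""

    def __init__(self, lo, hi, n_bins):
        self.lo = lo            # lower edge of the first bin
        self.hi = hi            # upper edge of the last regular bin
        self.n_bins = n_bins    # number of regular bins (assumed, not from the paper)
        # The extra slot at the end collects values >= hi.
        self.counts = [0] * (n_bins + 1)

    def bin_index(self, value):
        if value <= self.lo:
            return 0
        if value >= self.hi:
            return self.n_bins
        frac = math.log(value / self.lo) / math.log(self.hi / self.lo)
        return min(int(frac * self.n_bins), self.n_bins - 1)

    def add(self, value):
        self.counts[self.bin_index(value)] += 1

    def percentages(self):
        total = sum(self.counts)
        if total == 0:
            return [0.0] * len(self.counts)
        return [100.0 * c / total for c in self.counts]


def on_stream_timeout(flow_hists, flow_kind, size_bytes):
    """Hypothetical hook: called once for each stream that has timed out."""
    if flow_kind not in flow_hists:
        flow_hists[flow_kind] = LogHistogram(lo=40.0, hi=400_000.0, n_bins=4)
    flow_hists[flow_kind].add(size_bytes)
```

Each timed-out stream contributes exactly one count to the histogram of the flow it belongs to, so the meter only has to keep a small fixed-size array per flow rather than a record per stream.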
Support for this work is provided by DARPA NGI Contract
N66001-98-2-8922, NSF Award NCR-9711092 ‘CAIDA: Cooperative Association for Internet Data Analysis,’ and The University
of Auckland.
[Figure 1 plot: OC12, Mon 11 Mar 02, FlowTime for whole torrent. Axes: stream lifetime (minutes, 0 to 30); time (HHMM, 0935 to 1010); % streams (0 to 0.5).]
Figure 1: Stream lifetime plots (% of total) for all traffic,
covering 40 minutes from when metering began
NeTraMet’s dynamic timeout algorithm is similar to the one described in [4]. It uses a minimum inactive time of two seconds, after which a flow is timed out if it remains inactive for a period equal
to its average packet interarrival time multiplied by a factor of 10.
We also tested the meter using a factor of 100, and compared the
percentage of streams timing out during each minute after a two-hour run. With the factor of 100, 96.5% of streams lasted one minute
or less, compared with 97.9% when using the factor of 10, a difference of 1.4 percentage points. For longer streams the effect was smaller: 0.6% for
2-minute streams and 0.2% for 5-minute streams. Overall the slight
improvement in measuring stream lifetimes is arguably not worth
the additional processing and memory overhead in the meter.
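One reading of the timeout rule above is that a stream is declared finished once it has been silent for at least two seconds and for at least the chosen factor times its average packet interarrival time. The following Python sketch implements that reading. The function names and the way the average interarrival time is estimated (elapsed time divided by the number of gaps) are assumptions for illustration; the meter's actual implementation may differ.

```python
def inactivity_limit(first_pkt, last_pkt, n_packets, factor=10.0, min_inactive=2.0):
    """Seconds of silence allowed before a stream is considered finished.

    first_pkt, last_pkt: arrival times (seconds) of the first and latest packets.
    n_packets: number of packets seen so far.
    """
    if n_packets < 2:
        # No interarrival estimate yet, so fall back to the minimum.
        return min_inactive
    mean_gap = (last_pkt - first_pkt) / (n_packets - 1)
    return max(min_inactive, factor * mean_gap)


def has_timed_out(now, first_pkt, last_pkt, n_packets, factor=10.0):
    """True if the stream has been silent for at least its inactivity limit."""
    idle = now - last_pkt
    return idle >= inactivity_limit(first_pkt, last_pkt, n_packets, factor)
```

With 11 packets spread over 5 seconds the average gap is 0.5 s, so a factor of 10 gives a limit of 5 s, whereas a factor of 100 gives 50 s. The larger factor keeps idle streams in memory ten times longer, which is the overhead the text refers to.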
We read flow data (including all the distributions) from our meters
at one-minute intervals; the ‘times of day’ when distributions were
read from the meter appear on the figures below as 24-hour times.
3. METER LOCATIONS
We used NeTraMet to make measurements on high-speed network
links at three experimental sites. At our OC3 and OC48 sites more
than a third of the traffic by byte is web streams. The main difference between these two sites is that the OC48 link carries about 800
Mb/s whereas the OC3 link carries only about 8 Mb/s. However our
OC12 link is markedly different: its 120 Mb/s load is dominated by
non-web TCP streams. In this paper we concentrate on our OC12
link at UC San Diego; we will address the differences in traffic
among the OC3, OC12 and OC48 links in future work.
4. STREAM LIFETIMES
We begin by examining the stream lifetime distribution for the total
traffic, i.e. the torrent, on the OC12 link. Figure 1 shows distributions
of stream lifetime for 40 consecutive minutes from 0932, the
time when we began metering.
The stream lifetime distribution for each minute shows the percentage
of streams with lifetimes of 1 to 30 minutes. During the first
30 minutes the maximum observed lifetime increased, reaching the
30th bin after half an hour, i.e. at 1002. After that time the rightmost
edge of the plot shows the overflow bin; a few streams continue
to time out after 30 minutes.
After the initial 30 minutes, the distributions continue similarly.
Each minute, a few streams time out with lifetimes between 10 and
30 minutes. The most surprising feature of Figure 1 is that nearly
97% of all streams on this link last only one minute or less, and that
fewer than 1% last more than five minutes.
5. STREAM SIZES IN BYTES
We now explore the stream size distributions for different kinds of
traffic. Figures 2 and 3 show per-minute distributions for a typical
hour. Each minute’s distribution shows the percentage of streams
of various sizes, using a log scale from 40 bytes (0.04 kB) to 400
kilobytes. We observe that most streams on this link are counted in
the lowest bin; our z-axis scale runs from 0 to only 5% of streams,
in order to reveal detail for streams larger than 40 bytes.
[Figure 2 plot: OC12, Mon 11 Mar 02, FromFlowOctets, web streams. Axes: stream size (kBytes, log scale, 0.1 to 100); time (HHMM, 1000 to 1100); % streams (0 to 5).]
Figure 2: Stream size plots (% of total) for web traffic
We have examined cumulative distributions to determine what proportion
of streams lie in different size ranges. For web streams:
87% are under 1 kB in size; 8% are between 1 and 10 kB; and
4.8% are between 10 and 100 kB. For non-web streams these figures
are 89%, 7% and 1.5%, respectively, i.e. non-web TCP traffic
has slightly more small streams, but significantly fewer streams between
10 and 100 kB. Figures 2 and 3 show this effect clearly;
both figures show a steep fall toward 1 kB, but whereas web traffic
drops from 2% to 0.2% as stream size increases above 10 kB, the
non-web traffic streams fall from around 0.8% at 1 kB down to near
zero for sizes above 10 kB.
Another striking feature of these plots is that they change little over
time; their basic shapes remain much the same over the whole hour
shown on the figures.
[Figure 3 plot: OC12, Mon 11 Mar 02, FromFlowOctets, non-web streams. Axes: stream size (kBytes, log scale, 0.1 to 100); time (HHMM, 1000 to 1100); % streams (0 to 5).]
Figure 3: Stream size plots (% of total) for non-web TCP traffic
6. CONCLUSION
We have measured stream lifetime and byte size distributions at
one-minute intervals for traffic on our OC12 link. On this link 87%
of the streams are smaller than 1 kB, and only about 0.5% are bigger
than 100 kB. This suggests that although there are large streams
(elephants) on this link, the bulk of its traffic is composed of small
streams (mice).
On this campus OC12 link, 97% of streams last one minute or less,
and only about 1% of them live longer than 5 minutes. We believe
that a meter-reading interval of one minute (as used in this study)
yields valid and useful data about short-term behaviour of stream
distributions. Where more detail of longer-running streams is required,
five-minute readings should prove effective.
Finally, note that the data for our distributions is collected in real
time. Our NeTraMet meter performs data reduction and produces
flow data files directly; our figures are generated by simple Perl
scripts using that flow data. This capability is well-suited to ongoing
monitoring applications where there is no desire to generate,
store or process large amounts of packet trace data.
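The size-range percentages quoted in this paper come from post-processing of this kind. The authors' scripts are not shown, so the Python sketch below is only an illustration of the arithmetic: it assumes a plain list of per-stream sizes in bytes as input, takes 1 kB as 1000 bytes (consistent with the paper's "40 bytes (0.04 kB)"), and uses the hypothetical names RANGE_EDGES_KB, RANGE_LABELS and range_percentages.

```python
from bisect import bisect_right

# Range boundaries quoted in the text, in kB (1 kB taken as 1000 bytes).
RANGE_EDGES_KB = [1.0, 10.0, 100.0]
RANGE_LABELS = ["< 1 kB", "1-10 kB", "10-100 kB", "> 100 kB"]


def range_percentages(sizes_bytes):
    """Return the percentage of streams falling in each size range."""
    counts = [0] * (len(RANGE_EDGES_KB) + 1)
    for size in sizes_bytes:
        # bisect_right places a stream of exactly 1 kB in the "1-10 kB" range.
        counts[bisect_right(RANGE_EDGES_KB, size / 1000.0)] += 1
    total = len(sizes_bytes)
    if total == 0:
        return {}
    return {label: 100.0 * c / total for label, c in zip(RANGE_LABELS, counts)}
```

For example, calling range_percentages on ten streams of which seven are under 1 kB yields 70% in the first range. The same calculation can be run on one minute's worth of streams at a time to compare successive intervals.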
7. REFERENCES
[1] A. B. Downey, The structural cause of file size distributions,
MASCOTS Symposium, 2001, available at
http://rocky.wellesley.edu/downey/filesize/
[2] Y. Zhang and L. Qiu, Understanding the End-to-End
Performance Impact of RED in a Heterogeneous
Environment, Cornell CS Technical Report 2000-1802, July
2000, available at http://www.aciri.org/floyd/red.html
[3] N. Brownlee and M. Murray, Streams, Flows and Torrents,
PAM2001, April 2001, available at
http://www.caida.org/outreach/papers/2001/StreamsFlowsTorrents/
[4] Bo Ryu, David Cheney and Hans-Werner Braun, Internet
Flow Characterization - Adaptive Timeout and Statistical
Modeling, PAM2001, April 2001