
Measurement and Modelling of the Temporal Dependence in Packet Loss

Maya Yajnik, Sue Moon, Jim Kurose and Don Towsley Department of Computer Science University of Massachusetts Amherst, MA 01003 USA emails:[yajnik,sbmoon,kurose,towsley]@cs.umass.edu

UMASS CMPSCI Technical Report # 98-78

Abstract

Understanding and modelling packet loss in the Internet is especially relevant for the design and analysis of delay-sensitive multimedia applications. In this paper, we present analysis of many hours of end-to-end unicast and multicast packet loss measurement. From these measurements we selected the stationary traces for further analysis. We consider the dependence as seen in the autocorrelation function of the original loss data as well as the dependence between good run lengths and loss run lengths. The correlation timescale is found to be approximately 1 second or less. We evaluate the accuracy of three models of increasing complexity: the Bernoulli model, the 2-state Markov chain model and the h-th order Markov chain model. Of the trace segments considered, the Bernoulli model was found to be accurate for some segments and the 2-state model for others; a Markov chain model of higher order was found to be necessary to accurately model the rest of the segments. For the case of adaptive applications which track loss, we address two issues of on-line loss estimation: the required memory size and whether to use exponential smoothing or a sliding window average to estimate the average loss rate. We find that a large memory size is necessary and that the sliding window average provides a more accurate estimate for the same effective memory size.

Keywords: measurement, modelling, loss, temporal dependence, correlation.
     

This work was supported by the National Science Foundation under grant NCR 9508274 and by the Defense Advanced Research Projects Agency under agreement F30602-98-2-0238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1 Introduction
Packet loss is a key factor determining the quality seen by delay-sensitive multimedia applications such as audio/video conferencing and Internet telephony. These applications experience degradation in quality with increasing loss and delay in the network. Understanding the loss seen by such applications is important in their design and performance analysis. Adaptive audio/video applications adjust their transmission rate according to the perceived congestion level in the network (see [6, 8, 9]). By this adjustment a suitable loss level can be maintained and bandwidth can be shared fairly between connections. For such adaptive applications it is important to have simple loss models that can be parameterized in an on-line manner. In this paper, we present careful analysis of end-to-end loss using many hours of measurements. The measurements are of loss as seen by packet probes sent at regular intervals (of 20, 40, 80 and 160ms) on both unicast and multicast Internet connections. Since significant non-stationary effects are seen in the data, we divide the traces into two-hour segments and check each segment for stationarity. Gradual decreases and increases in the mean loss rate, abrupt and dramatic increases in loss rate lasting a few minutes, and spikes of high loss rate are observed. We analyze the stationary trace segments to determine the extent of temporal dependence in the data. The results show that the correlation timescale, the time over which what happens to one packet is connected to what happens to another packet, is approximately 1 second or less. Beyond this timescale, packet losses are independent of each other. We also consider loss models of increasing complexity and estimate the level of complexity necessary to accurately model the stationary trace segments. The loss process is modelled as a 2^h-state discrete-time Markov chain, where h is referred to as the order of the Markov chain; h = 0 corresponds to the Bernoulli model and h = 1 corresponds to the 2-state Markov chain model.
For the datasets with sampling intervals of 160ms, the Bernoulli model was found to be accurate for some segments, the 2-state Markov model for others, and Markov chain models of higher order for the remainder. The Bernoulli and the 2-state Markov chain models are not accurate for the traces with sampling intervals of 20ms and 40ms, for which higher-order Markov chain models were found necessary. Adaptive multimedia applications often track the congestion level in the network in order to adjust the rate of transmission, to share the bandwidth fairly and to maintain a low enough loss level ([6, 9]). They can use information about loss in the network over a period of time, instead of the traditional per-packet loss information. For such situations of on-line loss estimation, it is useful to know how much past information is necessary to accurately estimate the parameters of the loss models. The non-stationarity observed in the data shows that the loss rate does vary over time, sometimes dramatically, and thus up-to-date information is important. Because of the need of adaptive applications for loss rate estimates, we address the question of how much memory is necessary for an on-line parameter estimator to provide an accurate enough estimate. In general, it is found that the variance in the estimator due to the limited number of samples is high, and thus large numbers of samples are required to get an accurate estimate. We also address the question of whether or not exponential smoothing provides a better estimate of the mean loss rate than a sliding window average. Though computationally quicker and requiring less buffer space, exponential smoothing needs more than twice as many samples to provide an estimate with the same accuracy as the sliding window average. Earlier works on the measurement of unicast and multicast packet loss in the Internet ([1, 4, 5, 14, 15]) have noted that the number of consecutively lost packets is small.
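The two on-line loss-rate estimators compared above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, window size and smoothing weight are our own choices:

```python
from collections import deque

def sliding_window_rate(losses, window):
    """Average loss rate over the last `window` samples (1 = lost, 0 = received)."""
    buf = deque(maxlen=window)
    rates = []
    for x in losses:
        buf.append(x)
        rates.append(sum(buf) / len(buf))
    return rates

def ewma_rate(losses, alpha):
    """Exponentially smoothed loss rate: r <- (1 - alpha) * r + alpha * x."""
    r = 0.0
    rates = []
    for x in losses:
        r = (1 - alpha) * r + alpha * x
        rates.append(r)
    return rates
```

Both estimators summarize recent loss history in a single number; the sliding window keeps the last `window` samples explicitly, while exponential smoothing keeps only the running value.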
In [15] long outages of several seconds and minutes were also observed on the MBone (the multicast backbone network overlaid on the Internet). In [13] measurements of voice traffic are analyzed to assess the effects of strategies to compensate for varying delay and loss on the quality of the voice connection. In [1] statistical analysis of temporal correlation in loss data has been described and weak correlation has been noted. The use of discrete-time Markov chain models, particularly the 2-state Markov chain
        

     

  

  

Table 1: Trace Descriptions

 #  Date     Type       Sampling Interval  Destination              Time   Duration   Loss Rate
 1  14Nov97  unicast    80ms               SICS, Sweden             09:52  8hr        2.70%
 2  14Nov97  multicast  80ms               SICS, Sweden             09:53  8hr        11.03%
 3  20Nov97  unicast    160ms              SICS, Sweden             16:03  2days      1.91%
 4  12Dec97  multicast  160ms              St. Louis, Missouri      14:23  2days      5.24%
 5  20Dec97  unicast    20ms               Seattle, Washington      13:41  2hr 27min  1.69%
 6  20Dec97  multicast  20ms               Seattle, Washington      13:49  2hr 27min  3.76%
 7  21Dec97  unicast    20ms               Los Angeles, California  13:17  2hr        1.38%
 8  21Dec97  multicast  20ms               Los Angeles, California  13:26  2hr        3.37%
 9  26Jun98  unicast    40ms               Atlanta, Georgia         16:56  6hr        2.22%

model (sometimes called the Gilbert model), has been proposed in [14, 7, 8, 11]. Discrete-time Markov chain models of increasing levels of complexity, including the 2-state Markov chain model, have been described in [14]. This paper takes a more careful look at both the temporal dependence and the validity of various models for packet loss, using many hours of stationary data. The remainder of this paper is structured as follows. In Section 2 we describe how the data was collected. In Section 3 we first describe two ways of representing the data and then go on to discuss our analysis of stationarity and the temporal dependence in the data. The models and an evaluation of their accuracy are discussed in Section 4. In Section 5 we discuss the memory size required for on-line estimation of model parameters and also the relative suitability of exponential smoothing and sliding window averaging as ways of estimating the average loss rate. Finally, the conclusions are given in Section 6.

2 Measurement
Our study of end-to-end loss in the Internet is based on many hours of traces. These traces were gathered by sending out packet probes along unicast and multicast end-to-end connections at periodic intervals (of 20, 40, 80 or 160ms) and recording the sequence numbers of the probe packets that arrived successfully at the receiver. The packets whose sequence numbers were not recorded were assumed to have been lost. Thus the loss data can be considered to be a binary time series {X_n}, where X_n takes the value 0 if the n-th probe packet arrived successfully and the value 1 if it was lost. The interval at which the probe packets are sent out into the network is referred to as the sampling interval for the rest of the paper. The measurements are summarized in Table 1. The source was located at Amherst, Massachusetts for all the traces. Note that one multicast trace (not shown in the table) displayed an extremely high loss rate and was therefore disregarded for analysis. We refer to the datasets using the notation Date.Type (for example, 20Nov97.uni).
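The construction of the binary loss series from the recorded sequence numbers can be sketched as follows (the function name is ours):

```python
def loss_series(received_seqs, num_sent):
    """Binary loss series: x[n] = 0 if probe n arrived at the receiver,
    1 if its sequence number was never recorded (assumed lost)."""
    got = set(received_seqs)
    return [0 if n in got else 1 for n in range(num_sent)]
```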
      

Figure 1: The Loss Percentage for a two-hour segment of 14Nov97.uni

3 Analysis
We consider two ways of representing the loss data. The first is as a binary time series {X_n}, where X_n takes the value 0 if the n-th probe packet arrived successfully and the value 1 if it was lost. The time series is a discrete-valued time series which takes on values in the set {0, 1}. A second way of representing the data is as alternating sequences of good runs and loss runs. The trace can be divided into portions of consecutive 0s and portions of consecutive 1s. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 0s. Similarly, the loss runs are the lengths of the portions of the trace which consist solely of consecutive 1s. Thus the data can be considered to be the interleaving of two sequences of observations: {g_j} and {l_j}, where g_j is the j-th good run length and l_j is the corresponding loss run length. The remainder of this section is structured as follows. We discuss our analysis of the traces for stationarity in Section 3.1, where we find that a substantial fraction of the two-hour segments exhibit stationarity. In Section 3.2 we focus on the temporal dependence in these stationary trace segments as shown by the sample autocorrelation function of the binary time series representation of the data. We discuss our analysis of the temporal dependence using the good run and loss run representation of the data in Section 3.3.
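The conversion from the binary representation to the run-length representation can be sketched as follows (the function name is ours):

```python
from itertools import groupby

def run_lengths(series):
    """Split a binary loss series into alternating good runs (runs of 0s)
    and loss runs (runs of 1s), each expressed in number of packets."""
    good, loss = [], []
    for value, grp in groupby(series):
        (loss if value else good).append(len(list(grp)))
    return good, loss
```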
      

3.1 Stationarity
By looking at the smoothed trend of the loss data, obvious non-stationarity was found in most of the traces. Therefore, the traces were divided into approximately two-hour segments and each segment was checked for stationarity. For the rest of the paper we refer to the datasets using the notation Date.Type-Segment# (for example, 14Nov97.uni-3). A time series is said to be stationary in the strict sense if its statistical properties remain constant over the entire series. A time series is said to be stationary in the wide sense (also called weak stationarity) if the mean and covariance function remain constant over time. Since there is no good way to rigorously test for stationarity,

 

 

Table 2: Stationarity

 #  Dataset        Sampling Interval  Segments  Stationary Segments
 1  14Nov97.uni    80ms               4         none
 2  14Nov97.multi  80ms               4         none
 3  20Nov97.uni    160ms              24        19
 4  12Dec97.multi  160ms              25        14
 5  20Dec97.uni    20ms               1         1
 6  20Dec97.multi  20ms               1         1
 7  21Dec97.uni    20ms               1         1
 8  21Dec97.multi  20ms               1         none
 9  26Jun98.uni    40ms               3         3

we checked whether the average loss rate varied significantly in the trace segments. We plotted the smoothed trace using a finite moving average filter (with a fixed window size in packets) to judge the extent to which the average loss rate varied over the length of the trace. Abrupt, large increases in the average loss rate seen in the smoothed trace were taken to be an indication of non-stationarity. Also, to test for gradual increase or decrease in the average loss rate, we fit a straight line to the data (the straight line which minimizes the least-squares error) as an estimate of the linear trend. A sufficiently large total change in the average loss rate over a two-hour trace segment, as estimated by the least-squares straight line, was considered to be an indication of non-stationarity. There are different kinds of observed non-stationary effects. For example, consider the smoothed loss for the third segment of trace 14Nov97.uni in Fig. 1. In the first half of the segment there is a slow, linear decay in loss percentage over one hour. Such slow decays and increases were noticed in other traces as well. Also, in this case there is an abrupt, dramatic increase in loss which lasts for approximately 10 minutes before abruptly dropping to a lower loss rate. Such abrupt increases were noticed in other traces, although this trace segment shows the most dramatic increases lasting the longest time. Table 2 summarizes the results of our analysis for stationarity. Traces 20Dec97.multi and 21Dec97.multi exhibit periodic behavior, making them difficult to analyze and model. In all, we identified the stationary two-hour segments listed in Table 2. The stationary segments of five traces: 20Nov97.uni, 12Dec97.multi, 20Dec97.uni, 21Dec97.uni and 26Jun98.uni have been analyzed for temporal dependence in the following sections.
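The stationarity screen described above, moving-average smoothing plus a least-squares linear trend, can be sketched as follows. The thresholds are left open since the paper's exact values are not reproduced here, and the function names are ours:

```python
def moving_average(xs, w):
    """Finite moving-average filter; entry i averages the last w samples."""
    out, s = [], 0.0
    for i, x in enumerate(xs):
        s += x
        if i >= w:
            s -= xs[i - w]
        out.append(s / min(i + 1, w))
    return out

def linear_trend(xs):
    """Least-squares slope of xs against its sample index."""
    n = len(xs)
    mx = (n - 1) / 2
    my = sum(xs) / n
    num = sum((i - mx) * (x - my) for i, x in enumerate(xs))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den
```

The estimated total change over a segment is then `linear_trend(xs) * (len(xs) - 1)`, which can be compared against a chosen non-stationarity threshold.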

3.2 Autocorrelation of the Binary Loss Data


The sample autocorrelation function of the loss data for the stationary segments, expressed as the binary time series {X_n}, reveals the extent of dependence in the loss experienced by the probe packets over time. In this section, we define the autocorrelation, describe the estimation of the correlation timescale from the autocorrelation function and discuss the results for the loss data.

Let {X_n} be a stationary sequence of random variables and let x_1, ..., x_n be a realization of it. The autocorrelation function for the stationary sequence of random variables is defined as

    r(k) = E[(X_i - mu)(X_{i+k} - mu)] / E[(X_i - mu)^2]        (1)

where mu is the mean and k is the lag. The sample mean for the observed data x_1, ..., x_n is

    x_bar = (1/n) * sum_{i=1}^{n} x_i

The sample autocovariance function, assuming stationarity, is

    c(k) = (1/n) * sum_{i=1}^{n-k} (x_i - x_bar) * (x_{i+k} - x_bar)

where k is the lag. The sample autocorrelation function is

    r_hat(k) = c(k) / c(0)

The lag can also be expressed in terms of time. That is, let

    d = k * delta        (2)

be the lag in terms of time, where delta is the sampling interval. The correlation timescale is the minimum lag, in terms of time, such that the series is uncorrelated at all lags at or beyond it. In the case of an independent stochastic process, the autocorrelation function is zero at all nonzero lags, and if the stochastic process is independent at a lag k, then the autocorrelation function r(k) is zero at that lag. However, in the case of an observed sample sequence which is the realization of an independent stochastic process, the sample autocorrelation function could have non-zero, though very small, values. Proposition 1 gives an idea of what constitutes a significant value for the sample autocorrelation function for a given number of samples, n.

Proposition 1 For large n, the sample autocorrelations of an IID sequence with finite variance are approximately IID with a normal distribution (mean of 0 and variance of 1/n) (see [10]). Hence if x_1, ..., x_n is a realization of such an IID sequence then approximately 95% of the sample autocorrelations should fall between the confidence bounds +/- 1.96/sqrt(n).

This proposition specifies what constitutes a significant value for the sample autocorrelation function for a given number of samples. The correlation timescale is the smallest lag in terms of time, d, at and beyond which the value of the sample autocorrelation function becomes small enough to be considered insignificant. Using Proposition 1 we can test the following hypothesis:

Hypothesis 1 {X_n} is independent at lag k.

The test statistic T is calculated from the sample autocorrelation function at lag k of the sample sequence as

    T = sqrt(n) * r_hat(k)
If the hypothesis is true, then T has a limiting standard normal distribution. Thus, for example, if |T| > 1.96 then the hypothesis is rejected at significance level 0.05. The results for the loss data show that there is some correlation at small lags, and that the sample autocorrelation function decays rapidly. As an example, consider the first segment of trace 20Nov97.uni, 20Nov97.uni-1, with a sampling interval of 160ms. Fig. 2 shows the autocorrelation function of the binary loss data. A value of the sample autocorrelation function close to zero, for a lag of k, indicates independence between X_n and X_{n+k}. Since there are a finite number of samples, the confidence bounds of +/- 1.96/sqrt(n) show the range of values which are close enough to zero to indicate independence. In the figure, the confidence bounds are plotted as dotted lines. The autocorrelation at lag 1 (that is, the correlation between consecutive packets) is small but significant. It is clear that beyond the first few lags the autocorrelation function becomes insignificantly small except for a few stray points. This means that the data is dependent only up to a lag of 2 and that the correlation timescale for this trace segment is correspondingly short. We analyzed all of the stationary segments of our traces. The traces 20Nov97.uni and 12Dec97.multi had sampling intervals of 160ms, the traces 20Dec97.uni and 21Dec97.uni had sampling intervals of 20ms and the trace 26Jun98.uni had a sampling interval of 40ms. For the trace segments with sampling intervals of 160ms, the sample autocorrelation functions were similar to that described for the example trace segment 20Nov97.uni-1. That is, the autocorrelation function at lag 1 is usually significant, and the function decays to insignificant values within a few lags. When the lag is expressed in terms of time (2), this corresponds to a correlation timescale of approximately 1 second or less. The correlation between consecutive packets for datasets 20Nov97.uni and 12Dec97.multi is found to vary over a narrow range across segments.
Figure 2: Sample Autocorrelation Function of Binary Data for 20Nov97.uni-1

Figure 3: Sample Autocorrelation Function for three trace segments with sampling intervals of 160ms, 40ms and 20ms

The behavior of the sample autocorrelation function for traces with sampling intervals of 20ms and 40ms is somewhat different from that for traces with sampling intervals of 160ms. For the 20ms traces, the autocorrelation function at lag 1 is much higher than for the 160ms trace segments. This indicates much higher correlation between consecutive packets for sampling intervals of 20ms than is present for traces with 160ms intervals. The trace segments with sampling intervals of 40ms showed a similarly high autocorrelation at a lag of 1. The autocorrelation functions for the 20ms and 40ms traces also remain significant out to larger lags than those for the 160ms traces. Fig. 3 shows the sample autocorrelation as a function of lag expressed in terms of time (2), for three trace segments, each with a different sampling interval. The chosen trace segments are 20Nov97.uni-1, 26Jun98.uni and 20Dec97.uni with sampling intervals of 160ms, 40ms and 20ms respectively. From this figure it is clear that the correlation timescale is approximately 1 second or less.

The correlation timescale of a data segment is estimated as the smallest lag (expressed in terms of time) for which Hypothesis 1 is rejected (that is, as the smallest lag at which the time series is judged independent). A significance level of 0.05 is used. This method could underestimate the correlation timescale, since the autocorrelation function could show dependence at some higher lag, but we have ignored this effect. The results for all the stationary trace segments are summarized in Fig. 4 as a frequency distribution of the correlation timescale estimated for each data segment. Here we can see that the correlation timescale is below about 1 second except in two cases.

Figure 4: Frequency Distribution of Correlation Timescale

For comparison with traces with sampling intervals of 160ms, the traces 20Dec97.uni and 21Dec97.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160ms. Also, the segments of trace 26Jun98.uni were partitioned into subtraces, each subtrace likewise corresponding to a trace with a sampling interval of 160ms. As expected, the sample autocorrelation functions of the subtraces are similar to those seen for the traces with sampling intervals of 160ms.

Thus, the results indicate a correlation timescale of 1 second or less over all the stationary trace segments. This also suggests that loss may not be adequately modelled by an IID process such as the Bernoulli process, since the traces exhibit some correlation at small lags. The 2-state Markov chain model is able to accurately model the correlation at lag 1, but since the correlation it predicts decays rapidly for larger lags, it is inaccurate with respect to modelling the correlation at lags greater than 1.
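The autocorrelation analysis above can be sketched as follows; the function names are ours, and the confidence bound is the +/- 1.96/sqrt(n) bound of Proposition 1:

```python
import math

def sample_autocorr(xs, max_lag):
    """Sample autocorrelation r_hat(k) = c(k) / c(0) for k = 1..max_lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    acf = []
    for k in range(1, max_lag + 1):
        cov = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k)) / n
        acf.append(cov / var)
    return acf

def correlation_timescale(xs, interval_ms, max_lag=50, z=1.96):
    """Smallest lag (in ms) at which the sample ACF first falls inside the
    +/- z/sqrt(n) IID confidence bounds; None if it never does."""
    bound = z / math.sqrt(len(xs))
    for k, r in enumerate(sample_autocorr(xs, max_lag), start=1):
        if abs(r) <= bound:
            return k * interval_ms
    return None
```

For a strictly alternating (hence strongly correlated) series, the ACF never falls inside the bounds, so no timescale is found within `max_lag`.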
  

     

  

  

  

  

  

    

  

  

  

  

  

3.3 Good Run and Loss Run Lengths


The second way of representing the data is as alternating sequences of good run lengths and loss run lengths: that is, {g_j} and {l_j}. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 0s. Similarly, the loss runs are the lengths of the portions of the trace which consist solely of consecutive 1s. Since the binary loss data {X_n} is not completely independent in many cases, we check the good runs and loss runs for independence. The autocorrelation functions of the good run lengths and of the loss run lengths give information about the extent of dependence between the run lengths. Thus if the autocorrelation functions are found to have very small values, this indicates that there is little dependence between the lengths of runs and the underlying processes can be assumed to be IID. Similarly, if the crosscorrelation function between the good runs and the loss runs is found to be small, it indicates that the lengths of good runs and loss runs are not dependent on each other. It was found that, in most cases, the good runs and the loss runs are indeed independent. Thus the good runs and the loss runs can be modelled separately as IID random variables. Fig. 5 shows the autocorrelation function of the good run lengths, the autocorrelation function of the loss run lengths and the crosscorrelation function between the lengths of the good runs and the loss runs for an example trace segment, 20Nov97.uni-1. The dotted lines indicate the confidence bounds, as specified by Proposition 1, for the range of values considered close enough to zero to indicate independence at that lag. It is evident that there is insignificant dependence among the good runs, among the loss runs, and between the good runs and the loss runs. We analyzed all the stationary segments of the traces similarly.
The three sample correlation functions (autocorrelation of good run lengths, autocorrelation of loss run lengths and the crosscorrelation between good run lengths and loss run lengths) indicate independence for all the segments in the trace 20Nov97.uni. The trace 12Dec97.multi did show dependence between the good run lengths in a few cases, up to small lags. The traces with sampling intervals of 20ms (20Dec97.uni and 21Dec97.uni) show different results. The autocorrelation function of the loss runs and the crosscorrelation function of the good runs and the loss runs have insignificantly small values, indicating the independence of the loss run lengths and the independence of the good runs from the loss runs. However, the good run lengths do show some dependence at small lags. Fig. 6 shows the three correlation functions for the trace 20Dec97.uni, which has a sampling interval of 20ms. The trace 26Jun98.uni, with a sampling interval of 40ms, shows independence for all three correlation functions: autocorrelation of good run lengths, autocorrelation of loss run lengths and crosscorrelation of good run and loss run lengths.

The next step is to look at the distributions of the good run and the loss run lengths. Fig. 7 shows the distribution of the lengths of the good runs expressed in number of packets and Fig. 8 shows the distribution of the loss run lengths for the same trace segment, 20Nov97.uni-1. Both figures also show the distributions predicted by the models, which will be discussed in Section 4, for comparison with the observed data. Here it is evident that the distribution of the good run lengths decays approximately linearly when a log scale is used for the y-axis, suggesting a geometric distribution for the good run length process. The loss run lengths take only a few very small values. The other stationary segments from traces with sampling intervals of 160ms showed very similar results.

The distributions of the good run lengths for the segments of traces 20Nov97.uni and 12Dec97.multi appear to be, by visual inspection, approximately geometric. For some segments there is an excess of very short good run lengths (less than 5). The loss run lengths were very small (a few packets at most) for all but one segment of trace 20Nov97.uni and for all but three segments of trace 12Dec97.multi. This low number of consecutive losses has been noted in earlier work ([4], [5]). The distributions of good run and loss run lengths for the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni look somewhat different. For example, consider the distribution of good run lengths shown in Fig. 9 for the trace
       

  

  

  

Figure 5: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Nov97.uni-1

Figure 6: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Dec97.uni

20Dec97.uni. The distribution of good run lengths shows geometric decay except for the high probability of very short burst lengths. The distribution of loss run lengths for this trace, shown in Fig. 10, shows approximately geometric decay. The traces 21Dec97.uni and 26Jun98.uni show similar results. Since the good run lengths for 20Nov97.uni and 12Dec97.multi appear to be geometrically distributed, we use the Chi-Square goodness-of-fit test to provide more systematic evidence that the distributions are geometric. Instead of the discrete-valued geometric distribution we use its continuous-valued equivalent, the exponential distribution. We also compute the shape and rate parameters assuming a Gamma distribution for the good run lengths. The shape parameter (1/c^2, where c is the coefficient of variation) of the Gamma distribution gives an idea of how close the distribution is to the exponential distribution. A shape parameter of 1 corresponds to an exponential distribution, which is a special case of the Gamma distribution. The estimated shape parameter varied from segment to segment for both 20Nov97.uni and 12Dec97.multi. The Chi-Square goodness-of-fit test is used to check whether to reject the hypothesis that the good run lengths are IID random variables with the given distribution. Using the Chi-Square goodness-of-fit test, the following hypothesis can be tested for both the Gamma and the Exponential distributions.

Hypothesis 2 The good run lengths are IID random variables with the given distribution function.

It is difficult to use the Chi-Square goodness-of-fit test similarly for the loss run lengths, because the loss run lengths vary within a very small range. The continuous-valued distributions were used to make it easier to determine the partitions for the test.

For traces 20Nov97.uni and 12Dec97.multi, the hypothesis is rejected, for both the exponential and the Gamma distribution, for roughly one-third of the segments. Thus we conclude that the good run lengths are approximately geometrically distributed for the traces 20Nov97.uni and 12Dec97.multi, although the fit is not exact for about one-third of the trace segments. For comparison with the traces with sampling intervals of 160ms, the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160ms. These subtraces were then analyzed. As expected, the results are similar to those seen for the first two traces.
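A goodness-of-fit check along these lines can be sketched as follows. Note that the paper fits the continuous exponential/Gamma equivalents; this sketch uses the discrete geometric analogue directly, and the function names and binning are our own:

```python
def fit_geometric(runs):
    """MLE of p for a geometric distribution on {1, 2, ...},
    P[L = k] = p * (1 - p)**(k - 1); the MLE is 1 / sample mean."""
    return len(runs) / sum(runs)

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic over matched count bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The statistic would then be compared against a chi-square quantile with the appropriate degrees of freedom to accept or reject Hypothesis 2.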
         


Figure 7: The Distribution of Good Run Lengths for 20Nov97.uni-1

Figure 8: The Distribution of Loss Run Lengths for 20Nov97.uni-1


Figure 9: The Distribution of Good Run Lengths for 20Dec97.uni

Figure 10: The Distribution of Loss Run Lengths for 20Dec97.uni


4 Modelling
In this section we discuss the potential of three models of increasing complexity for modelling the data: the Bernoulli loss model, the 2-state Markov chain model and the h-th order Markov chain model. The loss process which characterizes the binary loss data (as described in Section 3) is considered to be a stationary discrete-time stochastic process taking values in the set {0, 1}. The order of the process is the number of previous values of the process on which the current value depends, and can be considered a measure of the complexity of the model. For an order of h, there are 2^h states. The Bernoulli model is the simplest model, with an order of 0 and a single state. The 2-state Markov chain model is of order 1. The Markov chain model of h-th order is a generalization of these two models and has 2^h states. First, in Section 4.1 we describe the models and associated parameter estimators. Then in Section 4.2 we assess the validity of the models and present the results for the stationary trace segments.


4.1 Models
4.1.1 Bernoulli Loss Model
 

In the Bernoulli loss model, the sequence of random variables {X_n} is IID (independent and identically distributed). That is, the probability of X_n being either 0 or 1 is independent of all other values of the time series, and the probabilities are the same irrespective of n. Thus this model is characterized by a single parameter p, the probability of X_n being a 1 (the value 1 corresponds to a lost packet). It can be estimated from a sample trace by

    p_hat = n_1 / n        (3)

where n_1 is the number of times the value 1 occurs in the observed time series and n is the number of samples in the time series. The good run length distribution for this model is

    P[g_j = k] = (1 - p)^(k - 1) * p,    k >= 1        (4)

and the loss run length distribution for this model is

    P[l_j = k] = p^(k - 1) * (1 - p),    k >= 1
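The Bernoulli estimator and its run-length distributions can be sketched as follows (function names are ours):

```python
def bernoulli_fit(series):
    """Estimate p, the probability of loss, from a binary loss trace (1 = lost)."""
    return sum(series) / len(series)

def bernoulli_good_run_pmf(p, k):
    """P[good run length = k]: k - 1 further successes followed by a loss."""
    return (1 - p) ** (k - 1) * p

def bernoulli_loss_run_pmf(p, k):
    """P[loss run length = k]: k - 1 further losses followed by a success."""
    return p ** (k - 1) * (1 - p)
```

Both run-length distributions are geometric, which is why a geometric (or continuous exponential) fit to the observed good runs is the natural check of this model.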


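As a sketch (not the paper's code), the estimator (3) and the geometric run length distributions above can be computed as follows; `r` is the Bernoulli loss probability just defined:

```python
def estimate_r(series):
    """MLE of the Bernoulli loss parameter: the fraction of 1s, n1/n."""
    return sum(series) / len(series)

def good_run_pmf(r, k):
    """P[good run length = k] under the Bernoulli model: (1-r)**(k-1) * r."""
    return (1 - r) ** (k - 1) * r

def loss_run_pmf(r, k):
    """P[loss run length = k] under the Bernoulli model: r**(k-1) * (1-r)."""
    return r ** (k - 1) * (1 - r)
```

Both run length distributions sum to one over k = 1, 2, …, as a quick sanity check confirms.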
4.1.2 Two-state Markov Chain Model

In the 2-state Markov chain model the loss process is modelled as a discrete-time Markov chain with two states. The current state Xₙ of the stochastic process depends only on the previous value Xₙ₋₁. Unlike the Bernoulli model, this model is able to capture the dependence between consecutive losses, at the cost of an additional parameter. The model is characterized by two parameters, p and q, the transition probabilities between the two states: p = P[Xₙ = 1 | Xₙ₋₁ = 0] and q = P[Xₙ = 0 | Xₙ₋₁ = 1].

The maximum likelihood estimators (see [2]) of p and q for a sample trace are

    p̂ = n₀₁ / n₀    (5)
    q̂ = n₁₀ / n₁    (6)

where n₀₁ is the number of times in the observed time series that a 1 follows a 0 and n₁₀ is the number of times a 0 follows a 1; n₀ is the number of times a 0 occurs in the trace and n₁ is the number of times a 1 occurs in the trace. The good run length distribution for this model is

    P[G = k] = (1 − p)^(k−1) p,   k = 1, 2, …

The loss run length distribution for this model is

    P[L = k] = (1 − q)^(k−1) q,   k = 1, 2, …

Note that if p equals 1 − q, this 2-state Markov chain model is equivalent to the Bernoulli model, since it means that the probability of loss is independent of which state the process is in. Also, the relative value of p and 1 − q provides us information about how bursty the losses are. If 1 − q > p then a lost packet is more likely when the previous packet was lost than when the previous packet was successfully received. On the other hand, 1 − q < p indicates that loss is more likely if the previous packet was not lost.
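The estimators (5) and (6) can be sketched as follows (an illustration, not the paper's code); counts are taken over consecutive pairs of the series:

```python
def estimate_two_state(series):
    """MLE of the 2-state Markov chain parameters from a binary loss series.

    p = P[loss | previous packet received] = n01 / n0,
    q = P[no loss | previous packet lost]  = n10 / n1.
    """
    n0 = n1 = n01 = n10 = 0
    for prev, cur in zip(series, series[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur          # a 1 follows a 0
        else:
            n1 += 1
            n10 += 1 - cur      # a 0 follows a 1
    p = n01 / n0 if n0 else 0.0
    q = n10 / n1 if n1 else 0.0
    return p, q
```

For example, the series 0, 0, 1, 1, 0 gives p̂ = 1/2 and q̂ = 1/2.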


4.1.3 Markov Chain Model of k-th Order

A class of random processes that is rich enough to capture a large variety of temporal dependencies is the class of finite-state Markov processes. The Bernoulli model and the 2-state Markov chain model are special cases of this class of models. In the Markov chain model, the current state of the process depends on a certain number of previous values of the process. This number of previous values is the order of the process. Such a process is characterized by its order k, and by a conditional probability matrix whose rows may be interpreted as probability mass functions on {0, 1}, according to which the next random variable is generated when the process is in a given state.

A process {Xₙ} is a Markov chain of order k if the conditional probability of Xₙ given all the previous values is independent of Xₙ₋ⱼ for all j > k (see [3]). Note that the Bernoulli and the 2-state Markov chain model are special cases of the Markov chain model corresponding to orders of 0 and 1 respectively. Let x₁, …, xₙ be an observed sequence, or a sample path, from a Markov source. The k-th order state transition probabilities of the Markov chain can be estimated as follows. The number of states is 2^k and the number of conditional probabilities is 2^(k+1).

Let s be a given state of the chain, that is, a k-tuple of previous values. Let n(s, x) be the number of times state s is followed by the value x in the sample sequence, and let n(s) be the number of times state s is seen. Let m̂(x | s) be an estimate of the probability that Xₙ = x, given that the previous k values form the state s; then m̂(x | s) estimates the state transition probability out of state s. The maximum likelihood estimators of the state transition probabilities of the k-th order Markov chain are

    m̂(x | s) = n(s, x) / n(s)   if n(s) > 0
    m̂(x | s) = 1/2              otherwise (an uninformative default for states that do not occur in the sample)
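A sketch of this estimator (illustrative, with the same default for unseen states):

```python
from collections import defaultdict

def estimate_kth_order(series, k):
    """MLE of a k-th order Markov chain over {0, 1}: for each observed
    state s (a k-tuple of previous values), n(s, 1) / n(s), the
    estimated probability that the next packet is lost."""
    counts = defaultdict(lambda: [0, 0])   # s -> [n(s, 0), n(s, 1)]
    for i in range(k, len(series)):
        s = tuple(series[i - k:i])
        counts[s][series[i]] += 1
    return {s: n1 / (n0 + n1) for s, (n0, n1) in counts.items()}

def transition_prob(probs, state):
    """Estimated P[next value = 1 | state]; 1/2 for states never seen."""
    return probs.get(state, 0.5)
```

On the alternating series 0, 1, 0, 1, 0, 1 with k = 1, the estimate is 1.0 after a 0 and 0.0 after a 1.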

4.2 Results for the Stationary Segments


The autocorrelation function of the binary loss data, as discussed in Section 3.2, shows some correlation for small lags. The Bernoulli model does not capture any of the correlation seen, for example, in Fig. 2. The 2-state Markov chain model is able to accurately model the correlation only at a lag of 1. The estimated order of the binary loss process indicates the complexity of the Markov chain model necessary to accurately model the data. If the estimated order is 0, then the Bernoulli loss model is accurate; if it is 1, then the 2-state Markov chain model is accurate; and if it is 2 or greater, then a Markov chain model of that order is required. Testing Hypothesis 1 (that the binary loss data is independent at a given lag) is discussed in Section 3.2. A test statistic for testing the hypothesis using the sample autocorrelation function is also discussed in Section 3.2. An alternative method to test the same hypothesis for independence of the binary loss data at a given lag is the Chi-Square Test for independence ([12]). The test statistic for a lag of k is

    χ² = Σᵢ Σⱼ (nᵢⱼ − nᵢnⱼ/n)² / (nᵢnⱼ/n),   i, j ∈ {0, 1}

where nᵢ is the number of times i occurs in the sample sequence and, similarly, nⱼ is the number of times j occurs in the sample sequence. nᵢⱼ is the number of times j follows i after a lag of k. The statistic χ² is distributed as a chi-square distribution with 1 degree of freedom. The order of the Markov chain process can be estimated by estimating the minimum lag beyond which the process is independent. This is similar to the estimation of the correlation timescale from the sample autocorrelation function as discussed in Section 3.2. Thus we test Hypothesis 1 for increasing lags. Let l be the first lag at which the hypothesis is not rejected; then the estimated order is l − 1. Fig. 11 shows the estimated order of the Markov chain for the stationary segments of all the datasets. The two methods of testing the independence hypothesis (using the sample autocorrelation function and using the chi-square test for independence) give the same results. Figure 11 shows that the Bernoulli loss model is accurate for some of the segments, the 2-state Markov chain model is accurate for others, and for the rest of the segments Markov chain models of higher orders are necessary for accurate modelling. The estimated order for datasets 20Dec97.uni, 21Dec97.uni and 26Jun98.uni is much higher than that estimated for datasets 20Nov97.uni and 12Dec97.multi. We also checked the good run and loss run lengths for independence. It was found that, in most cases, as discussed in Section 3.3, the good run and the loss run lengths are, indeed, independent. Thus the good runs and the loss runs can be modelled separately as IID random variables. Both the Bernoulli and the 2-state Markov chain model model the good run and loss run processes as being independent. Next we look at the distributions of the good run and the loss run lengths and check how well the two models
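The test statistic above can be sketched as follows (an illustration, not the paper's code); with 1 degree of freedom, the 5% critical value is about 3.84:

```python
def chi_square_lag(series, lag):
    """Chi-square statistic for independence of X_t and X_{t+lag}.

    Compares the observed pair counts n_ij against the counts
    n_i * n_j / n expected under independence, over all pairs of
    values separated by `lag` positions.
    """
    pairs = list(zip(series, series[lag:]))
    n = len(pairs)
    stat = 0.0
    for i in (0, 1):
        ni = sum(1 for a, _ in pairs if a == i)
        for j in (0, 1):
            nj = sum(1 for _, b in pairs if b == j)
            nij = sum(1 for a, b in pairs if a == i and b == j)
            expected = ni * nj / n
            if expected > 0:
                stat += (nij - expected) ** 2 / expected
    return stat
```

A strictly alternating series, for example, is strongly dependent at lag 1 and produces a statistic far above the critical value.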

[Figure 11: Frequency Distribution of the Order of the Markov Chain Model (x-axis: order of the Markov chain; y-axis: frequency)]

predict the empirical distributions. Both the Bernoulli model and the 2-state Markov chain model predict that the good run and the loss run lengths are distributed according to geometric distributions. As discussed in Section 3.3, the good run length distribution was found to be approximately geometric for traces 20Nov97.uni and 12Dec97.multi, which means that the 2-state Markov chain model is suitable for these traces. However, since the Chi-Square goodness-of-fit test shows that the exponential distribution is not accurate for one-third of the segments, a Markov chain model with more states would more accurately model the good run lengths. The loss run lengths were found to be geometrically distributed in all cases. The traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni show a greater degree of correlation. The good run lengths are clearly not geometrically distributed, as shown in Fig. 9, although the loss run lengths are geometrically distributed. Thus we conclude that neither the Bernoulli nor the 2-state Markov chain model is accurate for these traces, and a greater number of states is necessary to accurately model the loss process. For the stationary segments of 20Nov97.uni the estimated loss rate varies between 0.0005 and 0.0351 with an average of 0.0162, p varies between 0.0004 and 0.0347 with an average of 0.0158, and 1 − q varies between 0.0208 and 0.0909 with an average of 0.0471. Table 3 summarizes the estimated model parameters (as discussed in Section 4.1). It shows the minimum, maximum and average over all stationary segments for each dataset. For trace segments where the value of 1 − q is higher than the value of p, the packet loss is burstier than predicted by the Bernoulli model and the 2-state Markov chain model is more accurate.

5 On-Line Parameter Estimation


Adaptive applications monitor the congestion level in the network in order to adjust their transmission rate, to keep the loss rate low and to share the available bandwidth fairly with other connections. We address two issues in the on-line estimation of loss for such applications. The first issue is the amount of memory necessary for an accurate estimate of the Bernoulli model parameter (discussed in Section 5.1) and the 2-state Markov chain model parameters (discussed in Section 5.2). The second issue is whether to use exponential smoothing or a sliding window average


Table 3: Parameters for the Bernoulli and the 2-state Markov Chain Models

    Data                      loss rate r    p         1 − q
    20Nov97.uni    minimum    0.0005         0.0004    0.0208
                   maximum    0.0351         0.0347    0.0909
                   average    0.0162         0.0158    0.0471
    12Dec97.multi  minimum    0.0376         0.0373    0.0408
                   maximum    0.0735         0.0749    0.0636
                   average    0.0496         0.0496    0.0513
    20Dec97.uni               0.0168         0.0141    0.1732
    21Dec97.uni               0.0135         0.0109    0.2085
    26Jun98.uni    minimum    0.0204         0.0178    0.1452
                   maximum    0.0254         0.0218    0.1608
                   average    0.0222         0.0192    0.1546

to get up-to-date and accurate average loss rate estimates. This is discussed in Section 5.3.

5.1 Memory for the Bernoulli Model Parameter


Equation (3) gives an estimator for the single Bernoulli loss model parameter, r, which is equivalent to the mean loss rate. Assume that n values of the binary loss data are available for estimation of the parameter r. Henceforth, n will be referred to as the memory size of the estimator. The confidence interval for this estimator is

    r̂ ± z(α/2) √( r̂ (1 − r̂) / n )

Fig. 12 shows the estimate of the mean loss rate for the example trace segment 20Nov97.uni-1 for memory sizes of 500, 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated mean loss rate, which is substantial for the smaller memory sizes. For an accuracy of ε in the estimator, and a confidence level of 1 − α, the number of observations n that are required is

    n = z(α/2)² r (1 − r) / ε²    (7)

Note that this depends on the loss rate r: for a fixed absolute accuracy, the required memory size varies with the loss rate, and a loss rate estimated to within a small relative error requires a correspondingly large memory.
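A small calculation in the spirit of (7); the value z = 1.96 for 95% confidence is an illustrative assumption:

```python
import math

def required_memory(r, eps, z=1.96):
    """Observations needed so that the confidence interval half-width
    for a Bernoulli proportion r is at most eps (z = 1.96 for 95%).

    Solves z * sqrt(r * (1 - r) / n) <= eps for n.
    """
    return math.ceil(z * z * r * (1 - r) / (eps * eps))
```

For example, `required_memory(0.5, 0.05)` gives 385 observations, and tightening the accuracy always increases the requirement.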

5.2 Memory for the 2-State Markov Chain Model Parameters


For the 2-state Markov chain model, there are two parameters to be estimated: p, as specified by (5), and q, as specified by (6). Instead of considering the parameter q directly, consider the related parameter s = 1 − q, the probability that a lost packet is followed by another loss. Assume that a memory of n values of the binary loss data is available for estimation of the parameters. Since p is estimated only from the positions that follow a 0, and s only from the positions that follow a 1, the effective number of observations for each parameter is only a fraction of n. The confidence intervals for the estimators are

    p̂ ± z(α/2) √( p̂ (1 − p̂) / n₀ )   and   ŝ ± z(α/2) √( ŝ (1 − ŝ) / n₁ )

 

   


Fig. 13 shows the estimate of the value of p for the example trace segment 20Nov97.uni-1 for memory sizes of 1000 and 10000, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated value. Similarly, Fig. 14 shows the estimate of the value of s for the example trace segment. The estimate of s fluctuates wildly, especially for the smaller memory size. For an accuracy of ε in the estimator of parameter p, with a confidence level of 1 − α, the number of observations required is

    n ≈ z(α/2)² p (1 − p) / ( ε² (1 − r) )

since only the packets that follow a successfully received packet contribute to the estimate of p. Note that this depends on the fraction of packets lost, r. For an absolute permissible error of ε in estimating s, with a confidence level of 1 − α, the memory required is

    n ≈ z(α/2)² s (1 − s) / ( ε² r )

since only the lost packets contribute observations for s; for low loss rates this is much larger than the memory required to estimate the mean loss rate.
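The dependence on the loss rate can be illustrated with a calculation of the same form (a reconstruction for illustration, not quoted from the paper): only a fraction r of packets follow a loss, so the total memory is the per-observation requirement inflated by 1/r.

```python
import math

def memory_for_loss_state_param(s, loss_rate, eps, z=1.96):
    """Packets needed so the CI half-width for s = P[loss | previous loss]
    is at most eps, assuming only a fraction `loss_rate` of packets
    contribute an observation of s."""
    per_obs = z * z * s * (1 - s) / (eps * eps)
    return math.ceil(per_obs / loss_rate)
```

With a 2% loss rate, for instance, the requirement is fifty times that of a parameter observed at every packet.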

5.3 Exponential Smoothing versus Sliding Window Averaging


Adaptive applications need to estimate the average loss rate in an on-line manner in order to adjust to the congestion in the network. There are two natural ways to estimate the loss rate on-line: exponential smoothing and sliding window averaging. The sliding window average gives the sample mean over a sliding window, say of size N samples. Thus only the latest N samples contribute to the current estimate, ensuring that the older information is not


[Figure 12: Estimating the mean loss rate for 20Nov97.uni-1 using sliding window averaging for memory sizes of 500, 1000 and 10000]

[Figure 13: Estimating the parameter p for 20Nov97.uni-1 for memory sizes of 1000 and 10000]

[Figure 14: Estimating the parameter s for 20Nov97.uni-1 for memory sizes of 1000 and 10000]

used. Restricting the memory size is important because of possible non-stationarities in the loss data. Exponential smoothing is a computationally quicker method for estimating the current mean and it also requires less buffer space. The question we address here is whether sliding window averaging gives a significantly better estimate of the loss rate, in terms of both the variance of the estimator and the memory required to get an accurate estimate. As we will see, the sliding window average is, indeed, a significantly better estimator. For the same effective memory size, exponential smoothing produces an estimate with more than double the variance of the estimate produced by a sliding window estimate, and for the same variance, it requires more than twice as much memory. In the analysis of the variance of the estimate we have assumed, for simplicity, that the loss process is IID (that is, we assume the Bernoulli model). In the presence of correlation, the variance of the sample mean estimator is higher than that calculated assuming independence. In order to compare the two styles of on-line mean estimation, we first compute the effective memory size in the case of exponential smoothing. Consider the loss data as a binary time series {xₜ}. Then for a gain of g, the exponential smoothing estimator at time t is

    μ̂ₜ = g xₜ + (1 − g) μ̂ₜ₋₁

Thus all the previous samples are used, but the weights given to the old samples decay geometrically with the age of the sample relative to the latest sample. The weight given to the observation xₜ₋ⱼ at time t, where j ≥ 0, is g(1 − g)ʲ, and the total weight given to all observations as old as or older than xₜ₋ₙ is

    Σ_{j ≥ N} g (1 − g)ʲ = (1 − g)ᴺ

The effective memory, N_eff, is defined as the N for which this tail weight is restricted to a small fraction a, 0 < a < 1. That is, N_eff is the N such that (1 − g)ᴺ = a, which gives

    N_eff = ln(a) / ln(1 − g) ≈ −ln(a) / g   (for small values of g)    (8)
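The relation (8) between the gain and the effective memory can be sketched as follows; the tail fraction a = 0.01 is an illustrative choice:

```python
import math

def effective_memory(gain, a=0.01):
    """Effective memory of exponential smoothing with the given gain:
    the age N at which the tail weight (1 - gain)**N drops to the
    fraction a; approximately -ln(a)/gain for small gains."""
    return math.log(a) / math.log(1.0 - gain)

def gain_for_memory(n_eff, a=0.01):
    """Inverse relation: the gain whose tail weight reaches a at age n_eff."""
    return 1.0 - a ** (1.0 / n_eff)
```

The two functions are inverses of one another, so a target effective memory can be converted to a gain and back.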

An important property of an estimator is its relative efficiency, defined as its variance relative to that of the sliding window average (which is equivalent to the sample mean estimator). Let the population mean be μ and let the sample mean estimator (that is, the sliding window average over N samples) be μ̂. Then the variance of the sample mean estimator is

    Var[μ̂] = σ² / N

where σ² is the population variance. The variance of the exponential smoothing estimator is

    Var[μ̂_exp] = σ² g / (2 − g)



The relative efficiency of the exponential smoothing estimator (relative to the sliding window average) is the ratio of the variances:

    ( σ² g / (2 − g) ) / ( σ² / N ) = g N / (2 − g)

The gain g can be calculated for a given effective memory size using (8). This means that the relative efficiency of the exponential smoothing estimator, with the effective memory N_eff equal to the memory N of the sliding window average, is approximately 2. This implies that we see twice as much variance with the exponential smoothing estimator as we do with sliding window averaging. This difference in variance can be seen in Fig. 15, which compares the estimation of the means for the example trace segment 20Nov97.uni-1 for a memory size of 2000 samples. The mean estimator values are plotted versus the sequence number of the probe packets: the straight line is the overall mean, the solid line is the sliding window average and the dotted line is the exponential smoothing estimator values. Similarly, Fig. 16 compares the exponential smoothing estimator with effective memory 2000 to the sliding window averaging estimator with memory 1000. Here the variance is almost the same, since the exponential smoothing uses double the memory. Yet another way of comparing exponential smoothing and sliding window averaging is to set the gain of the exponential smoothing algorithm and the window size of the sliding window average such that the variances of the estimators are equal. In this case, the exponential smoothing algorithm gives a substantial fraction of its total weight to samples older than the corresponding window size of the sliding window averaging algorithm. Thus, in order to achieve the same variance, the exponential smoothing algorithm constructs a weighted average that relies significantly on samples beyond the window.
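The variance penalty of exponential smoothing can be checked by simulation under the IID assumption used in the analysis. The loss rate, window size and tail fraction below are illustrative choices, and the exact variance ratio depends on the tail fraction a used to match the effective memory:

```python
import random
import statistics

def compare_estimators(p=0.02, window=500, a=0.01, n=200000, seed=1):
    """Simulate IID Bernoulli losses and compare the variability of a
    sliding window average against exponential smoothing whose gain is
    chosen so that its effective memory (tail weight a) equals window."""
    random.seed(seed)
    gain = 1.0 - a ** (1.0 / window)       # (1 - gain)**window == a
    series = [1 if random.random() < p else 0 for _ in range(n)]
    ewma = p                               # start at the true mean
    win_sum = sum(series[:window])
    ewma_vals, win_vals = [], []
    for t in range(window, n):
        ewma = gain * series[t] + (1 - gain) * ewma
        win_sum += series[t] - series[t - window]
        ewma_vals.append(ewma)
        win_vals.append(win_sum / window)
    return statistics.variance(ewma_vals), statistics.variance(win_vals)
```

With these settings the exponential smoothing estimate shows roughly double the variance of the sliding window estimate, consistent with the analysis above.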


Figure 15: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window average (memory 2000)

Figure 16: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing (effective memory 2000) and sliding window average (memory 1000)

6 Conclusions and Future Work


We have presented measurements of Internet packet loss for unicast and multicast connections. We analyzed the traces for stationarity and identified the stationary trace segments within them. The loss data was represented both as a binary time series and as alternating sequences of good runs and loss runs. We checked the data for temporal dependence using both representations. Our study of the autocorrelation function of the binary time series representation revealed that the correlation timescale for these traces is short. The distributions of the good run lengths and the loss run lengths were found to be approximately geometric for most of the traces. For the 20Dec97.uni, 21Dec97.uni and 26Jun98.uni traces, the good run lengths were found not to be geometrically distributed, while the loss run lengths were geometrically distributed. We examined the accuracy of three models, the Bernoulli model, the 2-state Markov chain model and the Markov chain model of k-th order, in terms of our analysis of temporal dependence. For the 20Nov97.uni and 12Dec97.multi datasets, the Bernoulli model was found to be accurate for some segments, the 2-state Markov model for others, and Markov chain models of higher orders for the remainder. The Bernoulli and the 2-state Markov chain models are not accurate for the 20Dec97.uni, 21Dec97.uni and 26Jun98.uni traces, for which the estimated order of the Markov chain model is much higher. For on-line loss estimation, we computed the required memory size for a given accuracy in the parameter estimate, for both the average loss rate and the 2-state Markov chain model parameters. Also, we found that using the sliding window average for average loss rate estimation has advantages over exponential smoothing, since it uses less of the old information for the same variance in the estimator. A richer collection of measurements, with different sampling intervals and a variety of senders and receivers, would allow a better understanding of the temporal dependence in the loss data. It would also be worthwhile to apply the models to protocol design and performance analysis.

Acknowledgments
We would like to thank Supratik Bhattacharyya, Timur Friedman, Christopher Rafael, Sanjeev Khudanpur and Joseph Horowitz for fruitful discussions and mathematical insights. We are grateful to Chuck Cranor of Washington University, St. Louis, Stephen Pink of the Swedish Institute of Computer Science, Linda Prime of the University of Washington, Peter Wan of the Georgia Institute of Technology and Lixia Zhang of the University of California, Los Angeles for providing computer accounts which allowed us to take the measurements.

References
[1] Andren, J., and Hilding, M. Real-time Determination of Traffic Conditions over the Internet. Master's thesis, Chalmers University of Technology, Gothenburg, Sweden, Jan. 1998.
[2] Billingsley, P. Statistical Inference for Markov Processes. The University of Chicago Press, 1961.
[3] Billingsley, P. Statistical Methods in Markov Chains. Ann. Math. Stat. 32 (1961), 12-40.
[4] Bolot, J. End-to-end Packet Delay and Loss Behavior in the Internet. In SIGCOMM Symposium on Communications Architectures and Protocols (San Francisco, Sept. 1993), pp. 289-298.


[5] Bolot, J., Crepin, H., and Garcia, A. V. Analysis of Audio Packet Loss in the Internet. In Proceedings of the International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV) (Durham, New Hampshire, Apr. 1995), pp. 163-174.
[6] Bolot, J., and Garcia, A. V. Control Mechanisms for Packet Audio in the Internet. In Proceedings of the Conference on Computer Communications (IEEE Infocom) (San Francisco, CA, Apr. 1996), pp. 232-239.
[7] Bolot, J., and Garcia, A. V. The Case for FEC-Based Error Control for Packet Audio in the Internet. In Proceedings of ACM International Conference on Multimedia (Sept. 1996).
[8] Bolot, J., and Turletti, T. Adaptive Error Control for Packet Video in the Internet. In ICIP (Lausanne, Switzerland, Sept. 1996).
[9] Bolot, J., and Turletti, T. Experience with Rate Control Mechanisms for Packet Video in the Internet. In SIGCOMM Symposium on Communications Architectures and Protocols (Jan. 1998), pp. 45.
[10] Brockwell, P., and Davis, R. Introduction to Time Series and Forecasting. Springer-Verlag, 1996.
[11] Handley, M. An Examination of MBone Performance. Tech. Rep. ISI/RR-97-450, USC/ISI, January 1997.
[12] Levin, R. Statistics for Management. Prentice-Hall, Inc., 1987.
[13] Maxemchuk, N., and Lo, S. Measurement and Interpretation of Voice Traffic on the Internet. In Conference Record of the International Conference on Communications (ICC) (Montreal, Canada, June 1997).
[14] Yajnik, M., Kurose, J., and Towsley, D. Packet Loss Correlation in the MBone Multicast Network: Experimental Measurements and Markov Chain Models. Tech. Rep. UMASS CMPSCI 95-115, University of Massachusetts, Amherst, MA, 1995.
[15] Yajnik, M., Kurose, J., and Towsley, D. Packet Loss Correlation in the MBone Multicast Network. In IEEE Global Internet Mini-conference, part of GLOBECOM '96 (London, England, Nov. 1996), pp. 94-99.
