Maya Yajnik, Sue Moon, Jim Kurose and Don Towsley
Department of Computer Science, University of Massachusetts, Amherst, MA 01003 USA
emails: {yajnik, sbmoon, kurose, towsley}@cs.umass.edu
UMASS CMPSCI Technical Report # 98-78

Abstract

Understanding and modelling packet loss in the Internet is especially relevant for the design and analysis of delay-sensitive multimedia applications. In this paper, we present an analysis of many hours of end-to-end unicast and multicast packet loss measurements, from which we selected the stationary trace segments for further analysis. We consider the dependence as seen in the autocorrelation function of the original loss data as well as the dependence between good run lengths and loss run lengths. The correlation timescale is found to be 1 second or less. We evaluate the accuracy of three models of increasing complexity: the Bernoulli model, the 2-state Markov chain model and the $k$-th order Markov chain model. Of the trace segments considered, the Bernoulli model was found to be accurate for some segments and the 2-state model for others; a Markov chain model of higher order was found to be necessary to accurately model the rest of the segments. For the case of adaptive applications which track loss, we address two issues of on-line loss estimation: the required memory size, and whether to use exponential smoothing or a sliding window average to estimate the average loss rate. We find that a large memory size is necessary and that the sliding window average provides a more accurate estimate for the same effective memory size.

Keywords: measurement, modelling, loss, temporal dependence, correlation.
This work was supported by the National Science Foundation under grant NCR 9508274 and by the Defense Advanced Research Projects Agency under agreement F30602-98-2-0238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
1 Introduction
Packet loss is a key factor determining the quality seen by delay-sensitive multimedia applications such as audio/video conferencing and Internet telephony. These applications experience degradation in quality with increasing loss and delay in the network. Understanding the loss seen by such applications is important in their design and performance analysis. Adaptive audio/video applications adjust their transmission rate according to the perceived congestion level in the network (see [6, 8, 9]). By this adjustment a suitable loss level can be maintained and bandwidth can be shared fairly between connections. For such adaptive applications it is important to have simple loss models that can be parameterized in an on-line manner. In this paper, we present a careful analysis of end-to-end loss using many hours of measurements. The measurements are of loss as seen by packet probes sent at regular intervals (of 20, 40, 80 and 160 ms) on both unicast and multicast Internet connections. Since significant non-stationary effects are seen in the data, we divide the traces into two-hour segments and check each segment for stationarity. Gradual decreases and increases in the mean loss rate, abrupt and dramatic increases in loss rate lasting a few minutes, and spikes of high loss rate are observed. We analyze the stationary trace segments to determine the extent of temporal dependence in the data. The results show that the correlation timescale, the time over which what happens to one packet is connected to what happens to another packet, is approximately 1 second or less. Beyond this timescale, packet losses are independent of each other. We also consider loss models of increasing complexity and estimate the level of complexity necessary to accurately model the stationary trace segments. The loss process is modelled as a discrete-time Markov chain with $2^k$ states, where $k$ is referred to as the order of the Markov chain; $k = 0$ corresponds to the Bernoulli model and $k = 1$ corresponds to the 2-state Markov chain model.
For the datasets with a sampling interval of 160 ms, the Bernoulli model was found to be accurate for some segments, the 2-state Markov model for others, and Markov chain models of higher order for the remainder. The Bernoulli and the 2-state Markov chain models are not accurate for the traces with sampling intervals of 20 and 40 ms, for which the estimated order of the Markov chain model is considerably higher. Adaptive multimedia applications often track the congestion level in the network in order to adjust the rate of transmission, to share the bandwidth fairly and to maintain a low enough loss level ([6, 9]). They can use information about loss in the network over a period of time, instead of the traditional per-packet loss information. For such situations of on-line loss estimation, it is useful to know how much past information is necessary to accurately estimate the parameters of the loss models. The non-stationarity observed in the data shows that the loss rate does vary over time, sometimes dramatically, and thus up-to-date information is important. Because of the need of adaptive applications for loss rate estimates, we address the question of how much memory is necessary for the on-line parameter estimator to provide an accurate enough estimate. In general, it is found that the variance in the estimator due to a limited number of samples is high, and thus large numbers of samples are required to get an accurate estimate. We also address the question of whether or not exponential smoothing provides a better estimate of the mean loss rate than the sliding window average. Though computationally quicker and requiring less buffer space, exponential smoothing needs more than twice as many samples to provide an estimate with the same accuracy as the sliding window average. Earlier works on the measurement of unicast and multicast packet loss in the Internet ([1, 4, 5, 14, 15]) have noted that the number of consecutively lost packets is small.
In [15] long outages of several seconds and minutes were also observed on the MBone (multicast backbone network overlaid on the Internet). In [13] measurements of voice traffic are analyzed to assess the effects of strategies to compensate for varying delay and loss on the quality of the voice connection. In [1] statistical analysis of temporal correlation in loss data has been described and weak correlation has been noted. The use of discrete-time Markov chain models, particularly the 2-state Markov chain
Table 1: Trace Descriptions

#   Date      Type       Sampling Interval   Destination               Time    Duration    Loss Rate
1   14Nov97   unicast    80ms                SICS, Sweden              09:52   8hr         2.70%
2   14Nov97   multicast  80ms                SICS, Sweden              09:53   8hr         11.03%
3   20Nov97   unicast    160ms               SICS, Sweden              16:03   2days       1.91%
4   12Dec97   multicast  160ms               St. Louis, Missouri       14:23   2days       5.24%
5   20Dec97   unicast    20ms                Seattle, Washington       13:41   2hr 27min   1.69%
6   20Dec97   multicast  20ms                Seattle, Washington       13:49   2hr 27min   3.76%
7   21Dec97   unicast    20ms                Los Angeles, California   13:17   2hr         1.38%
8   21Dec97   multicast  20ms                Los Angeles, California   13:26   2hr         3.37%
9   26Jun98   unicast    40ms                Atlanta, Georgia          16:56   6hr         2.22%
model (sometimes called the Gilbert model), has been proposed in [14, 7, 8, 11]. Discrete-time Markov chain models of increasing levels of complexity, including the 2-state Markov chain model, have been described in [14]. This paper takes a more careful look at both temporal dependence and the validity of using various models for packet loss, using the stationary portions of our measurements. The remainder of this paper is structured as follows. In Section 2 we describe how the data was collected. In Section 3 we first describe two ways of representing the data and then go on to discuss our analysis of stationarity and the temporal dependence in the data. The models and an evaluation of their accuracy are discussed in Section 4. In Section 5 we discuss the memory size required for on-line estimation of model parameters and also the relative suitability of exponential smoothing and sliding window averaging as ways of estimating the average loss rate. Finally, the conclusions are given in Section 6.
2 Measurement
Our study of end-to-end loss in the Internet is based on many hours of traces. These traces were gathered by sending packet probes along unicast and multicast end-to-end connections at periodic intervals (of 20, 40, 80 or 160 ms) and recording the sequence numbers of the probe packets that arrived successfully at the receiver. The packets whose sequence numbers were not recorded were assumed to have been lost. Thus the loss data can be considered to be a binary time series $\{x_i\}$ where $x_i$ takes the value 0 if the probe packet arrived successfully and the value 1 if it was lost. The interval at which the probe packets are sent out into the network is referred to as the sampling interval for the rest of the paper. The measurements are summarized in Table 1. The source was located at Amherst, Massachusetts for all the traces. Note that one multicast trace (not shown in the table) displayed an extremely high loss rate and was therefore disregarded for analysis. We refer to the datasets using the notation Date.Type (for example, 20Nov97.uni).
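The construction of the binary loss series from the receiver's log can be sketched as follows. This is a minimal illustration; the function and variable names are our own, and we use the paper's convention that 1 marks a lost probe.

```python
def loss_series(received_seqnos, num_probes):
    """Binary loss time series: entry i is 0 if probe i arrived, 1 if it was lost.

    received_seqnos: sequence numbers recorded at the receiver.
    num_probes: total number of probes sent by the source.
    """
    got = set(received_seqnos)
    return [0 if i in got else 1 for i in range(num_probes)]

# Probes 2 and 4 never showed up at the receiver:
x = loss_series([0, 1, 3], 5)
print(x)  # [0, 0, 1, 0, 1]
```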
Figure 1: Smoothed loss percentage over time for the third segment of trace 14Nov97.uni
3 Analysis
We consider two ways of representing the loss data. The first is as a binary time series $\{x_i\}$ where $x_i$ takes the value 0 if the probe packet arrived successfully and the value 1 if it was lost; the time series is a discrete-valued time series which takes on values in the set $\{0, 1\}$. A second way of representing the data is as alternating sequences of good runs and loss runs. The trace can be divided into portions of consecutive 0s and portions of consecutive 1s. The good runs are the lengths (expressed in number of packets) of the portions of the trace which consist solely of consecutive 0s. Similarly, the loss runs are the lengths of the portions of the trace which consist solely of consecutive 1s. Thus the data can be considered to be the interleaving of the two sequences of observations $\{g_j\}$ and $\{l_j\}$, where $g_j$ is the $j$-th good run length and $l_j$ is the corresponding loss run length. The remainder of this section is structured as follows. We discuss our analysis of the traces for stationarity in Section 3.1. We find that a substantial number of the two-hour segments exhibit stationarity. In Section 3.2 we focus on the temporal dependence in these stationary trace segments as shown by the sample autocorrelation function of the binary time series representation of the data. We discuss our analysis of the temporal dependence using the good run and loss run representation of the data in Section 3.3.
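Splitting the binary series into the alternating good-run and loss-run representation can be sketched like this (illustrative code, with 1 taken to mean a lost packet; names are ours):

```python
from itertools import groupby

def run_lengths(x):
    """Return (good_runs, loss_runs): lengths of maximal runs of 0s and of 1s."""
    good, loss = [], []
    for value, group in groupby(x):
        (loss if value == 1 else good).append(sum(1 for _ in group))
    return good, loss

good, loss = run_lengths([0, 0, 1, 0, 0, 0, 1, 1])
print(good, loss)  # [2, 3] [1, 2]
```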
3.1 Stationarity
Looking at the smoothed trend of the loss data, we found obvious non-stationarity in most of the traces. Therefore, the traces were divided into approximately two-hour segments and each segment was checked for stationarity. For the rest of the paper we refer to the datasets using the notation Date.Type-Segment# (for example, 14Nov97.uni-3). A time series is said to be stationary in the strict sense if its statistical properties remain constant over the entire series. A time series is said to be stationary in the wide sense (also called weak stationarity) if the mean and covariance function remain constant over time. Since there is no good way to rigorously test for stationarity,
Table 2: Stationarity

Dataset   Sampling Interval   Segments
1         80ms                4
2         80ms                4
3         160ms               24
4         160ms               25
5         20ms                1
6         20ms                1
7         20ms                1
8         20ms                1
9         40ms                3
we checked whether the average loss rate varied significantly within the trace segments. We plotted the smoothed trace using a finite moving average filter to judge the extent to which the average loss rate varied over the length of the trace. Abrupt increases in the average loss rate seen in the smoothed trace were taken to be an indication of non-stationarity. Also, to test for gradual increase or decrease in the average loss rate, we fit a straight line to the data (a straight line which minimizes the least-square error) as an estimate of the linear trend. A sufficiently large total change in the average loss rate over a two-hour trace segment, as estimated by the least-squares straight line, was considered to be an indication of non-stationarity. Different kinds of non-stationary effects were observed. For example, consider the smoothed loss for the third segment of trace 14Nov97.uni in Fig. 1. In the first half of the segment there is a slow, linear decay in the loss percentage over one hour. Such slow decays and increases were noticed in other traces as well. Also, in this case there is an abrupt, dramatic increase in the loss rate which lasts for approximately 10 minutes before abruptly dropping to a lower loss rate. Such abrupt increases were noticed in other traces, although this trace segment shows the most dramatic increases, which last the longest time. Table 2 summarizes the results of our analysis for stationarity. Traces 20Dec97.multi and 21Dec97.multi exhibit periodic behavior, making them difficult to analyze and model. Thus we identified a subset of the two-hour segments as exhibiting stationary behavior. The stationary segments of five traces, 20Nov97.uni, 12Dec97.multi, 20Dec97.uni, 21Dec97.uni and 26Jun98.uni, have been analyzed for temporal dependence in the following sections.
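The two stationarity checks described above, a moving-average smoothing of the loss series and a least-squares linear trend fit, can be sketched as follows. The window size and synthetic data are placeholders of our own, since the paper's exact values did not survive extraction.

```python
def moving_average(x, window):
    """Smooth the binary loss series with a finite moving-average filter."""
    out, running = [], 0.0
    for i, v in enumerate(x):
        running += v
        if i >= window:
            running -= x[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out

def linear_trend(y):
    """Least-squares slope and intercept of y against its index 0..n-1."""
    n = len(y)
    mean_i = (n - 1) / 2.0
    mean_y = sum(y) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxy = sum((i - mean_i) * (yi - mean_y) for i, yi in enumerate(y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_i

# Total change in average loss rate over a segment, as estimated by the fit:
y = [0.02 + 0.001 * i for i in range(100)]  # synthetic, linearly rising loss rate
slope, intercept = linear_trend(y)
total_change = slope * (len(y) - 1)
```

A segment would then be flagged non-stationary when `total_change` or any abrupt jump in the smoothed trace exceeds the chosen thresholds.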
The sample autocorrelation function of the observed data $x_1, \ldots, x_n$ is defined as

$$\hat{\rho}(k) = \frac{\sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (1)$$

where $\bar{x}$ is the mean and $k$ is the lag. The sample mean for observed data $x_1, \ldots, x_n$ is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. The lag can also be expressed in terms of time. That is, let

$$\tau = k \Delta \qquad (2)$$

where $\Delta$ is the sampling interval. The correlation timescale, $\tau_c$, is the minimum lag, in terms of time, such that the time series is uncorrelated at all lags $\tau \geq \tau_c$. In the case of an independent stochastic process, the autocorrelation function is zero at every nonzero lag: if the stochastic process is independent at a lag $k$, then the autocorrelation function $\rho(k)$ is zero at that lag. However, in the case of an observed sample sequence which is the realization of an independent stochastic process, the sample autocorrelation function could have non-zero, though very small, values. Proposition 1 gives an idea of what constitutes a significant value for the sample autocorrelation function for a given number of samples, $n$.

Proposition 1 For large $n$, the sample autocorrelations of an IID sequence with finite variance are approximately IID with a normal distribution (mean $0$ and variance $1/n$) (see [10]). Hence if the observed sequence is a realization of such an IID sequence, then approximately 95% of the sample autocorrelations should fall between the confidence bounds $\pm 1.96/\sqrt{n}$.

This proposition specifies what constitutes a significant value for the sample autocorrelation function for a given number of samples. The correlation timescale is the smallest lag in terms of time, $\tau$, at and beyond which the value of the sample autocorrelation function becomes small enough to be considered insignificant. Using Proposition 1 we can test the following hypothesis:

Hypothesis 1 The time series is independent at lag $k$.

The test statistic, $z = \sqrt{n}\,\hat{\rho}(k)$, is calculated from the sample autocorrelation function at lag $k$ of the sample sequence.

Figure 2: Sample autocorrelation function (with confidence bounds) for trace segment 20Nov97.uni-1
If the hypothesis is true, then $z$ has a limiting standard normal distribution. Thus, for example, if $|z| > 1.96$ then the hypothesis is rejected at significance level 5%. The results for the loss data show that there is some correlation at small lags, and that the sample autocorrelation function decays rapidly. As an example, consider the first segment of trace 20Nov97.uni, 20Nov97.uni-1, with a sampling interval of 160 ms. Fig. 2 shows the autocorrelation function of the binary loss data. A value of the sample autocorrelation function close to zero for a lag of $k$ indicates independence between $x_i$ and $x_{i+k}$. Since there is a finite number of samples, the confidence bounds of Proposition 1 show the range of values which are close enough to zero to indicate independence. In the figure, the confidence bounds ($\pm 1.96/\sqrt{n}$) are plotted as dotted lines. The autocorrelation at lag 1 (that is, the correlation between consecutive packets) is small but significant. It is clear that beyond a lag of 2 the autocorrelation function becomes insignificantly small, except for a few stray points. This means that the data is dependent up to a lag of 2, giving a short correlation timescale for this trace segment. We analyzed all of the stationary segments of our traces. The traces 20Nov97.uni and 12Dec97.multi had sampling intervals of 160 ms, the traces 20Dec97.uni and 21Dec97.uni had sampling intervals of 20 ms, and the trace 26Jun98.uni had a sampling interval of 40 ms. For the trace segments with sampling intervals of 160 ms, the sample autocorrelation functions were similar to that described for the example trace segment 20Nov97.uni-1. That is, the autocorrelation function at lag 1 is usually significant and the function decays to insignificant values at larger lags. When the lag is expressed in terms of time (refer to (2)), this corresponds to a correlation timescale of 1 second or less. The correlation between consecutive packets for datasets 20Nov97.uni and 12Dec97.multi varies across segments.
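The sample autocorrelation function (1) and the $\pm 1.96/\sqrt{n}$ confidence bounds of Proposition 1 can be computed as in this sketch (function names are ours):

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag, as in eq. (1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / var
        for k in range(1, max_lag + 1)
    ]

def significant_lags(x, max_lag):
    """Lags whose sample autocorrelation falls outside the IID 95% bounds."""
    bound = 1.96 / math.sqrt(len(x))
    return [k + 1 for k, r in enumerate(sample_acf(x, max_lag)) if abs(r) > bound]
```

The correlation timescale is then the first lag, multiplied by the sampling interval, beyond which no lag is significant.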
The behavior of the sample autocorrelation function for traces with sampling intervals of 20 and 40 ms is somewhat different from that for traces with sampling intervals of 160 ms. For these traces, the autocorrelation
Figure 3: Sample autocorrelation function for three trace segments with sampling intervals of 160 ms, 40 ms and 20 ms

Figure 4: Frequency distribution of the correlation timescale (in ms) over the stationary trace segments
function at lag 1 is much higher than for the 160 ms trace segments. This indicates much higher correlation between consecutive packets for sampling intervals of 20 ms than is present for traces with a sampling interval of 160 ms. The trace segments with a sampling interval of 40 ms showed a similarly high autocorrelation at lag 1. The autocorrelation functions for the traces with a sampling interval of 20 ms remain significant over more lags than those for the traces with a sampling interval of 40 ms. Fig. 3 shows the sample autocorrelation as a function of lag expressed in terms of time (2), for three trace segments, each with a different sampling interval. Here the chosen trace segments are 20Nov97.uni-1, 26Jun98.uni and 20Dec97.uni, with sampling intervals of 160 ms, 40 ms and 20 ms respectively. From this figure it is clear that the correlation timescale is 1 second or less. The correlation timescale of a data segment is estimated as the smallest lag (expressed in terms of time) at which Hypothesis 1 is not rejected (that is, as the smallest lag at which the time series is independent). A significance level of 5% is used. This method could underestimate the correlation timescale, since the autocorrelation function could show dependence at some higher lag, but we have ignored this effect. The results for all the stationary trace segments are summarized in Fig. 4 as a frequency distribution of the correlation timescale estimated for each data segment. Here we can see that the correlation timescale is well below 1 second except in two cases. For comparison with traces with sampling intervals of 160 ms, the traces 20Dec97.uni and 21Dec97.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160 ms. Also, the segments of trace 26Jun98.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160 ms. As expected, the sample autocorrelation functions of the subtraces are similar to those seen for the traces with sampling intervals of 160 ms.

Thus, the results indicate a correlation timescale of 1 second or less over all the stationary trace segments. Also, this suggests that loss may not be adequately modelled by an IID process such as the Bernoulli process, since the traces exhibit some correlation at small lags. The 2-state Markov chain model is able to accurately model the correlation at lag 1, but since the correlation it predicts decays rapidly for larger lags, it is inaccurate with respect to modelling the correlation at lags greater than 1.
Figure 5: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Nov97.uni-1
Figure 6: Autocorrelations and Crosscorrelation of Good Run and Loss Run Lengths for 20Dec97.uni
20Dec97.uni. The distribution of good run lengths shows geometric decay except for the high probability of the shortest burst lengths. The distribution of loss run lengths for this trace, shown in Fig. 8, shows approximately geometric decay. The traces 21Dec97.uni and 26Jun98.uni show similar results. Since the good run lengths for 20Nov97.uni and 12Dec97.multi appear to be geometrically distributed, we use the Chi-Square goodness-of-fit test to provide more systematic evidence that the distributions are geometric. Instead of the discrete-valued geometric distribution we use its continuous-valued equivalent, the exponential distribution. We also compute the shape and rate parameters assuming a Gamma distribution for the good run lengths. The shape parameter ($\alpha = 1/c_v^2$, where $c_v$ is the coefficient of variation) of the Gamma distribution gives an idea of how close the distribution is to the exponential distribution. A shape parameter of 1 corresponds to an exponential distribution, which is a special case of the Gamma distribution. The shape parameters for the segments of 20Nov97.uni and for the segments of 12Dec97.multi varied across segments. The Chi-Square goodness-of-fit test is used to check whether to reject the hypothesis that the good run lengths are IID random variables with the given distribution. Using the Chi-Square goodness-of-fit test, the following hypothesis can be tested for both the Gamma and the Exponential distributions.

Hypothesis 2 The good run lengths are IID random variables with the given distribution function.

It is difficult to similarly use the Chi-Square goodness-of-fit test for the loss run lengths because the loss run lengths vary within a very small range. The continuous-valued distributions were used to make it easier to determine the partitions for the test.
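A chi-square goodness-of-fit check of run lengths against an exponential distribution can be sketched as follows. The partition edges here are arbitrary placeholders, since the paper's actual partitions did not survive extraction, and the function name is ours.

```python
import math

def chi_square_exponential(runs, edges):
    """Chi-square statistic for the hypothesis that the run lengths are IID
    exponential, with the rate fitted as 1/mean and bins given by `edges`."""
    rate = len(runs) / sum(runs)  # maximum-likelihood rate estimate

    def cdf(t):
        return 1.0 - math.exp(-rate * t)

    bounds = [0.0] + list(edges) + [float("inf")]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        expected = len(runs) * (cdf(hi) - cdf(lo))
        observed = sum(1 for r in runs if lo < r <= hi)
        stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic is then compared against a chi-square critical value with (number of bins minus one, minus the number of fitted parameters) degrees of freedom; the hypothesis is rejected when the statistic exceeds it.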
For traces 20Nov97.uni and 12Dec97.multi, the hypothesis is rejected for the exponential distribution for about one-third of the segments, and for fewer segments for the Gamma distribution. Thus we conclude that the good run lengths are approximately geometrically distributed for the traces 20Nov97.uni and 12Dec97.multi, although the fit is not exact for one-third of the trace segments. For comparison with the traces with sampling intervals of 160 ms, the traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni were partitioned into subtraces, each subtrace corresponding to a trace with a sampling interval of 160 ms. These subtraces were then analyzed. As expected, the results are similar to those seen for the first two traces.
Figures 7, 8 and 9: distributions of good run lengths and loss run lengths (log scale)
4 Modelling
In this section we discuss the use of three models of increasing complexity for modelling the data: the Bernoulli loss model, the 2-state Markov chain model and the $k$-th order Markov chain model. The loss process which characterizes the binary loss data (as described in Section 3) is considered to be a stationary discrete-time stochastic process taking values in the set $\{0, 1\}$. The order of the process is the number of previous values of the process on which the current value depends, and it can be considered as a measure of the complexity of the model. For an order of $k$, there are $2^k$ states. The Bernoulli model is the simplest model, with order 0. The 2-state Markov chain model is of order 1. The Markov chain model of $k$-th order is a generalization of these two models and has $2^k$ states. First, in Section 4.1 we describe the models and associated parameter estimators. Then in Section 4.2 we assess the validity of the models and present the results for the stationary trace segments.
4.1 Models
4.1.1 Bernoulli Loss Model
In the Bernoulli loss model, the sequence of random variables $\{x_i\}$ is IID (independent and identically distributed). That is, the probability of $x_i$ being either 0 or 1 is independent of all other values of the time series, and the probabilities are the same irrespective of $i$. Thus this model is characterized by a single parameter, $p$, the probability of $x_i$ being a 1 (the value 1 corresponds to a lost packet). It can be estimated from a sample trace by

$$\hat{p} = \frac{n_1}{n} \qquad (3)$$

where $n_1$ is the number of times the value 1 occurs in the observed time series and $n$ is the number of samples in the time series. The good run length distribution for this model is geometric,

$$P[g = k] = (1 - p)^{k-1}\, p, \quad k \geq 1,$$

and the loss run length distribution is likewise geometric, $P[l = k] = p^{k-1} (1 - p)$, $k \geq 1$.
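A sketch of the Bernoulli estimator and the geometric run-length distribution it implies (function names are ours):

```python
def bernoulli_estimate(x):
    """Estimate p = P(packet lost) as the fraction of 1s in the loss series."""
    return sum(x) / len(x)

def good_run_pmf(p, k):
    """Under the Bernoulli model, P[good run length == k] = (1-p)**(k-1) * p."""
    return (1 - p) ** (k - 1) * p

p = bernoulli_estimate([0, 1, 0, 0, 0, 1, 0, 0])
print(p)  # 0.25
```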
4.1.2 2-state Markov Chain Model
In the 2-state Markov chain model the loss process is modelled as a discrete-time Markov chain with two states. The current state $x_i$ of the stochastic process depends only on the previous value, $x_{i-1}$. Unlike the Bernoulli model, this model is able to capture the dependence between consecutive losses, at the cost of an additional parameter. The model can be characterized by two parameters, $p$ and $q$, the transition probabilities between the two states.
Here we take $p = P(x_i = 1 \mid x_{i-1} = 0)$ and $q = P(x_i = 0 \mid x_{i-1} = 1)$. The parameters can be estimated from a sample trace by

$$\hat{p} = \frac{n_{01}}{n_0}, \qquad \hat{q} = \frac{n_{10}}{n_1} \qquad (4)$$

where $n_{01}$ is the number of times in the observed time series that a 1 follows a 0, $n_{10}$ is the number of times a 0 follows a 1, $n_0$ is the number of times a 0 occurs in the trace and $n_1$ is the number of times a 1 occurs in the trace. The good run length distribution for this model is

$$P[g = k] = (1 - p)^{k-1}\, p, \quad k \geq 1.$$
Note that if $1 - q$ equals $p$, this 2-state Markov chain model is equivalent to the Bernoulli model, since it means that the probability of loss is independent of which state the process is in. Also, the relative values of $p$ and $1 - q$ provide information about how bursty the losses are. If $1 - q > p$ then a lost packet is more likely when the previous packet was lost than when the previous packet was successfully received. On the other hand, $1 - q < p$ indicates that loss is more likely if the previous packet was not lost.
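Estimating the two transition probabilities from a trace can be sketched as follows. We take p = P(x_i = 1 | x_{i-1} = 0) and q = P(x_i = 0 | x_{i-1} = 1), one common parameterization; the function name and example data are ours.

```python
def two_state_estimate(x):
    """Counting-based estimates of the 2-state Markov chain transition
    probabilities: p = n01 / n0 and q = n10 / n1."""
    n01 = n10 = n0 = n1 = 0
    for prev, cur in zip(x, x[1:]):
        if prev == 0:
            n0 += 1
            n01 += (cur == 1)
        else:
            n1 += 1
            n10 += (cur == 0)
    return n01 / n0, n10 / n1

p, q = two_state_estimate([0, 0, 1, 1, 0])
print(p, q)  # 0.5 0.5
```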
4.1.3 k-th Order Markov Chain Model
A class of random processes that is rich enough to capture a large variety of temporal dependencies is the class of finite-state Markov processes. The Bernoulli model and the 2-state Markov chain model are special cases of this class of models. In the Markov chain model, the current state of the process depends on a certain number of previous values of the process; this number of previous values is the order of the process. Such a process is characterized by its order $k$, and by a conditional probability matrix whose rows may be interpreted as probability mass functions on $\{0, 1\}$, according to which the next random variable is generated when the process is in a given state. A process $\{x_i\}$ is a Markov chain of order $k$ if the conditional distribution of $x_i$ given the entire past depends only on the $k$ previous values; that is, $x_i$ is independent of $x_{i-j}$ for all $j > k$ given $x_{i-1}, \ldots, x_{i-k}$ (see [2, 3]). Note that the Bernoulli and the 2-state Markov chain model are special cases of the Markov chain model corresponding to orders of 0 and 1 respectively. Let $x_1, \ldots, x_n$ be an observed sequence or a sample path from a Markov source. The $k$-th order state transition probabilities of the Markov chain can be estimated for all states as follows. The number of states is $2^k$ and the number of conditional probabilities is $2^{k+1}$.
Let $s$ be a given state of the chain. Let $N(s, x)$ be the number of times state $s$ is followed by value $x$ in the sample sequence. Let $N(s)$ be the number of times state $s$ is seen. Let $\hat{P}(x \mid s)$ be an estimate of the probability that $x_i = x$, given that the chain is in state $s$; then $\hat{P}(x \mid s)$ estimates the state transition probability out of state $s$ on value $x$. The maximum likelihood estimators of the state transition probabilities of the $k$-th order Markov chain are

$$\hat{P}(x \mid s) = \begin{cases} N(s, x)/N(s) & \text{if } N(s) > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Independence at a given lag can also be tested with a chi-square statistic. Let $n_a$ be the number of times the value $a$ occurs in the sample sequence and, similarly, $n_b$ the number of times $b$ occurs. Let $n_{ab}(j)$ be the number of times $x_i = a$ and $x_{i+j} = b$ (that is, the number of times $b$ follows $a$ after a lag of $j$). The resulting statistic,

$$Q_j = \sum_{a, b \in \{0,1\}} \frac{\left(n_{ab}(j) - n_a n_b / n\right)^2}{n_a n_b / n},$$

is distributed as a chi-square distribution with 1 degree of freedom. The order of the Markov chain process can be estimated by estimating the minimum lag beyond which the process is independent. This is similar to the estimation of the correlation timescale from the sample autocorrelation function as discussed in Section 3.2. Thus we test Hypothesis 1 for increasing lags. Let $j^*$ be the first lag at which the hypothesis is not rejected; then the estimated order is $j^* - 1$. Fig. 11 shows the estimated order of the Markov chain for the stationary segments of all the datasets. The two methods of testing the independence hypothesis (using the sample autocorrelation function and using the chi-square test for independence) give the same results. Figure 11 shows that, of the segments, the Bernoulli loss model is accurate for some, the 2-state Markov chain model for others, and for the rest Markov chain models of higher orders are necessary for accurate modelling. The estimated order for datasets 20Dec97.uni, 21Dec97.uni and 26Jun98.uni is much higher than that estimated for datasets 20Nov97.uni and 12Dec97.multi. We also checked the good run and loss run lengths for independence. It was found that, in most cases, as discussed in Section 3.3, the good run and the loss run lengths are, indeed, independent. Thus the good runs and the loss runs can be modelled separately as IID random variables. Both the Bernoulli and the 2-state Markov chain model treat the good run and loss run processes as independent. Next we look at the distributions of the good run and loss run lengths and check how well the two models
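The maximum-likelihood counting estimator for the k-th order chain can be sketched as follows (names are ours):

```python
from collections import defaultdict

def kth_order_estimate(x, k):
    """Estimate P(x_i = 1 | previous k values = s) for every observed state s."""
    seen = defaultdict(int)
    losses = defaultdict(int)
    for i in range(k, len(x)):
        state = tuple(x[i - k:i])
        seen[state] += 1
        losses[state] += x[i]
    return {state: losses[state] / seen[state] for state in seen}

probs = kth_order_estimate([0, 1, 0, 1, 0, 1], k=1)
print(probs)  # {(0,): 1.0, (1,): 0.0}
```

With k = 1 this reduces to the 2-state model's transition probabilities; with k = 0 it reduces to the Bernoulli estimate.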
Figure 11: Frequency Distribution of the Order of the Markov Chain Model
predict the empirical distributions. Both the Bernoulli model and the 2-state Markov chain model predict that the good run and the loss run lengths are distributed according to geometric distributions. As discussed in Section 3.3, the good run length distribution was found to be approximately geometric for traces 20Nov97.uni and 12Dec97.multi, which means that the 2-state Markov chain model is suitable for these traces. However, since the Chi-Square goodness-of-fit test shows that the exponential distribution is not accurate for one-third of the segments, a Markov chain model with more states would model the good run lengths more accurately. The loss run lengths were found to be geometrically distributed in all cases. The traces 20Dec97.uni, 21Dec97.uni and 26Jun98.uni show a greater degree of correlation. The good run lengths are clearly not geometrically distributed, as shown in Fig. 9, although the loss run lengths are geometrically distributed. Thus we conclude that neither the Bernoulli nor the 2-state Markov chain model is accurate for these traces, and a greater number of states is necessary to accurately model the loss process. For the stationary segments of 20Nov97.uni the estimated parameters varied noticeably from segment to segment. Table 3 summarizes the estimated model parameters (as discussed in Section 4.1), showing the minimum, maximum and average over all stationary segments for each dataset. For trace segments where the probability of loss given a previous loss is higher than the probability of loss given a previous success, the packet loss is burstier than predicted by the Bernoulli model and the 2-state Markov chain model is more accurate.
Table 3: Parameters for the Bernoulli and the 2-state Markov Chain Models

Data            Statistic   Loss rate   P(loss | prev. received)   P(loss | prev. lost)
20Nov97.uni     minimum     0.0005      0.0004                     0.0208
20Nov97.uni     maximum     0.0351      0.0347                     0.0909
20Nov97.uni     average     0.0162      0.0158                     0.0471
12Dec97.multi   minimum     0.0376      0.0373                     0.0408
12Dec97.multi   maximum     0.0735      0.0749                     0.0636
12Dec97.multi   average     0.0496      0.0496                     0.0513
20Dec97.uni     —           0.0168      0.0141                     0.1732
21Dec97.uni     —           0.0135      0.0109                     0.2085
26Jun98.uni     minimum     0.0204      0.0178                     0.1452
26Jun98.uni     maximum     0.0254      0.0218                     0.1608
26Jun98.uni     average     0.0222      0.0192                     0.1546
to get up-to-date and accurate average loss rate estimates. This is discussed in Section 5.3.
Fig. 12 shows the estimate of the mean loss rate for the example trace segment 20Nov97.uni-1 for three memory sizes, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated mean loss rate. For an accuracy of $\epsilon$ in the estimator and a confidence level of $1-\alpha$, the number of observations $n$ that are required is

$$ n = \frac{z_{\alpha/2}^2 \, p(1-p)}{\epsilon^2}, $$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
Note that this depends on the loss rate, $p$: for a fixed accuracy $\epsilon$ in the estimate of $p$, the required memory size calculated from this expression changes with the loss rate, since the factor $p(1-p)$ does.
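The required-memory expression is easy to evaluate. The values below are our own illustrations, not the (unrecoverable) numbers from the trace discussion:

```python
import math

def required_memory(p, eps, z=1.96):
    """Samples needed to estimate a loss rate p to within +/- eps at
    ~95% confidence (z = 1.96), assuming independent (Bernoulli) losses."""
    return math.ceil(z * z * p * (1 - p) / (eps * eps))

# The same absolute accuracy is cheaper at lower loss rates,
# because the variance factor p*(1-p) shrinks with p.
print(required_memory(0.05, 0.005))
print(required_memory(0.01, 0.005))
```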
Fig. 13 shows the estimate of the conditional loss probability given that the previous packet was received, for the example trace segment 20Nov97.uni-1, for two memory sizes, plotted as a function of the packet sequence number. The graphs display the extent of variation in the estimated value. Similarly, Fig. 14 shows the estimate of the conditional loss probability given that the previous packet was lost, $q$, for the example trace segment. This estimate fluctuates wildly, especially for the smaller memory size. For an accuracy of $\epsilon$ in the estimator of the parameter $q$, with a confidence level of $1-\alpha$, the number of loss observations that are required is

$$ n_{\mathrm{loss}} = \frac{z_{\alpha/2}^2 \, q(1-q)}{\epsilon^2}. $$
Note that this depends on the fraction of packets lost, $p$: only lost packets contribute an observation of this parameter. For an absolute permissible error of $\epsilon$ in estimating $q$, with a confidence level of $1-\alpha$, the memory $M$ required is

$$ M = \frac{z_{\alpha/2}^2 \, q(1-q)}{p\,\epsilon^2}. $$
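The $1/p$ inflation is worth seeing numerically. The following sketch uses illustrative values of ours (a 2% loss rate and $q = 0.17$, in the range of Table 3), not values taken from the trace discussion:

```python
import math

def conditional_memory(q, p, eps, z=1.96):
    """Memory needed to estimate q = P(loss | previous loss) to within
    +/- eps at ~95% confidence: only lost packets (a fraction p of the
    stream) contribute an observation, inflating the requirement by 1/p."""
    return math.ceil(z * z * q * (1 - q) / (p * eps * eps))

# At a 2% loss rate, pinning down the conditional loss probability takes
# 50x the memory of the same estimate made on every packet (p = 1).
print(conditional_memory(q=0.17, p=0.02, eps=0.05))
print(conditional_memory(q=0.17, p=1.0, eps=0.05))
```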
Suppose a memory of $M$ values of the basic loss data is available for estimation of the parameter $q$. Then the confidence interval for the estimator $\hat{q}$ of this parameter is approximately

$$ \hat{q} \pm z_{\alpha/2} \sqrt{\frac{\hat{q}(1-\hat{q})}{M\,p}}. $$
Figure 12: Estimating the mean loss rate for 20Nov97.uni-1 using sliding window averaging, for three memory sizes.
used. Restricting the memory size is important because of possible non-stationarities in the loss data. Exponential smoothing is a computationally quicker method for estimating the current mean and it also requires less buffer space. The question we address here is whether sliding window averaging gives a significantly better estimate of the loss rate, in terms of both the variance of the estimator and the memory required to get an accurate estimate. As we will see, the sliding window average is indeed a significantly better estimator: for the same effective memory size, exponential smoothing produces an estimate with more than double the variance of the estimate produced by a sliding window, and for the same variance it requires more than twice as much memory. In the analysis of the variance of the estimate we have assumed, for simplicity, that the loss process is IID (that is, we assume the Bernoulli model). In the presence of correlation, the variance of the sample mean estimator is higher than that calculated assuming independence. In order to compare the two styles of on-line mean estimation, we first compute the effective memory size in the case of exponential smoothing. Consider the loss data as a binary time series $\{x_i\}$. Then, for a gain of $a$, the exponential smoothing estimator at time $i$ is

$$ \hat{p}_i = (1-a)\,\hat{p}_{i-1} + a\,x_i. $$
Thus all the previous samples are used, but the weights given to the old samples decay geometrically with the age of the sample relative to the latest sample: the weight given to the observation $x_{i-k}$ at time $i$, where $k \ge 0$, is $a(1-a)^k$. Let $M$ be a given number of observations immediately older than the observation at time $i$, and let $W(M)$ be the sum of the weights given to all observations as old as or older than the observation at time $i-M$:

$$ W(M) = \sum_{k=M}^{\infty} a(1-a)^k = (1-a)^M. $$
The effective memory, $M^*$, is defined as the $M$ for which the weight in the tail, $W(M)$, is restricted to some small fraction $f$, $0 < f < 1$. That is, $M^*$ is calculated as the $M$ such that $(1-a)^M = f$, which gives

$$ M^* = \frac{\ln f}{\ln(1-a)} \approx \frac{-\ln f}{a} \quad \text{(for small values of } a\text{)}. $$
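The gain/effective-memory relation can be computed directly. In this sketch (ours), the tail fraction `f = 0.01` is an assumed value, since the fraction used in the measurements is not recoverable from this text:

```python
import math

def effective_memory(a, f=0.01):
    """Effective memory of exponential smoothing with gain a: the age M
    beyond which the total remaining weight (1-a)**M falls to the fraction f."""
    return math.log(f) / math.log(1 - a)

def gain_for_memory(m, f=0.01):
    """Inverse relation: the gain needed for an effective memory of m samples."""
    return 1 - f ** (1 / m)

# Round-trip check: the gain for a 1000-sample effective memory
# (about 0.0046) recovers an effective memory of 1000.
a = gain_for_memory(1000)
print(round(a, 5), round(effective_memory(a), 1))
```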
An important property of an estimator is its relative efficiency, defined as its variance relative to that of the sliding window average (which is equivalent to the sample mean estimator). Let the population mean be $p$ and let the sample mean estimator over a window of $n$ samples (that is, the sliding window average) be $\bar{p}$. Then the variance of the sample mean estimator is

$$ \mathrm{Var}(\bar{p}) = \frac{p(1-p)}{n}. $$
The relative efficiency of the exponential smoothing estimator (relative to the sliding window average) is the ratio of the variances. Under the IID assumption, the variance of the exponential smoothing estimator is

$$ \mathrm{Var}(\hat{p}) = \frac{a}{2-a}\, p(1-p), $$

so the ratio of the variances is

$$ \frac{\mathrm{Var}(\hat{p})}{\mathrm{Var}(\bar{p})} = \frac{n\,a}{2-a}. $$
The gain $a$ can be calculated for a given effective memory size from the relation above. With the effective memory $M^*$ of the exponential smoothing estimator set equal to the memory $n$ of the sliding window average, the relative efficiency is approximately 2: we see twice as much variance in the exponential smoothing estimator as we do with sliding window averaging. This difference in variance can be seen in Fig. 15, which compares the estimation of the means for the example trace segment 20Nov97.uni-1. The mean estimator values are plotted versus the sequence number of the probe packets: the straight line is the overall mean, the solid line is the sliding window average and the dotted line is the exponential smoothing estimator. Similarly, Fig. 16 compares the exponential smoothing estimator, with double the effective memory, against the sliding window averaging estimator; here the variances are almost the same, since the exponential smoothing uses double the memory. Yet another way of comparing exponential smoothing and sliding window averaging is to consider the case when the gain of the exponential smoothing algorithm and the window size of the sliding window average are set such that the variances of the estimators are equal. In this case the exponential smoothing algorithm gives a substantial total weight to samples older than the corresponding window size of the sliding window averaging algorithm: in order to achieve the same variance, it must construct a weighted average that places significant weight on samples beyond the window.
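As a rough numerical check (our own sketch, not from the measurement study), IID Bernoulli losses can be fed to both estimators. The gain `a = 2/(n+1)` is chosen because it equalizes the two variances in theory, since n*a/(2-a) = 1 at that gain:

```python
import random
import statistics

def sliding_window_estimates(xs, n):
    """Sliding window averages, emitted once the window of n samples is full."""
    s = sum(xs[:n])
    out = [s / n]
    for i in range(n, len(xs)):
        s += xs[i] - xs[i - n]
        out.append(s / n)
    return out

def exp_smoothing_estimates(xs, a, start):
    """Exponential smoothing with gain a, initialized at `start`."""
    est, out = start, []
    for x in xs:
        est = (1 - a) * est + a * x
        out.append(est)
    return out

rng = random.Random(42)
p, n = 0.05, 200
a = 2 / (n + 1)                       # variance-matching gain
xs = [1 if rng.random() < p else 0 for _ in range(200_000)]

v_sw = statistics.pvariance(sliding_window_estimates(xs, n))
v_es = statistics.pvariance(exp_smoothing_estimates(xs, a, p)[5 * n:])
print(round(v_sw / v_es, 2))          # empirical ratio, close to 1 at this gain
```

Raising `a` toward the value whose effective memory equals `n` pushes the ratio down toward one half, i.e. roughly double the variance for exponential smoothing, as described above.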
Figure 15: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing and the sliding window average.
Figure 16: Estimating the mean loss rate for 20Nov97.uni-1: comparison of exponential smoothing, with double the effective memory, and the sliding window average.
Acknowledgments
We would like to thank Supratik Bhattacharyya, Timur Friedman, Christopher Rafael, Sanjeev Khudanpur and Joseph Horowitz for fruitful discussions and mathematical insights. We are grateful to Chuck Cranor of Washington Univ., St. Louis, Stephen Pink of the Swedish Institute of Computer Science, Linda Prime of Univ. of Washington, Peter Wan of Georgia Institute of Technology and Lixia Zhang of Univ. of California, Los Angeles for providing computer accounts which allowed us to take the measurements.