It can be proved that if G(t) is an exponential distribution with rate lambda, that is

   G(t) = 1 - exp(-lambda * t)

then the arrival of new samples *cannot* be predicted (and, again, the sampling is unbiased). Furthermore, the sampling is asymptotically unbiased even if the act of sampling affects the network's state. Such sampling is referred to as "Poisson sampling". It is not prone to inducing synchronization, it can be used to accurately collect measurements of periodic behavior, and it is not prone to manipulation by anticipating when new samples will occur.

Because of these valuable properties, we in general prefer that samples of Internet measurements are gathered using Poisson sampling. {Comment: We note, however, that there may be circumstances that favor use of a different G(t). For example, the exponential distribution is unbounded, so its use will on occasion generate lengthy spaces between sampling times. We might instead desire to bound the longest such interval to a maximum value dT, to speed the convergence of the estimation derived from the sampling. This could be done by using

   G(t) = Unif(0, dT)

that is, the uniform distribution between 0 and dT. This sampling, of course, becomes highly predictable if an interval of nearly length dT has elapsed without a sample occurring.}

In its purest form, Poisson sampling is done by generating independent, exponentially distributed intervals and gathering a single measurement after each interval has elapsed. It can be shown that if starting at time T one performs Poisson sampling over an interval dT, during which a total of N measurements happen to be made, then those measurements will be uniformly distributed over the interval [T, T+dT]. So another way of conducting Poisson sampling is to pick dT and N and generate N random sampling times uniformly over the interval [T, T+dT]. The two approaches are equivalent, except if N and dT are externally known. In that case, the property of not being able to predict measurement times is weakened (the other properties still hold). The N/dT approach has the advantage that dealing with fixed values of N and dT can be simpler than dealing with a fixed lambda but variable numbers of measurements over variably-sized intervals.

11.1.2. Geometric Sampling

Closely related to Poisson sampling is "geometric sampling", in which external events are measured with a fixed probability p. For example, one might capture all the packets over a link but only record a packet to a trace file if a randomly generated number, uniformly distributed between 0 and 1, is less than a given p. Geometric sampling has the same properties of being unbiased and not predictable in advance as Poisson sampling, so if it fits a particular Internet measurement task, it too is sound. See [CPB93] for more discussion.
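{Comment: The following is a minimal sketch of geometric sampling as just described, not part of the original text. Each of a stream of events is recorded independently with probability p; the event loop and record_event() are hypothetical stand-ins for a real packet-capture path, and the standard C library's rand() stands in for whatever uniform (0,1) generator is available.}

   #include <stdio.h>
   #include <stdlib.h>

   #define P_SAMPLE 0.01            /* record roughly 1% of the events */

   /* Hypothetical stand-in for writing a packet record to a trace file. */
   static void record_event(long seqno)
   {
       printf("recorded event %ld\n", seqno);
   }

   int main(void)
   {
       long seqno;

       srand(42);                   /* seed the uniform generator */
       for (seqno = 0; seqno < 100000; seqno++) {
           /* Record this event iff a uniform (0,1) draw falls below p;
            * each decision is independent of all the others. */
           if (rand() / ((double)RAND_MAX + 1.0) < P_SAMPLE)
               record_event(seqno);
       }
       return 0;
   }

On average a fraction p of the events are recorded, and, because the decisions are independent, observing past sampling decisions gives no help in predicting future ones.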
11.1.3. Generating Poisson Sampling Intervals

To generate Poisson sampling intervals, one first determines the rate lambda at which the singleton measurements will on average be made (e.g., for an average sampling interval of 30 seconds, we have lambda = 1/30, if the units of time are seconds). One then generates a series of exponentially-distributed (pseudo) random numbers E1, E2, ..., En. The first measurement is made at time E1, the next at time E1+E2, and so on.

One technique for generating exponentially-distributed (pseudo) random numbers is based on the ability to generate U1, U2, ..., Un, (pseudo) random numbers that are uniformly distributed between 0 and 1. Many computers provide libraries that can do this. Given such Ui, to generate Ei one uses:

   Ei = -log(Ui) / lambda

where log(Ui) is the natural logarithm of Ui. {Comment: This technique is an instance of the more general "inverse transform" method for generating random numbers with a given distribution.}

Implementation details:

There are at least three different methods for approximating Poisson sampling, which we describe here as Methods 1 through 3. Method 1 is the easiest to implement and has the most error, and Method 3 is the most difficult to implement and has the least error (potentially none).

Method 1 is to proceed as follows:

   1. Generate E1 and wait that long.
   2. Perform a measurement.
   3. Generate E2 and wait that long.
   4. Perform a measurement.
   5. Generate E3 and wait that long.
   6. Perform a measurement ...

The problem with this approach is that the "Perform a measurement" steps themselves take time, so the sampling is not done at times E1, E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount of time required for the i'th measurement. If Mi is very small compared to 1/lambda, then the potential error introduced by this technique is likewise small. As Mi becomes a non-negligible fraction of 1/lambda, the potential error increases.

Method 2 attempts to correct this error by taking into account the amount of time required by the measurements (i.e., the Mi's) and adjusting the waiting intervals accordingly:

   1. Generate E1 and wait that long.
   2. Perform a measurement and measure M1, the time it took to do so.
   3. Generate E2 and wait for a time E2-M1.
   4. Perform a measurement and measure M2 ...

This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi, then it is impossible to wait the proper amount of time. (Note that this case corresponds to needing to perform two measurements simultaneously.)

Method 3 is to generate a schedule of measurement times E1, E1+E2, etc., and then stick to it:

   1. Generate E1, E2, ..., En.
   2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei.
   3. Arrange that at times T1, T2, ..., Tn, a measurement is made.

By allowing simultaneous measurements, Method 3 avoids the shortcomings of Methods 1 and 2. If, however, simultaneous measurements interfere with one another, then Method 3 does not gain any benefit and may actually prove worse than Methods 1 or 2.

For Internet phenomena, it is not known to what degree the inaccuracies of these methods are significant. If the Mi's are much less than 1/lambda, then any of the three should suffice. If the Mi's are less than 1/lambda but perhaps not greatly less, then Method 2 is preferred to Method 1. If simultaneous measurements do not interfere with one another, then Method 3 is preferred, though it can be considerably harder to implement.
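{Comment: The following is a minimal sketch, under illustrative parameter choices, of Method 3 combined with the inverse-transform step above: the exponentially-distributed intervals Ei = -log(Ui)/lambda are generated first and summed into an absolute schedule T1, ..., Tn, to which the measurement process would then adhere. The values lambda = 1/30 per second and n = 10, and the use of the standard C library's rand(), are illustrative only.}

   #include <math.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Uniform (0,1] variate from the standard C library generator; any
    * better uniform generator can be substituted.  Drawing from (0,1]
    * rather than [0,1) avoids taking log(0) below. */
   static double unif01(void)
   {
       return (rand() + 1.0) / ((double)RAND_MAX + 1.0);
   }

   int main(void)
   {
       const double lambda = 1.0 / 30.0;  /* mean interval: 30 seconds  */
       const int n = 10;                  /* illustrative schedule size */
       double t = 0.0;                    /* running sum E1 + ... + Ei  */
       int i;

       srand(42);
       for (i = 0; i < n; i++) {
           double e = -log(unif01()) / lambda;  /* Ei, exponential     */

           t += e;                              /* Ti = E1 + ... + Ei  */
           printf("measurement %2d scheduled at t = %8.2f s\n", i + 1, t);
       }
       return 0;
   }

The schedule exists before any measurement is made, so a measurement that runs long does not delay the times at which later measurements are due.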
11.2. Self-Consistency

A fundamental requirement for a sound measurement methodology is that measurement be made using as few unconfirmed assumptions as possible. Experience has painfully shown how easy it is to make an (often implicit) assumption that turns out to be incorrect. An example is incorporating into a measurement the reading of a clock synchronized to a highly accurate source. It is easy to assume that the clock is therefore accurate; but due to software bugs, a loss of power in the source, or a loss of communication between the source and the clock, the clock could actually be quite inaccurate.

This is not to argue that one must not make *any* assumptions when measuring, but rather that, to the extent which is practical, assumptions should be tested. One powerful way for doing so involves checking for self-consistency. Such checking applies both to the observed value(s) of the measurement *and the values used by the measurement process itself*. A simple example of the former is that when computing a round trip time, one should check to see if it is negative. Since negative time intervals are non-physical, if it ever is negative, that finding immediately flags an error. *These sorts of errors should then be investigated!* It is crucial to determine where the error lies, because only by doing so diligently can we build up faith in a methodology's fundamental soundness. For example, it could be that the round trip time is negative because during the measurement the clock was set backward in the process of synchronizing it with another source. But it could also be that the measurement program accesses uninitialized memory in one of its computations and, only very rarely, that leads to a bogus computation. This second error is more serious, if the same program is used by others to perform the same measurement, since then they too will suffer from incorrect results. Furthermore, once uncovered it can be completely fixed.

A more subtle example of testing for self-consistency comes from gathering samples of one-way Internet delays. If one has a large sample of such delays, it may well be highly telling to, for example, fit a line to the pairs of (time of measurement, measured delay), to see if the resulting line has a clearly non-zero slope. If so, a possible interpretation is that one of the clocks used in the measurements is skewed relative to the other. Another interpretation is that the slope is actually due to genuine network effects. Determining which is indeed the case will often be highly illuminating. (See [Pa97] for a discussion of distinguishing between relative clock skew and genuine network effects.) Furthermore, if making this check is part of the methodology, then a finding that the long-term slope is very near zero is positive evidence that the measurements are probably not biased by a difference in skew. (A small sketch of this check appears at the end of this section.)

A final example illustrates checking the measurement process itself for self-consistency. Above we outline Poisson sampling techniques, based on generating exponentially-distributed intervals. A sound measurement methodology would include testing the generated intervals to see whether they are indeed exponentially distributed (and also to see if they suffer from correlation). In the appendix we discuss and give C code for one such technique, a general-purpose, well-regarded goodness-of-fit test called the Anderson-Darling test.

Finally, we note that what is truly relevant for Poisson sampling of Internet metrics is often not when the measurements began but the wire times corresponding to the measurement process. These could well be different, due to complications on the hosts used to perform the measurement.

Thus, even those with complete faith in their pseudo-random number generators and subsequent algorithms are encouraged to consider how they might test the assumptions of each measurement procedure as much as possible.
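{Comment: The following is a small, illustrative sketch of the line-fitting self-consistency check described earlier in this section: it fits an ordinary least-squares line to (time of measurement, measured delay) pairs and reports a clearly non-zero slope for investigation. The delay samples and the reporting threshold are invented for illustration; whether a non-zero slope reflects relative clock skew or genuine network effects must still be determined as discussed above.}

   #include <stdio.h>

   int main(void)
   {
       /* Illustrative one-way delay samples: (time of measurement in
        * seconds, measured delay in milliseconds).  Real data would be
        * read from a measurement trace. */
       static const double t[] = {   0,   60,  120,  180,  240,  300 };
       static const double d[] = { 20.1, 20.4, 20.9, 21.2, 21.8, 22.1 };
       const int n = 6;

       double st = 0, sd = 0, stt = 0, std_ = 0, slope;
       int i;

       /* Ordinary least-squares slope of delay versus time. */
       for (i = 0; i < n; i++) {
           st   += t[i];
           sd   += d[i];
           stt  += t[i] * t[i];
           std_ += t[i] * d[i];
       }
       slope = (n * std_ - st * sd) / (n * stt - st * st);

       printf("fitted slope = %g ms/s\n", slope);
       if (slope > 1e-3 || slope < -1e-3)   /* illustrative threshold */
           printf("delays trend with time; investigate clock skew "
                  "versus genuine network effects\n");
       return 0;
   }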
11.3. Defining Statistical Distributions

One way of describing a collection of measurements (a sample) is as a statistical distribution -- informally, as percentiles. There are several slightly different ways of doing so. In this section we define a standard definition to give uniformity to these descriptions.

The "empirical distribution function" (EDF) of a set of scalar measurements is a function F(x) which for any x gives the fractional proportion of the total measurements that were <= x. If x is less than the minimum value observed, then F(x) is 0. If it is greater than or equal to the maximum value observed, then F(x) is 1.

For example, given the 6 measurements:

   -2, 7, 7, 4, 18, -5

then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 5/6, F(18) = 1, and F(239) = 1.

Note that we can recover the different measured values, and how many times each occurred, from F(x) -- no information regarding the range in values is lost. Summarizing measurements using histograms, on the other hand, in general loses information about the different values observed, so the EDF is preferred. Using either the EDF or a histogram, however, we do lose information regarding the order in which the values were observed. Whether this loss is potentially significant will depend on the metric being measured.

We will use the term "percentile" to refer to the smallest value of x for which F(x) >= a given percentage.
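{Comment: The following is a minimal sketch, illustrative only, that evaluates the EDF F(x) for the six example measurements above and computes a percentile according to the definition just given.}

   #include <stdio.h>
   #include <stdlib.h>

   /* Ascending comparison for qsort(). */
   static int cmp(const void *a, const void *b)
   {
       double x = *(const double *)a, y = *(const double *)b;
       return (x > y) - (x < y);
   }

   /* EDF: fraction of the n measurements that are <= x. */
   static double edf(const double *m, int n, double x)
   {
       int i, count = 0;

       for (i = 0; i < n; i++)
           if (m[i] <= x)
               count++;
       return (double)count / n;
   }

   /* Smallest observed value whose EDF meets or exceeds the fraction
    * frac (frac = percentage / 100); assumes m[] is sorted ascending. */
   static double percentile(const double *m, int n, double frac)
   {
       int i;

       for (i = 0; i < n; i++)
           if (edf(m, n, m[i]) >= frac)
               return m[i];
       return m[n - 1];
   }

   int main(void)
   {
       double m[] = { -2, 7, 7, 4, 18, -5 };
       int n = 6;

       qsort(m, n, sizeof m[0], cmp);
       printf("F(-8) = %.3f\n", edf(m, n, -8.0));   /* 0.000 */
       printf("F(-5) = %.3f\n", edf(m, n, -5.0));   /* 0.167 */
       printf("F(7)  = %.3f\n", edf(m, n,  7.0));   /* 0.833 */
       printf("F(18) = %.3f\n", edf(m, n, 18.0));   /* 1.000 */
       printf("50th percentile = %g\n", percentile(m, n, 0.5));  /* 4 */
       return 0;
   }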