It can be proved that if G(t) is an exponential distribution with rate lambda, that is

   G(t) = 1 - exp(-lambda * t)

then the arrival of new samples *cannot* be predicted (and, again, the sampling is unbiased). Furthermore, the sampling is asymptotically unbiased even if the act of sampling affects the network's state. Such sampling is referred to as "Poisson sampling". It is not prone to inducing synchronization, it can be used to accurately collect measurements of periodic behavior, and it is not prone to manipulation by anticipating when new samples will occur.

Because of these valuable properties, we in general prefer that samples of Internet measurements are gathered using Poisson sampling. {Comment: We note, however, that there may be circumstances that favor use of a different G(t). For example, the exponential distribution is unbounded, so its use will on occasion generate lengthy spaces between sampling times. We might instead desire to bound the longest such interval to a maximum value dT, to speed the convergence of the estimation derived from the sampling. This could be done by using

   G(t) = Unif(0, dT)

that is, the uniform distribution between 0 and dT. This sampling, of course, becomes highly predictable if an interval of nearly length dT has elapsed without a sample occurring.}

In its purest form, Poisson sampling is done by generating independent, exponentially distributed intervals and gathering a single measurement after each interval has elapsed. It can be shown that if starting at time T one performs Poisson sampling over an interval dT, during which a total of N measurements happen to be made, then those measurements will be uniformly distributed over the interval [T, T+dT]. So another way of conducting Poisson sampling is to pick dT and N and generate N random sampling times uniformly over the interval [T, T+dT]. The two approaches are equivalent, except if N and dT are externally known. In that case, the property of not being able to predict measurement times is weakened (the other properties still hold). The N/dT approach has the advantage that dealing with fixed values of N and dT can be simpler than dealing with a fixed lambda but variable numbers of measurements over variably-sized intervals.

11.1.2. Geometric Sampling

Closely related to Poisson sampling is "geometric sampling", in which external events are measured with a fixed probability p. For example, one might capture all the packets over a link but only record a packet to a trace file if a randomly generated number, uniformly distributed between 0 and 1, is less than a given p. Geometric sampling has the same properties of being unbiased and not predictable in advance as Poisson sampling, so if it fits a particular Internet measurement task, it too is sound. See [CPB93] for more discussion.
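{Comment: The following is a minimal sketch of geometric sampling as just described, not part of the original text. Each of a stream of events is recorded independently with probability p; the event loop and record_event() are hypothetical stand-ins for a real packet-capture path, and the standard C library's rand() stands in for whatever uniform (0,1) generator is available.}

   #include <stdio.h>
   #include <stdlib.h>

   #define P_SAMPLE 0.01            /* record roughly 1% of the events */

   /* Hypothetical stand-in for writing a packet record to a trace file. */
   static void record_event(long seqno)
   {
       printf("recorded event %ld\n", seqno);
   }

   int main(void)
   {
       long seqno;

       srand(42);                   /* seed the uniform generator */
       for (seqno = 0; seqno < 100000; seqno++) {
           /* Record this event iff a uniform (0,1) draw falls below p;
            * each decision is independent of all the others. */
           if (rand() / ((double)RAND_MAX + 1.0) < P_SAMPLE)
               record_event(seqno);
       }
       return 0;
   }

On average a fraction p of the events are recorded, and, because the decisions are independent, observing past sampling decisions gives no help in predicting future ones.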
11.1.3. Generating Poisson Sampling Intervals

To generate Poisson sampling intervals, one first determines the rate lambda at which the singleton measurements will on average be made (e.g., for an average sampling interval of 30 seconds, we have lambda = 1/30, if the units of time are seconds). One then generates a series of exponentially-distributed (pseudo) random numbers E1, E2, ..., En. The first measurement is made at time E1, the next at time E1+E2, and so on.

One technique for generating exponentially-distributed (pseudo) random numbers is based on the ability to generate U1, U2, ..., Un, (pseudo) random numbers that are uniformly distributed between 0 and 1. Many computers provide libraries that can do this. Given such Ui, to generate Ei one uses:

   Ei = -log(Ui) / lambda

where log(Ui) is the natural logarithm of Ui. {Comment: This technique is an instance of the more general "inverse transform" method for generating random numbers with a given distribution.}

Implementation details:

There are at least three different methods for approximating Poisson sampling, which we describe here as Methods 1 through 3. Method 1 is the easiest to implement and has the most error, and Method 3 is the most difficult to implement and has the least error (potentially none).

Method 1 is to proceed as follows:

   1. Generate E1 and wait that long.
   2. Perform a measurement.
   3. Generate E2 and wait that long.
   4. Perform a measurement.
   5. Generate E3 and wait that long.
   6. Perform a measurement ...

The problem with this approach is that the "Perform a measurement" steps themselves take time, so the sampling is not done at times E1, E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount of time required for the i'th measurement. If Mi is very small compared to 1/lambda, then the potential error introduced by this technique is likewise small. As Mi becomes a non-negligible fraction of 1/lambda, the potential error increases.

Method 2 attempts to correct this error by taking into account the amount of time required by the measurements (i.e., the Mi's) and adjusting the waiting intervals accordingly:

   1. Generate E1 and wait that long.
   2. Perform a measurement and measure M1, the time it took to do so.
   3. Generate E2 and wait for a time E2-M1.
   4. Perform a measurement and measure M2 ...

This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi, then it is impossible to wait the proper amount of time. (Note that this case corresponds to needing to perform two measurements simultaneously.)

Method 3 is to generate a schedule of measurement times E1, E1+E2, etc., and then stick to it:

   1. Generate E1, E2, ..., En.
   2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei.
   3. Arrange that at times T1, T2, ..., Tn, a measurement is made.

By allowing simultaneous measurements, Method 3 avoids the shortcomings of Methods 1 and 2. If, however, simultaneous measurements interfere with one another, then Method 3 does not gain any benefit and may actually prove worse than Methods 1 or 2.

For Internet phenomena, it is not known to what degree the inaccuracies of these methods are significant. If the Mi's are much less than 1/lambda, then any of the three should suffice. If the Mi's are less than 1/lambda but perhaps not greatly less, then Method 2 is preferred to Method 1. If simultaneous measurements do not interfere with one another, then Method 3 is preferred, though it can be considerably harder to implement.
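{Comment: The following is a minimal sketch, under illustrative parameter choices, of Method 3 combined with the inverse-transform step above: the exponentially-distributed intervals Ei = -log(Ui)/lambda are generated first and summed into an absolute schedule T1, ..., Tn, to which the measurement process would then adhere. The values lambda = 1/30 per second and n = 10, and the use of the standard C library's rand(), are illustrative only.}

   #include <math.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Uniform (0,1] variate from the standard C library generator; any
    * better uniform generator can be substituted.  Drawing from (0,1]
    * rather than [0,1) avoids taking log(0) below. */
   static double unif01(void)
   {
       return (rand() + 1.0) / ((double)RAND_MAX + 1.0);
   }

   int main(void)
   {
       const double lambda = 1.0 / 30.0;  /* mean interval: 30 seconds  */
       const int n = 10;                  /* illustrative schedule size */
       double t = 0.0;                    /* running sum E1 + ... + Ei  */
       int i;

       srand(42);
       for (i = 0; i < n; i++) {
           double e = -log(unif01()) / lambda;  /* Ei, exponential     */

           t += e;                              /* Ti = E1 + ... + Ei  */
           printf("measurement %2d scheduled at t = %8.2f s\n", i + 1, t);
       }
       return 0;
   }

The schedule exists before any measurement is made, so a measurement that runs long does not delay the times at which later measurements are due.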
11.2. Self-Consistency

A fundamental requirement for a sound measurement methodology is that measurement be made using as few unconfirmed assumptions as possible. Experience has painfully shown how easy it is to make an (often implicit) assumption that turns out to be incorrect. An example is incorporating into a measurement the reading of a clock synchronized to a highly accurate source. It is easy to assume that the clock is therefore accurate; but due to software bugs, a loss of power in the source, or a loss of communication between the source and the clock, the clock could actually be quite inaccurate.

This is not to argue that one must not make *any* assumptions when measuring, but rather that, to the extent which is practical, assumptions should be tested. One powerful way for doing so involves checking for self-consistency. Such checking applies both to the observed value(s) of the measurement *and the values used by the measurement process itself*. A simple example of the former is that when computing a round trip time, one should check to see if it is negative. Since negative time intervals are non-physical, if it ever is negative, that finding immediately flags an error. *These sorts of errors should then be investigated!* It is crucial to determine where the error lies, because only by doing so diligently can we build up faith in a methodology's fundamental soundness. For example, it could be that the round trip time is negative because during the measurement the clock was set backward in the process of synchronizing it with another source. But it could also be that the measurement program accesses uninitialized memory in one of its computations and, only very rarely, that leads to a bogus computation. This second error is more serious, if the same program is used by others to perform the same measurement, since then they too will suffer from incorrect results. Furthermore, once uncovered it can be completely fixed.

A more subtle example of testing for self-consistency comes from gathering samples of one-way Internet delays. If one has a large sample of such delays, it may well be highly telling to, for example, fit a line to the pairs of (time of measurement, measured delay), to see if the resulting line has a clearly non-zero slope. If so, a possible interpretation is that one of the clocks used in the measurements is skewed relative to the other. Another interpretation is that the slope is actually due to genuine network effects. Determining which is indeed the case will often be highly illuminating. (See [Pa97] for a discussion of distinguishing between relative clock skew and genuine network effects.) Furthermore, if making this check is part of the methodology, then a finding that the long-term slope is very near zero is positive evidence that the measurements are probably not biased by a difference in skew. (A small sketch of this check appears at the end of this section.)

A final example illustrates checking the measurement process itself for self-consistency. Above we outline Poisson sampling techniques, based on generating exponentially-distributed intervals. A sound measurement methodology would include testing the generated intervals to see whether they are indeed exponentially distributed (and also to see if they suffer from correlation). In the appendix we discuss and give C code for one such technique, a general-purpose, well-regarded goodness-of-fit test called the Anderson-Darling test.

Finally, we note that what is truly relevant for Poisson sampling of Internet metrics is often not when the measurements began but the wire times corresponding to the measurement process. These could well be different, due to complications on the hosts used to perform the measurement.

Thus, even those with complete faith in their pseudo-random number generators and subsequent algorithms are encouraged to consider how they might test the assumptions of each measurement procedure as much as possible.
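{Comment: The following is a small, illustrative sketch of the line-fitting self-consistency check described earlier in this section: it fits an ordinary least-squares line to (time of measurement, measured delay) pairs and reports a clearly non-zero slope for investigation. The delay samples and the reporting threshold are invented for illustration; whether a non-zero slope reflects relative clock skew or genuine network effects must still be determined as discussed above.}

   #include <stdio.h>

   int main(void)
   {
       /* Illustrative one-way delay samples: (time of measurement in
        * seconds, measured delay in milliseconds).  Real data would be
        * read from a measurement trace. */
       static const double t[] = {   0,   60,  120,  180,  240,  300 };
       static const double d[] = { 20.1, 20.4, 20.9, 21.2, 21.8, 22.1 };
       const int n = 6;

       double st = 0, sd = 0, stt = 0, std_ = 0, slope;
       int i;

       /* Ordinary least-squares slope of delay versus time. */
       for (i = 0; i < n; i++) {
           st   += t[i];
           sd   += d[i];
           stt  += t[i] * t[i];
           std_ += t[i] * d[i];
       }
       slope = (n * std_ - st * sd) / (n * stt - st * st);

       printf("fitted slope = %g ms/s\n", slope);
       if (slope > 1e-3 || slope < -1e-3)   /* illustrative threshold */
           printf("delays trend with time; investigate clock skew "
                  "versus genuine network effects\n");
       return 0;
   }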
11.3. Defining Statistical Distributions

One way of describing a collection of measurements (a sample) is as a statistical distribution -- informally, as percentiles. There are several slightly different ways of doing so. In this section we define a standard definition to give uniformity to these descriptions.

The "empirical distribution function" (EDF) of a set of scalar measurements is a function F(x) which for any x gives the fractional proportion of the total measurements that were <= x. If x is less than the minimum value observed, then F(x) is 0. If it is greater than or equal to the maximum value observed, then F(x) is 1.

For example, given the 6 measurements:

   -2, 7, 7, 4, 18, -5

then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 5/6, F(18) = 1, and F(239) = 1.

Note that we can recover the different measured values, and how many times each occurred, from F(x) -- no information regarding the range in values is lost. Summarizing measurements using histograms, on the other hand, in general loses information about the different values observed, so the EDF is preferred. Using either the EDF or a histogram, however, we do lose information regarding the order in which the values were observed. Whether this loss is potentially significant will depend on the metric being measured.

We will use the term "percentile" to refer to the smallest value of x for which F(x) >= a given percentage.
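{Comment: The following is a minimal sketch, illustrative only, that evaluates the EDF F(x) for the six example measurements above and computes a percentile according to the definition just given.}

   #include <stdio.h>
   #include <stdlib.h>

   /* Ascending comparison for qsort(). */
   static int cmp(const void *a, const void *b)
   {
       double x = *(const double *)a, y = *(const double *)b;
       return (x > y) - (x < y);
   }

   /* EDF: fraction of the n measurements that are <= x. */
   static double edf(const double *m, int n, double x)
   {
       int i, count = 0;

       for (i = 0; i < n; i++)
           if (m[i] <= x)
               count++;
       return (double)count / n;
   }

   /* Smallest observed value whose EDF meets or exceeds the fraction
    * frac (frac = percentage / 100); assumes m[] is sorted ascending. */
   static double percentile(const double *m, int n, double frac)
   {
       int i;

       for (i = 0; i < n; i++)
           if (edf(m, n, m[i]) >= frac)
               return m[i];
       return m[n - 1];
   }

   int main(void)
   {
       double m[] = { -2, 7, 7, 4, 18, -5 };
       int n = 6;

       qsort(m, n, sizeof m[0], cmp);
       printf("F(-8) = %.3f\n", edf(m, n, -8.0));   /* 0.000 */
       printf("F(-5) = %.3f\n", edf(m, n, -5.0));   /* 0.167 */
       printf("F(7)  = %.3f\n", edf(m, n,  7.0));   /* 0.833 */
       printf("F(18) = %.3f\n", edf(m, n, 18.0));   /* 1.000 */
       printf("50th percentile = %g\n", percentile(m, n, 0.5));  /* 4 */
       return 0;
   }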