@deftypefn Statistics double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated variance of the dataset @var{data}
with stride @var{stride} and length @var{n}, using the set of weights
@var{w} with stride @var{wstride} and length @var{n}. The estimated
variance of a weighted dataset is defined as,
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} \sum w_i (x_i - \Hat\mu)^2
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) \sum w_i (x_i - \Hat\mu)^2
@end example

@end ifinfo
@noindent
Note that this expression reduces to an unweighted variance with the
familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero
weights.
@end deftypefn

@deftypefn Statistics double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the estimated variance of the weighted dataset
@var{data} using the given weighted mean @var{wmean}.
@end deftypefn

@deftypefn Statistics double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance} above.
@end deftypefn

@deftypefn Statistics double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function returns the square root of the corresponding variance
function @code{gsl_stats_wvariance_m} above.
@end deftypefn

@deftypefn Statistics double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
This function computes an
unbiased estimate of the variance of the weighted
dataset @var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}. In this case the estimator for
the variance replaces the sample mean @math{\Hat\mu} by the known
population mean @math{\mu},
@tex
\beforedisplay
$$
\Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
\Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)
@end example

@end ifinfo
@end deftypefn

@deftypefn Statistics double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean})
The standard deviation is defined as the square root of the variance.
This function returns the square root of the corresponding variance
function above.
@end deftypefn

@deftypefn Statistics double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted absolute deviation from the weighted
mean of @var{data}.
The absolute deviation from the mean is defined as,
@tex
\beforedisplay
$$
\hbox{absdev} = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)
@end example

@end ifinfo
@end deftypefn

@deftypefn Statistics double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean})
This function computes the absolute deviation of the weighted dataset
@var{data} about the given weighted mean @var{wmean}.
@end deftypefn

@deftypefn Statistics double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted skewness of the dataset @var{data}.
@tex
\beforedisplay
$$
\hbox{skew} = {{\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3} \over {\sum w_i}}
$$
\afterdisplay
@end tex
@ifinfo

@example
skew = (\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3) / (\sum w_i)
@end example

@end ifinfo
@end deftypefn

@deftypefn Statistics double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted skewness of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefn

@deftypefn Statistics double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the weighted kurtosis of the dataset @var{data}.
@tex
\beforedisplay
$$
\hbox{kurtosis} = {{\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4} \over {\sum w_i}} - 3
$$
\afterdisplay
@end tex
@ifinfo

@example
kurtosis = ((\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4) / (\sum w_i)) - 3
@end example

@end ifinfo
@end deftypefn

@deftypefn Statistics double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd})
This function computes the weighted kurtosis of the dataset @var{data}
using the given values of the weighted mean and weighted standard
deviation, @var{wmean} and @var{wsd}.
@end deftypefn

@node Maximum and Minimum values
@section Maximum and Minimum values

@deftypefn Statistics double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the maximum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}. The maximum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.

If you want instead to find the element with the largest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefn

@deftypefn Statistics double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the minimum value in @var{data}, a dataset of
length @var{n} with stride @var{stride}. The minimum value is defined
as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$}
@math{x_i <= x_j} for all @math{j}.

If you want instead to find the element with the smallest absolute
magnitude you will need to apply @code{fabs} or @code{abs} to your data
before calling this function.
@end deftypefn

@deftypefn Statistics void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function finds both the minimum and maximum values @var{min},
@var{max} in @var{data} in a single pass.
@end deftypefn

@deftypefn Statistics size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the maximum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}. The maximum value is
defined as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$}
@math{x_i >= x_j} for all @math{j}.
When there are several equal maximum
elements then the first one is chosen.
@end deftypefn

@deftypefn Statistics size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the index of the minimum value in @var{data}, a
dataset of length @var{n} with stride @var{stride}. The minimum value
is defined as the value of the element @math{x_i} which satisfies
@c{$x_i \le x_j$}
@math{x_i <= x_j} for all @math{j}. When there are several equal
minimum elements then the first one is chosen.
@end deftypefn

@deftypefn Statistics void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the indexes @var{min_index}, @var{max_index} of
the minimum and maximum values in @var{data} in a single pass.
@end deftypefn

@node Median and Percentiles
@section Median and Percentiles

The median and percentile functions described in this section operate on
sorted data. For convenience we use @dfn{quantiles}, measured on a scale
of 0 to 1, instead of percentiles (which use a scale of 0 to 100).

@deftypefn Statistics double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n})
This function returns the median value of @var{sorted_data}, a dataset
of length @var{n} with stride @var{stride}. The elements of the array
must be in ascending numerical order. There are no checks to see
whether the data are sorted, so the function @code{gsl_sort} should
always be used first.

When the dataset has an odd number of elements the median is the value
of element @math{(n-1)/2}. When the dataset has an even number of
elements the median is the mean of the two nearest middle values,
elements @math{(n-1)/2} and @math{n/2}.
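
As a worked illustration of the even case (the data values are chosen
only for this sketch, and the index expressions are read here as integer
divisions, which is an assumption about how they are evaluated), take the
four sorted values 12.6, 16.5, 17.2, 18.1. Then @math{(n-1)/2 = 1} and
@math{n/2 = 2}, so the median is the mean of elements 1 and 2,
@tex
\beforedisplay
$$
\hbox{median} = {{16.5 + 17.2} \over 2} = 16.85
$$
\afterdisplay
@end tex
@ifinfo

@example
median = (16.5 + 17.2) / 2 = 16.85
@end example

@end ifinfo
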
Since the algorithm for
computing the median involves interpolation this function always returns
a floating-point number, even for integer data types.
@end deftypefn

@deftypefn Statistics double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f})
This function returns a quantile value of @var{sorted_data}, a
double-precision array of length @var{n} with stride @var{stride}. The
elements of the array must be in ascending numerical order. The
quantile is determined by @var{f}, a fraction between 0 and 1. For
example, to compute the value of the 75th percentile @var{f} should have
the value 0.75.

There are no checks to see whether the data are sorted, so the function
@code{gsl_sort} should always be used first.

The quantile is found by interpolation, using the formula
@tex
\beforedisplay
$$
\hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1}
$$
\afterdisplay
@end tex
@ifinfo

@example
quantile = (1 - \delta) x_i + \delta x_@{i+1@}
@end example

@end ifinfo
@noindent
where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is
@math{(n-1)f - i}.

Thus the minimum value of the array (@code{data[0*stride]}) is given by
@var{f} equal to zero, the maximum value (@code{data[(n-1)*stride]}) is
given by @var{f} equal to one and the median value is given by @var{f}
equal to 0.5.
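
As a worked illustration of the formula (the data values and the choice
@math{f = 0.6} are assumptions made only for this sketch), take the five
sorted values 12.6, 16.5, 17.2, 18.1, 18.3. Then @math{(n-1)f = 2.4}, so
@math{i = 2} and @math{\delta = 0.4}, which gives
@tex
\beforedisplay
$$
\hbox{quantile} = (1 - 0.4) \times 17.2 + 0.4 \times 18.1 = 17.56
$$
\afterdisplay
@end tex
@ifinfo

@example
quantile = (1 - 0.4) * 17.2 + 0.4 * 18.1 = 17.56
@end example

@end ifinfo
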
Since the algorithm for computing quantiles involves
interpolation this function always returns a floating-point number, even
for integer data types.
@end deftypefn

@comment @node Statistical tests
@comment @section Statistical tests

@comment FIXME, do more work on the statistical tests

@comment -@deftypefn Statistics double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})
@comment -@deftypefnx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2})

@comment The function @code{gsl_stats_ttest} computes the t-test statistic for
@comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and
@comment -@var{n2} respectively.

@comment The t-test statistic measures the difference between the means of two
@comment datasets.

@node Example statistical programs
@section Example statistical programs
Here is a basic example of how to use the statistical functions:

@example
#include <stdio.h>
#include <gsl/gsl_statistics.h>

int
main(void)
@{
  double data[5] = @{17.2, 18.1, 16.5, 18.3, 12.6@};
  double mean, variance, largest, smallest;

  mean     = gsl_stats_mean(data, 1, 5);
  variance = gsl_stats_variance(data, 1, 5);
  largest  = gsl_stats_max(data, 1, 5);
  smallest = gsl_stats_min(data, 1, 5);

  printf("The dataset is %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  printf("The sample mean is %g\n", mean);
  printf("The estimated variance is %g\n", variance);
  printf("The largest value is %g\n", largest);
  printf("The smallest value is %g\n", smallest);
  return 0;
@}
@end example

The program should produce the following output,

@example
The dataset is 17.2, 18.1, 16.5, 18.3, 12.6
The sample mean is 16.54
The estimated variance is 5.373
The largest value is 18.3
The smallest value is 12.6
@end example

Here is an example using sorted data,

@example
#include <stdio.h>
#include <gsl/gsl_sort.h>
#include <gsl/gsl_statistics.h>

int
main(void)
@{
  double data[5] = @{17.2, 18.1, 16.5,
                    18.3, 12.6@};
  double median, upperq, lowerq;

  printf("Original dataset: %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  gsl_sort (data, 1, 5);

  printf("Sorted dataset: %g, %g, %g, %g, %g\n",
         data[0], data[1], data[2], data[3], data[4]);

  median = gsl_stats_median_from_sorted_data (data, 1, 5);
  upperq = gsl_stats_quantile_from_sorted_data (data, 1, 5, 0.75);
  lowerq = gsl_stats_quantile_from_sorted_data (data, 1, 5, 0.25);

  printf("The median is %g\n", median);
  printf("The upper quartile is %g\n", upperq);
  printf("The lower quartile is %g\n", lowerq);
  return 0;
@}
@end example

This program should produce the following output,

@example
Original dataset: 17.2, 18.1, 16.5, 18.3, 12.6
Sorted dataset: 12.6, 16.5, 17.2, 18.1, 18.3
The median is 17.2
The upper quartile is 18.1
The lower quartile is 16.5
@end example

@node Statistics References and Further Reading
@section References and Further Reading
@noindent
The standard reference for almost any topic in statistics is the
multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart.

@itemize @asis
@item
Maurice Kendall, Alan Stuart, and J. Keith Ord.
@cite{The Advanced Theory of Statistics} (multiple volumes)
reprinted as @cite{Kendall's Advanced Theory of Statistics}.
Wiley, ISBN 047023380X.
@end itemize

@noindent
Many statistical concepts can be more easily understood by a Bayesian
approach. The following book by Gelman, Carlin, Stern and Rubin gives
comprehensive coverage of the subject.

@itemize @asis
@item
Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin.
@cite{Bayesian Data Analysis}.
Chapman & Hall, ISBN 0412039915.
@end itemize

@noindent
For physicists the Particle Data Group provides useful reviews of
Probability and Statistics in the "Mathematical Tools" section of its
Annual Review of Particle Physics.

@itemize @asis
@item
@cite{Review of Particle Properties}
R.M. Barnett et al., Physical Review D54, 1 (1996)
@end itemize

@noindent
The Review of Particle Physics is available online at
@url{http://pdg.lbl.gov/}.