@cindex statistics
@cindex mean
@cindex standard deviation
@cindex variance
@cindex estimated standard deviation
@cindex estimated variance
@cindex t-test
@cindex range
@cindex min
@cindex max

This chapter describes the statistical functions in the library.  The
basic statistical functions include routines to compute the mean,
variance and standard deviation.  More advanced functions allow you to
calculate absolute deviations, skewness, and kurtosis as well as the
median and arbitrary percentiles.  The algorithms use recurrence
relations to compute average quantities in a stable way, without large
intermediate values that might overflow. 
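
The recurrence idea described above can be sketched as follows.  This is
a minimal illustration and not the library's source: the name
@code{running_mean} is hypothetical, and the update rule is the standard
running-mean recurrence, assumed here to be representative of the
approach.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the library): running-mean recurrence.
   After processing i+1 elements, `mean` holds their average, so no large
   intermediate sum is ever formed. */
double running_mean(const double data[], size_t stride, size_t n)
{
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* element i of a strided dataset is stored at data[i * stride] */
        mean += (data[i * stride] - mean) / (double)(i + 1);
    }
    return mean;
}
```

With a stride of 2 the same routine averages every second element, which
is how a single column of interleaved data can be processed without
copying it.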

The functions are available in versions for datasets in the standard
floating-point and integer types.  The versions for double precision
floating-point data have the prefix @code{gsl_stats} and are declared in
the header file @file{gsl_statistics_double.h}.  The versions for integer
data have the prefix @code{gsl_stats_int} and are declared in the header
file @file{gsl_statistics_int.h}.

@menu
* Mean and standard deviation and variance::  
* Absolute deviation::          
* Higher moments (skewness and kurtosis)::  
* Autocorrelation::             
* Covariance::                  
* Weighted Samples::            
* Maximum and Minimum values::  
* Median and Percentiles::      
* Example statistical programs::  
* Statistics References and Further Reading::  
@end menu

@node Mean and standard deviation and variance
@section Mean, Standard Deviation and Variance

@deftypefn Statistics double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the arithmetic mean of @var{data}, a dataset of
length @var{n} with stride @var{stride}.  The arithmetic mean, or
@dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as,

@tex
\beforedisplay
$$
{\Hat\mu} = {1 \over N} \sum x_i
$$
\afterdisplay
@end tex
@ifinfo
@example
\Hat\mu = (1/N) \sum x_i
@end example
@end ifinfo

@noindent
where @math{x_i} are the elements of the dataset @var{data}.  For
samples drawn from a Gaussian distribution the variance of
@math{\Hat\mu} is @math{\sigma^2 / N}.
@end deftypefn

@deftypefn Statistics double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function returns the estimated, or @dfn{sample}, variance of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
estimated variance is denoted by @math{\Hat\sigma^2} and is defined by,

@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2
$$
\afterdisplay
@end tex
@ifinfo
@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2
@end example
@end ifinfo

@noindent
where @math{x_i} are the elements of the dataset @var{data}.  Note that
the normalization factor of @math{1/(N-1)} results from the derivation
of @math{\Hat\sigma^2} as an unbiased estimator of the population
variance @math{\sigma^2}.  For samples drawn from a Gaussian distribution
the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / N}.

This function computes the mean via a call to @code{gsl_stats_mean}.  If
you have already computed the mean then you can pass it directly to
@code{gsl_stats_variance_m}.
@end deftypefn
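
The definition of the sample variance can be written out directly as a
two-pass computation.  The sketch below is an illustration of the
formula only; the names @code{sample_mean}, @code{sample_variance_m} and
@code{sample_variance} are hypothetical and the code is not taken from
the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers that mirror the formulas in the text. */
double sample_mean(const double data[], size_t stride, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i * stride];
    return sum / (double)n;
}

/* Sum of squared deviations from `mean`, normalized by 1/(N-1). */
double sample_variance_m(const double data[], size_t stride, size_t n,
                         double mean)
{
    assert(n > 1); /* the 1/(N-1) factor is undefined for N < 2 */
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - mean;
        sum_sq += delta * delta;
    }
    return sum_sq / (double)(n - 1);
}

/* First pass computes the mean, second pass the variance about it. */
double sample_variance(const double data[], size_t stride, size_t n)
{
    return sample_variance_m(data, stride, n, sample_mean(data, stride, n));
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared
deviations sum to 32, and the sample variance is therefore 32/7.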

@deftypefn Statistics double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function returns the sample variance of @var{data} relative to the
given value of @var{mean}.  The variance is computed with @math{\Hat\mu}
replaced by the value of @var{mean} that you supply,

@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2
$$
\afterdisplay
@end tex
@ifinfo
@example
\Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2
@end example
@end ifinfo
@end deftypefn

@deftypefn Statistics double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n})
@deftypefnx Statistics double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
The standard deviation is defined as the square root of the variance.
These functions return the square root of the corresponding variance
functions above.
@end deftypefn

@deftypefn Statistics double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes an unbiased estimate of the variance of
@var{data} when the population mean @var{mean} of the underlying
distribution is known @emph{a priori}.  In this case the estimator for
the variance uses the factor @math{1/N} and the sample mean
@math{\Hat\mu} is replaced by the known population mean @math{\mu},

@tex
\beforedisplay
$$
{\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2
$$
\afterdisplay
@end tex
@ifinfo
@example
\Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2
@end example
@end ifinfo
@noindent
@end deftypefn
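
The only difference from the sample variance is the normalization
factor, @math{1/N} instead of @math{1/(N-1)}.  A minimal sketch of the
formula, using the hypothetical name @code{fixed_mean_variance} (not a
library function), might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: variance about a population mean `mu` that is
   known in advance, normalized by 1/N rather than 1/(N-1). */
double fixed_mean_variance(const double data[], size_t stride, size_t n,
                           double mu)
{
    assert(n > 0);
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - mu;
        sum_sq += delta * delta;
    }
    return sum_sq / (double)n;
}
```

For the values 1, 2, 3, 4, 5 with @math{\mu = 3} the squared deviations
sum to 10, giving 10/5 = 2.0, whereas the @math{1/(N-1)} factor would
give 10/4 = 2.5.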


@deftypefn Statistics double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function calculates the standard deviation of @var{data} for a
fixed population mean @var{mean}.  The result is the square root of the
corresponding variance function.
@end deftypefn

@node Absolute deviation
@section Absolute deviation

@deftypefn Statistics double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the absolute deviation from the mean of
@var{data}, a dataset of length @var{n} with stride @var{stride}.  The
absolute deviation from the mean is defined as,

@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - {\Hat\mu}|
$$
\afterdisplay
@end tex
@ifinfo
@example
absdev  = (1/N) \sum |x_i - \Hat\mu|
@end example
@end ifinfo

@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The
absolute deviation from the mean provides a more robust measure of the
width of a distribution than the variance.  This function computes the
mean of @var{data} via a call to @code{gsl_stats_mean}.
@end deftypefn

@deftypefn Statistics double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean})
This function computes the absolute deviation of the dataset @var{data}
relative to the given value of @var{mean},

@tex
\beforedisplay
$$
absdev  = {1 \over N} \sum |x_i - mean|
$$
\afterdisplay
@end tex
@ifinfo
@example
absdev  = (1/N) \sum |x_i - mean|
@end example
@end ifinfo

@noindent
This function is useful if you have already computed the mean of
@var{data} (and want to avoid recomputing it), or wish to calculate the
absolute deviation relative to another value (such as zero, or the
median).
@end deftypefn
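
The effect of choosing a different reference value can be seen in a
short sketch.  The helper @code{absdev_about} below is hypothetical and
simply evaluates the formula above; it is not the library's
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean absolute deviation of the data from an
   arbitrary reference value `center` (for example the mean, the
   median, or zero). */
double absdev_about(const double data[], size_t stride, size_t n,
                    double center)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - center;
        sum += (delta < 0.0) ? -delta : delta; /* |x_i - center| */
    }
    return sum / (double)n;
}
```

For the values 1, 2, 3, 4, 10 the mean is 4 and the median is 3.  The
absolute deviation about the mean is (3+2+1+0+6)/5 = 2.4, while about
the median it is (2+1+0+1+7)/5 = 2.2.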

@node Higher moments (skewness and kurtosis)
@section Higher moments (skewness and kurtosis)

@deftypefn Statistics double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the skewness of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The skewness is defined as,

@tex
\beforedisplay
$$
skew = {1 \over N} \sum 
 {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3
$$
\afterdisplay
@end tex
@ifinfo
@example
skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3
@end example
@end ifinfo

@noindent
where @math{x_i} are the elements of the dataset @var{data}.  The skewness
measures the asymmetry of the tails of a distribution.

The function computes the mean and estimated standard deviation of
@var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}.
@end deftypefn

@deftypefn Statistics double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the skewness of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},

@tex
\beforedisplay
$$
skew = {1 \over N}
     \sum {\left( x_i - mean \over sd \right)}^3
$$
\afterdisplay
@end tex
@ifinfo
@example
skew = (1/N) \sum ((x_i - mean)/sd)^3
@end example
@end ifinfo
@noindent
This function is useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefn

@deftypefn Statistics double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n})
This function computes the kurtosis of @var{data}, a dataset of length
@var{n} with stride @var{stride}.  The kurtosis is defined as,

@tex
\beforedisplay
$$
kurtosis = \left( {1 \over N} \sum 
 {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 
 \right) 
 - 3
$$
\afterdisplay
@end tex
@ifinfo
@example
kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3
@end example
@end ifinfo

@noindent
The kurtosis measures how sharply peaked a distribution is, relative to
its width.  The kurtosis is normalized to zero for a Gaussian
distribution.
@end deftypefn

@deftypefn Statistics double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd})
This function computes the kurtosis of the dataset @var{data} using the
given values of the mean @var{mean} and standard deviation @var{sd},

@tex
\beforedisplay
$$
kurtosis = \left( {1 \over N} \sum
  {\left(x_i - mean \over sd \right)}^4 \right)
  - 3
$$
\afterdisplay
@end tex
@ifinfo
@example
kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3
@end example
@end ifinfo
@noindent
This function is useful if you have already computed the mean and
standard deviation of @var{data} and want to avoid recomputing them.
@end deftypefn
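
Both higher moments are averages of powers of the standardized
deviations @math{(x_i - mean)/sd}.  The sketch below evaluates the two
formulas for caller-supplied values of the mean and standard deviation;
the names @code{skew_from_m_sd} and @code{kurtosis_from_m_sd} are
hypothetical and the code is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: (1/N) * sum of cubed standardized deviations. */
double skew_from_m_sd(const double data[], size_t stride, size_t n,
                      double mean, double sd)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z;
    }
    return sum / (double)n;
}

/* Hypothetical helper: (1/N) * sum of fourth powers, minus 3 so that a
   Gaussian distribution has kurtosis zero. */
double kurtosis_from_m_sd(const double data[], size_t stride, size_t n,
                          double mean, double sd)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z * z;
    }
    return sum / (double)n - 3.0;
}
```

A dataset that is symmetric about the supplied mean, such as 1, 2, 3, 4,
5 with mean 3, has zero skewness for any value of @var{sd}.  A dataset
with a long upper tail, such as 0, 0, 0, 4 with mean 1, gives a positive
skewness.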

@node Autocorrelation
@section Autocorrelation

@deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n})
This function computes the lag-1 autocorrelation of the dataset @var{data}.

@tex
\beforedisplay
$$
a_1 = {\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
\over
\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}
$$
\afterdisplay
@end tex
@ifinfo
@example
a_1 = @{\sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i-1@} - \Hat\mu)
       \over
       \sum_@{i = 1@}^@{n@} (x_@{i@} - \Hat\mu) (x_@{i@} - \Hat\mu)@}
@end example
@end ifinfo
@noindent
@end deftypefun


@deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean})
This function computes the lag-1 autocorrelation of the dataset
@var{data} using the given value of the mean @var{mean}.

@end deftypefun
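
The lag-1 autocorrelation formula can be evaluated directly, as in the
sketch below.  The name @code{lag1_autocorr_m} is hypothetical, and the
treatment of the boundary is an assumption: the numerator runs over the
@math{n-1} adjacent pairs only, since the first element has no
predecessor, while the denominator runs over all @math{n} elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lag-1 autocorrelation about a given mean.
   Assumption: the numerator sums the n-1 adjacent pairs, the
   denominator sums all n squared deviations. */
double lag1_autocorr_m(const double data[], size_t stride, size_t n,
                       double mean)
{
    assert(n > 1);
    double num = 0.0;
    double den = (data[0] - mean) * (data[0] - mean);
    for (size_t i = 1; i < n; i++) {
        const double prev = data[(i - 1) * stride] - mean;
        const double curr = data[i * stride] - mean;
        num += curr * prev;
        den += curr * curr;
    }
    return num / den;
}
```

A steadily increasing series such as 1, 2, 3, 4, 5 (mean 3) gives a
positive value, 4/10 = 0.4, while the alternating series 1, -1, 1, -1
(mean 0) gives a negative value, -3/4 = -0.75.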

@node Covariance
@section Covariance
@cindex covariance, of two datasets

@deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n})
This function computes the covariance of the datasets @var{data1} and
@var{data2} which must both be of the same length @var{n}.

@tex
\beforedisplay
$$
covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y)
$$
\afterdisplay
@end tex
@ifinfo
@example
covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y)
@end example
@end ifinfo
@noindent
@end deftypefun

@deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2})
This function computes the covariance of the datasets @var{data1} and
@var{data2} using the given values of the means, @var{mean1} and
@var{mean2}.  This is useful if you have already computed the means of
@var{data1} and @var{data2} and want to avoid recomputing them.
@end deftypefun
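
The covariance formula, together with the use of separate strides for
the two datasets, can be illustrated with a short sketch.  The helper
@code{covariance_about} is hypothetical and only evaluates the formula
given above for caller-supplied means; it is not the library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: covariance of two strided datasets of equal
   length n about the given means, normalized by 1/(n-1). */
double covariance_about(const double data1[], size_t stride1,
                        const double data2[], size_t stride2,
                        size_t n, double mean1, double mean2)
{
    assert(n > 1);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double dx = data1[i * stride1] - mean1;
        const double dy = data2[i * stride2] - mean2;
        sum += dx * dy;
    }
    return sum / (double)(n - 1);
}
```

For x = 1, 2, 3, 4 and y = 2, 4, 6, 8 the means are 2.5 and 5, the
products of deviations sum to 10, and the covariance is 10/3.  If the
pairs are stored interleaved in one array as x1, y1, x2, y2, and so on,
the same result is obtained by passing the array with a stride of 2 for
the first dataset and a pointer to its second element, again with a
stride of 2, for the second.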