📄 students_t_examples.qbk

📁 Boost provides free peer-reviewed portable C++ source libraries. We emphasize libraries that work
💻 QBK
📖 第 1 页 / 共 3 页
字号:
上一页 1 23
for example at the 95% level, 14 measurements in total for a two-sided test.[endsect][section:two_sample_students_t Comparing the means of two samples with the Students-t test]Imagine that we have two samples, and we wish to determine whethertheir means are different or not.  This situation often arises whendetermining whether a new process or treatment is better than an old one.In this example, we'll be using the [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htmCar Mileage sample data] from the [@http://www.itl.nist.gov NIST website].  The data comparesmiles per gallon of US cars with miles per gallon of Japanese cars.The sample code is in [@../../../example/students_t_two_samples.cpp students_t_two_samples.cpp].There are two ways in which this test can be conducted: we can assumethat the true standard deviations of the two samples are equal or not.If the standard deviations are assumed to be equal, then the calculationof the t-statistic is greatly simplified, so we'll examine that case first.In real life we should verify whether this assumption is valid with a Chi-Squared test for equal variances.We begin by defining a procedure that will conduct our test assuming equalvariances:   // Needed headers:   #include <boost/math/distributions/students_t.hpp>   #include <iostream>   #include <iomanip>   // Simplify usage:   using namespace boost::math;   using namespace std;   void two_samples_t_test_equal_sd(           double Sm1,       // Sm1 = Sample 1 Mean.           double Sd1,       // Sd1 = Sample 1 Standard Deviation.           unsigned Sn1,     // Sn1 = Sample 1 Size.           double Sm2,       // Sm2 = Sample 2 Mean.           double Sd2,       // Sd2 = Sample 2 Standard Deviation.           unsigned Sn2,     // Sn2 = Sample 2 Size.           double alpha)     // alpha = Significance Level.   {      Our procedure will begin by calculating the t-statistic, assumingequal variances the needed formulae are:[equation dist_tutorial1]where Sp is the "pooled" standard deviation of the two samples, and /v/ is the number of degrees of freedom of the two combinedsamples.  We can now write the code to calculate the t-statistic:   // Degrees of freedom:   double v = Sn1 + Sn2 - 2;   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";   // Pooled variance:   double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);   cout << setw(55) << left << "Pooled Standard Deviation" << "=  " << v << "\n";   // t-statistic:   double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";   The next step is to define our distribution object, and calculate thecomplement of the probability:   students_t dist(v);   double q = cdf(complement(dist, fabs(t_stat)));   cout << setw(55) << left << "Probability that difference is due to chance" << "=  "       << setprecision(3) << scientific << 2 * q << "\n\n";Here we've used the absolute value of the t-statistic, because we initiallywant to know simply whether there is a difference or not (a two-sided test).However, we can also test whether the mean of the second sample is greateror is less (one-sided test) than that of the first:all the possible tests are summed up in the following table:[table[[Hypothesis][Test]][[The Null-hypothesis: there is*no difference* in means]  [Reject if complement of CDF for |t| < significance level / 2:`cdf(complement(dist, fabs(t))) < alpha / 2`]][[The Alternative-hypothesis: there is a*difference* in means]  [Reject if complement of CDF for |t| > significance level / 2:`cdf(complement(dist, fabs(t))) < alpha / 2`]][[The Alternative-hypothesis: Sample 1 Mean is *less* thanSample 2 Mean.]  [Reject if CDF of t > significance level:`cdf(dist, t) > alpha`]][[The Alternative-hypothesis: Sample 1 Mean is *greater* thanSample 2 Mean.]  [Reject if complement of CDF of t > significance level:`cdf(complement(dist, t)) > alpha`]]][noteFor a two-sided test we must compare against alpha / 2 and not alpha.]Most of the rest of the sample program is pretty-printing, so we'llskip over that, and take a look at the sample output for alpha=0.05(a 95% probability level).  For comparison the dataplot outputfor the same data is in [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htmsection 1.3.5.3] of the __handbook.[pre'''   ________________________________________________   Student t test for two samples (equal variances)   ________________________________________________   Number of Observations (Sample 1)                      =  249   Sample 1 Mean                                          =  20.14458   Sample 1 Standard Deviation                            =  6.41470   Number of Observations (Sample 2)                      =  79   Sample 2 Mean                                          =  30.48101   Sample 2 Standard Deviation                            =  6.10771   Degrees of Freedom                                     =  326.00000   Pooled Standard Deviation                              =  326.00000   T Statistic                                            =  -12.62059   Probability that difference is due to chance           =  5.273e-030   Results for Alternative Hypothesis and alpha           =  0.0500'''   Alternative Hypothesis              Conclusion   Sample 1 Mean != Sample 2 Mean       NOT REJECTED   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED   Sample 1 Mean >  Sample 2 Mean       REJECTED]So with a probability that the difference is due to chance of just5.273e-030, we can safely conclude that there is indeed a difference.The tests on the alternative hypothesis show that we mustalso reject the hypothesis that Sample 1 Mean isgreater than that for Sample 2: in this case Sample 1 represents the miles per gallon for Japanese cars, and Sample 2 the miles per gallon for US cars, so we conclude that Japanese cars are on average morefuel efficient.Now that we have the simple case out of the way, let's look for a momentat the more complex one: that the standard deviations of the two samplesare not equal.  In this case the formula for the t-statistic becomes:[equation dist_tutorial2]And for the combined degrees of freedom we use the [@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]approximation:[equation dist_tutorial3]Note that this is one of the rare situations where the degrees-of-freedomparameter to the Student's t distribution is a real number, and not aninteger value.[noteSome statistical packages truncate the effective degrees of freedom toan integer value: this may be necessary if you are relying on lookup tables,but since our code fully supports non-integer degrees of freedom there is noneed to truncate in this case.  Also note that when the degrees of freedom is small then the Welch-Satterthwaite approximation may be a significantsource of error.]Putting these formulae into code we get:   // Degrees of freedom:   double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;   v *= v;   double t1 = Sd1 * Sd1 / Sn1;   t1 *= t1;   t1 /=  (Sn1 - 1);   double t2 = Sd2 * Sd2 / Sn2;   t2 *= t2;   t2 /= (Sn2 - 1);   v /= (t1 + t2);   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";   // t-statistic:   double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";Thereafter the code and the tests are performed the same as before.  Usingare car mileage data again, here's what the output looks like:[pre'''   __________________________________________________   Student t test for two samples (unequal variances)   __________________________________________________   Number of Observations (Sample 1)                      =  249   Sample 1 Mean                                          =  20.145   Sample 1 Standard Deviation                            =  6.4147   Number of Observations (Sample 2)                      =  79   Sample 2 Mean                                          =  30.481   Sample 2 Standard Deviation                            =  6.1077   Degrees of Freedom                                     =  136.87   T Statistic                                            =  -12.946   Probability that difference is due to chance           =  1.571e-025   Results for Alternative Hypothesis and alpha           =  0.0500'''   Alternative Hypothesis              Conclusion   Sample 1 Mean != Sample 2 Mean       NOT REJECTED   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED   Sample 1 Mean >  Sample 2 Mean       REJECTED]This time allowing the variances in the two samples to differ has yieldeda higher likelihood that the observed difference is down to chance alone(1.571e-025 compared to 5.273e-030 when equal variances were assumed).However, the conclusion remains the same: US cars are less fuel efficientthan Japanese models.[endsect][section:paired_st Comparing two paired samples with the Student's t distribution]Imagine that we have a before and after reading for each item in the sample:for example we might have measured blood pressure before and after administrationof a new drug.  We can't pool the results and compare the means before and afterthe change, because each patient will have a different baseline reading.Instead we calculate the difference between before and after measurementsin each patient, and calculate the mean and standard deviation of the differences.To test whether a significant change has taken place, we can then test the null-hypothesis that the true mean is zero using the same procedure we used in the single sample cases previously discussed.That means we can:* [link math_toolkit.dist.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean]. If the endpoints of the interval differ in sign then we are unable to reject the null-hypothesis that there is no change.* [link math_toolkit.dist.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If theresult is consistent with a true mean of zero, then we are unable to reject thenull-hypothesis that there is no change.* [link math_toolkit.dist.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would needin order to obtain a significant result].[endsect][endsect][/section:st_eg Student's t][/   Copyright 2006 John Maddock and Paul A. Bristow.  Distributed under the Boost Software License, Version 1.0.  (See accompanying file LICENSE_1_0.txt or copy at  http://www.boost.org/LICENSE_1_0.txt).]
上一页 1 23
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -