% chap8.tex
$n$ independent ``samples'' of\/~$X$ on $\Omega$ and adding them together.
The mean of $X_1+X_2+\cdots+X_n$ is $n\mu$, and the standard deviation
is $\sqrt n\,\sigma$; hence the average of the $n$ samples,
\begindisplay
{1\over n}(X_1+X_2+\cdots+X_n)\,,
\enddisplay
will lie between $\mu-10\sigma/\mskip-1mu\sqrt n$ and
$\mu+10\sigma/\mskip-1mu\sqrt n$ at least 99\% of the time.
\g(That is, the average will fall between the stated limits
in at least 99\% of all cases when we look at a set of $n$ independent
samples, for any fixed value of $n$. Don't misunderstand
this as a statement about the
averages of an infinite sequence $X_1$, $X_2$, $X_3$, \dots\ as~$n$ varies.)\g
In other words, if we
choose a large enough value of~$n$, the average of $n$ independent samples
will almost always be very near the expected value $EX$.
(An even stronger theorem called the Strong "Law of Large Numbers"
is proved in textbooks of probability theory;
but the simple consequence of Chebyshev's inequality that we have
just derived is enough for our purposes.)

Sometimes we don't know the characteristics of a probability space, and
we want to estimate the mean of a random variable~$X$ by sampling its
value repeatedly. (For example, we might want
to know the average temperature at noon on a January day in San Francisco;
or we may wish to know the mean life expectancy of insurance agents.)
If we have obtained independent empirical observations $X_1$, $X_2$,
\dots,~$X_n$, we can guess that the true mean is approximately
\begindisplay
\widehat EX={X_1+X_2+\cdots+X_n\over n}\,.\eqno\eqref|hat-e|
\enddisplay
And we can also make an estimate of the variance, using the formula
\begindisplay
\widehat VX={X_1^2+X_2^2+\cdots+X_n^2\over n-1}
  -{(X_1+X_2+\cdots+X_n)^2\over n(n-1)}\,.\eqno\eqref|hat-v|
\enddisplay

The $(n-1)$'s in this formula look like typographic errors; it seems
they should be $n$'s, as in \eq(|hat-e|), because
the true variance $VX$ is defined by expected values in \eq(|var2|).
Yet we get a better estimate with $n-1$ instead of~$n$
here, because definition \thiseq\ implies that
\begindisplay
E(\widehat VX)=VX\,.\eqno
\enddisplay
Here's why:
\begindisplay \openup5pt
E(\widehat VX)&={1\over n-1}E\Bigl(\,\sum_{k=1}^n X_k^2\,-\,{1\over n}
  \sum_{j=1}^n\,\sum_{k=1}^n X_jX_k\Bigr)\cr
&={1\over n-1}\Bigl(\,\sum_{k=1}^n E(X_k^2)\,-\,{1\over n}
  \sum_{j=1}^n\,\sum_{k=1}^n E(X_jX_k)\Bigr)\cr
&={1\over n-1}\Bigl(\,\sum_{k=1}^n E(X^2)\,-\,{1\over n}
  \sum_{j=1}^n\,\sum_{k=1}^n \bigl(E(X)^2\[j\ne k]+E(X^2)\[j=k]\bigr)\Bigr)\cr
&={1\over n-1}\Bigl(nE(X^2)-{1\over n}\bigl(nE(X^2)+n(n-1)E(X)^2\bigr)\Bigr)\cr
&=E(X^2)-E(X)^2=VX\,.
\enddisplay
(This derivation uses the independence of the observations when it
replaces $E(X_jX_k)$ by $(EX)^2\[j\ne k]+E(X^2)\[j=k]$.)

In practice, experimental results about a random variable~$X$ are usually
obtained by calculating a sample mean $\hat\mu=\widehat EX$ and a sample
standard deviation $\hat\sigma=\sqrt{\widehat VX}$, and presenting the answer
in the form `$\,\hat\mu\pm\hat\sigma/\mskip-1mu\sqrt n\,$'.
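% The estimators \eq(|hat-e|) and \eq(|hat-v|) are easy to apply mechanically;
% here is a minimal Python sketch (the function name `estimate' and the
% reporting format are illustrative choices, nothing more):
%
%   from math import sqrt
%
%   def estimate(samples):
%       """Return (mu_hat, v_hat, sigma_hat/sqrt(n)) for a list of observations."""
%       n = len(samples)
%       mu_hat = sum(samples) / n                          # formula (hat-e)
%       v_hat = (sum(x*x for x in samples) / (n - 1)
%                - sum(samples)**2 / (n*(n - 1)))          # formula (hat-v)
%       return mu_hat, v_hat, sqrt(v_hat) / sqrt(n)
%
%   # The ten spot sums rolled below give roughly (7.4, 4.49, 0.67),
%   # matching the `7.4 +- 0.7' reported below.
%   print(estimate([7, 11, 8, 5, 4, 6, 10, 8, 8, 7]))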
For example, here are ten
rolls of two supposedly fair dice:
\begindisplay \openup4pt \advance\abovedisplayskip-2pt
\diefour\diethree\qquad \diefive\diethree\qquad \diethree\dieone\qquad \diesix\diefour\qquad \dietwo\diesix\cr
\diefive\diesix\qquad \diefour\dieone\qquad \diefive\dieone\qquad \dietwo\diesix\qquad \diefour\diethree\cr
\enddisplay
The sample mean of the spot sum $S$ is
\begindisplay
\hat\mu=(7+11+8+5+4+6+10+8+8+7)/10=7.4\,;
\enddisplay
the sample variance is
\begindisplay
(7^2+11^2+8^2+5^2+4^2+6^2+10^2+8^2+8^2+7^2-10{\hat\mu}^2)/9\approx2.1^2\,.
\enddisplay
We estimate the average spot sum of these dice to be
$7.4\pm2.1/\mskip-1mu\sqrt{10}=7.4\pm0.7$,
on the basis of these experiments.
\smallskip
Let's work one more example of means and variances, in order to show how
they can be calculated theoretically instead of empirically. One of the
questions we considered in Chapter~5 was
the ``"football victory problem",\qback'' where $n$~"hats" are thrown
into the air and the result is a random permutation of hats. We showed
in equation \equ(5.|subfactorial-sol|) that there's a probability
of $n\?/n!\approx1/e$
that nobody gets the right hat back. We also derived
the formula
\begindisplay
P(n,k)={1\over n!}{n\choose k}(n-k)\?={1\over k!}{(n-k)\?\over(n-k)!}\eqno
\enddisplay
for the probability that exactly $k$ people end up with their own hats.

Restating these results in the formalism just learned, we can consider
the probability space $\Pi_n$ of all $n!$ permutations~$\pi$ of
$\{1,2,\ldots,n\}$, where $\Pr(\pi)=1/n!$ for all $\pi\in\Pi_n$. The random variable
\begindisplay
F_n(\pi)=\hbox{number of ``"fixed points"'' of $\pi$}\,,\qquad
\hbox{for $\pi\in\Pi_n$},
\enddisplay
\g\vskip-31pt Not to be confused with a Fibonacci number.\g
measures the number of correct hat-falls in the football victory problem.
Equation \thiseq\ gives $\Pr\prp(F_n=k)$, but let's pretend that we don't
know any such formula; we merely want to study the average value of~$F_n$,
and its standard deviation.

The average value is, in fact, extremely easy to calculate, avoiding
all the complexities of Chapter~5. We simply observe that
\begindisplay
F_n(\pi)&=F_{n,1}(\pi)+F_{n,2}(\pi)+\cdots+F_{n,n}(\pi)\,,\cr
F_{n,k}(\pi)&=[@\hbox{position $k$ of $\pi$ is a fixed point}]\,,\qquad
\hbox{for $\pi\in\Pi_n$}.
\enddisplay
Hence
\begindisplay
EF_n=EF_{n,1}+EF_{n,2}+\cdots+EF_{n,n}\,.
\enddisplay
And the expected value of $F_{n,k}$ is simply the probability that $F_{n,k}=1$,
which is $1/n$ because exactly $(n-1)!$ of the $n!$ permutations
$\pi=\pi_1\pi_2\ldots\pi_n\in\Pi_n$ have $\pi_k=k$. Therefore
\begindisplay
EF_n=n/n=1\,,\qquad\hbox{for $n>0$}.\eqno\eqref|e-fixedpts|
\enddisplay
On the average,
\g One the average.\g
one hat will be in its correct place. ``A random permutation
has one fixed point, on the average.''
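% A quick Monte Carlo check of \eq(|e-fixedpts|), in Python (the choice of
% n = 10 and 10000 trials is arbitrary, for illustration only):
%
%   import random
%
%   def fixed_points(pi):
%       """Count the positions k with pi[k] == k (permutation is 0-indexed)."""
%       return sum(1 for k, v in enumerate(pi) if v == k)
%
%   n, trials = 10, 10000
%   total = 0
%   for _ in range(trials):
%       pi = list(range(n))
%       random.shuffle(pi)                  # a uniformly random permutation
%       total += fixed_points(pi)
%   print(total / trials)                   # close to 1, whatever n we pick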
Now what's the standard deviation? This question is more difficult,
because the $F_{n,k}$'s are not independent of each other. But we can
calculate the variance by analyzing the mutual dependencies among them:
\begindisplay \openup3pt
E(F_n^2)&=E\biggl(\Bigl(\,\sum_{k=1}^nF_{n,k}\Bigr)^{\!2\,}\biggr)
  =E\Bigl(\,\sum_{j=1}^n\,\sum_{k=1}^nF_{n,j\,}F_{n,k}\Bigr)\cr
&=\sum_{j=1}^n\sum_{k=1}^n E(F_{n,j\,}F_{n,k})
  =\sum_{1\le k\le n}E(F_{n,k}^2)\!+\!2\!\!
  \sum_{1\le j<k\le n}E(F_{n,j\,}F_{n,k})\,.\cr
\enddisplay
(We used a similar trick when we derived \equ(2.|upper-triangle|)
in Chapter~2.) Now $F_{n,k}^2=F_{n,k}$, since $F_{n,k}$ is either $0$ or~$1$;
hence $E(F_{n,k}^2)=EF_{n,k}=1/n$ as before. And if $j<k$ we have
$E(F_{n,j\,}F_{n,k})=\Pr(\pi$ has both $j$ and $k$ as fixed
points$)=(n-2)!/n!=1/n(n-1)$. Therefore
\begindisplay
E(F_n^2)={n\over n}+{n\choose 2}{2\over n(n-1)}=2\,,\qquad
\hbox{for $n\ge2$}.\eqno\eqref|mu2-fixedpts|
\enddisplay
(As a check when $n=3$, we have
${2\over6}0^2+{3\over6}1^2+{0\over6}2^2+{1\over6}3^2=2$.)
The variance is $E(F_n^2)-(EF_n)^2=1$, so the standard
deviation (like the mean) is~$1$. ``A random permutation of $n\ge2$ elements
has $1\pm1$ fixed points.''

\begingroup \advance\parindent-7.5pt % KLUDGE!
\beginsection 8.3 Probability Generating Functions

If $X$ is a random variable that takes only nonnegative integer values, we
\endgroup % end kludge
can capture its probability distribution nicely by using the techniques
of Chapter~7. The {\it"probability generating function"\/} or "pgf" of\/~$X$ is
\begindisplay
G_X(z)=\sum_{k\ge0}\Pr\prp(X=k)\,z^k\,.\eqno
\enddisplay
This power series in $z$ contains all the information about the random
variable~$X$. We can also express it in two other ways:
\begindisplay
G_X(z)=\sum_{\omega\in\Omega}\Pr(\omega)\,z^{X(\omega)}=E(z^X)\,.\eqno
\enddisplay
The coefficients of $G_X(z)$ are nonnegative, and they sum to~$1$; the
latter condition can be written
\begindisplay
G_X(1)=1\,.\eqno\eqref|pgf0|
\enddisplay
Conversely, any power series $G(z)$ with nonnegative coefficients and
with $G(1)=1$ is the pgf of some random variable.

The nicest thing about pgf's is that they usually simplify the computation
of means and variances. For example, the mean is easily expressed:
\begindisplay
EX&=\sum_{k\ge0}k\cdt\Pr\prp(X=k)\cr
&=\sum_{k\ge0}\Pr\prp(X=k)\cdt kz^{k-1}\big\vert_{z=1}\cr
&=G'_X(1)\,.\eqno\eqref|pgf1|
\enddisplay
We simply differentiate the pgf with respect to $z$ and set $z=1$.
The variance is only slightly more complicated:
\begindisplay
E(X^2)&=\sum_{k\ge0}k^2\cdt\Pr\prp(X=k)\cr
&=\sum_{k\ge0}\Pr\prp(X=k)\cdt\bigl( k(k-1)z^{k-2}+kz^{k-1}\bigr)\,
  \big\vert_{z=1}=G''_X(1)+G'_X(1)\,.
\enddisplay
Therefore
\begindisplay
VX=G''_X(1)+G'_X(1)-G'_X(1)^2\,.\eqno\eqref|pgf2|
\enddisplay
Equations \eq(|pgf1|) and \eq(|pgf2|) tell us that we can compute the
mean and variance if we can compute the values of two derivatives,
$G'_X(1)$ and $G''_X(1)$. We don't have to know a closed form for the
probabilities; we don't even have to know a closed form for $G_X(z)$ itself.

It is convenient to write
\begindisplay
\Mean(G)&=G'(1)\,,\eqno\eqref|pgf-mean|\cr
\Var(G)&=G''(1)+G'(1)-G'(1)^2\,,\eqno\eqref|pgf-var|\cr
\enddisplay
when $G$ is any function, since we frequently want to compute these
combinations of derivatives.
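% When $X$ has finite support, \eq(|pgf-mean|) and \eq(|pgf-var|) are easy to
% evaluate numerically from the coefficients of the pgf. A small Python sketch
% (the list `p' holds Pr(X=k) for k = 0, 1, 2, ...; the fair-die example is
% just an illustration):
%
%   def mean_var(p):
%       """Mean(G) and Var(G) for the pgf G(z) = sum of p[k] * z**k."""
%       g1 = sum(k * pk for k, pk in enumerate(p))             # G'(1)
%       g2 = sum(k * (k - 1) * pk for k, pk in enumerate(p))   # G''(1)
%       return g1, g2 + g1 - g1**2
%
%   die = [0] + [1/6] * 6      # pgf (z + z^2 + ... + z^6)/6 of one fair die roll
%   print(mean_var(die))       # (3.5, 2.9166...), i.e., mean 7/2 and variance 35/12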
The second-nicest thing about pgf's is that they are comparatively
simple functions of~$z$, in many important cases. For example, let's
look at the {\it"uniform distribution"\/} of order~$n$, in which the random variable
takes on each of the values $\{0,1,\ldots,n-1\}$ with probability $1/n$.
The pgf in this case is
\begindisplay
U_n(z)={1\over n}(1+z+\cdots+z^{n-1})={1\over n}{1-z^n\over 1-z}\,,\qquad
\hbox{for $n\ge1$}.\eqno\eqref|u-pgf|
\enddisplay
We have a closed form for $U_n(z)$ because this is a geometric series.\looseness=-1
But this closed form proves to be somewhat embarrassing: When
we plug in $z=1$ (the value of~$z$ that's most critical for the pgf\/),
we get the undefined ratio $0/0$, even though $U_n(z)$ is a polynomial
that is perfectly well defined at any value of~$z$. The value
$U_n(1)=1$ is obvious from the non-closed form $(1+z+\cdots+z^{n-1})/n$,
yet it seems that we must resort to "L'Hospital's rule" to find
$\lim_{z\to1}U_n(z)$ if we want to determine $U_n(1)$ from the closed form.
The determination of $U'_n(1)$ by L'Hospital's rule will be even harder,
because there will be a factor of $(z-1)^2$ in the denominator; $U_n''(1)$
will be harder still.

Luckily there's a nice way out of this dilemma. If $G(z)=\sum_{n\ge0}g_nz^n$
is any power series that converges for at least one value of~$z$ with
$\vert z\vert>1$, the power series $G'(z)=\sum_{n\ge0}ng_nz^{n-1}$
will also have this property, and so will $G''(z)$, $G'''(z)$, etc.
Therefore by "Taylor's theorem" we can write
\begindisplay
G(1+t)=G(1)+{G'(1)\over1!}t+{G''(1)\over2!}t^2+{G'''(1)\over3!}t^3+\cdots\,;
\eqno\eqref|derivs-at-1|
\enddisplay
all derivatives of $G(z)$ at $z=1$ will appear as coefficients, when
$G(1+t)$ is expanded in powers of~$t$.

For example, the derivatives of the uniform pgf\/ $U_n(z)$ are easily
found in this way:
\begindisplay \openup3pt
U_n(1+t)&={1\over n}{(1+t)^n-1\over t}\cr
&={1\over n}{n\choose1} +{1\over n}{n\choose2}t
 +{1\over n}{n\choose3}t^2+\cdots +{1\over n}{n\choose n}t^{n-1}\,.
\enddisplay
Comparing this to \eq(|derivs-at-1|) gives
\begindisplay
U_n(1)=1\,;\quad U_n'(1)={n-1\over2}\,;\quad
U_n''(1)={(n-1)(n-2)\over3}\,;\eqno
\enddisplay
and in general $U_n^{(m)}(1)=(n-1)\_m/(m+1)$, although we need only the
cases $m=1$ and $m=2$ to compute the mean and the variance. The
mean of the uniform distribution is
\begindisplay
U_n'(1)={n-1\over2}\,,\eqno\eqref|unif-mean|
\enddisplay
and the variance is
\begindisplay \openup3pt
U_n''(1)+U_n'(1)-U_n'(1)^2
&=4\,{(n-1)(n-2)\over12}+6\,{(n-1)\over12}-3\,{(n-1)^2\over12}\cr
&={n^2-1\over12}\,.\eqno\eqref|unif-var|
\enddisplay

The third-nicest thing about pgf's is that the product of pgf's corresponds
to the sum of independent random variables. We learned in Chapters
5 and~7 that the product of generating functions corresponds to the
convolution of sequences; but it's even more important in
applications to know that the convolution of probabilities corresponds
to the sum of independent random variables. Indeed, if $X$ and~$Y$
are random variables that take on nothing but integer values, the
probability that $X+Y=n$ is
\begindisplay
\Pr\prp(X+Y=n)=\sum_k\Pr\prp(X=k\ {\rm and}\ Y=n-k)\,.
\enddisplay
If $X$ and~$Y$ are independent, we now have
\begindisplay
\Pr\prp(X+Y=n)=\sum_k\Pr\prp(X=k)\,\Pr\prp(Y=n-k)\,,
\enddisplay
a convolution. Therefore\dash---and this is the punch line\dash---
\begindisplay
G_{X+Y}(z)=G_X(z)\,G_Y(z)\,,\qquad\hbox{if $X$ and $Y$ are independent}.\eqno
\enddisplay

Earlier this chapter we observed that $V(X+Y)=VX+VY$ when $X$ and~$Y$
are independent. Let $F(z)$ and~$G(z)$ be the pgf's for $X$ and~$Y$,
and let $H(z)$ be the pgf for $X+Y$. Then