% chap8.tex
\eqno\eqref|mean2|
\enddisplay
In our dice-throwing example, this sum has 36 terms (one for each element
of~$\Omega$), while \eq(|mean1|) is a sum of only eleven terms. But
both sums have the same value, because they're both equal to
\begindisplay
\sum\twoconditions{\omega\in\Omega}{x\in X(\Omega)}
x\Pr(\omega)\bigl[x=X(\omega)\bigr]\,.
\enddisplay
The mean of a random variable turns out to be
\g I get it:\par
On average, ``average'' means ``mean.\qback''\g
more meaningful in applications than the other kinds of averages, so
we shall largely forget about medians and modes from now on.
We will use the terms ``expected value,\qback'' ``mean,\qback'' and ``average''
almost interchangeably in the rest of this chapter.

If $X$ and $Y$ are any two random variables defined on the same probability
space, then $X+Y$ is also a random variable on that space. By formula
\eq(|mean2|), the average of their sum is the sum of their averages:
\begindisplay
E(X+Y)=\sum_{\omega\in\Omega}\bigl(X(\omega)+Y(\omega)\bigr)\Pr(\omega)
=EX+EY\,.\eqno
\enddisplay
Similarly, if $\alpha$ is any constant we have the simple rule
\begindisplay
E(\alpha X)=\alpha EX\,.\eqno
\enddisplay
But the corresponding rule for multiplication of random variables is more
complicated in general; %this happens because
the expected value is defined as a sum over elementary events,
and sums of products don't often have a simple form.
In spite of this difficulty, there is a very nice
formula for the mean of a product in the special case that the
random variables are independent:
\begindisplay
E(XY)=(EX)(EY),\qquad\hbox{if $X$ and $Y$ are independent}.\eqno
\enddisplay
We can prove this by the distributive law for products,
\begindisplay \openup3pt
E(XY)&=\sum_{\omega\in\Omega}X(\omega)Y(\omega)\cdt\Pr(\omega)\cr
&=\sum\twoconditions{x\in X(\Omega)}{y\in Y(\Omega)}
xy\cdt\Pr\prp(X=x\ {\rm and}\ Y=y)\cr
&=\sum\twoconditions{x\in X(\Omega)}{y\in Y(\Omega)}
xy\cdt\Pr\prp(X=x)\Pr\prp(Y=y)\cr
&=\sum_{x\in X(\Omega)}x\Pr\prp(X=x)\;\cdot\!
\sum_{y\in Y(\Omega)}y\Pr\prp(Y=y)=(EX)(EY)\,.
\enddisplay
For example, we know that $S=S_1+S_2$ and $P=S_1S_2$, when $S_1$ and~$S_2$
are the numbers of spots on the first and second of a pair of random dice.
We have $ES_1=ES_2={7\over2}$, hence $ES=7$; furthermore $S_1$ and~$S_2$
are independent, so $EP={7\over2}\cdt{7\over2}={49\over4}$, as claimed earlier.
We also have $E(S+P)=ES+EP=7+{49\over4}$. But $S$ and~$P$ are not
independent, so we cannot assert that $E(SP)=7\cdt{49\over4}={343\over4}$.
In fact, the expected value of $SP$ turns out to equal $637\over6$
in distribution $\Pr_{00}$, while
it equals $112$ (exactly) in distribution $\Pr_{11}$.

\beginsection 8.2 Mean and Variance

The next most important property of a random variable, after we know
its expected value, is its {\it"variance"}, defined as the mean
square deviation from the mean:
\begindisplay
VX=E\bigl((X-EX)^2\bigr)\,.\eqno\eqref|var1|
\enddisplay
If we denote $EX$ by $\mu$, the variance $VX$ is the expected value of
$(X-\mu)^2$. This measures the ``spread'' of $X$'s distribution.

As a simple example of variance computation, let's suppose we have
just been made an offer we can't refuse: Someone has given us
"!greed"
two gift certificates for a certain "lottery". The lottery organizers
sell 100 tickets for each weekly drawing. One of these tickets is
selected by a uniformly random process\dash---that is,
each ticket is equally likely to be chosen\dash---and the lucky ticket holder
wins a hundred million dollars. The other 99 ticket holders win nothing.

We can use our gift in two ways: Either we buy two tickets in the
\g(Slightly subtle point:\par
There are two probability spaces,
depending on what strategy we use; but\/ $EX_1$ and\/ $EX_2$
are the same in both.)\g
same lottery, or we buy one ticket in each of two lotteries.
Which is a better strategy? Let's try to analyze this by letting
$X_1$ and $X_2$ be random variables that represent the amount we
win on our first and second ticket.
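As a sanity check on the dice expectations just computed, here is a short Python aside (illustrative only, not part of the text). It enumerates all 36 elementary events for a pair of dice, using the fair distribution and the loaded distribution in which faces 1 and~6 each have probability $1\over4$ and the other faces probability $1\over8$:

```python
from fractions import Fraction

# Single-die distributions: fair, and the loaded die whose faces 1 and 6
# each have probability 1/4 (the other four faces have probability 1/8).
fair = {face: Fraction(1, 6) for face in range(1, 7)}
loaded = {face: Fraction(1, 4) if face in (1, 6) else Fraction(1, 8)
          for face in range(1, 7)}

def expect(f, die1, die2):
    """E f(S1,S2), summing over all 36 elementary events (a, b)."""
    return sum(f(a, b) * p * q
               for a, p in die1.items() for b, q in die2.items())

ES = expect(lambda a, b: a + b, fair, fair)        # ES = 7
EP = expect(lambda a, b: a * b, fair, fair)        # EP = 49/4, by independence
ESP_00 = expect(lambda a, b: (a + b) * a * b, fair, fair)      # 637/6
ESP_11 = expect(lambda a, b: (a + b) * a * b, loaded, loaded)  # 112 exactly
```

Exact rational arithmetic confirms that $E(SP)$ is $637\over6$ in distribution $\Pr_{00}$ and exactly $112$ in $\Pr_{11}$, not the naive product $343\over4$.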
The expected value of $X_1$,
in millions, is
\begindisplay
EX_1=\textstyle{99\over100}\cdot0+{1\over100}\cdot100=1\,,
\enddisplay
and the same holds for $X_2$. Expected values are additive, so our
average total winnings will be
\begindisplay
E(X_1+X_2)=EX_1+EX_2=2\hbox{ million dollars,}
\enddisplay
regardless of which strategy we adopt.

Still, the two strategies seem different. Let's look beyond expected
values and study the exact probability distribution of $X_1+X_2$:
\begindisplay
\def\preamble{\bigstrut\hfil##\quad&\vrule##&&\quad\hfil$##$\hfil}%
\offinterlineskip
&&\multispan3\hfil\quad winnings (millions)\hfil\cr
&&0&100&200\cr
\noalign{\hrule}
\omit&height 2pt\cr
same drawing&&.9800&.0200\cr
different drawings&&.9801&.0198&.0001\cr
\enddisplay
If we buy two tickets in the same lottery we have a 98\% chance of
winning nothing and a 2\% chance of winning \$100 million.
If we buy them in different lotteries we have a 98.01\% chance of
winning nothing, so this is slightly more likely than before;
and we have a 0.01\% chance of winning \$200 million, also slightly
more likely than before; and our chances of winning \$100 million
are now 1.98\%. So the distribution of $X_1+X_2$ in this second
situation is slightly more spread out; the middle value, \$100 million,
is slightly less likely, but the extreme values are slightly more likely.

It's this notion of the spread of a random variable that the variance is
intended to capture. We measure the spread in terms of the squared
deviation of the random variable from its mean. In case~1, the variance
is therefore
\setmathsize{.98(0M-2M)^2\,+\,.02(100M-2M)^2}
\begindisplay
\mathsize{.98(0M-2M)^2\,+\,.02(100M-2M)^2}=196M^2\,;
\enddisplay
in case 2 it is
\begindisplay
&.9801(0M-2M)^2\,+\,.0198(100M-2M)^2+.0001(200M-2M)^2\cr
&\mathsize{}=198M^2\,.
\enddisplay
As we expected, the latter variance is slightly larger, because the
distribution of case~2 is slightly more spread out.

When we work with variances, everything is squared, so the numbers
can get pretty big.
(The factor $M^2$ is one trillion,
\g Interesting:
The variance of a dollar amount is expressed in units of square dollars.\g
which is somewhat imposing even for high-stakes gamblers.) To convert the
numbers back to the more meaningful original scale, we often take the
square root of the variance. The resulting number is called the
{\it"standard deviation"}, and it is usually denoted by the Greek
letter~"$\sigma$":
\begindisplay
\sigma=\sqrt{VX}\,.\eqno
\enddisplay
The standard deviations of the random variables $X_1+X_2$ in our two
lottery strategies are $\sqrt{196M^{\mathstrut2}}=14.00M$ and
$\sqrt{198M^{\mathstrut2}}\approx14.071247M$. In some sense the second
alternative is about \$71,247 riskier.

How does the variance help us choose a strategy? It's not
clear. The strategy with higher variance is a little riskier; but do we
get the most for our money by taking more risks or by playing it safe?
\g Another way to reduce risk might be to bribe the lottery officials.
I~guess that's where probability becomes indiscreet.\bigskip
(N.B.: Opinions expressed in these margins do not necessarily represent
the opinions of the management.)\g
Suppose we had the chance to buy 100 tickets instead of only~two. Then we
could have a guaranteed victory in a single lottery (and the variance
would be zero); or we could gamble on a hundred different lotteries,
with a $.99^{100}\approx.366$ chance of winning nothing but also with
a nonzero probability of winning up to \$10,000,000,000. To decide
between these alternatives is beyond the scope of this book; all we can
do here is explain how to do the calculations.

In fact, there is a simpler way to calculate the variance, instead
of using the definition \eq(|var1|). (We suspect that there must be
something going on in the mathematics behind the scenes, because
the variances in the lottery example magically came out to be
integer multiples of\/~$M^2$.)
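The lottery figures above are easy to confirm mechanically. The following Python sketch (an illustrative aside, not part of the text) recomputes the two variances exactly, takes their square roots, and checks the $.99^{100}$ estimate:

```python
import math
from fractions import Fraction

# Distribution of X1 + X2 (winnings, in millions of dollars, i.e. units
# where M = 1) under each strategy.
same_drawing = {0: Fraction(98, 100), 100: Fraction(2, 100)}
different_drawings = {0: Fraction(9801, 10000),
                      100: Fraction(198, 10000),
                      200: Fraction(1, 10000)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

v1 = variance(same_drawing)        # 196, in units of M^2
v2 = variance(different_drawings)  # 198
sigma1 = math.sqrt(v1)             # 14.0 million
sigma2 = math.sqrt(v2)             # about 14.071247 million
p_lose_all = 0.99 ** 100           # 100 losing tickets: about .366
```

Both strategies have mean 2, and the variances come out to the integer multiples $196M^2$ and $198M^2$ exactly, as the text observes.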
We have
\begindisplay
E\bigl((X-EX)^2\bigr)&=E\bigl(X^2-2X(EX)+(EX)^2\bigr)\cr
&=E(X^2)-2(EX)(EX)+(EX)^2\,,\cr
\enddisplay
since $(EX)$ is a constant; hence
\begindisplay \postdisplaypenalty=10000
VX=E(X^2)-(EX)^2\,.\eqno\eqref|var2|
\enddisplay
``The variance is the mean of the square minus the square of the mean.''
For example, the mean of $(X_1+X_2)^2$ comes to $.98(0M)^2+.02(100M)^2
=200M^2$ or to $.9801(0M)^2+.0198(100M)^2+.0001(200M)^2=202M^2$ in the
lottery problem. Subtracting $4M^2$ (the square of the mean) gives the
results we obtained the hard way.

There's an even easier formula yet, if we want to calculate $V(X+Y)$
when $X$ and~$Y$ are independent: We have
\begindisplay \openup3pt
E\bigl((X+Y)^2\bigr)&=E(X^2+2XY+Y^2)\cr
&=E(X^2)+2(EX)(EY)+E(Y^2)\,,
\enddisplay
since we know that $E(XY)=(EX)(EY)$ in the independent case. Therefore
\begindisplay \openup3pt
V(X+Y)&=E\bigl((X+Y)^2\bigr)-(EX+EY)^2\cr
&=E(X^2)+2(EX)(EY)+E(Y^2)\cr
&\qquad-(EX)^2-2(EX)(EY)-(EY)^2\cr
&=E(X^2)-(EX)^2+E(Y^2)-(EY)^2\cr
&=VX+VY\,.\eqno
\enddisplay
``The variance of a sum of independent random variables is the sum of their
variances.'' For example, the variance of the amount we can win with a
single lottery ticket is
\begindisplay
E(X_1^2)-(EX_1)^2=.99(0M)^2+.01(100M)^2-(1M)^2=99M^2\,.
\enddisplay
Therefore the variance of the total winnings of two lottery tickets
in two separate (independent) lotteries is $2\times99M^2=198M^2$.
And the corresponding variance for $n$ independent lottery tickets
is $n\times99M^2$.

The variance of the dice-roll sum $S$ drops out of this same formula,
since $S=S_1+S_2$ is the sum of two independent random variables.
We have
\begindisplay
VS_1={1\over6}(1^2+2^2+3^2+4^2+5^2+6^2)-\left(7\over2\right)^{\!2}
={35\over12}
\enddisplay
when the dice are fair;
hence $VS={35\over12}+{35\over12}={35\over6}$. The loaded die has
\begindisplay
VS_1={1\over8}(2\cdt1^2+2^2+3^2+4^2+5^2+2\cdt6^2)-\left(7\over2\right)^{\!2}
={45\over12}\,;
\enddisplay
hence $VS={45\over6}=7.5$ when both dice are loaded.
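Formula \eq(|var2|) and the dice variances can be double-checked with exact rational arithmetic; the Python sketch below (illustrative only, not part of the text) computes $E(X^2)-(EX)^2$ directly for the single lottery ticket, the fair die, and the loaded die:

```python
from fractions import Fraction

def variance(dist):
    """VX = E(X^2) - (EX)^2, computing both moments directly."""
    ex = sum(x * p for x, p in dist.items())
    ex2 = sum(x * x * p for x, p in dist.items())
    return ex2 - ex ** 2

# One lottery ticket (winnings in millions), one fair die, one loaded die.
ticket = {0: Fraction(99, 100), 100: Fraction(1, 100)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}
loaded = {face: Fraction(1, 4) if face in (1, 6) else Fraction(1, 8)
          for face in range(1, 7)}

# variance(ticket) = 99 M^2; VS1 = 35/12 (fair) or 45/12 (loaded);
# variances of independent summands add, giving VS = 35/6 or 45/6.
```

The additivity rule then gives $VS$ for two dice, and $n\times99M^2$ for $n$ independent tickets, without any further enumeration.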
Notice that the loaded dice give
$S$ a larger variance, although $S$ actually assumes its average value~$7$
more often than it would with fair dice. If our goal is to shoot lots of
lucky~$7$'s, the variance is not our best indicator of success.

OK, we have learned how to compute variances. But we haven't really
seen a good reason why the variance is a natural thing to compute.
Everybody does it, but why? The main reason is
\g If he proved it in 1867, it's a classic '67 Chebyshev.\g
{\it "Chebyshev's inequality"\/} ([|bienayme|] and [|chebyshev-ineq|]),
which states that the variance has a significant property:
\begindisplay
\Pr\pbigl((X-EX)^2\ge\alpha\bigr)\le VX/\alpha\,,\qquad
\hbox{for all $\alpha>0$}.\eqno\eqref|cheb1|
\enddisplay
(This is different from the monotonic inequalities of "Chebyshev" that we
encountered in Chapter~2.)
Very roughly, \thiseq\ tells us that a random variable~$X$ will rarely
be far from its mean~$EX$ if its variance~$VX$ is small.
The proof is amazingly simple. We have
\begindisplay \openup3pt
VX&=\sum_{\omega\in\Omega}\bigl(X(\omega)-EX\bigr)^{\!2}\,\Pr(\omega)\cr
&\ge\sum\twoconditions{\omega\in\Omega}{(X(\omega)-EX)^2\ge\alpha}
\!\!\!\bigl(X(\omega)-EX\bigr)^{\!2}\,\Pr(\omega)\cr
&\ge\sum\twoconditions{\omega\in\Omega}{(X(\omega)-EX)^2\ge\alpha}
\!\!\!\alpha\Pr(\omega)\;=\;\alpha\cdt\Pr\pbigl((X-EX)^2\ge\alpha\bigr)\,;
\enddisplay
dividing by $\alpha$ finishes the proof.

If we write $\mu$ for the mean and $\sigma$ for the standard deviation,
and if we replace $\alpha$ by $c^2VX$ in \eq(|cheb1|), the condition
$(X-EX)^2\ge c^2VX$ is the same as $(X-\mu)^2\ge(c\sigma)^2$; hence
\eq(|cheb1|) says that
\begindisplay
\Pr\pbigl(\vert X-\mu\vert\ge c\sigma\bigr)\le 1/c^2\,.\eqno\eqref|cheb2|
\enddisplay
Thus, $X$ will lie within $c$ standard deviations of its mean value
except with probability at most $1/c^2$.
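Chebyshev's inequality can be watched in action on the two-dice sum $S$. The Python sketch below (illustrative only, not part of the text) enumerates the distribution of $S$ for fair dice and checks the bound \eq(|cheb1|) for a range of values of $\alpha$:

```python
from fractions import Fraction

# Distribution of S = S1 + S2 for a pair of fair dice.
dist = {}
for a in range(1, 7):
    for b in range(1, 7):
        dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)

mu = sum(s * p for s, p in dist.items())               # EX = 7
var = sum((s - mu) ** 2 * p for s, p in dist.items())  # VX = 35/6

def tail(alpha):
    """Pr((S - mu)^2 >= alpha), by direct enumeration."""
    return sum(p for s, p in dist.items() if (s - mu) ** 2 >= alpha)

# Chebyshev guarantees tail(alpha) <= var / alpha for every alpha > 0.
```

The bound is often loose (for small $\alpha$ the right-hand side exceeds 1), but it never fails, which is what the proof promises.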
A random variable will lie
within $2\sigma$ of~$\mu$ at least 75\% of the time; it will lie
between $\mu-10\sigma$ and $\mu+10\sigma$ at least 99\% of the time.
These are the cases $\alpha=4VX$ and $\alpha=100VX$ of Chebyshev's
inequality.

If we roll a pair of fair dice $n$ times, the total value of the $n$~rolls
will almost always be near $7n$, for large~$n$.
Here's why: The variance of $n$ independent
rolls is ${35\over6}n$. A variance of ${35\over6}n$ means a standard
deviation of only
\begindisplay
\textstyle\sqrt{{35\over6}n}\,.
\enddisplay
So Chebyshev's inequality
tells us that the final sum will lie between
\begindisplay \postdisplaypenalty=10000
\textstyle 7n-10\sqrt{{35\over6}n}\And7n+10\sqrt{{35\over6}n}
\enddisplay
in at least 99\% of all experiments when $n$ fair dice are rolled.
For example, the odds are better
than 99 to~1 that the total value of a million rolls
will be between 6.976~million and 7.024~million.

In general, let $X$ be {\it any\/} random variable over a probability
space~$\Omega$, having finite mean~$\mu$ and finite standard
deviation~$\sigma$. Then we can consider the probability space $\Omega^n$
whose elementary events are $n$-tuples $(\omega_1,\omega_2,\ldots,\omega_n)$
with each $\omega_k\in\Omega$, and whose probabilities are
\begindisplay
\Pr(\omega_1,\omega_2,\ldots,\omega_n)
=\Pr(\omega_1)\Pr(\omega_2)\ldots\Pr(\omega_n)\,.
\enddisplay
If we now define random variables $X_k$ by the formula
\begindisplay
X_k(\omega_1,\omega_2,\ldots,\omega_n)=X(\omega_k)\,,
\enddisplay
the quantity
\begindisplay
X_1+X_2+\cdots+X_n
\enddisplay
is a sum of $n$ independent random variables, which corresponds to taking