Section 3.3 Expected Value and Variance
Subsection 3.3.1 Expected Values
Consider the probability space \((S,P)\) with sample space \(S = \{1,2,3\}\) and probability function \(P\) defined by \(P(1)=4/5\text{,}\) \(P(2)=1/10\text{,}\) and \(P(3)=1/10\text{.}\) Assume we choose an element in \(S\) according to this probability function. Let \(X\) be the random variable whose value is equal to the element in \(S\) that is chosen. Thus, as a function \(X : S \rightarrow \mathbb{R}\text{,}\) we have \(X(1)=1\text{,}\) \(X(2)=2\text{,}\) and \(X(3)=3\text{.}\)
The “expected value” of \(X\) is the value of \(X\) that we observe “on average”. How should we define this? Since \(X\) has a much higher probability to take the value \(1\) than the other two values \(2\) and \(3\text{,}\) the value \(1\) should get a larger “weight” in the expected value of \(X\text{.}\) Based on this, it is natural to define the expected value of \(X\) to be
\[
\mathbb{E}(X) = 1 \cdot \frac{4}{5} + 2 \cdot \frac{1}{10} + 3 \cdot \frac{1}{10} = \frac{13}{10}.
\]
Definition 3.3.1. Expected Value.
Let \((S,P)\) be a probability space and let \(X : S \rightarrow \mathbb{R}\) be a random variable. The expected value (or expectation or weighted average) of \(X\) is defined to be
\[
\mathbb{E}(X) = \sum_{s \in S} X(s) \cdot P(s).
\]
Example 3.3.2. Expected value of a coin flip.
Assume we flip a fair coin, in which case the sample space is \(S = \{H,T\}\) and \(P(H) = P(T) = 1/2\text{.}\) Define the random variable \(X\) to have value
\[
X = \begin{cases} 1 & \text{if the coin comes up heads,} \\ 0 & \text{if the coin comes up tails.} \end{cases}
\]
Thus, as a function \(X : S \rightarrow \mathbb{R}\text{,}\) we have \(X(H)=1\) and \(X(T)=0\text{.}\) The expected value \(\mathbb{E}(X)\) of \(X\) is equal to
\[
\mathbb{E}(X) = 1 \cdot P(H) + 0 \cdot P(T) = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = \frac{1}{2}.
\]
This example shows that the term “expected value” is a bit misleading: \(\mathbb{E}(X)\) is not the value that we expect to observe; in many cases, the value of \(X\) can never equal its expected value. Here, for instance, \(X\) only takes the values \(0\) and \(1\text{,}\) never \(1/2\text{.}\)
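To make Definition 3.3.1 concrete, here is a minimal Python sketch; the function name `expected_value` and the dictionary encoding of \((S,P)\) are our own illustrative choices, not notation from the text.

```python
# A finite probability space (S, P) encoded as a dict mapping each
# outcome s in S to its probability P(s); a random variable X is a
# function from outcomes to real numbers.

def expected_value(P, X):
    """Weighted average of X over the sample space: sum of X(s) * P(s)."""
    return sum(X(s) * p for s, p in P.items())

# The space from the start of this section: S = {1, 2, 3}.
P = {1: 4/5, 2: 1/10, 3: 1/10}
print(expected_value(P, lambda s: s))  # 1.3

# The coin flip of Example 3.3.2: X(H) = 1, X(T) = 0.
P_coin = {"H": 1/2, "T": 1/2}
print(expected_value(P_coin, lambda s: 1 if s == "H" else 0))  # 0.5
```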
Definition 3.3.3. Bernoulli Trial.
A Bernoulli trial is a special kind of experiment that has only two outcomes: 1 or 0. A 1 is called a “success” and a 0 is called a “failure”. The probability of success is denoted by \(p\text{,}\) and the probability of failure is therefore \(1-p\text{,}\) also written \(q\text{.}\) If \(X\) is a random variable that represents the outcome of a Bernoulli trial, then
\[
P(X = 1) = p
\]
and
\[
P(X = 0) = 1 - p = q,
\]
where \(p + q = 1\text{.}\)
In the preceding example (Example 3.3.2) we defined a random variable \(X\) where
\[
P(X = 1) = \frac{1}{2} \quad \text{and} \quad P(X = 0) = \frac{1}{2}.
\]
Each coin flip is a Bernoulli trial.
Theorem 3.3.4. Expected Successes in a Bernoulli Trial.
Let \(X\) be a random variable representing a Bernoulli trial that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\text{.}\) Then
\[
\mathbb{E}(X) = p.
\]
Proof.
By Definition 3.3.1,
\[
\mathbb{E}(X) = 1 \cdot P(X=1) + 0 \cdot P(X=0) = 1 \cdot p + 0 \cdot (1-p) = p.
\]
Example 3.3.5. Expected value of a die roll.
Assume we roll a fair die. Define the random variable \(X\) to be the value of the result. Then, \(X\) takes each of the values in \(\{1,2,3,4,5,6\}\) with equal probability \(1/6\text{,}\) and we get
\[
\mathbb{E}(X) = \frac{1}{6}\left(1+2+3+4+5+6\right) = \frac{21}{6} = \frac{7}{2}.
\]
Now define the random variable \(Y\) to be equal to one divided by the result of the die. In other words, \(Y = 1/X\text{.}\) This random variable takes each of the values in \(\{1,1/2,1/3,1/4,1/5,1/6\}\) with equal probability \(1/6\text{,}\) and we get
\[
\mathbb{E}(Y) = \frac{1}{6}\left(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6}\right) = \frac{49}{120}.
\]
Note that \(\mathbb{E}(Y) \neq 1 / \mathbb{E}(X)\text{.}\) Thus, this example shows that, in general, \(\mathbb{E}(1/X) \neq 1 / \mathbb{E}(X)\text{.}\)
Example 3.3.6. Expected value of rolling two dice.
Consider a fair red die and a fair blue die, and assume we roll them independently, just like Example 3.1.7. The sample space is
\[
S = \{ (i,j) : 1 \leq i \leq 6, 1 \leq j \leq 6 \},
\]
where \(i\) is the result of the red die and \(j\) is the result of the blue die. Each outcome \((i,j)\) in \(S\) has the same probability of \(1/36\text{.}\)
Let \(X\) be the random variable whose value is equal to the sum of the results of the two rolls. As a function \(X : S \rightarrow \mathbb{R}\text{,}\) we have \(X(i,j) = i+j\text{.}\) The matrix below gives all possible values of \(X\text{.}\) The leftmost column indicates the result of the red die, the top row indicates the result of the blue die, and each other entry is the corresponding value of \(X\text{.}\)
\[
\begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
1 & 2 & 3 & 4 & 5 & 6 & 7 \\
2 & 3 & 4 & 5 & 6 & 7 & 8 \\
3 & 4 & 5 & 6 & 7 & 8 & 9 \\
4 & 5 & 6 & 7 & 8 & 9 & 10 \\
5 & 6 & 7 & 8 & 9 & 10 & 11 \\
6 & 7 & 8 & 9 & 10 & 11 & 12
\end{array}
\]
The expected value \(\mathbb{E}(X)\) of \(X\) is equal to
\[
\mathbb{E}(X) = \sum_{(i,j) \in S} (i+j) \cdot \frac{1}{36} = \frac{252}{36} = 7.
\]
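Since the sample space is small, we can verify this value by brute force. A minimal Python sketch (our own, for illustration):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes (i, j) and average X(i, j) = i + j.
outcomes = list(product(range(1, 7), repeat=2))
E_X = sum(i + j for i, j in outcomes) / len(outcomes)
print(E_X)  # 7.0
```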
Subsubsection 3.3.1.1 Comparing the Expected Values of Comparable Random Variables
Consider a probability space \((S,P)\text{,}\) and let \(X\) and \(Y\) be two random variables on \(S\text{.}\) Recall that \(X\) and \(Y\) are functions that map elements of \(S\) to real numbers. We will write \(X \leq Y\) if for each element \(s \in S\text{,}\) we have \(X(s) \leq Y(s)\text{.}\) In other words, the value of \(X\) is at most the value of \(Y\text{,}\) no matter which outcome \(s\) is chosen.
Theorem 3.3.7. Comparison of Expectations.
Let \((S,P)\) be a probability space and let \(X\) and \(Y\) be two random variables on \(S\text{.}\) If \(X \leq Y\text{,}\) then \(\mathbb{E}(X) \leq \mathbb{E}(Y)\text{.}\)
Proof.
Using Definition 3.3.1 and the assumption that \(X \leq Y\text{,}\) we obtain
\[
\mathbb{E}(X) = \sum_{s \in S} X(s) \cdot P(s) \leq \sum_{s \in S} Y(s) \cdot P(s) = \mathbb{E}(Y).
\]
Subsubsection 3.3.1.2 An Alternative Expression for the Expected Value
In Example 3.3.6, we used Definition 3.3.1 to compute the expected value \(\mathbb{E}(X)\) of the random variable \(X\) that was defined to be the sum of the results when rolling two fair and independent dice. This was a painful way to compute \(\mathbb{E}(X)\text{,}\) because we added all \(36\) entries in the matrix. There is a slightly easier way to determine \(\mathbb{E}(X)\text{:}\) By looking at the matrix, we see that the value \(4\) occurs three times. Thus, the event “\(X=4\)” has size \(3\text{,}\) i.e., if we consider the subset of the sample space \(S\) that corresponds to this event, then this subset has size \(3\text{.}\) Similarly, the event “\(X=7\)” has size \(6\text{,}\) because the value \(7\) occurs \(6\) times in the matrix. The table below lists the sizes of all non-empty events, together with their probabilities.
\[
\begin{array}{c|c|c}
k & \text{size of the event } X=k & P(X=k) \\ \hline
2 & 1 & 1/36 \\
3 & 2 & 2/36 \\
4 & 3 & 3/36 \\
5 & 4 & 4/36 \\
6 & 5 & 5/36 \\
7 & 6 & 6/36 \\
8 & 5 & 5/36 \\
9 & 4 & 4/36 \\
10 & 3 & 3/36 \\
11 & 2 & 2/36 \\
12 & 1 & 1/36
\end{array}
\]
Based on this, we get
\[
\mathbb{E}(X) = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + 5 \cdot \frac{4}{36} + 6 \cdot \frac{5}{36} + 7 \cdot \frac{6}{36} + 8 \cdot \frac{5}{36} + 9 \cdot \frac{4}{36} + 10 \cdot \frac{3}{36} + 11 \cdot \frac{2}{36} + 12 \cdot \frac{1}{36} = \frac{252}{36} = 7.
\]
Even though this is still quite painful, less computation is needed. What we have done is the following: In the definition of \(\mathbb{E}(X)\text{,}\) i.e.,
\[
\mathbb{E}(X) = \sum_{(i,j) \in S} X(i,j) \cdot P(i,j),
\]
we rearranged the terms in the summation. That is, instead of taking the sum over all elements \((i,j)\) in \(S\text{,}\) we
grouped together all outcomes \((i,j)\) for which \(X(i,j) = i+j\) has the same value, say, \(k\text{,}\)
multiplied this common value \(k\) by the probability that \(X\) is equal to \(k\text{,}\)
and took the sum of the resulting products over all possible values of \(k\text{.}\)
This resulted in
\[
\mathbb{E}(X) = \sum_{k=2}^{12} k \cdot P(X=k).
\]
The following theorem states that this can be done for any random variable.
Theorem 3.3.8. Expected Value Alternative Equation.
Let \((S,P)\) be a probability space and let \(X : S \rightarrow \mathbb{R}\) be a random variable. The expected value of \(X\) is equal to
\[
\mathbb{E}(X) = \sum_{x} x \cdot P(X=x),
\]
where the summation ranges over all distinct values \(x\) that \(X\) can take.
Proof.
Recall that the event “\(X=x\)” corresponds to the subset
\[
\{ s \in S : X(s) = x \}
\]
of the sample space \(S\text{.}\) We have
\[
\sum_{x} x \cdot P(X=x) = \sum_{x} x \sum_{s : X(s)=x} P(s) = \sum_{x} \sum_{s : X(s)=x} X(s) \cdot P(s) = \sum_{s \in S} X(s) \cdot P(s) = \mathbb{E}(X).
\]
When determining the expected value of a random variable \(X\text{,}\) it is usually easier to use Theorem 3.3.8 than Definition 3.3.1. To use Theorem 3.3.8, you have to do the following (a code sketch after this list illustrates the steps):
Determine all values \(x\) that \(X\) can take, i.e., determine the range of the function \(X\text{.}\)
For each such value \(x\text{,}\) determine \(P(X=x)\text{.}\)
Compute the sum of all products \(x \cdot P(X=x)\text{.}\)
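The following short Python sketch carries out these three steps for the two-dice example above; the names `P_X` and `E_X` are our own.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Steps 1 and 2: determine P(X = x) for every value x in the range of X,
# by grouping the 36 outcomes of the two-dice experiment by their sum.
P_X = defaultdict(Fraction)
for i, j in product(range(1, 7), repeat=2):
    P_X[i + j] += Fraction(1, 36)

# Step 3: sum the products x * P(X = x).
E_X = sum(x * p for x, p in P_X.items())
print(P_X[7], E_X)  # 1/6 7
```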
Subsection 3.3.2 Linearity of Expectation
We now come to one of the most useful tools for determining expected values:
Theorem 3.3.9. Linearity of Expectation.
Let \((S,P)\) be a probability space. For any two random variables \(X\) and \(Y\) on \(S\text{,}\) and for any two real numbers \(a\) and \(b\text{,}\)
\[
\mathbb{E}(aX + bY) = a \cdot \mathbb{E}(X) + b \cdot \mathbb{E}(Y).
\]
Proof.
Recall that both \(X\) and \(Y\) are functions from \(S\) to \(\mathbb{R}\text{.}\) Define the random variable \(Z\) to be \(Z=aX+bY\text{.}\) That is, as a function \(Z : S \rightarrow \mathbb{R}\text{,}\) \(Z\) is defined by
\[
Z(s) = a \cdot X(s) + b \cdot Y(s)
\]
for all \(s\) in \(S\text{.}\) Using Definition 3.3.1, we get
\[
\begin{aligned}
\mathbb{E}(Z) &= \sum_{s \in S} Z(s) \cdot P(s) \\
&= \sum_{s \in S} \left( a \cdot X(s) + b \cdot Y(s) \right) \cdot P(s) \\
&= a \sum_{s \in S} X(s) \cdot P(s) + b \sum_{s \in S} Y(s) \cdot P(s) \\
&= a \cdot \mathbb{E}(X) + b \cdot \mathbb{E}(Y).
\end{aligned}
\]
Let us return to the example in which we roll two fair and independent dice, one being red and the other being blue. Define the random variable \(X\) to be the sum of the results of the two rolls. We have seen two ways to compute the expected value \(\mathbb{E}(X)\) of \(X\text{.}\) We now present a third way, which is the easiest one: We define two random variables
\[
X_1 = \text{the result of the red die}
\]
and
\[
X_2 = \text{the result of the blue die.}
\]
We have already seen that
\[
\mathbb{E}(X_1) = \frac{7}{2}.
\]
By the same computation, we have
\[
\mathbb{E}(X_2) = \frac{7}{2}.
\]
Observe that
\[
X = X_1 + X_2.
\]
Then, by the Linearity of Expectation (i.e., Theorem 3.3.9), we have
\[
\mathbb{E}(X) = \mathbb{E}(X_1 + X_2) = \mathbb{E}(X_1) + \mathbb{E}(X_2) = \frac{7}{2} + \frac{7}{2} = 7.
\]
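A short exhaustive check in Python confirms this instance of linearity (the variable names are ours):

```python
from itertools import product

# Exhaustively verify E(X1 + X2) = E(X1) + E(X2) for the two-dice example.
outcomes = list(product(range(1, 7), repeat=2))
E_X1 = sum(i for i, j in outcomes) / 36       # 3.5
E_X2 = sum(j for i, j in outcomes) / 36       # 3.5
E_sum = sum(i + j for i, j in outcomes) / 36  # 7.0
print(E_sum == E_X1 + E_X2)  # True
```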
We have stated the Linearity of Expectation for two random variables. The proof of Theorem 3.3.9 can easily be generalized to any finite sequence of random variables:
Theorem 3.3.10. Generalized Linearity of Expectation.
Let \((S,P)\) be a probability space, let \(n \geq 2\) be an integer, let \(X_1,X_2,\ldots,X_n\) be a sequence of random variables on \(S\text{,}\) and let \(a_1,a_2,\ldots,a_n\) be a sequence of real numbers. Then,
\[
\mathbb{E}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i \cdot \mathbb{E}(X_i).
\]
Subsection 3.3.3 The Geometric Distribution
Say we are performing repeated independent Bernoulli trials such that each one is successful with probability \(p\) and fails with probability \(1-p\text{.}\) What is the expected number of times that we must perform the trial until we see a success?
We model this problem in the following way: Assume we have a coin that comes up heads with probability \(p\) and, thus, comes up tails with probability \(1-p\text{.}\) We flip this coin repeatedly and independently until it comes up heads for the first time. Define the random variable \(X\) to be the number of times that we flip the coin; this includes the last coin flip, which resulted in heads. We want to determine the expected value \(\mathbb{E}(X)\) of \(X\text{.}\)
The sample space is given by
\[
S = \{ T^{k-1} H : k \geq 1 \},
\]
where \(T^{k-1} H\) denotes the sequence consisting of \(k-1\) tails followed by one heads. Since the coin flips are independent, the outcome \(T^{k-1} H\) has a probability of \((1-p)^{k-1} p = p (1-p)^{k-1}\text{,}\) i.e.,
\[
P\left( T^{k-1} H \right) = p (1-p)^{k-1}.
\]
For any integer \(k \geq 1\text{,}\) \(X=k\) if and only if the coin flips give the sequence \(T^{k-1} H\text{.}\) It follows that
\[
P(X = k) = p (1-p)^{k-1}.
\]
Definition 3.3.11. Geometric Distribution.
Let \(p\) be a real number with \(0 <p <1\text{.}\) A random variable \(X\) has a geometric distribution with parameter \(p\text{,}\) if its distribution function satisfies
for any integer \(k \geq 1\text{.}\)
Theorem 3.3.12. Expectation of a Geometric Distribution.
Let \(p\) be a real number with \(0 < p < 1\) and let \(X\) be a random variable that has a geometric distribution with parameter \(p\text{.}\) Then
\[
\mathbb{E}(X) = \frac{1}{p}.
\]
Proof.
Informally, this makes sense: if each trial succeeds with probability \(p\text{,}\) then on average we expect to see one success in every \(1/p\) trials (for example, if \(p = 1/n\text{,}\) then we expect to perform \(1/p = n\) trials before the first success).
A formal proof requires calculus, so it is not given here.
For example, if we flip a fair coin (in which case \(p=1/2\)) repeatedly and independently until it comes up heads for the first time, then the expected number of coin flips is equal to \(2\text{.}\)
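A quick Monte Carlo sketch in Python makes this plausible; the helper `flips_until_heads` is our own illustrative name, and since the output is random, the printed average will only be close to \(2\text{.}\)

```python
import random

def flips_until_heads(p):
    """Number of independent Bernoulli trials performed until the first success."""
    count = 1
    while random.random() >= p:  # with probability 1 - p, the flip is "tails"
        count += 1
    return count

p = 1 / 2
trials = 100_000
average = sum(flips_until_heads(p) for _ in range(trials)) / trials
print(average)  # close to 1/p = 2
```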
Subsection 3.3.4 The Binomial Distribution
Say, as in Subsection 3.3.3, we are performing repeated independent Bernoulli trials such that each one is successful with probability \(p\) and fails with probability \(1-p\text{.}\) But now we repeat the experiment a fixed number of times, say \(n\) times, where \(n \geq 1\) is an integer. What number of successes can we expect to see in those \(n\) trials?
We again model this problem using a coin that comes up heads with probability \(p\) and, thus, comes up tails with probability \(1-p\text{.}\) We flip the coin, independently, \(n\) times and define the random variable \(X\) to be the number of times the coin comes up heads. We want to determine the expected value \(\mathbb{E}(X)\text{.}\)
Let \(n \geq 1\) and \(k\) be integers with \(0 \leq k \leq n\text{.}\) Then, \(X=k\) if and only if there are exactly \(k\) \(H\)'s in the sequence of \(n\) coin flips. The number of such sequences is equal to \(\binom{n}{k}\text{,}\) and each one of them has probability \(p^k (1-p)^{n-k}\text{.}\) Thus,
\[
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.
\]
Definition 3.3.13. Binomial Distribution.
Let \(n \geq 1\) be an integer and let \(p\) be a real number with \(0 < p < 1\text{.}\) A random variable \(X\) has a binomial distribution with parameters \(n\) and \(p\text{,}\) if its distribution function satisfies
\[
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
\]
for any integer \(k\) with \(0 \leq k \leq n\text{.}\)
Theorem 3.3.14. Expectation of a Binomial Distribution.
Let \(n \geq 1\) be an integer, let \(p\) be a real number with \(0< p < 1\text{,}\) and let \(X\) be a random variable that has a binomial distribution with parameters \(n\) and \(p\text{.}\) Then
\[
\mathbb{E}(X) = np.
\]
Proof.
We define a sequence \(X_1,X_2,\ldots,X_n\) of random variables, each representing a Bernoulli trial that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\text{.}\) Observe that
\[
X = \sum_{i=1}^{n} X_i,
\]
because
\(X\) counts the number of heads in the sequence of \(n\) coin flips, and
the summation on the right-hand side is equal to the number of \(1\)'s in the sequence \(X_1,X_2,\ldots,X_n\text{,}\) which, by definition, is equal to the number of successes in the sequence of \(n\) Bernoulli trials.
Using the Linearity of Expectation (Theorem 3.3.10), we have
\[
\mathbb{E}(X) = \mathbb{E}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathbb{E}(X_i).
\]
Thus, we have to determine the expected value for each \(X_i\text{.}\) Since each \(X_i\) is a Bernoulli trial, by Theorem 3.3.4,
\[
\mathbb{E}(X_i) = p.
\]
We conclude that
\[
\mathbb{E}(X) = \sum_{i=1}^{n} \mathbb{E}(X_i) = \sum_{i=1}^{n} p = np.
\]
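As a check, the following Python sketch (ours, for illustration) computes \(\mathbb{E}(X)\) directly from the binomial distribution function and compares it with the closed form \(np\text{,}\) using exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

n, p = 27, Fraction(1, 2)
# E(X) computed term by term from P(X = k) = C(n, k) * p^k * (1-p)^(n-k)...
E_direct = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
# ...agrees exactly with the closed form n * p from Theorem 3.3.14.
print(E_direct, n * p)  # 27/2 27/2
```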
Subsection 3.3.5 Variance
The usefulness of the expected value as a prediction for the outcome of an experiment is increased when the outcome is not likely to deviate too much from the expected value. In this section we shall introduce a measure of this deviation, called the variance.
First, we must define what we mean by deviation.
Definition 3.3.15. Deviation of a Random Variable.
Let \(X\) be a random variable with expected value \(\mathbb{E}(X)\text{.}\) Then the deviation of \(X\) at \(s \in S\) is
\[
X(s) - \mathbb{E}(X).
\]
The deviation can be thought of as the measurement of how far \(X(s)\) is from the expected value of \(X\text{.}\)
The variance is the weighted average (or expectation) of the square of the deviation. This can be seen as answering the question “how much on average does the value of \(X\) vary from its expected value?”
Definition 3.3.16. Variance.
Let \(X\) be a random variable with expected value \(\mathbb{E}(X)\text{.}\) Then the variance of \(X\text{,}\) denoted by \(V(X)\) or \(\sigma^2\text{,}\) is
\[
V(X) = \sum_{s \in S} \left( X(s) - \mathbb{E}(X) \right)^2 \cdot P(s).
\]
Note that because of the squaring, the variance is not in the same units as \(X(s)\) and \(\mathbb{E}(X)\text{.}\) A low variance indicates that the values of \(X\) tend to be close to the expected value, while a large variance indicates that \(X\)'s outcomes are spread out over a wider range.
Definition 3.3.17. Standard Deviation of a Random Variable.
Let \(X\) be a random variable with variance \(V(X)\text{.}\) Then the standard deviation of \(X\) is
\[
\sigma(X) = \sqrt{V(X)}.
\]
Like the variance, a low standard deviation indicates that the outcomes of an experiment, or values of \(X\text{,}\) tend to be close to the expected value, while a high standard deviation indicates that the outcomes are spread out over a wider range of values. The standard deviation is often more useful than the variance because it is in the same units as \(X\) and \(\mathbb{E}(X)\text{.}\)
Theorem 3.3.18. Variance as Expectation of Deviation.
If \(X\) is a numerically valued random variable with expected value \(\mathbb{E}(X) = \mu\text{,}\) we can rewrite the formula above as an expectation of the deviations:
\[
V(X) = \mathbb{E}\left( (X - \mu)^2 \right).
\]
Theorem 3.3.19. Variance Using Squared Expectations.
Applying the definition of expected value (Definition 3.3.1) to the formula for variance (Definition 3.3.16), we obtain a third form:
\[
V(X) = \mathbb{E}(X^2) - (\mathbb{E}(X))^2 = \mathbb{E}(X^2) - \mu^2.
\]
Example 3.3.20. Variance of a Die Roll.
Continuing our scenario from Example 3.3.5, assume we roll a fair die. Define the random variable \(X\) to be the value of the result; \(X\) takes each of the values in \(S = \{1,2,3,4,5,6\}\) with equal probability \(1/6\text{,}\) and we have calculated \(\mathbb{E}(X) = \frac{7}{2}\text{.}\) To use the variance formula in Definition 3.3.16, we calculate the squared difference between \(X(s)\) and \(\mathbb{E}(X)\text{,}\) shown in the table below:
\[
\begin{array}{c|cccccc}
s & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\left( X(s) - \frac{7}{2} \right)^2 & \frac{25}{4} & \frac{9}{4} & \frac{1}{4} & \frac{1}{4} & \frac{9}{4} & \frac{25}{4}
\end{array}
\]
From the table we can calculate
\[
V(X) = \frac{1}{6}\left( \frac{25}{4} + \frac{9}{4} + \frac{1}{4} + \frac{1}{4} + \frac{9}{4} + \frac{25}{4} \right) = \frac{70}{24} = \frac{35}{12}.
\]
Example 3.3.21. Variance of a Die Roll using \(\mu^2\).
We can calculate the same variance of a fair die using Theorem 3.3.19. First we calculate
\[
\mu^2 = \left( \mathbb{E}(X) \right)^2 = \left( \frac{7}{2} \right)^2 = \frac{49}{4}.
\]
Then we must calculate the expectation of the squares of \(X\text{:}\)
\[
\mathbb{E}(X^2) = \frac{1}{6}\left( 1 + 4 + 9 + 16 + 25 + 36 \right) = \frac{91}{6}.
\]
Finally:
\[
V(X) = \mathbb{E}(X^2) - \mu^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12}.
\]
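The following Python sketch (ours) computes the variance of the die roll both ways, via Definition 3.3.16 and via Theorem 3.3.19, using exact rational arithmetic:

```python
from fractions import Fraction

S = range(1, 7)
P = Fraction(1, 6)                 # each face is equally likely
mu = sum(s * P for s in S)         # E(X) = 7/2

# Definition 3.3.16: weighted average of the squared deviation.
V_def = sum((s - mu)**2 * P for s in S)

# Theorem 3.3.19: E(X^2) - mu^2.
V_alt = sum(s**2 * P for s in S) - mu**2

print(V_def, V_alt)  # 35/12 35/12
```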
Theorem 3.3.22. Variance of Successes in a Bernoulli Trial.
Let \(X\) be a random variable representing a Bernoulli trial that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p = q\text{.}\) The variance of \(X\) is
\[
V(X) = pq = p(1-p).
\]
Proof.
If \(X\) is a random variable representing a Bernoulli trial, then we know from Theorem 3.3.4 that \(\mathbb{E}(X) = p\text{.}\) By Definition 3.3.1,
\[
\mathbb{E}(X^2) = 1^2 \cdot p + 0^2 \cdot (1-p) = p.
\]
It follows using Theorem 3.3.19 that
\[
V(X) = \mathbb{E}(X^2) - (\mathbb{E}(X))^2 = p - p^2 = p(1-p) = pq.
\]
Theorem 3.3.23. Bienaymé's Formula.
Let \(X_1, X_2, \dots, X_n\) be \(n\) independent random variables on a sample space \(S\text{.}\) The variance of the sum \(X_1 + X_2 + \dots + X_n\) is the sum of the variances:
\[
V(X_1 + X_2 + \dots + X_n) = V(X_1) + V(X_2) + \dots + V(X_n).
\]
Theorem 3.3.24. Variance of a Geometric Distribution.
Let \(p\) be a real number with \(0 < p < 1\) and let \(X\) be a random variable that has a geometric distribution with parameter \(p\text{.}\) The variance of \(X\) is
\[
V(X) = \frac{1-p}{p^2}.
\]
Proof.
The proof requires calculus, so it is not given here.
Theorem 3.3.25. Variance of a Binomial Distribution.
Let \(X\) be a random variable that has a binomial distribution with parameters \(n\) and \(p\text{.}\) Then the variance of \(X\) is
\[
V(X) = np(1-p).
\]
Proof.
As in the proof of Theorem 3.3.14, we define a sequence of random variables \(X_1,X_2,\ldots,X_n\text{,}\) each representing a Bernoulli trial that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\text{.}\) These random variables are independent, and we know from Theorem 3.3.22 that
\[
V(X_i) = p(1-p)
\]
for each \(i\text{.}\)
Therefore, using Bienaymé's Formula (Theorem 3.3.23), the variance for the whole distribution is
\[
V(X) = V\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} V(X_i) = np(1-p).
\]
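As a numerical check of this closed form, here is a short Python sketch (ours) that evaluates the variance directly from the distribution function for \(n = 3\) and \(p = 1/2\) and compares it with \(np(1-p)\text{:}\)

```python
from fractions import Fraction
from math import comb

n, p = 3, Fraction(1, 2)
# Binomial distribution function: P(X = k) for k = 0, 1, ..., n.
P_X = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
E_X = sum(k * pk for k, pk in P_X.items())
# Variance via Theorem 3.3.19: E(X^2) - (E(X))^2.
V_X = sum(k**2 * pk for k, pk in P_X.items()) - E_X**2
print(V_X, n * p * (1 - p))  # 3/4 3/4
```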
Exercises 3.3.6 Exercises for Section 3.3
1.
A number is chosen at random from the set \(S = \{-1, 0, 1\}\text{.}\) Let \(X\) be the number chosen. Find the expected value, variance, and standard deviation of \(X\text{.}\)
Because the numbers are chosen uniformly at random:
\[
P(X=-1) = P(X=0) = P(X=1) = \frac{1}{3}.
\]
The expected value of \(X\) is then
\[
\mathbb{E}(X) = \frac{1}{3}(-1) + \frac{1}{3}(0) + \frac{1}{3}(1) = 0.
\]
Using \(V(X) =\mathbb{E}(X^2) - (\mathbb{E}(X))^2\text{:}\)
\[
\mathbb{E}(X^2) = \frac{1}{3}(1) + \frac{1}{3}(0) + \frac{1}{3}(1) = \frac{2}{3}, \qquad V(X) = \frac{2}{3} - 0^2 = \frac{2}{3}.
\]
The standard deviation is then:
\[
\sigma(X) = \sqrt{\frac{2}{3}} \approx 0.816.
\]
2.
A random variable \(X\) has the distribution
Find the expected value, variance, and standard deviation of \(X\text{.}\)
3.
A coin is tossed three times. Let \(X\) be the number of heads that turn up. Find \(V(X)\) and \(\sigma(X)\) (the standard deviation of \(X\)).
This is a straightforward application of the variance of a binomial distribution: \(X\) is a random variable with a binomial distribution with parameters \(n = 3\) and \(p = 1/2\text{.}\) By Theorem 3.3.25,
\[
V(X) = np(1-p) = 3 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{3}{4}, \qquad \sigma(X) = \sqrt{\frac{3}{4}} = \frac{\sqrt{3}}{2} \approx 0.866.
\]
4.
A random sample of 2400 people are asked if they favor a government proposal to develop new nuclear power plants. If 40 percent of the people in the country are in favor of this proposal, find the expected value and the standard deviation for the number of people in the sample who favored the proposal.
5.
In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, …, 36. The 0 and 00 slots are green and half of the remaining 36 slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. If you bet 1 dollar on red, you win 1 dollar if the ball stops in a red slot and otherwise you lose 1 dollar.
You place a 1-dollar bet on black. Let \(X\) be your winnings. Define \(S\) and calculate \(P(X = x)\) for each value \(x\text{,}\) as well as \(\mathbb{E}(X)\) and \(V(X)\text{.}\)
Here the set of outcomes \(S\) is the color of the slots we care about: {black, not black}. Let \(X\) be a random variable that represents your winnings; it takes the value 1 if a spin results in a black slot, and the value \(-1\) otherwise. The probability of winning on a spin is \(P(X = 1)\text{,}\) the probability the ball lands on a black slot: \(\frac{18}{38}\text{.}\) The probability of losing on a spin is \(P(X = -1)\text{,}\) the probability the ball lands on a green or red slot: \(\frac{20}{38}\text{.}\) Therefore:
\[
\mathbb{E}(X) = 1 \cdot \frac{18}{38} + (-1) \cdot \frac{20}{38} = -\frac{2}{38} = -\frac{1}{19} \approx -0.0526.
\]
To calculate the variance we first calculate \(\mathbb{E}(X^2)\text{:}\)
\[
\mathbb{E}(X^2) = 1^2 \cdot \frac{18}{38} + (-1)^2 \cdot \frac{20}{38} = \frac{38}{38} = 1.
\]
Using \(V(X) =\mathbb{E}(X^2) - (\mathbb{E}(X))^2\text{:}\)
\[
V(X) = 1 - \left( -\frac{1}{19} \right)^2 = 1 - \frac{1}{361} = \frac{360}{361} \approx 0.997.
\]
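A short Python sketch (ours) reproduces both values with exact arithmetic:

```python
from fractions import Fraction

# Winnings on a 1-dollar bet on black: +1 with probability 18/38, -1 otherwise.
P = {1: Fraction(18, 38), -1: Fraction(20, 38)}
E_X = sum(x * p for x, p in P.items())
V_X = sum(x**2 * p for x, p in P.items()) - E_X**2
print(E_X, V_X)  # -1/19 360/361
```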
6.
Another form of bet for roulette is to bet that a specific number (say 17) will turn up. If the ball stops on your number, you get your dollar back plus 35 dollars. If not, you lose your dollar.
You place a 1-dollar bet on the number 17. Let \(Y\) be your winnings. Define \(S\) and calculate \(P(Y = y)\) for each value \(y\text{,}\) as well as \(\mathbb{E}(Y)\) and \(V(Y)\text{.}\) Compare your answers with those from Exercise 5: \(\mathbb{E}(X)\) versus \(\mathbb{E}(Y)\text{,}\) and \(V(X)\) versus \(V(Y)\text{.}\) What do these computations tell you about the nature of your winnings if you make a sequence of bets, betting each time on a number versus betting each time on a color?
7.
We flip a fair coin 27 times (independently). For each heads, you win 3 dollars, whereas for each tails, you lose 2 dollars. Define the random variable \(Y\) to be the amount of money that you win. Compute the expected value \(\mathbb{E}(Y)\text{.}\)
For a single flip, the expected winnings in dollars are:
\[
3 \cdot \frac{1}{2} + (-2) \cdot \frac{1}{2} = \frac{1}{2}.
\]
Therefore the expected amount of winnings is \(27 \cdot \frac{1}{2} = 13.5\) dollars.
Alternatively, we can think of this as a binomial distribution with \(p = 1/2\) and \(n = 27\text{.}\) Let \(X\) be a random variable that takes the value 1 for each head and 0 for each tail. By Theorem 3.3.14,
\[
\mathbb{E}(X) = np = 27 \cdot \frac{1}{2} = 13.5.
\]
Here \(13.5\) is the expected number of heads (wins), so the expected winnings are \(13.5 \cdot 3 = 40.5\) dollars. We must also account for the expected losses, \(13.5 \cdot 2 = 27\) dollars. So the overall expected winnings are \(40.5 - 27 = 13.5\) dollars, matching the first computation.
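A two-line Python check (ours) of the same arithmetic, using exact rationals:

```python
from fractions import Fraction

# Expected winnings per flip: +3 dollars for heads, -2 dollars for tails.
per_flip = Fraction(1, 2) * 3 + Fraction(1, 2) * (-2)
print(per_flip, 27 * per_flip)  # 1/2 27/2  (i.e., 13.5 dollars over 27 flips)
```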
8.
Assume we flip a fair coin twice, independently of each other. Define the following random variables:
Determine the expected values of these three random variables.
Are \(X\) and \(Y\) independent random variables?
Are \(X\) and \(Z\) independent random variables?
Are \(Y\) and \(Z\) independent random variables?