Section 3.2 Conditional Probability, Independence, and Bayes' Rule
Subsection 3.2.1 Conditional Probability
Anil Maheshwari has two children. We are told that one of them is a boy. What is the probability that the other child is also a boy? Most people will say that this probability is \(1/2\text{.}\) We will show below that this is not the correct answer.
Since Anil has two children, the sample space is
\begin{equation*} S = \{ (b,b) , (b,g) , (g,b) , (g,g) \} , \end{equation*}
where, for example, \((b,g)\) indicates that the youngest child is a boy and the oldest child is a girl. We assume a uniform probability function, so that each outcome has a probability of \(1/4\text{.}\)
We are given the additional information that one of the two children is a boy, or, to be more precise, that at least one of the two children is a boy. This means that the actual sample space is not \(S\text{,}\) but
\begin{equation*} \{ (b,b) , (b,g) , (g,b) \} . \end{equation*}
When asking for the probability that the other child is also a boy, we are really asking for the probability that both children are boys. Since there is only one possibility (out of three) for both children to be boys, it follows that this probability is equal to \(1/3\text{.}\)
This is an example of a conditional probability: We are asking for the probability of an event (both children are boys) given that another event (at least one of the two children is a boy) occurs.
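The two-children argument can be checked by brute-force enumeration. The following Python sketch (not part of the text; the names are illustrative) computes the conditional probability directly by counting outcomes in the uniform sample space:

```python
from fractions import Fraction

# Sample space for two children: (youngest, oldest), each "b" or "g".
S = [(c1, c2) for c1 in "bg" for c2 in "bg"]

# Event A: both children are boys; event B: at least one child is a boy.
A = {o for o in S if o == ("b", "b")}
B = {o for o in S if "b" in o}

# Under the uniform distribution, P(A | B) = |A ∩ B| / |B|.
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)  # 1/3
```

Note that conditioning on “at least one boy” leaves three equally likely outcomes, not two, which is why the answer is \(1/3\) rather than \(1/2\text{.}\)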
Definition 3.2.1. Conditional Probability.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events with \(P(B) > 0\text{.}\) The conditional probability \(P(A \mid B)\text{,}\) pronounced as “the probability of \(A\) given \(B\)”, is defined to be
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} . \end{equation*}
Example 3.2.2. Probability of a second boy.
Returning to Anil's two children, we saw that the sample space is
\begin{equation*} S = \{ (b,b) , (b,g) , (g,b) , (g,g) \} \end{equation*}
and we assumed a uniform probability function. The events we considered are
\begin{equation*} A = \text{“both children are boys”} \end{equation*}
and
\begin{equation*} B = \text{“at least one of the two children is a boy”,} \end{equation*}
and we wanted to know \(P(A \mid B)\text{.}\) Writing \(A\) and \(B\) as subsets of the sample space \(S\text{,}\) we get
\begin{equation*} A = \{ (b,b) \} \end{equation*}
and
\begin{equation*} B = \{ (b,b) , (b,g) , (g,b) \} . \end{equation*}
Using Definition 3.2.1, it follows that
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = 1/3 , \end{equation*}
which is the same answer as we got before.
Example 3.2.3. Probability of rolling a three given an odd number is rolled.
Assume we roll a fair die, i.e., we choose an element uniformly at random from the sample space
\begin{equation*} S = \{ 1,2,3,4,5,6 \} . \end{equation*}
Consider the events
\begin{equation*} A = \text{“the result of the roll is 3”} \end{equation*}
and
\begin{equation*} B = \text{“the result of the roll is odd”.} \end{equation*}
What is the conditional probability \(P(A \mid B)\text{?}\) To determine this probability, we assume that event \(B\) occurs, i.e., the roll of the die resulted in one of \(1\text{,}\) \(3\text{,}\) and \(5\text{.}\) Given that event \(B\) occurs, event \(A\) occurs in one out of these three possibilities. Thus, \(P(A \mid B)\) should be equal to \(1/3\text{.}\) We are going to verify that this is indeed the answer we get when using Definition 3.2.1: Since
\begin{equation*} P(A \cap B) = P(\{3\}) = 1/6 \end{equation*}
and
\begin{equation*} P(B) = P(\{1,3,5\}) = 1/2 , \end{equation*}
we have
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = 1/3 . \end{equation*}
Let us now consider the conditional probability \(P(B \mid A)\text{.}\) Thus, we are given that event \(A\) occurs, i.e., the roll of the die resulted in \(3\text{.}\) Since \(3\) is an odd integer, event \(B\) is guaranteed to occur. Therefore, \(P(B \mid A)\) should be equal to \(1\text{.}\) Again, we are going to verify that this is indeed the answer we get when using Definition 3.2.1:
\begin{equation*} P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{1/6}{1/6} = 1 . \end{equation*}
This shows that, in general, \(P(A \mid B)\) is not equal to \(P(B \mid A)\text{.}\) Observe that this is not surprising. (Do you see why?)
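The asymmetry between \(P(A \mid B)\) and \(P(B \mid A)\) can also be seen by enumeration. A short Python sketch (illustrative, not from the text) for the fair-die example:

```python
from fractions import Fraction

S = set(range(1, 7))                 # fair die: uniform on {1,...,6}
A = {o for o in S if o == 3}         # the result of the roll is 3
B = {o for o in S if o % 2 == 1}     # the result of the roll is odd

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

print(cond(A, B))  # 1/3
print(cond(B, A))  # 1
```

The two conditional probabilities differ because they divide the same intersection by different events.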
Theorem 3.2.4. Sum of the conditional probabilities of complements.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events with \(P(B)>0\text{.}\) Then
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = 1 . \end{equation*}
Proof.
By definition, we have
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = \frac{P(A \cap B)}{P(B)} + \frac{P\left( \overline{A} \cap B \right)}{P(B)} = \frac{P(A \cap B) + P\left( \overline{A} \cap B \right)}{P(B)} . \end{equation*}
Since the events \(A \cap B\) and \(\overline{A} \cap B\) are disjoint, we have, by Theorem 3.1.10,
\begin{equation*} P(A \cap B) + P\left( \overline{A} \cap B \right) = P\left( (A \cap B) \cup \left( \overline{A} \cap B \right) \right) . \end{equation*}
By drawing a Venn diagram, you will see that
\begin{equation*} (A \cap B) \cup \left( \overline{A} \cap B \right) = B , \end{equation*}
implying that
\begin{equation*} P(A \cap B) + P\left( \overline{A} \cap B \right) = P(B) . \end{equation*}
We conclude that
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = \frac{P(B)}{P(B)} = 1 . \end{equation*}
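Theorem 3.2.4 can be verified exhaustively on a small sample space. The sketch below (illustrative; the fair-die space and the event \(B\) of odd rolls are assumed) checks the identity for every possible event \(A\text{:}\)

```python
from fractions import Fraction
from itertools import combinations

S = tuple(range(1, 7))          # fair die, uniform distribution
B = frozenset({1, 3, 5})        # conditioning event with P(B) > 0

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

# Check P(A | B) + P(complement of A | B) = 1 for every event A.
for k in range(len(S) + 1):
    for A in combinations(S, k):
        A = frozenset(A)
        assert cond(A, B) + cond(frozenset(S) - A, B) == 1
print("verified for all", 2 ** len(S), "events A")
```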
Theorem 3.2.5. The Law of Total Probability.
Let \((S,P)\) be a probability space and let \(A\) be an event. Assume that \(B_1,B_2,\ldots,B_n\) is a sequence of events such that
- \(P \left( B_i \right) > 0\) for all \(i\) with \(1 \leq i \leq n\text{,}\)
- the events \(B_1,B_2,\ldots,B_n\) are pairwise disjoint, and
- \(\bigcup_{i=1}^n B_i = S\text{.}\)
Then
\begin{equation*} P(A) = \sum_{i=1}^n P(A \mid B_i) \cdot P\left( B_i \right) . \end{equation*}
Proof.
The assumptions imply that
\begin{equation*} A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n) . \end{equation*}
Since the events \(A \cap B_1 , A \cap B_2 , \ldots , A \cap B_n\) are pairwise disjoint, it follows from the sum rule for pairwise disjoint events that
\begin{equation*} P(A) = \sum_{i=1}^n P\left( A \cap B_i \right) . \end{equation*}
The theorem follows by observing that, from Definition 3.2.1,
\begin{equation*} P\left( A \cap B_i \right) = P(A \mid B_i) \cdot P\left( B_i \right) . \end{equation*}
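The Law of Total Probability can be checked numerically. The sketch below (illustrative; the two-dice sample space and the event “the sum is 7” are assumptions, chosen to partition by the value of the first die):

```python
from fractions import Fraction

# Two dice: uniform on the 36 ordered pairs.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

A = {o for o in S if o[0] + o[1] == 7}            # the sum is 7
# Partition the sample space by the value of the first die: B_1, ..., B_6.
Bs = [{o for o in S if o[0] == i} for i in range(1, 7)]

total = sum(cond(A, B) * P(B) for B in Bs)
assert total == P(A) == Fraction(1, 6)
```

Each term \(P(A \mid B_i) \cdot P(B_i)\) accounts for the part of \(A\) inside \(B_i\text{,}\) and the partition guarantees that nothing is counted twice or missed.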
Subsection 3.2.2 Independent Events
Consider two events \(A\) and \(B\) in a sample space \(S\text{.}\) In this section, we will define the notion of these two events being “independent”. Intuitively, this should express that (i) the probability that event \(A\) occurs does not depend on whether or not event \(B\) occurs, and (ii) the probability that event \(B\) occurs does not depend on whether or not event \(A\) occurs. Thus, if we assume that \(P(A)>0\) and \(P(B)>0\text{,}\) then (i) \(P(A)\) should be equal to the conditional probability \(P(A \mid B)\text{,}\) and (ii) \(P(B)\) should be equal to the conditional probability \(P(B \mid A)\text{.}\) As we will show below, the following definition exactly captures this.
Definition 3.2.6. Independent Events.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events. We say that \(A\) and \(B\) are independent if
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) . \end{equation*}
In this definition, it is not assumed that \(P(A)>0\) and \(P(B)>0\text{.}\) If \(P(B) > 0\text{,}\) then
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} , \end{equation*}
and \(A\) and \(B\) are independent if and only if
\begin{equation*} P(A \mid B) = P(A) . \end{equation*}
Similarly, if \(P(A)>0\text{,}\) then \(A\) and \(B\) are independent if and only if
\begin{equation*} P(B \mid A) = P(B) . \end{equation*}
Example 3.2.7. Independence of Rolling Two Dice.
Assume we roll a red die and a blue die; thus, the sample space is
\begin{equation*} S = \{ (i,j) : 1 \leq i \leq 6 , 1 \leq j \leq 6 \} , \end{equation*}
where \(i\) is the result of the red die and \(j\) is the result of the blue die. We assume a uniform probability function. Thus, each outcome has a probability of \(1/36\text{.}\)
Let \(D_1\) denote the result of the red die and let \(D_2\) denote the result of the blue die. Consider the events
\begin{equation*} A = \text{“} D_1 + D_2 = 7 \text{”} \end{equation*}
and
\begin{equation*} B = \text{“} D_1 = 4 \text{”.} \end{equation*}
Are these events independent?
-
Since
\begin{equation*} A = \{ (1,6) , (2,5) , (3,4) , (4,3) , (5,2) , (6,1) \} , \end{equation*}we have \(P(A) = 6/36 = 1/6\text{.}\)
-
Since
\begin{equation*} B = \{ (4,1) , (4,2) , (4,3) , (4,4) , (4,5) , (4,6) \} , \end{equation*}we have \(P(B) = 6/36 = 1/6\text{.}\)
-
Since
\begin{equation*} A \cap B = \{ (4,3) \} , \end{equation*}we have \(P(A \cap B) = 1/36\text{.}\)
It follows that \(P(A \cap B) = P(A) \cdot P(B)\) and we conclude that \(A\) and \(B\) are independent.
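The three counts above can be confirmed by enumeration. A minimal Python sketch (illustrative, not from the text):

```python
from fractions import Fraction

# Two dice: uniform on the 36 ordered pairs (red, blue).
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] + o[1] == 7}   # D1 + D2 = 7
B = {o for o in S if o[0] == 4}          # D1 = 4

# P(A ∩ B) = P(A) · P(B), so A and B are independent.
assert P(A) == P(B) == Fraction(1, 6)
assert P(A & B) == P(A) * P(B) == Fraction(1, 36)
```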
As an exercise, you should verify that the events
and
are not independent.
Now consider the two events
and
Since \(A_3 \cap B_3 = \emptyset\text{,}\) we have
\begin{equation*} P\left( A_3 \cap B_3 \right) = P(\emptyset) = 0 . \end{equation*}
On the other hand, \(P \left( A_3 \right) = 1/12\) and \(P \left( B_3 \right) = 1/6\text{.}\) Thus,
\begin{equation*} P\left( A_3 \cap B_3 \right) = 0 \neq 1/72 = P\left( A_3 \right) \cdot P\left( B_3 \right) , \end{equation*}
and the events \(A_3\) and \(B_3\) are not independent. This is not surprising: If we know that \(B_3\) occurs, then \(A_3\) cannot occur. Thus, the event \(B_3\) has an effect on the probability that the event \(A_3\) occurs.
Consider two events \(A\) and \(B\) in a sample space \(S\text{.}\) If these events are independent, then the probability that \(A\) occurs does not depend on whether or not \(B\) occurs. Since knowing whether or not \(B\) occurs is the same as knowing whether or not the complement \(\overline{B}\) occurs, it should not be a surprise that the events \(A\) and \(\overline{B}\) are independent as well.
Theorem 3.2.8. Independence of Complements.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events. If \(A\) and \(B\) are independent, then \(A\) and \(\overline{B}\) are also independent.
Proof.
To prove that \(A\) and \(\overline{B}\) are independent, we have to show that
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot P\left( \overline{B} \right) . \end{equation*}
Using Theorem 3.1.13, this is equivalent to showing that
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot (1 - P(B)) . \end{equation*}
Since the events \(A \cap B\) and \(A \cap \overline{B}\) are disjoint and
\begin{equation*} (A \cap B) \cup \left( A \cap \overline{B} \right) = A , \end{equation*}
it follows from Theorem 3.1.10 that
\begin{equation*} P(A) = P(A \cap B) + P\left( A \cap \overline{B} \right) . \end{equation*}
Since \(A\) and \(B\) are independent, we have
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) . \end{equation*}
It follows that
\begin{equation*} P(A) = P(A) \cdot P(B) + P\left( A \cap \overline{B} \right) , \end{equation*}
which is equivalent to
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot (1 - P(B)) . \end{equation*}
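Theorem 3.2.8 can be checked on a concrete pair of independent events. The sketch below (illustrative; it reuses the two-dice events “sum is 7” and “red die shows 4” from Example 3.2.7):

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] + o[1] == 7}   # sum of the two dice is 7
B = {o for o in S if o[0] == 4}          # red die shows 4
B_bar = set(S) - B                        # complement of B

assert P(A & B) == P(A) * P(B)           # A and B are independent
assert P(A & B_bar) == P(A) * P(B_bar)   # hence A and the complement of B are too
```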
We have defined the notion of two events being independent. The following definition generalizes this in two ways to sequences of events:
Definition 3.2.9. Pairwise and Independent Events.
Let \((S,P)\) be a probability space, let \(n\geq 2\text{,}\) and let \(A_1,A_2,\ldots,A_n\) be a sequence of events.
-
We say that this sequence is pairwise independent if for any two distinct indices \(i\) and \(j\text{,}\) the events \(A_i\) and \(A_j\) are independent, i.e.,
\begin{equation*} P \left( A_i \cap A_j \right) = P \left( A_i \right) \cdot P \left( A_j \right) . \end{equation*} -
We say that this sequence is mutually independent if for all \(k\) with \(2 \leq k \leq n\) and all indices \(i_1 <i_2 <\ldots <i_k\text{,}\)
\begin{equation*} P \left( A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k} \right) = P \left( A_{i_1} \right) \cdot P \left( A_{i_2} \right) \cdots P \left( A_{i_k} \right) . \end{equation*}
Thus, in order to show that the sequence \(A_1,A_2,\ldots,A_n\) is pairwise independent, we have to verify \(\binom{n}{2}\) equalities. On the other hand, to show that this sequence is mutually independent, we have to verify \(\sum_{k=2}^n \binom{n}{k} = 2^n - 1 - n\) equalities.
For example, if we want to prove that the sequence \(A,B,C\) of three events is mutually independent, then we have to show that
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) , \quad P(A \cap C) = P(A) \cdot P(C) , \quad P(B \cap C) = P(B) \cdot P(C) , \end{equation*}
and
\begin{equation*} P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) . \end{equation*}
Example 3.2.10. Pairwise but not Mutually Independent Coin Flips.
Consider flipping a coin three times and assume that the result is a uniformly random element from the sample space
\begin{equation*} S = \{ HHH , HHT , HTH , HTT , THH , THT , TTH , TTT \} , \end{equation*}
where, e.g., \(HHT\) indicates that the first two flips result in heads and the third flip results in tails. Define the events
\begin{equation*} A = \text{“the first and second flips have the same result”,} \end{equation*}
\begin{equation*} B = \text{“the second and third flips have the same result”,} \end{equation*}
and
\begin{equation*} C = \text{“the first and third flips have the same result”.} \end{equation*}
If we write these events as subsets of the sample space, then we get
\begin{equation*} A = \{ HHH , HHT , TTH , TTT \} , \end{equation*}
\begin{equation*} B = \{ HHH , THH , HTT , TTT \} , \end{equation*}
and
\begin{equation*} C = \{ HHH , HTH , THT , TTT \} . \end{equation*}
It follows that
\begin{equation*} P(A) = P(B) = P(C) = 1/2 \end{equation*}
and
\begin{equation*} P(A \cap B) = P(A \cap C) = P(B \cap C) = P(\{ HHH , TTT \}) = 1/4 . \end{equation*}
Thus, the sequence \(A,B,C\) is pairwise independent. Since
\begin{equation*} A \cap B \cap C = \{ HHH , TTT \} , \end{equation*}
we have
\begin{equation*} P(A \cap B \cap C) = 1/4 . \end{equation*}
Thus,
\begin{equation*} P(A \cap B \cap C) = 1/4 \neq 1/8 = P(A) \cdot P(B) \cdot P(C) , \end{equation*}
and, therefore, the sequence \(A,B,C\) is not mutually independent. Of course, this is not surprising: If both events \(A\) and \(B\) occur, then event \(C\) also occurs.
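The pairwise-but-not-mutual phenomenon can be verified by enumeration. In the sketch below (illustrative), the events are the classic “two flips agree” events: \(A\) (flips 1 and 2 agree), \(B\) (flips 2 and 3 agree), and \(C\) (flips 1 and 3 agree):

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three fair coin flips, e.g. "HHT".
S = ["".join(f) for f in product("HT", repeat=3)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] == o[1]}   # flips 1 and 2 agree
B = {o for o in S if o[1] == o[2]}   # flips 2 and 3 agree
C = {o for o in S if o[0] == o[2]}   # flips 1 and 3 agree

# Pairwise independent:
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ... but not mutually independent:
assert P(A & B & C) == Fraction(1, 4) != P(A) * P(B) * P(C)
```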
Subsection 3.2.3 Random Variables
A random variable is neither random nor variable.
We have already seen random variables in Section 3.1, even though we did not use that term there. For example, in Example 3.1.5, we rolled a die twice and were interested in the sum of the results of these two rolls. In other words, we did an “experiment” (rolling a die twice) and asked for a function of the outcome (the sum of the results of the two rolls).
Definition 3.2.11. Random Variable.
Let \(S\) be a sample space. A random variable on the sample space \(S\) is a function \(X : S \rightarrow \mathbb{R}\text{.}\)
In the example given above, the sample space is
\begin{equation*} S = \{ (i,j) : 1 \leq i \leq 6 , 1 \leq j \leq 6 \} \end{equation*}
and the random variable is the function \(X : S \rightarrow \mathbb{R}\) defined by
\begin{equation*} X(i,j) = i + j \end{equation*}
for all \((i,j)\) in \(S\text{.}\)
Note that the term “random variable” is misleading: A random variable is not random, but a function that assigns, to every outcome \(\omega\) in the sample space \(S\text{,}\) a real number \(X(\omega)\text{.}\) Also, a random variable is not a variable, but a function.
Example 3.2.12. Flipping Three Coins with Random Variables.
Assume we flip three coins. The sample space is
\begin{equation*} S = \{ HHH , HHT , HTH , HTT , THH , THT , TTH , TTT \} , \end{equation*}
where, e.g., \(TTH\) indicates that the first two coins come up tails and the third coin comes up heads.
Let \(X : S \rightarrow \mathbb{R}\) be the random variable that maps any outcome (i.e., any element of \(S\)) to the number of heads in the outcome. Thus,
\begin{equation*} \begin{array}{rl} X(HHH) & = 3 , \\ X(HHT) = X(HTH) = X(THH) & = 2 , \\ X(HTT) = X(THT) = X(TTH) & = 1 , \\ X(TTT) & = 0 . \end{array} \end{equation*}
If we define the random variable \(Y\) to be the function \(Y: S \rightarrow \mathbb{R}\) that
- maps an outcome to \(1\) if all three coins come up heads or all three coins come up tails, and
- maps an outcome to \(0\) in all other cases,
then we have
\begin{equation*} Y(HHH) = Y(TTT) = 1 \end{equation*}
and
\begin{equation*} Y(HHT) = Y(HTH) = Y(HTT) = Y(THH) = Y(THT) = Y(TTH) = 0 . \end{equation*}
Since a random variable is a function \(X : S \rightarrow \mathbb{R}\text{,}\) it maps any outcome \(\omega\) to a real number \(X(\omega)\text{.}\) Usually, we just write \(X\) instead of \(X(\omega)\text{.}\) Thus, for any outcome in the sample space \(S\text{,}\) we denote the value of the random variable, for this outcome, by \(X\text{.}\) In the example above, we flip three coins and write
\begin{equation*} X = \text{“the number of heads in the three flips”} \end{equation*}
and
\begin{equation*} Y = \text{“1 if all three flips have the same result, and 0 otherwise”.} \end{equation*}
Random variables give rise to events in a natural way. In the three-coin example, “\(X=0\)” corresponds to the event \(\{TTT\}\text{,}\) whereas “\(X=2\)” corresponds to the event \(\{HHT,HTH,THH\}\text{.}\) The table below gives some values of the random variables \(X\) and \(Y\text{,}\) together with the corresponding events.
\begin{equation*} \begin{array}{ll} \text{“} X=0 \text{”} & \{ TTT \} \\ \text{“} X=1 \text{”} & \{ HTT , THT , TTH \} \\ \text{“} X=2 \text{”} & \{ HHT , HTH , THH \} \\ \text{“} X=3 \text{”} & \{ HHH \} \\ \text{“} Y=0 \text{”} & \{ HHT , HTH , HTT , THH , THT , TTH \} \\ \text{“} Y=1 \text{”} & \{ HHH , TTT \} \end{array} \end{equation*}
Thus, the event “\(X=x\)” corresponds to the set of all outcomes that are mapped, by the function \(X\text{,}\) to the value \(x\text{:}\)
Definition 3.2.13. \(X = x\).
Let \(S\) be a sample space and let \(X : S \rightarrow \mathbb{R}\) be a random variable. For any real number \(x\text{,}\) we define “\(X=x\)” to be the event
\begin{equation*} \{ \omega \in S : X(\omega) = x \} . \end{equation*}
Example 3.2.14. Probabilities of Flipping Three Coins with Random Variables.
Let us return to the example in which we flip three coins. Assume that the coins are fair and the three flips are mutually independent. Consider again the corresponding random variables \(X\) and \(Y\text{.}\) It should be clear how we determine, for example, the probability that \(X\) is equal to \(0\text{,}\) which we will write as \(P(X=0)\text{.}\) Using our interpretation of “\(X=0\)” as being the event \(\{TTT\}\text{,}\) we get
\begin{equation*} P(X=0) = P(\{TTT\}) = 1/8 . \end{equation*}
Similarly, we get
\begin{equation*} P(X=1) = P(X=2) = 3/8 , \quad P(X=3) = 1/8 , \quad P(Y=0) = 3/4 , \quad P(Y=1) = 1/4 . \end{equation*}
Consider an arbitrary probability space \((S,P)\) and let \(X : S \rightarrow \mathbb{R}\) be a random variable. Using Definition 3.1.4 and Definition 3.2.13, the probability of the event “\(X=x\)”, i.e., the probability that \(X\) is equal to \(x\text{,}\) is equal to
\begin{equation*} P(X=x) = \sum_{\omega \in S : X(\omega) = x} P(\omega) . \end{equation*}
We have interpreted “\(X=x\)” as being an event. We extend this to more general statements involving \(X\text{.}\) For example, “\(X \geq x\)” denotes the event
\begin{equation*} \{ \omega \in S : X(\omega) \geq x \} . \end{equation*}
For our three-coin example, the random variable \(X\) can take each of the values \(0\text{,}\) \(1\text{,}\) \(2\text{,}\) and \(3\) with a positive probability. As a result, “\(X \geq 2\)” denotes the event “\(X=2\) or \(X=3\)”, and we have
\begin{equation*} P(X \geq 2) = P(X=2) + P(X=3) = 3/8 + 1/8 = 1/2 . \end{equation*}
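The distribution of the number of heads can be computed by enumerating the sample space. A minimal Python sketch (illustrative, not from the text):

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three fair coin flips, e.g. "HHT".
S = ["".join(f) for f in product("HT", repeat=3)]

def X(outcome):
    # the random variable X: number of heads in the outcome
    return outcome.count("H")

def P_event(pred):
    # probability of the event {omega in S : pred(omega)}, uniform distribution
    return Fraction(sum(1 for o in S if pred(o)), len(S))

# The distribution of X: P(X=0)=1/8, P(X=1)=3/8, P(X=2)=3/8, P(X=3)=1/8.
dist = {x: P_event(lambda o: X(o) == x) for x in range(4)}

# "X >= 2" is the event "X=2 or X=3".
assert P_event(lambda o: X(o) >= 2) == dist[2] + dist[3] == Fraction(1, 2)
```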
In Subsection 3.2.2, we have defined the notion of two events being independent. The following definition extends this to random variables.
Definition 3.2.15. Independent Random Variables.
Let \((S,P)\) be a probability space and let \(X\) and \(Y\) be two random variables on \(S\text{.}\) We say that \(X\) and \(Y\) are independent if for all real numbers \(x\) and \(y\text{,}\) the events “\(X=x\)” and “\(Y=y\)” are independent, i.e.,
\begin{equation*} P(X=x \text{ and } Y=y) = P(X=x) \cdot P(Y=y) . \end{equation*}
There are two ways to generalize the notion of two random variables being independent to sequences of random variables:
Definition 3.2.16. Pairwise and Mutually Independent Random Variables.
Let \((S,P)\) be a probability space, let \(n\geq 2\text{,}\) and let \(X_1,X_2,\ldots,X_n\) be a sequence of random variables on \(S\text{.}\)
We say that this sequence is pairwise independent if for all real numbers \(x_1,x_2,\ldots,x_n\text{,}\) the sequence “\(X_1=x_1\)”, “\(X_2=x_2\)”, \(\ldots\text{,}\) “\(X_n=x_n\)” of events is pairwise independent.
We say that this sequence is mutually independent if for all real numbers \(x_1,x_2,\ldots,x_n\text{,}\) the sequence “\(X_1=x_1\)”, “\(X_2=x_2\)”, \(\ldots\text{,}\) “\(X_n=x_n\)” of events is mutually independent.
Subsection 3.2.4 Bayes' Theorem
Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to more accurately assess the probability that they have cancer, compared to an assessment made without knowledge of the person's age.

Bayes' theorem is to the theory of probability what the Pythagorean theorem is to geometry.
―Sir Harold Jeffreys
One of the many applications of Bayes' theorem is Bayesian inference, a particular approach to statistical inference.
Definition 3.2.17. Bayes' Theorem.
Let \(A\) and \(B\) be events in some probability space \((S,P)\) such that \(P(A) \neq 0\) and \(P(B) \neq 0\text{.}\) Then:
\begin{equation*} P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} . \end{equation*}
Example 3.2.18. Probability of having chosen from a particular urn.
We have two urns \(U_i\) and \(U_j\text{.}\) \(U_i\) contains 2 black balls and 3 white balls. \(U_j\) contains 1 black ball and 1 white ball. An urn is chosen at random, then a ball is chosen at random from it. If a black ball is chosen, what is the probability it came from urn \(U_i\text{?}\)
Let \(I\) be the event urn \(U_i\) is chosen and \(B\) be the event a black ball is chosen.
Note that the probability of choosing either urn is \(P(I) = P(\overline{I}) = 1/2\text{,}\) and the probability of choosing a black ball given that urn \(U_i\) was chosen is \(P(B \mid I) = 2/5\text{.}\)
-
Next we need to know the probability of choosing a black ball no matter which urn was chosen. This is the probability of choosing a black ball from either urn.
\begin{equation*} \begin{array}{rl} P(B) & = P(B|I)P(I) + P(B|\overline{I})P(\overline{I})\\ & = (2/5)(1/2) + (1/2)(1/2)\\ & = (1/5) + (1/4)\\ & = (9/20) \end{array} \end{equation*} -
Calculate the probability urn \(U_i\) is where the black ball came from \(P(I|B)\) using Bayes' Rule:
\begin{equation*} \begin{array}{rl} P(I|B) & = \frac{P(B|I)P(I)}{P(B)}\\ & =\frac{(2/5)(1/2)}{9/20}\\ & = \frac{1/5}{(9/20)}\\ & = \frac{4}{9} \end{array} \end{equation*}
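The urn calculation can be reproduced with exact arithmetic. The sketch below (illustrative; variable names are ours) applies the law of total probability and then Bayes' rule:

```python
from fractions import Fraction

# Priors and likelihoods from the example.
P_I      = Fraction(1, 2)   # urn U_i chosen
P_not_I  = Fraction(1, 2)   # urn U_j chosen
P_B_I    = Fraction(2, 5)   # black ball from U_i (2 black, 3 white)
P_B_notI = Fraction(1, 2)   # black ball from U_j (1 black, 1 white)

# Law of total probability: P(B) = P(B|I)P(I) + P(B|not I)P(not I).
P_B = P_B_I * P_I + P_B_notI * P_not_I
assert P_B == Fraction(9, 20)

# Bayes' rule: P(I|B) = P(B|I)P(I) / P(B).
P_I_B = P_B_I * P_I / P_B
assert P_I_B == Fraction(4, 9)
```

Although \(U_i\) holds more black balls in absolute terms, its lower proportion of black balls (\(2/5\) versus \(1/2\)) makes it the less likely source of a black ball.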
Bayes' Theorem can be generalized from the pair of complementary events \(A, \overline{A}\) to any collection of more than two events that partitions the sample space.
Definition 3.2.19. Generalized Bayes' Theorem.
Let \(B\) be an event and \(A_1, A_2, \dots, A_n\) be mutually exclusive events with \(\bigcup_{i=1}^n A_i = S\) in some probability space \((S,P)\text{,}\) such that \(P(A_i) \neq 0\) for all \(i\) with \(1 \leq i \leq n\) and \(P(B) \neq 0\text{.}\) Then:
\begin{equation*} P(A_i \mid B) = \frac{P(B \mid A_i) \cdot P(A_i)}{\sum_{j=1}^n P(B \mid A_j) \cdot P(A_j)} . \end{equation*}
Exercises 3.2.5 Exercises for Section 3.2
1.
Assume that \(E\) and \(F\) are two events with positive probabilities. Show that if \(P(E|F) = P(E)\text{,}\) then \(P(F|E) = P(F)\text{.}\)
If \(P(E|F) = P(E)\) then the two events are independent.
2.
A coin is tossed three times. What is the probability that exactly two heads occur, given that
the first outcome was a head?
the first outcome was a tail?
the first two outcomes were heads?
the first two outcomes were tails?
the first outcome was a head and the third outcome was a head?
3.
A die is rolled twice. What is the probability that the sum of the faces is greater than 7, given that
the first outcome was a 4?
the first outcome was greater than 3?
the first outcome was a 1?
the first outcome was less than 5?
\(\frac{1}{2} \)
\(\frac{2}{3} \)
\(0\)
\(\frac{1}{4} \)
4.
A card is drawn at random from a deck of cards. What is the probability that
it is a heart, given that it is red?
it is higher than a 10, given that it is a heart? (Interpret J, Q, K, A as 11, 12, 13, 14.)
it is a jack, given that it is red?
5.
A coin is tossed three times. Consider the following events:
- A
Heads on the first toss.
- B
Tails on the second.
- C
Heads on the third toss.
- D
All three outcomes the same (HHH or TTT).
- E
Exactly one head turns up.
-
Which of the following pairs of these events are independent?
\(A,B\)
\(A,D\)
\(A,E\)
\(D,E\)
-
Which of the following triples of these events are independent?
\(A, B, C\)
\(A, B, D\)
\(C, D, E\)
(1) and (2)
(1)
6.
From a deck of five cards numbered 2, 4, 6, 8, and 10, respectively, a card is drawn at random and replaced. This is done three times. What is the probability that the card numbered 2 was drawn exactly two times, given that the sum of the numbers on the three draws is 12?
7.
A coin is tossed twice. Consider the following events.
- A
Heads on the first toss.
- B
Heads on the second toss.
- C
The two tosses come out the same.
Show that \(A\text{,}\) \(B\text{,}\) \(C\) are pairwise independent but not mutually independent.
Show that \(C\) is independent of \(A\) and \(B\) but not of \(A \cap B\text{.}\)
-
We have
\begin{equation*} \begin{array}{rl} P(A \cap B) = P(A \cap C) = P(B \cap C) & = \frac{1}{4}\\ P(A)P(B) = P(A)P(C) = P(B)P(C) & = \frac{1}{4}\\ P(A \cap B \cap C) = \frac{1}{4} & \neq P(A)P(B)P(C) = \frac{1}{8}. \end{array} \end{equation*} -
\(A\) and \(C\) are independent, and \(C\) and \(B\) are independent:
\begin{equation*} \begin{array}{rl} P(A \cap C) = P(A)P(C) & = \frac{1}{4}\\ P(C \cap B) = P(B)P(C) & = \frac{1}{4}\\ P(C \cap (A \cap B)) = \frac{1}{4} & \neq P(C)P(A \cap B) = \frac{1}{8}. \end{array} \end{equation*}
8.
Let \(S = \{a,b,c,d,e,f\}\text{.}\) Assume that \(P(a) = P(b) = 1/8\) and \(P(c) = P(d) = P(e) = P(f) = 3/16\text{.}\) Let \(A\text{,}\) \(B\text{,}\) and \(C\) be the events \(A = \{d,e,a\}\text{,}\) \(B = \{c,e,a\}\text{,}\) \(C = \{c,d,a\}\text{.}\) Show that \(P(A \cap B \cap C) = P(A)P(B)P(C)\) but no two of these events are independent.
9.
We have two urns \(U_a\) and \(U_b\text{.}\) \(U_a\) contains 9 black balls and 6 white balls. \(U_b\) contains 3 black balls and 1 white ball. An urn is chosen at random, then a ball is chosen at random from it. Let \(A\) be the event urn \(U_a\) is chosen and \(W\) be the event a white ball is chosen.
Calculate \(P(A|W)\) using Bayes' Rule
10.
Suppose that 1% of the patients tested in a hospital are infected with a virus. Furthermore, suppose that when a test for the virus is given, 98% of the patients actually infected with the virus test positive, and that 1% of the patients not infected still test positive for it. What is the probability that:
a patient testing positive is actually infected with the virus?
a patient testing positive is not infected with the virus?
a patient testing negative is infected with the virus?
a patient testing negative is not infected with the virus?