Section 3.2 Conditional Probability, Independence, and Bayes' Rule
Subsection 3.2.1 Conditional Probability
Anil Maheshwari has two children. We are told that one of them is a boy. What is the probability that the other child is also a boy? Most people will say that this probability is \(1/2\text{.}\) We will show below that this is not the correct answer.
Since Anil has two children, the sample space is
\begin{equation*} S = \{ (b,b) , (b,g) , (g,b) , (g,g) \} , \end{equation*}
where, for example, \((b,g)\) indicates that the youngest child is a boy and the oldest child is a girl. We assume a uniform probability function, so that each outcome has a probability of \(1/4\text{.}\)
We are given the additional information that one of the two children is a boy, or, to be more precise, that at least one of the two children is a boy. This means that the actual sample space is not \(S\text{,}\) but
\begin{equation*} \{ (b,b) , (b,g) , (g,b) \} . \end{equation*}
When asking for the probability that the other child is also a boy, we are really asking for the probability that both children are boys. Since there is only one possibility (out of three) for both children to be boys, it follows that this probability is equal to \(1/3\text{.}\)
This is an example of a conditional probability: We are asking for the probability of an event (both children are boys) given that another event (at least one of the two children is a boy) occurs.
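The two-children argument can be checked by brute-force enumeration. The following Python sketch (not part of the text; the names are illustrative) computes the conditional probability directly by counting outcomes in the uniform sample space:

```python
from fractions import Fraction

# Sample space for two children: (youngest, oldest), each "b" or "g".
S = [(c1, c2) for c1 in "bg" for c2 in "bg"]

# Event A: both children are boys; event B: at least one child is a boy.
A = {o for o in S if o == ("b", "b")}
B = {o for o in S if "b" in o}

# Under the uniform distribution, P(A | B) = |A ∩ B| / |B|.
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)  # 1/3
```

Note that conditioning on “at least one boy” leaves three equally likely outcomes, not two, which is why the answer is \(1/3\) rather than \(1/2\text{.}\)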
Definition 3.2.1. Conditional Probability.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events with \(P(B) > 0\text{.}\) The conditional probability \(P(A \mid B)\text{,}\) pronounced as “the probability of \(A\) given \(B\)”, is defined to be
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} . \end{equation*}
Example 3.2.2. Probability of a second boy.
Returning to Anil's two children, we saw that the sample space is
\begin{equation*} S = \{ (b,b) , (b,g) , (g,b) , (g,g) \} \end{equation*}
and we assumed a uniform probability function. The events we considered are
\begin{equation*} A = \text{“both children are boys”} \end{equation*}
and
\begin{equation*} B = \text{“at least one of the two children is a boy”,} \end{equation*}
and we wanted to know \(P(A \mid B)\text{.}\) Writing \(A\) and \(B\) as subsets of the sample space \(S\text{,}\) we get
\begin{equation*} A = \{ (b,b) \} \end{equation*}
and
\begin{equation*} B = \{ (b,b) , (b,g) , (g,b) \} . \end{equation*}
Using Definition 3.2.1, it follows that
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = 1/3 , \end{equation*}
which is the same answer as we got before.
Example 3.2.3. Probability of rolling a three given an odd number is rolled.
Assume we roll a fair die, i.e., we choose an element uniformly at random from the sample space
\begin{equation*} S = \{ 1,2,3,4,5,6 \} . \end{equation*}
Consider the events
\begin{equation*} A = \text{“the result of the roll is 3”} \end{equation*}
and
\begin{equation*} B = \text{“the result of the roll is odd”.} \end{equation*}
What is the conditional probability \(P(A \mid B)\text{?}\) To determine this probability, we assume that event \(B\) occurs, i.e., the roll of the die resulted in one of \(1\text{,}\) \(3\text{,}\) and \(5\text{.}\) Given that event \(B\) occurs, event \(A\) occurs in one out of these three possibilities. Thus, \(P(A \mid B)\) should be equal to \(1/3\text{.}\) We are going to verify that this is indeed the answer we get when using Definition 3.2.1: Since
\begin{equation*} P(A \cap B) = P(\{3\}) = 1/6 \end{equation*}
and
\begin{equation*} P(B) = P(\{1,3,5\}) = 1/2 , \end{equation*}
we have
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = 1/3 . \end{equation*}
Let us now consider the conditional probability \(P(B \mid A)\text{.}\) Thus, we are given that event \(A\) occurs, i.e., the roll of the die resulted in \(3\text{.}\) Since \(3\) is an odd integer, event \(B\) is guaranteed to occur. Therefore, \(P(B \mid A)\) should be equal to \(1\text{.}\) Again, we are going to verify that this is indeed the answer we get when using Definition 3.2.1:
\begin{equation*} P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{1/6}{1/6} = 1 . \end{equation*}
This shows that, in general, \(P(A \mid B)\) is not equal to \(P(B \mid A)\text{.}\) Observe that this is not surprising. (Do you see why?)
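The asymmetry between \(P(A \mid B)\) and \(P(B \mid A)\) can also be seen by enumeration. A short Python sketch (illustrative, not from the text) for the fair-die example:

```python
from fractions import Fraction

S = set(range(1, 7))                 # fair die: uniform on {1,...,6}
A = {o for o in S if o == 3}         # the result of the roll is 3
B = {o for o in S if o % 2 == 1}     # the result of the roll is odd

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

print(cond(A, B))  # 1/3
print(cond(B, A))  # 1
```

The two conditional probabilities differ because they divide the same intersection by different events.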
Theorem 3.2.4. Sum of the conditional probabilities of complements.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events with \(P(B)>0\text{.}\) Then
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = 1 . \end{equation*}
Proof.
By definition, we have
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = \frac{P(A \cap B)}{P(B)} + \frac{P\left( \overline{A} \cap B \right)}{P(B)} = \frac{P(A \cap B) + P\left( \overline{A} \cap B \right)}{P(B)} . \end{equation*}
Since the events \(A \cap B\) and \(\overline{A} \cap B\) are disjoint, we have, by Theorem 3.1.10,
\begin{equation*} P(A \cap B) + P\left( \overline{A} \cap B \right) = P\left( (A \cap B) \cup \left( \overline{A} \cap B \right) \right) . \end{equation*}
By drawing a Venn diagram, you will see that
\begin{equation*} (A \cap B) \cup \left( \overline{A} \cap B \right) = B , \end{equation*}
implying that
\begin{equation*} P(A \cap B) + P\left( \overline{A} \cap B \right) = P(B) . \end{equation*}
We conclude that
\begin{equation*} P(A \mid B) + P\left( \overline{A} \mid B \right) = \frac{P(B)}{P(B)} = 1 . \end{equation*}
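Theorem 3.2.4 can be verified exhaustively on a small sample space. The sketch below (illustrative; the fair-die space and the event \(B\) of odd rolls are assumed) checks the identity for every possible event \(A\text{:}\)

```python
from fractions import Fraction
from itertools import combinations

S = tuple(range(1, 7))          # fair die, uniform distribution
B = frozenset({1, 3, 5})        # conditioning event with P(B) > 0

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

# Check P(A | B) + P(complement of A | B) = 1 for every event A.
for k in range(len(S) + 1):
    for A in combinations(S, k):
        A = frozenset(A)
        assert cond(A, B) + cond(frozenset(S) - A, B) == 1
print("verified for all", 2 ** len(S), "events A")
```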
Theorem 3.2.5. The Law of Total Probability.
Let \((S,P)\) be a probability space and let \(A\) be an event. Assume that \(B_1,B_2,\ldots,B_n\) is a sequence of events such that
- \(P \left( B_i \right) > 0\) for all \(i\) with \(1 \leq i \leq n\text{,}\)
- the events \(B_1,B_2,\ldots,B_n\) are pairwise disjoint, and
- \(\bigcup_{i=1}^n B_i = S\text{.}\)
Then
\begin{equation*} P(A) = \sum_{i=1}^n P(A \mid B_i) \cdot P\left( B_i \right) . \end{equation*}
Proof.
The assumptions imply that
\begin{equation*} A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n) . \end{equation*}
Since the events \(A \cap B_1 , A \cap B_2 , \ldots , A \cap B_n\) are pairwise disjoint, it follows from the sum rule for pairwise disjoint events that
\begin{equation*} P(A) = \sum_{i=1}^n P\left( A \cap B_i \right) . \end{equation*}
The theorem follows by observing that, from Definition 3.2.1,
\begin{equation*} P\left( A \cap B_i \right) = P(A \mid B_i) \cdot P\left( B_i \right) . \end{equation*}
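The Law of Total Probability can be checked numerically. The sketch below (illustrative; the two-dice sample space and the event “the sum is 7” are assumptions, chosen to partition by the value of the first die):

```python
from fractions import Fraction

# Two dice: uniform on the 36 ordered pairs.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

def cond(E, F):
    # P(E | F) = |E ∩ F| / |F| under the uniform distribution
    return Fraction(len(E & F), len(F))

A = {o for o in S if o[0] + o[1] == 7}            # the sum is 7
# Partition the sample space by the value of the first die: B_1, ..., B_6.
Bs = [{o for o in S if o[0] == i} for i in range(1, 7)]

total = sum(cond(A, B) * P(B) for B in Bs)
assert total == P(A) == Fraction(1, 6)
```

Each term \(P(A \mid B_i) \cdot P(B_i)\) accounts for the part of \(A\) inside \(B_i\text{,}\) and the partition guarantees that nothing is counted twice or missed.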
Subsection 3.2.2 Independent Events
Consider two events \(A\) and \(B\) in a sample space \(S\text{.}\) In this section, we will define the notion of these two events being “independent”. Intuitively, this should express that (i) the probability that event \(A\) occurs does not depend on whether or not event \(B\) occurs, and (ii) the probability that event \(B\) occurs does not depend on whether or not event \(A\) occurs. Thus, if we assume that \(P(A)>0\) and \(P(B)>0\text{,}\) then (i) \(P(A)\) should be equal to the conditional probability \(P(A \mid B)\text{,}\) and (ii) \(P(B)\) should be equal to the conditional probability \(P(B \mid A)\text{.}\) As we will show below, the following definition exactly captures this.
Definition 3.2.6. Independent Events.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events. We say that \(A\) and \(B\) are independent if
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) . \end{equation*}
In this definition, it is not assumed that \(P(A)>0\) and \(P(B)>0\text{.}\) If \(P(B) > 0\text{,}\) then
\begin{equation*} P(A \mid B) = \frac{P(A \cap B)}{P(B)} , \end{equation*}
and \(A\) and \(B\) are independent if and only if
\begin{equation*} P(A \mid B) = P(A) . \end{equation*}
Similarly, if \(P(A)>0\text{,}\) then \(A\) and \(B\) are independent if and only if
\begin{equation*} P(B \mid A) = P(B) . \end{equation*}
Example 3.2.7. Independence of Rolling Two Dice.
Assume we roll a red die and a blue die; thus, the sample space is
\begin{equation*} S = \{ (i,j) : 1 \leq i \leq 6 , 1 \leq j \leq 6 \} , \end{equation*}
where \(i\) is the result of the red die and \(j\) is the result of the blue die. We assume a uniform probability function. Thus, each outcome has a probability of \(1/36\text{.}\)
Let \(D_1\) denote the result of the red die and let \(D_2\) denote the result of the blue die. Consider the events
\begin{equation*} A = \text{“} D_1 + D_2 = 7 \text{”} \end{equation*}
and
\begin{equation*} B = \text{“} D_1 = 4 \text{”.} \end{equation*}
Are these events independent?
-
Since
\begin{equation*} A = \{ (1,6) , (2,5) , (3,4) , (4,3) , (5,2) , (6,1) \} , \end{equation*}we have \(P(A) = 6/36 = 1/6\text{.}\)
-
Since
\begin{equation*} B = \{ (4,1) , (4,2) , (4,3) , (4,4) , (4,5) , (4,6) \} , \end{equation*}we have \(P(B) = 6/36 = 1/6\text{.}\)
-
Since
\begin{equation*} A \cap B = \{ (4,3) \} , \end{equation*}we have \(P(A \cap B) = 1/36\text{.}\)
It follows that \(P(A \cap B) = P(A) \cdot P(B)\) and we conclude that \(A\) and \(B\) are independent.
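The three counts above can be confirmed by enumeration. A minimal Python sketch (illustrative, not from the text):

```python
from fractions import Fraction

# Two dice: uniform on the 36 ordered pairs (red, blue).
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] + o[1] == 7}   # D1 + D2 = 7
B = {o for o in S if o[0] == 4}          # D1 = 4

# P(A ∩ B) = P(A) · P(B), so A and B are independent.
assert P(A) == P(B) == Fraction(1, 6)
assert P(A & B) == P(A) * P(B) == Fraction(1, 36)
```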
As an exercise, you should verify that the events
and
are not independent.
Now consider the two events
and
Since \(A_3 \cap B_3 = \emptyset\text{,}\) we have
\begin{equation*} P\left( A_3 \cap B_3 \right) = P(\emptyset) = 0 . \end{equation*}
On the other hand, \(P \left( A_3 \right) = 1/12\) and \(P \left( B_3 \right) = 1/6\text{.}\) Thus,
\begin{equation*} P\left( A_3 \cap B_3 \right) = 0 \neq 1/72 = P\left( A_3 \right) \cdot P\left( B_3 \right) , \end{equation*}
and the events \(A_3\) and \(B_3\) are not independent. This is not surprising: If we know that \(B_3\) occurs, then \(A_3\) cannot occur. Thus, the event \(B_3\) has an effect on the probability that the event \(A_3\) occurs.
Consider two events \(A\) and \(B\) in a sample space \(S\text{.}\) If these events are independent, then the probability that \(A\) occurs does not depend on whether or not \(B\) occurs. Since knowing whether or not \(B\) occurs is the same as knowing whether or not the complement \(\overline{B}\) occurs, it should not be a surprise that the events \(A\) and \(\overline{B}\) are independent as well.
Theorem 3.2.8. Independence of Complements.
Let \((S,P)\) be a probability space and let \(A\) and \(B\) be two events. If \(A\) and \(B\) are independent, then \(A\) and \(\overline{B}\) are also independent.
Proof.
To prove that \(A\) and \(\overline{B}\) are independent, we have to show that
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot P\left( \overline{B} \right) . \end{equation*}
Using Theorem 3.1.13, this is equivalent to showing that
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot (1 - P(B)) . \end{equation*}
Since the events \(A \cap B\) and \(A \cap \overline{B}\) are disjoint and
\begin{equation*} (A \cap B) \cup \left( A \cap \overline{B} \right) = A , \end{equation*}
it follows from Theorem 3.1.10 that
\begin{equation*} P(A) = P(A \cap B) + P\left( A \cap \overline{B} \right) . \end{equation*}
Since \(A\) and \(B\) are independent, we have
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) . \end{equation*}
It follows that
\begin{equation*} P(A) = P(A) \cdot P(B) + P\left( A \cap \overline{B} \right) , \end{equation*}
which is equivalent to
\begin{equation*} P\left( A \cap \overline{B} \right) = P(A) \cdot (1 - P(B)) . \end{equation*}
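Theorem 3.2.8 can be checked on a concrete pair of independent events. The sketch below (illustrative; it reuses the two-dice events “sum is 7” and “red die shows 4” from Example 3.2.7):

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] + o[1] == 7}   # sum of the two dice is 7
B = {o for o in S if o[0] == 4}          # red die shows 4
B_bar = set(S) - B                        # complement of B

assert P(A & B) == P(A) * P(B)           # A and B are independent
assert P(A & B_bar) == P(A) * P(B_bar)   # hence A and the complement of B are too
```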
We have defined the notion of two events being independent. The following definition generalizes this in two ways to sequences of events:
Definition 3.2.9. Pairwise and Independent Events.
Let \((S,P)\) be a probability space, let \(n\geq 2\text{,}\) and let \(A_1,A_2,\ldots,A_n\) be a sequence of events.
-
We say that this sequence is pairwise independent if for any two distinct indices \(i\) and \(j\text{,}\) the events \(A_i\) and \(A_j\) are independent, i.e.,
\begin{equation*} P \left( A_i \cap A_j \right) = P \left( A_i \right) \cdot P \left( A_j \right) . \end{equation*} -
We say that this sequence is mutually independent if for all \(k\) with \(2 \leq k \leq n\) and all indices \(i_1 <i_2 <\ldots <i_k\text{,}\)
\begin{equation*} P \left( A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k} \right) = P \left( A_{i_1} \right) \cdot P \left( A_{i_2} \right) \cdots P \left( A_{i_k} \right) . \end{equation*}
Thus, in order to show that the sequence \(A_1,A_2,\ldots,A_n\) is pairwise independent, we have to verify \(\binom{n}{2}\) equalities. On the other hand, to show that this sequence is mutually independent, we have to verify \(\sum_{k=2}^n \binom{n}{k} = 2^n - 1 - n\) equalities.
For example, if we want to prove that the sequence \(A,B,C\) of three events is mutually independent, then we have to show that
\begin{equation*} P(A \cap B) = P(A) \cdot P(B) , \quad P(A \cap C) = P(A) \cdot P(C) , \quad P(B \cap C) = P(B) \cdot P(C) , \end{equation*}
and
\begin{equation*} P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) . \end{equation*}
Example 3.2.10. Pairwise but not Mutually Independent Coin Flips.
Consider flipping a coin three times and assume that the result is a uniformly random element from the sample space
\begin{equation*} S = \{ HHH , HHT , HTH , HTT , THH , THT , TTH , TTT \} , \end{equation*}
where, e.g., \(HHT\) indicates that the first two flips result in heads and the third flip results in tails. Define the events
\begin{equation*} A = \text{“the first and second flips have the same result”,} \end{equation*}
\begin{equation*} B = \text{“the second and third flips have the same result”,} \end{equation*}
and
\begin{equation*} C = \text{“the first and third flips have the same result”.} \end{equation*}
If we write these events as subsets of the sample space, then we get
\begin{equation*} A = \{ HHH , HHT , TTH , TTT \} , \end{equation*}
\begin{equation*} B = \{ HHH , THH , HTT , TTT \} , \end{equation*}
and
\begin{equation*} C = \{ HHH , HTH , THT , TTT \} . \end{equation*}
It follows that
\begin{equation*} P(A) = P(B) = P(C) = 1/2 \end{equation*}
and
\begin{equation*} P(A \cap B) = P(A \cap C) = P(B \cap C) = P(\{ HHH , TTT \}) = 1/4 . \end{equation*}
Thus, the sequence \(A,B,C\) is pairwise independent. Since
\begin{equation*} A \cap B \cap C = \{ HHH , TTT \} , \end{equation*}
we have
\begin{equation*} P(A \cap B \cap C) = 1/4 . \end{equation*}
Thus,
\begin{equation*} P(A \cap B \cap C) = 1/4 \neq 1/8 = P(A) \cdot P(B) \cdot P(C) , \end{equation*}
and, therefore, the sequence \(A,B,C\) is not mutually independent. Of course, this is not surprising: If both events \(A\) and \(B\) occur, then event \(C\) also occurs.
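The pairwise-but-not-mutual phenomenon can be verified by enumeration. In the sketch below (illustrative), the events are the classic “two flips agree” events: \(A\) (flips 1 and 2 agree), \(B\) (flips 2 and 3 agree), and \(C\) (flips 1 and 3 agree):

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three fair coin flips, e.g. "HHT".
S = ["".join(f) for f in product("HT", repeat=3)]

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] == o[1]}   # flips 1 and 2 agree
B = {o for o in S if o[1] == o[2]}   # flips 2 and 3 agree
C = {o for o in S if o[0] == o[2]}   # flips 1 and 3 agree

# Pairwise independent:
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ... but not mutually independent:
assert P(A & B & C) == Fraction(1, 4) != P(A) * P(B) * P(C)
```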
Subsection 3.2.3 Random Variables
A random variable is neither random nor variable.
We have already seen random variables in Section 3.1, even though we did not use that term there. For example, in Example 3.1.5, we rolled a die twice and were interested in the sum of the results of these two rolls. In other words, we did an “experiment” (rolling a die twice) and asked for a function of the outcome (the sum of the results of the two rolls).
Definition 3.2.11. Random Variable.
Let \(S\) be a sample space. A random variable on the sample space \(S\) is a function \(X : S \rightarrow \mathbb{R}\text{.}\)
In the example given above, the sample space is
\begin{equation*} S = \{ (i,j) : 1 \leq i \leq 6 , 1 \leq j \leq 6 \} \end{equation*}
and the random variable is the function \(X : S \rightarrow \mathbb{R}\) defined by
\begin{equation*} X(i,j) = i + j \end{equation*}
for all \((i,j)\) in \(S\text{.}\)
Note that the term “random variable” is misleading: A random variable is not random, but a function that assigns, to every outcome \(\omega\) in the sample space \(S\text{,}\) a real number \(X(\omega)\text{.}\) Also, a random variable is not a variable, but a function.
Example 3.2.12. Flipping Three Coins with Random Variables.
Assume we flip three coins. The sample space is
\begin{equation*} S = \{ HHH , HHT , HTH , HTT , THH , THT , TTH , TTT \} , \end{equation*}
where, e.g., \(TTH\) indicates that the first two coins come up tails and the third coin comes up heads.
Let \(X : S \rightarrow \mathbb{R}\) be the random variable that maps any outcome (i.e., any element of \(S\)) to the number of heads in the outcome. Thus,
\begin{equation*} \begin{array}{rl} X(HHH) & = 3 , \\ X(HHT) = X(HTH) = X(THH) & = 2 , \\ X(HTT) = X(THT) = X(TTH) & = 1 , \\ X(TTT) & = 0 . \end{array} \end{equation*}
If we define the random variable \(Y\) to be the function \(Y: S \rightarrow \mathbb{R}\) that
- maps an outcome to \(1\) if all three coins come up heads or all three coins come up tails, and
- maps an outcome to \(0\) in all other cases,
then we have
\begin{equation*} Y(HHH) = Y(TTT) = 1 \end{equation*}
and
\begin{equation*} Y(HHT) = Y(HTH) = Y(HTT) = Y(THH) = Y(THT) = Y(TTH) = 0 . \end{equation*}
Since a random variable is a function \(X : S \rightarrow \mathbb{R}\text{,}\) it maps any outcome \(\omega\) to a real number \(X(\omega)\text{.}\) Usually, we just write \(X\) instead of \(X(\omega)\text{.}\) Thus, for any outcome in the sample space \(S\text{,}\) we denote the value of the random variable, for this outcome, by \(X\text{.}\) In the example above, we flip three coins and write
\begin{equation*} X = \text{“the number of heads in the three flips”} \end{equation*}
and
\begin{equation*} Y = \text{“1 if all three flips have the same result, and 0 otherwise”.} \end{equation*}
Random variables give rise to events in a natural way. In the three-coin example, “\(X=0\)” corresponds to the event \(\{TTT\}\text{,}\) whereas “\(X=2\)” corresponds to the event \(\{HHT,HTH,THH\}\text{.}\) The table below gives some values of the random variables \(X\) and \(Y\text{,}\) together with the corresponding events.
\begin{equation*} \begin{array}{ll} \text{“} X=0 \text{”} & \{ TTT \} \\ \text{“} X=1 \text{”} & \{ HTT , THT , TTH \} \\ \text{“} X=2 \text{”} & \{ HHT , HTH , THH \} \\ \text{“} X=3 \text{”} & \{ HHH \} \\ \text{“} Y=0 \text{”} & \{ HHT , HTH , HTT , THH , THT , TTH \} \\ \text{“} Y=1 \text{”} & \{ HHH , TTT \} \end{array} \end{equation*}
Thus, the event “\(X=x\)” corresponds to the set of all outcomes that are mapped, by the function \(X\text{,}\) to the value \(x\text{:}\)
Definition 3.2.13. \(X = x\).
Let \(S\) be a sample space and let \(X : S \rightarrow \mathbb{R}\) be a random variable. For any real number \(x\text{,}\) we define “\(X=x\)” to be the event
\begin{equation*} \{ \omega \in S : X(\omega) = x \} . \end{equation*}
Example 3.2.14. Probabilities of Flipping Three Coins with Random Variables.
Let us return to the example in which we flip three coins. Assume that the coins are fair and the three flips are mutually independent. Consider again the corresponding random variables \(X\) and \(Y\text{.}\) It should be clear how we determine, for example, the probability that \(X\) is equal to \(0\text{,}\) which we will write as \(P(X=0)\text{.}\) Using our interpretation of “\(X=0\)” as being the event \(\{TTT\}\text{,}\) we get
\begin{equation*} P(X=0) = P(\{TTT\}) = 1/8 . \end{equation*}
Similarly, we get
\begin{equation*} P(X=1) = P(X=2) = 3/8 , \quad P(X=3) = 1/8 , \quad P(Y=0) = 3/4 , \quad P(Y=1) = 1/4 . \end{equation*}
Consider an arbitrary probability space \((S,P)\) and let \(X : S \rightarrow \mathbb{R}\) be a random variable. Using Definition 3.1.4 and Definition 3.2.13, the probability of the event “\(X=x\)”, i.e., the probability that \(X\) is equal to \(x\text{,}\) is equal to
\begin{equation*} P(X=x) = \sum_{\omega \in S : X(\omega) = x} P(\omega) . \end{equation*}
We have interpreted “\(X=x\)” as being an event. We extend this to more general statements involving \(X\text{.}\) For example, “\(X \geq x\)” denotes the event
\begin{equation*} \{ \omega \in S : X(\omega) \geq x \} . \end{equation*}
For our three-coin example, the random variable \(X\) can take each of the values \(0\text{,}\) \(1\text{,}\) \(2\text{,}\) and \(3\) with a positive probability. As a result, “\(X \geq 2\)” denotes the event “\(X=2\) or \(X=3\)”, and we have
\begin{equation*} P(X \geq 2) = P(X=2) + P(X=3) = 3/8 + 1/8 = 1/2 . \end{equation*}
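The distribution of the number of heads can be computed by enumerating the sample space. A minimal Python sketch (illustrative, not from the text):

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three fair coin flips, e.g. "HHT".
S = ["".join(f) for f in product("HT", repeat=3)]

def X(outcome):
    # the random variable X: number of heads in the outcome
    return outcome.count("H")

def P_event(pred):
    # probability of the event {omega in S : pred(omega)}, uniform distribution
    return Fraction(sum(1 for o in S if pred(o)), len(S))

# The distribution of X: P(X=0)=1/8, P(X=1)=3/8, P(X=2)=3/8, P(X=3)=1/8.
dist = {x: P_event(lambda o: X(o) == x) for x in range(4)}

# "X >= 2" is the event "X=2 or X=3".
assert P_event(lambda o: X(o) >= 2) == dist[2] + dist[3] == Fraction(1, 2)
```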
In Subsection 3.2.2, we have defined the notion of two events being independent. The following definition extends this to random variables.
Definition 3.2.15. Independent Random Variables.
Let \((S,P)\) be a probability space and let \(X\) and \(Y\) be two random variables on \(S\text{.}\) We say that \(X\) and \(Y\) are independent if for all real numbers \(x\) and \(y\text{,}\) the events “\(X=x\)” and “\(Y=y\)” are independent, i.e.,
\begin{equation*} P(X=x \text{ and } Y=y) = P(X=x) \cdot P(Y=y) . \end{equation*}
There are two ways to generalize the notion of two random variables being independent to sequences of random variables:
Definition 3.2.16. Pairwise and Mutually Independent Random Variables.
Let \((S,P)\) be a probability space, let \(n\geq 2\text{,}\) and let \(X_1,X_2,\ldots,X_n\) be a sequence of random variables on \(S\text{.}\)
We say that this sequence is pairwise independent if for all real numbers \(x_1,x_2,\ldots,x_n\text{,}\) the sequence “\(X_1=x_1\)”, “\(X_2=x_2\)”, \(\ldots\text{,}\) “\(X_n=x_n\)” of events is pairwise independent.
We say that this sequence is mutually independent if for all real numbers \(x_1,x_2,\ldots,x_n\text{,}\) the sequence “\(X_1=x_1\)”, “\(X_2=x_2\)”, \(\ldots\text{,}\) “\(X_n=x_n\)” of events is mutually independent.
Subsection 3.2.4 Bayes' Theorem
Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to more accurately assess the probability that they have cancer, compared to an assessment made without knowledge of the person's age.

Bayes' theorem is to the theory of probability what the Pythagorean theorem is to geometry.
―Sir Harold Jeffreys
One of the many applications of Bayes' theorem is Bayesian inference, a particular approach to statistical inference.
Definition 3.2.17. Bayes' Theorem.
Let \(A\) and \(B\) be events in some probability space \((S,P)\) such that \(P(A) \neq 0\) and \(P(B) \neq 0\text{.}\) Then:
\begin{equation*} P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} . \end{equation*}
Example 3.2.18. Probability of having chosen from a particular urn.
We have two urns \(U_i\) and \(U_j\text{.}\) \(U_i\) contains 2 black balls and 3 white balls. \(U_j\) contains 1 black ball and 1 white ball. An urn is chosen at random, then a ball is chosen at random from it. If a black ball is chosen, what is the probability it came from urn \(U_i\text{?}\)
Let \(I\) be the event urn \(U_i\) is chosen and \(B\) be the event a black ball is chosen.
Note that the probability of choosing either urn is \(P(I) = P(\overline{I}) = 1/2\text{,}\) and the probability of choosing a black ball given that urn \(U_i\) was chosen is \(P(B \mid I) = 2/5\text{.}\)
-
Next we need to know the probability of choosing a black ball no matter which urn was chosen. This is the probability of choosing a black ball from either urn.
\begin{equation*} \begin{array}{rl} P(B) & = P(B|I)P(I) + P(B|\overline{I})P(\overline{I})\\ & = (2/5)(1/2) + (1/2)(1/2)\\ & = (1/5) + (1/4)\\ & = (9/20) \end{array} \end{equation*} -
Calculate the probability urn \(U_i\) is where the black ball came from \(P(I|B)\) using Bayes' Rule:
\begin{equation*} \begin{array}{rl} P(I|B) & = \frac{P(B|I)P(I)}{P(B)}\\ & =\frac{(2/5)(1/2)}{9/20}\\ & = \frac{1/5}{(9/20)}\\ & = \frac{4}{9} \end{array} \end{equation*}
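The urn calculation can be reproduced with exact arithmetic. The sketch below (illustrative; variable names are ours) applies the law of total probability and then Bayes' rule:

```python
from fractions import Fraction

# Priors and likelihoods from the example.
P_I      = Fraction(1, 2)   # urn U_i chosen
P_not_I  = Fraction(1, 2)   # urn U_j chosen
P_B_I    = Fraction(2, 5)   # black ball from U_i (2 black, 3 white)
P_B_notI = Fraction(1, 2)   # black ball from U_j (1 black, 1 white)

# Law of total probability: P(B) = P(B|I)P(I) + P(B|not I)P(not I).
P_B = P_B_I * P_I + P_B_notI * P_not_I
assert P_B == Fraction(9, 20)

# Bayes' rule: P(I|B) = P(B|I)P(I) / P(B).
P_I_B = P_B_I * P_I / P_B
assert P_I_B == Fraction(4, 9)
```

Although \(U_i\) holds more black balls in absolute terms, its lower proportion of black balls (\(2/5\) versus \(1/2\)) makes it the less likely source of a black ball.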
Bayes' Theorem can be generalized from the pair of complementary events \(A, \overline{A}\) to any collection of more than two events that partitions the sample space.
Definition 3.2.19. Generalized Bayes' Theorem.
Let \(B\) be an event and \(A_1, A_2, \dots, A_n\) be mutually exclusive events with \(\bigcup_{i=1}^n A_i = S\) in some probability space \((S,P)\text{,}\) such that \(P(A_i) \neq 0\) for all \(i\) with \(1 \leq i \leq n\) and \(P(B) \neq 0\text{.}\) Then:
\begin{equation*} P(A_i \mid B) = \frac{P(B \mid A_i) \cdot P(A_i)}{\sum_{j=1}^n P(B \mid A_j) \cdot P(A_j)} . \end{equation*}
Exercises 3.2.5 Exercises for Section 3.2
1.
Assume that \(E\) and \(F\) are two events with positive probabilities. Show that if \(P(E|F) = P(E)\text{,}\) then \(P(F|E) = P(F)\text{.}\)
If \(P(E|F) = P(E)\) then the two events are independent.
2.
A coin is tossed three times. What is the probability that exactly two heads occur, given that
the first outcome was a head?
the first outcome was a tail?
the first two outcomes were heads?
the first two outcomes were tails?
the first outcome was a head and the third outcome was a head?
3.
A die is rolled twice. What is the probability that the sum of the faces is greater than 7, given that
the first outcome was a 4?
the first outcome was greater than 3?
the first outcome was a 1?
the first outcome was less than 5?
\(\frac{1}{2} \)
\(\frac{2}{3} \)
\(0\)
\(\frac{1}{4} \)
4.
A card is drawn at random from a deck of cards. What is the probability that
it is a heart, given that it is red?
it is higher than a 10, given that it is a heart? (Interpret J, Q, K, A as 11, 12, 13, 14.)
it is a jack, given that it is red?
5.
A coin is tossed three times. Consider the following events:
- A
Heads on the first toss.
- B
Tails on the second.
- C
Heads on the third toss.
- D
All three outcomes the same (HHH or TTT).
- E
Exactly one head turns up.
-
Which of the following pairs of these events are independent?
\(A,B\)
\(A,D\)
\(A,E\)
\(D,E\)
-
Which of the following triples of these events are independent?
\(A, B, C\)
\(A, B, D\)
\(C, D, E\)
(1) and (2)
(1)
6.
From a deck of five cards numbered 2, 4, 6, 8, and 10, respectively, a card is drawn at random and replaced. This is done three times. What is the probability that the card numbered 2 was drawn exactly two times, given that the sum of the numbers on the three draws is 12?
7.
A coin is tossed twice. Consider the following events.
- A
Heads on the first toss.
- B
Heads on the second toss.
- C
The two tosses come out the same.
Show that \(A\text{,}\) \(B\text{,}\) \(C\) are pairwise independent but not mutually independent.
Show that \(C\) is independent of \(A\) and \(B\) but not of \(A \cap B\text{.}\)
-
We have
\begin{equation*} \begin{array}{rl} P(A \cap B) = P(A \cap C) = P(B \cap C) & = \frac{1}{4}\\ P(A)P(B) = P(A)P(C) = P(B)P(C) & = \frac{1}{4}\\ P(A \cap B \cap C) = \frac{1}{4} & \neq P(A)P(B)P(C) = \frac{1}{8}. \end{array} \end{equation*} -
\(A\) and \(C\) are independent, and \(C\) and \(B\) are independent:
\begin{equation*} \begin{array}{rl} P(A \cap C) = P(A)P(C) & = \frac{1}{4}\\ P(C \cap B) = P(B)P(C) & = \frac{1}{4}\\ P(C \cap (A \cap B)) = \frac{1}{4} & \neq P(C)P(A \cap B) = \frac{1}{8}. \end{array} \end{equation*}
8.
Let \(S = \{a,b,c,d,e,f\}\text{.}\) Assume that \(P(a) = P(b) = 1/8\) and \(P(c) = P(d) = P(e) = P(f) = 3/16\text{.}\) Let \(A\text{,}\) \(B\text{,}\) and \(C\) be the events \(A = \{d,e,a\}\text{,}\) \(B = \{c,e,a\}\text{,}\) \(C = \{c,d,a\}\text{.}\) Show that \(P(A \cap B \cap C) = P(A)P(B)P(C)\) but no two of these events are independent.
9.
We have two urns \(U_a\) and \(U_b\text{.}\) \(U_a\) contains 9 black balls and 6 white balls. \(U_b\) contains 3 black balls and 1 white ball. An urn is chosen at random, then a ball is chosen at random from it. Let \(A\) be the event urn \(U_a\) is chosen and \(W\) be the event a white ball is chosen.
Calculate \(P(A|W)\) using Bayes' Rule
10.
Suppose that 1% of the patients tested in a hospital are infected with a virus. Furthermore, suppose that when a test for the virus is given, 98% of the patients actually infected with the virus test positive, and that 1% of the patients not infected still test positive for it. What is the probability that:
a patient testing positive is actually infected with the virus?
a patient testing positive is not infected with the virus?
a patient testing negative is infected with the virus?
a patient testing negative is not infected with the virus?