
Section 10.2 Enumerative probability

A “probability measure” is any function from the subsets of a sample space (which we call events) to the real numbers that satisfies the Kolmogorov axioms.

Definition 10.2.1. Counting measure.

Let \(A\) be a finite set contained in the finite and nonempty set \(\Omega\text{.}\) Define the probability of \(A\) as

\begin{equation*} P(A) = \frac{|A|}{|\Omega|}. \end{equation*}

Let us verify that this “probability” is indeed a probability by checking the three axioms:

  • \(P(A) \ge 0\) because \(|A| \ge 0\) and \(|\Omega| > 0\text{,}\) since \(\Omega\) is nonempty by definition.

  • \(P(\Omega) = 1\) because \(|\Omega|/|\Omega|=1\text{.}\)

  • Let \(A\) and \(B\) be disjoint subsets of \(\Omega\text{,}\) so that \(|A \cap B| = 0\text{.}\) Then,

    \begin{align*} P(A \cup B) \amp = \frac{|A\cup B|}{|\Omega|}\\ \amp = \frac{|A| + |B| - |A \cap B|}{|\Omega|}\\ \amp = \frac{|A|}{|\Omega|} + \frac{|B|}{|\Omega|} - \frac{|A\cap B|}{|\Omega|}\\ \amp = \frac{|A|}{|\Omega|} + \frac{|B|}{|\Omega|}\\ \amp = P(A) + P(B). \end{align*}

So, \(P(A)\) is indeed a probability.

(It is worth pointing out that the counting measure defined above satisfies not only our modified axioms but also the real ones. It is the only probability measure we will define in this book, so we are now done having to worry about this distinction.)
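
To make the counting measure concrete, here is a minimal Python sketch. The six-sided-die sample space and the “even roll” event are our own illustrative choices; using `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

def prob(event, omega):
    """Counting measure: P(A) = |A| / |Omega|, as an exact fraction."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))

omega = set(range(1, 7))     # one roll of a six-sided die
evens = {2, 4, 6}            # the event "the roll is even"
print(prob(evens, omega))    # 1/2
```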

Effectively, we can assign probabilities by calculating the proportion of the elements in a sample space taken up by the elements of the event. This is great news for us, because we just spent two chapters getting very good at counting. So, everything we have learned there carries over very nicely, including the following fact: for any events \(A\) and \(B\text{,}\)

\begin{equation*} P(A \cup B) = P(A) + P(B) - P(A \cap B). \end{equation*}

(This fact is true in general, but we will just verify it in the case of the counting probability defined above.)

The proof of this fact is left as an exercise to the reader.
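
As a sanity check rather than a proof, we can confirm the rule by brute-force counting on a small sample space; the die and the two events below are arbitrary choices.

```python
from fractions import Fraction

omega = set(range(1, 7))    # one roll of a six-sided die
A = {1, 2, 3}               # "the roll is at most 3"
B = {2, 4, 6}               # "the roll is even"

def prob(event):
    return Fraction(len(event), len(omega))

lhs = prob(A | B)                       # P(A or B)
rhs = prob(A) + prob(B) - prob(A & B)   # inclusion-exclusion
print(lhs, rhs, lhs == rhs)             # 5/6 5/6 True
```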

So, the probability of an “or”-type event is calculated exactly the way we would expect. What about an “and”-type event?

Two events happening simultaneously, due to one outcome (not a pair of outcomes like before), is represented by the event \(A \cap B\text{.}\) It is not surprising that we will calculate \(P(A\cap B)\) by multiplication. However, consider the following example.

Suppose we are playing a game where you win if the sum showing on two six-sided dice is 8 or higher. If you are a dramatic sort of person, you will roll the dice one at a time. Let \(A\) be the event “the roll of the first die is a 1,” and let \(B\) be the event “the sum of the rolls is at least 8.” Then, we are interested in the event \(A \cap B\text{,}\) where the roll of the first die is a 1 and the sum of the rolls is at least 8.

We should not simply multiply the probabilities of \(A\) and \(B\) together because, as you may have already noticed, \(A \cap B\) is impossible. However, both \(A\) and \(B\) are possible. So, if we use the formula \(P(A \cap B) = P(A)P(B)\text{,}\) we will have zero on one side of the equation and two non-zero numbers being multiplied on the other, which should not happen.
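
We can confirm this by enumerating all \(36\) equally likely ordered rolls; this short Python check (assuming fair dice) shows the mismatch directly.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))    # all 36 ordered rolls
A = [(d1, d2) for d1, d2 in omega if d1 == 1]          # first die is 1
B = [(d1, d2) for d1, d2 in omega if d1 + d2 >= 8]     # sum is at least 8

def prob(event):
    return Fraction(len(event), len(omega))

both = [roll for roll in A if roll in B]
print(prob(both))           # 0 -- the event "A and B" is impossible
print(prob(A) * prob(B))    # 5/72 = (6/36)(15/36) -- not zero!
```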

The trick is that the occurrence of event \(A\) changes the probability of event \(B\text{.}\) In other words, event \(B\) is dependent on event \(A\) (and, perhaps surprisingly, vice-versa). The concept of independence is key to understanding probability.

Definition 10.2.6.

Let \(A\) and \(B\) be events with \(P(A) \neq 0\text{.}\) Then the conditional probability of \(B\) given \(A\text{,}\) denoted by \(P(B|A)\text{,}\) is the probability of \(B\) occurring when the sample space is “localized” to the outcomes of \(A\text{;}\) that is,

\begin{equation*} P(B|A) = \dfrac{P(A \cap B)}{P(A)}. \end{equation*}
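
Under the counting measure, this definition reduces to \(P(B|A) = |A \cap B|/|A|\text{:}\) shrink the sample space to the outcomes of \(A\) and count again. Here is a small sketch of that computation; the dice events are our own choices.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two dice, ordered rolls
A = [r for r in omega if r[0] == 6]            # first die is 6
B = [r for r in omega if r[0] + r[1] >= 8]     # sum is at least 8

def cond_prob(b, a):
    """P(B|A) by localizing the sample space to A and counting."""
    a_and_b = [outcome for outcome in a if outcome in b]
    return Fraction(len(a_and_b), len(a))

print(cond_prob(B, A))    # 5/6: given a first roll of 6, we need d2 >= 2
```
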
Definition 10.2.7.

Two events \(A\) and \(B\) are independent if

\begin{equation*} P(B|A) = P(B) \end{equation*}

and dependent otherwise.

The idea is that the occurrence of \(A\) does not affect the probability of \(B\text{.}\)
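
For example, on one roll of a fair die, “the roll is even” and “the roll is at most \(4\)” are independent, as this quick check (with our own toy events) confirms.

```python
from fractions import Fraction

omega = set(range(1, 7))
A = {2, 4, 6}        # "the roll is even"
B = {1, 2, 3, 4}     # "the roll is at most 4"

P_B_given_A = Fraction(len(A & B), len(A))     # localize to A, then count B
P_B = Fraction(len(B), len(omega))
print(P_B_given_A, P_B, P_B_given_A == P_B)    # 2/3 2/3 True
```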

Consider the following fictitious class of \(29\) discrete math students organized by year and primary major.

Table 10.2.9. Class makeup by year and primary major.

              CS   other
First year     7       1
Sophomore      8       2
Junior         3       5
Senior         2       1
Totals        20       9

Let \(A\) be the event that a student is a first-year student and let \(B\) be the event that their major is CS (computer science). Are the events \(A\) and \(B\) independent?

There are \(29\) students, of which \(7+1=8\) are in their first year, so \(P(A) = 8/29 \approx 0.28.\) However, if we localize to the \(20\) CS students, \(P(A|B) = 7/20 = 0.35.\) In other words, a CS student is more likely to be in their first year than a student drawn from the whole class.

We can go the other way too. The probability of being a CS major is \(P(B) = 20/29 \approx 0.69.\) However, if we localize our sample space to the first year students, the probability of being a CS major becomes \(P(B|A) = 7/8 \approx 0.88.\) First-year students are more likely to be CS majors in this made-up class.

Note that it is precisely for this reason that \(P(A \cap B) \neq P(A)P(B)\) in general. The probability of being both a CS major and a first-year student is \(P(A \cap B) = 7/29 \approx 0.24\text{,}\) but \(P(A)P(B) \approx 0.19\text{.}\)

It turns out that if \(P(B|A)=P(B)\text{,}\) then \(P(A|B)=P(A)\text{.}\)
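
All of these numbers come straight from Table 10.2.9, so we can reproduce the arithmetic above by counting.

```python
from fractions import Fraction

total, cs_total = 29, 20
first_year_cs = 7
first_year = 7 + 1                                 # first-years in CS + other

P_A = Fraction(first_year, total)                  # P(first-year) = 8/29
P_B = Fraction(cs_total, total)                    # P(CS) = 20/29
P_A_given_B = Fraction(first_year_cs, cs_total)    # 7/20
P_B_given_A = Fraction(first_year_cs, first_year)  # 7/8
P_A_and_B = Fraction(first_year_cs, total)         # 7/29

print(float(P_A), float(P_A_given_B))       # ~0.28 vs 0.35: A depends on B
print(float(P_B), float(P_B_given_A))       # ~0.69 vs 0.875: B depends on A
print(float(P_A_and_B), float(P_A * P_B))   # ~0.24 vs ~0.19: not equal
```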

Now, we can write down the real multiplication rule: for any events \(A\) and \(B\text{,}\)

\begin{equation*} P(A \cap B) = P(A)P(B|A). \end{equation*}

The thinking here is that in order for \(A\) and \(B\) to both occur, “first” (not really) \(A\) must occur. Then, that changes the probability of \(B\) to \(P(B|A)\text{,}\) and now you can multiply. We can give a proof using the counting measure:

Let \(A\) and \(B\) be events contained in the sample space \(\Omega\text{.}\) Then,

\begin{align*} P(A)P(B|A) \amp = \frac{|A|}{|\Omega|} \times \frac{|B\cap A|}{|A|} \\ \amp = \frac{|B \cap A|}{|\Omega|}\\ \amp = P(A \cap B). \end{align*}

Notice that there really isn't a prioritizing of \(A\) over \(B\) here; \(P(A\cap B) = P(B)P(A|B)\) is true as well.
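
As a quick numerical illustration of that symmetry, both factorizations agree with direct counting on a randomly generated toy sample space (entirely our own construction).

```python
import random
from fractions import Fraction

random.seed(0)
omega = set(range(40))
A = set(random.sample(sorted(omega), 15))
B = set(random.sample(sorted(omega), 22))

P = lambda E: Fraction(len(E), len(omega))
cond = lambda X, Y: Fraction(len(X & Y), len(Y))   # P(X|Y) by counting

assert P(A & B) == P(A) * cond(B, A) == P(B) * cond(A, B)
print(P(A & B))    # whatever it is, both products equal it exactly
```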

Finally, we can prove one more corollary, analogous to the one for disjoint events: events \(A\) and \(B\) are independent if and only if

\begin{equation*} P(A \cap B) = P(A)P(B). \end{equation*}

First, suppose that \(A\) and \(B\) are independent events. Then, \(P(B|A) = P(B)\text{,}\) which implies that

\begin{equation*} P(A \cap B) = P(A)P(B|A) = P(A)P(B). \end{equation*}

Conversely, suppose that \(P(A \cap B) = P(A)P(B).\) Then,

\begin{equation*} P(B|A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{P(A)P(B)}{P(A)} = P(B), \end{equation*}

so \(A\) and \(B\) are independent. Therefore, \(A\) and \(B\) are independent if and only if \(P(A \cap B)=P(A)P(B).\)

Let us see these results all together in a couple of examples.

A standard deck of \(52\) playing cards is divided into four suits: clubs and spades are black, and hearts and diamonds are red. Each suit, meanwhile, features \(13\) ranks: the numbers \(2\) through \(10\text{,}\) followed by the face cards Jack, Queen, and King, “followed by” the Ace (sometimes Ace is low, sometimes high). Some readers know this already, but there it is if you did not grow up playing cards.

Let \(A\) be the event that a red card is drawn and let \(B\) be the event that a face card is drawn. Convince yourself that \(P(A)=26/52\) and that \(P(B)=12/52\text{.}\) (It is more convenient not to simplify the fractions.)

A card being red has no bearing on it being a face card; that is, there are the same number of red and black face cards. The events \(A\) and \(B\) are independent. So, the probability that a card is both red and a face card is

\begin{equation*} P(A \cap B) = P(A) P(B) = \frac{26}{52} \times \frac{12}{52} = \frac{6}{52}. \end{equation*}

(You could also verify this by just counting the six red face cards; that's why this section is called “enumerative probability”.)

The probability a card is red or a face card is

\begin{equation*} P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{26}{52}+\frac{12}{52}-\frac{6}{52} = \frac{32}{52}. \end{equation*}
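
Since this is enumerative probability, we can also just build the deck and count; the following sketch (card labels are our own) checks both calculations. Note that `Fraction` reduces \(6/52\) to \(3/26\) and \(32/52\) to \(8/13\text{.}\)

```python
from fractions import Fraction
from itertools import product

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = list(product(ranks, suits))               # 52 (rank, suit) pairs

red = [c for c in deck if c[1] in ("hearts", "diamonds")]
face = [c for c in deck if c[0] in ("J", "Q", "K")]
red_face = [c for c in red if c in face]

P = lambda E: Fraction(len(E), len(deck))
print(P(red_face))                      # 3/26, i.e., 6/52
print(P(red) + P(face) - P(red_face))   # 8/13, i.e., 32/52
```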

Let's look at two dependent events and draw two cards without replacement. We would like their total to be 21, counting a face card as \(10\) and an ace as \(11\text{.}\) Let \(F\) be the event that our first card is a face card. Let \(E\) be the event that the second card is an ace (which is what we will need to hit 21). Then,

\begin{equation*} P(E|F) = \frac{4}{51}, \end{equation*}

because once we have drawn the face card, there are only 51 cards left, and all four aces are among them. Then the probability of drawing both a face card and an ace is

\begin{equation*} P(E \cap F) = P(F)P(E|F) = \frac{12}{52} \times \frac{4}{51}. \end{equation*}
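
A brute-force check over all \(52 \times 51\) equally likely ordered two-card draws confirms the product above.

```python
from fractions import Fraction
from itertools import permutations, product

suits = "HDCS"
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = [r + s for r, s in product(ranks, suits)]

draws = list(permutations(deck, 2))     # ordered draws without replacement
hits = [(c1, c2) for c1, c2 in draws
        if c1[:-1] in ("J", "Q", "K") and c2[:-1] == "A"]

print(Fraction(len(hits), len(draws)))     # 4/221
print(Fraction(12, 52) * Fraction(4, 51))  # 4/221 -- they match
```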

Suppose that a team of five students is to be made from a group of ten. Of the ten students, three are first-years, five are second-years, one is a junior, and one is a senior.

Let \(A\) be the event that a team of five contains all three first-years. Observe that we must choose all three first-years, and then we must choose two of the others. All of these are unordered samples without replacement, so

\begin{equation*} P(A) = \frac{{{3}\choose{3}}{{7}\choose{2}}}{{{10}\choose{5}}}. \end{equation*}

The denominator represents all possible ways to choose five students from ten.

Suppose that all three first-years are chosen. What is the probability that of the remaining two students, one is the senior? Let \(B\) be the event that the senior is chosen. Note that there are only 7 students left to choose from, and we are choosing 2. So,

\begin{equation*} P(B|A) = \frac{{{1}\choose{1}}{{6}\choose{1}}}{{{7}\choose{2}}}. \end{equation*}
Therefore, the probability we choose both all the first-years and the senior is
\begin{align*} P(B \cap A) \amp = P(A)P(B|A)\\ \amp = \frac{{{3}\choose{3}}{{7}\choose{2}}}{{{10}\choose{5}}} \times \frac{{{1}\choose{1}}{{6}\choose{1}}}{{{7}\choose{2}}}\\ \amp = \frac{{{3}\choose{3}}{{1}\choose{1}}{{6}\choose{1}}}{{{10}\choose{5}}}. \end{align*}
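
Finally, we can check this answer by enumerating all \({{10}\choose{5}} = 252\) possible teams directly. The student labels below are our own; only the three first-years and the senior matter for the count.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# F* are the first-years, Sr is the senior; the rest are the other students.
students = ["F1", "F2", "F3", "S1", "S2", "S3", "S4", "S5", "J1", "Sr"]

teams = list(combinations(students, 5))    # all C(10, 5) = 252 teams
good = [t for t in teams
        if {"F1", "F2", "F3"} <= set(t) and "Sr" in t]

print(Fraction(len(good), len(teams)))                              # 1/42
print(Fraction(comb(3, 3) * comb(1, 1) * comb(6, 1), comb(10, 5)))  # 1/42
```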