Section 10.1 Axioms of probability

Randomness rules our world. There are obvious examples, like games of chance. Less obvious examples include things that aren't necessarily random but may as well be, like the weather, traffic accidents, and the aggregate behavior of groups of people. So it is useful to have a way to quantify and study randomness mathematically.

One way to divide this study of probability is by the types of quantities involved. Continuous probability studies random measurements like time, height, or proportion--quantities that can take any value on a spectrum. As you might have guessed, we are more interested in discrete probability, which studies quantities that can be counted.

A probability is a function \(P\) that takes an event as input and outputs a real number, called the probability of the event. Events may be regarded as sets in a universal set called the sample space, typically denoted \(\Omega\text{.}\) A probability must satisfy the (modified) Kolmogorov axioms. Let \(A, A_1, \ldots, A_n \subseteq \Omega\) be some events:

Non-negativity: \(P(A) \ge 0\) for every event \(A\text{.}\)

Normalization: \(P(\Omega) = 1\text{.}\)

Additivity: if \(A_1\) and \(A_2\) are disjoint, then \(P(A_1 \cup A_2) = P(A_1) + P(A_2)\text{.}\)

Two events (sets) \(A\) and \(B\) are disjoint if \(A \cap B = \varnothing\text{.}\) By induction, the axiom of additivity extends to any finite sequence of sets where any two of those sets are disjoint.
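As a concrete sanity check, we can model a probability on a small finite sample space and verify the axioms directly. The fair six-sided die below is a hypothetical example (not from the text), with \(P(A) = |A|/6\text{:}\)

```python
from fractions import Fraction

# Hypothetical model: a fair six-sided die. An event is a subset of the
# sample space OMEGA, and its probability is |event| / |OMEGA|.
OMEGA = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event), len(OMEGA))

evens = {2, 4, 6}
ones = {1}

# Non-negativity and normalization
assert P(evens) >= 0
assert P(OMEGA) == 1

# Additivity: the events {2, 4, 6} and {1} are disjoint,
# so the probability of their union is the sum of their probabilities.
assert evens.isdisjoint(ones)
assert P(evens | ones) == P(evens) + P(ones)
```

Exact `Fraction` arithmetic is used instead of floating point so that the equalities in the assertions hold exactly.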

Why “modified” Kolmogorov axioms? Well, the axiom of additivity is really something called \(\sigma\)-additivity, which is the same idea applied to a countably infinite sequence of events. However, we would like to finish this book without having to discuss infinite sums, so for that reason we will only talk about finite unions of sets.

From these axioms a number of useful conclusions immediately arise: monotonicity (if \(A \subseteq B\text{,}\) then \(P(A) \le P(B)\)), the fact that \(P(\varnothing) = 0\text{,}\) the complement rule (\(P(\overline{A}) = 1 - P(A)\)), and the fact that every probability lies in the interval \([0,1]\text{.}\)

To prove monotonicity, suppose \(A \subseteq B\text{.}\) Then we may partition \(B\) into two disjoint sets, \(A\) and \(B-A\text{.}\) By additivity, therefore,

\begin{equation*} P(B) = P(A) + P(B-A). \end{equation*}

Because \(P(B-A) \ge 0\) (non-negativity), \(P(A)\) cannot be larger than \(P(B)\text{.}\)
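Using the same hypothetical fair-die model as before (a uniform probability on \(\{1,\ldots,6\}\)), the partition argument behind monotonicity can be checked numerically:

```python
from fractions import Fraction

# Hypothetical fair-die model: P(A) = |A| / 6 on the sample space {1, ..., 6}.
OMEGA = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), len(OMEGA))

A = {2, 4}        # A is a subset of B
B = {2, 4, 6}

# B partitions into the disjoint sets A and B - A, so additivity gives:
assert P(B) == P(A) + P(B - A)

# Since P(B - A) >= 0, monotonicity follows:
assert P(A) <= P(B)
```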

To prove that the probability of the empty set is zero, consider that \(\varnothing = \varnothing \cup \varnothing\text{,}\) where the two sets in the union are disjoint (their intersection is empty). Additivity then gives

\begin{equation*} P(\varnothing) = P(\varnothing) + P(\varnothing). \end{equation*}

Subtracting \(P(\varnothing)\) from both sides shows that this is only possible if \(P(\varnothing) = 0\text{.}\)

The complement rule is true because for any event \(A\text{,}\) we may write \(\Omega = A \cup \overline{A}\text{,}\) where \(A\) and \(\overline{A}\) are disjoint. The axiom of additivity implies

\begin{equation*} P(\Omega) = P(A) + P(\overline{A}), \end{equation*}

and since \(P(\Omega) = 1\text{,}\) this means \(P(A) + P(\overline{A}) = 1\text{.}\) The complement rule follows.
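The complement rule is easy to confirm in the same hypothetical fair-die model, where the complement of an event is taken relative to the sample space:

```python
from fractions import Fraction

# Hypothetical fair-die model: P(A) = |A| / 6 on the sample space {1, ..., 6}.
OMEGA = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), len(OMEGA))

A = {1, 2}
A_complement = OMEGA - A   # the complement of A relative to OMEGA

# Omega is the disjoint union of A and its complement:
assert P(A) + P(A_complement) == 1

# The complement rule:
assert P(A_complement) == 1 - P(A)
```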

Finally, we already know that \(P(A) \ge 0\) for any event \(A\text{.}\) The additional fact that \(P(A) \le 1\) arises from monotonicity, since \(A \subseteq \Omega\) and \(P(\Omega) = 1\text{.}\)

This section was quite abstract, only setting up the theoretical foundation for probability as a function that maps a set to a real number in the interval \([0,1]\text{.}\) In the sections that follow we will see probability theory in its myriad “real-world” applications.