Section 10.1 Axioms of probability

Randomness rules our world. There are obvious examples, like games of chance. Less obvious examples include things that aren't necessarily random but may as well be, like the weather, traffic accidents, and the aggregate behavior of groups of people. So it is useful to have a way to quantify and study randomness mathematically.

One way to divide this study of probability is by the types of quantities involved. Continuous probability studies random measurements like time, height, or proportion--quantities that can take any value on a spectrum. As you might have guessed, we are more interested in discrete probability, which studies quantities that can be counted.

A probability is a function \(P\) that takes an event as input and outputs a real number, called the probability of the event. Events may be regarded as sets in a universal set called the sample space, typically denoted \(\Omega\text{.}\) A probability must satisfy the (modified) Kolmogorov axioms. Let \(A, A_1, \ldots, A_n \subseteq \Omega\) be some events:

Non-negativity: \(P(A) \ge 0\) for every event \(A\text{.}\)

Normalization: \(P(\Omega) = 1\text{.}\)

Additivity: if \(A_1\) and \(A_2\) are disjoint, then \(P(A_1 \cup A_2) = P(A_1) + P(A_2)\text{.}\)

Two events (sets) \(A\) and \(B\) are disjoint if \(A \cap B = \varnothing\text{.}\) By induction, the axiom of additivity extends to any finite sequence of sets where any two of those sets are disjoint.
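As a concrete sanity check, we can model a probability on a small finite sample space and verify the axioms directly. The fair six-sided die below is a hypothetical example (not from the text), with \(P(A) = |A|/6\text{:}\)

```python
from fractions import Fraction

# Hypothetical model: a fair six-sided die. An event is a subset of the
# sample space OMEGA, and its probability is |event| / |OMEGA|.
OMEGA = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event), len(OMEGA))

evens = {2, 4, 6}
ones = {1}

# Non-negativity and normalization
assert P(evens) >= 0
assert P(OMEGA) == 1

# Additivity: the events {2, 4, 6} and {1} are disjoint,
# so the probability of their union is the sum of their probabilities.
assert evens.isdisjoint(ones)
assert P(evens | ones) == P(evens) + P(ones)
```

Exact `Fraction` arithmetic is used instead of floating point so that the equalities in the assertions hold exactly.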

Why “modified” Kolmogorov axioms? Well, the axiom of additivity is really something called \(\sigma\)-additivity, which is the same idea applied to a countably infinite sequence of events. However, we would like to finish this book without having to discuss infinite sums, so for that reason we will only talk about finite unions of sets.

From these axioms a number of useful conclusions immediately arise: monotonicity (if \(A \subseteq B\text{,}\) then \(P(A) \le P(B)\)), the fact that \(P(\varnothing) = 0\text{,}\) the complement rule (\(P(\overline{A}) = 1 - P(A)\)), and the fact that every probability lies in the interval \([0,1]\text{.}\)

To prove monotonicity, suppose \(A \subseteq B\text{.}\) Then we may partition \(B\) into two disjoint sets, \(A\) and \(B-A\text{.}\) By additivity, therefore,

\begin{equation*} P(B) = P(A) + P(B-A). \end{equation*}

Because \(P(B-A) \ge 0\) (non-negativity), \(P(A)\) cannot be larger than \(P(B)\text{.}\)
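Using the same hypothetical fair-die model as before (a uniform probability on \(\{1,\ldots,6\}\)), the partition argument behind monotonicity can be checked numerically:

```python
from fractions import Fraction

# Hypothetical fair-die model: P(A) = |A| / 6 on the sample space {1, ..., 6}.
OMEGA = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), len(OMEGA))

A = {2, 4}        # A is a subset of B
B = {2, 4, 6}

# B partitions into the disjoint sets A and B - A, so additivity gives:
assert P(B) == P(A) + P(B - A)

# Since P(B - A) >= 0, monotonicity follows:
assert P(A) <= P(B)
```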

To prove that the probability of the empty set is zero, consider that \(\varnothing = \varnothing \cup \varnothing\text{,}\) where the two sets in the union are disjoint (their intersection is empty). Additivity then gives

\begin{equation*} P(\varnothing) = P(\varnothing) + P(\varnothing). \end{equation*}

Subtracting \(P(\varnothing)\) from both sides shows that this is only possible if \(P(\varnothing) = 0\text{.}\)

The complement rule is true because for any event \(A\text{,}\) we may write \(\Omega = A \cup \overline{A}\text{,}\) where \(A\) and \(\overline{A}\) are disjoint. The axiom of additivity implies

\begin{equation*} P(\Omega) = P(A) + P(\overline{A}), \end{equation*}

and since \(P(\Omega) = 1\text{,}\) this means \(P(A) + P(\overline{A}) = 1\text{.}\) The complement rule follows.
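The complement rule is easy to confirm in the same hypothetical fair-die model, where the complement of an event is taken relative to the sample space:

```python
from fractions import Fraction

# Hypothetical fair-die model: P(A) = |A| / 6 on the sample space {1, ..., 6}.
OMEGA = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), len(OMEGA))

A = {1, 2}
A_complement = OMEGA - A   # the complement of A relative to OMEGA

# Omega is the disjoint union of A and its complement:
assert P(A) + P(A_complement) == 1

# The complement rule:
assert P(A_complement) == 1 - P(A)
```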

Finally, we already know that \(P(A) \ge 0\) for any event \(A\text{.}\) The additional fact that \(P(A) \le 1\) arises from monotonicity, since \(A \subseteq \Omega\) and \(P(\Omega) = 1\text{.}\)

This section was quite abstract, only setting up the theoretical foundation for probability as a function that maps a set to a real number in the interval \([0,1]\text{.}\) In the sections that follow we will see probability theory in its myriad “real-world” applications.