
Section 10.4 Binomial distribution

We will end with a particular distribution that you are extremely well-equipped to understand: the binomial distribution. Its name should give you a clue as to the basic idea, but first let's talk about distributions.

A random variable's distribution is its support (remember: the values it can take) together with the probability of each of those values. Some distributions are not special, but others appear often enough and have suitable “real world” interpretations that they receive their own names. Some other discrete distributions you may encounter are the uniform distribution, the hypergeometric distribution, and the Poisson distribution.

Let's start with the simplest possible context: counting the heads in a sequence of a fixed number \(n\) of coin flips. Each flip can land in only one of two states, heads or tails. Each flip's result is independent of the others, and the probability that any given flip results in heads is \(p=0.5\).

Suppose \(n=5\) and let \(X\) count the number of heads. The support of \(X\) is the set \(\{0,1,2,3,4,5\}\text{.}\) Let's calculate the probabilities that \(X\) attains a couple of these values until we figure out the pattern.

Let \(X\) count the number of heads in a sequence of 5 flips of a fair coin.

In order for there to be 0 heads there must be 5 tails. The probability of this occurring, according to the multiplication principle, is

\begin{equation*} P[X=0] = 0.5^5 = 0.03125 \end{equation*}

because each flip is independent.

In order for there to be 1 head there must be 4 tails. You might say, okay, that's \(0.5\) (the probability of a head) times \(0.5^4\) (the probability of four tails), right? Well, here is a wrinkle: there are five different flips that could be the one that lands heads! In other words we may see the strings \(HTTTT\text{,}\) \(THTTT\text{,}\) \(TTHTT\text{,}\) \(TTTHT\text{,}\) or \(TTTTH\text{.}\)

Because these strings are mutually exclusive and exactly one of them must occur, we use the addition principle, so

\begin{equation*} P[X=1] = 5 \times 0.5 \times 0.5^4 = 0.15625. \end{equation*}

What about when \(X=2\text{?}\) Now we must count the number of ways that a sequence of 5 coin flips can have 2 heads. However, you just spent two chapters learning about binomial coefficients, so you may remember that this number is just \({{5}\choose{2}}\text{.}\) Therefore,

\begin{equation*} P[X=2] = {{5}\choose{2}} \times 0.5^2 \times 0.5^3 = 0.3125. \end{equation*}

From here we can figure out \(P[X=x]\) for the other values of \(X\text{.}\)

\begin{equation*} P[X=3] = {{5}\choose{3}} \times 0.5^3 \times 0.5^2 = 0.3125 \end{equation*}
\begin{equation*} P[X=4] = {{5}\choose{4}} \times 0.5^4 \times 0.5^1 = 0.15625 \end{equation*}
\begin{equation*} P[X=5] = {{5}\choose{5}} \times 0.5^5 \times 0.5^0 = 0.03125 \end{equation*}
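If you would like to verify this arithmetic yourself, here is a minimal Python sketch (standard library only, and the variable names are ours) that evaluates \({5\choose x}\, 0.5^x\, 0.5^{5-x}\) for every value in the support.

```python
from math import comb

n, p = 5, 0.5  # five flips of a fair coin

# P[X = x] = C(n, x) * p^x * (1 - p)^(n - x) for each x in the support
for x in range(n + 1):
    prob = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"P[X = {x}] = {prob}")
# prints 0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125
```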

So is the binomial distribution only good for coin flips? No, or else we'd get bored of it fairly quickly. The binomial distribution works whenever we have a sequence of a fixed number of independent “success/fail” trials, each with the same probability of success. Such trials are called Bernoulli trials.

Note that “success” and “failure” just refer to the two states each trial can end in; it is not true that one has to be better than the other. For example, heads isn't really better than tails. The “successes” are just what we are counting.

Definition 10.4.2.

Let \(X\) count the number of successes in a sequence of \(n\) independent binary trials that each have a probability of success \(p\text{.}\) Then we say \(X\) follows a binomial distribution, and write \(X \sim Bin(n,p)\text{.}\) The pmf of \(X\) is

\begin{equation*} f(x) = {{n}\choose{x}} p^x (1-p)^{n-x}. \end{equation*}

In the last section we wrote pmfs as tables, but for particularly nice discrete variables (like binomial ones) it is also possible to express them as a function, as in the definition above.

Notation like \(X \sim Bin(n,p)\) is common in probability theory to summarize the distribution of a random variable. You can read it aloud as “The random variable \(X\) follows a binomial distribution with parameters \(n\) and \(p\text{.}\)” Anyone else who knows probability knows that \(n\) is the number of trials and that \(p\) is each trial's probability of success, but if you are speaking to a novice, you might make that explicit.

Some authors write the probability of failure as \(q=1-p\) so that the pmf above is simpler to write and use.
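The pmf in the definition translates directly into code. The following is a minimal sketch, assuming nothing beyond the Python standard library; the name binom_pmf is our own for illustration, not a library routine.

```python
from math import comb

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x), the pmf of a Bin(n, p) variable."""
    q = 1 - p  # probability of failure, as in the remark above
    return comb(n, x) * p**x * q**(n - x)

# The coin-flip example, X ~ Bin(5, 0.5):
print(binom_pmf(2, 5, 0.5))  # 0.3125
```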

It is quick to check that the binomial pmf \(f(x)\) is always non-negative. The real work is to verify that the values of \(f(x)\text{,}\) summed over the support, equal 1. Fortunately we have already introduced the tool to do so: the binomial theorem!

Let \(X \sim Bin(n,p)\text{.}\) The support of \(X\) is \(\{0,1,2,\ldots,n\}\text{.}\) The sum of the pmf over the support is

\begin{align*} \sum_{x=0}^n f(x) \amp= \sum_{x=0}^n {{n}\choose{x}} p^x (1-p)^{n-x} \\ \amp= \big(p + (1-p)\big)^n \end{align*}

by the binomial theorem. Since \(p+(1-p)=1\) and \(1^n=1\text{,}\) the proof is complete.
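The binomial theorem does all of the work in this proof, but a quick numerical check can still be reassuring. The sketch below sums the pmf over the support for one arbitrary choice of parameters (our choice, purely for illustration).

```python
from math import comb, isclose

n, p = 27, 0.14  # any parameters would do
total = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))
print(total, isclose(total, 1.0))  # 1.0 (up to floating-point rounding), True
```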

A common use of a binomial variable is to count the number of participants in a random sample who have a particular trait. If the incidence of the trait in the population is known, then each person in a random sample has that probability of having the trait, and the people's results are independent because the sample is random.

Suppose that among likers of ice cream, \(14\%\) like chocolate ice cream the best. A random sample of \(27\) ice cream likers is taken. What is the probability that exactly \(4\) people in the sample like chocolate the best? How about exactly \(10\text{?}\)

Let \(Y\) be the number of people in the sample whose favorite flavor is chocolate. We are basically told that \(Y\sim Bin(27, 0.14)\text{.}\) Then,

\begin{equation*} P[Y=4] = {{27}\choose{4}} 0.14^4 0.86^{23} = 0.21. \end{equation*}

This probability is actually pretty high, since \(14\%\) of \(27\) is \(3.78\text{,}\) so \(4\) is close to the number of chocolate lovers we would expect to see. Notice how the probability decreases as we ask for a higher number of chocolate ice cream lovers:

\begin{equation*} P[Y=10] = {{27}\choose{10}} 0.14^{10} 0.86^{17} = 0.0019. \end{equation*}
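Here is a short sketch reproducing both probabilities; the helper name pmf is ours, used only for this example.

```python
from math import comb

n, p = 27, 0.14  # 27 ice cream likers, 14% prefer chocolate

def pmf(y):
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(round(pmf(4), 4))   # approximately 0.21
print(round(pmf(10), 4))  # approximately 0.0019
```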

The preceding example hints at the expected value of a binomial random variable. If \(14\%\) of a population has a trait and you look at \(27\) of those people, you would expect \(14\%\) of \(27\) to have the trait. So we are not surprised that in general the expected value of a binomial variable is \(np\text{.}\)

We will postpone the proof until after an example.

Suppose a particular computer part is known to be defective \(1.3\%\) of the time. Furthermore, the failure of one part on the assembly line does not suggest that the next part will fail; i.e., the parts are independent. In a batch of \(600\) parts, how many defective parts do we expect?

The number of defective parts is binomial: there are \(600\) independent trials, each resulting in either a defective or a functional part, with a fixed probability of \(0.013\) of being defective. Therefore, the average number of defective parts in a batch of \(600\) is

\begin{equation*} np = 600(0.013) = 7.8 \end{equation*}

parts.
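The value \(np\) is exact, but a small Monte Carlo sketch can make it concrete. The simulation below (the number of simulated batches is an arbitrary choice of ours) generates many batches of \(600\) parts and averages the number of defective ones.

```python
import random

n, p = 600, 0.013    # 600 parts, each defective independently with probability 0.013
batches = 5_000      # arbitrary number of simulated batches

# Average number of defective parts per simulated batch
average = sum(
    sum(random.random() < p for _ in range(n)) for _ in range(batches)
) / batches
print(average)       # typically close to n * p = 7.8
```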

The following proof is technical and, to be perfectly frank, you don't need to understand it to be able to understand binomial random variables. However, it is very cool. It involves two tricks: re-indexing a sum, which is a powerful tool of combinatorics to calculate sums indirectly; and immediately replacing any sum of a pmf over its support with 1, which is a common move in probability proofs. (Those of you who have had experience with calculus may be interested to know that the continuous analogue of this trick can be used to deal with some integrals.)

Read the following proof, but don't worry if you need to come back to it later.

Let \(X \sim Bin(n,p)\text{.}\) Then the expected value of \(X\) is

\begin{equation*} \sum_{x=0}^n xf(x) = \sum_{x=0}^n x \cdot {{n}\choose{x}} p^x (1-p)^{n-x} \end{equation*}

by definition.

Observe that when \(x=0\text{,}\) the entire term of the sum is 0. So we lose nothing by instead writing

\begin{equation*} \sum_{x=1}^n x \cdot {{n}\choose{x}} p^x (1-p)^{n-x}. \end{equation*}

The fact that we have a sum involving binomial coefficients suggests the binomial theorem, but the binomial theorem requires us to start at \(x=0\text{.}\) Also, that \(x\) in the sum is going to be a problem. Rewrite the sum as follows:

\begin{align*} \sum_{x=1}^n x \cdot {{n}\choose{x}} p^x (1-p)^{n-x} \amp= \sum_{x=1}^n x \cdot \dfrac{n!}{x!(n-x)!} p^x (1-p)^{n-x}\\ \amp= \sum_{x=1}^n x \cdot \dfrac{n!}{x(x-1)!(n-x)!} p^x (1-p)^{n-x} \\ \amp= \sum_{x=1}^n \dfrac{n!}{(x-1)!(n-x)!} p^x (1-p)^{n-x}. \end{align*}

We have gotten rid of the \(x\text{,}\) but now the factorial quotient is no longer a binomial coefficient because \((x-1)+(n-x)\) is not \(n\text{.}\) However, it is \(n-1\text{.}\) We can continue rewriting the sum to work with this.

\begin{align*} \sum_{x=1}^n \dfrac{n!}{(x-1)!(n-x)!} p^x (1-p)^{n-x} \amp= \sum_{x=1}^n \dfrac{n(n-1)!}{(x-1)!(n-x)!} p^x (1-p)^{n-x}\\ \amp= \sum_{x=1}^n n \dfrac{(n-1)!}{(x-1)!(n-x)!} p^x (1-p)^{n-x}\\ \amp= n \sum_{x=1}^n \dfrac{(n-1)!}{(x-1)!(n-x)!} p^x (1-p)^{n-x} \end{align*}

(because \(n\) is constant with respect to the sum). We are closer to a binomial expansion now, but we still need to start at \(x=0\text{.}\) Let's do this by replacing every occurrence of \(x\) in the sum with \(x+1\text{.}\) Then, \(x\) will run from \(0\) to \(n-1\text{.}\) When \(x=0\text{,}\) \(x+1=1\text{,}\) and when \(x=n-1\text{,}\) \(x+1=n\text{,}\) so the sum still runs over the same set of values. We are re-indexing the sum to make it easier to manipulate.

\begin{align*} n \sum_{x=1}^n \dfrac{(n-1)!}{(x-1)!(n-x)!} p^x (1-p)^{n-x} \amp= n \sum_{x=0}^{n-1} \dfrac{(n-1)!}{(x+1-1)!(n-(x+1))!} p^{x+1} (1-p)^{n-(x+1)}\\ \amp= n \sum_{x=0}^{n-1} \dfrac{(n-1)!}{x!(n-1-x)!} p^{x+1} (1-p)^{n-1-x}\\ \amp= n \sum_{x=0}^{n-1} {{n-1}\choose{x}} p^{x+1} (1-p)^{n-1-x} \end{align*}

Doing this put an extra \(p\) in the sum, but we can pull it through just like we did with the \(n\) a moment ago.

\begin{equation*} n \sum_{x=0}^{n-1} {{n-1}\choose{x}} p^{x+1} (1-p)^{n-1-x} = np\sum_{x=0}^{n-1} {{n-1}\choose{x}} p^x (1-p)^{n-1-x} \end{equation*}

Next, observe that the function

\begin{equation*} {{n-1}\choose{x}} p^x (1-p)^{n-1-x} \end{equation*}

is the pmf of a random variable that follows a \(Bin(n-1,p)\) distribution. No matter the parameters, the sum of a pmf over its support must equal 1! Therefore, we can skip calculating this sum by making this observation:

\begin{equation*} np\sum_{x=0}^{n-1} {{n-1}\choose{x}} p^x (1-p)^{n-1-x} = np \times 1 = np. \end{equation*}

This completes the proof.
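If you would like a numerical check on the result, the following sketch computes \(\sum_x x f(x)\) directly from the pmf and compares it with \(np\text{;}\) the function name is our own.

```python
from math import comb

def expected_value(n, p):
    """Sum x * f(x) over the support of a Bin(n, p) variable."""
    return sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

print(expected_value(5, 0.5))    # 2.5  == 5 * 0.5
print(expected_value(27, 0.14))  # 3.78 == 27 * 0.14 (up to floating-point rounding)
```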