Unordered samples

Section 9.5 Unordered samples

Now that we can count ordered samples we can expand these methods slightly to allow for the counting of ordered samples. Two techniques introduced in the previous section, shuffling the universe and using prototype objects, will help us here.

Example 9.5.1.

Let's go back to Alicia, Bob, Carlos, Derrick, and Eve, and this time choose three to be on a team together. The team has no special roles, so every position on the team is the same. How many teams can we make?

This is an unordered sample without replacement because every position on the team is identical, and because a person can't be on the team twice (that would make it a team of two). Now, once again let us play the game where we shuffle the entire universe of people and separate the chosen and unchosen people.

\begin{equation*} DCA|BE \end{equation*}

Above we have the sample where we choose Derrick, Carlos, and Alicia, but do not choose Bob and Eve. Observe that since neither the order of the students on the team nor the order of the students left behind are important, we could also shuffle within both groups:

\begin{equation*} ACD|EB \end{equation*}

is functionally the same sample.

So to count the total number of samples we must take the total number of ways to shuffle the universe and then divide by both the number of ways to shuffle the elements we choose and the elements we don't choose. In this case there are

\begin{equation*} \frac{5!}{3!2!} = \frac{5\times 4}{2} = 10 \end{equation*}

possible teams of three.

Once again we can generalize.

Theorem 9.5.2. Unordered samples without replacement.

The number of unordered samples of size \(r\) from a universe of size \(n\) is \(n!/r!(n-r)!\text{,}\) denoted

\begin{equation*} {{n}\choose{r}} \end{equation*}

(“\(n\) choose \(r\)”). This is sometimes called the number of \(r\)-combinations from a universe of \(n\) objects.

There are two useful prototypes for unordered samples without replacement. The first is a subset of \(r\) elements from a set of \(n\) elements. This arises immediately from the definition of a set: elements are unordered and appear at most once in a set.

The second is a bit string of \(n\) digits with exactly \(r\) ones. To see why, let us exploit the correspondence principle. Take a set of \(n\) objects. Then rename the objects in the set \(x_1, x_2, \ldots, x_n\text{.}\)

Take a subset of \(r\) elements from \(\{x_1, x_2,\ldots, x_n\}\text{.}\) We know there are \({{n}\choose{r}}=n!/r!(n-r)!\) subsets. For each such subset, we can write down a bit string of \(n\) characters with exactly \(r\) ones.

Example 9.5.3.

For example, take the subset \(\{x_1, x_4, x_6\}\) of the set \(\{x_1,x_2,x_3,x_4,x_5,x_6\}\text{.}\) This subset corresponds exactly to the string \(100101\text{.}\)

Meanwhile, the string \(001100\) corresponds to the subset \(\{x_3,x_4\}\text{.}\) So, we have one string for every subset and vice-versa.

Therefore, bit strings with a fixed number of ones are an equally serviceable prototype object, and they are a little more concrete than subsets.

Knowing now how to count unordered samples that do not allow for replacement or repetition, we turn our attention to counting the ones that do. This is the trickiest of the four configurations, so we begin with an example to direct our intuition to the right place.

Example 9.5.4.

Suppose you are going to visit a friend in another town for three days—say Thursday, Friday, and Saturday. Your friend has ten things they want to take you to do. How many ways can we allocate the ten activities into the three days?

This is a tricky example. First, observe that we are only interested in counting the distributions of activities into days, ie., how many activities are on each day. It doesn't matter which activity is done each day.

It's really like we are sampling ten times from the universe \(\{T,F,S\}\) with replacement, and since we don't care which activity is on which day, the sample is unordered. So, how many ways can you count the number of samples?

Perhaps you have heard the word “three days two nights,” or a similar phrase, used in the discussion of a vacation. If you stay somewhere three days, you only need to stay for two nights if you leave on the third day. Suppose we do two things the first day, five things the second day, and three things the last day. Such an arrangement can be diagrammed by the following bit string:

\begin{equation*} 001000001000 \end{equation*}

To be clear, we are letting the ones represent the “dividers” between the elements of the universe (the days), and the number of zeros between each one represents the number of activities on each day.

Therefore, the number of these unordered samples with replacement is the number of bit strings with \(3-1+10\) characters and \(3-1\) ones. We already know that this number is exactly

\begin{equation*} {{12}\choose{2}} = \frac{12!}{2!10!} = \frac{12\times 11}{2} = 66. \end{equation*}

Having identified that the unordered samples without replacement exist in correspondence with bit strings with a fixed number of ones, we are ready for a formula.

Theorem 9.5.5. Unordered samples with replacement.

The number of unordered samples of size \(r\) from a universe of size \(n\) is

\begin{equation*} {{n-1+r}\choose{r}} = \frac{(n-1+r)!}{r!(n-1)!} \end{equation*}

It may be confusing that we have gone from having \(n-1\) be the second number to having \(r\text{.}\) However, as we will see in the next chapter, counting the number of bit strings with a fixed number of ones is the same as counting the number of bit strings with a fixed number of zeros. It is just easier to say “choose \(r\)” than it is to say “choose \(n-1\text{.}\)”

The prototypical object for an unordered sample with replacement is something called a multiset. You can and should think of a multiset as a set where the elements can be repeated. However, there is really no such thing, because sets do not allow repetition, so we must be careful how we define a multiset.

Definition 9.5.6.

Let \(\Omega\) be a set of objects and fix an integer \(r\text{.}\) A multiset of cardinality \(r\) from \(\Omega\) is a function that assigns each element \(x\) of \(\Omega\) a multiplicity \(\alpha(x)\) so that the sum of all the multiplicities is \(r\text{.}\)

If all the multiplicities are just 1 or 0, then your multiset is just a set.

Example 9.5.7.

In the last example we considered the possibility where two activities were done Thursday, five Friday, and three Saturday. This example corresponds to the multiset

\begin{equation*} \{ T, T, F, F, F, F, F, S, S, S\} \end{equation*}

or more concisely, \(\{T^2, F^5, S^3\}\text{.}\) In this case \(\alpha(T)=2\text{,}\) \(\alpha(F)=5\text{,}\) and \(\alpha(S)=3\text{.}\)

In another example maybe four activities are done Thursday and six are done Friday, with none on Saturday. The corresponding multiset is \(\{T^4, F^6\}\text{,}\) which stands for the function that sets \(\alpha(T)=4\text{,}\) \(\alpha(F)=6\text{,}\) and \(\alpha(S)=0\text{.}\)

Example 9.5.8.

Multisets arise frequently (this will be a pun momentarily) in the context of probability and statistics. Suppose we have a class of 30 students. This class is a math class, so there are a limited number of majors. Let's say five: marine science (\(MS\)), math (\(MA\)), computer science (\(CS\)), engineering (\(EN\)), and information systems (\(IS\)). The instructor asks each student what their major is (for simplicity, each student has one major).

The distribution of majors is the list of possible majors together with the number of times, or frequency, with which each major appears in the list of responses. For a class of 30 students the number of possible distributions is

\begin{equation*} {{5-1+30}\choose{30}} = \frac{34!}{4!30!} = 46,376. \end{equation*}

Suppose that in this course 8 students are CS majors, 7 students are math majors, 5 students are marine science majors, 6 students major in engineering, and the rest major in information systems. This is one of the 46,376 distributions counted above. This particular distribution corresponds to the multiset

\begin{equation*} \{MS^5, MA^7, CS^8, EN^6, IS^4\} \end{equation*}

as well as the bit string

\begin{equation*} 0000010000000100000000100000010000. \end{equation*}

We end the chapter by summarizing the four formulas for counting the binary samples of various types. For a list of contexts for each type, see the beginning of the chapter. In the next chapter, we will explore some combinatorial identities that fill in some of the web of ideas connecting these formulas, particularly the ones for unordered samples.

Table 9.5.9.

	with replacement	without replacement
ordered	\(n^r\)	\(\dfrac{n!}{(n-r)!}\)
unordered	\(\dfrac{(n-1+r)!}{r!(n-1)!}\)	\(\dfrac{n!}{r!(n-r)!}\)