Random selections from a set

Question 1: Your good friend has a bag of forty candies. You ask if he will share some of them with you. He generously gives you ten, but you notice that there are no red cherry candies (your favorite flavor!) in the group he gave you. You sneak a look at what is left in the bag and see six cherry candies that he did not share with you. No fair!!!

Is this evidence that your friend deliberately kept the good candies for himself?

Question 2: A law school student is concerned about bias in jury selection processes. She notices that in a particular jury pool of 50 citizens, 15 of them are above the age of 65. However, when the jury selection process is completed, there are 12 members of the jury and only 2 are over the age of 65.

Is this evidence that there was bias in the process of selection? Or can it be explained by chance?

Both scenarios involve a population with $N$ members in total. As part of the experiment, we select $n$ members of the population to be in our focal group.

Among the members of the population, there is some condition of interest, and exactly $K$ members of the total population have it: for each of them, we would say YES, they have the condition.

In the end, we summarize the experiment by counting $X$, the number of members of the focal group who have the condition.

The letters $N$, $K$, and $n$ are called the parameters of the problem. They are considered to be fixed before the experiment is conducted. The letter $X$ represents the (random) outcome of the experiment. If we “re-ran” the scenario, the parameters would be the same, but the outcome might be different.

In the questions presented above, the parameters have these values:

Question 1: $N = 40, K = 6, n = 10$ with the experimental outcome $X = 0.$

Question 2: $N = 50, K = 15, n = 12$ with the experimental outcome $X = 2.$
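The idea of "re-running" the scenario can be sketched as a small simulation. This is only an illustration under the assumption that every group of $n$ members is equally likely to be selected; the function name `simulate_count`, the seed, and the number of runs are hypothetical choices, not part of the original notes.

```python
import random

def simulate_count(N, K, n, rng):
    """Select n of N members without replacement and count those with the condition."""
    # Label the members 0..N-1; by convention the first K have the condition.
    chosen = rng.sample(range(N), n)
    return sum(1 for member in chosen if member < K)

rng = random.Random(0)   # fixed seed so the sketch is reproducible
N, K, n = 40, 6, 10      # parameters of Question 1

# The parameters stay fixed; only the outcome X changes from run to run.
outcomes = [simulate_count(N, K, n, rng) for _ in range(10_000)]
frac_zero = outcomes.count(0) / len(outcomes)
print(f"fraction of runs with X = 0: {frac_zero:.3f}")
```

The printed fraction estimates how often a fair, random share of ten candies would contain no cherry candies at all.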

<aside> <img src="/icons/fleur-de-lis_purple.svg" alt="/icons/fleur-de-lis_purple.svg" width="40px" />

Definition (Hypergeometric Random Variable)

An integer-valued random variable $X$ has the hypergeometric distribution with parameters $N$, $K$, and $n$, written $X\sim \text{HyperGeom}(N,K,n)$, if

$$ P(X = x) = \frac{\displaystyle \binom{K}{x}\binom{N-K}{n-x}}{\displaystyle \binom{N}{n}}. $$

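For the formula to be valid, $x$, the number of selected individuals who belong to the target group, must be less than or equal to the number of selections: $0 \leq x \leq n$. Also, $x$ is bounded by the number of members of the target group: $0 \leq x \leq K$. Likewise, the $n - x$ selected individuals outside the target group cannot outnumber the $N - K$ such individuals in the population: $n - x \leq N - K$.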

</aside>
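As a minimal sketch, the formula in the definition can be evaluated directly with Python's `math.comb`. The helper name `hypergeom_pmf` is hypothetical. For Question 2, summing the probabilities of "2 or fewer" is one common way to measure how unusual the outcome is; that choice is an assumption here, not something the notes prescribe.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) for X ~ HyperGeom(N, K, n), following the definition above."""
    # math.comb raises on negative arguments, so rule out x < 0 and x > n first.
    if x < 0 or x > n:
        return 0.0
    # comb(a, b) is 0 when b > a, which handles x > K and n - x > N - K.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Question 1: probability that none of the 10 shared candies is cherry.
p_q1 = hypergeom_pmf(0, 40, 6, 10)

# Question 2: probability of 2 or fewer jurors over 65 (assumed measure of surprise).
p_q2 = sum(hypergeom_pmf(x, 50, 15, 12) for x in range(0, 3))

print(f"Question 1: P(X = 0)  = {p_q1:.4f}")
print(f"Question 2: P(X <= 2) = {p_q2:.4f}")
```

A small probability would suggest that the observed outcome is hard to explain by chance alone. A moderate one leaves chance as a plausible explanation.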

Here are some examples of probability mass functions (pmfs) of hypergeometric random variables with different choices of parameters.

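Figure: pmf of $\text{HyperGeom}(N=16, K=3, n=10)$ (HypergeomtricN16K3n10.png)

Figure: pmf of $\text{HyperGeom}(N=100, K=3, n=10)$ (HypergeometricN100K3n10.png)

Figure: pmf of $\text{HyperGeom}(N=100, K=30, n=10)$ (HypergeometricN100K30n10.png)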
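The three pmfs can also be tabulated as a rough text rendering. The parameter sets below are read off the image file names, which is an assumption, and `pmf_table` is a hypothetical helper. Only values of $x$ with nonzero probability are listed.

```python
from math import comb

def pmf_table(N, K, n):
    """Map each possible x to P(X = x) for X ~ HyperGeom(N, K, n)."""
    lo = max(0, n - (N - K))   # cannot select more non-target members than exist
    hi = min(n, K)             # cannot exceed the selections or the target group
    total = comb(N, n)
    return {x: comb(K, x) * comb(N - K, n - x) / total for x in range(lo, hi + 1)}

# Parameter sets assumed from the figure file names: (N, K, n).
param_sets = [(16, 3, 10), (100, 3, 10), (100, 30, 10)]
tables = {params: pmf_table(*params) for params in param_sets}

for (N, K, n), table in tables.items():
    print(f"HyperGeom(N={N}, K={K}, n={n})")
    for x, p in table.items():
        # Crude bar chart: one '#' per 2.5 percentage points of probability.
        print(f"  x={x:2d}  p={p:.4f}  {'#' * round(p * 40)}")
```

Comparing the three tables shows how the shape of the distribution shifts as the population size $N$ and the target-group size $K$ change while $n$ stays at 10.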