A Bernoulli trial is the most basic building block in probability theory. It is the mathematical representation of a YES/NO, TRUE/FALSE, or SUCCESS/FAILURE dichotomy.

Throughout these notes, we will reserve the letter $Y$ to indicate YES/NO outcomes. For YES, we will assign the numerical value 1, and for NO we will assign the numerical value 0. Given a value $p \in [0,1]$, we write

$$ Y \sim \mathrm{Bernoulli}(p) \, \text{ if } \,Y = \begin{cases} 1,& \text{with probability } p; \\ 0,& \text{with probability } 1-p. \end{cases} $$

Traditionally, the parameter $p$ is called the success probability of the trial.

When there is a collection of Bernoulli trials, we write the outcome of the $i$th trial as $Y_i$ and, in general, these might be assigned different success probabilities.
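A Bernoulli trial is straightforward to simulate with a uniform random number. The following sketch (the function name `bernoulli` and the chosen probabilities are illustrative) draws a collection of trials whose success probabilities differ:

```python
import random

def bernoulli(p, rng=random):
    """Draw one Bernoulli(p) outcome: 1 (YES) with probability p, else 0 (NO)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is reproducible

# Trials with differing success probabilities p_1, p_2, p_3
probs = [0.2, 0.5, 0.9]
outcomes = [bernoulli(p, rng) for p in probs]
print(outcomes)  # a list of three 0/1 values
```

Because `rng.random()` is uniform on $[0,1)$, the event `rng.random() < p` occurs with probability exactly $p$.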

<aside> <img src="/icons/fleur-de-lis_yellow.svg" alt="/icons/fleur-de-lis_yellow.svg" width="40px" />

Experiments comprised of Bernoulli trials

Let $n$ denote an experiment's sample size and, for each $i \in \{1, \dots, n\}$, let the success probabilities $\{p_i\}_{i=1}^n \subset [0,1]$ be given. We say that the set of random variables $\{Y_i\}_{i=1}^n$ is a Bernoulli experiment of size $n$ if each $Y_i$ is $\mathrm{Bernoulli}(p_i)$ and the set of random variables is mutually independent. We write

$$ \text{Bernoulli Experiment: } \quad Y_i \stackrel{indep.}{\sim} \mathrm{Bernoulli}(p_i) $$

If the success probability of each trial is the same value $p$, then we say the experiment is iid Bernoulli. We write

$$ \text{iid Bernoulli Experiment: } \quad Y_i \stackrel{iid}{\sim} \mathrm{Bernoulli}(p). $$

</aside>

We can take advantage of the numerical assignment of 0 and 1 to do things like count the number of successes in a group. In this way, we can write the number of YES outcomes among $n$ trials as a sum $X = \sum_{i=1}^n Y_i$ . This gives rise to one of the most fundamental distributions in all of probability theory, the Binomial distribution, which is the number of YES outcomes in an iid Bernoulli experiment.

<aside> <img src="/icons/fleur-de-lis_yellow.svg" alt="/icons/fleur-de-lis_yellow.svg" width="40px" />

Binomial Random Variables

Let $Y_i \stackrel{iid}{\sim} \mathrm{Bernoulli}(p)$ be an experiment of size $n$. Then the sum of these random variables is said to have a Binomial distribution with $n$ trials and success probability $p.$ Mathematically, this is expressed

$$ X \sim \mathrm{Binom}(n, p) \text{ if } X \coloneqq \sum_{i=1}^n Y_i. $$

</aside>
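This definition translates directly into a simulation: to draw a $\mathrm{Binom}(n,p)$ value, sum $n$ iid Bernoulli$(p)$ draws. A minimal sketch (the function name `binomial_draw` is illustrative):

```python
import random

def binomial_draw(n, p, rng=random):
    """Draw X ~ Binom(n, p) as the sum of n iid Bernoulli(p) trials."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

rng = random.Random(1)
x = binomial_draw(10, 0.5, rng)  # a single draw, somewhere in {0, ..., 10}

# Sanity check: the empirical mean over many draws should be close to n*p = 5
draws = [binomial_draw(10, 0.5, rng) for _ in range(10_000)]
mean = sum(draws) / len(draws)
```

Averaging many draws recovers the expected value $np$, which is one quick way to check a simulator against the theory.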

If we allow for the possibility of an “infinite-sized” experiment, we can define two distributions of “firsts”.

<aside> <img src="/icons/fleur-de-lis_yellow.svg" alt="/icons/fleur-de-lis_yellow.svg" width="40px" />

Random variables representing Bernoulli firsts

Let $Y_i \stackrel{iid}{\sim} \mathrm{Bernoulli}(p)$ be an experiment with an infinite number of trials.

Geometric Random Variables: We say that $X$ is geometrically distributed with success probability $p$ if it is the index of the first success in the sequence of trials, i.e.,

$$ X \sim \mathrm{Geom}(p), \text{ if } X \coloneqq \min\{i \in \mathbb{N} \, : \, Y_i = 1\}. $$

Negative Binomial Random Variables: We say that $X$ has a negative binomial distribution with size parameter $k$ and success probability $p$ if it is the index of the $k$th success in the sequence of trials, i.e.,

$$ X \sim \mathrm{NegBinom}(k,p) \text{ if } X = \min\Big\{n \, : \, \sum_{i=1}^n Y_i= k\Big\}. $$

</aside>
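Both "firsts" can be simulated by running the trials until the stopping condition is met. The sketch below (function names are illustrative) walks through the Bernoulli sequence and records the index of the first success and of the $k$th success:

```python
import random

def geometric_draw(p, rng=random):
    """X ~ Geom(p): index of the first success in an iid Bernoulli(p) sequence."""
    i = 1
    while rng.random() >= p:  # trial i failed; move to trial i+1
        i += 1
    return i

def neg_binomial_draw(k, p, rng=random):
    """X ~ NegBinom(k, p): index of the k-th success in the same sequence."""
    successes, n = 0, 0
    while successes < k:
        n += 1
        if rng.random() < p:
            successes += 1
    return n

rng = random.Random(2)
g = geometric_draw(0.3, rng)        # at least 1
nb = neg_binomial_draw(3, 0.3, rng)  # at least 3, since 3 successes need 3 trials
```

Note that with $k = 1$ the negative binomial stopping rule reduces to the geometric one, matching the definitions above.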

The probability of at least one

During the pandemic, there were periods of widespread quarantine, and then periods when we would come out of isolation and start to meet in groups again. At the height of the pandemic, we were asked to restrict ourselves to “pods” of 5-10 people, and not have close contact outside of that group. Then as the prevalence of the disease lessened, we could meet in groups of 20, and then 50, and then 100, and then 1000s.

This progression of green-lighting wider and wider gatherings had a fundamental probability question at root:

As a public health official, if I have a safety target $q \in (0,1)$ and the prevalence of a disease is $p \in (0,1)$, what is the largest crowd size I can allow so that the probability that at least one person in the crowd has the disease is less than $q$?

There are a few other ways to pose the same question. One is to ask: if 1 in 100 people have the disease ($p$), what is the largest crowd size I can permit so that I can be 95% certain ($1-q$) that no one in that crowd has the disease? Another is posed from a survey perspective: if the national disease prevalence is 1 in 100 ($p$), and I want to conduct a survey to decide whether or not the disease is in my community, how large of a sample size do I need so that I can be 95% ($1-q$) certain that the disease has not yet reached my small town?

The fundamental assumption driving the calculation is whether you can safely treat the individuals in your crowd as having independent probabilities of carrying the disease.
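Under that independence assumption, the probability that at least one of $n$ people carries the disease is $1-(1-p)^n$, so we want the largest $n$ with $1-(1-p)^n < q$, i.e. $n < \log(1-q)/\log(1-p)$. A sketch of the calculation (the function name `max_crowd_size` is illustrative):

```python
import math

def max_crowd_size(p, q):
    """Largest n with P(at least one case among n independent people) < q.

    Solves 1 - (1 - p)**n < q, i.e. n < log(1 - q) / log(1 - p).
    """
    n = math.floor(math.log(1 - q) / math.log(1 - p))
    # Back off if floating point (or an exact boundary) put us at or over q
    while n > 0 and 1 - (1 - p) ** n >= q:
        n -= 1
    return n

# With prevalence p = 1/100 and safety target q = 5%,
# n = 5 works: 1 - 0.99**5 ≈ 0.049 < 0.05, while 1 - 0.99**6 ≈ 0.059 > 0.05
n = max_crowd_size(0.01, 0.05)
```

Note that both logarithms are negative, so the ratio is positive; the division flips no inequality because $\log(1-p) < 0$.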

The birthday problem and sampling with and without replacement