Suppose a random variable $X$ has a probability density function $f$ that depends on some unknown parameter $\theta$. We write
$$ X \sim f(x \, ; \, \theta). $$
Some experiment is conducted, and our job is to provide our best estimate for the value of $\theta$, along with an expression for our uncertainty, based on our pre-existing, or prior, assumptions about $\theta$ and a probability model for the data we observe during the experiment.
Bayes' rule for continuous distributions then says the following:
$$ \phi(\theta \, | \, \mathbf{x}) = \frac{f(\mathbf{x} \, ; \, \theta) \phi(\theta)}{f_\phi(\mathbf{x})} $$
where $f_\phi(\mathbf{x}) = \int_\Theta f(\mathbf{x} \, ; \, t) \, \phi(t) \, \mathrm{d} t$ is the average likelihood of observing $\mathbf{x}$, weighted by the prior distribution $\phi$.
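When no closed form is available, the normalizing integral $f_\phi(\mathbf{x})$ can be approximated numerically. Here is a minimal Python sketch of that computation, assuming a Binomial likelihood with made-up data and an illustrative $\mathrm{Beta}(2,2)$ prior (none of these values come from the text):

```python
import numpy as np
from scipy import stats

# Hypothetical data: x successes in n Bernoulli trials, so f(x; p) is Binomial(n, p).
n, x = 20, 7

# Grid of candidate parameter values over Theta = (0, 1).
p_grid = np.linspace(1e-6, 1 - 1e-6, 2001)
dp = p_grid[1] - p_grid[0]

# Prior phi(p): an assumed Beta(2, 2) density evaluated on the grid.
prior = stats.beta(2, 2).pdf(p_grid)

# Likelihood f(x; p) for the observed x, viewed as a function of p.
likelihood = stats.binom(n, p_grid).pmf(x)

# f_phi(x): the average likelihood, approximated by a Riemann sum over Theta.
marginal = np.sum(likelihood * prior) * dp

# Bayes' rule: posterior density phi(p | x) on the grid.
posterior = likelihood * prior / marginal

# Sanity check: the posterior should integrate to (approximately) 1.
print(np.sum(posterior) * dp)
```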
For most of the commonly used probability models, there exists a specially chosen prior distribution such that the posterior distribution belongs to the same family of probability distributions as the prior; such a prior is called conjugate. For example, the Beta distribution is conjugate for the Binomial parameter $p$; the Gamma distribution is conjugate for the Poisson parameter $\mu$; and the normal distribution is conjugate to itself for the parameter $\mu$ when $\sigma$ is assumed to be known.
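As a quick check of the second pairing above, the sketch below updates a Gamma prior on the Poisson mean $\mu$ with some made-up counts; the conjugate posterior $\mathrm{Gamma}\big(a_0 + \sum_i x_i, \; b_0 + n\big)$ (shape and rate) is compared against a brute-force grid computation. The counts and prior parameters are illustrative assumptions only:

```python
import numpy as np
from scipy import stats

# Hypothetical Poisson counts (illustrative only).
counts = np.array([3, 5, 2, 4, 6])
n, total = len(counts), counts.sum()

# Assumed Gamma(a0, rate=b0) prior on the Poisson mean mu.
a0, b0 = 2.0, 1.0

# Conjugate update: posterior is Gamma(a0 + sum(x), rate = b0 + n).
a_post, b_post = a0 + total, b0 + n
posterior = stats.gamma(a_post, scale=1 / b_post)

# Brute-force check on a grid: likelihood times prior, then normalize.
mu = np.linspace(1e-6, 15, 4000)
unnorm = stats.poisson(mu[:, None]).pmf(counts).prod(axis=1) \
         * stats.gamma(a0, scale=1 / b0).pdf(mu)
grid_post = unnorm / (unnorm.sum() * (mu[1] - mu[0]))

# The two densities should agree closely.
print(np.max(np.abs(grid_post - posterior.pdf(mu))))  # small
```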
<aside> <img src="/icons/fleur-de-lis_purple.svg" alt="/icons/fleur-de-lis_purple.svg" width="40px" />
Suppose that $X$ is the outcome of a Binomial experiment with $n$ trials and an unknown YES probability $p$.
If we express our prior uncertainty for $p$ in terms of the $\mathrm{Beta}(a_0,b_0)$ distribution, then the posterior distribution for $p$ is $\mathrm{Beta}\big(a_0 + x, b_0 + (n-x)\big)$.
In particular, the Bayes estimator for $p$ is
$$ \hat{p} = \frac{a_0 + x}{a_0 + b_0 +n}. $$
The standard deviation of the posterior distribution is then
$$ \sigma_{\mathrm{post}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{a_0 + b_0 + n + 1}}. $$
When $a_0 = 0$ and $b_0 = 0$, we say that we are using a non-informative prior; in that case the Bayes estimator reduces to the sample proportion $\hat{p} = x/n$. A short numerical check of these formulas follows this callout.
</aside>
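The formulas in the callout are easy to check numerically. Below is a small sketch, with illustrative values of $a_0$, $b_0$, $n$, and $x$ (not taken from the text), that forms the conjugate posterior and compares the Bayes estimator and posterior standard deviation against the moments reported by `scipy.stats.beta`:

```python
import numpy as np
from scipy import stats

# Illustrative prior and data (assumptions, not from the text).
a0, b0 = 1.0, 1.0   # Beta(a0, b0) prior on p
n, x = 50, 17       # n Binomial trials, x successes

# Conjugate update: posterior is Beta(a0 + x, b0 + (n - x)).
a_post, b_post = a0 + x, b0 + (n - x)
posterior = stats.beta(a_post, b_post)

# Bayes estimator (posterior mean) and posterior standard deviation.
p_hat = (a0 + x) / (a0 + b0 + n)
sigma_post = np.sqrt(p_hat * (1 - p_hat) / (a0 + b0 + n + 1))

# Both should match the moments of the Beta posterior exactly.
print(p_hat, posterior.mean())
print(sigma_post, posterior.std())
```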
Proof. Since the prior for $p$ is a $\mathrm{Beta}\big(a_0, b_0\big)$ distribution, its pdf is
$$ \phi(p) = C_0 \, p^{a_0 - 1}(1-p)^{b_0 - 1} $$
where $C_0$ is a constant that depends only on $a_0$ and $b_0$ (and specifically not on $p$).
Bayes' theorem for the posterior distribution then implies
$$ \begin{aligned} \phi(p \, | \, x) &= \frac{\binom{n}{x} p^x (1-p)^{n-x} C_0 p^{a_0 - 1} (1-p)^{b_0-1}}{f_\phi(x)} \\ &= \frac{C_0 \binom{n}{x}}{f_\phi(x)} \, p^{a_0 + x - 1} (1-p)^{b_0 + (n - x)- 1} \\ &= C_\mathrm{post} \, p^{a_0 + x - 1} (1-p)^{b_0 + (n - x)- 1}. \end{aligned} $$
The key thing to notice is that the terms $C_0$, $\binom{n}{x}$, and $f_\phi(x)$ do not depend on $p$, so they can be gathered together into one constant $C_\mathrm{post}$. We might worry that $C_\mathrm{post}$ is hard to compute, but recall that $\phi(p \, | \, x)$ is a probability density, so it must integrate to 1. Since the terms involving $p$ have exactly the form of a $\mathrm{Beta}\big(a_0 + x, b_0 + (n-x)\big)$ density, $C_\mathrm{post}$ must be the normalizing constant of that distribution, which proves the claim. $\square$
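To make the last step concrete: integrating the kernel $p^{a_0+x-1}(1-p)^{b_0+(n-x)-1}$ over $(0,1)$ gives the Beta function $B\big(a_0+x, \, b_0+(n-x)\big)$, so $C_\mathrm{post} = 1/B\big(a_0+x, \, b_0+(n-x)\big)$. A quick numerical confirmation, using arbitrary illustrative values, is:

```python
from scipy.special import beta as beta_fn
from scipy.integrate import quad

# Illustrative values (assumptions, not from the text).
a0, b0, n, x = 2.0, 3.0, 30, 11

# Unnormalized posterior kernel p^(a0 + x - 1) * (1 - p)^(b0 + (n - x) - 1).
kernel = lambda p: p ** (a0 + x - 1) * (1 - p) ** (b0 + (n - x) - 1)

# Its integral over (0, 1) is the Beta function B(a0 + x, b0 + (n - x)),
# so C_post = 1 / B(a0 + x, b0 + (n - x)).
integral, _ = quad(kernel, 0, 1)
print(integral, beta_fn(a0 + x, b0 + (n - x)))  # should agree
```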
Suppose that I have purchased a bag of Candio candies that contains 100 candies. Of these, 23 are strawberry-flavored. Assuming a non-informative prior, provide a $2\sigma$ posterior credible region for the true proportion of Candios that are strawberry-flavored.
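One way to work this exercise, sketched below using the formulas from the callout: with $a_0 = b_0 = 0$ the posterior is $\mathrm{Beta}(23, 77)$, and the $2\sigma$ region is $\hat{p} \pm 2\sigma_{\mathrm{post}}$.

```python
import numpy as np

# Data: 23 strawberry candies out of n = 100 trials, non-informative prior a0 = b0 = 0.
a0, b0, n, x = 0, 0, 100, 23

# Bayes estimator and posterior standard deviation from the callout.
p_hat = (a0 + x) / (a0 + b0 + n)
sigma_post = np.sqrt(p_hat * (1 - p_hat) / (a0 + b0 + n + 1))

# 2-sigma posterior credible region: p_hat +/- 2 * sigma_post.
lower, upper = p_hat - 2 * sigma_post, p_hat + 2 * sigma_post
print(f"{p_hat:.3f} +/- {2 * sigma_post:.3f}  ->  ({lower:.3f}, {upper:.3f})")
# roughly 0.230 +/- 0.084  ->  (0.146, 0.314)
```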