Updating your opinion

The Bayesian method consists of three steps:

  1. quantifying your prior assumptions about a parameter;
  2. establishing a probability model for an experiment and its outcome;
  3. using Bayes’ Rule to quantitatively update your opinion to a posterior understanding.

$$ \mathrm{Prior} \times \mathrm{Likelihood} \rightarrow \mathrm{Posterior} $$

In the general setting there is a parameter $\theta$ that we want to learn about from an experiment. We have a prior assumption about it that can be quantified by the function $\mathrm{Prior}(\theta)$. The function $P(\mathrm{Data} \, | \, \theta)$ is read “the likelihood of observing the data given that the parameter value is $\theta$.” The function $P(\theta \, | \, \mathrm{Data})$ is read “the likelihood that the parameter’s true value is $\theta$ given the observed data.” These are related through the general formula

$$ P(\theta \, | \, \mathrm{Data}) = \frac{P(\mathrm{Data} \, | \, \theta) \, \mathrm{Prior}(\theta)}{P_\mathrm{prior}(\mathrm{Data})} $$

where $P_\mathrm{prior}(\mathrm{Data})$ is “the likelihood of observing the data, averaged over the prior distribution of $\theta$.”
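The update can be sketched numerically in the simplest case, where $\theta$ takes only finitely many candidate values. This is a minimal illustration, not part of the text: the three candidate values, the uniform prior, and the data (7 heads in 10 coin flips) are all invented for the example.

```python
from math import comb

# Invented setup: theta is the heads-probability of a coin, assumed to be
# one of three candidate values, with a uniform prior over them.
thetas = [0.25, 0.50, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]

# Suppose the experiment yields 7 heads in 10 flips.
heads, flips = 7, 10

# P(Data | theta): binomial likelihood of the observed data at each theta.
likelihood = [comb(flips, heads) * t**heads * (1 - t)**(flips - heads)
              for t in thetas]

# P_prior(Data): likelihood of the data weighted by the prior.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' Rule: Posterior = Prior x Likelihood / P_prior(Data).
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
print(posterior)
```

The posterior sums to one and shifts weight toward $\theta = 0.75$, the candidate closest to the observed proportion of heads.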

Bayesian parameter estimation

The general perspective

Suppose a random variable $X$ has a probability density function $f$ that depends on some unknown parameter $\theta$. We write

$$ X \sim f(x \, ; \, \theta). $$

Some experiment is conducted, and our job is to provide our best estimate of the value of $\theta$, along with an expression for our uncertainty, based on two ingredients: our pre-existing, or prior, assumptions about $\theta$, and a probability model for the data we observe during the experiment.

Suppose our prior assumptions are encoded in a density $\phi(\theta)$ and the experiment produces data $\mathbf{x}$. Bayes’ Rule for continuous distributions then says the following:

$$ \phi(\theta \, | \, \mathbf{x}) = \frac{f(\mathbf{x} \, ; \, \theta) \phi(\theta)}{f_\phi(\mathbf{x})} $$

where $f_\phi(\mathbf{x}) = \int_\Theta f(\mathbf{x} \, ; \, t) \, \phi(t) \, \mathrm{d} t$ is the average likelihood of observing $\mathbf{x}$ weighted by the prior distribution $\phi$.
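When the integral $f_\phi(\mathbf{x})$ has no closed form, the posterior density can be approximated on a grid over $\Theta$. The following is a hedged sketch under invented assumptions: a Bernoulli likelihood (7 heads in 10 flips) with a uniform prior $\phi(\theta) = 1$ on $[0, 1]$, with the integral replaced by a Riemann sum.

```python
import numpy as np

# Grid over the parameter space Theta = [0, 1].
theta = np.linspace(0, 1, 1001)
dtheta = theta[1] - theta[0]

# Uniform prior density phi(theta) = 1 on [0, 1] (an illustrative choice).
phi = np.ones_like(theta)

# Invented data: 7 heads in 10 flips; f(x ; theta) up to a constant factor
# that cancels in Bayes' Rule.
heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

# f_phi(x) = integral of f(x ; t) phi(t) dt, approximated by a Riemann sum.
evidence = np.sum(likelihood * phi) * dtheta

# Bayes' Rule for densities: phi(theta | x) = f(x ; theta) phi(theta) / f_phi(x).
posterior = likelihood * phi / evidence
print(theta[np.argmax(posterior)])
```

The resulting posterior integrates to one and peaks at the sample proportion $7/10$, as expected for a flat prior.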

Conjugate priors