Sampling from a distribution

TL;DR You can generate a sample from a probability density function by applying the inverse of its cumulative distribution function to samples from a continuous uniform distribution.

Have you ever wondered how to take a sample from a probability density function? Some libraries like numpy provide functions for sampling from well-known distributions like gaussian, beta or poisson. Alternatively, different approaches exist to sample from specific distributions. For example, the Box-Muller transform can be used to retrieve samples from a gaussian. But how does sampling work for more generic distributions?
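As a taste, here is a minimal sketch of the basic Box-Muller transform in Python, using only the standard library (the function name is mine); it turns two uniform samples into one gaussian sample:

```python
import math
import random

def box_muller():
    """One sample from a standard gaussian via the basic Box-Muller transform."""
    u1 = random.random()
    while u1 == 0.0:          # avoid log(0); random.random() may return 0.0
        u1 = random.random()
    u2 = random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
```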

Because sampling ultimately depends on some source of randomness, we assume that we have access to a method for sampling from a continuous uniform distribution \(Y \sim \text{uniform}(0, 1)\). Most programming languages offer a pseudo-random number generator for this purpose (e.g. C++, JavaScript, Python).
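In Python, for example, this could look as follows:

```python
import random

y = random.random()  # pseudo-random float in the half-open interval [0.0, 1.0)
```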

Please note that this post is only meant to give an intuition about sampling from a generic probability density function and is not meant as a real-world implementation—especially not in a cryptographic context.

Probability mass function

Let’s start by examining a simple random experiment like tossing a coin or rolling a die. Take the die: there are six equally likely outcomes, each with a probability of \(P(X=x_i) = \frac{1}{6}\). We can imagine stacking the probabilities of each outcome on top of each other to receive a line of length one. Each segment of this line represents a different outcome and each outcome occupies an area equal to its probability.

If we drop a ball somewhere on this line (such that each place is equally likely), it lands on the segment belonging to an outcome \(x_i \in \{\text{one}, \text{two}, \text{three}, \text{four}, \text{five}, \text{six}\}\) with a probability of \(\frac{1}{6}\).

This analogy provides us with the means of expressing the sampling of a generic, discrete distribution (e.g. rolling a die) in terms of sampling from a uniform distribution (i.e. dropping a ball somewhere on the line such that each point is equally likely).
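A minimal sketch of this idea in Python (the function name and argument layout are mine):

```python
import random

def sample_discrete(outcomes, probabilities):
    """Drop a 'ball' on the unit line and walk along the stacked
    probability segments until we find the one it landed on."""
    u = random.random()           # position of the ball in [0, 1)
    cumulative = 0.0
    for outcome, p in zip(outcomes, probabilities):
        cumulative += p
        if u < cumulative:
            return outcome
    return outcomes[-1]           # guard against floating-point round-off

# Rolling a fair die:
print(sample_discrete([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```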

Probability density function

For a continuous random variable a probability density function is used to describe the distribution. This function does not assign probabilities to specific outcomes (since there are infinitely many and each has a probability of zero), but to a range of outcomes. It can be used to answer the question: What is the probability that the outcome of the random variable \(X\) falls within the interval \([a,b]\)? This information is ‘encoded’ in the area under the curve: $$P(a \leq X \leq b) = \int_a^b f_X(t) dt $$
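To make this concrete, here is a small sketch that approximates the integral numerically for a standard gaussian density (the choice of density and the helper names are mine):

```python
import math

def f(t):
    # density of a standard gaussian, used here as an example f_X
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def probability(a, b, n=10_000):
    """Approximate P(a <= X <= b) with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(probability(-1.0, 1.0))  # ≈ 0.6827 for the standard gaussian
```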

We can transform the continuous distribution into a discrete distribution by dividing it into two sections: $$\underbrace{P(-5 \leq X \leq -1)}_{P(A)} = \underbrace{P(1 \leq X \leq 5)}_{P(B)} = \frac{1}{2}$$

Analogous to sampling from a discrete random variable, we can now ‘stack’ the probabilities of the events A and B, observe into which section a sample from the uniform distribution falls and finally return the respective event. Since our events cover large areas of the domain, this is not sufficient to sample from the original distribution. However, by narrowing the intervals, we receive finer sampling results:
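As a rough sketch of this narrowing in Python (the binning scheme, the names, and the assumption that the density lives mostly on \([lo, hi]\) are mine):

```python
import random

def sample_binned(density, lo, hi, n_bins=1000):
    """Approximate a continuous density by many narrow intervals,
    stack their probability masses and return the interval a
    uniform sample falls into."""
    width = (hi - lo) / n_bins
    # mass of each narrow interval, approximated by the midpoint rule
    masses = [density(lo + (i + 0.5) * width) * width for i in range(n_bins)]
    u = random.random() * sum(masses)  # rescale in case the tails are cut off
    cumulative = 0.0
    for i, mass in enumerate(masses):
        cumulative += mass
        if u < cumulative:
            return lo + (i + 0.5) * width
    return hi
```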

For narrower and narrower intervals this can be expressed with the cumulative distribution function: $$F_X(x) = P(X \leq x) = \int_{-\infty}^x f_X(t) dt$$

In the cumulative distribution function our ‘line’ analogy corresponds to picking a value on the y-axis and mapping it to its respective x value. For this function a flat section indicates an unlikely event and a steep section (one that covers much of the y-range) indicates a likely event. For example, the intervals \([-4, -2]\) and \([2, 4]\) are likely to occur, while an outcome in the interval \([-1, 1]\) is impossible.

This intuition leads to the inverse transform sampling method and can be formalized as: $$ \begin{aligned} Y \sim \text{uniform}(0,1) && X = F^{-1}_X(Y) \end{aligned} $$
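For distributions whose cumulative distribution function can be inverted in closed form, this translates directly into code. A sketch for the exponential distribution (the function name is mine), whose cdf \(F(x) = 1 - e^{-\lambda x}\) inverts to \(F^{-1}(y) = -\ln(1 - y)/\lambda\):

```python
import math
import random

def sample_exponential(lam):
    """Inverse transform sampling for an exponential distribution."""
    y = random.random()                # Y ~ uniform(0, 1)
    return -math.log(1.0 - y) / lam    # X = F^{-1}(Y)
```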

Although this approach seems compelling, we still need to prove that uniform sampling from the inverse of the cumulative distribution function is indeed equal (and not just close) to sampling from the original distribution. For this, we define a (generic) random variable \(Y = F_X(X)\) which has a cumulative distribution function \(F_Y(y) = P(Y \leq y)\): $$ \begin{aligned} F_Y(y) &= P(Y \leq y) & \text{definition of cdf}\\ &= P(F_X(X) \leq y) & \text{definition of $Y$}\\ &= P(X \leq F^{-1}_X(y)) & \text{apply inverse}\\ &= F_X(F^{-1}_X(y)) & \text{definition of cdf}\\ &= y & \text{resolve identity} \end{aligned} $$ Note that applying the inverse in the third step requires \(F_X\) to be strictly increasing, so that \(F^{-1}_X\) exists and the inequality is preserved.

The cumulative distribution function \(F_Y(y) = y\) corresponds to a continuous uniform distribution. This insight implies two things: First, if we take values from our random variable \(X\) and plug them into its cumulative distribution function \(F_X(X)\), the output is uniformly distributed. And vice versa, we can apply the inverse of the cumulative distribution function to values from a uniform distribution, \(F^{-1}_X(Y)\), to generate samples for our random variable \(X\).
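As a quick, informal sanity check of the first implication, we can draw samples from a known distribution and push them through its cdf; the results should look uniform. A sketch using a standard gaussian (the helper name is mine):

```python
import math
import random

def normal_cdf(x):
    # cdf of a standard gaussian, expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

ys = [normal_cdf(random.gauss(0.0, 1.0)) for _ in range(100_000)]

# For a uniform(0, 1) distribution we expect mean ≈ 1/2 and variance ≈ 1/12.
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(round(mean, 3), round(var, 4))
```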