Sampling from Probability Distribution

Subscribe Send me a message home page tags


#math  #simulation  #probability 

In this post, we will talk about how to generate samples from a given probability distribution. The building block is the sampling from the standard uniform distribution. From there, we can use Box-Muller Transform to generate samples that follow a normal distribution. We will also provide a brief description of the rejection sampling, which transforms sampling from an easy-to-generate distribution to a sampling from the target distribution.

Uniform Distribution

Linear Congruential Generator

This is one of the most commonly used algorithm to generate a sequence of pseudo-randomized numbers.

$$ X_{n+1} = (aX_{n} + c) \;\;\; mod \; m $$

Fibonacci Method

$$ X_{n} = (X_{n-1} + X_{n-2}) \;\;\; mod \; m $$

Von Neumann Method

To use the Von Neumann method, we first specify the number of digits and an initial value. To generate the next number, we calculate the square of the current number, adding zeros to the left if necessary to make it have doubled number of digits and the use the middle part as the next number.

For example, if the number of digits is 2 and the initial value is 11 (e.g. \(X_0 = 11 \)). To calcualte \(X_1\), we calculate the square of 11, which gives 121. In this case, we need to add a zero to the left to make it have 4 digits. Now we have 0121 and we select the middle part 12 as the value for \(X_1\).

This method has many drawbacks but it's one used by John Von Neumann in 1946 when he worked with ENIAC.

Normal Distribution

There are two ways to generate samples from Normal distribution.

Use Central Limit Theorem

We could rely on Central Limit Theorem: generate samples from a known distribution and calculate the average. For example, we could generate a \(n\) iid samples from uniform distribution and the average can be calculated as follows:

$$ \sqrt{\frac{12}{n}}\left( \sum^{n}_{j=1} u_j - \frac{n}{2}\right) $$

Use Box-Muller Transform

Suppose \(U\) and \(V\) are uniform distribution and we have

$$ X = \sqrt{-2log U} sin(2\pi V) \\ Y = \sqrt{-2log U} cos(2\pi V) $$

Then \(X\) and \(Y\) are independent normal distributions.

We can find a proof here. The main steps are as follows:

The last step involves a change of variables.

Cumulative Density Function

Let \(F(x)\) be the cumulative density function. That is

$$ F(x) = P(X \leq x) $$

Suppose \(U\) is a uniform distribution on \([0, 1]\). We observe that \(X\) and \(F^{-1}(U)\) have the same cumulative density function

$$ \begin{eqnarray} P(F^{-1}(U) \leq x) & = & P(U \leq F(x)) \\ & = & F(x) \end{eqnarray} $$

The first equation is due to the fact that \(F\) is an increasing function. The second equation holds because \(U\) is a uniform distribution.

Now suppose we want to generate sample from a given distribution. We can follow the steps below:

This method involves the calculation of an inverse function so it might be computationally expensive unless we have a closed formula for the inverse function.

Rejection Sampling

Suppose we want to generate samples with probability density function \(f\). We further assume there is an easy-to-sample probability density \(g\) such that

$$ \exists \epsilon, \; h(x) = \epsilon\frac{f(x)}{g(x)} \leq 1 $$

The process of rejection sampling is as follows:

The accepted samples from the above process follow the distribution with probability density \(f\).

----- END -----

If you have questions about this post, you could find me on Discord.
Send me a message Subscribe to blog updates

Want some fun stuff?

/static/shopping_demo.png