In this post, we will talk about how to generate samples from a given probability distribution. The building block is the sampling from the standard uniform distribution. From there, we can use Box-Muller Transform to generate samples that follow a normal distribution. We will also provide a brief description of the rejection sampling, which transforms sampling from an easy-to-generate distribution to a sampling from the target distribution.

### Uniform Distribution

#### Linear Congruential Generator

This is one of the most commonly used algorithm to generate a sequence of pseudo-randomized numbers.

#### Fibonacci Method

#### Von Neumann Method

To use the Von Neumann method, we first specify the number of digits and an initial value. To generate the next number, we calculate the square of the current number, adding zeros to the left if necessary to make it have doubled number of digits and the use the middle part as the next number.

For example, if the number of digits is 2 and the initial value is 11 (e.g. \(X_0 = 11 \)). To calcualte \(X_1\), we calculate the square of 11, which gives 121. In this case, we need to add a zero to the left to make it have 4 digits. Now we have 0121 and we select the middle part 12 as the value for \(X_1\).

This method has many drawbacks but it's one used by John Von Neumann in 1946 when he worked with ENIAC.

### Normal Distribution

There are two ways to generate samples from Normal distribution.

#### Use Central Limit Theorem

We could rely on Central Limit Theorem: generate samples from a known distribution and calculate the average. For example, we could generate a \(n\) iid samples from uniform distribution and the average can be calculated as follows:

#### Use Box-Muller Transform

Suppose \(U\) and \(V\) are uniform distribution and we have

Then \(X\) and \(Y\) are independent normal distributions.

We can find a proof here. The main steps are as follows:

- Let \(R = \sqrt{-log U}\) and \(\Theta = 2\pi V\).
- Calculate the distribution of \(R\) and \(\Theta\).
- Think of \(X\) and \(Y\) as function of \(R\) and \(\Theta\) and calculate \(P(x_1 \leq X \leq x_2, y_1 \leq Y \leq y2)\) in coordinate \(xy\).

The last step involves a change of variables.

### Cumulative Density Function

Let \(F(x)\) be the cumulative density function. That is

Suppose \(U\) is a uniform distribution on \([0, 1]\). We observe that \(X\) and \(F^{-1}(U)\) have the same cumulative density function

The first equation is due to the fact that \(F\) is an increasing function. The second equation holds because \(U\) is a uniform distribution.

Now suppose we want to generate sample from a given distribution. We can follow the steps below:

- Calculate the cumulative density function \(F\).
- Generate a sample \(u\) from a standard uniform distribution.
- Calculate \(F^{-1}(u)\), which has the given distribution.

This method involves the calculation of an inverse function so it might be computationally expensive unless we have a closed formula for the inverse function.

### Rejection Sampling

Suppose we want to generate samples with probability density function \(f\). We further assume there is an easy-to-sample probability density \(g\) such that

The process of rejection sampling is as follows:

- Generate a sample \(x\) with probability density \(g\).
- Calculate \(h(x)\).
- Generate a sample \(u\) from an independent uniform distribution.
- If \(u \leq h(x) \) then accept \(x\) as a sample; otherwise reject \(x\) and repeat the process.

The accepted samples from the above process follow the distribution with probability density \(f\).

----- END -----

©2019 - 2022 all rights reserved