Week10

Week10#

Random Number Generators#

Exercise 1) Linear Congrugential Generators#

In the lecture notes of L17, we showed several examples of LCGs. You will now be tasked with creating a more general framework for creating and using LCGs in an object-oriented manner.

Recall that an LCG generates a new pseudorandom number according to the formula

\[ X_{n+1} = a\cdot X_n + c \mod m.\]

1a) Defining the class#

(In Python) Define a class LinearCongrugentialGenerator. The class should have a constructor that takes the seed as input, and sets the state of the pRNG to be this seed.

The constructor should also take the parameters $a$, $c$ and $m$ as keyword arguments and store these in the class. As the default values, choose one of the LCG commonly used parameters found in this table on wikipedia. You can for example choose the one used by glibc, i.e., the gcc compiler.

1b) Adding a call method#

Create a __call__ special method that advances the state of the pRNG by one number, and returns this new number.

Test your method by implementing an instance of your pRNG and producing ten numbers.

1c) Random floats#

Add a new method called rand() that should be called with no input. The function should return a random floating point number on the interval $[0, 1)$.

Hint: To do this, the method should call on the __call__ special method to advance the state and produce a random integer. This integer then needs to be scaled to a proper float by dividing it by some value.

1d) Uniform#

Add a new method called uniform(a, b) that takes two numbers in: $a$ and $b$ and returns a uniformly distributed floating point number on the interval $[a, b)$.

Hint, you can first call rand() to get a number on the range $[0, 1)$, and then scale this number by multiplying and shifting it.

1e) Randint#

Now add a method called randint(a, b) (short for random integer). That should return a uniformly distributed integer on the interval $[a, b]$.

Test your function by throwing 1000 dice, and computing the average result, which should be close to 3.5.

1f) Normally Distributed Numbers#

Now add a method normal() that should return a normally distributed number with standard deviation 1 and a mean of 0.

To compute a normally distributed number, use the Box-Muller metho, as described in the lecture notes. To do this, you need two uniformly distributed numbers on the interval $[0, 1)$: $U_1$ and $U_2$. Then you can compute the two independent normally distributed numbers based on the formulas:

\[\begin{split}\begin{align} Z_1 = \sqrt{-2 \ln U_1}\cos (2\pi U_2), \\ Z_2 = \sqrt{-2 \ln U_1}\sin (2\pi U_2). \end{align}\end{split}\]

Note that to compute and return one normally distributed number, you can simply draw two numbers with rand(), and compute $Z_1$, and ignore $Z_2$.

However, slightly more efficient, is to compute both values, and then use the other number the next time normal() is called. See if you can find a nice way to do this, and if not, simply implement the simpler solution of only computing $Z_1$.

Exercise 2#

2a) Implementing RANDU#

Use your LinearCongrugentialGenerator class to implement a RANDU pRNG. RANDU is a LCG with $a = 65539$, $c=0$ and $m=2^{31}.$$

2b) - Checking the mean of the generated numbers#

As we have discussed in the lectures. For a pRNG to produce numbers that “look” random, they have to reproduce certain statistical properties. One of these is the mean. If we are drawing numbers on the interval $[0, 1)$, then the average over a large number of random numbers, i.e., the sample mean, should tend to exactly 0.5.

Generate $n=10^6$ samples from your RANDU class and find the sample mean. Does RANDU reproduce the expected mean?

2c) Checking the sample variance#

The random numbers should, in addition to respecting the mean, reproduce the variance of the distribution. For random numbers drawn from a uniform distribution between 0 and 1, this variance should be $1/12$.

Using your $n=10^6$ generated samples, check that the sample variance is reasonable.

2d) Moments of the pdf (Only for those who have taken or are taking STK1100)#

We are attempting to draw random numbers from the probability density function

\[\begin{split}f(x) = \begin{cases} 1 & \mbox{ if } 0 \leq x < 1, \\ 0 & \mbox{else}. \end{cases}\end{split}\]

For any pdf, the moments of the sample are defined as: $$E(x^k) = \int_{-\infty}^\infty x^k \cdot f(x) \ {\rm d}x.$$

Show that for our given pdf, that the $k$-th moment can be written as $$E(X^k) = \frac{1}{k+1}.$$

From this, use that fact that the variance can be written as $${\rm Var}(X) = E(X^2) - E(X)^2,$$ to show that the variance of the pdf is 1/12.

Plotting the correlation found in RANDU#

We have seen that RANDU reproduces the mean and the variance of its distribution, which is good. However, there are other statistical properties it also needs to follow. One of these is that different samples from the distribution, i.e., different generated numbers, should be uncorrelated. It turns out that RANDU breaks this requirement, horribly. This is very briefly shown in L17.

A common way to show how bad RANDU actually is, is to plot random points within a three dimensional unit cube. A unit cube is a cube with sides that goes from 0 to 1 in all three dimensions. We can place a randomly located point inside the cube by drawing three random numbers in the $[0, 1)$ range, one for each of the three coordinates $x$, $y$ and $z$. If all three coordinates are chosen randomly and independent of each other, the random point will have an equal likelihood of landing anywhere within the cube. For RANDU however, if we plot this out, we see that these “random” points aren’t distributed uniformly throughout the cube at all, but rather all land in specific planes. Let us produce such a plot.

2e) Drawing random points#

Use your RANDU-generator to generate $n=1000$ such points. Note that for this to work you need to draw the three coordinates for each point consecutively, you cannot first draw all the x-positions, then all the y-positions for example.

2f) A three dimensional scatter plot#

Now, plot your $n=1000$ points in a 3D scatter plot. If you do not know how to draw a 3D scatter plot. Take a look at this matplotlib example script.

Once you have your scatter plot, the points might look uniformly distributed. To properly see that they lie in planes, we need to look at it from the right angle. If you are plotting outside Jupyter, the plot window should be interactive and so you can simply drag the view around untill you find a good angle. If you are plotting inside Jupyter you can use ax.view_init(elevation, method) before you show the plot. For example view_init(30, 60) should be a good angle. At least for me.

Increase the number of points to $n=10000$ to really make the planes apparent.

2g) A proper uniformly unit cube#

Repeat the plot, but this time replace your RANDU generater with numpy.random.rand() which produces floats in the range $[0, 1)$ based on the Mersenne Twister algorithm. Verify that no planes are visible in the cube in this case.

Exercise 3) Randomness in C++#

Before you do this exercise, be sure to read about generating random numbers in C++ in the lecture notes, or watch this video:

https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful

For this exercise, the C++ reference is a good resource.

Exercise 3a) Uniform numbers#

Create a C++ script that uses the Mersenne Twister algorithm to produce 10 uniformly distributed numbers in the range 0 and 1

Exercise 3b) Normally distributed numbers#

Use a different distribution to produce normally distributed numbers.

Exercise 4) The Birthday Problem#

The Birthday Problem concerns the probabilty that two or more people share a birthday in a room with $n$ people. This is also a good test of an RNG, because an RNG of truley uncorrelated numbers should produce duplicates with a given frequency.

4a) Drawing a random birthday#

Use numpy.random.randint (click for the offical reference) to draw a birthday as a random integer in the range $[1, 366]$. We have 366 possibilities because we include leap years.

4b) Drawing $n$ random birthdays#

Now use the size keyword to draw a whole set of random birthdays, to simulate the birthdays of $n$ randomly people located in the same room. As an example, let’s say we are $n=20$ people in the same room.

4c) Checking for duplicates#

If the array we get from randint contains any duplicate values. Write a function for checking if any duplicates are contained in the array.

Hint: There are many ways to check for duplicates. You can for example do len(np.unique(birthdays) to check how many unique birthdays there are. Or you can use the fact that a Python set cannot contain any duplicates, and so on. Give it a go on your own, and if you cannot figure it out, google is sure to be helpful :)

4d) Repeated Simulation#

Now use a loop to repeat the experiment of drawing $n=23$ birthdays and checking wether there is a duplicate 1000 times. Count the number of times there is a duplicate, and the number of times there is not.

The probabilty should be close to 50/50 for 23 people, if the RNG returns the expected number of duplicates. Did you get a value close to 50/50?

4e) Edge cases#

For $n=1$, the probability of a duplicate is “obivously” 0. And for $n=366$ it is “obviously” 1. Why is this?

Drawing out the full curve#

The probability of a duplicate birthday can be found analytically, and produces the curve shown in this Figure:

Let’s try to produce a similar function ourselves.

4f) Drawing the Curve#

Generalize your answer to 4D so that it instead finds the probability of $n$ people having a shared birthday.

4g) Plotting the curve#

For $n$ in the range $[0, 100]$, simulate 1000 cases and count the number of shared birthdays for each case. Divide by the number of simulations, $1000$, to find the probability of a shared birthday for each $n$

If you have managed to find the probability for each $n$, use plt.step to plot it. The function plt.step works just like plt.plot, but plots the data as a discontinous stepwise function, just like in the analytical function above.

4h) Comparing the curves#

Does your program look like the analytical function? Should it?

More exercises#

For more exercises, turn to Langtangen Chapter 8. We suggest the following:

Exercise 8.8
Exercise 8.9
Exercise 8.19
Exercise 8.21