Loop Example: Monte Carlo Experiments

In this example:

Random Numbers, BUFFER, Single Line Loops.

A Monte Carlo experiment is a mathematical method that uses a series of random numbers to solve a problem for which no direct algebraic solution is available; the name alludes to playing roulette at the Monte Carlo casino.

With its three types of random numbers, Hypatia is well suited for performing Monte Carlo experiments. Some may involve nested loops, but by using your own index variables, as demonstrated on the previous page, this problem can be solved.

As a reminder, these are Hypatia's three types of random numbers:

RAND is a pseudo constant that returns a random number in the range of 0.0 to 1.0 (inclusive).

a b RANDINT is a 2-argument operator that returns a random integer in the range of a to b (inclusive).

a b RANDND is a 2-argument operator that returns a normal distributed random number with mean a and standard deviation b.

(0 1 RANDND means standard normal distribution, for which you can also write 0 0 RANDND.)

Here is a rather simple Monte Carlo experiment, for which we do not even need to use a script (though if we wanted to do it repeatedly with different parameters, a script would make sense).

(As always, the question mark at the start of the line is Hypatia's input prompt, you do not type it.)

The problem to solve: When observing heterosexual couples, in the large majority of cases the man is taller than the woman. Is this only due to a difference in average body heights between the sexes, or is there also a cultural factor involved?

I cannot actually answer the question because I do not have the data for real life couples, but I can use Hypatia to calculate what percentage of couples with men larger than women, women larger than men, or equal heights (with an arbitrary measurement tolerance) would be statistically expected if the pairings were random regarding height.

There's probably a formula to calculate this but I do not know it, so, let's use a Monte Carlo experiment.

Human body heights are approximately normal distributed. I cannot vouch for the correctness of these numbers, but allegedly in the U.S. the mean height of men is 178 cm with a standard deviation of 7.6, while the mean height of women is 164 cm with a standard deviation of 6.35.

Let's create a hundred thousand random couples (this will take a few seconds), and write their differences in height to the buffer.

(While we usually use * in a loop command for infinite loops which run until an exit condition is met, and where the maximum number of passes that * represents only serves as a safeguard, here we use * to actually stand for this number -- you remember, one hundred thousand by default, but with the MAXLOOP command you can set any value from ten to ten million.)

? BUFFER START

? DO * :: 178 7.6 RANDND 164 6.35 RANDND - &

= 3.93700787401575

The result that is shown has no meaning, it is just the last random difference (and thus the value of $), the 100000 values are stored in the buffer.

Now we copy the content of the buffer to a user defined element, discard the buffer which also turns off buffer mode, and then we can use the n-argument operators N+, N0 and N- to count the pairings with positive, zero or negative values (a positive value here defined as the man being taller than the woman).

To these operators "zero" means any absolute value below the zero threshold (see page The Zero Threshold in chapter "Numbers").

By setting the zero threshold (by default the very small value of 1e-16) we can define below which difference we still consider the heights as being equal -- let's say, 1 cm.

? @pairs = (buffer)

@pairs = 14.5550868671434 13.1752588519109 7.06750654917872 10.0979210053069 -12.8059232074885 20.5201707541086 -2...

? BUFFER DISCARD

Buffer deleted, buffer mode is now OFF

? $zero = 1

? @pairs N+

14.5550868671434 13.1752588519109 7.06750654917872 10.0979210053069 -12.8059232074885 20.5201707541086 -2.38360650...

= 90'500

? @pairs N0

14.5550868671434 13.1752588519109 7.06750654917872 10.0979210053069 -12.8059232074885 20.5201707541086 -2.38360650...

= 3'036

? @pairs N-

14.5550868671434 13.1752588519109 7.06750654917872 10.0979210053069 -12.8059232074885 20.5201707541086 -2.38360650...

= 6'464

(This was the result I got. If you repeat the test you will get slightly different results, but this lies in the nature of Monte Carlo experiments.)

So, if height differences in US heterosexual couples were random, we'd expect to see some 90.5% of couples where the man is taller than the woman, some 3% with differences below 1 cm, and some 6.5% where the woman is taller than the man. Unfortunatey I do not have the empirical data that we'd have to compare with these values to decide the original question, but be that as it may, this example was meant to demonstrate Hypatia's Monte Carlo abilities.

If you don't leave Hypatia after this calculation do not forget to set the zero threshold back to the default vaule of 1e-16 with DEL $zero.

Monte Carlo experiments can be used for a wide variety of problems, and while some of them may require more powerful tools than Hypatia, Hypatia can do a lot in this direction.

To conclude, let's do a quick test to verify that we can trust Hypatia's random generator -- let's roll a die 100000 times and add up the eyes (the expected result would be 350000).

Using silent mode in the calculation is important -- otherwise each one of the one hundred thousand loop passes would write the current value of $ to hy, which would make this loop very slow. (Using buffer mode would speed it up, too, but, since in this example we do not need the loop pass results, silent mode is easier to use and even faster.)

Note that silent mode still applies when the loop ends -- the final value of $ gets displayed, but not written to hy.

? 0

= 0

? DO * :: $ 1 6 RANDINT + #

= 349'162

Looks good enough, but, while we're at it, let's also see if we come close enough to rolling the expected number of sixes (which would be 16667), or, if we divide 100000 (or 1e5) by that number, how many times on average we have to roll the die to get a six (which should be 6 times):

? 0

= 0

? DO * :: IF 1 6 RANDINT 6 == THEN 1 + #

= 16'708

? 1e5 $ /

= 5.98515681110845

In the DO ... calculation line we can write ... THEN 1 + because the + operator misses an argument, and $ is automatically inserted, so this is the same as writing ... THEN $ 1 +

This works only with the first operator in a calculation -- in the previous example RANDINT was the first operator, therefore we had to include $ and write $ 1 6 RANDINT + # (see page Chain Calculations in chapter "First Steps").

If you repeat these tests you'll get different results, which should be scattered around the expected value -- could we observe this by repeating the test ten times in a loop?

We cannot put the line DO * :: IF 1 6 RANDINT 6 == THEN 1 + # in a script and call that in a loop -- but we can have a loop with one million passes, and at every one hundred thousand passes save our result, and reset the counter -- this way, we get 10 independent trials, of which at the end we calculate the average.

This requires a script, let's call it test6. Because we only have 10 results, we do not need to use the buffer -- we do have to clear hy at the start, though.

At each pass of the loop we roll the die and increase the counter by 1 if we roll a six. Every 100000 rolls of the die we calculate the average number of tries we needed to roll a six, write this result to hy in accumulation mode, display it with the command =, and set the counter back to 0.

Because to keep it simple we use $ instead of a variable as our counter, and so we have to set $ to 0 at the beginning of the loop. Since this line is a calculation, it would write the result 0 to hy at the end of the first loop pass (which would be after the commmand & has cleared hy!) -- to avoid this we have to use silent mode. We also need silent mode when we add 1 to the value of $ after we've rolled a 6, and when we set $ back to 0 after the end of each cycle of 100000 rolls of the die. (Maybe it would be easier to use a variable, after all ...)

I1: 0 #

I1: &

IF 1 6 RANDINT 6 == THEN 1 + #

IF I 100000 MULTIPLE THEN 1e5 $ / &&

ALSO: =

ALSO: 0 #

I*: # Average:

I*: (hy) MEAN

To let this script perform ten tests with one hundred thousand passes each, we need to set the maximum number of loops to 1 million, and then call our script test6 with that maximum.

The script displays the ten results as they are calculated every 100000 passes, and at the end calculates their mean, which, as the last calculation result and therefore the current value of $, gets displayed when the loop has ended.

? MAXLOOP 1 MILLION

Max. number of passes set to: 1'000'000

? _*test6

= 6.01865783930184

= 5.97514340344168

= 5.91191250369495

= 5.99916011758354

= 6.01793344165614

= 5.99880023995201

= 6.0146758089739

= 6.03864734299517

= 6.0146758089739

= 5.95486214494134

Average:

= 5.99444686515145

Which is very close to the expected 6.0 rolls needed on average to roll a six, though even with the total of 1 million random numbers you'll still get slightly different results each time you repeat this experiment.

Hypatia's (that is, Phix's) random numbers are software-generated and not as strictly random as those that a hardware random number generator (or maybe a more sophisticated software random number generator) would provide.

I've tested all three types of Hypatia's random numbers, found no repeating patterns and no systematic biases, and for most practical purposes except cryptography and advanced scientific research they seem to be random enough. Be aware that they are not perfect, though.

Home, Up: Loop and Script Examples, Prev: Loop Example: Nested Loops, Next: Loop Example: Fibonacci Sequences