A Plea for an Equal Opportunity Random

I was reading the documentation for Random.nextDouble() in Java 8

The general contract of nextDouble is that one double value, chosen (approximately) uniformly from the range 0.0d (inclusive) to 1.0d (exclusive), is pseudorandomly generated and returned

Nothing new here. It just returns a uniform distributed random value in the interval [0, 1) (uniform means that for any two subintervals with the same length there is an equal chance the returned number is inside any of them).

But, can it return every possible double between 0 and 1? Let's continue reading.

The method nextDouble is implemented by class Random as if by:

public double nextDouble() {
    return (((long)next(26) << 27) + next(27)) / (double)(1L << 53);
}

By looking the Javadoc for next(int n) we learn that it returns a random integer between 0 and 2n. That is, n random bits of information.

The above code generates a long with 53 random bits and then divides it by 253. As the IEEE double format has 53 bits of information in the significand part, that seems correct. It is filling the significand with random bits.

IEEE Double Format
Image by Charles Esson / CC BY-SA
Double Formula

If we look again at the above code, we realize that it returns at most 253 different values. But, how many doubles are in the interval [0, 1)? The answer is about 262. nextDouble() returns just a tiny subset of all the possible double numbers (1 in 500). This small elite of privileged doubles are selected while the vast majority are neglected.

NoLuck
Sorry little fellow, but you never had a chance.

So, where is the problem? Let's take a closer look at the returned values. Specially the significand part. Let's generate some random doubles and dump the bits in the significand part.

bitmap
Each row is a random value. Each column is a bit.

They look mostly random, except for the rightmost column (column 0). There seems to be very few 1s. If we repeat the experiment more times we discover that only 25% of the rows have a 1 at column 0. This lack of 1s also affects (in part) the nearby columns.

What is happening here? We generate fully random bits, why is that the rightmost column has so much 0s? The answer is that after the significand is filled with random bits, the double is normalized. There are multiple ways to express the same value but just one of them is in normalized form. For example (in binary):

0.01 × 2-1
0.10 × 2-2
1.00 × 2-3

All the above above expressions represent the same value, but only the last one is normalized. In the normalized form, you never have leading zeros The normalization process shifts the bits to the left and fill the right part with zeros until the leftmost bit is a 1. That explains the excess of 0s on the rightmost columns (Trivia: as the first bit of a normalized value is always 1, you don't need to keep it. Only the fraction part is stored).

So, if the first bit is 1, the number is already normalized and the last bit may be 0 or 1 . But, if the first bit is 0, after normalization the last bit becomes 0. Likewise, If the first two bits are 0, after normalization the last two bits becomes 0. And so on.

probs
Probability of finding a 1 at column i

Can we do any better? We are generating 53 random bits, and then normalizing. What if we generate infinite random bits, normalize them, and finally truncate to 53 bits? That is equivalent as instead of inserting 0s on the right, insert more random bits.

We can do some tests on this code. Let’s dump the significand bits:

bitmap2

Now the distribution of 0s and 1s seems much more even. Also the average and variance of the returned doubles seems right.

With this code, every single double in the interval [0, 1) has a chance (provided a good enough entropy source). One particular value of interest is 0. How can the above code ever return 0? If the next() method happens to returns 0 continuously for 35 loop cycles, the divisor variable underflows and is rounded down to 0.

Moreover, the proposed code doesn’t seem to be any slower than the Java nextDouble().

One consideration is that IEEE double numbers get more dense as they approach 0. There are as many valid values in [0.5, 1) as there are in [0.25, 0.5), as there are in [0.125, 0.25) and so on.

We can visualize a simple example with 2 bits of significand (in binary) as :

ruler

To produce a uniform distribution we need to give each subsegment with same length equal chances. But, as two subsegments may have different number of valid values, some values may have more chances than others.

For example: The interval [0.11, 1.0) has the same length as the interval [0.010, 0,10). As the length is the same, both intervals must have equal chances. The first interval has only one valid value: 0.11 . The second interval has two valid values: 0.010 and 0.011 . These two values together must have the same chances as the value 0.11 . The Java nextDouble() assigns 0.010 the same chances as 0.11 and no chances to 0.011. The algorithm described above split the chances between 0.010 and 0.011 by generating one more random bit and deciding who is the winner.

The value 0.11 is generated by producing two bits 1 1. The value 0.010 is generated by producing three bits 0 1 0 . And the value 0.011 is generated by producing three bits 0 1 1 . For this simple example the algorithm stops as soon as there are 2 useful bits (total bits disregarding leading zeros).

In the end, all numbers are equal, but some numbers are more equal than others.

One thought on “A Plea for an Equal Opportunity Random

Leave a Reply

Your email address will not be published.

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">