I was reading the documentation for Random.nextDouble() in Java 8:

The general contract of nextDouble is that one double value, chosen (approximately) uniformly from the range 0.0d (inclusive) to 1.0d (exclusive), is pseudorandomly generated and returned

Nothing new here. It just returns a uniformly distributed random value in the interval [0, 1) (*uniform* means that for any two subintervals of the same length there is an equal chance that the returned number falls inside either of them).

But, can it return every possible double between 0 and 1? Let's continue reading.

The method nextDouble is implemented by class Random as if by:

```java
public double nextDouble() {
    return (((long) next(26) << 27) + next(27)) / (double) (1L << 53);
}
```

By looking at the Javadoc for *next(int n)* we learn that it returns a random integer between 0 (inclusive) and 2^{n} (exclusive). That is, *n* random bits of information.
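We can check that range empirically. Since *next(int)* is protected in `java.util.Random`, the sketch below exposes it through a small subclass (the class and method names here are ours, purely for illustration):

```java
import java.util.Random;

public class NextBits {
    // next(int) is protected, so expose it via a tiny subclass (names are ours)
    static class ExposedRandom extends Random {
        ExposedRandom(long seed) { super(seed); }
        int nextBits(int n) { return next(n); }
    }

    // Returns true if every sample of next(n) falls in [0, 2^n)
    static boolean inRange(long seed, int n, int samples) {
        ExposedRandom r = new ExposedRandom(seed);
        for (int i = 0; i < samples; i++) {
            int v = r.nextBits(n);
            if (v < 0 || v >= (1 << n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inRange(42L, 26, 100_000)); // true
    }
}
```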

The above code generates a *long* with 53 random bits and then divides it by 2^{53}. As the IEEE double format has 53 bits of information in the significand part, that seems correct. It is filling the significand with random bits.

If we look again at the above code, we realize that it returns at most 2^{53} different values. But, how many doubles are there in the interval [0, 1)? The answer is about 2^{62}. *nextDouble()* returns just a tiny subset of all the possible double values (about 1 in 500). This small elite of privileged doubles is selected while the vast majority are neglected.
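The 2^{62} count can be verified directly. For non-negative doubles, the raw bit pattern (viewed as a long) increases monotonically with the value, so the doubles in [0.0, 1.0), subnormals included, are exactly the bit patterns from 0 up to (but not including) the pattern of 1.0. A quick check (class name is ours):

```java
public class DoubleCount {
    public static void main(String[] args) {
        // Doubles in [0.0, 1.0) = bit patterns 0 .. bits(1.0) - 1
        long count = Double.doubleToLongBits(1.0);       // 0x3FF0000000000000
        System.out.println(count);                       // ~4.6e18, about 2^62
        System.out.println((double) count / (1L << 53)); // ~511.5, the "1 in 500"
    }
}
```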

*Sorry little fellow, but you never had a chance.*

So, where is the problem? Let's take a closer look at the returned values. Especially the significand part. Let's generate some random doubles and dump the bits of the significand part.

*Each row is a random value. Each column is a bit.*

They look mostly random, except for the rightmost column (column 0). There seem to be very few 1s. If we repeat the experiment more times, we discover that only 25% of the rows have a 1 at column 0. This lack of 1s also affects (in part) the nearby columns.
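A quick experiment along these lines (class name is ours) extracts the 52 stored fraction bits of each sample and counts how often column 0 is a 1:

```java
import java.util.Random;

public class ColumnZero {
    // Fraction of samples whose lowest stored significand bit (column 0) is 1
    static double onesInColumnZero(Random r, int samples) {
        int ones = 0;
        for (int i = 0; i < samples; i++) {
            // Mask keeps only the 52 fraction bits of the IEEE representation
            long fraction = Double.doubleToLongBits(r.nextDouble()) & 0xFFFFFFFFFFFFFL;
            if ((fraction & 1L) != 0) ones++;
        }
        return (double) ones / samples;
    }

    public static void main(String[] args) {
        // Roughly 0.25: half the samples land in [0.5, 1) where the last bit is
        // random; every smaller sample gets shifted and ends in 0
        System.out.println(onesInColumnZero(new Random(42), 100_000));
    }
}
```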

What is happening here? We generate fully random bits, so why does the rightmost column have so many 0s? The answer is that after the significand is filled with random bits, the double is *normalized*. There are multiple ways to express the same value, but just one of them is in *normalized form*. For example (in binary):

0.01 × 2^{-1}

0.10 × 2^{-2}

1.00 × 2^{-3}

All the above expressions represent the same value, but only the last one is normalized. In the normalized form, you never have leading zeros. The normalization process shifts the bits to the left and fills the right part with zeros until the leftmost bit is a 1. That explains the excess of 0s in the rightmost columns. (Trivia: as the first bit of a normalized value is always 1, you don't need to store it. Only the fraction part is stored.)
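Java can show the normalized form directly: `Double.toHexString` prints a double as the implicit leading 1 times a power of two. A tiny illustration (class name is ours):

```java
public class NormalizedForm {
    public static void main(String[] args) {
        // 0.01b × 2^-1, 0.10b × 2^-2 and 1.00b × 2^-3 are all the value 0.125.
        // Java prints the single normalized form, with the explicit leading 1:
        System.out.println(Double.toHexString(0.125)); // 0x1.0p-3

        // All three expressions collapse to one and the same bit pattern:
        System.out.println(Double.doubleToLongBits(0.25 * 0.5)
                == Double.doubleToLongBits(1.0 * 0.125)); // true
    }
}
```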

So, if the first bit is 1, the number is already normalized and the last bit may be 0 or 1. But if the first bit is 0, after normalization the last bit becomes 0. Likewise, if the first two bits are 0, after normalization the last two bits become 0. And so on.

*Probability of finding a 1 at column i*

Can we do any better? We are generating 53 random bits and then normalizing. What if we generated infinite random bits, normalized them, and finally truncated to 53 bits? That is equivalent to inserting more random bits on the right instead of inserting 0s.

```java
public double nextDouble() {
    long significand = 0;
    double divisor = 1;
    while (true) {
        int leadingZeros = Long.numberOfLeadingZeros(significand);
        int usefulBits = 64 - leadingZeros;
        int pendingBits = 53 - usefulBits; // We need 53 useful bits
        if (pendingBits == 0)
            break;
        int bits = Math.min(pendingBits, 30); // 31+ overflows below
        significand = (significand << bits) + next(bits);
        divisor = divisor / (1 << bits); // Keep track of total shifts done
    }
    return significand * divisor;
}
```

We can do some tests on this code. Let’s dump the significand bits:

Now the distribution of 0s and 1s looks much more even. Also, the average and variance of the returned doubles look right.

With this code, every single double in the interval [0, 1) has a chance (provided a good enough entropy source). One particular value of interest is 0. How can the above code ever return 0? If the *next()* method happens to return 0 continuously for enough loop cycles (each cycle divides *divisor* by 2^{30}, so around 36 of them), the *divisor* variable underflows and is rounded down to 0.
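The underflow claim is easy to sanity-check: count how many divisions by 2^{30} it takes for 1.0 to reach 0.0 (class name is ours):

```java
public class DivisorUnderflow {
    // Count the divisions by 2^30 needed for 1.0 to underflow to exactly 0.0
    static int cyclesToUnderflow() {
        double divisor = 1.0;
        int cycles = 0;
        while (divisor > 0) {
            divisor /= (1L << 30);
            cycles++;
        }
        return cycles;
    }

    public static void main(String[] args) {
        // 36: after 35 cycles divisor is 2^-1050 (still a subnormal);
        // one more division gives 2^-1080, which rounds to 0.0
        System.out.println(cyclesToUnderflow());
    }
}
```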

Moreover, the proposed code doesn’t seem to be any slower than the Java *nextDouble()*.

One consideration is that IEEE double numbers get more dense as they approach 0. There are as many valid values in [0.5, 1) as there are in [0.25, 0.5), as there are in [0.125, 0.25) and so on.
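This "as many per binade" fact can also be verified by subtracting bit patterns, since consecutive positive doubles have consecutive bit patterns (class name is ours):

```java
public class BinadeCount {
    public static void main(String[] args) {
        // Subtracting bit patterns counts the doubles in a half-open interval
        long inHalfToOne     = Double.doubleToLongBits(1.0) - Double.doubleToLongBits(0.5);
        long inQuarterToHalf = Double.doubleToLongBits(0.5) - Double.doubleToLongBits(0.25);
        System.out.println(inHalfToOne == inQuarterToHalf); // true
        System.out.println(inHalfToOne == (1L << 52));      // true: 2^52 per binade
    }
}
```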

We can visualize a simple example with 2 bits of significand (in binary) as:

To produce a uniform distribution we need to give subsegments of the same length equal chances. But, as two subsegments may contain different numbers of valid values, some values may have more chances than others.

For example: the interval [0.11, 1.0) has the same length as the interval [0.010, 0.10). As the length is the same, both intervals must have equal chances. The first interval has only one valid value: 0.11. The second interval has two valid values: 0.010 and 0.011. These two values together must have the same chances as the value 0.11. The Java *nextDouble()* assigns 0.010 the same chances as 0.11 and no chances to 0.011. The algorithm described above splits the chances between 0.010 and 0.011 by generating one more random bit and deciding who is the winner.

The value 0.11 is generated by producing the two bits 1 1. The value 0.010 is generated by producing the three bits 0 1 0, and the value 0.011 by producing the three bits 0 1 1. For this simple example the algorithm stops as soon as there are 2 useful bits (total bits disregarding leading zeros).
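We can simulate this 2-bit version of the algorithm and measure the resulting probabilities: 0.11 should appear with probability 1/4, and 0.010 and 0.011 with probability 1/8 each. The sketch below scales the article's algorithm down to a 2-bit significand (class and method names are ours):

```java
import java.util.Random;

public class TwoBitModel {
    // The article's algorithm, scaled down to a 2-bit significand
    static double next2BitDouble(Random r) {
        long significand = 0;
        double divisor = 1;
        while (true) {
            int usefulBits = 64 - Long.numberOfLeadingZeros(significand);
            if (usefulBits == 2) break;       // stop at 2 useful bits
            significand = (significand << 1) + (r.nextBoolean() ? 1 : 0);
            divisor /= 2;                     // track total shifts done
        }
        return significand * divisor;
    }

    // Empirical probability of a given value over n samples
    static double frequency(Random r, double value, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (next2BitDouble(r) == value) hits++;
        }
        return (double) hits / n;
    }

    public static void main(String[] args) {
        Random r = new Random(1);
        System.out.println(frequency(r, 0.75, 200_000));  // ~1/4: bits 1 1
        System.out.println(frequency(r, 0.25, 200_000));  // ~1/8: bits 0 1 0
        System.out.println(frequency(r, 0.375, 200_000)); // ~1/8: bits 0 1 1
    }
}
```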

In the end, all numbers are equal, but some numbers are more equal than others.

Submitted: https://bugs.openjdk.java.net/browse/JDK-8054807