BaarkingDog

# CSPRNGs: How to Properly Generate Random Numbers

An in-depth guide to secure random number generation Not all random numbers in a program need to be highly secure. For instance, when deciding what's for dinner, a basic random module provided by the programming language is usually sufficient. However, in cases generating sensitive data such as nonces and private keys, the randomness should be unpredictable. The occurrence of predictable random data should never happen, but unfortunately, sometimes it does.

In August 2023, a vulnerability related to weak randomness in the Bitcoin development toolkit Libbitcoin Explorer was disclosed by Milk Sad, which resulted in an estimated loss of over \$900,000 USD worth of cryptocurrency assets. This leads us to the fundamental question of the article: How do we properly generate random numbers?

## Random Number Generators in Software​

To address the fundamental question, we first need to understand the different types of number generators.

Like flipping a coin or rolling a die, a computer can also generate random numbers using various sources, including device drivers and external sources such as network packets and mouse movements. These external sources are diverse enough that predicting the random numbers is considered impossible, and this randomness is referred to as true randomness.

However, computers cannot solely rely on external sources for randomness. When a large amount of randomness is required in a short period, the values from external sources may not change sufficiently to provide adequate entropy. Moreover, reading values from external sources can be slower. Therefore, when a large amount of random numbers is needed, a common method is to obtain a short-length seed from a true randomness source and use this seed as input to generate random numbers. This is called a pseudorandom number generator (PRNG).

The main difference between a PRNG and a true random number generator (TRNG), which generates true random numbers, lies in reproducibility. With a PRNG, the sequence is deterministic, meaning that the same sequence is reproduced if the seed is the same. In contrast, with a TRNG, the sequence is entirely random and cannot be reproduced.

However, not all PRNGs are secure. (Although the meaning of secure can be accepted as meaning "unpredictable", we will discuss what secure formally means in the CSPRNG section.)

Let's first consider a simple 32-bit PRNG, denoted as $\text{PRNG}(s) = {x_0, x_1, \dots}$, where

$\begin{cases} x_0 &= s \ x_i &= (ax_{i-1} + b) \textsf{ \& 0xFFFFFFFF if } i > 0 \ \end{cases}$

and where $s$ represents the seed and $a, b$ are fixed constants. This PRNG is known as a linear congruential generator (LCG) and is straightforward for efficiently generating random numbers. However, once $x_i$ is observed, then all other elements in the sequence $x\_{i+1}, x\_{i+2}, \dots$ can be derived. Furthermore, its period is limited to at most $2^{32}$, which is insufficient. One approach to mitigate these issues is to truncate some bits. For example, Java's random employs a 48-bit state, and each generated number is the 32 most significant bits. However, this truncation only marginally improves its statistical characteristics and security. It is still easy to derive the seed given multiple captured elements, and the sequence can be predicted forwards and backwards arbitrarily long. This Github repository is one example to break Java's random.

Another PRNG known as the Mersenne Twister is utilized in languages such as Python and C++. The Mersenne Twister still relies on relatively simple operations but has a long period of $2^{19937} - 1$. Despite Mersenne Twister offering better security compared to LCGs, it is still predictable. Once the 624 consecutive 32-bit random numbers are given, then the inner state can be reconstructed, rendering the sequence predictable. You may refer to this Github repository.

The default random-number--generation functions provided by most programming languages prioritize performance over security. If robust security is a priority, then a cryptographic secure pseudorandom number generator (CSPRNG) must be employed.

## CSPRNG​

A CSPRNG is a PRNG that is cryptographically secure because its output is as unpredictable and unbiased as a TRNG. It is therefore defined in terms of indistinguishability: if a PRNG $G$ is indistinguishable from a TRNG, then $G$ is called a CSPRNG.

To understand its mathematical properties, let’s take a look at an example game played between Alice and Bob.

1. Alice generates a sequence, either one generated from the TRNG or one from the PRNG $G$, each with probability $1/2$.
2. Alice provides the generated sequence to Bob.
3. Bob's objective is to guess whether the sequence originated from a TRNG or the PRNG. If Bob's guess is correct, he wins; conversely, if Bob's guess is incorrect, Alice wins.

If Bob takes a random guess, then the rate he wins at is exactly $1/2$. This value of $1/2$ can be considered as a baseline. However, if the PRNG $G$ exhibits unusual characteristics, then Bob's advantage can be higher than $1/2$. For example, if the first element always being fixed to an odd number, then Bob's strategy is to answer PRNG if the sequence starts with an odd number and answer TRNG otherwise. When the sequence is generated by the PRNG, Bob's guess is always correct. However, if the sequence is generated by the TRNG, Bob's guess is right with $1/2$ probability. In this scenario, Bob's winning rate is $(1/2) \times 1 + (1/2) \times (1/2) = 3/4$, which is $1/4$ higher than the baseline. In other words, Bob has a $1/4$ advantage. If $G$ is CSPRNG, then this advantage should be negligible. When this advantage is smaller than $2^{-128}$, we say the CSPRNG achieves 128-bits security.

When a PRNG’s inner state can be reconstructed or its next number can be predicted, it is possible to distinguish the output of a PRNG from a TRNG. This does not happen with CSPRNGs because their output is indistinguishable from a TRNG (unless its seed is somehow revealed or compromised). In other words, CSPRNGs must be resilient to any kind of "seed recovery" attacks.

For further details, we recommend referring to these lecture notes from Princeton University.

## Vulnerabilities From a Weak Random​

But is a CSPRNG truly important in properly generating random numbers? Yes. When sensitive data is generated using a weak PRNG, it can lead to significant security problems. The Common Weakness Enumeration (CWE) also includes entries related to randomness vulnerabilities, such as "CWE-334: Small Space of Random Values", "CWE-335: Incorrect Usage of Seeds in PRNG", and "CWE-338: Use of Cryptographically Weak PRNG". These types of vulnerabilities have been known for a long time but still continue to appear. Here are several real-world examples resulting from a weak random.

Milk Sad refers to a vulnerability in the Libbitcoin Explorer cryptocurrency wallet tool. This vulnerability was discovered during an investigation into lost funds.

The primary issue revolves around a PRNG used in the bx seed command. This command generates a pseudorandom seed, as shown in the following example:

bx seed -b 192d915107990a42dfb2ea5762729002d47a4748e0f24aceb61

At first glance, it appears to be random. However, this PRNG used in the command has two critical flaws:

1. The PRNG is based on the Mersenne Twister, which is not a CSPRNG.
2. The seed for the PRNG is chosen based on the current time, providing only 32 bits of entropy.

While the Mersenne Twister can be hard to predict if the output is short and the seed has sufficient entropy, in this case, the seed is improperly chosen, allowing attackers to brute-force the private key generated by the bx seed command. Although there were warning statements on the bx seed wiki page, they may not have been sufficiently emphasized to users. It is suspected that even developers might have overlooked this issue, as the wiki landing page included an example of creating a new wallet address with the bx seed command without any warnings.

Initially, the Libbitcoin team responded by stating that the bx seed command was never intended to be cryptographically secure, and they had no plans to address it. However, they later released a patch that removed the command.

In addition to Milk Sad, Cake Wallet, Profanity Ethereum address tool (CVE-2022-40769), and Trust Wallet (CVE-2023-31290) are other examples of vulnerabilities where seeds are generated with incorrect way.

### ECDSA Nonce Misuse​

The digital signature scheme is a cryptographic scheme that a signer who has the private key can generate a signature $\sigma$ for a message $m$ and a verifier who knows the corresponding public key can verify the validity of the signature $\sigma$.

An Elliptic Curve Digital Signature Algorithm (ECDSA) is a digital signature scheme over elliptic curves. In cryptocurrencies, ECDSA allows to ensure that transactions are made by the legitimate wallet owner. However, ECDSA, when misused, has a significant security vulnerability related to a nonce denoted as $k$. The nonce is a randomly generated value used during signature generation.

While ECDSA is secure when used correctly, the nonce $k$ introduces a potential security weakness, most notably in cases of nonce reuse. This vulnerability allows an attacker to recover a secret key from a signature if the same nonce is reused for the same secret key. If the nonce is chosen with high entropy, the nonce will not collide. However, if the nonce is generated with low entropy, the nonce can collide. In 2013, weaknesses in the PRNG used in Android led to nonce reuse, resulting in the theft of a substantial amount of Bitcoin (see here).

Furthermore, choosing a nonce in an insecure way, such as concatenating half of the transaction hash and half of the private key, can also lead to a leakage of the private key (as seen here). Given sufficient messages, even a small leakage of less than one bit per nonce can be exploited, too. As biased nonce generation can lead to a complete compromise, this makes secure nonce generation extremely important.

In summary, generating the nonce $k$ securely and randomly, without bias, is crucial in an ECDSA. To address this issue, RFC 6979 recommends using the result of running an HMAC-DRBG with the private key and the hash value of the message as seeds to generate the nonce $k$. This approach helps mitigate the risks associated with nonce misuse in ECDSA.

For further details, we recommend referring to this blog post from Trail of Bits.

## CSPRNG Constructions​

Then how can we construct a CSPRNG?

In 2006, the National Institute of Standards and Technology (NIST) published a technical publication titled Recommendation for Random Number Generation Using Deterministic Random Bit Generators. In this document, four CSPRNGs are suggested:

1. Hash-DRBG
2. HMAC-DRBG
3. CTR-DRBG
4. Dual_EC_DRBG

(Note: DRBG stands for deterministic random bit generators, which means the same as a CSPRNG.)

Among these, Dual_EC_DRBG was withdrawn in 2014 due to the discovery of a backdoor. The backstory is interesting but off-subject. You can find more information here if you are curious.

In our discussion of PRNGs, we focused on the basic form where the PRNG is initialized with a seed, and then it generates pseudorandom numbers. In this basic form, it is impossible to provide additional entropy to the PRNG. However, allowing the infusion of additional entropy during operation can enhance security. Without this capability, a PRNG could be entirely compromised if its state or entropy source is compromised for a period. On the other hand, if it is possible to provide additional entropy, the PRNG can recover to a secure state after sufficient entropy is supplied. Importantly, its internal state must change each time new entropy is provided.

These are three key components of a PRNG:

1. $\textsf{Setup}(s)$: PRNG is initialized with a seed $s$.
2. $\textsf{Refresh}(I)$: PRNG accepts a new entropy $I$.
3. $\textsf{Next}(n)$: PRNG generates the $n$ bits output based on the current state.

Now let's take a look into how to generate a cryptographic secure PRNG. Excluding Dual_EC_DRBG, the other three CSPRNGs are based on the symmetric primitive, such as hash functions and block ciphers. This choice is reasonable because standardized algorithms for hash functions and block ciphers exist, and their security properties are widely accepted. Even if a vulnerability is found for a specific symmetric primitive, it's not a catastrophic scenario because transitioning to more secure algorithms is relatively easier than constructing new DRBGs.

As it is the most straightforward, Hash-DRBG is the best example to demonstrate how a CSPRNG works.

### Hash-DRBG​

Hash-DRBG is a DRBG built from the cryptographic hash function denoted as $H$. To aid in understanding, we will introduce a simplified version. In this version, we denote the state as $S$.

1. $\textsf{Setup}(s)$: Initialize the state as $S = H(s)$.
2. $\textsf{Refresh}(I)$: Update the state as $S = H(S \|\| I)$.
3. $\textsf{Next}(n)$:
• Generate a sequence $R = H(S) \|\| H(S+1) \|\| ... \|\| H(S+k)$ where $k$ is the smallest number that $\|R\| \geq n$.
• Update the state as $S = H(S \|\| H(S))$.
• Return the leftmost $n$ bits of $R$.

These three DRBGs (Hash-DRBG, HMAC-DRBG, and CTR-DRBG) are constructed using symmetric primitives, and their security can be proven based on the security properties of these symmetric primitives, utilizing the concept of indistinguishability.

Note that there may be a slight gap between the claimed security of these DRBGs and their theoretical analysis, as strong assumptions are made in the proofs. However, these DRBGs are generally considered secure enough for real-world use, and their security has been the subject of ongoing research until relatively recently, as evidenced by papers presented in EUROCRYPT 2019 (such as An Analysis of the NIST SP 800-90A Standard).

## How to Properly Generate Random Numbers​

To properly generate random numbers, there are only two things we need to keep in mind. First, the seed should be generated from enough entropy, and second, a CSPRNG should be used instead of a weak PRNG, such as default random modules in each language. (For more, see this post, which enumerates the methods to generate secure random numbers in various programming languages.)

## Conclusion​

In this article, we've discussed the mathematical definitions of CSPRNGs and studied vulnerabilities in the wild. While these vulnerabilities are relatively easy to detect and defend against, their occurrence can lead to the exposure of sensitive information and significant damage.

As PRNGs are a fundamental component in almost every software program, there is always a risk that a developer might improperly generate random numbers for sensitive data. Therefore, it is essential for developers to ensure that random numbers are generated in a secure way and that auditors track the logic for the PRNG in code.