Skip to main content

7 posts tagged with "ZK"

View All Tags
BaarkingDog

An in-depth guide to secure random number generation
Article heading

Not all random numbers in a program need to be highly secure. For instance, when deciding what's for dinner, a basic random module provided by the programming language is usually sufficient. However, in cases generating sensitive data such as nonces and private keys, the randomness should be unpredictable. The occurrence of predictable random data should never happen, but unfortunately, sometimes it does.

In August 2023, a vulnerability related to weak randomness in the Bitcoin development toolkit Libbitcoin Explorer was disclosed by Milk Sad, which resulted in an estimated loss of over $900,000 USD worth of cryptocurrency assets. This leads us to the fundamental question of the article: How do we properly generate random numbers?

Random Number Generators in Software

To address the fundamental question, we first need to understand the different types of number generators.

Like flipping a coin or rolling a die, a computer can also generate random numbers using various sources, including device drivers and external sources such as network packets and mouse movements. These external sources are diverse enough that predicting the random numbers is considered impossible, and this randomness is referred to as true randomness.

However, computers cannot solely rely on external sources for randomness. When a large amount of randomness is required in a short period, the values from external sources may not change sufficiently to provide adequate entropy. Moreover, reading values from external sources can be slower. Therefore, when a large amount of random numbers is needed, a common method is to obtain a short-length seed from a true randomness source and use this seed as input to generate random numbers. This is called a pseudorandom number generator (PRNG).

The main difference between a PRNG and a true random number generator (TRNG), which generates true random numbers, lies in reproducibility. With a PRNG, the sequence is deterministic, meaning that the same sequence is reproduced if the seed is the same. In contrast, with a TRNG, the sequence is entirely random and cannot be reproduced.

However, not all PRNGs are secure. (Although the meaning of secure can be accepted as meaning "unpredictable", we will discuss what secure formally means in the CSPRNG section.)

Let's first consider a simple 32-bit PRNG, denoted as PRNG(s)=x0,x1,\text{PRNG}(s) = {x_0, x_1, \dots}, where

{x0=s xi=(axi1+b) & 0xFFFFFFFF if i>0 \begin{cases} x_0 &= s \ x_i &= (ax_{i-1} + b) \textsf{ \& 0xFFFFFFFF if } i > 0 \ \end{cases}

and where ss represents the seed and a,ba, b are fixed constants. This PRNG is known as a linear congruential generator (LCG) and is straightforward for efficiently generating random numbers. However, once xix_i is observed, then all other elements in the sequence x_i+1,x_i+2,x\_{i+1}, x\_{i+2}, \dots can be derived. Furthermore, its period is limited to at most 2322^{32}, which is insufficient. One approach to mitigate these issues is to truncate some bits. For example, Java's random employs a 48-bit state, and each generated number is the 32 most significant bits. However, this truncation only marginally improves its statistical characteristics and security. It is still easy to derive the seed given multiple captured elements, and the sequence can be predicted forwards and backwards arbitrarily long. This Github repository is one example to break Java's random.

Another PRNG known as the Mersenne Twister is utilized in languages such as Python and C++. The Mersenne Twister still relies on relatively simple operations but has a long period of 21993712^{19937} - 1. Despite Mersenne Twister offering better security compared to LCGs, it is still predictable. Once the 624 consecutive 32-bit random numbers are given, then the inner state can be reconstructed, rendering the sequence predictable. You may refer to this Github repository.

The default random-number--generation functions provided by most programming languages prioritize performance over security. If robust security is a priority, then a cryptographic secure pseudorandom number generator (CSPRNG) must be employed.

CSPRNG

A CSPRNG is a PRNG that is cryptographically secure because its output is as unpredictable and unbiased as a TRNG. It is therefore defined in terms of indistinguishability: if a PRNG GG is indistinguishable from a TRNG, then GG is called a CSPRNG.

To understand its mathematical properties, let’s take a look at an example game played between Alice and Bob.

  1. Alice generates a sequence, either one generated from the TRNG or one from the PRNG GG, each with probability 1/21/2.
  2. Alice provides the generated sequence to Bob.
  3. Bob's objective is to guess whether the sequence originated from a TRNG or the PRNG. If Bob's guess is correct, he wins; conversely, if Bob's guess is incorrect, Alice wins.

If Bob takes a random guess, then the rate he wins at is exactly 1/21/2. This value of 1/21/2 can be considered as a baseline. However, if the PRNG GG exhibits unusual characteristics, then Bob's advantage can be higher than 1/21/2. For example, if the first element always being fixed to an odd number, then Bob's strategy is to answer PRNG if the sequence starts with an odd number and answer TRNG otherwise. When the sequence is generated by the PRNG, Bob's guess is always correct. However, if the sequence is generated by the TRNG, Bob's guess is right with 1/21/2 probability. In this scenario, Bob's winning rate is (1/2)×1+(1/2)×(1/2)=3/4(1/2) \times 1 + (1/2) \times (1/2) = 3/4, which is 1/41/4 higher than the baseline. In other words, Bob has a 1/41/4 advantage. If GG is CSPRNG, then this advantage should be negligible. When this advantage is smaller than 21282^{-128}, we say the CSPRNG achieves 128-bits security.

When a PRNG’s inner state can be reconstructed or its next number can be predicted, it is possible to distinguish the output of a PRNG from a TRNG. This does not happen with CSPRNGs because their output is indistinguishable from a TRNG (unless its seed is somehow revealed or compromised). In other words, CSPRNGs must be resilient to any kind of "seed recovery" attacks.

For further details, we recommend referring to these lecture notes from Princeton University.

Vulnerabilities From a Weak Random

But is a CSPRNG truly important in properly generating random numbers? Yes. When sensitive data is generated using a weak PRNG, it can lead to significant security problems. The Common Weakness Enumeration (CWE) also includes entries related to randomness vulnerabilities, such as "CWE-334: Small Space of Random Values", "CWE-335: Incorrect Usage of Seeds in PRNG", and "CWE-338: Use of Cryptographically Weak PRNG". These types of vulnerabilities have been known for a long time but still continue to appear. Here are several real-world examples resulting from a weak random.

Milk Sad (CVE-2023-39910)

Milk Sad refers to a vulnerability in the Libbitcoin Explorer cryptocurrency wallet tool. This vulnerability was discovered during an investigation into lost funds.

The primary issue revolves around a PRNG used in the bx seed command. This command generates a pseudorandom seed, as shown in the following example:

bx seed -b 192
d915107990a42dfb2ea5762729002d47a4748e0f24aceb61

At first glance, it appears to be random. However, this PRNG used in the command has two critical flaws:

  1. The PRNG is based on the Mersenne Twister, which is not a CSPRNG.
  2. The seed for the PRNG is chosen based on the current time, providing only 32 bits of entropy.

While the Mersenne Twister can be hard to predict if the output is short and the seed has sufficient entropy, in this case, the seed is improperly chosen, allowing attackers to brute-force the private key generated by the bx seed command. Although there were warning statements on the bx seed wiki page, they may not have been sufficiently emphasized to users. It is suspected that even developers might have overlooked this issue, as the wiki landing page included an example of creating a new wallet address with the bx seed command without any warnings.

Initially, the Libbitcoin team responded by stating that the bx seed command was never intended to be cryptographically secure, and they had no plans to address it. However, they later released a patch that removed the command.

In addition to Milk Sad, Cake Wallet, Profanity Ethereum address tool (CVE-2022-40769), and Trust Wallet (CVE-2023-31290) are other examples of vulnerabilities where seeds are generated with incorrect way.

ECDSA Nonce Misuse

The digital signature scheme is a cryptographic scheme that a signer who has the private key can generate a signature σ\sigma for a message mm and a verifier who knows the corresponding public key can verify the validity of the signature σ\sigma.

An Elliptic Curve Digital Signature Algorithm (ECDSA) is a digital signature scheme over elliptic curves. In cryptocurrencies, ECDSA allows to ensure that transactions are made by the legitimate wallet owner. However, ECDSA, when misused, has a significant security vulnerability related to a nonce denoted as kk. The nonce is a randomly generated value used during signature generation.

While ECDSA is secure when used correctly, the nonce kk introduces a potential security weakness, most notably in cases of nonce reuse. This vulnerability allows an attacker to recover a secret key from a signature if the same nonce is reused for the same secret key. If the nonce is chosen with high entropy, the nonce will not collide. However, if the nonce is generated with low entropy, the nonce can collide. In 2013, weaknesses in the PRNG used in Android led to nonce reuse, resulting in the theft of a substantial amount of Bitcoin (see here).

Furthermore, choosing a nonce in an insecure way, such as concatenating half of the transaction hash and half of the private key, can also lead to a leakage of the private key (as seen here). Given sufficient messages, even a small leakage of less than one bit per nonce can be exploited, too. As biased nonce generation can lead to a complete compromise, this makes secure nonce generation extremely important.

In summary, generating the nonce kk securely and randomly, without bias, is crucial in an ECDSA. To address this issue, RFC 6979 recommends using the result of running an HMAC-DRBG with the private key and the hash value of the message as seeds to generate the nonce kk. This approach helps mitigate the risks associated with nonce misuse in ECDSA.

For further details, we recommend referring to this blog post from Trail of Bits.

CSPRNG Constructions

Then how can we construct a CSPRNG?

In 2006, the National Institute of Standards and Technology (NIST) published a technical publication titled Recommendation for Random Number Generation Using Deterministic Random Bit Generators. In this document, four CSPRNGs are suggested:

  1. Hash-DRBG
  2. HMAC-DRBG
  3. CTR-DRBG
  4. Dual_EC_DRBG

(Note: DRBG stands for deterministic random bit generators, which means the same as a CSPRNG.)

Among these, Dual_EC_DRBG was withdrawn in 2014 due to the discovery of a backdoor. The backstory is interesting but off-subject. You can find more information here if you are curious.

In our discussion of PRNGs, we focused on the basic form where the PRNG is initialized with a seed, and then it generates pseudorandom numbers. In this basic form, it is impossible to provide additional entropy to the PRNG. However, allowing the infusion of additional entropy during operation can enhance security. Without this capability, a PRNG could be entirely compromised if its state or entropy source is compromised for a period. On the other hand, if it is possible to provide additional entropy, the PRNG can recover to a secure state after sufficient entropy is supplied. Importantly, its internal state must change each time new entropy is provided.

These are three key components of a PRNG:

  1. Setup(s)\textsf{Setup}(s): PRNG is initialized with a seed ss.
  2. Refresh(I)\textsf{Refresh}(I): PRNG accepts a new entropy II.
  3. Next(n)\textsf{Next}(n): PRNG generates the nn bits output based on the current state.

Now let's take a look into how to generate a cryptographic secure PRNG. Excluding Dual_EC_DRBG, the other three CSPRNGs are based on the symmetric primitive, such as hash functions and block ciphers. This choice is reasonable because standardized algorithms for hash functions and block ciphers exist, and their security properties are widely accepted. Even if a vulnerability is found for a specific symmetric primitive, it's not a catastrophic scenario because transitioning to more secure algorithms is relatively easier than constructing new DRBGs.

As it is the most straightforward, Hash-DRBG is the best example to demonstrate how a CSPRNG works.

Hash-DRBG

Hash-DRBG is a DRBG built from the cryptographic hash function denoted as HH. To aid in understanding, we will introduce a simplified version. In this version, we denote the state as SS.

  1. Setup(s)\textsf{Setup}(s): Initialize the state as S=H(s)S = H(s).
  2. Refresh(I)\textsf{Refresh}(I): Update the state as S=H(SI)S = H(S \|\| I).
  3. Next(n)\textsf{Next}(n):
    • Generate a sequence R=H(S)H(S+1)...H(S+k)R = H(S) \|\| H(S+1) \|\| ... \|\| H(S+k) where kk is the smallest number that Rn\|R\| \geq n.
    • Update the state as S=H(SH(S))S = H(S \|\| H(S)).
    • Return the leftmost nn bits of RR.

These three DRBGs (Hash-DRBG, HMAC-DRBG, and CTR-DRBG) are constructed using symmetric primitives, and their security can be proven based on the security properties of these symmetric primitives, utilizing the concept of indistinguishability.

Note that there may be a slight gap between the claimed security of these DRBGs and their theoretical analysis, as strong assumptions are made in the proofs. However, these DRBGs are generally considered secure enough for real-world use, and their security has been the subject of ongoing research until relatively recently, as evidenced by papers presented in EUROCRYPT 2019 (such as An Analysis of the NIST SP 800-90A Standard).

How to Properly Generate Random Numbers

To properly generate random numbers, there are only two things we need to keep in mind. First, the seed should be generated from enough entropy, and second, a CSPRNG should be used instead of a weak PRNG, such as default random modules in each language. (For more, see this post, which enumerates the methods to generate secure random numbers in various programming languages.)

Conclusion

In this article, we've discussed the mathematical definitions of CSPRNGs and studied vulnerabilities in the wild. While these vulnerabilities are relatively easy to detect and defend against, their occurrence can lead to the exposure of sensitive information and significant damage.

As PRNGs are a fundamental component in almost every software program, there is always a risk that a developer might improperly generate random numbers for sensitive data. Therefore, it is essential for developers to ensure that random numbers are generated in a secure way and that auditors track the logic for the PRNG in code.

About Us

Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

If you think you’d be a good fit to work at Zellic, we’re hiring!

Cayden Liao

An intro to Zero-Knowledge and applications of zero-knowledge including ZK payments, ZK hardware acceleration, and zkVMs
Article heading

Introduction

Last time, we explored the fundamentals of zkSNARKS and their applications in blockchains. We introduced the following:

  • zkEVMs and how they increase transaction speed and scalability on the Ethereum network
  • zkBridges and how they enable secure exchanges between different cryptocurrencies
  • ZK programming languages and how they simplify the development of ZK circuits for software engineers

For a refresher on these aspects of the ZK landscape, see Part 1 here.

Having laid the foundation of the ZK landscape, let’s take a look at a few more applications of zero-knowledge: ZK payments, ZK hardware acceleration, and zkVMs. With these applications in mind, we’ll have taken a brief look at some of the most notable areas of the landscape and arrived at the end of our tour.

As a brief overview of ZK payments, we'll cover an introduction to fully private and secure transactions. For ZK hardware acceleration, we'll cover how zkSNARKs are used to speed up cryptographic operations. Lastly, for zkVMs, we'll cover how zkSNARKs are used to increase the speed of any blockchain network, not just Ethereum.

ZK Payments

Bitcoin, Ethereum, and Solana are all public cryptocurrencies, which means that all transactions, accounts, and data are public. However, in some cases, private transactions are necessary. ZK payments aim to use ZK proofs to ensure three criteria:

  1. Private transaction amounts. The amount of money transferred between accounts must be hidden to other users. Only the sender and the receiver know how many coins were transferred.
  2. Private user identities. Traditional public blockchains often expose the addresses of the sender and receiver. On the other hand, ZK payments ensure the privacy of the addresses in public.
  3. Private transaction history. Information such as transaction history and account balance is kept hidden from the public blockchain. While data is still being uploaded to the main chain, sensitive information is kept secret.

On the surface level, public blockchains seem to be private. Take for example this Ethereum address. Does this look like a person's identity?

0xB256227cfc6209fCC7bA1c4acb9329721Acb89d4

Although it might not look like it, the person's transactions and activities are public. These bits of information allow for others to trace a person's identity and be able to fairly quickly monitor their blockchain usage at all times.

The bulk of ZK payments is made through designs based on Zcash. In a nutshell, Zcash allows users to transfer coins on the Zcash blockchain, giving users the choice of keeping their transactions public or private.

Transactions on Zcash use a UTXO structure similar to Bitcoin's. When a sender processes a transaction, the sender posts two notes. One note contains the sender's new balance, while the other contains the amount of money being sent. To stay private, these notes are encrypted with ChaCha20-Poly1305. ChaCha20 is a symmetric encryption algorithm similar to AES and was chosen in this instance for its fast encryption speed. Only the sender and receiver can decrypt the notes with their own set of keys. The two new encrypted notes are then hashed and committed into a central Merkle tree.

Up to this point, the public is unable to verify the transaction since the notes are encrypted. This is where ZKPs come in. Older versions of Zcash, before the release of Sapling activation in 2018, used BCTV14, a slower, older, more energy-intensive proving system than Groth16. Given that Groth16 was created in 2016, Zcash stuck with BCTV14 to ensure the safety of Groth16 before implementing Sapling activation two years later. With Sapling activation, Groth16 is chosen for its fast proving/generating speeds. This allows others to verify the validity of the private transaction without revealing any data on the users themselves.

(As a fun fact, these proving systems are both named after the research papers that originally proposed them. BCTV14 was published in 2014 by authors whose names start with B, C, T, and V. The Groth16 paper was published in 2016 by Jens Groth.)

Zcash specifically uses two ZK circuits and a value-commitment check:

  1. Spend circuit. The spend statement is a ZK proof that is in charge of making sure a transaction is able to go through correctly. This circuit creates a proof that confirms to a verifier that the note and Merkle path is valid, the original spender is authorized to perform this transaction, the previous note is nullified, and the balances on the two new notes add up to the original note.

  2. Output circuit. The output statement is a ZK proof in the creation of new coins. It allows verifiers to confirm that the note created for the coin is valid along with the value commitment.

  3. (Homomorphic) Pedersen commitment. The Pedersen commitment, used by Zcash (Zcash Protocol Specification, p. 18), is used whenever a value is committed. The homomorphic property allows Pedersen commits to be added and subtracted. For example, to ensure coins aren't created/destroyed during a transaction, the value commit to the sender's new balance can be added to the value commit to the sent coins and checked to ensure the total amount of coins stays the same.

ZK Hardware Acceleration

Our next stop is on accelerating ZKP generation. Currently, generating ZKPs for transactions is a time- and resource-consuming process. However, work is in progress to speed this up using specialized hardware. Proof generation has gone from the usage of CPUs to now GPUs and even specially designed circuits on FPGAs.

CPUs are slow. They are used for everyday computation that isn't fit for math-intensive ZK proof generation. People gradually moved to use GPUs since graphic cards were designed to perform special-purpose, vectorized calculations at much faster speeds (e.g., gaming and high--frame rate video streaming). However, GPUs were still designed for a wide variety of uses. The community began to explore solutions leveraging FPGAs, which are even more specialized.

FPGAs are pieces of hardware that can be bought off the shelf. Users can reconfigure the circuits inside to fit specific circuit requirements. Rather than having dedicated sections of GPUs/CPUs for everyday computer calculations, the entire FPGA circuitry can be changed and dedicated to ZK. The most time-consuming parts of proof generation come from two categories.

  1. (Fast) Fourier transforms (FFTs). A ZKP typically uses Fourier transforms to perform division over an elliptic pairing. Although FFTs run in subexponential time, they are still the bulk of a ZKP generation.

  2. Multiplications. Most ZKPs consist of elliptic curve multiplications, which are computationally expensive. Since a single gate cannot express an elliptic operation, multiple binary gates must be stringed together in order to perform elliptic operations.

To address these challenges and speed up the proof-generation process, a solution can be found through the development of application-specific integrated circuits (ASICs), specialized pieces of hardware specifically built for the purpose of ZKP generation. By hardwiring the required mathematical operations into hardware, ASICs can perform these computations much more efficiently and in parallel, dramatically reducing the time taken to generate ZKPs.

The use of ASICs for ZKP generation offers several advantages, including increased speed, reduced power consumption, and improved scalability. These specialized hardware solutions can optimize the most time-consuming operations, allowing zkSNARKs to be more practical for high-performance applications, such as privacy-preserving transactions, secure communications, and verifiable computations.

However, using ASICs comes with drawbacks as well. ASICs are best suited for mature applications that rarely change, like SHA-256 POW mining. On the other hand, ZKP generation continues to evolve rapidly. Designing custom ASICs for ZKPs requires significant resources, and it's essential to ensure the security of the hardware to prevent potential attacks. Additionally, unlike FPGAs, once an ASIC is hardwired, it is permanently ingrained in the chip. If the ZKP algorithm is changed or an update is needed, a whole new ASIC must be used and the original would be useless. Not to mention the tape-out costs of creating ASICs, typically ranging from $2 to $3 million. Thus, the development costs of ASICs have made them prohibitive for ZKPs so far.

One example is Cysic, whose approach is based on FPGAs and ASICs. Customers submit the ZK circuit they seek to accelerate, and Cysic builds it for them. Without relying on GPUs, FPGAs and ASICs are much faster, resulting in less time and resources during proof generation.

zkVMs

Now let's travel to our last stop on ZK virtual machines (zkVMs), also known as zkCPUs. Contrary to zkEVMs from the last blog, zkVMs is a broader concept applicable to any binary program, making it more versatile in its use cases. zkVMs run precompiled binaries and generate a verifiable proof that the binary ran correctly. This proof can be independently verified by anyone without needing to see the actual inputs, outputs, or intermediate values of the computation.

Unlike zkEVMs, zkVMs are also not tied down to a blockchain. zkVMs can be used for general purposes that are not run on Ethereum. Since zkVMs take precompiled binaries as an input, programs can be written in a variety of programming languages such as Rust, C, and C++, all compiled languages. Considering that Solidity development is often a bottleneck, access to the broader ecosystem of general software is a major benefit.

Inside the zkVM, each assembly instruction in the binary is associated with an R1CS gate to check for verification. For example, in pseudocode, it might look something like this:

if instruction is "add a, b":
add constraint for c = add a,b
else if instruction is "mul a, b":
add constraint for c = mul a,b

However, zkVMs have certain limitations. Due to the large amount of assembly instructions and R1CS gates for each instruction, it is typically more suitable for smaller binaries with a manageable complexity. Running larger and more complex programs might be impractical or less efficient using a zkVM. For larger binaries, circuits need to be optimized in order to allow for fast ZKP generation. Specific computations need a more optimized method to prevent slowdowns in the zkVM.

RISC Zero is a zkVM developed for general-purpose use. On a simple level, a binary is used as the input and the VM extracts each assembly instruction. For example, here is how memory loads are handled in RISC Zero:

pub fn ram_load(&mut self, triple: &TripleWord) -> Result<bool> {
trace!("RAM_LOAD[{}]: {triple:?}", self.cycle);
self.start();
self.code[ControlIndex::RamLoad] = BabyBearElem::ONE;
self.code[ControlIndex::Info] = BabyBearElem::new(triple.addr);
(
self.code[ControlIndex::Data1Lo],
self.code[ControlIndex::Data1Hi],
) = split_word16(triple.data[0]);
(
self.code[ControlIndex::Data2Lo],
self.code[ControlIndex::Data2Hi],
) = split_word16(triple.data[1]);
(
self.code[ControlIndex::Data3Lo],
self.code[ControlIndex::Data3Hi],
) = split_word16(triple.data[2]);
self.next()
}

After extracting all assembly instructions, they are run one by one as listed below. Along the way, constraints are added to the main circuit, which proves the validity of the program.

pub fn body(&mut self) -> Result<()> {
debug!("BODY");
loop {
self.start();
self.code[ControlIndex::Body] = BabyBearElem::ONE;
if !self.next_fini(FINI_TAILROOM)? {
break;
}
}
Ok(())
}

Rather than using a zkSNARK, RISC Zero uses a zkSTARK that includes four processing phases. Phase 1 is essentially the commitment phase. The prover creates trace polynomials after each step and commits them to a Merkle tree. Phase 2 allows the validator to validate the polynomials and send PLONK-mixing parameters, which the prover uses to generate Merkle roots. Phase 3 and Phase 4 use the DEEP-FRI technique, which asks the prover for values of a polynomial outside of the original Merkle tree commitments. Of course, this is a vast oversimplification of the entire process.

Conclusion

This two-part article toured the fundamentals of ZKPs — from zkSNARKs and their implications in zkEVMs, ZK programming languages, and zkBridges all the way to ZK payments, ZK hardware acceleration, and zkVMs. The world of zero-knowledge is a vast and fascinating one, and we’ve just touched the surface.

This guide can be used to kickstart further exploration of ZKPs. ZKPs have become increasingly important to ensure the security and validity of transactions on the blockchain as well as prove knowledge in other aspects. As the ZK landscape continues to evolve and expand, the potential for real-world applications will grow exponentially.

Special thanks to fellow Zellic engineer Mohit Sharma for his help and input in this article.

About Us

Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

William Bowling

A brief look at Cairo 1.0, Starknet, and the security considerations
Article heading

Introduction

In this article, we will be looking at the recently released Cairo 1.0, Starknet's native language. We will give a brief introduction to Cairo and Starknet, explore some security features of Cairo, and look at some potential pitfalls when writing contracts in Cairo. For anyone considering writing contracts in Cairo, this article will give you a starting point and some things to consider when writing secure code.

Meet Cairo 1.0

Cairo 1.0 is a Rust-inspired language designed to allow anyone to create STARK-provable smart contracts. It is the native language of Starknet, which is a zkRollup designed for high throughput and low gas costs. For the purposes of this article, we will be focusing on the security features of Cairo when used to write smart contracts on Starknet.

First, let's begin by walking through a simple contract written in Cairo 1.0:

#[contract]
mod hello {
use starknet::get_caller_address;
use starknet::ContractAddress;

struct Storage {
last_caller: ContractAddress,
}

#[event]
fn Hello(from: ContractAddress, value: felt252) {}


#[external]
fn say_hello(message: felt252) {
let caller = get_caller_address();
last_caller::write(caller);
Hello(caller, message);
}

#[view]
fn get_last_caller() -> ContractAddress {
last_caller::read()
}
}

If you have used Rust before, then the above code might look familiar, as Cairo 1.0 is heavily inspired by it. If you are unfamiliar with it, then starklings-cairo1 is a great place to start. Starklings is to Cairo what Rustlings is to Rust, a set of small, interactive exercises to help you learn the language.

The default type of variable in Cairo 1.0 is a field element called felt252, which is an integer in the range 0x<P0 \leq x < P, where PP is a very large prime P=2251+172192+1P = 2^{251} + 17 * 2^{192}+1. (For a more detailed explanation, see the felt-type section in The Cairo Programming Language book.) All other types in Cairo are built on top of felt252 such as the integer types of u8 to u256. It is recommended to use these higher-level types when possible as they provide additional safety features such as overflow protection.

When writing Starknet contracts, there are a few special attributes that are used to allow the compiler to generate the correct code. The #[contract] attribute is used to define a Starknet contract, similar to the contract keyword in Solidity.

A contract may be required to interact with another contract or to have some knowledge about the current execution state (for example, the caller address). This is where system calls come in, which allow the contract to interact with and use services from the Starknet OS. Most of the time the system calls are abstracted away or hidden behind helper methods, but you can see a list of the available system calls here.

The #[event] attribute is used to define an event that can be emitted by the contract. Similar to Solidity, events are used to notify the outside world of state changes in the contract and are emitted using the emit_event_syscall system call or by calling the helper function generated by the compiler, which is annotated with the #[event] attribute.

The #[external] attribute is used to define a function that can be called by the outside world, similar to the external keyword in Solidity. The #[view] attribute is designed to indicate that a function does not modify the contract state, although this is not enforced by the compiler, so state changes are possible if the function is called on chain.

The struct Storage is a special struct that the compiler uses to generate helper methods for interacting with the contract's storage using the low-level system calls storage_read_syscall and storage_write_syscall. In a Starknet contract, the storage is a map of 22512^{251} slots that can each be read or modified. Each slot is a felt that is initially set to 0. The fields in the Storage struct are turned into modules with read and write methods that automatically calculate the correct location in the storage map (see here for how the address is calculated) and can be used to read and write to the storage.

Before you can deploy a contract on Starknet, the contract class must first be declared on the network. Each declared class on the network is represented by a clash hash (see here for how the hash is calculated), which uniquely identifies it and can be used to deploy new contract instances.

Beyond Ethereum: Starknet Accounts

Unlike Ethereum, Starknet does not have externally owned accounts (EOAs). Instead, accounts are special contracts that can define their own logic and rules. Here is the interface for a generic account contract:

#[account_contract]
mod Account {
use starknet::ContractAddress;

#[constructor]
fn constructor(public_key_: felt252);

fn isValidSignature() -> felt252;

#[external]
fn __validate_deploy__(
class_hash: felt252, contract_address_salt: felt252, public_key_: felt252
) -> felt252;

#[external]
fn __validate_declare__(class_hash: felt252) -> felt252;

#[external]
fn __validate__(
contract_address: ContractAddress, entry_point_selector: felt252, calldata: Array<felt252>
) -> felt252;

#[external]
#[raw_output]
fn __execute__(mut calls: Array<Call>) -> Span<felt252>;
}

For a contract to be a valid account, it must at least implement the __validate__ and __execute__ functions and optionally implement the others. The __validate__ function should ensure that the transaction was initiated by the account owner, and the __execute__ function will perform the remaining actions. (See the "Validate and execute" section of the Starknet documentation for more details.)

The implementation could be as simple as checking an ECDSA signature, or it could be anything from a multi-sig to allowing multicalls. For a more in-depth look at accounts, see the Starknet documentation, the chapter "Account Abstraction" in The Starknet Book, and OpenZepplin's account implementation.

Potential Pitfalls in Cairo

One of the main benefits of Starknet being a zkRollup is that contracts written in Cairo allow for execution traces to be proved and verified on Ethereum L1. It has been designed to provide flexibility, but this also could lead to insecure code. In this section, we will look at some potential pitfalls.

Overflows

When using integer types such as u128 and u256, there is now some nice built-in overflow protection that will cause a panic --- for example,

    let a: u128 = 0xffffffffffffffffffffffffffffffff;
let b: u128 = 1;
let c: u128 = a + b;

// Run panicked with [39878429859757942499084499860145094553463 ('u128_add Overflow'), ].

This is not the case when using felts directly as overflows are still possible:

    let a: felt252 = 0x800000000000011000000000000000000000000000000000000000000000000;
let b: felt252 = 1;
let c: felt252 = a + b;
c.print();

// [DEBUG] (raw: 0)

Reentrancy

If you mark a trait with the #[abi] attribute, then the compiler will automatically generate two dispatchers based on the trait name; for example, for the trait ICallback, the generated names will be ICallbackDispatcher and ICallbackLibraryDispatcher. A dispatcher is a simple struct that wraps the call_contract syscall, allowing you to call other contracts. The library dispatcher is different in that the current contract's context and storage will be used when executing the external code, similar to delegatecall in Solidity. (See The Cairo Programming Language's section on dispatchers for more details.)

Since the contract dispatcher passes control to the external contract, it is possible for the external contract to call back into the current contract, which could lead to reentrancy bugs. For example, consider the following contract:

#[abi]
trait ICallback {
#[external]
fn callback();
}

#[contract]
mod reentrancy {
use option::OptionTrait;

use starknet::get_caller_address;
use starknet::ContractAddress;
use super::ICallbackDispatcher;
use super::ICallbackDispatcherTrait;


struct Storage {
balances: LegacyMap::<ContractAddress, u256>,
claimed: LegacyMap::<ContractAddress, bool>,
}

#[external]
fn claim(callback: ContractAddress) {
let caller = get_caller_address();
if !claimed::read(caller) {
ICallbackDispatcher { contract_address: callback }.callback();
balances::write(caller, balances::read(caller) + 100);
claimed::write(caller, true);
}
}

#[external]
fn transfer(to: ContractAddress, amount: u256) {
let caller = get_caller_address();
balances::write(caller, balances::read(caller) - amount);
balances::write(to, balances::read(to) + amount);
}

#[view]
fn get_balance(addr: ContractAddress) -> u256 {
balances::read(addr)
}
}

The claim function allows a user to claim 100 tokens from the contract if they have not already claimed them, but since the callback happens before the state is updated, it's possible for a contract to call the claim function repeatedly and claim as many tokens as they want:

use starknet::ContractAddress;

#[abi]
trait IClaim {
#[external]
fn claim(callback: ContractAddress);
}

#[contract]
mod hello {
use starknet::get_caller_address;
use starknet::get_contract_address;

use super::IClaimDispatcher;
use super::IClaimDispatcherTrait;

struct Storage {
count: u256,
}

#[external]
fn callback() {
if (count::read() < 10) {
count::write(count::read() + 1);
IClaimDispatcher { contract_address: get_caller_address() }.claim(get_contract_address());
} else {
count::write(0);
}
}
}

When using the library dispatcher, you must provide a class hash instead of a contract address, so you cannot accidentally use the wrong dispatcher. The executed code will use the same context and storage as the current contract, so the class hash must be trusted.

Storage Clashes

When using the storage struct, the underlying address of the storage slot is calculated using sn_keccak(variable_name) (sn_keccak is the first 250 bits of the Keccak256 hash). If you are using other modules or external libraries that have a similar storage struct, then it's possible for the storage slots to be identical and overwrite each other. For example, consider the following contract:

// foo.cairo
#[contract]
mod foo {
struct Storage {
num: u256,
}

fn get_num() -> u256 {
num::read()
}

fn set_num(n: u256) {
num::write(n);
}

}


// bar.cairo
use super::foo::foo;

#[contract]
mod bar {
struct Storage {
num: u128,
}

#[external]
fn set_num(n: u128) {
num::write(n)
}

#[view]
fn get_num() -> u128 {
num::read()
}

#[view]
fn foo_get_num() -> u256 {
super::foo::get_num()
}

#[external]
fn foo_set_num(n: u256) {
super::foo::set_num(n);
}
}

Both of the setters are writing to the same storage slot, except one is expecting a u256 and the other a u128, so when calling set_num, the bottom 128 bits of num will be set and the top 128 bits will not be changed. For example, if we call foo_set_num with 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, then set_num with 0x1234 then foo_get_num, we get the following output:

starknet call --address 0x005e942196b3e1adfac0e1d2664d69671188237db343067ab61048e63957487c --function foo_get_num
4660 0xffffffffffffffffffffffffffffffff

The Next Chapter: Cairo 2.0

Even though 1.0 has only just been released, the language keeps evolving, and 2.0 will be released soon. The majority of code will be compatible, and there will be a six-month period where both syntaxes will be valid. The main incoming changes are designed to allow the compiler to enforce that view functions do not modify the state of the contract, making it much more explicit and easier to know if a function will modify the state. The new interface syntax will be something similar to the following:

#[starknet::interface]
trait ICounterContract<TContractState> {
fn increase_counter(ref self: TContractState, amount: u128);
fn decrease_counter(ref self: TContractState, amount: u128);
fn get_counter(self: @TContractState) -> u128;
}

The trait for a Starknet interface is now required to explicitly state whether it requires a ref to the contract state, which allows it to be modified and implicitly returned at the end of a function, whereas the "@" symbol indicates that the contract state is an immutable snapshot and cannot be modified. For more information, see the full details of the upcoming changes at the official community post.

Cairo: The Verdict

The changes from Cairo 0 to 1.0 are a great step towards making the language easy to use and adding some nice security features. The upcoming changes in Cairo 2.0 will make it easier to reason about where the state is modified and help catch mistakes earlier. However, it is still possible to write insecure code, and so it is important to understand the underlying system, the potential pitfalls, and how to avoid them.

Digging Deeper: Cairo Resources

About Us

Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

BaarkingDog

A cryptographic deep dive into mathematical attacks on ZK hash functions
Article heading

Introduction

In our last post, we discussed the design rationale and performance of various ZK-friendly hash functions. These include MiMC, Poseidon, Vision, and Rescue. Although the structures may differ slightly, the underlying concept remains the same: representing the circuit as simple expressions in a large field, it allows efficient proof in terms of both prover execution time and proof size. However, the question remains: Are they truly secure enough to be entrusted with billions of dollars? Let's dive into the attacks that are particularly effective against ZK-friendly hash functions.

ZK-Friendly Hash Functions

From a high-level perspective, ZK-friendly hash functions are constructed using algebraically simple keyed permutations, called arithmetization-oriented ciphers (AOCs). AOCs are based on either SPN or Feistel networks. Then the permutations are transformed into hash functions using the sponge framework.

(In addition, new permutations have been proposed recently, such as this and this. Both were accepted to CRYPTO 2023. We recommend checking out these papers because they offer significant performance benefits.)

As keyed permutations are structured as multiple rounds of linear and nonlinear operations, the higher the round count, the more secure the hash function but the worse the performance. The designers must consider every possible attack and determine the lower bounds of the round count required to mitigate them.

Maturation Takes Time, Including AOCs

Unfortunately, although the designers carefully consider every possible attack and even add security margins, this is not strong evidence that the structure is safe. As we will see in the examples presented in this post, even if the designer has done their best, there may be new attacks that the designer has not considered. Even peer-reviewed papers published in top conferences can be vulnerable to attacks. For instance, a cipher targeting homomorphic encryption named Chaghri was published in ACM CCS 2022, but an attack was discovered only three months after its publication.

New primitives often become adopted by industry after enough time has passed since their publication. This is independent of the claims made in the original paper and is instead based on the fact that enough people have analyzed them to determine that they are mature enough for practical use.

And from a designer's perspective, it is important to consider a variety of attacks, including those we will discuss as well as those evolving, to create a robust security design.

In this post, we will discuss algebraic attacks that are particularly efficient against ZK-friendly hash functions and should be taken into consideration, including attacks on the actual proposed primitive. Although you do not extensive ZK knowledge to understand this article, you may check out our latest ZK related blog post if you are a beginner to ZK.

Algebraic Attacks

Before starting to introduce algebraic attacks, we should clarify targets of attacks. The basic secure properties of hash functions are preimage resistance, second-preimage resistance, and collision resistance. Therefore, the attacker's primary goal is to recover preimages or find collision pairs. However, from the point of view of designers, it is hard to convince someone that a their hash function satisfies security properties.

Meanwhile, ZK hash functions are usually based on the block ciphers. Once the key is fixed, the block cipher becomes a permutation, and these permutations are transformed into hash functions using the sponge framework. It is mathmetically proven that the hash function derived from sponge construction is secure if the permutation is secure. Therefore, instead of demonstrating security for the hash function directly, designers show that a block cipher used inside the sponge construction is secure. Conversely, an attacker would like to show that the block cipher is insecure. For example, if attacker is given plaintext-ciphertext pairs (Pi,Ci)(P_i, C_i) and enable to recover key KK faster than testing all the possible key candidates, it is called key-recovery attack. If the block cipher is insecure, it weakens the designer's claim that the hash function created from the block cipher is secure.

ZK-friendly hash functions are represented by simple expressions in a large field. In traditional cipher design, the strongest attacks are considered to be statistical attacks called differential and linear cryptanalysis. However, when the field is large, these attacks are not suitable even for a small number of rounds. On the other hand, targeting low arithmetic complexity in attacks is effective.

Interpolation Attack

An interpolation attack is an attack that constructs a polynomial corresponding to the encryption function without knowing the key. Let Ek:FqFqE_k : \mathbb F_q \rightarrow \mathbb F_q be an encryption function. For a fixed key kk, the polynomial Ek(x)E_k(x) can be constructed using Lagrange interpolation --- for example, when we consider two rounds of MiMC, Ek(x)=((x+k)3+k+c1)3+kE_k(x) = ((x+k)^3 + k + c_1)^3 + k. This is a polynomial with degree 99 in terms of xx.

Once the attacker constructs the polynomial Ek(x)E_k(x), then can obatin ciphertext of any plaintext, which has the same effect as knowing the key. Moreover, the attacker may extract key kk from Ek(x)E_k(x) because the coefficients of x8x^8 is 9k9k.

When a polynomial has a degree of dd, the complexity of constructing a Lagrange interpolation polynomial is O(dlogd)O(d\log d). Therefore, the degree of the polynomial used to represent the encryption function should be high enough. For instance, in the MiMC, the round number rr is determined as r=log3qr = \lceil \log_3 q \rceil to ensure that the degree dd attains the maximum degree of q1q-1.

An interpolation attack targets a univariate equation in terms of plaintext. If the plaintext is Fqm\mathbb F_q^m instead of Fq\mathbb F_q, then this attack cannot be applied directly.

GCD Attack

The GCD attack is a key recovery attack that involves computing the GCD. In the GCD attack, the variable is the key KK. We denote the encryption function as E(K,x)E(K, x). For the plaintext-ciphertext pair (x,y)(x, y), we have E(K,x)y=0E(K, x) - y = 0 for K=kK = k. Now, let's consider two polynomials derived from the two pairs (x1,y1)(x_1, y_1) and (x2,y2)(x_2, y_2). Both E(K,x1)y1E(K, x_1) - y_1 and E(K,x2)y2E(K, x_2) - y_2 have a common factor of (Kk)(K-k), because E(k,x1)y1=E(k,x2)y2=0E(k, x_1) - y_1 = E(k, x_2) - y_2 = 0. (KK denotes the variable of the key, and kk is the actual value of the key.) This common factor can be determined by computing the GCD.

The complexity of finding the GCD of two polynomials with a degree of dd is O(dlog2d)O(d\log^2d). Therefore, keeping the maximum degree also thwarts this attack. Fortunately, degree grows exponentially as the number of rounds increases, so designers can build defenses against interpolation attacks and GCD attacks without much difficulty.

The GCD attack targets a univariate equation in terms of a key. If the key is in Fqm\mathbb F_q^m instead of Fq\mathbb F_q, then this attack cannot be applied directly.

Invariant Subfields Attack

If the round constants or linear layers are poorly chosen, the ciphertext range may be limited to the subfields. For instance, consider the case where the field is F_2n\mathbb F\_{2^n} and F_2m\mathbb F\_{2^m} is a subfield of F_2n\mathbb{F}\_{2^n}. This implies that mm divides nn. If all the round constants cic_i and the key kk in MiMC are in the subfield F_2m\mathbb F\_{2^m}, an adversary can ensure that Ek(x)F_2mE_k(x) \in \mathbb F\_{2^m} by selecting a plaintext xF_2mx \in \mathbb F\_{2^m}, and this reduces the possible key space 2n2^n to 2m2^m for an attacker --- for example, in MiMC, if the field is F_2128\mathbb F\_{2^{128}} and all round constants cic_i and the key kk are in the F_264\mathbb F\_{2^{64}}. Then, if the attack setting is chosen as a plaintext attack, an attacker may notice that the ciphertext is always in F_264\mathbb F\_{2^{64}} if the plaintext is in F_264\mathbb F\_{2^{64}}. Therefore, an attacker is trying to test only 2642^{64} key candidates instead of 21282^{128}.

This attack can be easily prevented by selecting a random round constant, such as a nothing-up-my-sleeve number. In other words, if an implementation utilizes custom round constants and the rationale behind their selection is undisclosed, there may be vulnerabilities to an invariant subfield attack. Please note that the invariant subfield attack is only effective on binary fields, as prime fields do not have subfields.

Linearization

Linearization is a technique that represents multivariate equations as linear equations by transforming every monomial as new variables. Consider below the multivariate equations in F_11\mathbb F\_{11}:

x1+x2x3=6x1x2+x1x3+x1=3x2x3+x2=2x1x2+x1+x3=8x1x2+x1x3+x3=7x2x3+x1+x2=2.\begin{aligned} x_1 + x_2x_3 = 6\\ x_1x_2 + x_1x_3 + x_1 = 3\\ x_2x_3 + x_2 = 2\\ x_1x_2 + x_1 + x_3 = 8\\ x_1x_2 + x_1x_3 + x_3 = 7\\ x_2x_3 + x_1 + x_2 = 2 \end{aligned}.

Then, by replacing every monomial to new variables like so,

x1=y1x2=y2x3=y3x1x2=y4x1x3=y5x2x3=y6,\begin{aligned} x_1 = y_1\\ x_2 = y_2\\ x_3 = y_3\\ x_1x_2 = y_4\\ x_1x_3 = y_5\\ x_2x_3 = y_6 \end{aligned},

the equations become a system of linear equations, and they can be easily solvable by Gaussian elimination.

Let nn be a number of unique monomials and mm be a number of equations. To solve given equations using linearization, nmn \leq m must be hold, and the time complexity is O(nω)O(n^\omega), where ω\omega is the minimum exponent for the complexity of matrix multiplication. At least every element in the matrix should be checked before multiplication, lower bounds of ω=2\omega = 2. And the current state-of-the-art algorithm is ω=2.371866\omega = 2.371866. To show a resistance on the linearization, the designer should claim the number of monomials nn is enough that attack complexity is higher than expected security parameters.

For example, for Fq\mathbb F_q with q2128q \approx 2^{128}, let key be k=(k0,k1,k2)Fq3k = (k_0, k_1, k_2) \in \mathbb F_q^3. The designer wants to show this primitive has 128-bits security. If E(k,x)E(k, x) has a max degree 24812^{48}-1, then the number of monomials that can be constructed from the equation is approximately 248×3=21442^{48 \times 3} = 2^{144}, since the degrees of k1,k2,k3k_1, k_2, k_3 vary 00 to 2482^{48}. Then the linearization time complexity is O((2144)ω)O(2288)O((2^{144})^\omega) \geq O(2^{288}), even if we assume that ω=2\omega = 2. Therefore, this primitive is secure against linearization.

Meanwhile, the important question is of whether all the monomials can actually appear. If the expression is sparse and most of the monomials don't actually occur, then the attack may be possible. Therefore, the designer should claim that most of monomials have appeared theoretically, or experimentally shown using toy parameters.

Gröbner Basis Attack

The Gröbner basis attack is a method for solving a multivariate system by computing the Gröbner basis. Algebraic structures of AOCs allow us to build multivariate equations for key and intermediate values. For example, let's consider three rounds of MiMC. Let I1I_1 and I2I_2 be the outputs of the first and second S-boxes, respectively.

Then, given plaintext-ciphertext pair (x,y)(x, y), we have a multivariate system:

(x+k)3I1=0(I1+k+c1)3I2=0(I2+k+c2)3ky=0\begin{aligned} (x+k)^3 - I_1 = 0\\ (I_1 + k + c_1)^3 - I_2 = 0\\ (I_2 + k + c_2)^3 - k - y = 0\\ \end{aligned}

The benefits of this multivariate system compared to the univariate system y=(((x+k)3+k+c1)3+k+c2)3+ky = (((x+k)^3 + k + c_1)^3 + k + c_2)^3 + k is that the degree of the equations are much lower. If we can solve this multivariate system for k,I1,I2k, I_1, I_2, then the key is recovered. However, while univariate equations are easily solvable using well-studied algorithms like the Kaltofen-Shoup algorithm if the degree is not that high, there is no efficient algorithm for a multivariate system.

Instead of solving a multivariate system directly, the Gröbner basis of multivariate equations always contains a univariate equation. Once we find the Gröbner basis of the given equations, we obtain a univariate equation that we can solve.

So, what is the Gröbner basis? Understanding the Gröbner basis requires a very high level of mathematical proficiency; therefore, we would like to provide only a brief sketch and omit the details. For a deeper understanding of the technical knowledge, we recommend referring to this paper.

Let's consider the following multivariate system:

f1:x2y3y1=0f2:xy+xy=0\begin{aligned} f_1 &: x^2 - y^3 - y - 1 = 0\\ f_2 &: -xy + x - y = 0 \end{aligned}

The red (resp. blue) line denotes the zeroes of $f_1$ (resp. $f_2$). Image created from WolframAlpha.

We can find another multivariate equation that has the same solutions as the above multivariate system by calculating f=m1f1+m2f2f' = m_1 \cdot f_1 + m_2 \cdot f_2, where m1m_1 and m2m_2 are monomials. For example, we choose m1=ym_1 = y and m2=xm_2 = x. Then f3=m1f1+m2f2=y(x2y3y1)+x(xy+xy)=y4y2y+x2xyf_3 = m_1 \cdot f_1 + m_2 \cdot f_2 = y(x^2 - y^3 - y - 1) + x(-xy + x - y) = -y^4 - y^2 - y + x^2 - xy. It is clear that the solution of the system f1,f2{f_1, f_2} is also the solution of f3f_3.

The Gröbner basis of the given multivariate system is earned by repeating the above calculations appropriately. The Gröbner basis has important characteristics:

  1. The solutions of the system derived from the Gröbner basis are the same as the original multivariate system, and
  2. The Gröbner basis has a univariate equation.

For example, the Gröbner basis of f1,f2{f_1, f_2} is g1=x+y4y3+y2y1,g2=y52y4+2y32y2y+1{g_1 = x + y^4 - y^3 + y^2 - y - 1, g_2 = y^5 - 2y^4 + 2y^3 - 2y^2 - y + 1 }. Since g2g_2 is the univariate equation for yy, yy is recovered from g2g_2. After that, xx is recovered by substituting yy into the obtained solution. The (x,y)(x, y) recovered from the Gröbner basis are also solutions of the original system.

The red (resp. blue) line denotes the zeroes of g1g_1 (resp. g2g_2). Images created from WolframAlpha.

To summarize, the Gröbner basis attack follows the following steps.

  1. Calculate a Gröbner basis.

  2. Identify a univariate polynomial in the given basis and solve for its roots.

  3. Substitute this solution into the Gröbner basis and repeat Step 2.

Let's consider the time complexity of a Gröbner basis attack. For LL equations with degrees d1,,dLd_1, \dots, d_L over nn variables, we define P(x)=i(1xdi)(1x)n.P(x) = \frac{\prod_i(1-x^{d_i})}{(1-x)^n}.

The degree of regularity DrD_r is defined as the degree of the term with the first nonpositive coefficient in P(x)P(x). The time complexity of calculating a Gröbner basis is estimated as (n+DrDr)ω,\binom{n + D_r}{D_r}^\omega, where nn is the number of variables and DrD_r is the degree of regularity.

This value may not be intuitive, but it can be understood that as the number of variables nn decrease, the number of equations LL increase, and the degree of equations did_i decrease, the system becomes easier to attack.

Attack Examples

Generally, designers consider all of the above attacks in their primitives and choose their round numbers. However, there might be variants for each of these structures, and attackers try to find them to break the primitives.

We will introduce several attacks for the ZK-friendly hash functions. These attacks result in specification changes or design withdrawals.

Interpolation Attacks on MiMC

The first attack is an interpolation attack on MiMC, which was proposed by Chaoyun Li and Bart Preneel (in this paper). We have discussed that for rr rounds of MiMC, the degree dd of the interpolation polynomial is d=3rd = 3^r. An attacker would need at least 3r+13^r+1 pairs to construct this polynomial, with a time complexity of dlogdd\log d. Furthermore, the memory complexity (3r+1)F(3^r + 1) \cdot \|\mathbb F\| represents the total memory required to store each coefficient. MiMC designers claim that if memory complexity is limited, then the number of rounds can be reduced. For example, the author suggests that 38 rounds are sufficient for MiMC-129/129 when the available bytes are limited to 2642^{64} bytes.

However, Li and Preneel found novel attacks that construct partial polynomials. This attack uses only a negligible amount of memory. Although an attacker may have access to 3r+13^r+1 data points, it is impossible to recover the exact polynomial due to memory limitations. However, it is possible to recover only the second-highest term of the interpolated polynomial. (Please refer to section 3.3 in the paper for more information.) Once the second-highest term is recovered from the interpolation, the value of key kk can be determined as the second highest-term, 3rk3^r \cdot k.

This attack indicates that for MiMC-129/129, the full 82 rounds are necessary, even with restrictions on the memory available to the attacker. However, these results do not affect the security claims of the full round MiMC.

There is an another attack against MiMC that aims the full round MiMC using higher-order differential cryptanalysis (read more here). However, this attack is faster than a brute-force attack but requires 2n12^{n-1} chosen data for binary field F_2n\mathbb F\_{2^n}. Therefore, it is not considered as a realistic threat.

Gröbner Basis Attack on Jarvis and Friday

Jarvis and Friday are a ZK-friendly block cipher and a hash function, respectively (read more here). Since Friday is the Merkle-Damgård construction of Jarvis, it is enough to discuss the structures of Jarvis only. Similar to the S-box used in AES, Jarvis employs an inverse S-box as a nonlinear layer. The inverse S-box in the field F_2n\mathbb F\_{2^n} is defined as inv(x)=x2n2inv(x) = x^{2^n-2}. If x=0x = 0, then inv(x)=0inv(x) = 0, and x×inv(x)=1x \times inv(x) = 1 otherwise.

The structure of Jarvis is relatively simple. The round function is composed of the inverse S-box, degree 44 polynomials BB and CC. B(x)B(x) and C(x)C(x) are formed as follows: B(x)=b4x4+b2x2+b1x1+b0x0B(x) = b_4 \cdot x^4 + b_2 \cdot x^2 + b_1 \cdot x^1 + b_0 \cdot x^0 and C(x)=c4x4+c2x2+c1x1+c0x0C(x) = c_4 \cdot x^4 + c_2 \cdot x^2 + c_1 \cdot x^1 + c_0 \cdot x^0. These equations are the same as those used in Vision. Finally, one round of Jarvis can be represented as Jarvis_i(s)=C(B1(inv(s)))+Kr\textsf{Jarvis}\_i(s) = C(B^{-1}(inv(s))) + K_r, where ii represents the round number index.

One round of Jarvis. Image from Jarvis paper

The inverse S-box is also utilized in their key schedule, as shown below.

One round of the key schedule of Jarvis --- $C_{i+1}$ is a round constant. Image from Jarvis paper.

Even with just one or two inverse S-boxes, it is possible to achieve maximum degrees. Moreover, the cost for SNARK/STARK is relatively low because only verifying x(1xy)=0x(1-xy) = 0 and y(1xy)=0y(1-xy) = 0 is sufficient to check for y=inv(x)y = inv(x). The authors considered several attacks and suggested using 10, 11, 12, and 14 rounds for n=128,160,192,256n = 128, 160, 192, 256, respectively.

However, they did not consider Gröbner basis attacks, and according to this paper, Jarvis is vulnerable to such attacks.

If only the key is treated as a variable, then the equation is univariate. However, the degree is high, making it difficult to recover the key. However, if we add every intermediate value as variables, the system becomes multivariate. With these intermediate values, the degree becomes dramatically lower and allows to mount Gröbner basis attacks.

For each round, xi=B1(inv(s))x_i = B^{-1}(inv(s)) is considered as a variable.

Intermediate state $x_i$ in one round of the encryption path. Image from the attack paper.

The following equations are given:

  • (C(xi)+ki)B(x_i+1)=1(C(x_i) + k_i) \cdot B(x\_{i+1}) = 1: r1r-1 equations of degree 88.
  • B(x1)(p+k0)=1B(x_1) \cdot (p + k_0) = 1: One equation of degree 55.
  • C(xr)=c+krC(x_r) = c + k_r: One equation of degree 44.
  • (k_i+1+ci)ki=1(k\_{i+1}+c_i) \cdot k_i = 1: rr equations of degree 22.

There are 2r+12\cdot r+1 variables x1,,xrx_1, \dots, x_r and k0,,krk_0, \dots, k_r with 2r+12\cdot r+1 equations. Unfortunately this breaks only a reduced round of Jarvis, but with the optimizations suggested in the paper, the number of variables are decreased and succeed in attacking the full round of Jarvis (see section 4.2 of the paper).

From this attack, we realized that with intermediate variables, the expression could be represented in a different form to make the attack possible. In particular, ZK-friendly hash functions often allow expressions to be represented in low degrees if they are well-transformed for efficiency, but care must be taken not to make them vulnerable to a Gröbner basis attack.

Invariant Subspace Attacks on Starkad

Starkad is a ZK-friendly hash function that has the same structure as Poseidon, with the only difference being that Starkad uses binary fields while Poseidon uses prime fields. The authors of Poseidon initially proposed both Starkad and Poseidon, but they have since withdrawn Starkad.

Let's review the structures of Starkad. The S-box layer in the middle is made of partial S-box layers. This would reduce the R1CS or AET cost.

Construction of Starkad, which is same as Poseidon. $R_F = 2 cdot R_f$ is the number of the full S-box rounds, and $R_P$ is the number of the partial S-box rounds. Image from the Poseidon paper.

The purpose of the linear layer is to spread local changes onto the entire state. In Starkad, the linear layer is a Cauchy matrix, which is defined by M[i,j]=1xi+yjM[i,j] = \frac{1}{x_i + y_j} for the entries of xi_1im{x_i}\_{1 \leq i \leq m} and yi_1im{y_i}\_{1 \leq i \leq m}, which are pairwise distinct, and xi+yj0x_i + y_j \neq 0, where i1,,mi \in {1, \dots, m} and j1,,mj \in {1, \dots, m}. To generate the sequences xi_1im{x_i}\_{1 \leq i \leq m} and yi_1im{y_i}\_{1 \leq i \leq m}, a constant rr and a starting point x0x_0 are first chosen. The variables xix_i and yiy_i are defined as follows: xi=x0+i1x_i = x_0 + i - 1 and yi=xi+ry_i = x_i + r. The matrix is generated and fixed once, and the designers choose x0=0x_0 = 0 and r=mr = m, where mm represents the number of words in each layer.

However, this MDS matrix allows for large invariant subspaces that pass through the entire middle partial S-box rounds without activating any S-boxes. This renders the middle RPR_P rounds linear, as noted in the attack paper. More precisely, for a certain invariant subspace VV, if ΔV\Delta \in V and xy=Δx \oplus y = \Delta, then F(x)F(y)=L(Δ)F(x) \oplus F(y) = L(\Delta), where FF represents the middle layer and LL represents a linear transformation. This affects security in terms of both statistical and algebraic cryptanalysis.

Although this attack can be mitigated by selecting a value for mm that is not divisible by 4 or by choosing another secure MDS matrix, designers decided to dismiss Starkad and let only Poseidon remain.

Conclusion

In this post, we've discussed several algebraic attacks and their usages in the primitives. When designing new AOCs, be sure to consider whether they are secure against attacks like the ones mentioned in this post. This is an active area of research, so it is recommended to periodically check places like the Cryptology ePrint Archive for new attacks.

About us

Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

If you think you’d be a good fit to work at Zellic, we’re hiring!

Cayden Liao

An intro to Zero-Knowledge and applications of zkSNARKs including zkEVMs, zkBridges, and zk Programming Languages
Article heading

Introduction

Zero-knowledge proofs (ZKPs) represent a family of cryptographic protocols that enable one party (denoted the prover) to produce proofs of knowledge for a set of values denoted witness values. The proofs allow the prover to prove knowledge of the witness values to another party (denoted the verifier) without disclosing any additional information about the values themselves.

The aforementioned mechanism extends quite naturally to proving computational integrity --- in other words, a particular computation has been performed correctly and honestly. Every operation in a computation may be represented as a constraint on the inputs and outputs of the operation. If a prover presents the output for a computation, along with a proof that they know has valid state values for every constraint in the computation, verifying the proof is equivalent to verifying that the output was computed correctly for some inputs. Therefore, the verifier is assured of the validity of the output without having to perform the computation.

Most notably, ZK proofs are used in cryptocurrencies to improve trust, privacy, and stability on the blockchain. By increasing the level of security without compromising the integrity of transactions, ZK proofs play a crucial role in these decentralized systems. In this blog, we will take a surface level walk-through of the landscape and real-world applications of zkSNARKs.

What Are zkSNARKs?

Specifically, zkSNARKs is an acronym for zero-knowledge succinct noninteractive argument of knowledge.

  1. ZK: ZK stands for zero-knowledge. As explained above, it allows a verifier to verify that a proof made by a prover is valid. Also, the verifier does not learn anything about the specific information in the proof. This characteristic ensures a high level of privacy as it allows sensitive information to remain hidden while still providing sufficient proof of its legitimacy.
  2. S: S stands for succinct. This means that the length of the proof has to be relatively small. How small? Usually, these proofs are only a few hundred bytes in length. The small size of the proofs allow for blazing fast verification in only a few milliseconds.
  3. N: N stands for noninteractive. This property is by far the most important property of zkSNARKs. A noninteractive proof means that the proof can be generated without any interaction from the verifier. Once the proof is created, it can be sent to a verifier, who can then confirm or refute the proof. The lack of interaction means that the proof can be publicly posted for anyone to verify without the need to communicate to the original prover.
  4. AR: AR stands for argument (of) and goes along with the next letter.
  5. K: K stands for knowledge. Combining with AR, the letters "ARK" stand for argument of knowledge. An argument of knowledge is a type of proof where a prover convinces a verifier that they have knowledge of a certain piece of information. This is done under a computational assumption; it's computationally infeasible for a dishonest prover to convince a verifier of a false statement. This is different from the zero-knowledge property since it focuses more on the soundness of the proof.

zkSNARKs are important mostly due to their noninteractive property. On the blockchain, it is infeasible to communicate with every user and have each verify a proof. It is much easier for the prover to post a proof to the public and have anyone verify it without having to communicate back to the prover.

Applications Into Blockchains & Real World

zkEVMs

Due to the increasing popularity of Ethereum, transaction times have been longer, fees have increased, and scalability has become a challenge. A solution to these challenges is by using zkEVMs along with zkRollups to help solve these problems. A zkEVM is a combination of zero-knowledge proofs with the Ethereum Virtual Machine. It essentially uses ZKPs to prove that a transaction happened in the zkEVM.

More specifically, the zkEVM lives on a Layer 2, a separate blockchain that utilizes and extends the main Ethereum chain. Now, smart contracts can be uploaded and the EVM runs the smart contract byte code. To prove that a state change occurred when interacting with a smart contract, the ZK part of the zkEVM creates a noninteractive proof that anyone can verify on that transaction.

As the blockchain increases in the number of blocks, it becomes computationally infeasible to go through every block. zkRollups come to the rescue. Put simply, zkRollups take a large chunk of user transactions and squeeze them into one mega block. It also attaches a proof onto this new block, which allows anyone to verify that the new block was made up of a large chunk of transactions. zkRollups offer a solution to the ever-increasing number of blocks on the blockchain.

By "rolling up" a large number of transactions into a single block, it greatly reduces the amount of data needed to be stored on the blockchain, increasing scalability and efficiency. Instead of having each transaction directly uploaded to the blockchain, transactions are bundled up first and then uploaded.

This is Polygon’s solution:

Polygon is one of the first creators of zkEVMs. Polygon is a Layer 2 scalability solution that uses a zkEVM for users to interact with to increase the speed of transactions and lower transaction fees. Right now, the main Ethereum chain can handle around 30 transactions per second (TPS). By uploading transactions to Polygon, they can be processed at speeds of up to 2,000 TPS. To achieve this speed, Polygon uses a side chain where lots of transactions are bundled up through a zkRollup before being sent to the main Ethereum blockchain to reduce the total communication to the main chain.

Along with faster transaction speeds, the cost per transaction is also reduced. Polygon achieves this in the same way by using zkRollups. Since the interaction with the main Ethereum blockchain is minimal, the gas fees for each interaction is shared across all the transactions that were rolled up. This splits a normal gas fee of about $20 to a few pennies for every user.

zkBridges

As more and more cryptocurrencies are created and the blockchain ecosystem diversifies, the demand for communicating and exchanging different coins increases. Given that each cryptocurrency is on its own blockchain, crypto bridges are used to transfer coins from one blockchain to another. Right now, two popular ways that people trade cryptocurrencies are through a centralized trusted party or by verifying consensus.

A centralized party for exchanging different cryptocurrencies is the same as currency exchange in the real world. A bank or a trusted party allows one to deposit some coins in exchange for other coins. This trusted party acts as a market maker. However, there are a few problems with this. First, in order for this exchange to occur, the market maker must have enough of both coins (i.e., liquidity) to perform a trade. Using Ethereum and Polygon MATIC as an example, if one wants to exchange their Ethereum for MATIC, the centralized trusted party must have enough MATIC coins to meet the value of the Ethereum coins being traded. If no one is trading from MATIC back to Ethereum, then the process stops: there isn’t enough MATIC to cover for the Ethereum coins. Second, if the central party is untrustworthy, they could steal all the deposited coins and disappear into thin air (i.e., a rug pull).

The zkBridge between Ethereum and Polygon MATIC.

Another solution to the crypto exchange problem is through zkBridges. It brings ZK proofs with crypto bridges to provide exchanges between different cryptocurrencies that use the EVM without the need of an intermediate trusted party. Let's use a transaction between Ethereum and Polygon as an example.

When a transaction is made and a block is created on the main Ethereum blockchain, it needs to be validated. To accomplish this, validators create a proof of consensus by using their own keys to sign the block. After a certain number of validators have signed the block and the signatures are verified, there becomes a consensus that the block is indeed valid.

After achieving a proof of consensus, a Merkle proof is created for the specific block and transaction and sent to a smart contract on the Polygon chain. Since smart contracts cannot communicate with each other cross-chain, a middle party is still used. However, these middle parties are more trustworthy since they can't mess with the proofs without invalidating them.

Succinct Labs takes advantage of this and uses a modified proof of consensus and Merkle proofs to achieve a zkBridge on cryptocurrencies that use the EVM. For Succinct Labs, a proof of consensus is formed by using 512 validators that are rotated every 27 hours to sign all new block headers on the main Ethereum chain with Groth16 (used by Tornado Cash). These proofs can be validated very efficiently on chain due to the constant-time proof verification of Groth16. When the validity of the proof is determined, a Merkle proof is used to relay the transaction across the cross-chain completing the zkBridge and allowing coin exchange.

zkProgramming Languages

In the development of ZK proofs or ZK circuits, Rust is typically chosen as the language to write them in. However, this requires both an understanding of the Rust language and an understanding of ZK circuits and libraries. Right now, most ZK libraries are quite bare. Developers are required to read large amounts of documentation just to find out how to use a function safely.

Due to this, the innovation of ZK technologies has been quite slow. The goal of a ZK programming language is to simplify the development of ZK circuits. By moving all the complex math behind the scenes, writing ZK circuits can be treated as writing conditionals in a normal programming language. It would allow regular software engineers without a deep understanding of how ZK circuits work to create them efficiently.

The need for ZK programming languages arises from the fact that coding up ZK circuits involves lots of math and cryptographic operations. Understanding these concepts require specialized knowledge in areas like cryptography and number theory. This creates a barrier for software engineers who may not have a strong background in these domains. Instead of using a ZK library to create constraints for a conditional statement, an if statement written in Noir is automatically converted to a circuit conditional. For example, to set a value a that depends on a boolean b without Noir,

let a = b*f(x) + (1-b)*g(x);

With a ZK programming language, it's as simple as

if b {
let a = f(x);
} else {
let a = g(x);
}

without the boolean algebra behind it.

Noir is a ZK programming language created by Aztec to bridge the gap between software engineers and ZK circuits. By using a Rust-like language and syntax, developers don’t have to relearn a new language and can create ZK circuits easily. Additionally, Noir already implemented a lot of useful ZK gadgets in their standard library including Merkle proofs, ECDSA, field methods, and more. For example, using a Merkle proof is as easy as this:

use dep::std;

fn main(message : [Field; 62], index : Field, hashpath : [Field; 40], root : Field) {
let leaf = std::hash::hash_to_field(message);
let merkle_root = std::merkle::compute_merkle_root(leaf, index, hashpath);
assert(merkle_root == root);
}

However, Noir doesn't compile code directly to a ZK circuit. During compilation, the written code compiles into an intermediate phase called the abstract circuit intermediate representation (ACIR). It compiles all code into R1CS, which allows users to choose the best proving system needed for their use. The ACIR code is then finally compiled with the chosen proving system on the abstract circuit virtual machine (ACVM), which acts as a ACIR compiler.

Conclusion

Overall zk-SNARKs provide a powerful tool for enhancing privacy and trust in blockchain systems. With a diverse range of applications (zkEVMs, zkBridges, zkProgramming languages), they can fit into a wide variety of needs on the blockchain. By allowing parties to prove integrity of computational results, ZK opens avenues for not only privacy applications but delegating computations since off-loaded workloads can be computed verifiably.

Special thanks to fellow Zellic engineer Mohit Sharma for their help and input in this article.

About Us

Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

BaarkingDog

An exploration of MiMC, Poseidon, and Vision/Rescue, along with a deep dive into SNARK and STARK complexity
Article heading

Introduction

Hash Function

A cryptographic hash function is one of the basic primitives for cryptography. (In this article, the word hash function will mean cryptographic hash function unless otherwise noted.) A cryptographic hash function satisfies the below three secure properties:

  1. Preimage resistance: Given hash value hh, it is hard to find any input message mm that f(m)=hf(m) = h.
  2. Second-preimage resistance: Given input message mm, it is hard to find message m(m)m'(\neq m) that f(m)=f(m)f(m')=f(m).
  3. Collision resistance: It is hard to find any two different input messages m1,m2(m1)m_1,m_2(\neq m_1) that f(m1)=f(m2)f(m_1)=f(m_2).

Cryptographic schemes such as digital signatures, message authentication codes, commitment schemes, and authentications are built on top of hash function. For example, the preimage resistance property of a hash function enables the proof-of-work system in cryptocurrency. SHA-2 and SHA-3 are common examples of hash functions, and they are adopted for general purposes.

Usage of Hashes in the ZK protocol

A hash function is also useful in the ZK protocol. One famous example is the membership check in a Merkle tree. For a Merkle root rr, the prover will claim knowledge of the witness w1,,wnw_1,\dots,w_n such that

H(H(H(w1,w2),w3)),wn)=rH(\dots H(H(w_1,w_2),w_3))\dots,w_n)=r

to prove their knowledge of element w1w_1, which is a member of the Merkle tree. We already discussed this usage in our previous post, “How Does Tornado Cash Work?”.

Traditional uses of hash functions can be utilized in the ZK protocol as well, such as integrity checks using vanilla hash computation, retrieving pseudorandom strings from the fixed length seed, and signing transactions.

Why Are ZK-Friendly Hash Functions Required?

The standardized hash functions such as SHA-2 and SHA-3 are extensively studied, and their security is widely believed. Moreover, traditional software and hardware implementations of them are efficient compared to their competitors.

So when we need to evaluate a hash in ZK protocols, it seems nice to use SHA-2 or SHA-3. However, many ZK protocols are using relatively unfamiliar hash functions such as MiMC, Poseidon, and Rescue instead of SHA-2 and SHA-3.

The main reason for this is that efficiency in ZK protocols is determined in quite different ways from traditional metrics such as running time, power consumption, and gate count. The efficiency of the circuits in ZK protocols depends on their algebraic structure.

Generally, if the circuit is represented as simple expressions in a large field, it allows efficient proof in terms of prover execution time and proof size. Unfortunately, traditional hash functions are inappropriate for this.

For example, SHA-2 is about 50--100 times more inefficient than ZK-friendly hash functions when the hash is computed in zk-STARK. This huge performance gap demonstrates the need for a ZK-friendly hash function.

In this post, we will introduce several ZK-friendly hash functions that are introduced in famous conferences and journals. It might be fun to focus on the design rationale for each hash function.

ZK Protocols

We first discuss the ZK protocols. Rather than trying to fully understand the protocols, we'll focus on their high-level characteristics and what the metric is to determine efficiency.

zk-SNARK

zk-SNARK is an acronym that stands for zero-knowledge succinct non-interactive argument of knowledge.

  1. Zero-knowledge: Verifier learns nothing other than the statement is true.
  2. Succinct: Proof size and verifier time is sublinear.
  3. Non-interactive: Needs no interaction between prover and verifier.
  4. Argument of knowledge: Not only for existence of input xx but also prover's knowledge of xx.

While there are several ways to build a protocol that satisfies the above characteristics, such as linear PCP + pairing-based cryptography, constant-round polynomial IOP, and polynomial commitment scheme + IOP-based, Groth16 based on the linear PCP and pairing-based cryptography is the most widely used one.

Groth16 and many other zero-knowledge proof systems such as Aurora, Ligero, and Bulletproofs are evaluating the circuit in Rank-1 Constraint System (R1CS) form.

R1CS Examples

The R1CS is a system of equations, where each equation is defined by a triplet of vectors (a_i,b_i,c_i)(\vec{a}\_i,\vec{b}\_i,\vec{c}\_i) such that (a_is)×(b_is)=(c_is)(\vec{a}\_i \cdot \vec{s}) \times (\vec{b}\_i \cdot \vec{s}) = (\vec{c}\_i \cdot \vec{s}).

For example, y=x3y = x^3 can be interpreted as two equations defined by a triplet of vectors (with intermediate value z=x2z = x^2):

x×x=z:((100)(xyz))×((100)(xyz))=((001)(xyz)),x \times x = z : \left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right) \times \left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right) = \left( \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right),
z×x=y:((001)(xyz))×((100)(xyz))=((010)(xyz)).z \times x = y : \left( \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right) \times \left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right) = \left( \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right).

In the SNARK setting, the cost is determined by the number of constraints in the R1CS. For example, xαx^\alpha for the constant α\alpha requires log2(α)\lceil \log_2(\alpha) \rceil constraints. This implies that x5x^5 and x7x^7 have the same cost in the SNARK setting.

zk-STARK

zk-STARK is an acronym that stands for zero-knowledge scalable transparent argument of knowledge.

  1. Zero-knowledge: Verifier learns nothing other than that the statement is true.
  2. Scalable: Prover time is quasilinear; proof size and verifier time is sublinear.
  3. Transparent: No trusted setup is needed.
  4. Argument of knowledge: Not only for existence of input xx but also for prover's knowledge of xx.

Any protocol that satisfies the above conditions is called a zk-STARK. However, zk-STARK often refers to the protocol in the paper where the concept was first proposed. We also refer to the protocol in that paper as zk-STARK in this article.

To evaluate the circuits in a STARK setting, the circuit should be transformed into algebraic intermediate representation (AIR).

AIR Examples

We give an example of computing a Fibonacci sequence. The Fibonacci sequence is defined as a0=1,a1=1,ai=a_i1+a_i2 for i2.\begin{aligned} a_0 &= 1, \\ a_1 &= 1, \\ a_i &= a\_{i-1} + a\_{i-2} \text{ for } i \geq 2. \end{aligned}

The prover wants to prove that a_7=21a\_{7} = 21. Then an algebraic execution trace (AET), an execution trace of a computation of the Fibonacci sequence, is

aa
1
1
2
3
5
8
13
21

And the AIR constraints described in the polynomial forms are

  1. a01=0,a_0 -1 = 0,
  2. a11=0,a_1 -1 = 0,
  3. a_ia_i1+a_i2=0 for 2i7,a\_{i} - a\_{i-1} + a\_{i-2} = 0 \text{ for } 2 \leq i \leq 7,
  4. a7=21.a_7 = 21.

Let tt be a number of rows and ww be a number of columns in AET. Then the size of the AET is twt \cdot w. Moreover, let dd be the maximal degree of an AIR constraint. In our case, t=8,w=1,d=1t = 8, w = 1, d = 1. The efficiency of a zk-STARK is determined by t,w,dt, w, d. For each circuit, the efficiency is compared by measuring twdt \cdot w \cdot d.

ZK-Friendly Hash Functions

Roughly speaking, the simpler the algebraic representation of a given circuit, the more efficient it is for both R1CS and AIR representations. Therefore, cryptographers began to design structures that were both algebraically simple and still secure. We often called these ciphers as arithmetization-oriented ciphers (AOCs).

For AOCs, most traditional symmetric cryptanalysis such as differential and linear cryptanalysis are generally less relevant. The most powerful attack type for AOCs is algebraic attack, such as Gröbner basis, linearization, GCD, and interpolation attack. Once the designer decides their cipher's design, the round number is chosen to be safe from those attacks.

On the other hand, this field is relatively new and interesting attacks (see examples one, two, three, and four) have been claimed, often resulting in ciphers being broken or parameters being modified, but we'll talk about those in the next post.

Let's take a look at the design rationale behind the different AOCs.

MiMC

MiMC is the first ZK-friendly hash function, introduced in ASIACRYPT 2016. Although more efficient ZK-friendly hash functions are suggested after MiMC, MiMC is still widely used in many applications since it is considered mature compared to the other ZK-friendly hash functions.

The design of MiMC is extremely simple: a function F(x):=x3F(x) := x^3 is iterated with subkey additions. This concept was already proposed by Nyberg and Knudsen in the 1990s, called KN-Cipher.

MiMC-n/nn/n

We first take a look at a block cipher. The block cipher is constructed by iterating rr rounds, where each round function is described as Fi(x)=(x+k+ci)3F_i(x) = (x + k + c_i)^3. In the round function, cic_i is a round constant and kk is a key, and the field is Fq\mathbb F_q where qq is a large prime or q=2nq = 2^n. The encryption process is defined as $$ E_k(x) = (F_{r-1} \circ F_{r-2} \circ \dots \circ F_0)(x) + k. $$

and c0c_0 is fixed to 00 because c0c_0 does not affect security. (If you're interested, I recommend checking it out for yourself!)

For a field Fq\mathbb F_q, F(x)=x3F(x) = x^3 has an inverse if and only if gcd(3,q1)=1\gcd(3, q-1) = 1. Therefore, the author suggested choosing odd nn for q=2nq = 2^n and q=3k+2q = 3k+2 for prime qq.

The round constants cic_i are not specified in the paper, but cic_i is recommended to be chosen randomly at first and hardcoded into the implementation. To show nothing up my sleeve on the round constants, it is often generated from the SHA-2 (or any other secure hash functions) digest of certain messages like MiMC0, MiMC1, \dots.

The round number is determined by r=log3qr = \lceil log_3 q \rceil. For example, the round number r=log3(2127)=81r = \lceil log_3 (2^{127}) \rceil = 81 for a field F_2127\mathbb F\_{2^{127}}. We discuss the reason of this later.

$r$ rounds of MiMC-$n/n$, images from MiMC paper. rr rounds of MiMC-n/nn/n. Images from MiMC paper.

MiMC-2n/n2n/n

It is also possible to construct a block cipher with the same nonlinear permutation in a Feistel network. The round function of MiMC-2n/n2n/n is described in the below figure and can be defined as $$ x_L || x_R \leftarrow x_R + (x_L + k + c_i)^3 || x_L. $$

MiMC-2n/n2n/n round function. Images from here.

For each round, only half of the data is changed; therefore, the number of rounds for the Feistel version is 2log3q2 \cdot \lceil log_3 q \rceil, which is doubled from the number of rounds of MiMC-n/nn/n.

The Hash Function

When the key is fixed at 00, then the block cipher becomes a permutation. Given a permutation, there is a well-known construction deriving hash functions from a permutation called the sponge framework, which is proven secure and used in many other hash functions, including SHA-3.

The sponge construction. PiP_i is input; ZiZ_i is hashed output. The unused capacity cc should be twice the desired collision or preimage attacks. Images from Wikipedia.

The sponge construction consists of two phases called absorb and squeeze. In the absorbing phase, the input data is absorbed into the sponge. Afterwards, the result is squeezed out.

The hash function from MiMC is also based on the sponge framework. In the absorbing phase, message blocks are added to a subset of the state, then transformed as a whole using a permutation function ff (in our case, an MiMC permutation). Note that all the other ZK-friendly hash functions introduced in this post are also based on the sponge framework.

Security Analysis

While interpolation attack, GCD attack, invariant subfields attack, differential cryptanalysis, and linear cryptanalysis are considered in the paper, we will only introduce interpolation attack in here, which provides the lower bound of the round numbers. An interpolation attack constructs a polynomial corresponding to the encryption function without knowledge of the secret key. If an adversary can construct such a polynomial, then for any given plaintext, the corresponding ciphertext can be produced without knowing the secret key.

Let EkE_k be an encryption function with degree dd in terms of input xx. Then with the d+1d+1 plaintext-ciphertext pairs, Ek(x)E_k(x) can be constructed using Lagrange's theorem, and the complexity of constructing a Lagrangian interpolation polynomial is O(dlogd)O(d \log d). In our case, degree d=3rd = 3^r. By choosing r=log3qr = \lceil log_3 q \rceil, the degree dd reaches q1q - 1 and the attack becomes infeasible.

SNARK Complexity

Since the single constraint can square the value, there are two R1CS constraints in each round of MiMC permutation:

x2 = x*x
x3 = x2*x

To generate a 256-bits output hash for the 256-bits input xx, the sponge construction is used. In details, calculate MiMC-2n/n2n/n of (x,0)(x, 0), then left 256-bits are hash value. The round number is chosen as 2log3(2256)=3242 \cdot \lceil log_3 (2^{256}) \rceil = 324. Then the number of constraints is 324×2=628324 \times 2 = 628. Note that field should be chosen as gcd(3,q1)=1\gcd(3, q-1) = 1, which means that F_2256\mathbb F\_{2^{256}} should not be chosen.

STARK Complexity

In AIR representations, for MiMC-2n/n2n/n, each row has two variables, xLx_L and xRx_R. We denote xL,xRx_L, x_R in ii-th row as x(i)_L,x(i)_Rx^{(i)}\_L, x^{(i)}\_R. Then from the round function xLxRxR+(xL+k+ci)3xLx_L \|\| x_R \leftarrow x_R + (x_L + k + c_i)^3 \|\| x_L, the constraint polynomial is defined as

  1. x(i+1)_L=x(i)_R+(x(i)_L+k+ci)3,x^{(i+1)}\_L = x^{(i)}\_R + (x^{(i)}\_L + k + c_i)^3,
  2. x(i+1)_R=x(i)_L.x^{(i+1)}\_R = x^{(i)}\_L.

Then the number of rows t=324t = 324, the number of columns w=2w = 2, and the maximal degree d=3d = 3. The efficiency metric is twd=1944.t \cdot w \cdot d = 1944.

Implementations

Although the authors suggested x3x^3 as S-box, this S-box is often replaced to other S-boxes such as x5,x7x^5, x^7 if gcd(3,q1)=1\gcd(3, q-1) = 1. For example, the BN254 field is FqF_q for 254-bits prime qq, which is gcd(3,q1)1\gcd(3, q-1) \neq 1. As a result, when using MiMC on the BN254, it is preferred to use x5x^5 or x7x^7 instead of x3x^3 (example 1, example 2). In this case, the round number is determined as log5q\log_5 q or log7q\log_7 q, which mitigates all the attacks presented so far.

For the round constants, the authors do not specify how to generate them in the paper. Therefore, each implementation will determine the round constants by computing the hash value of strings like MiMC0 and MiMC1 using SHA-2 or SHA-3. If someone creates their own MiMC implementation without disclosing how the round constants were generated, there might be a potential vulnerability, such as being susceptible to an invariant subfields attack.

Poseidon

Poseidon is a yet another ZK-friendly hash function suggested in USENIX 2021. Poseidon is based on the HADES design strategy, which is a generalization of a substitution-permutation network (SPN), suggested in EUROCRYPT 2020.

HADES Design Strategy

SPN is a famous block cipher algorithm. For example, AES uses the SPN structure. By applying enough alternating rounds of substitution boxes and permutation boxes, the attacker is unable to recover a key from plaintext-ciphertext pairs.

HADES is composed of three steps: Add Round Key, SubWords, and MixLayer.

This seems to be the same approach with SPN, but the main difference between the existing SPN and HADES is that some S-box layers in HADES are partial S-box layers, which means that a single S-box is applied for the rightmost element and identity is applied for the others. The author argued that this would reduce the R1CS or AET cost.

Poseidon Hash Function

The Poseidon hash function follows the same construction as HADES, with replacing Add Round Key to Add Round Constants. Poseidon only considers the prime field Fp\mathbb F_p. Let mm be the number of words in each layer (to avoid confusion with the number of rows tt in STARK, we use mm instead of tt unlike the notation in the paper). The graphical overview is shown below.

Construction of Poseidon. RF=2RfR_F = 2 \cdot R_f is the number of the full S-box rounds, and RPR_P is the number of the partial S-box rounds. Images from Poseidon paper.

The S-box is defined by S-box(x)=xα\text{S-box}(x) = x^\alpha, where α3\alpha \geq 3 is the smallest positive integer that satisfies gcd(α,p1)=1\gcd(\alpha, p-1) = 1.

The purpose of the linear layer is to spread local changes onto the entire state, so usually the linear layer is chosen as an MDS matrix, and its effects are negligible on the cost. In Poseidon, the linear layer is a Cauchy matrix, which is defined by M[i,j]=1xi+yjM[i,j] = \frac{1}{x_i + y_j} for the entries of xi_1im\\x_i\\\_{1 \leq i \leq m} and yi_1im\\y_i\\\_{1 \leq i \leq m} are pairwise distinct and xi+yj0x_i + y_j \neq 0, where i1,,mi \in {1, \dots, m} and j1,,mj \in {1, \dots, m}. This was an original suggestion in Poseidon; however, some Cauchy matrixes are insecure if there is an invariant subspace trail. We omit the details of the attack. The interested reader may refer to the paper. Therefore, the linear layer should be carefully chosen to avoid this type of attack.

The structure is highly parameterizable, and the suggested parameters for a 256-bits output hash are m=3,RF=8,RP=57m = 3, R_F = 8, R_P = 57 for an α=5\alpha = 5 over Fp\mathbb F_p with \approx 256-bit pp. RF=8R_F = 8 is chosen to prevent statistical attacks, and RP=57R_P = 57 is chosen to prevent algebraic attacks.

SNARK Complexity

For α=5\alpha = 5, S-box xαx^\alpha is represented by three constraints:

x2 = x*x
x4 = x2*x2
x5 = x4*x

And the total S-box number is mRF+RPm \cdot R_F + R_P. Therefore, the total number of constraints for α=5\alpha = 5 is 3(mRF+RP)3 \cdot (m \cdot R_F + R_P). For m=3,RF=8,RP=57m = 3, R_F = 8, R_P = 57, it requires 276276 R1CS constraints.

STARK Complexity

If we maintain mm variables for each round, then the number of rows is t=RF+RPt = R_F + R_P, the number of columns is w=kw = k, and the maximal degree is d=αd = \alpha. Then the efficency metric is twd=(RF+RP)mαt \cdot w \cdot d = (R_F + R_P) \cdot m \cdot \alpha. However, this AET representation does not take advantage of partial S-box rounds. Instead of this, by treating every S-box output as a variable, the number of rows is t=1t = 1, the number of columns is w=mRF+RPw = m \cdot R_F + R_P, and the maximal degree dd is still α\alpha. For m=3,RF=8,RP=57m = 3, R_F = 8, R_P = 57, the efficiency metric is twd=1815=425t \cdot w \cdot d = 1 \cdot 81 \cdot 5 = 425.

Vision / Rescue

Vision / Rescue are ZK-friendly hash functions following the Marvellous design strategy. They are suggested in FSE 2020.

Marvellous Design Strategy

Marvellous design is a substitution-permutation network parameterized by the tuple (q,m,π,M,v,s)(q, m, \pi, M, v, s):

  • qq : The field is Fq\mathbb F_q, with qq either a power of 22 or a prime number,
  • mm : The state is mm elements in Fq\mathbb F_q,
  • π=(π0,π1)\pi = (\pi_0, \pi_1) : The S-boxes,
  • MM : An MDS matrix,
  • vv : The first step constant, and
  • ss : The desired security level.

In each round of a Marvellous design, a pair of S-boxes (π0,π1)(\pi_0, \pi_1) is used alternately. The difference between π0\pi_0 and π1\pi_1 is in their degree. For one, π0\pi_0 should be chosen with a high degree when evaluated forward and a lower degree when evaluated backward. The other S-box, namely π1\pi_1, is chosen with the opposite goal, meaning that it has a low degree in the forward direction and a high degree in the backward direction. This choice has three benefits:

  1. No matter which direction an adversary is trying to attack, the degree is guaranteed to be high.
  2. It results in the same cost for the encryption and decryption functions.
  3. Owing to nonprocedural computation, the low-degree representation of each S-box can be evaluated efficiently.

For Vision / Rescue, π0\pi_0 is obtained directly from π1\pi_1: π0=π11\pi_0 = \pi_1^{-1}.

The linear layer is chosen by the Vandermonde matrix. As described in Poseidon, there is no security claim related to the Vandermonde matrix, and any other MDS matrix should be fine.

The one interesting thing that differentiates the Marvellous from the others is that it uses a heavy key schedule. In MiMC and HADES, there is no special key schedule and the master key is added for each round. However, in Marvellous, the key schedule reuses the round function. The author claimed that this is a kind of security margin as the domain of AOCs is relatively new. Note that a heavy schedule does not affect the efficiency of hash functions from fixed-key permutations because every subkey is derived in the offline phase.

Vision

Vision is meant to operate on binary fields with its native field F_2n\mathbb F\_{2^n}. To construct the S-boxes, we select degree 44, which we denote by BB. More precisely, there are no x3x^3 terms in B(x)B(x): B(x)=b4x4+b2x2+b1x1+b0B(x) = b_4 \cdot x^4 + b_2 \cdot x^2 + b_1 \cdot x^1 + b_0. Then BB is linearized on F2\mathbb F_2 and it gives benefits for other computation settings such as MPC. However, it is irrelevant with the SNARK/STARK setting, so we don't discuss it any further.

After BB is chosen, π0\pi_0 and π1\pi_1 is π0(x)=B1(x1)\pi_0(x) = B^{-1}(x^{-1}) and π1(x)=B(x1)\pi_1(x) = B(x^{-1}).

Finally, the number of rounds is doubled for the longest reaching attack, which means a 100% safety margin is added, and the round number varies from 10 to 16.

A single round (two steps) of Vision. Images from Marvellous paper.

Rescue

The second family of algorithms in the Marvellous universe is Rescue. To build the S-box, the smallest prime α\alpha such that gcd(α,q1)=1\gcd(\alpha, q-1) = 1 should be found. Then the S-boxes are set to be π0(x)=x1/α\pi_0(x) = x^{1/\alpha} and π1(x)=xα\pi_1(x) = x^\alpha. The number of rounds is doubled for the longest reaching attack, as in Vision and the round number it varies from 1010 to 2222.

A single round (two steps) of Rescue. Images from Marvellous paper.

SNARK Complexity

For Vision, we should consider both inverse mappings x1x^{-1} and B(x)B(x) --- B1(x)B^{-1}(x) is analogous with B(x)B(x). First, for x1x^{-1}, let y=x1y = x^{-1}. Then if x=0x = 0, then y=0y = 0, and if x0x \neq 0, then xy=1xy = 1. Putting this together, we get the necessary and sufficient condition that x(1+xy)=0x(1+xy) = 0 and y(1+xy)=0y(1+xy) = 0. It is represented by three constraints:

z = x*y
eq1 = x*(1+z)
eq2 = y*(1+z)

Second, for BB, it is represented by two constraints because its degree is 44:

x2 = x*x
x4 = x2*x2

Note that only x4x^4 and x2x^2 are evaluated, and B(x)=b4x4+b2x2+b1x1+b0B(x) = b_4 \cdot x^4 + b_2 \cdot x^2 + b_1 \cdot x^1 + b_0 is not yet derived. However, this does not matter because the structure of R1CS allows us to multiply and add variables by a constant later.

A single round consists of two steps, and 3m+2m=5m3m + 2m = 5m constraints are added for each step. Therefore, the number of total constraints is 10mr10 \cdot m \cdot r for the round number rr. From the recommended parameters q2256,m=3,r=12q \approx 2^{256}, m = 3, r = 12, the number of constraints is 10×3×12=36010 \times 3 \times 12 = 360.

For Rescue, only B(x)=xαB(x) = x^\alpha and B(x)=x1/αB(x) = x^{1/\alpha} are nonlinear operations, and for α=3\alpha = 3, each B(x)B(x) and B1(x)B^{-1}(x) requires two constraints, as we saw in MiMC. Therefore, 4m4m constraints are added for each round, and the total number of constraints is 4mr4 \cdot m \cdot r for the round number rr. From the recommended parameters q2256,m=3,r=22q \approx 2^{256}, m = 3, r = 22, the number of the constraints is 4×3×22=2644 \times 3 \times 22 = 264.

STARK Complexity

For Vision, the author presents an AIR with w=2m,t=2,d=2w = 2m, t = 2, d = 2 for each step. The first mm elements correspond to the original state (say S[i]S[i]) and the last mm elements are stored auxiliary values (say R[i]R[i]). For the first cycle, R[i]=1R[i] = 1 if S[i]=0S[i] = 0, and R[i]=0R[i] = 0 otherwise. We denote the value corresponding to S[i]S[i] for the next row as S[i]S^\prime[i]. Our goal is to make S[i]=S[i]1S^\prime[i] = S[i]^{-1}. Then by setting the below constraints, the values of R[i]R[i] and S[i]S^\prime[i] are verified to be correct.

  1. S[i]S[i]R[i]=0,S[i] \cdot S^\prime[i] - R[i] = 0,
  2. S[i](1R[i])=0,S[i] \cdot (1 - R[i]) = 0,
  3. S[i](1R[i])=0.S^\prime[i] \cdot (1 - R[i]) = 0.

For the second cycle, R[i]=S[i]2R[i] = S[i]^2. Then S[i]=B(S[i])S^\prime[i] = B(S[i]) and R[i]R[i] are verified by the below constraints:

  1. R[i]S[i]2=0,R[i] - S[i]^2 = 0,
  2. S[i]b4R[i]2b2R[i]b1S[i]b0=0.S^\prime[i] - b_4 \cdot R[i]^2 - b_2 \cdot R[i] - b_1 \cdot S[i] - b_0 = 0.

When the round number is rr, w=2m,t=4r,d=2w = 2m, t = 4r, d = 2. From the recommended parameters q2256,m=3,r=12q \approx 2^{256}, m = 3, r = 12, the efficiency metric is twd=4862=576t \cdot w \cdot d = 48 \cdot 6 \cdot 2 = 576.

For Rescue, no auxiliary values are required. Moreover, it is possible to combine B(x)B(x) and B1(x)B^{-1}(x) in one equation. Therefore, for rr rounds, w=m,t=r,d=3w = m, t = r, d = 3. From the recommended parameters q2256,m=3,r=22q \approx 2^{256}, m = 3, r = 22, the efficiency metric is twd=2333=207t \cdot w \cdot d = 23 \cdot 3 \cdot 3 = 207.

Conclusion

In this article, we've discussed the design rationale and performance of various AOCs. There are more AOCs that we haven't mentioned, such as GMiMC, Jarvis, and Friday, but their design rationale follows similar principles to the ones we've covered.

The overall SNARK and STARK performance is shown below.

AOCR1CS(SNARK)AET(STARK)
MiMC6281944
Poseidon276425
Vision360576
Rescue264207

As you can see on the table, Rescue has competitive results in both SNARK and STARK. In fact, STARKWARE also recommended Rescue from the STARK-friendly hash in their own survey.

In recent years, the hash functions 2-3x more efficient than Rescue in SNARK settings are proposed: Anemoi and Griffin. They are not yet fully peer-reviewed and need to be validated by academics and industry, but their overwhelming performance means that they can be considered for use once they are matured.

Meanwhile, the algebraic simplicity of AOCs often allows attackers to attack ciphers. Therefore, it is recommended to use ciphers that have been published and proven to be reliable over time rather than using them based on their efficiency alone. In the next article, we'll cover algebraic attacks on AOCs.

BaarkingDog

A breakdown of the mathematical principles behind Tornado Cash
Article heading

Introduction

Tornado Cash is an open-source decentralized coin mixer on the Ethereum network, developed by Roman Semenov, Alexey Pertsev, and Roman Storm. It was launched in December 2019, and as of March 2023, 3.66M ETH are deposited with 112K ETH in the pool (Tornado Cash Analysis). The feature provided by Tornado Cash is a kind of double-edged sword: it provides strong anonymity so Ethereum co-founder Vitalik Buterin could send anonymous funds to aid Ukraine, but it allows coins earned from criminal activity to be laundered. For example, both North Korean government hacking group Lazarus and the Euler finance hacker mixed their coins using Tornado Cash.

In this post, we will talk about the mathematical principles behind Tornado Cash. Let’s deep dive into Tornado Cash!

danger

As of August 2022, the U.S. Department of the Treasury’s Office of Foreign Assets has announced that any transactions within the United States or U.S. persons that involve Tornado Cash are prohibited. This post is for educational purposes only. We are not responsible for any misuse of the information.

Background

Hash Function

A hash function is a mathematical function that takes in input messages of arbitrary size and returns a fixed-size string of characters called a hash value. For example, f(x)=x%97f(x) = x \% 97 (remainder of dividing xx by 9797) is a hash function for integer input. The purpose of a hash function is to generate a unique digital fingerprint of the input data, which can be used for various purposes, such as data storage, data retrieval, data integrity checking, digital signatures, and password verification.

We may define a special type of hash function named cryptographic hash function. A hash function ff is called a cryptographic hash function if it satisfies the below three secure properties:

  1. Pre-image resistance: Given hash value hh, it is hard to find any input message mm that f(m)=hf(m) = h.
  2. Second-preimage resistance: Given input message mm, it is hard to find message m(m)m'( \neq m) that f(m)=f(m)f(m') = f(m).
  3. Collision resistance: It is hard to find any two different input messages m1,m2(m1)m_1, m_2(\neq m_1) that f(m1)=f(m2)f(m_1) = f(m_2).

There are many different hash functions available, each hashes with its own characteristics, strengths, and weaknesses. Common examples of hash functions include MD5, SHA-1, SHA-2, and SHA-3. (Note: MD5 and SHA-1 are now deprecated and generally should not be used for cryptographic purposes.)

Merkle Tree

A Merkle tree is a tree data structure used in cryptography and computer science to efficiently verify the integrity and authenticity of large sets of data. The tree structure is based on cryptographic hash functions. To construct a Merkle tree, the elements to be stored are first hashed and the resulting hash values are arranged as the leaf nodes of the tree. Then, pairs of adjacent leaf nodes are hashed together to create a new set of hash values, which are used as the parents of the previous leaf nodes. This process is repeated recursively until a single hash value, known as the root hash, is generated. Here is an example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree.

Example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree; $f$ is a cryptographic hash function.

Example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree; ff is a cryptographic hash function.

After storing elements in this Merkle tree, the root h_15h\_{15} is opened. To prove “Alice” is in the Merkle tree, it is enough to provide h2,h_10,h_14h_2, h\_{10}, h\_{14}. Then verifier can recompute h_15h\_{15} by h_15=f(f(f(f("Alice"),h2),h_10),h_14)h\_{15} = f(f(f(f(\text{"Alice"}),h_2),h\_{10}),h\_{14}). This additional value required to recompute Merkle root is called Merkle proof (also known as Merkle path).

Example of proving the existence of element “Alice” in a Merkle tree. Green-colored boxes denote the Merkle proof.

Example of proving the existence of element “Alice” in a Merkle tree. Green-colored boxes denote the Merkle proof.

Due to the preimage resistance of ff, Merkle proof does not reveal the other stored elements and the total number of elements in Merkle proof is log scale of the total number of elements stored in Merkle tree.

Commitment Scheme

A commitment scheme is a cryptographic protocol that allows a party to commit to a message or a value, without revealing the value itself, and later reveal it in a verifiable manner. This is useful in scenarios where a party needs to prove that they made a commitment to a particular message at an earlier time without revealing the message until a later time. Check out this scenario:

Alice has the foreknowledge of tomorrow's Nasdaq closing price pp. Alice wants to convince her foreknowledge without revealing the value itself. Then Alice can commit price pp by choosing sufficiently long random string secret\text{secret} and going public with com=f(secretp)\text{com} = f(\text{secret} \|\|p) where ff is a cryptographic hash function and \|\| is a concatenation operation. Due to preimage resistance property, it is impossible to recover pp from com\text{com}. After tomorrow, pp is known to everyone. Then Alice opens secret\text{secret}. Now Alice’s foreknowledge is verified by checking com=?f(secretp)\text{com} \stackrel{?}{=} f(\text{secret}\|\|p).

info

Question: Is secret\text{secret} essential? What if defining com=f(p)\text{com} = f(p) instead of com=f(secretp)\text{com} = f(\text{secret} \|\|p)?

Answer: pp can be revealed by checking f(1),f(2),f(1), f(2), \dots until hash value is the same as com\text{com}. The secret acts as a salt that prevents brute-forcing the hash to reveal the secret.

(Non-interactive) Zero-Knowledge Proof

A zero-knowledge proof is a cryptographic protocol that allows a prover to demonstrate to the verifier that a given statement is true without revealing any additional information beyond the truth of the statement itself. In other words, it allows a prover to convince a verifier that they know a secret value without revealing that value or any other information about it. Zero-knowledge proof must satisfy the below three properties:

  1. Completeness: If the statement is true, then an honest prover should be able to convince the verifier of its truth.
  2. Soundness: If the statement is false, then no cheating prover should be able to convince the verifier of its truth.
  3. Zero-knowledgeness: The proof should not reveal any additional information beyond the truth of the statement.

Additional Note Completeness and soundness are clear but zero-knowledgeness can be confusing for a beginner. Can we formally define “any additional information is not revealed to verifier”? The answer is yes. To prove zero-knowledgeness of a given protocol, we show every message communicated between prover and verifier can simulated by someone who does not know the statement. For me, zero-knowledge proof for a graph coloring problem is really helpful to understand the concept of zero-knowledge proof. It is recommended to check out this article.

Zero-knowledge proofs are classified into two types, interactive or non-interactive, depending on the communications between the prover and verifier. If the prover and verifier interact by exchanging messages, similar to an oral test, the protocol is called interactive. On the other hand, if the prover creates a proof that cann be verified without requiring any further interaction between the prover and verifier, the protocol is called non-interactive.

There are several examples of zero-knowledge proof protocols such as zk-SNARK, zk-STARK, Bulletproofs, and Ligero. While we will only cover details of the zk-SNARK, which is used in Tornado Cash later, the most important thing to note is that for any function ff and given output yy, it is possible to prove the knowledge of an input xx satisfies f(x)=yf(x)=y without revealing xx. Moreover, any interactive zero-knowledge proof can be transformed into non-interactive zero-knowledge proof. Finally, for any function ff and given output yy, a prover can prove their knowledge of an input xx satisfies f(x)=yf(x)=y without revealing xx by sending only proof π\pi to a prover with no interaction.

Protocol Overview

Tornado Cash aims to provide users with complete anonymity while using Ethereum by allowing them to make private transactions. So the main features of Tornado Cash are simply deposits and withdrawals. This can be done in a single transaction with a predetermined amount of Ether. The amount is not specified in the white paper, but it is either 0.1 or 1 or 10 or 100 ETH in the current implementations. The reason for fixing the amount is to prevent tracking the transaction based on its value. For example, there are 10 deposits from addrin_1,addrin_2,,addrin_10\text{addr\\in}\_1, \text{addr\\in}\_2, \dots, \text{addr\\in}\_{10} and 10 withdraws from addrout_1,addrout_2,,addrout_10\text{addr\\out}\_1, \text{addr\\out}\_2, \dots, \text{addr\\out}\_{10}. If the amount is different for each deposit, linking deposit address and withdrawal address is trivial.

Setup

First, a Pedersen hash function and MiMC hash function are used in Tornado Cash. We denote each function as H1H_1 and H2H_2, respectively. The contract has a Merkle tree of height 2020, where each non-leaf node is the hash of its two children with H2H_2. Since the height is 2020, there are 2202^{20} leaves. The leaves are initialized as 00 and will be replaced by some value from left to right when a deposit is made. Since each leaf corresponds to a single deposit, the contract supports for only 2202^{20} deposits. After 2202^{20} deposits, no more deposits are allowed.

Additional Note When the contract maintains a Merkle tree, it does not have to store all the leaves in a tree. It is enough to store only 2020 elements for each newly-used leaf, which correspond to the nodes in the route from the recently inserted leaf to the root. For further details, please check the code. The filledSubtrees is the array representing a Merkle tree as above.

Although not necessary, the contract stores the most recent n=100n = 100 (as mentioned in the white paper) or 3030 (in the implementation) root values for user convenience, not just the most recent one. Finally, the contract also has a list of nullifier hashes to prevent double spending, which we will introduce later.

Deposit

The process for a deposit is as follows:

  1. A user generates two random numbers kk(nullifier), rr(secret) and computes commitment C=H1(kr)C = H_1(k\|\|r). This two numbers must be memorized. This is a commitment scheme that we introduced earlier in this blog post.
  2. Send ETH transaction to contract with data CC. If the tree is not full, the contract accepts the transaction and replaces the leftmost zero leaf to CC. This changes the 2020 elements in the tree, including the root.

Note that k,rk, r are hidden but commitment CC is opened to everyone, and the order is also preserved. For example, from the beginning, if addresses 0x111,0x222,\text{0x111}, \text{0x222}, and 0x333\text{0x333} make deposits, then it is publicly known that the first leaf belongs to 0x111\text{0x111}, the second leaf belongs to 0x222\text{0x222}, and the third leaf belongs to 0x333\text{0x333}. However, this means almost nothing because, during withdrawal, the leaf index is not revealed.

Withdrawal

The process for withdrawing the ll-th leaf C=H1(kr)C = H_1(k\|\|r) is as follows:

  1. Compute the Merkle proof OO, which proves an existence of CC in Merkle tree with root RR. Root RR must be one of the recent nn root values.
  2. Compute the nullifier hash h=H1(k)h = H_1(k).
  3. Compute the proof π\pi, which proves the knowledge of l,k,rl, k, r such that h=H1(k)h = H_1(k) and OO is a valid Merkle proof for ll-th leaf C=H1(kr)C=H_1(k\|\|r).
  4. Send ETH transaction to contract with R,h,πR, h, \pi. The contract accepts the request if proof π\pi is valid and hh is not in the list of nullifier hashes.
  5. The contract adds hh to the list of nullifier hashes.

To understand this withdrawal procedure, we recall non-interactive zero-knowledge proof (NIZK). In NIZK, a prover can prove their knowledge of an input xx satisfies f(x)=yf(x)=y<