Skip to main content
BaarkingDog

How Does Tornado Cash Work?

A breakdown of the mathematical principles behind Tornado Cash
Article heading

Introduction

Tornado Cash is an open-source decentralized coin mixer on the Ethereum network, developed by Roman Semenov, Alexey Pertsev, and Roman Storm. It was launched in December 2019, and as of March 2024, 4.00M ETH are deposited with 125K ETH in the pool (Tornado Cash Analysis). The feature provided by Tornado Cash is a kind of double-edged sword: it provides strong anonymity so Ethereum co-founder Vitalik Buterin could send anonymous funds to aid Ukraine, but it allows coins earned from criminal activity to be laundered. For example, both North Korean government hacking group Lazarus and the Euler finance hacker mixed their coins using Tornado Cash.

In this post, we will talk about the mathematical principles behind Tornado Cash. Let’s deep dive into Tornado Cash!

warning

As of August 2022, the U.S. Department of the Treasury’s Office of Foreign Assets has announced that any transactions within the United States or U.S. persons that involve Tornado Cash are prohibited. This post is for educational purposes only. We are not responsible for any misuse of the information.

Background

Hash Function

A hash function is a mathematical function that takes in input messages of arbitrary size and returns a fixed-size string of characters called a hash value. For example, f(x)=x%97f(x) = x \% 97 (remainder of dividing xx by 9797) is a hash function for integer input. The purpose of a hash function is to generate a unique digital fingerprint of the input data, which can be used for various purposes, such as data storage, data retrieval, data integrity checking, digital signatures, and password verification.

We may define a special type of hash function named cryptographic hash function. A hash function ff is called a cryptographic hash function if it satisfies the below three secure properties:

  1. Pre-image resistance: Given hash value hh, it is hard to find any input message mm that f(m)=hf(m) = h.
  2. Second-preimage resistance: Given input message mm, it is hard to find message m(m)m'( \neq m) that f(m)=f(m)f(m') = f(m).
  3. Collision resistance: It is hard to find any two different input messages m1,m2(m1)m_1, m_2(\neq m_1) that f(m1)=f(m2)f(m_1) = f(m_2).

There are many different hash functions available, each hashes with its own characteristics, strengths, and weaknesses. Common examples of hash functions include MD5, SHA-1, SHA-2, and SHA-3. (Note: MD5 and SHA-1 are now deprecated and generally should not be used for cryptographic purposes.)

Merkle Tree

A Merkle tree is a tree data structure used in cryptography and computer science to efficiently verify the integrity and authenticity of large sets of data. The tree structure is based on cryptographic hash functions. To construct a Merkle tree, the elements to be stored are first hashed and the resulting hash values are arranged as the leaf nodes of the tree. Then, pairs of adjacent leaf nodes are hashed together to create a new set of hash values, which are used as the parents of the previous leaf nodes. This process is repeated recursively until a single hash value, known as the root hash, is generated. Here is an example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree.

Example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree; f is a cryptographic hash function.

Example of storing elements “Alice”, “Bob”, “Charles”, “Daniel”, “Emma”, “Fiona”, “Grace”, and “Henry” in a Merkle tree; ff is a cryptographic hash function.

After storing elements in this Merkle tree, the root h_15h\_{15} is opened. To prove “Alice” is in the Merkle tree, it is enough to provide h2,h_10,h_14h_2, h\_{10}, h\_{14}. Then verifier can recompute h_15h\_{15} by h_15=f(f(f(f("Alice"),h2),h_10),h_14)h\_{15} = f(f(f(f(\text{"Alice"}),h_2),h\_{10}),h\_{14}). This additional value required to recompute Merkle root is called Merkle proof (also known as Merkle path).

Example of proving the existence of element “Alice” in a Merkle tree. Green-colored boxes denote the Merkle proof.

Example of proving the existence of element “Alice” in a Merkle tree. Green-colored boxes denote the Merkle proof.

Due to the preimage resistance of ff, Merkle proof does not reveal the other stored elements and the total number of elements in Merkle proof is log scale of the total number of elements stored in Merkle tree.

Commitment Scheme

A commitment scheme is a cryptographic protocol that allows a party to commit to a message or a value, without revealing the value itself, and later reveal it in a verifiable manner. This is useful in scenarios where a party needs to prove that they made a commitment to a particular message at an earlier time without revealing the message until a later time. Check out this scenario:

Alice has the foreknowledge of tomorrow’s Nasdaq closing price pp. Alice wants to convince her foreknowledge without revealing the value itself. Then Alice can commit price pp by choosing sufficiently long random string secret\text{secret} and going public with com=f(secretp)\text{com} = f(\text{secret} \|\|p) where ff is a cryptographic hash function and \|\| is a concatenation operation. Due to preimage resistance property, it is impossible to recover pp from com\text{com}. After tomorrow, pp is known to everyone. Then Alice opens secret\text{secret}. Now Alice’s foreknowledge is verified by checking com=?f(secretp)\text{com} \stackrel{?}{=} f(\text{secret}\|\|p).

info

Question: Is secret\text{secret} essential? What if defining com=f(p)\text{com} = f(p) instead of com=f(secretp)\text{com} = f(\text{secret} \|\|p)?

Answer: pp can be revealed by checking f(1),f(2),f(1), f(2), \dots until hash value is the same as com\text{com}. The secret acts as a salt that prevents brute-forcing the hash to reveal the secret.

(Non-interactive) Zero-Knowledge Proof

A zero-knowledge proof is a cryptographic protocol that allows a prover to demonstrate to the verifier that a given statement is true without revealing any additional information beyond the truth of the statement itself. In other words, it allows a prover to convince a verifier that they know a secret value without revealing that value or any other information about it. Zero-knowledge proof must satisfy the below three properties:

  1. Completeness: If the statement is true, then an honest prover should be able to convince the verifier of its truth.
  2. Soundness: If the statement is false, then no cheating prover should be able to convince the verifier of its truth.
  3. Zero-knowledgeness: The proof should not reveal any additional information beyond the truth of the statement.
Additional Note

Completeness and soundness are clear but zero-knowledgeness can be confusing for a beginner. Can we formally define “any additional information is not revealed to verifier”? The answer is yes. To prove zero-knowledgeness of a given protocol, we show every message communicated between prover and verifier can simulated by someone who does not know the statement. For me, zero-knowledge proof for a graph coloring problem is really helpful to understand the concept of zero-knowledge proof. It is recommended to check out this article.

Zero-knowledge proofs are classified into two types, interactive or non-interactive, depending on the communications between the prover and verifier. If the prover and verifier interact by exchanging messages, similar to an oral test, the protocol is called interactive. On the other hand, if the prover creates a proof that cann be verified without requiring any further interaction between the prover and verifier, the protocol is called non-interactive.

There are several examples of zero-knowledge proof protocols such as zk-SNARK, zk-STARK, Bulletproofs, and Ligero. While we will only cover details of the zk-SNARK, which is used in Tornado Cash later, the most important thing to note is that for any function ff and given output yy, it is possible to prove the knowledge of an input xx satisfies f(x)=yf(x)=y without revealing xx. Moreover, any interactive zero-knowledge proof can be transformed into non-interactive zero-knowledge proof. Finally, for any function ff and given output yy, a prover can prove their knowledge of an input xx satisfies f(x)=yf(x)=y without revealing xx by sending only proof π\pi to a prover with no interaction.

Protocol Overview

Tornado Cash aims to provide users with complete anonymity while using Ethereum by allowing them to make private transactions. So the main features of Tornado Cash are simply deposits and withdrawals. This can be done in a single transaction with a predetermined amount of Ether. The amount is not specified in the white paper, but it is either 0.1 or 1 or 10 or 100 ETH in the current implementations. The reason for fixing the amount is to prevent tracking the transaction based on its value. For example, there are 10 deposits from addrin_1,addrin_2,,addrin_10\text{addr\\in}\_1, \text{addr\\in}\_2, \dots, \text{addr\\in}\_{10} and 10 withdraws from addrout_1,addrout_2,,addrout_10\text{addr\\out}\_1, \text{addr\\out}\_2, \dots, \text{addr\\out}\_{10}. If the amount is different for each deposit, linking deposit address and withdrawal address is trivial.

Setup

First, a Pedersen hash function and MiMC hash function are used in Tornado Cash. We denote each function as H1H_1 and H2H_2, respectively. The contract has a Merkle tree of height 2020, where each non-leaf node is the hash of its two children with H2H_2. Since the height is 2020, there are 2202^{20} leaves. The leaves are initialized as 00 and will be replaced by some value from left to right when a deposit is made. Since each leaf corresponds to a single deposit, the contract supports for only 2202^{20} deposits. After 2202^{20} deposits, no more deposits are allowed.

Additional Note

When the contract maintains a Merkle tree, it does not have to store all the leaves in a tree. It is enough to store only 2020 elements for each newly-used leaf, which correspond to the nodes in the route from the recently inserted leaf to the root. For further details, please check the code. The filledSubtrees is the array representing a Merkle tree as above.

Although not necessary, the contract stores the most recent n=100n = 100 (as mentioned in the white paper) or 3030 (in the implementation) root values for user convenience, not just the most recent one. Finally, the contract also has a list of nullifier hashes to prevent double spending, which we will introduce later.

Deposit

The process for a deposit is as follows:

  1. A user generates two random numbers kk(nullifier), rr(secret) and computes commitment C=H1(kr)C = H_1(k\|\|r). This two numbers must be memorized. This is a commitment scheme that we introduced earlier in this blog post.
  2. Send ETH transaction to contract with data CC. If the tree is not full, the contract accepts the transaction and replaces the leftmost zero leaf to CC. This changes the 2020 elements in the tree, including the root.

Note that k,rk, r are hidden but commitment CC is opened to everyone, and the order is also preserved. For example, from the beginning, if addresses 0x111,0x222,\text{0x111}, \text{0x222}, and 0x333\text{0x333} make deposits, then it is publicly known that the first leaf belongs to 0x111\text{0x111}, the second leaf belongs to 0x222\text{0x222}, and the third leaf belongs to 0x333\text{0x333}. However, this means almost nothing because, during withdrawal, the leaf index is not revealed.

Withdrawal

The process for withdrawing the ll-th leaf C=H1(kr)C = H_1(k\|\|r) is as follows:

  1. Compute the Merkle proof OO, which proves an existence of CC in Merkle tree with root RR. Root RR must be one of the recent nn root values.
  2. Compute the nullifier hash h=H1(k)h = H_1(k).
  3. Compute the proof π\pi, which proves the knowledge of l,k,rl, k, r such that h=H1(k)h = H_1(k) and OO is a valid Merkle proof for ll-th leaf C=H1(kr)C=H_1(k\|\|r).
  4. Send ETH transaction to contract with R,h,πR, h, \pi. The contract accepts the request if proof π\pi is valid and hh is not in the list of nullifier hashes.
  5. The contract adds hh to the list of nullifier hashes.

To understand this withdrawal procedure, we recall non-interactive zero-knowledge proof (NIZK). In NIZK, a prover can prove their knowledge of an input xx satisfies f(x)=yf(x)=y without revealing xx by sending only proof π\pi to prover with no interaction. Once we define ff as

f(l,k,r,O)={1,if h=H1(k) and O is valid Merkle proof for l-th leaf0,elsef(l, k, r, O) = \begin{cases} 1, \text{if } h = H_1(k) \text{ and } O \text{ is valid Merkle proof for } l\text{-th leaf} \\ 0, \text{else} \end{cases}

and the target output y=1y = 1, then proof π\pi represents that prover is a legitimate withdrawer without revealing l,k,r,Ol, k,r,O, so the anonymity is guaranteed. We discuss proof more in the Zero-Knowledge Proof in Tornado Cash section.

The withdrawal procedure seems sound. But what about the fee? If a user wants to withdraw a coin to a fresh wallet, the wallet cannot pay a fee because it has no balance. It is also impossible to pay from the original wallet because it allows linking the fresh wallet and original wallet, which violates anonymity. To resolve this issue, users can optionally use a relay. The user chooses coin recipient address AA and fee ff and includes them into the proof π\pi. Then a user sends the proof to relay and relay transmits it into the contract. A relayer takes a fee as compensation, but they cannot change any withdrawal data, including recipient address.

Zero-Knowledge Proofs in Tornado Cash

Now we have a bird’s-eye view of Tornado Cash. We will take a more in-depth look at the zero-knowledge proofs used in Tornado Cash in this section.

zk-SNARKs

Recommendation

The mathematics behind zk-SNARKs, such as group theory, elliptic curve, pairing, and sigma protocol is quite challenging. While we will provide a brief introduction to zk-SNARK in this section, it may not be sufficient for those encountering the concept for the first time. Fortunately, there are numerous articles and slides available online that introduce zk-SNARKs and their basic concepts. We recommend reading these resources first to gain a better understanding if you are a newcomer.

zk-SNARK is an acronym that stands for Zero-Knowledge Succinct Non-interactive Argument of Knowledge.

  1. Zero-Knowledge: Verifier learns nothing other than the statement is true.
  2. Succinct: Proof size and verifier time is sublinear.
  3. Non-interactive: Need no interaction.
  4. Argument of Knowledge: Not only for existence of input xx but also prover’s knowledge of xx.

Not only zero-knowledge but succinct is also mysterious. For example, if a prover wants to prove their knowledge of input xx such that SHA256(x)=0256\text{SHA256}(x) = 0^{256} to a verifier, aside from zero-knowledgeness, the easiest way is just to send xx to verifier. Of course, the proof size is x\|x\| and verifier time is exactly the same as computing SHA256(x)\text{SHA256}(x), so it is not succinct. Now we face the question, Is it even possible to prove something more cheaply than directly calculating it? Before we start introducing details of zk-SNARKs, we will show a simple example that allows a verifier to accept a proof with less computation than by directly calculating it. Here is the example:

There are two nn by nn matrices A,B,A, B, and CC. The prover wants to prove that C=ABC = AB. If the verifier directly calculates ABAB and compares it with CC, the time complexity is O(n3)O(n^3) — or O(n2.373)O(n^{2.373}) by using a state-of-the-art algorithm. However, if the verifier selects random vector vv size of nn and checks whether A(Bv)=?CvA\cdot(Bv) \stackrel{?}{=} Cv, its time complexity is reduced to O(n2)O(n^2). It is possible such that ABCAB \neq C but A(Bv)=?CvA\cdot(Bv) \stackrel{?}{=} Cv so the verifier might accept wrong CC, but the probability of this occurring is dramatically reduced when the verifier repeats this verification for different values of vv.

Groth16

Let’s move on to the zero-knowledge proof system actually used in Tornado Cash. Tornado Cash uses Groth16, which is an improved version of Pinocchio. In Groth16, a prover transforms their statement into a QAP (quadratic arithmetic program) form. We will show a transformation step by step using a simple example:

def f(x, y):
return x**2 + 3 * y**2 + 10

assert f(x,y) == 17 # f(2, 1) = 17

Flattening

Prover wants to prove their knowledge of x,yx, y satisfy f(x,y)=17f(x,y) = 17. To do this, the prover first converts the given statement into a sequence of special forms where x = y and x = y (op) z , where y, z can be variables or numbers, and op can be + or *. This is similar to the concept of three address code from compilers. The transformation of the equation f(x,y)f(x, y) is as follows:

x2 = x * x
y2 = y * y
t1 = 3 * y2
t2 = x2 + t1
out = t2 + 10

R1CS

After flattening, these conditions are transformed into a R1CS (Rank-1 Constraint System). R1CS is a system of equations, where each equation is defined by a triplet of vectors (a_i,b_i,c_i)(\vec{a}\_i,\vec{b}\_i,\vec{c}\_i) such that (a_is)×(b_is)=(c_is)(\vec{a}\_i \cdot \vec{s}) \times (\vec{b}\_i \cdot \vec{s}) = (\vec{c}\_i \cdot \vec{s}) . In our case, we will build a system such that the solution vector s=(1,x,y,x2,y2,t1,t2,out)\vec{s} = (1, x, y, x2, y2, t1, t2, out). For example, flattened equation x2 = x * x is transformed into

a_1=(0,1,0,0,0,0,0,0)b_1=(0,1,0,0,0,0,0,0)c_1=(0,0,0,1,0,0,0,0),\vec{a}\_1 = (0,1,0,0,0,0,0,0) \\ \vec{b}\_1 = (0,1,0,0,0,0,0,0) \\ \vec{c}\_1 = (0,0,0,1,0,0,0,0),

As you can see, each of the elements in vectors (a_i,b_i,c_i)(\vec{a}\_i,\vec{b}\_i,\vec{c}\_i) represents a coefficient for a variable (or constant) used in the system of equations. And the solution vector s\vec{s} represents all of the variables used in the system.

Continuing with our example, y2 = y * y is transformed into

a_2=(0,0,1,0,0,0,0,0)b_2=(0,0,1,0,0,0,0,0)c_2=(0,0,0,0,1,0,0,0),\vec{a}\_2 = (0,0,1,0,0,0,0,0) \\ \vec{b}\_2 = (0,0,1,0,0,0,0,0) \\ \vec{c}\_2 = (0,0,0,0,1,0,0,0),

t1 = 3 * y2 is transformed into

a_3=(0,0,0,0,1,0,0,0)b_3=(3,0,0,0,0,0,0,0)c_3=(0,0,0,0,0,1,0,0),\vec{a}\_3 = (0,0,0,0,1,0,0,0) \\ \vec{b}\_3 = (3,0,0,0,0,0,0,0) \\ \vec{c}\_3 = (0,0,0,0,0,1,0,0),

and t2 = x2 + t1, out = t2 + 10 is transformed into

a_4=(10,0,0,1,0,1,0,0)b_4=(1,0,0,0,0,0,0,0)c_4=(0,0,0,0,0,0,0,1).\vec{a}\_4 = (10,0,0,1,0,1,0,0) \\ \vec{b}\_4 = (1,0,0,0,0,0,0,0) \\ \vec{c}\_4 = (0,0,0,0,0,0,0,1).

After representing all the flattened equations in this format, the only prover who knows the correct x,yx, y can find solution vector s\vec{s} of given constraint system. So the prover’s ultimate goal now is to prove their knowledge of s\vec{s}.

QAP

The purpose of QAP is to prove same system but decrease communication cost using polynomials instead of vector dot products. From the vectors in R1CS, we can derive polynomials. For example, polynomial a1a_1 (note: this is NOT a1\vec{a_1}, which is a vector in the R1CS) is defined by the points a1(1)=0,a1(2)=0,a1(3)=0,a1(4)=10,a_1(1) = 0,a_1(2) = 0,a_1(3) = 0,a_1(4) = 10, where 0,0,0,100, 0, 0, 10 corresponds to the terms for the elements corresponding to 11 in the solution vector in the coefficient vectors a1,a2,a3,a4,\vec{a_1}, \vec{a_2}, \vec{a_3}, \vec{a_4}, respectively. This polynomial is degree 33 and can be generated by Lagrange interpolation. Polynomials ax,a_y,a_x2,a_y2,a_t1,a_t2,a_outa_x, a\_{y}, a\_{x2}, a\_{y2},a\_{t1},a\_{t2},a\_{out} are also generated in the same way. And the same goes for b1,...,b_out,c1,...,c_outb_1, ...,b\_{out}, c_1,...,c\_{out}.

Let A=(a1,ax,a_y,a_x2,a_y2,a_t1,a_t2,a_out)\vec{A} = (a_1, a_x, a\_{y}, a\_{x2}, a\_{y2},a\_{t1},a\_{t2},a\_{out}) (yes, vector of polynomials!) and the same goes for B,C\vec{B}, \vec{C}. It can observed that (A(x)s)(B(x)s)=C(x)s(\vec{A}(x) \cdot \vec{s}) \cdot (\vec{B}(x) \cdot \vec{s}) = \vec{C}(x) \cdot \vec{s} for x=1,,4x = 1,\dots,4. Let A(x)=(A(x)s)A(x) = (\vec{A}(x) \cdot \vec{s}) and the same for B(x),C(x)B(x), C(x). Then, there exists a polynomial H(x)H(x) such that A(x)B(x)C(x)=H(x)(x1)(x2)(x3)(x4)A(x) \cdot B(x) - C(x) = H(x) \cdot (x-1)(x-2)(x-3)(x-4) if s\vec{s} is properly chosen. In this example, f(2,1)=17f(2, 1) = 17, so s=(1,2,1,4,1,3,7,17)\vec{s} = (1,2,1,4,1,3,7,17). This s\vec{s} is called satisfying assignment.

From this long journey, we transformed the statement into multiplications of polynomials. If a prover can convince that

  1. A(x),B(x),A(x), B(x), and C(x)C(x) are derived from same s\vec{s}, and
  2. exists H(x)H(x) such that A(x)B(x)C(x)=H(x)Z(x)A(x)\cdot B(x) - C(x) = H(x) \cdot Z(x) where Z(x)=(x1)(x2)(x3)(x4)Z(x) = (x-1)(x-2)(x-3)(x-4),

then the verifier accepts the prover’s knowledge. To guarantee succinctness, rather than directly handling polynomials, A(t)B(t)C(t)=H(t)Z(t)A(t)\cdot B(t) - C(t) = H(t) \cdot Z(t) for some random point tt is checked.

Homomorphic Hiding & Pairing

The equation A(t)B(t)C(t)=H(t)Z(t)A(t)\cdot B(t) - C(t) = H(t) \cdot Z(t) can hold for some tt when A(x)B(x)C(x)H(x)Z(x)A(x)\cdot B(x) - C(x) \neq H(x) \cdot Z(x). This probability is negligible, but if tt is known, then an attacker can easily construct A(x),B(x),C(x),H(x)A(x), B(x), C(x), H(x) satisfying A(t)B(t)C(t)=H(t)Z(t)A(t)\cdot B(t) - C(t) = H(t) \cdot Z(t). Therefore tt should be hidden but allowing the prover to calculate A(t),B(t),C(t),H(t)A(t), B(t), C(t), H(t). It seems contradictory, but homomorphic hiding allows this. From the hardness of elliptic curve discrete logarithm problem, we define an encryption function E(x)=xgE(x) = xg for a generator gg. Then EE satisfies the following:

  1. It is hard to find xx from E(x)E(x).
  2. xyx \neq y then E(x)E(y)E(x) \neq E(y).
  3. E(x+y)=E(x)+E(y)E(x + y) = E(x)+E(y).
  4. aE(x)=E(ax)a \cdot E(x) = E(ax).

Rather than directly giving tt to the prover, the prover is given E(t0),E(t1),E(t^0), E(t^1), \dots. Then, since A(t),B(t),C(t),A(t), B(t), C(t), and H(t)H(t) are linear combinations of t0,t1,t^0, t^1, \dots, the prover can calculate E(A(t)),E(B(t)),E(C(t)),E(H(t))E(A(t)), E(B(t)), E(C(t)), E(H(t)) . Moreover, the fact that E(A(t)),E(B(t)),E(C(t)),E(H(t))E(A(t)), E(B(t)), E(C(t)), E(H(t)) is really derived form E(t0),E(t1),E(t^0), E(t^1), \dots can be proven by the knowledge of coefficient assumption test. We omit information of the coefficient assumption test in this article.

Now the prover generates E(A(t)),E(B(t)),E(C(t)),E(H(t))E(A(t)), E(B(t)), E(C(t)), E(H(t)) , which will be given to the verifier, and the verifier can generate E(Z(t))E(Z(t)) since Z(x)Z(x) is known. However, the verifier cannot check A(t)B(t)C(t)=H(t)Z(t)A(t)\cdot B(t) - C(t) = H(t) \cdot Z(t) because EE is not homomorphic in multiplication. We need another mapping called pairings. A pairing is a map e:G1×G2GTe : G_1 \times G_2 \rightarrow G_T that satisfies e(aP,bQ)=abe(P,Q)e(aP,bQ)=ab \cdot e(P,Q) where G1,G2G_1, G_2 are elliptic curves with the same equation but different generators g1,g2g_1, g_2 having order qq and GTG_T is a cyclic group of order qq. We slightly extend EE to E1E_1 and E2E_2, where E1(x)=xg1,E2(x)=xg2E_1(x) = xg_1, E_2(x) = xg_2.

From now on, we can multiply hiding values. Finally, A(t)B(t)C(t)=H(t)Z(t)A(t)\cdot B(t) - C(t) = H(t) \cdot Z(t) is checked from e(E1(A(t)),E2(B(t))/e(E1(C(t)),E2(1))=?e(E1(H(t))E2(Z(t))).e(E_1(A(t)), E_2(B(t)) / e(E_1(C(t)), E_2(1)) \stackrel{?}{=} e(E_1(H(t)) \cdot E_2(Z(t))). Since only E1(A(t)),E2(B(t)),E1(C(t))E_1(A(t)), E_2(B(t)), E_1(C(t)) are required to verify, the proof size is fixed to 22 elements of G1G_1 and 11 element of G2G_2.

Trusted Setup

In this procedure, the prover needs E1(t0),E1(t1),,E2(t0),E2(t1),E_1(t^0),E_1(t^1),\dots,E_2(t^0),E_2(t^1),\dots generated from a random tt. One possible scenario is that the verifier chooses random tt, generates E1(t0),E1(t1),,E2(t0),E2(t1),E_1(t^0),E_1(t^1),\dots,E_2(t^0),E_2(t^1),\dots and sends these values to the prover. However, this method is very costly for prover and only enables an interactive model. In the non-interactive model, the verifier cannot send something to the prover. Therefore, instead of verifier, a third party generates common reference string (CRS) ****E1(t0),E1(t1),,E2(t0),E2(t1),E_1(t^0),E_1(t^1),\dots,E_2(t^0),E_2(t^1),\dots and opens it to everyone. This is called a trusted setup. The requirement of trusted setup is a significant weakness of zk-SNARK.

Additional Note

Originally, CRS contained some circuit-specific values other than E1(ti),E2(ti)E_1(t^i), E_2(t^i), and the exact procedure is little more complex because it requires checking whether A(x),B(x),A(x), B(x), and C(x)C(x) are derived from same s\vec{s} using knowledge of coefficient assumption. You may refer to the Groth16 paper to learn more about this.

Powers-of-Tau

In Tornado Cash, the users act as provers while the contract serves as the verifier. In the beta version of Tornado Cash, the Tornado Cash team generated CRS and made it public (see here, proving key and verification key are derived from CRS). If all users trust the Tornado Cash team, they can generate proof using CRS provided by the team. However, if the Tornado Cash team become malicious, they can generate forged proof and potentially steal all the coins. To resolve this issue, the Tornado Cash team replaced CRS generated from the crypto community, called power-of-tau. User 11 to kk first hold their own random t1,t2,,tkt_1, t_2, \dots, t_k. User 11 generates E1(t10),E1(t11),,E2(t10),E2(t11),E_1(t_1^0),E_1(t_1^1),\dots,E_2(t_1^0),E_2(t_1^1),\dots. Note that E1(t1i)=t1ig1,E2(t1i)=t1ig2.E_1(t_1^i) = t_1^i \cdot g_1, E_2(t_1^i) = t_1^i \cdot g_2. Then user 22 generates E1((t1t2)0),E1((t1t2)1),,E2((t1t2)0),E2((t1t2)1),E_1((t_1t_2)^0),E_1((t_1t_2)^1),\dots,E_2((t_1t_2)^0),E_2((t_1t_2)^1),\dots from E1((t1t2)i)=t2iE1(t1i),E2((t1t2)i)=t2iE2(t1i)E_1((t_1t_2)^i) = t_2^i \cdot E_1(t_1^i), E_2((t_1t_2)^i) = t_2^i \cdot E_2(t_1^i), without knowing t1t_1. After user kk generates the list, the CRS from secret t=t1t2tkt = t_1t_2 \dots t_k is generated. In the case of Tornado Cash, 1,114 users participated and CRS is successfully replaced.

Implementations in Tornado Cash

Now we get back to statements in Tornado Cash. Our goal is to prove and verify the knowledge of l,k,r,Ol,k,r,O such that for a f(l,k,r,O)=1f(l,k,r,O)=1 for

f(l,k,r,O)={1,if h=H1(k) and O is valid Merkle proof for l-th leaf0,elsef(l, k, r, O) = \begin{cases} 1, \text{if } h = H_1(k) \text{ and } O \text{ is valid Merkle proof for } l\text{-th leaf} \\ 0, \text{else} \end{cases}

This ff needs to be transformed into a QAP. Tornado Cash uses circuit compilers Circom and snarkjs. Circuit compiler Circom has their own language called circom language. After coding ff in a circom language, the code is transformed into the R1CS using circom compiler. After that, snarkjs takes care of almost everything, such as generating CRS (via the trusted setup ceremony in case of Tornado Cash), proof, and a smart contract for the verification.

Security Concerns

In this section, we will discuss several security concerns for protocol and user levels. It is important to note that addressing these issues does not necessarily mean that Tornado Cash is vulnerable.

Hash Functions

Using different hash functions H1H_1 and H2H_2 seems unnecessary, and the reasons for this choice were not given by the authors. ABDK Consulting, who audited Tornado Cash, also pointed this out (see here, section 5.1). Moreover, the Pedersen hash function has a homomorphic property, so it is not a cryptographic hash function. Instead, defining both H1H_1 and H2H_2 as MiMC (or any other SNARK-friendly hash function) and applying domain separation might be better.

Dependencies

The contract source code is not very long and well-analyzed. However, as is often the case, small flaws can cause an apocalypse in this field. In fact, in October 2019, the Tornado Cash team discovered a vulnerability where an attacker could make arbitrary deposits due to issues with the external dependency of the zk-SNARK implementation of the MiMC (see here). Therefore, all the dependencies must be also carefully audited.

User Mistakes

The user must choose a high entropy kk and rr. There are also possibilities of losing anonymity from user mistakes. We recommend reading this article.

Conclusion

Tornado Cash is a popular, decentralized coin mixer on the Ethereum network that offers strong anonymity to its users through the use of cryptographic techniques. As a main feature of Tornado Cash is anonymity in making private transactions, we looked at the math behind their deposits and withdrawals. Taking a deeper dive into their zero-knowledge proof system, Groth16, we exemplified how a transformation of a prover’s statement into quadratic arithmetic program form may occur with reference to flattening, homomorphic hiding and pairing, and the R1CS – and followed with a discussion of procedures such as trusted setup and powers-of-tau. There were also several security concerns to note (including using different hash functions, dependencies needing auditing, and user mistakes). Since the anonymity and privacy in Tornado Cash is one of its strongest features, it is interesting to take a look behind the curtain and see the math behind it.

Disclaimer

It is worth recognizing that while the anonymity provided by Tornado Cash can be beneficial, it can also be used for illegal or prohibited activities. Additionally, it is important to note once again that using Tornado Cash for any transactions within the United States or by U.S. persons is prohibited. We recommend you do your own research and consult with qualified legal counsel before interacting with Tornado Cash.