Mohit

January 25, 2024

What Is a ZK Audit?

A look into the ZK audit process and techniques the leading ZK auditors employ to examine ZK circuits

Is zero knowledge (ZK) boomer crypto? Yes and no.

The marketing around ZK hypes it up as advanced cryptography, which, in concept, is quite true. ZK proof systems are, of course, very elegant cryptographic constructions.

On the other hand, they’re only half the picture when it comes to a working ZK product. The interface provided by ZK frameworks does a good job of abstracting away the cryptographic details to help developers focus on the business logic.

There are two main components to a typical ZK application.

The circuit: This is the circuit representation for the program we desire to create proofs for. This front-end component is what encodes the business logic of the application.
The proving system: This is the ZK proving system backend that the application uses. The choice of proving system and framework is dependent on the requirements of the project and does affect the front end used.

Circuit implementation comes with its own set of vulnerability classes, disjoint from the low-level cryptography bugs that may be found in the proving system. Naturally, most ZK teams don’t build their own framework, focusing instead on the circuits and business logic. Since the vast majority of ZK audits are focused on circuits, and it’s also the part of the stack most prone to bugs in the wild, it is (for the most part) what we focus on in this post.

We will describe what the audit process for a ZK application looks like, bring up important questions to answer during an audit, and end with some general tips that we have found useful as we continue to refine our process for auditing ZK applications.

This article assumes a moderate understanding of ZK on the reader’s part. While we try to provide technical context wherever necessary, we will not be introducing concepts like public/private inputs, nullifiers, and so on from scratch. This is by no means intended to be an exhaustive checklist for audits (if one is even possible) but rather be general guidelines, which may prove useful to ZK security researchers as well as provide developers some insight into our audit methodology.

Audit Process

In a way, auditing a ZK circuit is like thinking about programming backwards. While it may look similar to computation, it is in fact a completely separate structure, since a ZK circuit is essentially a set of assertions (constraints) imposed on a program’s execution trace.

Consider the following pseudocode:

a = 2
b = 4
c = a+b

Let’s focus on the variable c. Read as a program, c is defined as a+b, so its value is guaranteed to equal a+b. In a circuit for this program, however, that relationship must be declared explicitly as a constraint. The problem is that if we skip this assertion, the circuit continues to work fine for honest inputs. But what happens when someone supplies a value for c that does not equal a+b? Since no constraint enforces the relationship, this trace produces a valid proof as well. Depending on how c is used later on, the other elements in the trace could be completely bogus and not dependent on a and b at all. The verifier, however, would look at the proof and conclude that the inputs formed a valid program trace. In other words, in a ZK circuit, all necessary relationships between variables must be explicitly declared.
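The difference can be sketched outside any real proving framework. The following toy model is only an illustration: the Witness struct and both checker functions are hypothetical names, and plain boolean checks stand in for circuit constraints.

```rust
/// A candidate execution trace (witness) for the three-line program above.
#[derive(Clone, Copy)]
struct Witness {
    a: u64,
    b: u64,
    c: u64,
}

/// Underconstrained "circuit": pins a and b but never relates c to them.
fn underconstrained_ok(w: Witness) -> bool {
    w.a == 2 && w.b == 4
}

/// Sound "circuit": additionally asserts c == a + b.
fn sound_ok(w: Witness) -> bool {
    w.a == 2 && w.b == 4 && w.c == w.a + w.b
}

fn main() {
    let honest = Witness { a: 2, b: 4, c: 6 };
    let forged = Witness { a: 2, b: 4, c: 1337 };
    println!("underconstrained accepts forged: {}", underconstrained_ok(forged));
    println!("sound accepts forged: {}", sound_ok(forged));
    println!("sound accepts honest: {}", sound_ok(honest));
}
```

Both checkers accept the honest trace, which is why a missing constraint tends to go unnoticed in ordinary testing.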

Consider the following pseudocode example,

a = 0
b = sha256(a)

compared to the following circuit expressed as a set of assertions:

a == 0
b == e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Technically, the value e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is equal to the SHA-256 hash of an empty string, so the circuit may look correct. However, the circuit never expresses the actual computation of the hash, so nothing ties the output to the input beyond this one hard-coded pair. A circuit that ensures the validity of such computations for arbitrary values of a and b would need not only an assertion on the input and output of the hash function but also an assertion for every step of the hash computation itself.

These are the kinds of considerations ZK engineers must keep in mind when translating computations to circuits. An audit involves diligently going through every constraint defined in the circuit to make sure the translation is sound. For this, it’s important for the auditor to have a strong understanding of the computation expressed by the circuit and the logical relationships between witness values.

On a high level, there are two phases to auditing a ZK circuit:

Protocol design: Investigate design issues with the circuit and how it fits in the larger protocol.
Circuit implementation: Investigate implementation issues that make the circuit deviate from spec or permit undefined behavior.

Protocol Design

A comprehensive security review of a ZK application must begin with an inspection of the core protocol design.

Is the Design Specification Cryptographically Sound?

ZK applications often employ sophisticated custom protocols, which can lead to novel cryptography bugs. Ideally, a cryptographic protocol must always be presented with a well-defined set of assumptions and accompanying security proofs. However, this is often skipped for early-stage projects or ones that extend on existing protocols. Therefore, an auditor must thoroughly examine the usage of all cryptographic primitives, any fixed parameters used in the protocol, and the soundness of the protocol as a whole.

Even secure primitives may introduce vulnerabilities if used incorrectly in the larger protocol or configured in an insecure manner.

Does the Proof Expose Too Much Data?

Through certain design flaws, the public outputs may be used to infer information about values that are intended to be private. This violates the privacy guarantees, aka the zero-knowledge property of the circuit.

Example: Information exposure was found in the Aztec 2.0 prelaunch patch. Their method for removing spending keys, which involved an account nullifier, broke the sender privacy of transactions from the account, because the static account nullifier could be used to correlate transactions by the account. This issue was remediated by changing the key-removal procedure to use an account nonce instead of the nullifier.

How Does the Proof Fit Into the Larger Protocol?

How a ZK proof interacts with out-of-circuit components is an important factor in ensuring the end-to-end security of the protocol. A circuit being implemented securely and the proof verifier being correct are by no means enough to guarantee security.

How does a proof typically fit into the larger picture? A circuit takes in some public and private inputs. A valid proof asserts some relationship between these values. This ties the validity of the private inputs to the public inputs, which the verifier can check publicly. To consider a statement valid, the verifier must therefore perform two actions:

Verify the proof, ensuring the correct relationship between the public and private inputs.
Check the public inputs against publicly verifiable values. This may involve performing computations on values known to the verifier.
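The two verifier-side steps can be sketched as follows. This is a minimal illustration under stated assumptions: PublicInputs, verify_proof, and accept are hypothetical names, the proof is reduced to a boolean flag, and a set of trusted block hashes stands in for the publicly verifiable values.

```rust
use std::collections::HashSet;

/// Public inputs of a hypothetical "transaction is in block" proof.
struct PublicInputs {
    block_hash: [u8; 32],
}

/// Stand-in for a real proof verifier; here the proof is just a flag.
/// A real system would call into its proving backend instead.
fn verify_proof(proof_is_valid: bool, _public: &PublicInputs) -> bool {
    proof_is_valid
}

/// Accept a statement only if both verifier-side steps succeed.
fn accept(
    proof_is_valid: bool,
    public: &PublicInputs,
    known_blocks: &HashSet<[u8; 32]>,
) -> bool {
    // Step 1: the proof ties the private inputs to the public inputs.
    let proof_ok = verify_proof(proof_is_valid, public);
    // Step 2: the public inputs must match values the verifier already trusts.
    let inputs_ok = known_blocks.contains(&public.block_hash);
    proof_ok && inputs_ok
}

fn main() {
    let mut known_blocks = HashSet::new();
    known_blocks.insert([1u8; 32]);

    let real = PublicInputs { block_hash: [1u8; 32] };
    let made_up = PublicInputs { block_hash: [9u8; 32] };
    println!("valid proof, known block: {}", accept(true, &real, &known_blocks));
    println!("valid proof, unknown block: {}", accept(true, &made_up, &known_blocks));
}
```

A valid proof about an unknown block hash is rejected in step 2, even though step 1 passes.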

Assuming the proof is verified correctly, we can still run into issues in the second step. A few major issues one may run into in this step are:

Admin control over the parameters used to check public inputs can lead to centralization risk.
Insufficient validation of all public outputs. For example, in a circuit that proves membership of a certain transaction in a block, if the block hash itself is never checked to correspond to a valid block, the proof becomes meaningless. In a way, the public outputs must act as a commitment to all private inputs and constrain their validity.
If verification involves out-of-circuit computation to verify public inputs, there could be discrepancies between in-circuit and out-of-circuit computation. Sometimes the subcircuit for certain computations can have slight deviations from the spec that’s followed out of circuit. This can lead to interesting attack vectors.

Example: One public example of this can be found in Polygon’s zkEVM audit. The cause of this vulnerability was a discrepancy in the RLP decoding algorithm between the EVM and the zkEVM ROM. RLP encodes short and long strings as follows:

If a string is 0–55 bytes long, the RLP encoding consists of a single byte with a value 0x80 (dec. 128) plus the length of the string followed by the string. The range of the first byte is thus [0x80, 0xb7] (dec. [128, 183]).
If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xb7 (dec. 183) plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. For example, a 1024-byte-long string would be encoded as \xb9\x04\x00 (dec. 185, 4, 0) followed by the string. Here, 0xb9 (183 + 2 = 185) is the first byte, followed by the two bytes 0x0400 (dec. 1024) that denote the length of the actual string. The range of the first byte is thus [0xb8, 0xbf] (dec. [184, 191]).

However, the circuit missed a constraint to check that the prefix byte range actually corresponds to the string length (i.e., one could have a string of at most 55 bytes with a prefix in the range [0xb8, 0xbf] or vice versa). This invalid RLP encoding would still lead to a valid proof, and the attacker would claim the transferred assets. However, other contracts in the protocol would fail to do so, causing the network to halt.
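To make the two encoding rules and the missing check concrete, here is a small sketch that follows the RLP rules quoted above, for strings only. The function names are hypothetical and this is not the zkEVM ROM logic; prefix_matches_length illustrates the kind of relationship that needs to be constrained.

```rust
/// Big-endian bytes of `n` with leading zero bytes stripped.
fn be_len_bytes(n: usize) -> Vec<u8> {
    let bytes = n.to_be_bytes();
    let first = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len());
    bytes[first..].to_vec()
}

/// RLP-encode a byte string (lists are out of scope for this sketch).
fn rlp_encode_string(s: &[u8]) -> Vec<u8> {
    if s.len() == 1 && s[0] < 0x80 {
        return vec![s[0]]; // a single byte below 0x80 is its own encoding
    }
    let mut out = Vec::new();
    if s.len() <= 55 {
        out.push(0x80 + s.len() as u8); // short string: prefix in [0x80, 0xb7]
    } else {
        let len = be_len_bytes(s.len());
        out.push(0xb7 + len.len() as u8); // long string: prefix in [0xb8, 0xbf]
        out.extend_from_slice(&len);
    }
    out.extend_from_slice(s);
    out
}

/// Does a string prefix byte agree with the claimed string length?
/// Only the two string-prefix ranges discussed above are covered.
fn prefix_matches_length(prefix: u8, string_len: usize) -> bool {
    match prefix {
        0x80..=0xb7 => usize::from(prefix - 0x80) == string_len,
        0xb8..=0xbf => string_len > 55,
        _ => false,
    }
}

fn main() {
    let long = rlp_encode_string(&vec![0u8; 1024]);
    println!("long prefix bytes: {:02x?}", &long[..3]);
    println!("0xb8 with a 10-byte string ok? {}", prefix_matches_length(0xb8, 10));
}
```

A decoder or circuit that skips the equivalent of prefix_matches_length accepts encodings that a strict implementation rejects, which is the kind of discrepancy described above.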

Circuit Review

Issues found during circuit review fall into two broad classes:

Completeness issues: There exist valid inputs for which the circuit statement cannot be proved. This can be equated to incomplete functionality. These issues usually arise in the form of overconstraints.
Soundness issues: A logical relationship between two or more values is not expressed as a constraint. Underconstrained witness values mean we can produce valid proofs for an invalid set of witness values.

Completeness Issues

Overconstraints can make the circuit unprovable, leading to functionality loss for certain users or queries.

Example: One such issue that we identified during our audit of Scroll’s zkEVM was an overconstraint in the Poseidon hash circuit.

The Poseidon table supported two modes of hashing, represented by the variable mpt_only. If mpt_only = true, it signifies two field elements are to be hashed, while mpt_only = false signifies variable-length input. The table has some custom rows at the beginning. Custom rows are constrained to be filled with zeros. There are supposed to be two custom rows if mpt_only = false and one otherwise.

config.s_custom.enable(region, 1)?;
if self.mpt_only {
    return Ok(1);
}

This code snippet essentially means that the s_custom selector is enabled on the second row (offset 1) unconditionally, before the self.mpt_only parameter is checked during circuit synthesis.

The overconstraint here was that even for the case where mpt_only = true, the second row was also marked as custom. This row, meant to contain the inputs to the hash, is now overconstrained to be zero.

As a result, any hashing attempt with nonzero inputs would fail.
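The effect can be reproduced in a toy model. This sketch is not the Scroll circuit: the two-column table, the buggy flag, and both function names are assumptions made for illustration, with a boolean per row standing in for the s_custom selector.

```rust
/// Toy model of a table where rows flagged as custom must be all zero.
/// `custom[i]` plays the role of the `s_custom` selector on row `i`.
fn constraints_hold(rows: &[[u64; 2]], custom: &[bool]) -> bool {
    rows.iter()
        .zip(custom)
        .all(|(row, &is_custom)| !is_custom || row.iter().all(|&v| v == 0))
}

/// Selector layout for the toy table. `buggy` mimics enabling the selector
/// on row 1 before the hashing mode is checked.
fn custom_rows(mpt_only: bool, buggy: bool, n_rows: usize) -> Vec<bool> {
    let n_custom = if mpt_only && !buggy { 1 } else { 2 };
    (0..n_rows).map(|i| i < n_custom).collect()
}

fn main() {
    // Row 0 is the zero-filled custom row; row 1 holds nonzero hash inputs.
    let rows = [[0u64, 0], [3, 4], [0, 0]];
    let intended = custom_rows(true, false, rows.len());
    let buggy = custom_rows(true, true, rows.len());
    println!("intended layout satisfiable: {}", constraints_hold(&rows, &intended));
    println!("buggy layout satisfiable: {}", constraints_hold(&rows, &buggy));
}
```

With the buggy layout, no assignment with nonzero inputs in row 1 can satisfy the constraints, so an honest prover cannot produce a proof.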

Soundness Issues

These are by far the most prevalent bugs in ZK circuits. They are tricky to spot and can be very easy to miss, and though an exhaustive security checklist might not be possible, a structured approach to them goes a long way. We now describe some prominent classes of underconstraints that we have encountered repeatedly during our audits along with some general techniques that have helped us examine circuits thoroughly and efficiently.

Are All Variables Range Checked?

Any element that is not free to be an arbitrary field element must be range checked implicitly or explicitly at some point in the circuit.

Example: We identified several missing range checks during our Scroll zkEVM audit, some of which had very significant consequences. For example, in the RLP circuit, the input bytes were not constrained to be in the 0–255 range.
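A minimal sketch of why a missing byte range check matters, assuming a value is witnessed as two byte cells and recomposed as hi * 256 + lo. All names here are hypothetical, and u64 arithmetic stands in for field arithmetic.

```rust
/// Compose two "bytes" into one value: hi * 256 + lo.
fn compose(hi: u64, lo: u64) -> u64 {
    hi * 256 + lo
}

/// Range check that a cell holds a byte.
fn is_byte(v: u64) -> bool {
    v <= 255
}

/// Accept a decomposition of `value`, optionally skipping the range checks.
fn accepts(value: u64, hi: u64, lo: u64, range_checked: bool) -> bool {
    let composed_ok = compose(hi, lo) == value;
    let bytes_ok = !range_checked || (is_byte(hi) && is_byte(lo));
    composed_ok && bytes_ok
}

fn main() {
    // 256 decomposes honestly as (1, 0); (0, 256) is a forged decomposition.
    println!("honest, checked: {}", accepts(256, 1, 0, true));
    println!("forged, unchecked: {}", accepts(256, 0, 256, false));
    println!("forged, checked: {}", accepts(256, 0, 256, true));
}
```

Without the range check, the same value admits more than one "byte" decomposition, so anything downstream that reads the individual bytes can be fed values of the prover's choosing.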

Are Multiphase Operations Constrained Properly?

ZK circuits are inherently deterministic in nature, but some programs require access to in-circuit randomness. The way this is achieved is by dividing the computation into phases. The first phase proceeds entirely deterministically. For each phase thereafter, we derive a random element by applying a hash function to all the witness values calculated up to that point in the circuit (i.e., the Fiat–Shamir heuristic).

Everything that is solely witnessed at first must be constrained in a later phase. Random challenges derived in multiphase circuits are particularly useful for string operations and compressing multicolumn tables to a single column. However, this means that the constraints for values must be enforced across multiple phases.

Example: During one of our audits, we encountered a possible bug where a set of values was witnessed in the first phase. The concatenation of these values had to be equal to another witness element. This constraint had to be enforced in the second phase using random linear combination (RLC) but was never done, leading to a soundness issue.
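A concatenation check of this kind can be sketched as follows. This is an illustration under several assumptions: the modulus is a toy prime, the function names are hypothetical, and std's DefaultHasher is used only as a non-cryptographic stand-in for the Fiat–Shamir hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const P: u64 = (1 << 61) - 1; // toy prime modulus (2^61 - 1)

/// Derive a phase-2 challenge from everything witnessed in phase 1.
/// DefaultHasher is NOT cryptographic; it only stands in for Fiat-Shamir here.
fn challenge(phase1_witness: &[Vec<u8>]) -> u64 {
    let mut h = DefaultHasher::new();
    phase1_witness.hash(&mut h);
    h.finish() % P
}

/// Random linear combination of bytes: acc = acc * r + byte (mod P).
fn rlc(bytes: &[u8], r: u64) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| {
        ((acc as u128 * r as u128 + b as u128) % P as u128) as u64
    })
}

/// Phase-2 check: do `parts`, concatenated, equal `whole`?
fn concat_constraint_holds(parts: &[Vec<u8>], whole: &[u8], r: u64) -> bool {
    let joined: Vec<u8> = parts.concat();
    joined.len() == whole.len() && rlc(&joined, r) == rlc(whole, r)
}

fn main() {
    // Phase 1: witness the parts and the claimed concatenation.
    let parts = vec![b"ab".to_vec(), b"cd".to_vec()];
    let whole = b"abcd".to_vec();

    // Phase 2: derive the challenge from all phase-1 values, then check.
    let mut phase1 = parts.clone();
    phase1.push(whole.clone());
    let r = challenge(&phase1);
    println!("constraint holds: {}", concat_constraint_holds(&parts, &whole, r));
}
```

If the phase-2 check is never enforced, the parts and the whole remain independent witnesses, which is the soundness gap described above.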

Are Variable-Length Values Handled Properly?

Handling variable-length values inside a circuit is a sticky affair. In our experience, spending time on subcircuits that handle variable-length inputs is important.

Example: One example is an issue in the RLP tag computation we identified in Scroll. The value of an RLP tag was calculated using the RLC of its constituent bytes. The formula to calculate this is bytes_rlc(i+1) = bytes_rlc(i) * r + byte_value(i+1), where r is the challenge value used to calculate the RLC. However, since they did not keep track of the tag length, padding an RLP value with null bytes would result in the same tag. This was a critical issue as it could be used to spoof equality checks between RLP values.
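The collision follows directly from the recurrence. The sketch below assumes the accumulator starts at zero, so leading null bytes leave it unchanged; the modulus, the challenge value, and the function names are illustrative assumptions rather than Scroll's actual parameters.

```rust
const P: u64 = (1 << 61) - 1; // toy prime modulus (2^61 - 1)

/// bytes_rlc(i+1) = bytes_rlc(i) * r + byte_value(i+1), starting from 0.
fn bytes_rlc(bytes: &[u8], r: u64) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| {
        ((acc as u128 * r as u128 + b as u128) % P as u128) as u64
    })
}

/// Flawed equality check: compares RLC values only.
fn tags_equal_unsafe(a: &[u8], b: &[u8], r: u64) -> bool {
    bytes_rlc(a, r) == bytes_rlc(b, r)
}

/// Equality check that also tracks the length of each value.
fn tags_equal_with_len(a: &[u8], b: &[u8], r: u64) -> bool {
    a.len() == b.len() && bytes_rlc(a, r) == bytes_rlc(b, r)
}

fn main() {
    let r = 123_456_789u64; // stand-in for the random challenge
    let value = [0x12u8, 0x34];
    let padded = [0x00u8, 0x00, 0x12, 0x34];
    println!("RLC only: {}", tags_equal_unsafe(&value, &padded, r));
    println!("RLC and length: {}", tags_equal_with_len(&value, &padded, r));
}
```

Tracking the length alongside the RLC is one way to separate the two values; the source does not state which fix was applied.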

Are There Inverse Constraints for Out-of-Circuit Computations?

Oftentimes, operations that are hard to represent in circuit are computed out of circuit, and instead the inverse is asserted in circuit. This is a useful strategy for optimizing circuit size but presents more surface area for oversights. One risk here is that a value is taken out of the circuit and inserted back in without an equality constraint.

Example: An example can be found in the audit of Penumbra.

impl IncomingViewingKeyVar {
    pub fn derive(nk: &NullifierKeyVar, ak: &AuthorizationKeyVar) -> Result<Self, SynthesisError> {
        // TRUNCATED...
        // Pull the value out of the circuit as a plain field element.
        let inner_ivk_mod_q: Fq = ivk_mod_q.value().unwrap_or_default();
        let ivk_mod_r = Fr::from_le_bytes_mod_order(&inner_ivk_mod_q.to_bytes());
        // Allocate a fresh witness from it; nothing links it back to ivk_mod_q.
        let ivk = NonNativeFieldVar::<Fr, Fq>::new_variable(
            cs,
            || Ok(ivk_mod_r),
            AllocationMode::Witness,
        )?;

Here, ivk_mod_q is extracted from the circuit and inserted back in as ivk without any equality checks. This removes any constraints placed upon it previously.
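The pattern can be modeled with a toy constraint system. This is not arkworks or Penumbra code: ToyCs and reallocate are hypothetical, and a plain equality stands in for whatever relation should link the old and new variables.

```rust
/// Minimal toy constraint system: variables plus equality constraints.
struct ToyCs {
    values: Vec<u64>,
    equalities: Vec<(usize, usize)>,
}

impl ToyCs {
    fn new() -> Self {
        ToyCs { values: Vec::new(), equalities: Vec::new() }
    }

    /// Allocate a fresh witness; nothing ties it to any other variable.
    fn new_witness(&mut self, v: u64) -> usize {
        self.values.push(v);
        self.values.len() - 1
    }

    fn enforce_equal(&mut self, a: usize, b: usize) {
        self.equalities.push((a, b));
    }

    fn is_satisfied(&self) -> bool {
        self.equalities
            .iter()
            .all(|&(a, b)| self.values[a] == self.values[b])
    }
}

/// Re-allocate `src` as a new variable. A malicious prover controls
/// `prover_value`; only `link = true` constrains it to match `src`.
fn reallocate(cs: &mut ToyCs, src: usize, prover_value: u64, link: bool) -> usize {
    let fresh = cs.new_witness(prover_value);
    if link {
        cs.enforce_equal(src, fresh);
    }
    fresh
}

fn main() {
    let mut unlinked = ToyCs::new();
    let x = unlinked.new_witness(7);
    reallocate(&mut unlinked, x, 99, false);
    println!("unlinked, forged value accepted: {}", unlinked.is_satisfied());

    let mut linked = ToyCs::new();
    let y = linked.new_witness(7);
    reallocate(&mut linked, y, 99, true);
    println!("linked, forged value accepted: {}", linked.is_satisfied());
}
```

In the unlinked case the constraint system is satisfied no matter what value the prover supplies for the new variable.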

The sheer surface area for underconstraints, along with how subtle they can often be, can make looking for them a very daunting task. While there really is no alternative to iterating and reiterating over every single constraint in scope, here are a few general tips that help make the process slightly more structured.

Keep track of the immediate logical relationships between values. This, of course, goes without saying, as discussed in the opening paragraph. It’s imperative to keep track of logical relationships in order to spot any that aren’t constrained.
Make a list of assumptions for every function and see if they’re followed by every call. Every function makes some assumptions about its inputs. This can be the structure that the inputs are expected to follow without being constrained inside the function itself. These assumptions must be enforced externally in every call to that function.
Double-check untested code. While this tip is by no means specific to ZK audits, the sections of a codebase that lack comprehensive testing are often more prone to bugs and must be looked at thoroughly.

Conclusion

The ZK space is evolving fast, and things are bound to break every now and then. ZK circuits are often very critical components in an ecosystem. As they are still a relatively niche field of development, they can be as hard to audit as they are to write, which makes it very easy for inexperienced auditors to miss bugs. The high impact combined with the high level of technical sophistication required to audit circuits properly makes comprehensive audits paramount when it comes to ZK. As we continue to lead the industry in our ZK hacking efforts, we’re excited to share our insights with the larger security community.

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.
