Minimal Space, Maximum Pace: How memory-efficient zero-knowledge proofs work

In the rapidly evolving world of blockchain and cryptography, we often hear about blazing-fast zero-knowledge virtual machines (zkVMs) designed to scale Ethereum or tackle large-scale computational problems. However, for true privacy, ZKPs need to run locally. This concept, known as local verifiable compute, involves implementing extremely space-efficient ZKPs for use on local and often resource-constrained devices, from embedded systems to IoT devices to users' web browsers.

The fundamental principle is simple yet crucial: the moment you share your data with a third party, you compromise its privacy. By keeping computation local, you maintain control over sensitive information while still benefiting from the verifiability that ZKPs provide.

In this post, we'll dive deep into the world of memory-efficient zero-knowledge proofs. We'll explore:

  1. The importance of local verifiable compute in preserving privacy
  2. Various ZKP schemes pushing the boundaries of space efficiency
  3. State-of-the-art techniques in minimizing memory usage for ZKPs
  4. Non-recursive and recursive approaches to achieving memory efficiency
  5. The trade-offs between space, time, and proof size in different schemes
  6. Practical implementations and their real-world performance
  7. The potential future developments in the field of memory-efficient ZKPs

By the end of this post, you should have a good understanding of why memory-efficient ZKPs are crucial for the widespread adoption of zero-knowledge proofs, how current technologies are addressing this challenge, and what we can expect in the near future for memory-efficient ZK.

*Shameless plug: we open-sourced the NovaNet zkEngine at zkHack Montreal. It uses a folding scheme (SuperNova), and we are adding all of the tricks for memory-efficient ZK. Try it out.

Client-Side ZK: Groth16, Circom, Tornado Cash

Before diving into the latest advancements in memory-efficient zero-knowledge proofs, it's important to understand the history and evolution of client-side ZK implementations. This history provides crucial context for why memory efficiency has become such a critical focus in the field.

Groth16: A Breakthrough in ZK-SNARKs

In 2016, Jens Groth introduced what became known as the Groth16 protocol, a significant advancement in zero-knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs). Groth16 offered several compelling features:

  1. Succinct proofs: The proofs are extremely small, consisting of only 3 group elements.
  2. Efficient verification: Verification requires only a small, constant number of pairing operations.
  3. Fast proving time: The prover's work is dominated by a number of multi-exponentiations proportional to the size of the circuit.

These characteristics make Groth16 highly attractive for blockchain applications, where minimizing on-chain costs is crucial. However, Groth16 requires a trusted setup ceremony for each circuit, which is seen as a severe limitation for many applications.

Circom: Bringing ZK to the Browser

As the potential for client-side ZK became apparent, tools were needed to make implementation more accessible. Enter Circom, a domain-specific language and compiler for building arithmetic circuits for use in zero-knowledge proofs. Circom allows developers to describe circuits in a high-level language, which can then be compiled into constraints suitable for ZK proving systems like Groth16.

The Circom toolchain also includes libraries for generating and verifying proofs in JavaScript and WASM, which makes it well suited to browser-based ZK: developers can generate proofs entirely within a web browser.

Many projects have used a Circom-Groth16 stack for privacy, most recently the privacy-focused DeFi project Penumbra and the ZK-for-mobile framework Mopro. But before these there were a few infamous, or, depending on your view, famous examples.

One of the most prominent early applications of client-side ZK was Tornado Cash, a decentralized, non-custodial privacy solution for Ethereum. Launched in 2019, Tornado Cash used client-side ZK to let users break the on-chain link between source and destination addresses, i.e., to make transactions more private.

Here is an example flow:

  1. Deposit: A user deposits a fixed amount of ETH into a Tornado Cash smart contract.
  2. Proof generation: On the client side (e.g., in the browser), a zk-SNARK proof is generated to prove knowledge of a deposit without revealing which deposit it is.
  3. Withdrawal: The user submits this proof to withdraw the funds to a new address, with no on-chain link to the original deposit.
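
To make the flow concrete, here is a minimal Rust sketch of the underlying commit-and-nullify pattern. It is an illustration only: the hash is Rust's built-in hasher standing in for the circuit-friendly hashes (such as Pedersen or MiMC) a real deployment uses, and the names are hypothetical rather than Tornado Cash's actual code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash; a real deployment uses a circuit-friendly hash (e.g. Pedersen or MiMC).
fn h(parts: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    parts.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Deposit: the client picks two secrets and publishes only their commitment.
    let (nullifier, secret) = (0xdead_beef_u64, 0x1234_5678_u64);
    let commitment = h(&[nullifier, secret]);
    let on_chain_commitments = vec![h(&[1, 2]), h(&[3, 4]), commitment];

    // Withdrawal: the client reveals the nullifier hash (so the same note cannot be
    // spent twice) and proves, in zero knowledge, that it knows (nullifier, secret)
    // whose commitment sits somewhere in the on-chain set, without saying where.
    let nullifier_hash = h(&[nullifier]);
    let statement_holds = on_chain_commitments.contains(&h(&[nullifier, secret]));

    // Here the membership check runs in the clear; in the real protocol it runs
    // inside the SNARK circuit (as a Merkle-path check), so the link stays hidden.
    println!("nullifier hash: {nullifier_hash:x}, statement holds: {statement_holds}");
}
```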

Tornado Cash demonstrated the power of client-side ZK for enhancing privacy in blockchain transactions. However, it also highlighted some of the challenges:

  1. Proving time: Generating proofs in the browser is slow, so the application needs a UI and use case that can absorb the delay.
  2. Memory usage: The proving process still requires significant memory, which could be problematic on mobile devices or in resource-constrained environments (e.g., edge devices, DePIN).
  3. Trusted setup: Per-circuit trusted setups restrict the adaptability of systems that may need multiple circuit types.

These challenges have driven much of the recent research into more memory-efficient ZK proving systems. The goal is to make privacy-preserving ZK more accessible and practical across a wider range of devices and use cases, paving the way for broader adoption of these technologies.

Holographic SNARKs and Preprocessing Techniques

Holographic or preprocessing SNARKs, exemplified by systems like Spartan, offer an interesting approach to reducing prover overhead. In these systems, the prover's work can be divided, with the smaller privacy-preserving portion run locally and the larger part run by an untrusted prover. In the case of Spartan, this is done in three parts:

  1. Reducing the satisfiability of the circuit to claims about polynomials.
  2. Proving the witness polynomial evaluation.
  3. Proving sparse polynomial evaluations (roughly 90% of the prover's work, according to Srinath Setty).

The key insight here is that the third step, which can account for over 90% of the prover's work, can be offloaded to an untrusted party without compromising privacy. This offloading can provide a significant speedup for the prover. Furthermore, this step can be batched for multiple client-side proofs, further amortizing the work of the untrusted entity.
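
Concretely, and lightly simplified from the Spartan paper, R1CS satisfiability for constraint matrices A, B, C and witness Z is reduced to a single sum-check claim over their multilinear extensions:

$$
\sum_{x \in \{0,1\}^s} \widetilde{eq}(\tau, x) \cdot \left[ \Big( \sum_{y} \tilde{A}(x,y)\,\tilde{Z}(y) \Big) \cdot \Big( \sum_{y} \tilde{B}(x,y)\,\tilde{Z}(y) \Big) - \sum_{y} \tilde{C}(x,y)\,\tilde{Z}(y) \right] = 0.
$$

Running the sum-check on this claim leaves two kinds of evaluation obligations: an evaluation of the witness polynomial at a random point (step 2, which touches private data and stays on the device) and evaluations of the sparse polynomials derived from A, B, and C (step 3, which depend only on the public circuit and are exactly the part that can be handed to an untrusted machine).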

However, there are some important considerations to keep in mind. The witness polynomial evaluation is still proportional to the size of the computation, so even with the substantial cost savings from offloading, systems like Spartan would still spend considerable time on the device for the witness evaluation. Additionally, the sum-check protocol used in some of these systems is not inherently space-efficient: to achieve space efficiency, it would need to take logarithmically many passes over the entire computation.

It's worth noting that this kind of offloading can be applied to essentially any preprocessing (or holographic) SNARK. These techniques offer ways to reduce the computational burden on the prover by allowing some work to be done in advance or offloaded to more powerful machines.

Non-recursive space-saving techniques

Recent advancements in zero-knowledge proof systems have led to innovative space-saving techniques, particularly in sumcheck protocols. These developments are crucial for deploying ZKP systems on resource-constrained devices or in web browsers, opening up new possibilities for privacy-preserving applications.

One notable contribution in this area is a new family of prover algorithms for the multilinear sumcheck protocol that offers a flexible time-space tradeoff. This approach allows for adjusting the balance between computational time and memory usage, making it adaptable to various hardware constraints. The algorithm family is parameterized by an integer k, where increasing k reduces memory consumption while increasing running time. At the extremes, it recovers the asymptotics of previous state-of-the-art algorithms: linear time and space when k=1, and logarithmic space but superlinear time when k equals the number of variables.

For instance, with k=2, the algorithm achieves a prover running time of O(N) (where N is the number of addends in the sum) while using only O(√N) space. This represents a significant improvement over previous approaches that required either linear space or superlinear time. Unfortunately, it is still not clear that this work can be used as the sumcheck in zkSNARKs, as it only deals with simple polynomials.
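
At its heart this is the classic recompute-instead-of-store trade. The sketch below is not the algorithm from that work; it is a toy Rust illustration of the intuition, producing the first round message of a multilinear sum-check from a streamed evaluation oracle in constant extra space, where the usual linear-time prover would first materialize a table of all 2^n evaluations.

```rust
/// Toy field: integers mod a small prime (a real system uses a large prime field).
const P: u64 = 2_147_483_647;

/// First-round message of multilinear sum-check: because the polynomial is
/// multilinear, the round polynomial g1 is linear, so g1(0) and g1(1) suffice.
/// `f` evaluates the polynomial at the Boolean point encoded by the bits of the
/// index; `n` is the number of variables.
fn first_round_streaming(f: &dyn Fn(u64) -> u64, n: u32) -> (u64, u64) {
    let half = 1u64 << (n - 1);
    let (mut g_at_0, mut g_at_1) = (0u64, 0u64);
    // Stream over the hypercube: O(1) extra space, but every later round has to
    // re-stream the oracle, which is where the extra time comes from.
    for i in 0..half {
        g_at_0 = (g_at_0 + f(i)) % P;        // x1 = 0, remaining bits = i
        g_at_1 = (g_at_1 + f(half + i)) % P; // x1 = 1, remaining bits = i
    }
    (g_at_0, g_at_1)
}

fn main() {
    let n = 20; // 2^20 addends, but only a constant amount of prover memory here
    let f = |i: u64| (3 * i + 7) % P; // stand-in evaluation oracle
    let (g0, g1) = first_round_streaming(&f, n);
    println!("g1(0) = {g0}, g1(1) = {g1}, total sum = {}", (g0 + g1) % P);
}
```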

Ligetron is another novel zero-knowledge proof system that builds upon the Ligero protocol to achieve impressive space efficiency. What sets Ligetron apart is its innovative approach to handling the underlying computation being proven. Instead of working with a flattened circuit representation, which can be memory-intensive, Ligetron uses WebAssembly (WASM) as an intermediate representation for the computation. This allows it to leverage the semantic structure of the original high-level code, leading to significant memory savings.

A key feature of Ligetron is its garbage collection mechanism. As it processes the WASM code, it keeps track of which values are "active" - those currently needed for the computation - and discards or "garbage collects" values that are no longer necessary. This approach allows Ligetron to maintain a much smaller memory footprint compared to systems that must store the entire circuit in memory.
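
To give a flavor of the idea, here is a hedged Rust sketch, assuming a made-up three-address instruction format rather than real WASM and glossing over everything Ligetron actually does cryptographically: one pass records each value's last use, a second pass executes and drops values as soon as they are dead, so memory tracks the peak number of live values rather than the total number of gates.

```rust
use std::collections::HashMap;

/// Hypothetical straight-line program: each op produces `out` from two inputs.
struct Op { out: u32, lhs: u32, rhs: u32 }

fn run_with_gc(ops: &[Op], inputs: &[(u32, u64)]) -> usize {
    // Pass 1: record the last instruction at which each value is read.
    let mut last_use: HashMap<u32, usize> = HashMap::new();
    for (i, op) in ops.iter().enumerate() {
        last_use.insert(op.lhs, i);
        last_use.insert(op.rhs, i);
    }

    // Pass 2: execute, discarding values right after their last use.
    let mut live: HashMap<u32, u64> = inputs.iter().cloned().collect();
    let mut peak = live.len();
    for (i, op) in ops.iter().enumerate() {
        let v = live[&op.lhs].wrapping_add(live[&op.rhs]); // stand-in gate
        live.insert(op.out, v);
        peak = peak.max(live.len());
        for id in [op.lhs, op.rhs] {
            if last_use.get(&id) == Some(&i) {
                live.remove(&id); // "garbage collect" a dead value
            }
        }
    }
    peak // the peak number of live values, not the gate count, bounds memory
}

fn main() {
    // A long chain of additions: thousands of gates, but only a handful live at once.
    let ops: Vec<Op> = (0u32..10_000).map(|i| Op { out: i + 2, lhs: i, rhs: i + 1 }).collect();
    let peak = run_with_gc(&ops, &[(0, 1), (1, 1)]);
    println!("peak live values: {peak}"); // small despite 10,000 gates
}
```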

The space efficiency of Ligetron is particularly evident in structured computations. For example, in their benchmarks on a combination circuit involving SHA256 hashing and edit distance calculations, Ligetron's memory usage grew very slowly with increasing circuit size, allowing it to handle circuits with billions of gates using only a few hundred megabytes of memory.

Moreover, Ligetron achieves this space efficiency without sacrificing too much in terms of speed. Its prover runs at about 500 ns per gate for single-instance proofs, and an impressive 65 ns per gate in batched settings.

These advancements in space-efficient protocols and prover algorithms represent significant steps towards making zero-knowledge proofs more practical and widely applicable, especially in scenarios where memory constraints are the critical factor.

Recursive space-saving techniques

Nova and other such folding schemes use a different technique for space efficiency. They allow the developer to select the size of the circuit computation that is run over and over in a recursive fashion. The 'recursive overhead' of folding schemes is among the smallest in the literature, with Nova's verifier circuit being roughly 10,000 R1CS constraints. This allows for the creation of setups similar to Ligetron, where each step of a WASM program can be seen as a single step in a recursive system, with only that one step kept 'in the head' of the prover at any time. This is the core approach that we take in the zkEngine.

One added benefit of IVC-type recursion is that the prover can choose the step size. This means that if you have a large, powerful machine you can choose larger step sizes and prove faster, and if you have a smaller machine you can choose a smaller step size to match your memory budget. In our tests we were able to get proving steps down to ~50 MB per step, making it practical for many DePIN and client-side devices.
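
As an illustration of both points, here is a toy Rust sketch; the types, the per-opcode memory figure, and the budget value are made-up stand-ins, not the real zkEngine or Nova interfaces. It picks the largest step size that fits a memory budget and then folds the execution step by step, holding only one step's witness at a time.

```rust
/// Toy "running instance": in Nova this would be a folded relaxed R1CS instance;
/// here it is just a counter so the control flow is visible.
struct RunningInstance { steps_folded: usize }

/// One IVC step: prove `step_size` VM opcodes, fold the result into the
/// accumulator, then drop that step's witness.
fn fold_step(acc: RunningInstance, _step_witness: Vec<u64>) -> RunningInstance {
    RunningInstance { steps_folded: acc.steps_folded + 1 }
}

fn main() {
    let total_opcodes: usize = 1_000_000;        // length of the WASM execution trace
    let bytes_per_opcode: usize = 2_000;         // made-up per-opcode memory cost
    let memory_budget: usize = 50 * 1024 * 1024; // e.g. ~50 MB available on-device

    // Pick the largest step size that fits the device's memory budget.
    let step_size = (memory_budget / bytes_per_opcode).max(1);
    let num_steps = (total_opcodes + step_size - 1) / step_size;

    let mut acc = RunningInstance { steps_folded: 0 };
    for step in 0..num_steps {
        // Materialize only this step's witness, fold it, then free it.
        let witness = vec![0u64; step_size.min(total_opcodes - step * step_size)];
        acc = fold_step(acc, witness);
    }
    println!("{} opcodes proven in {} folding steps of at most {} opcodes each",
             total_opcodes, acc.steps_folded, step_size);
}
```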

Recursion also exists in STARKish systems, with Plonky2 being one great example. It uses a mixture of PLONK and FRI to strike a balance between speed and proof size. Generally, FRI-based systems can run very fast, but doing so comes at the cost of larger proofs; Plonky2 looks to remove this tradeoff. I expect its recursive overhead is a bit larger than in systems like Nova, making the minimal memory footprint achievable greater than it would otherwise be; this is not inherently bad, as Plonky2 is designed for maximum speed rather than maximum space efficiency.

Recursion is a core tool used in ZKP systems to control the memory that the prover needs at any given step. But what happens when we combine all of the above space-efficiency techniques?

Recursive meets non-recursive techniques meets game theory

The most memory-efficient and performant systems arise from combining the best techniques that already exist with a few novel ones; this is the cutting edge of memory-efficient ZK!

For example, we use cooperative game-theoretic optimizations such as Shapley value tracking within a more comprehensive proving system called NovaNet. This system uses NIVC (you can read about this in a previous post), which allows us to pick and pay, à la carte, for only the circuits used in a specific computation. In this system it is possible to have a Ligetron prover run on-device and have an aggregation layer handled by more powerful provers at a later stage, with the Ligetron circuit itself run as a verifier circuit. The same is true for a JOLT prover (not yet privacy-enabled), a zkML-specific prover, or anything else that can be made into a recursive verifier circuit.

In the case of preprocessing SNARKs, à la carte selection is also very attractive: many small provers can do the roughly 10% of the work that must stay on their devices, while the remaining 90% is taken on by potentially much larger, specialized machines in the greater NovaNet network. The Orchestrator nodes in our case determine the most efficient coalitions to achieve the task at hand, with null players receiving nothing and identical contributions receiving the same value. Researchers or developers can add their optimized proving scheme (specialized or more general) and be incentivized for doing so with NovaNet.
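
For the curious, here is a compact Rust sketch of the Shapley value computation itself over a toy characteristic function; the payoff numbers are invented and the real orchestration logic is richer than this. It illustrates the two axioms mentioned above: a null player is paid nothing, and identical (symmetric) players are paid the same.

```rust
/// Shapley value by direct enumeration over coalitions (fine for a handful of players).
/// `v(mask)` is the value a coalition (bitmask of players) can produce on its own.
fn shapley(n: usize, v: &dyn Fn(u32) -> f64) -> Vec<f64> {
    let fact: Vec<f64> = (0..=n).scan(1.0, |acc, k| {
        if k > 0 { *acc *= k as f64; }
        Some(*acc)
    }).collect();
    let mut phi = vec![0.0; n];
    for i in 0..n {
        for s in 0u32..(1 << n) {
            if s & (1 << i) != 0 { continue; }           // only coalitions without i
            let size = s.count_ones() as usize;
            let weight = fact[size] * fact[n - size - 1] / fact[n];
            phi[i] += weight * (v(s | (1 << i)) - v(s)); // i's marginal contribution
        }
    }
    phi
}

fn main() {
    // Three provers: 0 and 1 are identical useful provers, 2 contributes nothing.
    let v = |mask: u32| -> f64 {
        let useful = (mask & 0b011).count_ones() as f64;
        useful * 10.0 // each useful prover adds 10 units of value; player 2 is a null player
    };
    let phi = shapley(3, &v);
    println!("payouts: {:?}", phi); // [10.0, 10.0, 0.0]
}
```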

"Prover cooperation where needed prover competition where needed."

A combination of techniques can also result in Ligetron with preprocessing, for doubly memory-efficient ZK, or JOLT with recursion. We are very excited about the combinations that make for better on-device ZK, as this is the only way we achieve privacy and locally verifiable compute at scale. Better performance on constrained devices is the key step toward global adoption of ZK, and arguably toward global adoption of web3.


NovaNet is an incentive and orchestration layer that uses a wide breadth of state-of-the-art techniques, hardware providers, and game-theoretic optimizations, aimed at bringing local verifiable compute with privacy to your application. Examples include DePIN and gaming, but also much more general use cases in finance and healthcare.

Jump into the technical channel in our TG and let's chat!
