Sumcheck good. Lookups good. JOLT good. Particularly for zero-knowledge machine learning.

TL;DR: We made zkML 3-7x faster than everyone else. Here's how.

Standing on the shoulders of giants

The a16z crypto research team recently dropped some serious heat with their 6x speedup announcement, showing how JOLT's lookup-based approach, built on the sumcheck protocol, can dramatically outperform other SNARK constructions while running on CPU alone. Their "Twist and Shout" optimizations proved that when you stop trying to arithmetize everything and start embracing lookups and sparsity, beautiful things happen:

Less code to audit. A faster prover. Performance that holds up even on “prover killer” workloads.

🏅 zkML-JOLT (JOLT 'Atlas') builds upon the foundational JOLT research and implementation from the a16z Crypto team. We're grateful for their groundbreaking work and excited to contribute to the broader JOLT ecosystem.

So what if you modified JOLT for the world of neural network inference — which happens to have a lot of non-linearities, sparsity, and MatMul operations?

While the a16z team was busy revolutionizing general-purpose zkVMs, we were focused on another problem: making zero-knowledge machine learning more practical in NovaNet. Turns out, ML workloads have some very specific patterns that play exceptionally well with JOLT's lookup-heavy architecture. Where other approaches get bogged down in expensive field arithmetic for ML operations, JOLT's sumcheck + lookup combo cuts right through it.

Traditional circuit-based approaches are prohibitively expensive when representing non-linear functions like ReLU and SoftMax. Lookups, on the other hand, eliminate the need for circuit representation entirely. Just One Lookup Table (JOLT) was designed from first principles to use only lookup arguments. This foundational design choice means that any other proving scheme attempting to retrofit lookups into their existing systems will always be at a fundamental disadvantage.
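To make the contrast concrete, here is a minimal, hypothetical Python sketch (not JOLT's actual API) of what a lookup buys you for a quantized ReLU. In a circuit, max(0, x) needs comparison and selection gadgets over field elements; with a lookup argument, the prover's obligation collapses to a single claim that the pair (x, ReLU(x)) is a row of a table.

```python
# Sketch: a non-linearity as a table lookup instead of an arithmetic gadget.
# Assumes 8-bit signed quantization; all names here are illustrative.

def relu(x: int) -> int:
    return x if x > 0 else 0

# A circuit must express max(0, x) with comparisons and selectors; a lookup
# argument only has to show the pair (x, relu(x)) sits in this table.
RELU_TABLE = {x: relu(x) for x in range(-128, 128)}

def prove_relu_by_lookup(x: int) -> tuple[int, int]:
    # The entire "proof obligation" is one table-membership claim.
    y = RELU_TABLE[x]
    return (x, y)

print(prove_relu_by_lookup(-7))  # (-7, 0)
print(prove_relu_by_lookup(42))  # (42, 42)
```

The same pattern applies to SoftMax's component pieces (exponentiation, reciprocal) once the model is quantized: each becomes a table-membership claim rather than a circuit.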

In zkML JOLT (Atlas), we eliminate the complexity that plagues other approaches: ‘no quotient polynomials, no byte decomposition, no grand products, no permutation checks’, and most importantly — no complicated circuits.

For matrix-vector multiplication (a dominant cost in machine learning), we leverage JOLT's highly efficient batched sumcheck protocol for exceptional performance. While GKR-based approaches can be retrofitted with sparsity-aware frameworks like SpaGKR, JOLT 'Atlas' benefits from sparsity natively through its lookup-centric architecture and sparse polynomial commitments — eliminating the need for additional optimization layers.

Another interesting advantage — JOLT never materializes its full lookup tables, which are structured rather than explicitly stored. While other zkML projects become locked into specific quantization schemes due to materialized table constraints, JOLT's approach theoretically enables flexible quantization support and could extend to floating-point operations without the rigid preprocessing limitations faced by competitors. This is a side note as most models we are looking into near term are quantized.
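A hedged toy of what "structured rather than explicitly stored" means, using the classic bitwise-AND example from the JOLT literature: the multilinear extension of the AND table has the closed form Σᵢ 2ⁱ·Xᵢ·Yᵢ, so the verifier can evaluate it at a random point in O(w) field operations without anyone ever writing down the 2^(2w)-entry table. The field and table size below are illustrative.

```python
import random

P = 2**61 - 1  # illustrative prime field

def mle_eval(evals, point):
    """Evaluate the multilinear extension of a table at `point`
    (point[0] binds the most-significant index bit)."""
    for r in point:
        h = len(evals) // 2
        evals = [(evals[i] + r * (evals[h + i] - evals[i])) % P for i in range(h)]
    return evals[0]

W = 4  # 4-bit operands, so the materialized table has 2^(2W) = 256 entries

# "Materialized" AND table, indexed by (x, y): only feasible for tiny W.
table = [x & y for x in range(2 ** W) for y in range(2 ** W)]

# Structured evaluation: AND's MLE is sum_i 2^i * X_i * Y_i, computable in
# O(W) field operations -- no table is ever stored.
def and_mle(rx, ry):  # rx, ry: MSB-first lists of W field elements
    return sum(2 ** (W - 1 - j) * rx[j] * ry[j] for j in range(W)) % P

rx = [random.randrange(P) for _ in range(W)]
ry = [random.randrange(P) for _ in range(W)]
assert mle_eval(table, rx + ry) == and_mle(rx, ry)
```

Because only the closed-form evaluation is ever needed, the table's "width" (and hence the quantization scheme it encodes) is not baked into preprocessing the way a materialized table would be.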

Another significant advantage — with Twist and Shout, JOLT no longer requires decomposing operations into smaller subtables. This opens up precompile possibilities that were previously impossible. Operations can now be added as primitive instructions (or virtual sequences) as long as their evaluation tables are MLE-structured, rather than requiring the more restrictive 'decomposable' property. This dramatically expands the range of operations that can be efficiently integrated directly into JOLT Atlas’s instruction set.

As a small end-to-end example, we took a multi-class classification model and ran it across several zkML projects.

# zkML lib benchmarks (multi-class model: preprocessing + proving + verification)

zkml-jolt   ~0.7s  # fastest
mina-zkml   ~2.0s  # relatively fast
ezkl        4-5s   # significantly slower
deep-prove  N/A    # doesn't support the Gather op
zk-torch    N/A    # doesn't support the ReduceSum op
* Good theory turns into really good practical results 😮. These speedups held up across all of the models we tested.

Completeness and Scaling

Currently, zkML-JOLT runs on CPU architecture, but GPU acceleration represents a clear path to our next 10x performance improvement. When evaluating competitor claims of "1000x speedups," it's important to distinguish theoretical peak performance on specialized hardware from practical, deployment-ready implementations. True performance comparisons should be made on equivalent hardware configurations.

Moreover, our research has revealed significant gaps in the zkML ecosystem around both completeness and correctness. Many frameworks claiming ONNX support lack essential components like memory consistency checks — a critical requirement for proving that memory has not been tampered with during execution. Without these safeguards, such implementations cannot legitimately claim complete ONNX compatibility.
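To illustrate what a memory consistency check catches, here is a hedged toy of offline memory checking in the spirit of the multiset arguments used by Twist/Shout-style designs (the fingerprint, field, and structure are all illustrative, not any project's actual implementation). Each access "consumes" the tuple last written to an address and "produces" a new one; a prover who lies about a read value breaks a multiset equality that is checked with a random fingerprint.

```python
import random

P = 2**61 - 1
GAMMA = random.randrange(P)  # verifier's random fingerprint challenge
TAU = random.randrange(P)    # verifier's random multiset challenge

def fingerprint(addr, val, ts):
    # Reed-Solomon-style fingerprint of an (address, value, timestamp) tuple.
    return (addr + GAMMA * val + GAMMA * GAMMA * ts) % P

def multiset_hash(tuples):
    # Multiset equality via a product of (TAU - fingerprint): equal multisets
    # give equal products; unequal ones collide only with tiny probability.
    acc = 1
    for t in tuples:
        acc = acc * ((TAU - fingerprint(*t)) % P) % P
    return acc

def check_memory_trace(n_addrs, ops):
    """ops: ('read', addr, claimed_val) or ('write', addr, new_val).
    All cells start at 0. Returns True iff every claimed read value
    matches what was actually in memory (w.h.p. over GAMMA, TAU)."""
    init = [(a, 0, 0) for a in range(n_addrs)]
    reads, writes = [], []
    last = {a: (0, 0) for a in range(n_addrs)}  # honest (value, timestamp)
    for ts, (op, addr, val) in enumerate(ops, start=1):
        prev_val, prev_ts = last[addr]
        # The prover supplies the read value; a cheater can lie here.
        claimed = val if op == 'read' else prev_val
        reads.append((addr, claimed, prev_ts))  # tuple consumed by this access
        writes.append((addr, val, ts))          # tuple produced by this access
        last[addr] = (val, ts)
    final = [(a, v, t) for a, (v, t) in last.items()]
    # Consistency: init ∪ writes must equal reads ∪ final as multisets.
    return multiset_hash(init + writes) == multiset_hash(reads + final)

print(check_memory_trace(1, [('write', 0, 5), ('read', 0, 5)]))  # True
print(check_memory_trace(1, [('write', 0, 5), ('read', 0, 7)]))  # False
```

A framework that skips this kind of check will happily "prove" an execution in which the prover substituted arbitrary values for memory reads, which is why we treat it as non-negotiable for ONNX completeness claims.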

JOLT 'Atlas' prioritizes both performance and verifiable correctness, ensuring that our speed advantages don't come at the cost of security or completeness. Even so, it is still a work in progress; use all zkML with caution! 

Use Cases

We're deploying this work as the first specialized prover in our NIVC-based prover network (DeSCI for ZK), NovaNet. NovaNet's sister project, Kinic, already leverages this technology to create portable, verifiable AI memory as a "Plaid for AI" — learn more at https://www.kinic.io/.

The broader applications for verifiable AI are transformative. As researcher Daniel Kang outlined at zkSummit 13, use cases span privacy-preserving healthcare inference, financial services (including loan underwriting and algorithmic trading), and verification of locally running AI agents. Each represents a market where trust and auditability are paramount, but current AI systems operate as black boxes.

zkML-JOLT makes these applications practically viable by delivering the performance needed for real-world deployment while maintaining mathematical guarantees of correctness. Willing to wait a few moments for zkML verification before your personal AI agent transfers $20k in USDC? We think most people will see the benefit of verifiable ML in agents that are taking on larger responsibilities.

Folding JOLT

Most zkVM and zkML projects are ‘succinctly verifiable’ but not ‘zero-knowledge’ — they don't preserve privacy of inputs / outputs, the witness, or intermediate computations. For real-world, inter-personal or inter-company AI agents, cryptographic proofs of correct execution must be coupled with privacy guarantees.

Our architecture enables true zero-knowledge through 'folding schemes'. We can fold the JOLT verifier using techniques from the HyperNova paper to achieve ‘zero-knowledge’ privacy. This approach also allows dynamic step sizing, enabling efficient proving across diverse hardware configurations: from big specialized server hardware down to personal computers.

Unlike other zkML frameworks that demand substantial memory and specialized hardware, JOLT's efficiency gains with folding let us tune the memory-speed tradeoff for specific deployment scenarios. Multi-folding with continuations can also be used for vast parallelization of zkML workloads. This flexibility unlocks use cases previously impractical in the zkML space — from consumer devices to resource-constrained environments.
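The property folding leans on is commitment homomorphism. Here is a hedged toy (a Pedersen-style vector commitment over a multiplicative group mod a Mersenne prime, non-hiding for brevity, with made-up parameters): two committed witnesses can be folded into one with a random challenge r, and the folded commitment is computable from the two original commitments alone, so the verifier never sees either witness.

```python
import random

Q = 2**127 - 1  # illustrative prime modulus for a toy multiplicative group
ORD = Q - 1     # exponents live mod the group order

# Random generators (a real system derives these nothing-up-my-sleeve).
random.seed(7)
G = [random.randrange(2, Q) for _ in range(4)]

def commit(witness):
    """Pedersen-style (non-hiding, for brevity) vector commitment."""
    acc = 1
    for g, w in zip(G, witness):
        acc = acc * pow(g, w % ORD, Q) % Q
    return acc

def fold_witnesses(w1, w2, r):
    # The prover folds the two secret witnesses into one.
    return [(a + r * b) % ORD for a, b in zip(w1, w2)]

def fold_commitments(c1, c2, r):
    # The verifier folds the two commitments without seeing any witness:
    # the homomorphism gives commit(w1 + r*w2) == c1 * c2^r.
    return c1 * pow(c2, r, Q) % Q

w1 = [3, 1, 4, 1]
w2 = [2, 7, 1, 8]
r = random.randrange(ORD)
assert commit(fold_witnesses(w1, w2, r)) == fold_commitments(commit(w1), commit(w2), r)
```

Iterating this is what lets many proving steps collapse into one instance of bounded size, which is also why step size (how much work goes into each fold) becomes a tunable knob per deployment target.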

What’s Next?

The a16z Crypto team continues advancing JOLT's core infrastructure, with upcoming releases covering ‘streaming JOLT’, optimized Fiat-Shamir transformations, sumcheck protocol enhancements, and GPU acceleration — promising further order-of-magnitude performance improvements. JOLT 'Atlas' will both benefit from and contribute to this research effort.

On the ICME team side, our product development drives specific research priorities in lattice-based polynomial commitment schemes (PCS) and folding schemes. These will enable pay-per-bit cost structures, smaller-field arithmetic, plausible post-quantum security, the commitment homomorphism that folding relies on, and on-the-fly public parameter generation — critical capabilities for advanced distributed zkML deployments at scale.

This dual-track approach ensures JOLT 'Atlas' remains at the cutting edge while addressing real-world deployment challenges that emerge from building out user-facing applications of verifiable AI. 

Follow me as we build: https://x.com/wyatt_benno
Check out the code: https://github.com/ICME-Lab/jolt-atlas


Useful refs:
Mina-zkml
https://github.com/chris-chris/mina-zkml/tree/main/src

HyperNova
https://eprint.iacr.org/2023/573.pdf

Neo
https://eprint.iacr.org/2025/294

Twist and Shout
https://eprint.iacr.org/2025/105
