By Wyatt Benno — Nov 5, 2025

The Definitive Guide to ZKML (2025).

AI systems are making decisions that move money, affect health outcomes, and control autonomous agents. But how do you verify that an AI actually computed what it claimed to compute? That it used the model it promised to use? That it didn't leak your data in the process?

There are several approaches to verifiable AI: trusted hardware, consensus mechanisms, re-execution in secure enclaves. Each has merits, and we're going to skip all of them. Not because they're bad, but because the problem I find most interesting is verification through pure mathematics with minimal trust assumptions: zero-knowledge proofs applied to machine learning, or ZKML.

This field barely existed in industry three years ago. Then Modulus Labs, EZKL, Dr. Daniel Kang, Dr. Cathie So, and a handful of others showed up and said "let's make AI verifiable." The immediate objection was obvious: proving a computation inside a zkVM carries 100,000x to 1,000,000x overhead compared with running it natively. Running inference in a zero-knowledge proof is like swimming through concrete.

So why do it?

Turns out there are three good reasons ZKML is worth the pain.

Succinct Verification: Big Compute, Small Receipts

Here's the asymmetry that makes ZKML work: computation can be expensive, verification can be cheap.

AWS runs your model on a GPU cluster for an hour. Then it hands your phone a cryptographic receipt that takes 50 milliseconds to verify. Your phone knows, mathematically, that the computation was done correctly. No trust in AWS required.
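That asymmetry shows up even without cryptography. The classic illustration is Freivalds' algorithm. It is swapped in here only as an example: it is not a zero-knowledge proof, it is not what ZKML systems run, and its verifier still needs A, B, and C in hand. It checks a claimed matrix product C = A·B in O(n^2) time per trial, while computing the product naively takes O(n^3). Below is a minimal Python sketch. The function names are our own choice.

```python
import random


def matmul(A, B):
    """Naive O(n^3) product: the expensive side of the asymmetry."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(M, v):
    """Matrix-vector product in O(rows * cols) time."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def freivalds_check(A, B, C, trials=20, rng=random):
    """Accept the claim C == A @ B using O(n^2) work per trial.

    A wrong C survives one trial with probability at most 1/2,
    so 20 trials leave an error probability of at most 2**-20."""
    cols = len(C[0])
    for _ in range(trials):
        x = [rng.randrange(2) for _ in range(cols)]
        if matvec(A, matvec(B, x)) != matvec(C, x):
            return False
    return True
```

The checker's error shrinks exponentially with the number of trials, while its work per trial stays quadratic.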

This unlocks something new: trustless agent workflows. An agent on your phone talks to an agent in a company data center, which talks to an agent on Ethereum, which talks to an agent on Solana. Each one passes a cryptographic baton to the next, like a relay race for agentic commerce. The whole chain of inference is verifiable end-to-end.

Without this? One compromised agent poisons the entire workflow. In autonomous systems moving money or making medical decisions, that's not a bug; it's a catastrophe waiting to happen.

Privacy: The Actually Useful Part of "Zero Knowledge"

The ZK in ZKP means the proof leaks nothing beyond the fact that the claim is true.

A hospital runs diagnostics on patient data. Generates a proof. Now they can show regulators "we used the FDA-approved model and got this result" without exposing a single patient record. The data stays private. The proof is public.

Or: a bank proves they ran their fraud detection model correctly without revealing the model itself (competitive advantage) or the transaction data (regulatory requirement). The auditor verifies. Everyone's happy.

We're also watching AI move on-device — Gemma, Apple's Foundation Models, the whole local inference wave. These models need to talk to the outside world eventually. zkML is how a model running on your laptop proves to a remote system that it actually computed something, without uploading your data or the model weights.

Many zkML use cases require privacy, and not every zkML repo provides it. Keep this in mind, devs!

Programmable Money: Why Agents Need Proofs

In 2025, cryptographic proofs can control actual money. This matters more than people realize.

Standards like x402 and ERC-8004 are emerging for agent-to-agent payments. We're heading toward autonomous economies where:

An agent buys data from a provider
Runs inference across multiple models
Delivers results to a client
Settles payment — all without a human in the loop

Every step needs proof. Did you use the data you paid for? Did you run the model you claimed? Is this result actually from that computation? zkML answers these questions cryptographically.

When agents are moving real money (not testnet tokens, actual value), math-based security isn't optional. You need proofs or you need trust. And if you're building trustless systems, the choice is obvious.

In 2025, ZKML is still expensive. The overhead is real. But it keeps coming down (1,000,000x → 100,000x → 10,000x), and the value proposition is getting clearer.

zkPyTorch dropped in March 2025, and suddenly you could prove VGG-16 inference in 2.2 seconds. Lagrange's DeepProve tackled large LLM inference in August. This fall, with our Jolt Atlas repo, we saw similar speedups on a wide range of models, all before applying any GPU acceleration.

In 2025, we're way past the toy phase. Some models can be proven with ZKPs in seconds right now. With better dev tooling, we can expect this infrastructure to break out and go live in many more projects in 2026.

You pay the computational cost once — for verifiability, privacy, and the ability to coordinate agents across trust boundaries without intermediaries. In a world where AI agents are about to start moving billions of dollars around, that's not a luxury. It's essential infrastructure.

The Field: Who's Building What

zkML went from "maybe possible" in 2022 to "actually shipping" by 2025. Here's how we got here and who's doing what.

The Early Days (2022-2023): Proof of Concept

Modulus Labs kicked things off. Daniel Shorr and team out of Stanford published "The Cost of Intelligence" — the first real benchmarking of ZK proof systems for AI. Their thesis: if zk-rollups can make Ethereum compute cheaper, maybe they can bring AI on-chain too.

Spoiler: it was expensive as hell. $20 per transaction just to verify the smallest piece of a smart contract. But it worked. They built RockyBot (on-chain AI fighting game) and Leela vs the World to prove the concept. More importantly, they showed you could prove GPT-2 and Twitter's recommendation algorithm in zero knowledge.

The underlying tech they used is a protocol called GKR. Vitalik did a tutorial on it recently, so I won't reiterate the details here; check out that post if you're interested in GKR. The concept is that GKR lets you skip cryptographic commitments for the intermediate layers of a layered circuit, and ML operations feel naturally expressible in this setting. It turns out, though, that matmul and other essential operations are more efficient with specialized protocols built on the sumcheck protocol and lookup arguments. The core reason was explained very well, years ago, by Thaler in his book Proofs, Arguments, and Zero-Knowledge:

Preview: Other Protocols for MATMULT. An alternate interactive MATMULT protocol can be obtained by applying the GKR protocol (covered later in Section 4.6) to a circuit C that computes the product C of two input matrices A,B. The verifier in this protocol runs in O(n^2) time, and the prover runs in time O(S), where S is the number of gates in C.

The advantage of the MATMULT protocol described in this section is two-fold. First, it does not care how the prover finds the right answer. In contrast, the GKR protocol demands that the prover compute the answer matrix C in a prescribed manner, namely by evaluating the circuit C gate-by-gate. Second, the prover in the protocol of this section simply finds the right answer and then does O(n^2) extra work to prove correctness. This O(n^2) term is a low-order additive overhead, assuming that there is no linear-time algorithm for matrix multiplication. In contrast, the GKR protocol introduces at least a constant factor overhead for the prover. In practice, this is the difference between a prover that runs many times slower than an (unverifiable) MATMULT algorithm, and a prover that runs a fraction of a percent slower.

Thaler was also an early proponent of the sumcheck protocol as a core building block across all of ZK! @SuccinctJT #tendsToBeRight.
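To make the MATMULT idea concrete, here is a minimal Python sketch of a sumcheck-based check that C = A·B. For readability, the prover and verifier run in one function. The sketch rests on these assumptions:

- matrices are square, with a power-of-two size;
- arithmetic is modulo the prime 2^61 - 1;
- the verifier evaluates the multilinear extensions of A and B directly in O(n^2) time, rather than opening polynomial commitments.

All names are ours. This illustrates the technique; it is not any framework's implementation, and it has no zero knowledge.

```python
import random

P = 2**61 - 1                 # prime field modulus (assumed for this sketch)
INV2 = pow(2, P - 2, P)       # inverse of 2, used in quadratic interpolation


def fold(table, r):
    """Fix the most significant variable of a multilinear table to r."""
    half = len(table) // 2
    return [(table[j] * (1 - r) + table[j + half] * r) % P for j in range(half)]


def partial_eval(table, point):
    """Fix the leading variables of a multilinear table to `point`."""
    for r in point:
        table = fold(table, r)
    return table


def flatten(M):
    return [x % P for row in M for x in row]


def matmul_sumcheck(A, B, C, rng=random):
    """Return True if the sumcheck transcript for C == A @ B verifies."""
    n = len(A)
    m = n.bit_length() - 1
    assert 1 << m == n, "matrix size must be a power of two"
    B_T = [list(col) for col in zip(*B)]

    # Verifier: reduce the n*n claim about C to one claim C~(r_row, r_col).
    r_row = [rng.randrange(P) for _ in range(m)]
    r_col = [rng.randrange(P) for _ in range(m)]
    claim = partial_eval(flatten(C), r_row + r_col)[0]

    # Prover: tables for k -> A~(r_row, k) and k -> B~(k, r_col).
    a = partial_eval(flatten(A), r_row)
    b = partial_eval(flatten(B_T), r_col)

    r_k = []
    for _ in range(m):
        h = len(a) // 2
        # Prover sends g(0), g(1), g(2), where g(t) = sum_j a_t[j] * b_t[j].
        g0 = sum(x * y for x, y in zip(a[:h], b[:h])) % P
        g1 = sum(x * y for x, y in zip(a[h:], b[h:])) % P
        g2 = sum((2 * ah - al) * (2 * bh - bl)
                 for al, ah, bl, bh in zip(a[:h], a[h:], b[:h], b[h:])) % P
        # Verifier: round consistency check, then a fresh random challenge.
        if (g0 + g1) % P != claim:
            return False
        r = rng.randrange(P)
        # Lagrange interpolation of the degree-2 polynomial g at r.
        claim = (g0 * (r - 1) * (r - 2) * INV2
                 - g1 * r * (r - 2)
                 + g2 * r * (r - 1) * INV2) % P
        r_k.append(r)
        a, b = fold(a, r), fold(b, r)

    # Final check: verifier evaluates A~(r_row, r_k) * B~(r_k, r_col) itself.
    a_final = partial_eval(flatten(A), r_row + r_k)[0]
    b_final = partial_eval(flatten(B_T), r_col + r_k)[0]
    return claim == a_final * b_final % P
```

Beyond computing C, the prover here does O(n^2) extra work: it builds the two folded tables, then runs the sumcheck rounds. That is the kind of low-order additive overhead the quote describes.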

Halo2 applied to ZKML

Around the same time, Jason Morton started EZKL. His approach was different — take any model in ONNX format (the open standard for neural networks), convert it to Halo2 circuits, generate proofs. The killer feature: you didn't need to be a cryptographer. Export your PyTorch model, point EZKL at it, get a proof.

The Explosion (2024-2025): Choose Your Fighter

*If your project should be listed, or if facts change in 2025, let me know!
*Claims below are what projects said about themselves in their own blog posts. Sometimes the claims are exaggerated! 😬😬

EZKL (2023~)

ONNX → Halo2 circuits
Benchmarks showed it was 65x faster than RISC Zero, 3x faster than Orion
Used 98% less memory than RISC Zero
Tradeoff: only supports a subset of ONNX operators (they're adding more)
Main challenge: quantization. You lose some accuracy going from floating point to fixed point arithmetic
Likely privacy preserving ✅

Lagrange DeepProve (launched 2024, GPT-2 proof early 2025)

This one came in fast. Claims 54-158x faster than EZKL
First to prove a complete GPT-2 inference — not just pieces, the whole thing
Verification: 671x faster for MLPs, 521x faster for CNNs (half a second verification time)
Uses sumcheck protocol + lookup arguments (logup GKR)
Working on Llama support: GPT-2 and Llama are architecturally similar, so they're close
Has a decentralized prover network (live on EigenLayer)
Not likely privacy preserving ❌

zkPyTorch (Polyhedra Network, March 2025)

This was the breakthrough for modern transformers
First to prove Llama-3 — 150 seconds per token
VGG-16 in 2.2 seconds
Three-layer optimization: preprocessing, ZK-friendly quantization, circuit optimization
Uses DAGs and parallel execution across cores
Integrates with Expander proving engine
Not likely privacy preserving ❌

ZKTorch (Daniel Kang, July 2025)

"Universal" compiler — handles anything
GPT-J (6 billion parameters): 20 minutes on 64 threads
GPT-2: 10 minutes (down from 1+ hour)
ResNet-50 proof: 85KB (Mystique was generating 1.27GB proofs)
Uses proof accumulation — folds multiple proofs into one compact proof
This is the current speed king for general-purpose zkML
Targets academic use rather than industry

Jolt Atlas (NovaNet/ICME Labs, August 2025)

Built on a16z's JOLT zkVM modified for ONNX
The zkVM approach but actually very fast
Key insight: ML workloads love lookup tables, and JOLT is lookup-native
No quotient polynomials, no byte decomposition, no grand products — just lookups and sumcheck
Flexible quantization support — doesn't materialize full lookup tables, so you're not locked into specific quantization schemes
Can theoretically extend to floating-point (most others are stuck with fixed-point)
Perfect for agent use cases where you need both verification and privacy
Can support true zero knowledge via folding schemes (HyperNova / BlindFold). ✅

The Technical Reality

Quantization Hell: ML models use floating point arithmetic. ZK proofs use finite field arithmetic (basically integers). You have to convert, and you lose precision. Most ZKML frameworks quantize the model, so they lose a bit of accuracy. On the other hand, a lot of the ML running on small devices and in production already uses quantized models.

Every framework handles this differently. Some use larger bit widths (more accurate, slower). Some use lookup tables. Some do clever tricks with fixed-point representations. We like our approach at Jolt Atlas because we don't need to materialize lookup tables for a lot of ML operators.
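Here is a minimal sketch of the float-to-field conversion. It assumes a simple fixed-point scheme over the prime 2^61 - 1: values are scaled by 2^bits, and negatives wrap into the upper half of the field. Real frameworks differ in scales, rounding, and how they rescale after multiplication. The names here are illustrative.

```python
P = 2**61 - 1  # prime field modulus (assumed for this sketch)


def to_field(x, bits):
    """Quantize a float to fixed point with scale 2**bits and embed it in F_P.
    Negative values wrap around to P - |q|."""
    return round(x * (1 << bits)) % P


def from_field(v, bits):
    """Decode a field element, treating values above P // 2 as negative."""
    signed = v - P if v > P // 2 else v
    return signed / (1 << bits)


def quantized_dot(xs, ws, bits):
    """Dot product using only field additions and multiplications.
    Each product carries scale 2**(2*bits), so the result is decoded with 2*bits."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = (acc + to_field(x, bits) * to_field(w, bits)) % P
    return from_field(acc, 2 * bits)


xs, ws = [0.5, -1.25, 2.0], [0.3, 0.7, -0.1]
exact = sum(x * w for x, w in zip(xs, ws))
for bits in (4, 8, 16):
    approx = quantized_dot(xs, ws, bits)
    print(f"{bits:2d} fractional bits: {approx:+.6f} (error {abs(approx - exact):.2e})")
```

More fractional bits shrink the error, but every value the circuit handles gets wider. That is the accuracy-versus-speed trade-off behind the different bit widths frameworks choose.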

Nobody's figured out the perfect solution yet, but each iteration unlocks more use cases. This is one reason to be optimistic about ZKML in the near term.

Operator Coverage: The ONNX spec has over 120 operators. Most zkML frameworks support roughly 50 or fewer. This means certain model architectures just don't work yet. Teams are racing to add more, but it's a grind.

Your production model uses operators your zkML framework doesn't support. This happens more than you'd think. The gaps:

Custom layers you wrote for your specific use case: nope
Exotic normalizations (GroupNorm, LayerNorm variations): maybe
Dynamic control flow (if statements, loops): often not
Attention mechanisms: just being added to major frameworks in 2024-2025
Recent innovations (flash attention, rotary embeddings): probably not

You discover this when you try to export your model. ONNX conversion succeeds. Framework ingestion fails. "Unsupported operator: [whatever]."

Now you're rewriting your model to use only supported operators. This isn't a minor inconvenience; it's an architectural constraint you should have known about before you started training. This is one reason we like the zkVM approach: each operator can be generalized for plug-and-play more easily. Precompile-centric approaches are more hand-crafted 🫳🧶.
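A cheap way to avoid that surprise is to audit the operator list before you commit to an architecture. Here is a minimal sketch. `onnx.load` and each node's `op_type` are part of the `onnx` Python package. The `SUPPORTED` set is a hypothetical placeholder; fill it in from your framework's own documentation. The sketch only scans the top-level graph, so operators nested inside `If`/`Loop` subgraphs are not visited.

```python
from collections import Counter

# Hypothetical placeholder: replace with the operator list your zkML framework documents.
SUPPORTED = {"Add", "Conv", "Flatten", "Gemm", "MatMul", "MaxPool", "Relu", "Reshape"}


def unsupported_ops(op_types, supported=SUPPORTED):
    """Count every operator type that the framework does not support."""
    return Counter(op for op in op_types if op not in supported)


def op_types_from_onnx(path):
    """Read operator types from an ONNX file (top-level graph only)."""
    import onnx  # imported lazily so the rest of the sketch runs without it
    model = onnx.load(path)
    return [node.op_type for node in model.graph.node]


# Example with a hand-written op list standing in for a real model:
ops = ["Conv", "Relu", "Conv", "GroupNormalization", "Gelu", "MatMul", "Softmax"]
print(unsupported_ops(ops))
```

For a real model, pass `op_types_from_onnx("model.onnx")` to `unsupported_ops` instead of the hand-written list. The file name is a placeholder.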

Activation Functions (Choose Wisely): In vanilla ML, activation functions are essentially free. ReLU, sigmoid, tanh, GELU: pick whatever works.

In zkML, activation functions are expensive operations that blow up your circuit.

Why are activation functions expensive? ZK circuits are built on polynomial arithmetic — addition and multiplication in finite fields. These operations are cheap because they map directly to circuit constraints. But activation functions are non-linear in ways that don't decompose nicely into field arithmetic.

ReLU needs to compute "if x > 0 then x else 0", and that comparison requires multiple constraints to represent. Sigmoid needs to compute 1/(1 + e^(-x)), and exponentiation in finite fields is painful, requiring many multiplications and often lookup tables. Softmax combines exponentiation, summation, and division across an entire vector. A simple operation in native execution becomes a circuit with hundreds or thousands of constraints per neuron.
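To see where ReLU's cost comes from, here is a minimal sketch of one common encoding. It is an assumption for illustration, not any specific framework's circuit. The prover supplies the bits of x + 2^(k-1). The verifier then checks three things:

- each bit is 0 or 1;
- the bits recompose to x;
- the output equals the sign bit times x.

For k-bit values, that is k + 2 constraints per ReLU. A multiply-accumulate in a linear layer costs roughly one. Lookup arguments and range-check tricks exist precisely to cut this cost.

```python
P = 2**61 - 1  # prime field modulus (assumed for this sketch)
K = 16         # assumed bit width of quantized activations


def relu_witness(x, k=K):
    """Prover side: ReLU output plus the bits of x + 2**(k-1), which is non-negative."""
    u = x + (1 << (k - 1))
    assert 0 <= u < (1 << k), "x is outside the signed k-bit range"
    bits = [(u >> i) & 1 for i in range(k)]
    return max(x, 0), bits


def relu_constraints(x, y, bits, k=K):
    """Verifier side: evaluate each constraint, all of which must equal zero in F_P.
    Returns (all_satisfied, number_of_constraints)."""
    xf, yf = x % P, y % P
    checks = [b * (b - 1) % P for b in bits]                  # k booleanity checks
    recomposed = sum(b << i for i, b in enumerate(bits))
    checks.append((recomposed - xf - (1 << (k - 1))) % P)     # bits encode x
    sign = bits[k - 1]                                        # 1 exactly when x >= 0
    checks.append((yf - sign * xf) % P)                       # y = sign * x
    return all(c == 0 for c in checks), len(checks)
```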

Cheap:

Linear (no activation): free
Scaled addition: basically free

Moderate:

ReLU: requires comparison, manageable
Step functions: similar cost to ReLU

Expensive:

Sigmoid: exponentiation is painful in circuits
Tanh: even worse
Softmax: exponentiation + division + normalization, true pain
GELU/SwiGLU: forget it (for now... we have some WIP here)

Modern transformers love GELU and variants. zkML transformers are stuck with approximations or simpler alternatives.

This is why frameworks are building lookup tables for non-linearities. Precompute common values, reference them instead of computing. Faster, but you're trading memory for speed and locking in quantization choices.
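Here is a minimal sketch of that trade. The table holds a sigmoid over every signed 8-bit input code; the input scale of 16 and output scale of 256 are arbitrary choices for this sketch. Membership is checked with the logUp identity, the idea behind the "logup GKR" mentioned earlier. A real lookup argument proves the two sums with sumcheck/GKR and commitments. This sketch simply evaluates them at a random point, and all names are ours.

```python
import math
import random
from collections import Counter

P = 2**61 - 1  # prime field modulus (assumed for this sketch)


def sigmoid_table(in_bits=8, in_scale=16, out_scale=256):
    """Every (input_code, output_code) pair of a fixed-point sigmoid."""
    half = 1 << (in_bits - 1)
    return [(q, round(out_scale / (1 + math.exp(-q / in_scale))))
            for q in range(-half, half)]


def logup_check(lookups, table, rng=random):
    """Check that every (x, y) pair in `lookups` appears in `table` via
        sum_i 1 / (alpha - a_i)  ==  sum_j m_j / (alpha - t_j)   (mod P),
    where m_j counts how often table row j is looked up."""
    alpha, beta = rng.randrange(P), rng.randrange(P)
    enc = lambda pair: (pair[0] + beta * pair[1]) % P   # compress a pair to one element
    inv = lambda v: pow(v, P - 2, P)                     # Fermat inverse in F_P
    m = Counter(lookups)                                 # multiplicities (prover witness)
    lhs = sum(inv((alpha - enc(a)) % P) for a in lookups) % P
    rhs = sum(m[t] * inv((alpha - enc(t)) % P) for t in table) % P
    return lhs == rhs
```

The table has 2^in_bits rows, so every extra input bit doubles it. Changing in_scale or out_scale means rebuilding it. That is the memory cost and quantization lock-in described above.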

Use Cases: So What's Actually Worth Proving?

You've just read about 10,000x overhead, quantization hell, and activation functions that blow up your circuits. Reasonable question: why would anyone subject themselves to this?

The answer isn't "everything should be zkML." The answer is: certain problems need verifiability so badly that the overhead is worth it.

A Basic Filter

Before we dive into use cases, here's the test: Does trust failure cost more than the proof?

If you're running a recommendation algorithm to show cat videos, trust failure costs nothing. Just show the cat videos. Nobody cares if your model is actually the one you claimed.

If you're running a trading bot managing $10M in assets, trust failure is catastrophic. The agent goes rogue, the position gets liquidated, you're explaining to investors why you trusted an opaque API.
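One way to make the filter concrete is a rough expected-value heuristic, not a formal rule. Let p be the probability that an unverified computation is wrong or tampered with over some period. Let L be the loss when that happens, and C_proof the cost of proving and verifying over the same period. Proving pays off when:

$$ p \cdot L > C_{\text{proof}} $$

For the cat-video recommender, L is close to zero, so the inequality never holds. For the trading bot, assume a failure wipes out the whole $10M position. Then even p = 0.001 puts the expected loss at $10,000 per period.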

zkML makes sense when:

High stakes: Money, health, legal decisions, security
Trust gaps: Multiple parties who don't trust each other
Privacy constraints: Sensitive data that can't be shared
Auditability requirements: Regulators or stakeholders need proof
Adversarial environments: Someone has an incentive to cheat

If your use case doesn't hit at least two of these, you probably don't need zkML yet.

DeFi: Where the Money Lives

DeFi is zkML's natural habitat. You've got high-value transactions that require trustless execution and succinct on-chain verification while staying transparent to users. And you've got adversarial participants who will try to exploit every edge!

Price Oracles

The first real zkML product was Upshot + Modulus's zkPredictor. Problem: NFT valuations are computed by proprietary ML models. How do you trust the price feed?

Traditional oracle: "Trust us, this is what our model said." zkML oracle: "Here's a cryptographic proof that this price came from this exact model, running on this specific data (potentially private)."

The proof means you can build financial products (lending, derivatives) on top of these prices without trusting Upshot. They can't manipulate prices without breaking the proof. The data stays private, but the computation is verifiable.

This pattern generalizes: any time a DeFi protocol needs ML-derived data (volatility estimates, risk scores, yield predictions), zkML can prove the computation without revealing the model.

Trading Bots & Agents

Picture this: you've deployed a yield optimization agent across multiple DeFi protocols. It's managing liquidity positions on Uniswap, farming on Curve, rebalancing on Aave.

How do you know it's executing your strategy correctly? How do you prove to LPs that their capital is being managed according to the algorithm you advertised?

With zkML, the agent generates a proof for each action. "I moved 50 ETH from pool A to pool B because my model predicted higher yields, and here's proof I used the strategy you approved."

Giza is building exactly this on Starknet. Their LuminAIR framework (using StarkWare's STWO prover) lets you build verifiable agents for DeFi. An agent that rebalances Uniswap V3 positions can prove each rebalance decision came from the promised model. The model weights stay private. The trading strategy stays private. The proof is public.

This unlocks agent-to-agent interactions. Your agent can trustlessly compose with another agent because both are producing verifiable computation. No trusted intermediary. Just math.

Risk Models & Credit Scoring

Banks use ML for credit decisions. DeFi protocols use ML for collateralization ratios. Problem: how do you prove your risk model was applied consistently?

Traditional system: "Trust the bank." zkML system: "Every loan decision comes with a proof that this specific model, with these frozen parameters, evaluated this applicant's data."

This matters for:

Regulatory compliance: Prove you're not discriminating
Fairness audits: Prove the same model ran for everyone
Dispute resolution: If someone challenges a decision, you have cryptographic proof of what happened

The model can stay proprietary. The data can stay private. The proof shows the process was fair.

Trustless Agents

Remember the intro? Agents passing cryptographic batons in a relay race?

Here's the scenario: An agent ecosystem where:

Agent A (on your phone) analyzes your calendar and decides you need to book a flight
Agent B (travel booking service) finds flights and prices
Agent C (payment processor) executes the transaction
Agent D (expense tracking) records it for your company's accounting

Each step needs to verify the previous step. Agent B won't execute if Agent A's analysis is fraudulent. Agent C won't pay if Agent B's quote is manipulated. Agent D won't record if Agent C's transaction is suspicious.

Without zkML: every agent runs in a trusted enclave or everyone trusts everyone. Neither scales.

With zkML: each agent produces a proof. Agent B verifies Agent A's proof. Agent C verifies Agent B's proof. The whole pipeline is trustless. One agent can be running on AWS, another on your phone, another on Ethereum. Doesn't matter — math connects them.
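The relay can be sketched as a chain of receipts, each committing to the output it consumed. This structure is hypothetical, not the x402 or ERC-8004 format. `Receipt` and `verify_proof` are placeholders, and `verify_proof` stands in for whatever zkML verifier each hop actually uses. The point is the control flow: a downstream agent accepts only if two conditions hold. Every upstream proof checks, and every hop consumed exactly what the previous hop produced.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Receipt:
    agent: str
    input_digest: str   # SHA-256 of the input this step consumed
    output: bytes       # what this step hands to the next agent
    proof: bytes        # opaque proof from the agent's proving system


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(receipts: List[Receipt], initial_input: bytes,
                 verify_proof: Callable[[Receipt], bool]) -> bool:
    """Accept only if each hop consumed the previous hop's output and its proof verifies."""
    expected = digest(initial_input)
    for r in receipts:
        if r.input_digest != expected or not verify_proof(r):
            return False
        expected = digest(r.output)
    return True
```

If an agent swaps its output after the next hop has consumed it, the digests no longer line up and the whole chain is rejected.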

The x402 & ERC-8004 Future

These emerging standards define how AI agents pay each other directly. No human in the loop. But payments require trust.

If Agent A claims "I ran this analysis, pay me," Agent B needs proof. If Agent B is managing funds and Agent A is lying, that's theft. zkML provides the proof layer.

We're heading toward autonomous agent economies. Agents hire other agents for subtasks. Agents prove their work cryptographically. Payments flow automatically based on verified completion. No centralized party controls the workflow.

NovaNet's Jolt Atlas is designed for this. Privacy + verification. The agent proves it computed correctly without revealing inputs, outputs, or intermediate states. Perfect for commercial agents where everything is sensitive.

Healthcare: Privacy Meets Auditability

Healthcare is drowning in ML but terrified of privacy violations. HIPAA, GDPR, regional regulations: every jurisdiction has rules about patient data.

Diagnostic Models

A hospital runs an ML diagnostic model. FDA approved, thoroughly validated. Patient comes in, model analyzes imaging data, recommends treatment.

Regulator asks: "Did you actually use the FDA-approved model? Or did you use a modified version? Can you prove it?"

Traditional answer: "Trust our logs." zkML answer: "Here's a cryptographic proof that this exact model (with commitment to weights) ran on this patient's data, producing this result."

The patient data never leaves the hospital. The model weights stay confidential (IP). But the proof goes to the regulator, the insurance company, whoever needs to verify.
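The "commitment to weights" can be pictured as a salted hash. It is binding, because changing any weight changes the digest. It is hiding, because the random salt stops anyone from guessing the weights and recomputing the digest. This is a simplification for illustration. ZKML systems typically commit with polynomial commitment schemes, so the proof can reference the committed weights directly. The serialization format below is an assumption.

```python
import hashlib
import secrets
import struct


def serialize(weights):
    """Canonical little-endian float64 encoding of a flat weight list (assumed format)."""
    return struct.pack(f"<{len(weights)}d", *weights)


def commit(weights):
    """Publish the digest; keep the salt and weights private until an audit."""
    salt = secrets.token_bytes(32)
    return hashlib.sha256(salt + serialize(weights)).hexdigest(), salt


def opens_to(digest, salt, weights):
    """Check that (salt, weights) match a previously published digest."""
    return hashlib.sha256(salt + serialize(weights)).hexdigest() == digest
```

In the hospital scenario, the published digest could be pinned when the model is approved, and later proofs tied to that same value.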

Collaborative Research Without Data Sharing

Multiple hospitals want to train a model on their combined patient data. Can't share the data (privacy laws). Can't trust each other (competitive).

zkML enables: each hospital proves their local training was done correctly on valid data. The proofs combine. Everyone gets a better model. Nobody sees anyone else's data.

Gaming: Provable Fairness

Gaming doesn't need zkML for cat pictures. But competitive gaming with money on the line? Different story.

AI Opponents

You're playing poker against an AI. How do you know the AI isn't cheating by seeing your cards? How do you know it's actually the "hard difficulty" you paid for and not just "medium" renamed?

zkML: The game server proves each AI decision came from the committed model. Can't cheat. Can't substitute a weaker model. Proof is generated per-hand, verified by the client.

Modulus built RockyBot (AI fighting game) and Leela vs the World (on-chain chess) as proofs of concept. The AI's behavior is provable. Players can verify they're facing the real deal.

Fair Matchmaking

Ranked matchmaking uses ML to pair players. If the algorithm is opaque, conspiracy theories flourish: "They're matching me with bad teammates!" "They're rigging games!"

zkML: prove the matchmaking algorithm ran correctly. Prove every player was scored by the same model. Suddenly the tinfoil-hat theories have less ground to stand on.

Model Marketplaces: MLaaS Verification

You pay for access to a GPT-4-level API. How do you know you're actually getting GPT-4 and not GPT-3.5 with a renamed endpoint?

Right now: trust the provider.

With zkML: every API response comes with a proof. "This output came from model X with Y parameters." If the provider tries to substitute a cheaper model, the proof fails.

This enables competitive model marketplaces, since providers can't cheat on model tiers! Users can verify SLA compliance, and pricing can be tied to verified compute (you pay for what you actually got).

AI Memory

One core use case at ICME Labs involves embedding models. These can run in browsers and are practical targets for ZKML right now. Imagine source material ingested in English while consumers buy and query it in Japanese. They can't audit the source themselves, so they need cryptographic trust.

Or consider renting out a memory on a "trust me bro, my AI memory contains this" basis. Classification models can be proven with ZKML right now to tackle this trust issue and create a new AI Memory Economy ™️.

What's Missing (In 2025)

Let's be honest about what doesn't work yet:

Large Language Models: GPT-5 in zkML? Not happening. Maybe GPT-2 as a demo (zkPyTorch proved Llama-3, but at 150 seconds per token). Cutting-edge LLM inference can be proven, but slowly and while eating a lot of memory.

Real-Time Systems: If you need sub-100ms inference with proofs, you're limited to smaller models, such as simple classifiers. The self-driving car proving every decision? Not with current zkML.

Training: We can prove inference. We cannot prove training at scale. If you need to verify a model was trained on specific data with specific methods, ZKML isn't there yet.

Complex Architectures: Attention mechanisms just became feasible. Mixture of experts? Graph neural networks? Diffusion models? Still research areas.

Predictions for ZKML in 2026

Here are some educated guesses about what we'll see going into 2026: the next 10x unlocks.

Hardware Wave

Silicon unlocks are fair game.

GPU Acceleration (Already Landing): Every major zkML framework has or is adding GPU support. EZKL, Lagrange, zkPyTorch, Jolt — all running on CUDA. But 2025's GPU support was "it runs on GPUs." 2026's will be "it's optimized for GPUs."

The difference matters. Current implementations port CPU algorithms to GPUs. Next-gen implementations redesign algorithms around GPU primitives. Massive parallelism. Streaming data through GPU memory. Kernel fusion for proof generation.

Expected impact: 5-10x speedup on existing workloads. Models that took 30 seconds might take 3-5 seconds. That's the difference between "viable for batch processing" and "viable for interactive applications."

Multi-Machine Proving (Coordination Layer)

Most current zkML: one beefy machine generates your proof.

2026 zkML: proof generation gets parallelized across a cluster. Split the circuit, distribute to multiple provers (multi-folding), aggregate the results.

Lagrange is already working on this. Polyhedra mentioned it in their zkPyTorch roadmap. The tech exists (recursive proofs, proof aggregation, continuations). NovaNet, our infra layer, focuses on how cooperative provers (via folding schemes) can handle this task. The engineering is hard (work distribution, fault tolerance, cost optimization).

When this ships: proving GPT-2 might go from 10 minutes to 1 minute by throwing 10 machines at it. Proving Llama-3 will go from "curiosity" to "actually usable."

Proof Systems: Better Math

Hardware helps, but better algorithms help more.

Field Arithmetic

Most zkML today uses BN254 or similar large fields. Some teams are exploring Mersenne-31 and other small fields, which support much faster arithmetic. Some estimates put the gain at 10x from the field switch alone. Meanwhile, elliptic-curve-based systems continue to benefit from sparsity (Twist and Shout is one example). Lattice-based ZKP schemes let us tap into these smaller fields while benefiting from sparsity and homomorphism. Lattices also enable pay-per-bit costs and are plausibly post-quantum secure. One last cool fact: their public parameters can be generated on the fly.

Why it matters: field operations are the innermost loop of proof generation. When they dominate prover time, 10x faster fields can mean close to 10x faster proofs across the board. Models that prove in 10 seconds might prove in 1 second.
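One reason small fields are fast: reduction modulo the Mersenne prime 2^31 - 1 needs only shifts, masks, and adds, because 2^31 ≡ 1 (mod 2^31 - 1). Here is a minimal Python sketch. Python won't show the speedup. The point is that a product fits in 64 bits and reduction needs no division.

```python
P31 = (1 << 31) - 1  # Mersenne-31 prime


def m31_reduce(x):
    """Reduce 0 <= x < 2**62 modulo 2**31 - 1 without division.
    Splitting x = hi * 2**31 + lo gives x ≡ hi + lo, since 2**31 ≡ 1."""
    x = (x & P31) + (x >> 31)   # now x < 2**32
    x = (x & P31) + (x >> 31)   # now x <= 2**31
    return x - P31 if x >= P31 else x


def m31_mul(a, b):
    """Multiply two reduced field elements; the product fits in 62 bits."""
    return m31_reduce(a * b)
```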

Jolt Atlas already benefits from this: lookup-centric architectures work well with sparsity, and some ML operations are highly sparse.

Proof Accumulation / Folding Schemes

ZKTorch used this: instead of generating independent proofs for every layer, fold multiple proofs into one accumulator. The final proof is tiny regardless of model depth.

This is Nova/SuperNova/HyperNova/NeutronNova territory ⭐💥. Recursive SNARKs that let you prove "I proved A, then I proved B, then I proved C" without the proof size exploding.
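The core folding step can be shown on relaxed R1CS, the relation Nova folds. This minimal sketch makes these assumptions:

- There are no commitments. Real Nova commits to the witness, the error vector E, and the cross term T, and the verifier folds those commitments instead of the raw vectors.
- There is no zero knowledge.
- The circuit is a toy with one constraint, w·w = y.

Each instance is a pair (z, E) with z = (w, y, u). A fresh R1CS instance has u = 1 and E = 0.

```python
import random

P = 2**61 - 1  # prime field modulus (assumed for this sketch)


def matvec(M, z):
    return [sum(a * b for a, b in zip(row, z)) % P for row in M]


def satisfies(R, z, E):
    """Relaxed R1CS: (A z) o (B z) == u * (C z) + E, where u = z[-1]."""
    A, B, C = R
    u = z[-1]
    lhs = [a * b % P for a, b in zip(matvec(A, z), matvec(B, z))]
    rhs = [(u * c + e) % P for c, e in zip(matvec(C, z), E)]
    return lhs == rhs


def fold(R, inst1, inst2, r):
    """Fold two relaxed instances into one using the verifier's challenge r."""
    A, B, C = R
    (z1, E1), (z2, E2) = inst1, inst2
    u1, u2 = z1[-1], z2[-1]
    Az1, Bz1, Cz1 = matvec(A, z1), matvec(B, z1), matvec(C, z1)
    Az2, Bz2, Cz2 = matvec(A, z2), matvec(B, z2), matvec(C, z2)
    # T collects the coefficient of r that appears when (A z)(B z) and
    # u * (C z) are expanded at z = z1 + r * z2.
    T = [(a1 * b2 + a2 * b1 - u1 * c2 - u2 * c1) % P
         for a1, b1, c1, a2, b2, c2 in zip(Az1, Bz1, Cz1, Az2, Bz2, Cz2)]
    z = [(x1 + r * x2) % P for x1, x2 in zip(z1, z2)]
    E = [(e1 + r * t + r * r * e2) % P for e1, t, e2 in zip(E1, T, E2)]
    return z, E


# Toy circuit over z = (w, y, u): one constraint w * w = u * y (+ E).
R1CS = ([[1, 0, 0]], [[1, 0, 0]], [[0, 1, 0]])


def fresh(w, y):
    """A standard (unrelaxed) instance: u = 1 and E = 0."""
    return [w % P, y % P, 1], [0]
```

However many steps you fold, the accumulator stays a single (z, E) pair the size of one instance. That is why the final proof need not grow with model depth.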

2026 prediction: this becomes standard. Every zkML framework adds folding. ResNet-50 goes from 1.27GB proof (old Mystique) to <100KB (new folding-based systems). GPT-style models become feasible because proof size doesn't scale with sequence length.

Folding also helps with prover memory. You can run ZKML on many devices and pick the step size to match each machine's specs.

Lastly, folding can be used to give back ZK to protocols that are not privacy preserving. There is a cool trick in the HyperNova paper that shows how to do this.

Streaming Proofs

Current limitation: to prove an LLM generating 100 tokens, you prove token 1, then token 2, then token 3, and so on. Each proof is independent, and memory explodes. You can control memory growth with folding or via streaming.

This is research-stage now but will ship in 2026. When it does: LLM inference in zkML goes from "prove on big machine" to "prove anywhere".

Operator Coverage Explosion

Remember: ONNX has 120+ operators. Most frameworks support around 50.

The gap is closing fast. Not because frameworks are implementing operators one by one — because they're building operator compilers and general purpose zkVM primitives to handle many operators at scale.

Transformer Primitives

Attention mechanisms were barely feasible in 2024. By late 2025, multiple frameworks support them. In 2026, they'll be optimized.

Specialized circuits for:

Scaled dot-product attention
Multi-head attention
Positional encodings
Layer normalization (the transformer killer in early zkML)

Combined with streaming proofs, this means: transformer-based models become first-class citizens in zkML. Not just "we can prove GPT-2 slowly," but "we can prove modern transformer architectures at reasonable cost."

What this unlocks: vision transformers, audio models, multi-modal models. All the architectures that power modern ML, now provable.

Cost Curve Shifts Lead To Use Case Evolution

The technical improvements aren't interesting by themselves. What matters is what they enable.

DeFi Agents: From Batch to Real-Time

2025: An agent rebalances your portfolio every hour. Each rebalance comes with a proof generated in the background. By the time the next trade executes, the previous proof is ready.

2026: The agent rebalances in real-time based on market conditions. Proofs generate in 1-5 seconds. The agent operates in a continuous loop: observe market → compute decision → generate proof → execute trade. The proof is available before the next block confirms.

This changes the game. You can build reactive agents, not just scheduled ones. Flash crash protection. MEV defense. Automated arbitrage with cryptographic guarantees.

Healthcare: From Audit Logs to Real-Time Verification

2025: Hospital runs diagnostics. Model generates result. Hospital later produces a proof for the regulator. Proof generation takes minutes, happens offline.

2026: Proof generation is fast enough that it happens during the clinical workflow. Doctor orders the test. Model runs. Proof generates in parallel. By the time the doctor reviews the result, the proof is attached.

This enables: real-time audit compliance. Insurance pre-authorization with instant verification. Cross-institutional workflows where each step is proven before the next begins.

Trustless Agents: From Demos to Production

2025: Agent workflows are possible but clunky. Each agent interaction requires proof generation that takes seconds to minutes. Complex workflows feel slow.

2026: With sub-second proving for simple models and parallelized proving for complex ones, agent interactions feel natural. Agent A calls Agent B, waits 0.5 seconds for proof verification, proceeds. The latency is annoying but much better than humans doing it manually 🤪.

This is when trustless agent networks actually scale. Not research projects, but production systems where hundreds of agents coordinate, each proving their work cryptographically.

The x402/ERC-8004 vision becomes real: agents hiring agents, paying in crypto, all mediated by proofs.

Gaming: From Turn-Based to Real-Time

2025: zkML in games is limited to turn-based scenarios. Poker bots, chess engines, strategy games where you can tolerate 1-5 second proof generation per move.

2026: Fast enough for real-time game AI in certain genres. Fighting games where the AI opponent proves each decision. RTS games where strategic decisions (not unit-level pathfinding, but high-level tactics) are proven.

Still not fast enough for FPS games or reflex-based mechanics. But the viable design space expands considerably.

Model Marketplaces: From Niche to Normal

2025: Proving API responses is cool but niche. Only high-value applications justify the overhead.

2026: Costs drop enough that proving becomes standard for any API charging more than $0.01 per call. Model providers differentiate on verifiability. "Unverified inference" becomes the budget tier.

This enables: SLA enforcement through cryptography. Proof-of-work for AI services. Reputation systems based on verified compute history.

Verifiable AI memory: Creating Shared Value

2025: We are already using ZKML to prove things about embeddings and classifications of vector DB entries. This use case hyperscales in 2026.

2026: Trustless shared AI memories go online. Your AI assistant doesn't have one memory; it coordinates across multiple verified memories. Personal memory, company memory, specialized expertise.

At The End Of The Day

Plan for incremental progress, and occasional revolutionary leaps — subscribe to hear about the leaps!

The ZKML party is already happening: we proved it's possible to verify ML with ZKPs. Now we're in the boring part, where engineers and researchers make it faster, cheaper, and more reliable.

At an event I heard a crypto VC say "ZK this year is boring"!

Boring is good. Boring means it's becoming real.

I build software and write about where AI meets cryptography.
