Agentic Commerce - the hack waiting to happen.

There's a fundamental problem with AI security today. The tools we use were never designed for the world we're rapidly entering: a world where AI agents don't just assist humans but transact autonomously with each other at machine speed, ideally with no human in the loop. A world of 'Agentic Commerce' where there's no one to catch what the filters miss or to investigate long traces when the dashboard flags an issue. Many guardrails are reactive: "something went wrong, so let's take a look." But what if that something was triggered by a prompt injection attack that drained your bank account?

AI security today relies on a three-pillar approach: guardrails, observability, and policy enforcement. Companies deploy content filters to block harmful outputs, implement real-time monitoring to track agent behavior, and use LLM-based judges to detect jailbreaks and prompt injections. Advanced systems like AWS AgentCore provide continuous evaluations with pre-built evaluators covering correctness, helpfulness, and safety, running against live interactions with alerts when metrics drop. Amazon's Bedrock Guardrails claims to block up to 88% of harmful content, while AWS's Automated Reasoning checks deliver up to 99% verification accuracy in detecting AI hallucinations. These are impressive achievements for human-supervised AI systems where anomalies can be flagged and investigated.

Take Automated Reasoning as an example of how far we've come, and where the limits are. The system uses mathematical logic and formal verification techniques to validate accuracy, providing definitive rules and parameters against which AI responses are checked. Unlike probabilistic reasoning methods that deal with uncertainty, Automated Reasoning translates natural language policies into formal logic consisting of rules, variables, and types. For a mortgage approval system, it can ensure an AI agent never approves a loan for someone with a credit score below 680 or a down payment below 20%, catching hallucinations before they become costly mistakes.
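
To make the idea concrete, here is a minimal sketch of that kind of formal rule check in Python. The thresholds come from the example above; the LoanDecision type and check_policy function are illustrative placeholders, not Bedrock's Automated Reasoning API.

```python
# Minimal sketch of a formal-logic style rule check (illustrative only,
# not the Bedrock Automated Reasoning API). The policy is expressed as
# hard constraints; an approval that violates any constraint is rejected.

from dataclasses import dataclass

@dataclass
class LoanDecision:
    credit_score: int
    down_payment_pct: float   # 0.20 means a 20% down payment
    approved: bool

MIN_CREDIT_SCORE = 680
MIN_DOWN_PAYMENT = 0.20

def check_policy(decision: LoanDecision) -> list[str]:
    """Return the list of policy rules an approval would violate."""
    violations = []
    if decision.approved and decision.credit_score < MIN_CREDIT_SCORE:
        violations.append(f"credit_score {decision.credit_score} < {MIN_CREDIT_SCORE}")
    if decision.approved and decision.down_payment_pct < MIN_DOWN_PAYMENT:
        violations.append(f"down_payment {decision.down_payment_pct:.0%} < {MIN_DOWN_PAYMENT:.0%}")
    return violations

# An LLM-generated approval that hallucinates eligibility gets caught here:
print(check_policy(LoanDecision(credit_score=640, down_payment_pct=0.10, approved=True)))
```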

But here's what it can't do: prove that the AI agent actually used the policy you think it used, verify which model executed the decision, or prevent the guardrail itself from being compromised. This is what security researchers call the "same model, different hat" problem. When LLMs are used to both generate responses and evaluate their safety, both components inherit the same weaknesses, allowing coordinated bypasses through prompt injection attacks. If an attacker can trick the main AI agent through a carefully crafted prompt, they can often use the same technique to trick the guardrail that's supposed to be watching it, because both are fundamentally language models vulnerable to the same manipulation tactics. Automated Reasoning validates outputs against rules, but it operates after the LLM has generated content, and if that generation process was compromised, the check may be validating a sophisticated attack rather than catching it.

Humans need to make good rules!

Consider a real attack scenario: Your company sets a policy that no single transaction should exceed 100 USD. An indirect prompt injection hidden inside a normal-looking document that the AI agent fetches instructs the agent to break a 1,000 USD transfer into one hundred separate 10 USD transactions. Each individual transaction is checked by Automated Reasoning, which correctly validates that 10 < 100 and approves it. The formal logic worked perfectly, but the agent was manipulated to circumvent the policy's intent through behavior the rules didn't anticipate. By the time your monitoring flags the unusual pattern of 100 micro-transactions, the money is gone.
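
Here's a toy version of that bypass, with hypothetical check functions and the limits from the example. Each transfer passes the per-transaction rule, while an aggregate check over a time window (which the original policy never expressed) would have caught the pattern.

```python
# Sketch of the split-payment bypass (hypothetical checks, illustrative limits).
# Each transfer passes the per-transaction rule, yet the aggregate violates
# the policy's intent.

from datetime import datetime, timedelta

PER_TX_LIMIT = 100        # USD, the stated rule
WINDOW_LIMIT = 100        # USD, what the policy *meant* per counterparty per day
WINDOW = timedelta(days=1)

def per_tx_check(amount: float) -> bool:
    return amount <= PER_TX_LIMIT          # 10 <= 100 -> approved, every time

def aggregate_check(history: list[tuple[datetime, float]], now: datetime) -> bool:
    recent = sum(a for t, a in history if now - t <= WINDOW)
    return recent <= WINDOW_LIMIT          # 100 * 10 = 1,000 -> rejected

attack = [(datetime(2025, 1, 1, 12, 0, 0), 10.0)] * 100
print(all(per_tx_check(a) for _, a in attack))                    # True: every tx passes
print(aggregate_check(attack, datetime(2025, 1, 1, 12, 0, 30)))   # False: intent violated
```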

New World, New Zero-Days

The problem isn't just that guardrails can be bypassed; it's what happens when they're bypassed at machine speed in an autonomous commerce system.

In traditional cybersecurity, a zero-day vulnerability is bad, but there are natural friction points that limit the damage. A human needs to notice something's wrong. Transactions need manual approval. Bank fraud detection systems flag unusual patterns. There's time to react, investigate, and shut things down. Hours, maybe days, but time.

In agentic commerce, that time doesn't exist.

This is the amplification problem: a vulnerability that might steal 1,000 USD in a human-speed system steals 10 million USD in a machine-speed system. Not because the vulnerability is worse, but because the exploit executes thousands of times faster than humans can respond. Continuous evaluations can trigger alerts when metrics drop, but alerts are reactive: by the time they fire, thousands more compromised transactions have already executed.

The numbers are brutal: If your agent can execute 1,000 transactions per second and your incident response team needs 60 seconds to investigate an alert, that's 60,000 potentially fraudulent transactions before you even understand what's happening. And that assumes your team is monitoring 24/7, immediately sees the alert, and instantly understands it's not a false positive.

Machine speed enables machine-scale theft. And current guardrails, designed for human-supervised systems where a delayed response is acceptable, simply cannot keep up.

If we always need a human in the loop to catch what automated systems miss, we completely defeat the benefits and purpose of agent-to-agent systems.

Fortunately... math works.

But humans still need to make good rules!
Humans still need to make good rules!
Do humans need to make all the good rules?!

But after the rules and models are set, math and cryptography should carry us the rest of the way.

After all, there's a reason the global financial system processes trillions of dollars daily without requiring humans to verify each transaction: cryptography provides mathematical certainty that trust and observation cannot.

When Visa processes a contactless payment, it doesn't trust that the card is legitimate, it cryptographically verifies it. When your bank confirms a wire transfer, it doesn't hope the amount wasn't tampered with, it proves it mathematically. When blockchain networks settle transactions, they don't rely on reputation (in most cases anyway 😂), they use cryptographic proofs that make fraud computationally infeasible.

Cryptography turns trust problems into math problems. And math doesn't care about prompt injection, social engineering, or sophisticated attacks. Either the proof is valid or it isn't. There's no middle ground, no probabilistic confidence score, no "88% blocked."

Agents can also make rules...

The same AI agents that need guardrails can help humans design better guardrails.

Automated Reasoning systems (which also often use an LLM for the translation step into formal logic) can automatically generate test scenarios from policy definitions, making coverage more comprehensive. Humans set the intent ("prevent large unauthorized transfers"), but agents systematically explore edge cases: currency conversion, transaction fees, split payments, aggregate daily limits. Humans are good at high-level policy; agents excel at exhaustive edge case enumeration.
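
As a rough sketch of what that enumeration might look like, the following Python generates boundary and edge cases from a single intent ("no single payment over 100 USD"). The function name, conversion rates, and fee are made up for illustration.

```python
# Hypothetical sketch of agent-generated edge cases from one human intent.
# The enumeration strategy is illustrative; the point is exhaustive boundary
# coverage that humans rarely write by hand.

LIMIT_USD = 100.00
FX = {"USD": 1.0, "EUR": 1.08, "JPY": 0.0067}   # example conversion rates
FEE = 2.50                                       # flat processing fee

def generate_cases() -> list[dict]:
    cases = []
    # Boundary values around the limit.
    for amt in (LIMIT_USD - 0.01, LIMIT_USD, LIMIT_USD + 0.01):
        cases.append({"desc": f"single payment {amt:.2f} USD", "usd_total": amt})
    # A fee pushes an under-limit payment over the limit.
    cases.append({"desc": "99.00 USD + processing fee", "usd_total": 99.00 + FEE})
    # The same value routed through another currency.
    cases.append({"desc": "150 EUR invoice", "usd_total": 150 * FX["EUR"]})
    # Split payments that individually pass but aggregate over the limit.
    cases.append({"desc": "100 x 10 USD split", "usd_total": 100 * 10.00})
    return cases

for case in generate_cases():
    print(case["desc"], "-> violates intent" if case["usd_total"] > LIMIT_USD else "-> ok")
```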

Even better: agents continuously audit rules against real patterns. A monitoring agent reviews cryptographic proofs from legitimate transactions and flags: "15% of valid business transactions are 100-500 USD invoice payments. Current policy blocks legitimate commerce." The agent proposes a refined rule. Human approves. Policy improves.

The feedback loop:

  • Humans design intent, agents generate comprehensive test cases
  • Monitoring agents analyze proof patterns, find gaps, propose improvements
  • Humans review and approve major updates
  • System becomes more robust over time

Division of labor: humans provide judgment and intent, agents provide exhaustive analysis and continuous refinement. This is why "comprehensive policies" isn't wishful thinking: it's achievable when agents help humans systematically harden rules using cryptographically verified real-world data.
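
A hedged sketch of that refinement loop, with hypothetical names and thresholds: a monitoring agent mines verified receipts for legitimate traffic the current policy blocks, proposes a rule change, and a human approves it before anything takes effect.

```python
# Illustrative refinement loop (all names and thresholds hypothetical).
# The agent proposes; the human disposes.

from dataclasses import dataclass

@dataclass
class Receipt:
    amount_usd: float
    category: str        # e.g. "invoice", "refund"
    blocked: bool        # blocked by the current policy
    legitimate: bool     # confirmed legitimate after review

def propose_refinement(receipts: list[Receipt]) -> str | None:
    blocked_invoices = [r for r in receipts
                        if r.blocked and r.legitimate and r.category == "invoice"]
    # If a large share of legitimate traffic is blocked invoices, suggest a carve-out.
    if len(blocked_invoices) / max(len(receipts), 1) > 0.10:
        lo = min(r.amount_usd for r in blocked_invoices)
        hi = max(r.amount_usd for r in blocked_invoices)
        return f"Allow 'invoice' payments from {lo:.0f} to {hi:.0f} USD to verified payees"
    return None   # nothing worth changing

def apply_if_approved(proposal: str | None, human_approves: bool) -> None:
    if proposal and human_approves:
        print("Policy updated:", proposal)
```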

Automated Reasoning provides deterministic rule checking, but zkML (introduced below) provides both deterministic proof generation and succinct verification. This matters for machine-speed systems where you need to verify thousands of proofs per second, and it matters if you don't trust the provider.

Succinct verification of complex workflows is exactly what's missing in AI agent security, and it's exactly what Zero-Knowledge Machine Learning (zkML) provides.

zkML: Mathematical Receipts for AI Execution

zkML is an emerging technology built on years of research into zero-knowledge succinct non-interactive arguments of knowledge (zkSNARKs). In zkML, the party that computes the ML inference also generates a cryptographic proof of that computation.

The result is a mathematical receipt that proves:

  • Which specific model (with exact weights and version) was executed
  • On which specific inputs (these can be private)
  • Producing which specific outputs
  • And this proof can be verified by anyone, instantly, without trusting the party that generated it
  • The receipts are succinctly verifiable, meaning even if the underlying computation took hours, it can be checked in under a second
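
As a rough illustration of what such a receipt could look like in code, here is one possible shape of the data structure and the verification call site. The field names and the verify/snark_verify functions are placeholders, not the API of any particular zkML library.

```python
# Illustrative shape of a zkML "receipt" and its verification (placeholder names).
# The point is what a verifier learns without re-running the model.

from dataclasses import dataclass

def snark_verify(proof: bytes, *public_inputs: bytes) -> bool:
    # Stand-in for a real succinct verifier; a production system would call
    # the proving system's verification routine here.
    return bool(proof)

@dataclass(frozen=True)
class InferenceReceipt:
    model_commitment: bytes   # hash binding the exact weights and version
    input_commitment: bytes   # commitment to inputs (inputs themselves may stay private)
    output: bytes             # the claimed model output
    proof: bytes              # succinct proof covering the whole inference

def verify(receipt: InferenceReceipt, expected_model_commitment: bytes) -> bool:
    """Succinct verification: milliseconds, regardless of how long the inference took."""
    if receipt.model_commitment != expected_model_commitment:
        return False          # wrong model: reject immediately
    return snark_verify(receipt.proof,
                        receipt.model_commitment,
                        receipt.input_commitment,
                        receipt.output)
```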

From Reactive Guardrails to Proactive Proof

Remember our $100 transaction limit being bypassed through 100 micro-transactions?

Here's what changes with zkML:

Automated Reasoning translates policy rules into formal logic and validates AI outputs with up to 99% accuracy in detecting hallucinations. That mathematical validation is exactly what you want—if humans design comprehensive rules and train robust guardrail models, the system should rarely have incidents. The formal logic enforcement is deterministic: no interpretation, no drift, just mathematical rule checking at machine speed.

The problem was never the rule checking itself. The problem is proving which guardrail models actually ran, on what data, with which rules, potentially to third parties that don't trust you, your model, or your guardrail process.


You know the rules you set work. You know your model will never send all of the client's money. But does your client know that? What if your client is also an agent?

zkML solves the trust problem. Your agent generates a cryptographic proof showing that its approved guardrail model validated the transaction. The proof captures not just "a guardrail ran" but exactly what it evaluated: the decision context, the specific rules checked, which model version executed.

When 100 micro-transactions execute in rapid succession, the monitoring agent doesn't need human judgment; it verifies cryptographic proofs mathematically. The proofs reveal: same guardrail models, same formal logic, identical approval reasoning, 10-second timespan.

In a trustless system with well-designed rules, this is game over for the attacker. The math alone proves the anomaly; the proofs show a pattern legitimate transactions cannot produce. The monitoring agent automatically denies all pending requests, halts the transaction chain, and quarantines the compromised agent. No interpretation needed. No investigation backlog. No human intervention. The circuit breaker triggers autonomously at machine speed, matching the attack's speed with verification speed.
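
Here is a hedged sketch of that circuit breaker. All names, limits, and the receipt format are hypothetical, and proof verification is stubbed; the point is that the decision rests only on facts the proofs establish (which model ran and how fast it approved), never on prompts or logs.

```python
# Sketch of an autonomous circuit breaker over verified receipts
# (hypothetical names and limits; verification is stubbed).

from datetime import datetime, timedelta

APPROVED_MODEL = b"safety-model-v2.3"
WINDOW = timedelta(seconds=10)
MAX_APPROVALS_PER_WINDOW = 5

def verify_receipt(receipt: dict) -> bool:
    # Stand-in for succinct zkML verification (see the earlier sketch).
    return receipt["model"] == APPROVED_MODEL and receipt["proof_ok"]

def circuit_breaker(receipts: list[dict], now: datetime) -> str:
    # Any unverifiable receipt means the claimed guardrail cannot be trusted to have run.
    if not all(verify_receipt(r) for r in receipts):
        return "HALT: unverifiable receipt, quarantine agent"
    # Approval velocity that legitimate traffic cannot produce trips the breaker.
    recent = [r for r in receipts if now - r["time"] <= WINDOW]
    if len(recent) > MAX_APPROVALS_PER_WINDOW:
        return "HALT: approval velocity impossible for legitimate traffic"
    return "OK"
```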

This is the division of labor that makes agentic commerce viable:

  • Humans: Design rules, train models, set security requirements (done once, done well)
  • Automated Reasoning: Enforces rules through formal logic (99% accuracy, deterministic)
  • AI systems: Battle-test the rules (exhaustive edge case enumeration)
  • zkML: Proves the right models and rules actually executed (cryptographic certainty, succinct verification)
  • Monitoring agents: Verify proofs and trigger responses autonomously (machine speed, no bottleneck)

If humans set up comprehensive policies and well-trained guardrails, the system self-enforces with mathematical guarantees. No trust required. No humans in the loop for every transaction. Just math verifying math, autonomously catching patterns that would drain bank accounts before a human could read the first alert.

This isn't "better monitoring" or "smarter guardrails". It's a fundamentally different architecture. Just like Visa doesn't trust merchants and merchants don't trust customers, agents don't need to trust other agents when every claim comes with a cryptographic receipt that can't be forged.

What Happens to Observability?

With zkML and Automated Reasoning guardrails, observability fundamentally transforms from reactive monitoring to cryptographic verification.

Traditional observability watches behavior patterns to infer what might be happening:

  • Monitor metrics, logs, traces
  • Look for anomalies in behavior
  • Investigate when things look suspicious
  • Reactive: Something looks wrong → investigate → respond

Automated Reasoning validation adds latency that scales with complexity, and traditional CloudTrail logging only records API calls, not whether the claimed model actually ran or what computation occurred. By the time dashboards flag unusual patterns, damage is done.

zkML-powered observability mathematically verifies what actually happened:

  • Every action carries cryptographic proof of execution
  • Anomalies are mathematically provable, not statistically inferred
  • Proactive: Proof shows violation → automatic enforcement

Observability doesn't disappear. It evolves. Instead of teams staring at dashboards hoping to catch anomalies before catastrophic loss, monitoring agents verify cryptographic receipts in real-time. The system doesn't ask "does this pattern look suspicious?" It asks "does this proof mathematically violate any policy?"

One is a question requiring human judgment. The other is a mathematical fact triggering autonomous enforcement.

This is the shift from observation to verification. And in a world where agents transact at machine speed, only verification scales.


AI security today relies on a three-pillar approach: guardrails, observability, and policy enforcement. But these pillars were designed for human-supervised systems where alerts can be investigated and delayed responses are acceptable.

Agentic commerce operates at machine speed with no human in the loop. Prompt injection appears in over 73% of production AI deployments, and even sophisticated systems can be bypassed with simple techniques achieving 100% evasion success. At machine speed, a single vulnerability doesn't steal thousands; it steals millions before humans can respond.

zkML transforms all three pillars:

Guardrails: From probabilistic filtering (88% blocked) to cryptographic proof of execution. Not "did a guardrail run?" but "here's mathematical proof that SafetyModel-v2.3 validated this transaction."

Observability: From reactive monitoring (investigate suspicious patterns) to proactive verification (mathematically prove policy violations in real-time). Monitoring agents don't watch dashboards, they verify cryptographic receipts and trigger autonomous enforcement.

Policy Enforcement: From trusting logs and hoping rules were checked to succinct verification of formal logic execution. Every decision carries a cryptographic receipt proving which rules ran, on what data, with which models.

Humans design comprehensive rules with AI assistance. Automated Reasoning enforces them deterministically. zkML proves the right models executed the right checks. Monitoring agents verify thousands of proofs per second and respond autonomously.

This is how trillion-dollar agent economies become viable: not by hoping security works, but by mathematically proving it did. Cryptography is already securing global commerce. It will also secure Agentic Commerce.

The only question is whether we build in mathematical security before a disaster happens... or after.


zkML can also provide privacy for input data. I will get into how and why that matters in another post.

At ICME Labs we are building Jolt Atlas so that you don't need to trust agents... you can verify them. Star and follow 😄

Wyatt Benno

I build zkML software and write about where AI meets cryptography.

My Twitter
