The Oracle Problem
Why semantic truth verification is computationally intractable — and the two-layer architecture that routes around it.
Abstract
The ability of two autonomous agents to agree on whether a piece of work was correctly performed is the foundational verification problem of agentic commerce. This paper argues that the general form of this problem — semantic truth verification at machine speed and over open domains — is computationally intractable, and that any architecture assuming it can be "solved" by a smarter oracle is building on a foundation that will not hold.
We then describe a two-layer architecture that routes around the intractability rather than attempting to solve it: a fast, deterministic validator layer handles everything that admits a machine-checkable predicate (schema, cryptographic proof, policy conformance), and a quality market layer with skin-in-the-game reputation and economic dispute resolution handles the residual — the semantic judgments that no oracle can cheaply make. We argue this decomposition is both necessary and sufficient for agentic commerce to function at scale.
Video Summary & Audio Companion
The Two-Layer Verification Stack
Key Findings
- General-purpose semantic truth verification at machine speed is computationally intractable — no smarter oracle solves it.
- The path forward is architectural decomposition, not a better oracle: separate what is machine-checkable from what is not.
- A deterministic validator layer handles schema, cryptographic proof, and policy conformance — cheaply, at scale, with no model in the loop.
- A quality market layer — with stake, CRI-style reputation, and economic dispute resolution — handles the residual semantic judgments.
Methodology
The paper formalises the agent-commerce quality-verification problem as an oracle problem: when one agent pays another agent for a piece of work — a translation, a summary, a search result — some external oracle must decide whether the delivered work matches the promised quality. The paper proves that no general-purpose oracle exists that can decide semantic equivalence in finite time across arbitrary content domains.
The proof reduces semantic equivalence to the halting problem via Rice’s theorem and to the entscheidungsproblem via Gödel-Tarski. An LLM evaluator is, in this framing, a probabilistic oracle: it returns an answer in finite time, but the answer is not reliably correct, and the rate of error is not bounded above by any deterministic function of input length.
Having established the impossibility result, the paper proposes a two-layer architecture that works around it: a deterministic-validator layer (regex matchers, structural validators, hash-equivalence checks, schema-conformance, type-checking) and a Quality Markets layer (staked human or agent reviewers compete in a market for the right to grade outputs that the deterministic layer cannot decide). The two layers are complementary, not redundant.
The paper specifies the routing rule between the two layers: an output is routed to the deterministic layer if and only if every claim made by the seller can be expressed in a formal grammar known to that layer. Otherwise it is routed to Quality Markets. The grammar membership decision itself is decidable, which is why the routing rule is implementable.
Why this matters in practice
For protocol designers, the result is that any agent-commerce protocol that claims to offer “automatic quality verification” without specifying which formal grammar the verifier accepts is selling magic. The protocol must either name the grammar (the deterministic case) or expose a human-grading mechanism (the Quality Markets case). There is no third option.
For marketplace operators, the architectural implication is that low-cost transactions cannot be uniformly “LLM-verified”. Either the validation reduces to a structural check (cheap, deterministic, narrow) or the validation requires a market (expensive, probabilistic, broad). Pricing must reflect this split.
For regulators, the result is that any framework that requires automated quality verification as a precondition for autonomous transactions is mandating something computationally impossible. The framework must instead specify which grammars are acceptable and what auditability the human-graded fallback must provide.
Frequently asked questions
Is this paper saying LLM evaluators are useless?
No. It is saying LLM evaluators are probabilistic and cannot be the sole verification layer in any protocol that needs deterministic settlement. They are useful as routers (decide whether a structural validator applies) and as advisory signals (flag a transaction for human review). They are not useful as binding verdicts.
What is a “Quality Market”?
A market where graders stake collateral to compete for the right to grade an output. The graders are paid out of the transaction price; mis-grading is punished by stake loss. The market is the human-graded fallback for the cases the deterministic layer cannot decide.
Why doesn’t Rice’s theorem make the problem unsolvable in practice?
Rice’s theorem rules out a general-purpose decider. It does not rule out domain-specific deciders. The paper’s contribution is to characterise exactly which decisions are decidable (those expressible in a formal grammar) and which are not (everything else).
Has this been deployed?
A reference implementation exists in the AgenticEconomy.dev /quality-markets-deployment directory. It demonstrates the two-layer routing on a synthetic translation marketplace.
Does this generalise beyond agent commerce?
Yes. The same architecture applies to any system that needs to verify quality on heterogenous outputs without human gating — content moderation, code review automation, scientific replication automation. The agent-commerce framing is the most economically pressing instance.
Cite this paper
@article{dechamps2026oracle,
author = {Dechamps Otamendi, Ren\'e},
title = {The Oracle Problem in Agent Commerce: Why Semantic Truth Verification Is Computationally Intractable},
year = {2026},
month = {March},
note = {Preprint},
doi = {10.5281/zenodo.20039454},
url = {https://doi.org/10.5281/zenodo.20039454},
orcid = {0009-0007-1033-6519}
}


