Statutory finding detection that runs before any model is called. Schema-stable output on every run, across every model provider — by architectural design, not by prompt engineering.
Fine-tuning, RAG, and better prompting reduce hallucination frequency. They do not eliminate it structurally. More critically for litigation: a finding present in run 1 may be absent in run 3. That output is non-reproducible and non-auditable, disqualifying it from any proceeding that requires either.
The problem is not just accuracy. It is schema stability.
Every finding a practitioner sees originates in Stage 1. The LLM contributes prose and may suggest secondary analysis — but it cannot generate, modify, or suppress what the pack found. That separation is enforced architecturally, not by instruction.
**Stage 1: Pack detection.** The domain pack executes against each clause. All findings, severities, section references, and routing weights are computed here from term lists derived directly from statutory text. No model inference occurs. The pack cannot hallucinate: it only detects terms explicitly derived from the statute.
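A minimal sketch of what deterministic term-list detection can look like. The `Finding` fields, the example terms, and the severity labels are illustrative placeholders, not the shipped pack format:

```python
from dataclasses import dataclass

# Illustrative term list; a real pack derives these from the statutory text.
TERMS_1001 = {
    "materially false statement": ("HIGH", "§1001(a)(2)"),
    "willfully": ("MEDIUM", "§1001(a)"),
}

@dataclass(frozen=True)
class Finding:
    term: str
    severity: str      # e.g. HIGH / MEDIUM
    section: str       # statutory section reference
    clause_index: int

def detect(clauses):
    """Pure function of the input text: same clauses in, same findings out."""
    findings = []
    for i, clause in enumerate(clauses):
        lowered = clause.lower()
        for term, (severity, section) in TERMS_1001.items():
            if term in lowered:
                findings.append(Finding(term, severity, section, i))
    return findings
```

Because detection is pure string matching over static data, running it twice on the same document cannot produce different findings.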
**Stage 2: Context resolution.** Findings with context keys are resolved against supplied context hints. Fully deterministic; no model involved.
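Context resolution can be pictured as a plain dictionary lookup. The `context_key` field and the hint names here are hypothetical, chosen only to show that nothing is inferred:

```python
def resolve_context(findings, context_hints):
    """Deterministically attach supplied context to findings that declare a
    context key. Unknown keys are left unresolved; nothing is guessed."""
    resolved = []
    for f in findings:
        key = f.get("context_key")
        if key is not None and key in context_hints:
            f = {**f, "context": context_hints[key]}
        resolved.append(f)
    return resolved
```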
**Stage 3: Model enrichment (optional).** If a model adapter is configured, the engine calls the LLM with the source text and the pack's already-computed findings as context. The model may suggest additional ambiguities and generates readable prose. Provider-agnostic: Groq, Claude, local models.
**Stage 4: Merge.** Model-suggested findings are appended to pack findings. Pack findings are never overwritten or suppressed.
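The append-only merge is simple enough to show directly. This is a sketch under the stated guarantee, not the actual implementation; the `source` tag is an illustrative addition:

```python
def merge_findings(pack_findings, model_findings):
    """Append model-suggested findings after pack findings. Pack findings pass
    through untouched; each finding is tagged by origin so the two sources
    stay distinguishable downstream."""
    merged = [dict(f, source="pack") for f in pack_findings]
    merged += [dict(f, source="model") for f in model_findings]
    return merged
```

Because the function only appends, no model output can reach a position where it overwrites or suppresses a pack finding.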
**Stage 5: Routing and audit output.** A deterministic routing function computes VERIFY / ASK / FREEZE from finding severity weights. Output is machine-readable, hash-chained, and audit-trail compliant. No LLM involved.
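One way routing and hash-chaining might fit together. The weights, thresholds, and record fields below are invented for illustration; only the VERIFY / ASK / FREEZE labels come from the description above:

```python
import hashlib
import json

# Illustrative weights and thresholds, not the product's actual values.
WEIGHTS = {"HIGH": 3, "MEDIUM": 1, "LOW": 0}

def route(findings):
    """Pure function of finding severities: FREEZE on severe defects,
    ASK on moderate ones, VERIFY otherwise."""
    score = sum(WEIGHTS.get(f["severity"], 0) for f in findings)
    if score >= 3:
        return "FREEZE"
    if score >= 1:
        return "ASK"
    return "VERIFY"

def chain_record(prev_hash, findings, decision):
    """Hash-chain one audit record: each entry commits to its predecessor,
    so tampering with any record breaks every later hash."""
    payload = json.dumps(
        {"prev": prev_hash, "findings": findings, "decision": decision},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Serializing with `sort_keys=True` keeps the hash reproducible: the same findings and decision always produce the same chain entry.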
Pack findings achieved 100% label stability across all models and run conditions. The Flynn negative control — a clean §1001 guilty plea — produced zero HIGH findings on every run.
Identical findings across Groq Llama 3.3 70B and Claude Sonnet 4.6. Swapping inference providers cannot affect pack output: the guarantee is architectural, not merely empirical.
The Flynn negative control produced zero HIGH findings across every run and model; the rich-prompt baseline fired three stable false positives on the same document.
| Case | Statute | Type | Pack Stability | Result |
|---|---|---|---|---|
| People v. Saenger (NY 2023) | NYS §240.75 | Positive control | 100% | Predicate offense defect detected |
| U.S. v. Comey (E.D. Va. 2025) | 18 USC §1001/§1505 | Positive control | 100% | Ambiguity and specification defects detected |
| U.S. v. Flynn (D.D.C. 2017) | 18 USC §1001 | Negative control | 100% | Zero HIGH findings — clean guilty plea |
| U.S. v. Montague (2d Cir. 2023) | 18 USC §1961/§1962 | Positive control | 100% | Predicate act specification defect detected |
| U.S. v. Combs (S.D.N.Y. 2024) | 18 USC §1962(d) | Well-pled indictment | 100% | No structural defects found (pending) |
Each domain pack encodes the element structure of a specific statute as a deterministic rule engine. Term lists are derived directly from statutory text. Findings cannot drift due to model behavior.
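One way to picture a pack as static data plus a pure check. The statute elements and terms below are placeholders sketched for illustration, not the shipped pack schema:

```python
# A pack is static data plus pure functions: nothing in it drifts at runtime.
RICO_PACK = {
    "statute": "18 USC §1962",
    "elements": {
        "enterprise": ["enterprise", "association-in-fact"],
        "pattern": ["pattern of racketeering activity"],
        "predicate_acts": ["predicate act", "overt act"],
    },
}

def missing_elements(pack, text):
    """Report statutory elements with no matching term in the charging text."""
    lowered = text.lower()
    return [
        element
        for element, terms in pack["elements"].items()
        if not any(t in lowered for t in terms)
    ]
```

Since the element structure is data, adding a statute means authoring a new pack, not retraining or re-prompting a model.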
We are working with select litigation support teams and e-discovery firms ahead of general availability.
Get in touch