Public Evidence
Evidence & Benchmarks
Each evidence item states what it demonstrates and, critically, its limitations. No claims are made beyond what is publicly verifiable.
CEG Benchmark Mini-Run
Public Summary · 2026-05-31
Governed mode produced replayable decision evidence for selected benchmark tasks.
Demonstrates
- auditability
- duplicate handling
- stale-context handling
- governance blocking
- replayable traces
Limitations
- mini-run only
- not a general safety guarantee
- public summary is redacted
Execution Commit Gate Validation
Synthetic · 2026-06-01
Side effects are blocked when ExecutionCommit is missing from the decision trace.
Demonstrates
- execution boundary enforcement
- missing ExecutionCommit detection
- trace integrity
Limitations
- synthetic test scenario
- production conditions may differ
- does not cover all side effect types
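The gate behavior summarized above can be illustrated with a minimal Python sketch. Only the name `ExecutionCommit` comes from the summary. The trace shape (a list of dicts with a `type` field), the `SideEffectBlocked` exception, and the `run_side_effect` helper are hypothetical names chosen for illustration and are not the project's actual API.

```python
from typing import Any, Callable


class SideEffectBlocked(Exception):
    """Hypothetical error raised when the gate refuses a side effect."""


def has_execution_commit(trace: list[dict[str, Any]]) -> bool:
    # Assumption: each trace entry is a dict carrying a "type" field.
    return any(entry.get("type") == "ExecutionCommit" for entry in trace)


def run_side_effect(trace: list[dict[str, Any]], effect: Callable[[], Any]) -> Any:
    # The effect callable is only invoked once the trace check passes,
    # so a missing ExecutionCommit means the side effect never starts.
    if not has_execution_commit(trace):
        raise SideEffectBlocked("ExecutionCommit missing from decision trace")
    return effect()
```

In this sketch, a trace containing only a proposal entry raises `SideEffectBlocked` before the effect runs, while a trace that includes an `ExecutionCommit` entry lets the effect proceed. Consistent with the limitations listed above, it models a single generic side effect and says nothing about production conditions.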
Kernel Structural Invariant Checks
Public Summary · 2026-06-02
Critical kernel tests pass for DTO contracts, policy authority path, and structural invariants.
Demonstrates
- DTO contract stability
- single policy authority path
- structural invariants separated from policy
Limitations
- kernel v0.1 freeze candidate only
- not all invariants formalized yet
- public summary only
Duplicate Proposal Handling
Public Summary · 2026-06-03
Duplicate intent proposals are detected and prevented from creating duplicate side effects.
Demonstrates
- idempotency guard
- intent deduplication
- audit trail consistency
Limitations
- content-based dedup only
- not timing-attack resistant
- public summary only
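A content-based idempotency guard of the kind described above might look like the following Python sketch. It is an illustration under stated assumptions, not the project's implementation: the proposal shape (a JSON-serializable dict), the `intent_key` function, and the `IdempotencyGuard` class are all hypothetical. The key is a SHA-256 hash of a canonical JSON encoding, so two proposals with the same content produce the same key regardless of field order.

```python
import hashlib
import json
from typing import Any, Callable


def intent_key(proposal: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so field order
    # does not change the resulting key.
    canonical = json.dumps(proposal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyGuard:
    """Hypothetical in-memory guard; not thread-safe and not persistent."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def submit(self, proposal: dict[str, Any],
               effect: Callable[[dict[str, Any]], Any]) -> bool:
        key = intent_key(proposal)
        if key in self._seen:
            return False  # duplicate content: skip the side effect
        # Record the key before running the effect, so a repeated
        # submission is rejected even if the effect later fails.
        self._seen.add(key)
        effect(proposal)
        return True
```

Submitting `{"action": "pay", "amount": 10}` twice runs the effect once, and the second call returns `False`. Because the key depends only on content, this sketch shares the limitation listed above: it does not distinguish an intentional repeat from a replay, and it makes no attempt at timing-attack resistance.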