
Post-Merge Verification (Anti-Reward-Hacking)

Autonomous workers have an incentive to declare victory. If you let them self-report success, they’ll cheerfully mark tasks “done” while types are widened to any, tests are marked .skip, and silent failures are caught by catch(e){}. The orchestrator defends against this with adversarial verification — a separate agent that re-reads the merged commit from scratch and challenges the worker’s claims.

Workers merge their own PRs via gh pr merge --squash --delete-branch. After the merge is observed, the orchestrator runs orchestrate-verify.sh on the merged commit as a surface-gaps step. Verification does not gate the merge — it runs post-merge and files a remediation ticket when it finds gaps.

Worker: "I wrote tests, types check, security review passed" → merges own PR
    v
Orchestrator (post-merge): runs orchestrate-verify.sh on merged commit
    ├─ verified → advance wave, mark ticket done
    └─ failed → file remediation ticket; wave advancement blocked until ticket filed
                (or unblocked if allowSelfReportedCompletion is true)

The verifier runs the orchestrate-verify.sh script (in plugins/dev/scripts/) plus a dedicated LLM pass. The combined checks:

| Check | What it looks for |
| --- | --- |
| Test existence | .test.ts / _test.go / test_*.py files for new functions/endpoints |
| Test execution | Runs the project’s test command; demands green |
| Type safety | Runs the typecheck command; fails on any new errors |
| Reward-hacking patterns | as any, @ts-ignore, // eslint-disable, any return types, .skip tests, empty catches |
| Silent failure hunter | Catch blocks that swallow without logging/rethrowing |
| PR description vs reality | Does the PR description match the diff? Any undocumented changes? |
| Linear ticket AC vs diff | Do the acceptance criteria from the ticket appear to be addressed? |
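
As a rough illustration of the reward-hacking row above, a line-based scanner might look like the sketch below. This is not the contents of orchestrate-verify.sh: the `Finding` shape, the `scanForRewardHacking` name, and the regular expressions are all assumptions made for the example, and a line-by-line scan will miss constructs that span several lines (for instance an empty catch whose braces sit on separate lines).

```typescript
// Hypothetical finding shape; the real verifier's output format is not shown in this page.
interface Finding {
  line: number;    // 1-based line number within the scanned text
  pattern: string; // short label for the matched pattern
  text: string;    // the offending line, trimmed
}

// Illustrative regexes for the patterns named in the checks table (assumed, not authoritative).
const PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: "as any", regex: /\bas\s+any\b/ },
  { label: "@ts-ignore", regex: /@ts-ignore\b/ },
  { label: "eslint-disable", regex: /\/[\/*]\s*eslint-disable/ },
  { label: "any return type", regex: /\)\s*:\s*any\b/ },
  { label: ".skip test", regex: /\b(?:it|test|describe)\.skip\s*\(/ },
  { label: "empty catch", regex: /catch\s*(?:\([^)]*\))?\s*\{\s*\}/ },
];

// Scan source text line by line and report every pattern hit.
function scanForRewardHacking(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { label, regex } of PATTERNS) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, pattern: label, text: text.trim() });
      }
    }
  });
  return findings;
}
```

A scanner like this would normally run only over lines added by the merged commit, so pre-existing suppressions are not attributed to the worker.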

The LLM pass is done by a different model than the worker used (if the worker was Opus, verification uses Sonnet; and vice versa) so the two aren’t making identical mistakes.
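
The model swap described above amounts to a one-line rule. A minimal sketch, assuming only the two models named here are in play; the `Model` labels and the `pickVerifierModel` name are hypothetical and do not come from the orchestrator's code:

```typescript
// Hypothetical labels for the two models mentioned in the text.
type Model = "opus" | "sonnet";

// Return the model the worker did NOT use, so the verifier does not share its blind spots.
function pickVerifierModel(workerModel: Model): Model {
  return workerModel === "opus" ? "sonnet" : "opus";
}
```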

Two possible outcomes per verification run:

Verified: The orchestrator records verification-passed in the event log. The wave advances normally and the ticket moves to Done.

Failed: Verification found gaps in the merged commit (missing tests, reward-hacking patterns, or a discrepancy between the PR description and the diff). The orchestrator files a new remediation ticket with specific findings:

Verification found issues in CTL-48 (merged):
1. Added function `validateEmail` at src/auth.ts:42 has no unit tests
2. Used `as any` at src/api.ts:88 — should be a proper type guard
3. Test file src/auth.test.ts:112 is `.skip`'d without explanation

The remediation ticket enters the backlog and can be scheduled in a future wave. Whether the current wave waits for the remediation ticket to be filed before advancing is controlled by allowSelfReportedCompletion:

  • allowSelfReportedCompletion: false (default) — wave blocks until the remediation ticket is filed
  • allowSelfReportedCompletion: true — verification failures are advisory; wave advances immediately

Note: blocking wave advancement waits for the ticket to be filed, not for the issues to be resolved. The remediation work itself happens separately.
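
The advancement rule above can be read as a small predicate. The sketch below is one way to express it, not the orchestrator's actual implementation; the `WaveState` shape and the `canAdvanceWave` name are hypothetical.

```typescript
type VerificationResult = "verified" | "failed";

// Hypothetical snapshot of what the orchestrator knows after a verification run.
interface WaveState {
  result: VerificationResult;
  remediationTicketFiled: boolean;       // has the remediation ticket been filed yet?
  allowSelfReportedCompletion: boolean;  // mirrors the config field of the same name
}

// Decide whether the wave may advance.
function canAdvanceWave(state: WaveState): boolean {
  if (state.result === "verified") return true;       // clean run: advance normally
  if (state.allowSelfReportedCompletion) return true;  // failures are advisory
  return state.remediationTicketFiled;                 // blocked only until the ticket is filed
}
```

Note that nothing in the predicate asks whether the findings have been fixed: filing the ticket is enough to unblock the wave.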

Every worker already runs its own quality gates (typecheck, lint, test, build, security review, code review) during Phase 4. Verification is different because:

| Worker-side (Phase 4) | Verification (orchestrator) |
| --- | --- |
| Same agent that wrote the code | Different agent with fresh context |
| Trusts its own claims | Adversarial: assumes nothing |
| Runs inside worker context budget | Runs inside orchestrator context; can read full diff + ticket + history |
| Can be fooled by its own rationalizations | Has no priors about what should be there |

Worker Phase 4 is necessary but not sufficient. The orchestrator’s verification is the thing that catches “I wrote a test that calls expect(true).toBe(true) and shipped it.”

Each verification run emits events:

verification-started detail: { ticket, round, verifier-model }
verification-passed detail: { ticket, round }
verification-failed detail: { ticket, round, findings: [...] }

These appear in the dashboard and the /events SSE stream, so you can watch verification run live on the same screen as phase progress.
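
If you consume these events from your own tooling, a discriminated union keeps the three payloads apart. The sketch below is built on assumptions: the envelope field `type`, the field types (`round` as a number, `findings` as strings), and the `describeEvent` helper are all illustrative, since the wire format beyond the field names listed above is not specified here.

```typescript
// Assumed event envelope: a `type` tag plus the `detail` object listed above.
type VerificationEvent =
  | { type: "verification-started"; detail: { ticket: string; round: number; "verifier-model": string } }
  | { type: "verification-passed"; detail: { ticket: string; round: number } }
  | { type: "verification-failed"; detail: { ticket: string; round: number; findings: string[] } };

// Render one event as a single log line.
function describeEvent(event: VerificationEvent): string {
  const { ticket, round } = event.detail; // present on every variant
  switch (event.type) {
    case "verification-started":
      return `${ticket} round ${round}: verification started (${event.detail["verifier-model"]})`;
    case "verification-passed":
      return `${ticket} round ${round}: passed`;
    case "verification-failed":
      return `${ticket} round ${round}: failed with ${event.detail.findings.length} finding(s)`;
  }
}
```

Each message read from the /events stream could be parsed with `JSON.parse` and passed to a helper like this, provided the payload really matches the assumed shape.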

Controlled via catalyst.orchestration in .catalyst/config.json:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| verifyBeforeMerge | boolean | true | Run adversarial verification on merged commits (post-merge) |
| allowSelfReportedCompletion | boolean | false | When true, verification failures are advisory: the wave advances without waiting for a remediation ticket to be filed |

To disable verification (not recommended):

{
  "catalyst": {
    "orchestration": {
      "verifyBeforeMerge": false
    }
  }
}

To allow waves to advance even when verification finds gaps:

{
  "catalyst": {
    "orchestration": {
      "allowSelfReportedCompletion": true
    }
  }
}
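
To make the defaults concrete, here is a minimal sketch of how a tool might resolve these two fields from the raw config text. The `resolveOrchestrationConfig` helper and `OrchestrationConfig` type are hypothetical; only the field names, their defaults, and the `catalyst.orchestration` nesting come from the table above.

```typescript
interface OrchestrationConfig {
  verifyBeforeMerge: boolean;
  allowSelfReportedCompletion: boolean;
}

// Defaults as documented in the field table.
const DEFAULTS: OrchestrationConfig = {
  verifyBeforeMerge: true,
  allowSelfReportedCompletion: false,
};

// Parse the JSON text of .catalyst/config.json and fill in any missing fields.
function resolveOrchestrationConfig(rawJson: string): OrchestrationConfig {
  const parsed = JSON.parse(rawJson) as {
    catalyst?: { orchestration?: Partial<OrchestrationConfig> };
  };
  return { ...DEFAULTS, ...(parsed.catalyst?.orchestration ?? {}) };
}
```

With this reading, a config that sets only allowSelfReportedCompletion still runs verification, because verifyBeforeMerge falls back to true.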

Running /catalyst-dev:oneshot standalone (no orchestrator)? You don’t get verification — it’s orchestrator-only. The standalone path runs Phase 4 gates and that’s it. If you want adversarial verification without full orchestration, the workaround is to open the PR, then manually run the code-reviewer agent and silent-failure-hunter agent against it. Or just wrap the oneshot in a single-worker orchestrator — verification will run.