Building a Multi-LLM Plan Critique System for Claude Code
This is a deep dive into the plan critique system I mentioned in my AI coding workflow post. It's a Claude Code hook that blocks implementation until multiple LLMs have reviewed the plan.

The Goal
When Claude creates an implementation plan, I want:
- Test-first enforcement — Block if no test files listed
- Gemini 3 Flash critique — Architectural review, coverage gaps
- Codex second opinion — What did Gemini miss?
- All feedback visible to Claude before implementation starts
How Claude Code Hooks Work
Hooks are commands that run at specific points in Claude's workflow. The one I care about is PostToolUse, which runs after a tool call completes.
In ~/.claude/settings.json:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "ExitPlanMode",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/critique-plan.sh",
            "timeout": 600
          }
        ]
      }
    ]
  }
}
This triggers my script whenever Claude exits plan mode (i.e., the plan is ready for approval).
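When wiring up a hook like this, it can help to see what the script is actually handed. Claude Code passes the hook a JSON payload on stdin; the sketch below pulls out two of its standard fields, `hook_event_name` and `tool_name`. The helper name `summarize_hook_input` is hypothetical, and the payload here is hand-written for illustration (a real one carries more fields).

```bash
#!/bin/bash
# Hypothetical helper: summarize the JSON payload a hook receives on stdin.
# Missing fields are shown as "?" rather than causing an error.
summarize_hook_input() {
    jq -r '"event=\(.hook_event_name // "?") tool=\(.tool_name // "?")"'
}

# Example with a hand-written payload (a real one carries more fields)
echo '{"hook_event_name":"PostToolUse","tool_name":"ExitPlanMode"}' | summarize_hook_input
```

Appending the raw payload to a log file during development is a cheap way to confirm that the matcher fires for the tool you expect.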
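The Architecture
The hook runs three stages in order: the test-first check, the Gemini critique, and a Codex review of both the plan and Gemini's critique. A failed test-first check rejects the plan immediately; otherwise the combined critiques are returned to Claude as additional context.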
The Hard Parts
Problem 1: Blocking Doesn't Work with JSON
My first attempt returned:
{
"decision": "block",
"reason": "TEST-FIRST VIOLATION: Plan rejected..."
}
Claude Code said "hook succeeded" and kept going. The decision: block was ignored.
Fix: Use exit code 2 + stderr instead:
# This gets ignored
cat << EOF
{"decision": "block", "reason": "..."}
EOF
exit 0
# This actually blocks
cat >&2 << EOF
TEST-FIRST VIOLATION - Plan Rejected
...
EOF
exit 2
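Exit code 2 signals a blocking error, and whatever the hook wrote to stderr becomes the message Claude sees. Because PostToolUse fires after the tool has already run, this cannot undo the tool call; it feeds the rejection back to Claude before it moves on.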
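A minimal sketch of the exit-code pattern as a reusable helper, so every rejection path in a hook looks the same. The function name `block_plan` is hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: print a rejection message to stderr and return 2,
# the status the hook uses to signal a blocking error.
block_plan() {
    local title="$1"
    shift
    local issue
    {
        echo "$title"
        # Remaining arguments are printed as a bullet list of issues
        for issue in "$@"; do
            echo "- $issue"
        done
    } >&2
    return 2
}

# Usage inside a hook script:
#   block_plan "TEST-FIRST VIOLATION - Plan Rejected" "No test files listed" || exit $?
```

Nothing is written to stdout here, so there is no JSON for Claude Code to misread as a successful result.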
Problem 2: Background Processes Are Useless
I tried running the LLM critiques in the background:
( ... gemini ... codex ... ) &
Hook returned immediately. Critique ran in the background. Claude never saw it because the hook was already done.
Fix: Make it synchronous. Yes, it takes 20-30 seconds. Worth it.
# Blocking - Claude waits and sees the result
GEMINI_RESULT=$(opencode run -m "openrouter/google/gemini-3-flash-preview" "$PROMPT")
CODEX_RESULT=$(codex exec --full-auto "$PROMPT")
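Both calls share the same shape: run a CLI in the foreground, capture its stdout, and carry on if it fails. A sketch of that as a helper, assuming nothing about the CLIs beyond the invocations shown above; `run_critic` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: run a critic command synchronously and capture its
# stdout. Command substitution waits for the command to exit, so the hook
# cannot return before the critique is available.
run_critic() {
    local output
    if output=$("$@" 2>/dev/null) && [[ -n "$output" ]]; then
        printf '%s\n' "$output"
    else
        # Failed or produced nothing: emit a placeholder instead of aborting
        printf '%s\n' "[unavailable]"
    fi
}

# Usage, with the commands from the post:
#   GEMINI_RESULT=$(run_critic opencode run -m "openrouter/google/gemini-3-flash-preview" "$PROMPT")
#   CODEX_RESULT=$(run_critic codex exec --full-auto "$PROMPT")
```

The placeholder keeps one unavailable model from taking down the whole critique, which matters when the script runs under `set -e`.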
Problem 3: Plan Freshness
The hook reads the most recent plan file from ~/.claude/plans/. But if the user takes too long reviewing, the plan gets "stale" and the hook silently exits:
Plan age: 288s (max 120s)
EXIT: Plan too old
Claude sees "hook succeeded" and proceeds.
Partial fix: Raised the maximum plan age from 120s to 600s. A better fix would be to read the plan from tool_response.plan in the hook input instead of relying on file timestamps.
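A sketch of what reading the plan from the hook input could look like. It assumes the plan text really is available at `tool_response.plan` in the stdin JSON, which is worth confirming against a real payload before relying on it. The function name `read_plan` is hypothetical, and the fallback is the same newest-file lookup the hook uses today.

```bash
#!/bin/bash
# Hypothetical helper: prefer the plan embedded in the hook's stdin JSON,
# and fall back to the newest file in the plans directory.
# ASSUMPTION: the payload carries the plan at .tool_response.plan.
read_plan() {
    local hook_input="$1"
    local plans_dir="${2:-$HOME/.claude/plans}"
    local plan plan_file

    # `// empty` yields no output when the field is missing or null
    plan=$(jq -r '.tool_response.plan // empty' <<< "$hook_input" 2>/dev/null || true)
    if [[ -n "$plan" ]]; then
        printf '%s\n' "$plan"
        return 0
    fi

    # Fallback: most recently modified plan file, if any
    plan_file=$(ls -t "$plans_dir"/*.md 2>/dev/null | head -1 || true)
    if [[ -n "$plan_file" && -f "$plan_file" ]]; then
        cat "$plan_file"
        return 0
    fi
    return 1
}

# Usage inside the hook: read stdin once, then resolve the plan
#   HOOK_INPUT=$(cat)
#   PLAN_CONTENT=$(read_plan "$HOOK_INPUT") || exit 0
```

With the plan taken from the payload, the freshness check is only needed on the fallback path.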
The Code
Here's the main hook script. It's messy but it works.
#!/bin/bash
set -euo pipefail
DEBUG_LOG="$HOME/.claude/logs/critique-hook.log"
mkdir -p "$(dirname "$DEBUG_LOG")"
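log_debug() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$DEBUG_LOG"
}
# Find most recent plan file. `|| true` keeps `set -e` + `pipefail` from
# aborting the script with ls's non-zero status when no plan exists.
PLAN_FILE=$(ls -t ~/.claude/plans/*.md 2>/dev/null | head -1 || true)
if [[ ! -f "$PLAN_FILE" ]]; then
    exit 0 # No plan, nothing to do
fi
# Freshness check (stat -f %m is the BSD/macOS form; GNU/Linux uses stat -c %Y)
MAX_PLAN_AGE=600
PLAN_AGE=$(($(date +%s) - $(stat -f %m "$PLAN_FILE")))
log_debug "Plan age: ${PLAN_AGE}s (max ${MAX_PLAN_AGE}s)"
if [[ $PLAN_AGE -gt $MAX_PLAN_AGE ]]; then
    log_debug "EXIT: Plan too old"
    exit 0 # Too old, skip
fi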
PLAN_CONTENT=$(cat "$PLAN_FILE")
# Test-first check (separate script)
TEST_CHECK=$(echo "$PLAN_CONTENT" | ~/.claude/hooks/test-first-check.sh)
TEST_PASSED=$(echo "$TEST_CHECK" | jq -r '.passed')
if [[ "$TEST_PASSED" == "false" ]]; then
    ISSUES=$(echo "$TEST_CHECK" | jq -r '.issues | join("\n- ")')
    cat >&2 << EOF
TEST-FIRST VIOLATION - Plan Rejected
Issues:
- $ISSUES
Fix: Add test files BEFORE implementation files in your plan.
EOF
    exit 2
fi
# Gemini critique
GEMINI_PROMPT="You are a senior architect reviewing an implementation plan...
$PLAN_CONTENT"
GEMINI_RESULT=$(opencode run -m "openrouter/google/gemini-3-flash-preview" "$GEMINI_PROMPT" 2>/dev/null || echo "[unavailable]")
# Codex reviews Gemini's critique
CODEX_PROMPT="Review this plan AND Gemini's critique. What did Gemini miss?
---
PLAN:
$PLAN_CONTENT
---
GEMINI'S CRITIQUE:
$GEMINI_RESULT"
CODEX_RESULT=$(codex exec --full-auto "$CODEX_PROMPT" 2>/dev/null || echo "[unavailable]")
# Return to Claude
FULL_CRITIQUE="PLAN CRITIQUE FROM GEMINI + CODEX
## Gemini 3 Flash
$GEMINI_RESULT
## Codex
$CODEX_RESULT"
jq -n --arg ctx "$FULL_CRITIQUE" '{
    "hookSpecificOutput": {
        "hookEventName": "PostToolUse",
        "additionalContext": $ctx
    }
}'
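The script pipes the plan into test-first-check.sh and expects JSON back with a boolean `passed` field and an `issues` array. That validator is not shown in this post, so the following is a purely hypothetical stand-in that satisfies the same contract. It only checks that at least one test file is mentioned, and it assumes test files follow `.test.`, `.spec.`, or `_test.` naming; the real script may well check more, such as ordering.

```bash
#!/bin/bash
# Hypothetical stand-in for test-first-check.sh (the real validator is not
# shown in the post). Reads a plan on stdin and prints JSON of the form
# {"passed": bool, "issues": [string, ...]}.
# ASSUMPTION: test files are named like foo.test.ts, foo.spec.js, or foo_test.go.
check_test_first() {
    local plan
    plan=$(cat)
    if grep -Eq '[A-Za-z0-9_./-]+(\.test\.|\.spec\.|_test\.)[A-Za-z0-9]+' <<< "$plan"; then
        echo '{"passed": true, "issues": []}'
    else
        echo '{"passed": false, "issues": ["No test files listed in the plan"]}'
    fi
}

# When saved as a standalone script, the last line would be: check_test_first
```

Because the issue messages are fixed strings, the JSON can be emitted literally; anything that interpolates plan text into the output should go through `jq --arg` instead.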
What the Critiques Look Like
Here's an actual critique from a recent plan:
Gemini:
Test Coverage (PRIMARY CRITIQUE - INSUFFICIENT) While the plan includes collection-viewer.test.ts, it has significant gaps:
- Missing Add Page Tests: You modify src/app/collection/[id]/add/page.tsx but provide no corresponding test file.
- Null Safety: currentAndeeTag can now be null. Tests must verify sub-components don't crash.
Codex:
- Missed test scenario: ?viewer= present but useCurrentAndee reports authenticated; ensure viewer override wins
- Test quality issue: page.test.tsx uses vi.mock inside tests; this won't rewire imports after module is loaded
They catch different things. That's the point.
What's Missing
Things I'd improve:
- Read plan from hook input instead of filesystem (avoids staleness issues)
- Add confidence weights — if both models agree on an issue, flag it higher
- Auto-apply suggestions — generate a diff for test files mentioned in critiques
- More models — GPT-4o, Claude itself reviewing its own plan
Related
- My overall AI coding workflow
- Test-first enforcement — the validator script
- The test-first-check.sh script details (coming soon)
The hook code lives in my dotfiles. Still iterating on it.