AI peer review for Claude Code is now a free open-source skill. It lets Claude, Codex, and Gemini review each other's code through CLI handoffs — with a structured verdict, file:line findings, and an honesty contract baked into the response format.
The skill is called `ai-collab-bridge`. MIT licensed. Installs in 30 seconds. I tested it on macOS arm64; the bash scripts are portable to Linux.
TL;DR
Install the AI peer review skill in 30 seconds
```bash
git clone https://github.com/owgit/ai-collab-bridge ~/.claude/skills/ai-collab-bridge
chmod +x ~/.claude/skills/ai-collab-bridge/scripts/*.sh
~/.claude/skills/ai-collab-bridge/scripts/doctor.sh
```

The `doctor.sh` script verifies your skill files, required tooling, and each AI CLI (Claude, Codex, Gemini) in one pass. Exit code equals the number of failed checks. If anything is broken, the output tells you the exact fix command. In my testing on a fresh macOS install, the whole verification ran in under two seconds.
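Because the exit code mirrors the number of failed checks, `doctor.sh` is easy to script around. A minimal sketch, assuming you want to gate a setup script or CI step on it (the gating logic is mine, not part of the repo):

```bash
#!/usr/bin/env bash
# Abort early if doctor.sh reports any failed checks.
# Its exit code equals the number of failures, so any non-zero status means "fix something first".
if ! ~/.claude/skills/ai-collab-bridge/scripts/doctor.sh; then
  echo "ai-collab-bridge is not healthy; run the fix commands doctor.sh printed" >&2
  exit 1
fi
```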
Then start a new Claude Code session — the AI peer review skill is auto-discovered — and trigger it with something like *"use the AI Collab Bridge to review my latest commit."*
How AI peer review actually works

The protocol is small enough to fit in one paragraph. The implementer — whichever AI just finished work — stages a packet. The packet is a single markdown file with a summary, focus questions, the diff, and the file list. The implementer then invokes the reviewer's CLI with that packet wrapped in a review-request template. The reviewer reads the packet and responds.
Stage a packet (implementer side)
```bash
SUMMARY="Brief description of the change" \
QUESTIONS="Anything you want focused on" \
~/.claude/skills/ai-collab-bridge/scripts/stage-packet.sh main > /tmp/packet.md
```

That produces a markdown file with everything the reviewer needs: your summary, your focus questions, the `git diff --stat`, and the full diff against `main`.
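For orientation, the packet looks roughly like this. The section titles are my paraphrase of the contents listed above, not necessarily the exact headings `stage-packet.sh` emits:

```markdown
<!-- Illustrative packet shape; headings paraphrased from the post, not quoted from the script -->
# Review packet

## Summary
Brief description of the change

## Focus questions
Anything you want focused on

## Changed files
(output of `git diff --stat` against main)

## Diff
(full diff against main)
```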
Dispatch the AI peer review
```bash
~/.claude/skills/ai-collab-bridge/scripts/request-review.sh codex /tmp/packet.md
# or claude / gemini — same shape, different target
```

The script wraps your packet in the review-request template, sends it to the target AI's CLI, and prints the structured response.
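Putting the two steps together, a full handoff is one pipeline. The summary and questions below are placeholder values; everything else comes straight from the commands above:

```bash
# Stage a packet against main, then hand it to Codex for review in one step.
SUMMARY="Refactor the sync worker's retry loop" \
QUESTIONS="Is the backoff still bounded?" \
~/.claude/skills/ai-collab-bridge/scripts/stage-packet.sh main > /tmp/packet.md &&
  ~/.claude/skills/ai-collab-bridge/scripts/request-review.sh codex /tmp/packet.md
```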
The response format
Every AI peer review response follows the same five-section template:
The last item is the one that surprised me when I first designed it. In my experience, the most valuable section of any code review is the one where the reviewer is forced to disclose what they did not look at. Without it, you cannot tell whether an `APPROVE` means "I checked everything and it is fine" or "I skimmed the diff and nothing jumped out." With it, the implementer knows exactly where the reviewer's scope ends.
That is the honesty contract underneath every AI peer review. The protocol cannot enforce it. It can only make dishonesty harder to hide.
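To make that concrete, here is the shape of a response with the disclosure section filled in. The wording and the file path are illustrative; only the `APPROVE` verdict and the file:line finding style come from the protocol as described above:

```markdown
<!-- Illustrative response shape; section titles paraphrase the template rather than quote it -->
Verdict: APPROVE

Findings:
- src/worker.ts:42 (hypothetical path): consider clamping the retry delay before logging it

What I did not look at:
- Did not run the test suite; reviewed the diff only
- Did not check call sites outside the diff
```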
The benchmark: 100% vs 31% pass rate
I ran the AI peer review skill against vanilla Claude as baseline. Three test cases — reviewer role, implementer handoff, onboarding walkthrough — same prompts, same model, with-skill versus without-skill.
| Metric | With skill | Without skill | Delta |
| --- | --- | --- | --- |
| Pass rate | 100% | 31% | +69 percentage points |
| Time | 67 s | 68 s | identical |
| Tokens | 60 k | 53 k | +14% |
In my measurements, the AI peer review skill costs about 14% more tokens (the model reads the role playbooks) and runs at the same speed. Cheap insurance for structured, honest output.
The meta moment: the bridge reviewing itself

After I shipped v0.1, I used the bridge to review the bridge's own next commit. Codex caught two bugs I had missed in my doctor script and probe logic. I fixed both, shipped v0.2, and ran the bridge again. Codex caught two more bugs. I fixed those, shipped v0.3.
The four bugs the bridge found in itself
Not stylistic preferences, not fabricated concerns to look useful. Concrete things I tested after each fix:
None of those were obvious to me when I wrote them. All of them were obvious to Codex when it read the diff cold, without my implementation bias.
That is the entire pitch for AI peer review. Different minds, different blind spots, structured handoff. The work gets better and you do not have to think about it.
What I learned about the Codex CLI
Most of the work in shipping this AI peer review skill was not the protocol. It was figuring out how to invoke each AI's CLI without it doing something unexpected.
Codex MCP bootstrap hang
When you run `codex exec`, it loads `~/.codex/config.toml` and tries to bootstrap any MCP servers you have configured. If those servers require their own auth (Cloudflare Workers MCP, GitHub Copilot MCP, and so on) and you have not set up tokens for non-interactive use, codex stalls — silently. In my first review dispatch, the process hung for ten minutes before I traced it back to the MCP bootstrap.
The fix is `--ignore-user-config`. The codex CLI also has a dedicated `codex exec review` subcommand designed for exactly this use case, plus an `-o` flag that writes only the agent's final message to a file. See the OpenAI Codex CLI repo and the Codex documentation for the full reference.
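Before your first dispatch, it is worth checking whether you are exposed to this at all. A quick look at the Codex user config, assuming the usual `[mcp_servers.*]` table layout:

```bash
# List any MCP server entries codex exec would try to bootstrap from your user config.
# No matches means no MCP servers are configured and the hang should not affect you.
grep -n "mcp_servers" ~/.codex/config.toml 2>/dev/null || echo "no MCP servers configured (or no config file)"
```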
The npm vendor binary ENOENT
`@openai/codex` installed via npm sometimes ships without its platform-specific vendor binary. `which codex` succeeds, but invoking it fails with a cryptic Node `ENOENT` stack trace pointing at a missing arm64 binary. I hit this on my own machine and the fix is `npm uninstall -g @openai/codex && npm install -g @openai/codex`. The bridge's `doctor.sh` now detects this exact pattern and surfaces the fix directly.
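The recovery is the reinstall above; a version check afterwards confirms the vendor binary is actually back (assuming your `codex` supports the conventional `--version` flag):

```bash
# Reinstall the Codex CLI so npm re-fetches the platform-specific vendor binary.
npm uninstall -g @openai/codex && npm install -g @openai/codex
codex --version   # should print a version number instead of a Node ENOENT stack trace
```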
Supported AI CLIs
| AI | Default invocation | Override env var |
| --- | --- | --- |
| Claude | `claude -p` | `AI_COLLAB_CLAUDE_CMD` |
| Codex | `codex exec review --ignore-user-config --ephemeral` | `AI_COLLAB_CODEX_CMD` |
| Gemini | `gemini -p` | `AI_COLLAB_GEMINI_CMD` |
Adding another AI is one `case` line in `scripts/request-review.sh` plus an optional role playbook under `references/`. PRs welcome.
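For a sense of scale, the addition looks roughly like the sketch below. The variable names are hypothetical; only the default commands and env var names come from the table above:

```bash
# Hypothetical sketch of the dispatch in scripts/request-review.sh; the script's real variable
# names may differ. Each arm resolves the command to run, honoring the override env var.
case "$target" in
  claude) cmd="${AI_COLLAB_CLAUDE_CMD:-claude -p}" ;;
  codex)  cmd="${AI_COLLAB_CODEX_CMD:-codex exec review --ignore-user-config --ephemeral}" ;;
  gemini) cmd="${AI_COLLAB_GEMINI_CMD:-gemini -p}" ;;
  youraicli) cmd="${AI_COLLAB_YOURAICLI_CMD:-youraicli --prompt}" ;;   # the one new line for a new AI
esac
```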
Why AI peer review matters for multi-AI workflows
If you have been following the move from "AI as autocomplete" to "AI as collaborator," you have probably already noticed that the next bottleneck is not the individual models. The next bottleneck is how the models work with each other and with your tools. That is also what I argued in my agent-ready website checklist→ — the systems around the model matter more than the model itself.
In my own day-to-day, this looks like:
Without AI peer review, those workflows run in parallel but in isolation. Each AI sees only what is in front of it. With the AI peer review skill in place, the diffs flow between them in a structured way, and the work compounds.
The same pattern shows up across angles I have written about before — from the comparison side in Claude Code vs Cursor for 2026→, from the tooling side in how to build an MCP server with TypeScript→, and from the structure side in the cross-repo AI context layer→. The agents in your loop need clean interfaces with each other, not just clean interfaces with humans.
Try the AI peer review skill yourself
The AI peer review skill is open-source and lives at `github.com/owgit/ai-collab-bridge`.
```bash
git clone https://github.com/owgit/ai-collab-bridge ~/.claude/skills/ai-collab-bridge
chmod +x ~/.claude/skills/ai-collab-bridge/scripts/*.sh
~/.claude/skills/ai-collab-bridge/scripts/doctor.sh
```

There are `AGENTS.md` and `CLAUDE.md` files at the repo root, so Codex and Claude Code both have explicit entry points if you point them at the repo cold. The protocol works for any AI with a CLI — same packet format, same response template, same three verdicts. In my workflow, this pairs naturally with the security review you should be running→ on your AI-generated code.
If you want to contribute, the next obvious steps are documented in the repo: enforcing the response format via `--output-schema`, adding support for more CLIs, and building reviewer-as-Claude scenarios beyond the reviewer-as-Codex paths I have already tested.
The point
The boundaries between AI models are conventions, not facts. We get to decide whether they cooperate or compete. If you have two AIs working on the same codebase and they are not reviewing each other, you are leaving real bugs in the code.
The AI peer review skill is one structured way to fix that. It is free, MIT-licensed, and installs in 30 seconds. Go give your AIs a colleague.



