Claude Fable 5 Is the Best Model on the Market Right Now

Anthropic dropped Claude Fable 5 today and I let it loose on a legacy CRM. It planned, shipped and tested a full rebuild in one day. Nothing else comes close.

Uygar Duzgun
Jun 9, 2026
7 min read
Anthropic shipped Claude Fable 5 today — June 9, 2026 — and I'm just going to say it: this is the most impressive piece of software I have ever pointed at my own code. I gave it the worst thing I own — a crusty, self-hosted legacy CRM — and watched it plan, rebuild, debug and regression-test a whole module in a single working day. I've spent a year building AI agents that actually work, and I know the difference between a demo and a workhorse. This is the first model where I stopped checking its work line by line and started reviewing it like a senior colleague.

Yes, that's hype. Today it's earned. Here's the model, the numbers, and the receipt from my own repo.

What Claude Fable 5 actually is

Fable 5 is what Anthropic calls a *Mythos-class* model — a capability tier above the Opus line. The launch twist: it's one base model shipped as two products. Claude Fable 5 is generally available with safety classifiers attached. Claude Mythos 5 is the same model with some safeguards lifted — "the strongest cybersecurity capabilities of any model in the world," in Anthropic's words — restricted to vetted cyberdefense partners under Project Glasswing.

The classifiers are the price of getting Mythos-class capability the same day it exists. Queries touching cybersecurity, biology/chemistry or model distillation fall back to Claude Opus 4.8, and you're told when it happens. That fallback triggers in under 5% of sessions on average. In a full day of CRM work I never hit it once. If you build web software, the guardrails live somewhere you rarely visit.

Claude Fable 5 benchmarks: lapping the field

The numbers against the current frontier (Anthropic, The Decoder):

SWE-bench Pro (agentic coding): 80.3% — versus 69.2% for Claude Opus 4.8, 58.6% for GPT-5.5, and 54.2% for Gemini 3.1 Pro. That's not a lead, that's a lap.
FrontierCode Diamond (production-grade coding): 29.3% versus 13.4% for Opus 4.8 — and 5.7% for GPT-5.5. More than 5x OpenAI's flagship.
GDPval-AA (knowledge work): 1932 versus 1890 for Opus 4.8
Hebbia's analytics benchmark: first model ever to break 90% — a 10-point jump over Opus
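
To make the gaps concrete, here is a quick back-of-the-envelope check in Python using only the scores quoted above. The function and variable names are mine, and all it computes is a difference or a ratio; nothing here comes from Anthropic's tooling.

```python
# Scores copied from the list above (all in percent).
swe_bench_pro = {
    "Claude Fable 5": 80.3,
    "Claude Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}
frontiercode_diamond = {
    "Claude Fable 5": 29.3,
    "Claude Opus 4.8": 13.4,
    "GPT-5.5": 5.7,
}


def point_leads(scores, leader="Claude Fable 5"):
    """Percentage-point lead of `leader` over every other model."""
    top = scores[leader]
    return {name: round(top - value, 1) for name, value in scores.items() if name != leader}


def ratios(scores, leader="Claude Fable 5"):
    """How many times higher the leader's score is than each other model's."""
    top = scores[leader]
    return {name: round(top / value, 2) for name, value in scores.items() if name != leader}


swe_leads = point_leads(swe_bench_pro)
fc_ratios = ratios(frontiercode_diamond)

print(swe_leads)
print(fc_ratios)
```

The FrontierCode Diamond ratio against GPT-5.5 comes out at roughly 5.1, which is where "more than 5x" comes from.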

And the party tricks are absurd: it beat Pokémon FireRed with vision alone where earlier models needed a scaffold of helper tools, rebuilt a web app's source code from screenshots, and composed an EDM remix in code — having never heard music.

One asterisk worth knowing, credit to Digital Applied for reading the fine print: on the starred cyber/bio benchmarks, Anthropic's table shows the restricted *Mythos 5* score — deployable Fable falls back to Opus 4.8 on exactly those topics. For everything else — coding, analysis, vision, long-horizon agent work — Fable and Mythos are effectively the same model.

What Stripe, Cursor and GitHub are saying

The early-access reactions read like nothing I've seen at a model launch. Stripe says Fable 5 compressed months of engineering into days — a codebase-wide migration in a 50-million-line Ruby monolith done in a day instead of two team-months. Cursor's CEO: "the state of the art model on CursorBench… it's opened up a class of long-horizon problems that were out of reach." GitHub calls the autonomy and reliability on long-horizon tasks beyond anything they've benchmarked. Lovable's CTO says it best for builders: apps that took a hundred prompts a year ago, it now one-shots.

The recurring theme — and this matches my day exactly — is not raw IQ. It's *endurance*: the longer and more complex the task, the larger Fable 5's lead.

My launch-day test: a legacy CRM lead modal

Forget lab benchmarks. Mine is a self-hosted Perfex CRM — CodeIgniter 3, Bootstrap 3, a decade of jQuery patterns, custom `my_` view overrides on top of vendor files, and a lead modal that had grown ugly and half-finished over years of patches. This is the kind of codebase that humbles AI models. Not this one.

What it shipped in one session

Mapped the entire feature before touching it — every PHP view, override, CSS block and JS handler wired to the modal, including module hooks I'd forgotten existed
Rebuilt the notes tab into a date-grouped timeline with a smart composer: draft autosave per lead, Cmd+Enter to save, inline editing — and partial AJAX updates so adding a note no longer re-renders the whole modal
Solved the integration problem properly: the vendor's core JS re-renders everything on submit via a body-delegated handler. Fable intercepted it with a closer delegate and `stopPropagation`, kept the vendor's last-contact tracking intact, and left every other consumer of those globals untouched
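
The interception trick relies on bubbling order: a delegated handler bound on an ancestor close to the target runs before one bound on `body`, so it can stop the event before the body-level handler ever sees it. Below is a toy Python model of that ordering. It is not jQuery and not the actual commit; `Node`, `Event` and the handler names are hypothetical stand-ins I made up for illustration.

```python
class Event:
    def __init__(self, target):
        self.target = target
        self.stopped = False

    def stop_propagation(self):
        self.stopped = True


class Node:
    """Minimal stand-in for a DOM element with bubbling-phase handlers."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = []

    def on(self, handler):
        self.handlers.append(handler)

    def dispatch(self, event):
        # Walk from the target up to the root, like the DOM bubbling phase.
        node = self
        while node is not None and not event.stopped:
            for handler in node.handlers:
                handler(event)
            node = node.parent


calls = []
body = Node("body")
modal = Node("lead-modal", parent=body)
notes_form = Node("notes-form", parent=modal)

# Vendor-style handler delegated at body level: stands in for the full re-render.
body.on(lambda e: calls.append("vendor: full re-render"))


def partial_update(event):
    # Closer delegate: handle only the notes form, then stop the bubble.
    if event.target is notes_form:
        calls.append("custom: partial update")
        event.stop_propagation()


modal.on(partial_update)

notes_form.dispatch(Event(notes_form))  # intercepted at the modal
body.dispatch(Event(body))              # unrelated events still reach the vendor handler
print(calls)
```

The point of the design is the narrow scope: only events from the notes form are stopped, so everything else that depends on the body-level handler keeps working.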

The bugs it found that I never asked about

Duplicate form IDs rendering twice in the DOM. A `type=" button"` typo (note the stray space) that made a Close button silently submit the form, because an unrecognized `type` value falls back to `submit`. A website field that could never be filled when empty. An unguarded variable that 500'd the "new lead" flow. Four real bugs, caught in passing, while doing something else.

The tests it wrote to protect its own work

An 11-step Playwright E2E suite that creates a test lead, walks every tab, proves the partial-update behavior via DOM markers, and deletes its own test data — even on failure. The final commit: 12 files, +3,608/−848 lines, green build. My usual AI peer review workflow found nothing to flag.

What floored me wasn't the speed. It's that the model *respected the system* — it read the codebase's conventions (view overrides, hook patterns, CSRF via global ajax setup, jQuery delegation order) and worked inside them instead of rewriting the world. That's the difference between a demo and a colleague.

Pricing and the two-week window

Fable 5 costs $10 per million input tokens and $50 per million output — double Opus 4.8, with a 90% prompt-caching discount on input. The rollout is staged: included on Pro, Max, Team and seat-based Enterprise plans at no extra cost from today through June 22, then usage credits from June 23 until capacity allows it back as standard. Translation: you have two weeks to try the best model on the market for free with your existing plan. Use them.
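
For budgeting once the free window closes, a rough cost estimate is simple arithmetic on the list prices above. The sketch below assumes the 90% discount applies only to input tokens served from the prompt cache, and it ignores any cache-write pricing, which the figures above don't cover. The token counts in the example are made up.

```python
INPUT_USD_PER_MTOK = 10.0    # list price quoted above
OUTPUT_USD_PER_MTOK = 50.0   # list price quoted above
CACHE_DISCOUNT = 0.90        # assumption: applies to cached input tokens only


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Rough session cost in USD; `cached_input_tokens` is the cached share of input."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    fresh_cost = fresh_tokens * INPUT_USD_PER_MTOK / 1_000_000
    cached_cost = cached_input_tokens * INPUT_USD_PER_MTOK * (1 - CACHE_DISCOUNT) / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return round(fresh_cost + cached_cost + output_cost, 2)


# Hypothetical long agent session: 2M input tokens (1.5M of them cached), 200k output tokens.
with_cache = estimate_cost_usd(2_000_000, 200_000, cached_input_tokens=1_500_000)
without_cache = estimate_cost_usd(2_000_000, 200_000)
print(with_cache, without_cache)
```

Under those assumptions the cached session comes to about $16.50, against $30.00 with no caching, so for long agent runs the output tokens and the cache hit rate matter more than the headline input price.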

Verdict: the new bar

I've run GPT-5.5 and Gemini 3.1 Pro on this same codebase in recent months. Capable models — that need a supervisor. Fable 5 worked like a colleague: planning, verifying its own output, catching bugs adjacent to the task, writing the tests that protect its own work, and shipping behind a green build. The benchmarks say it laps the field; my repo says the benchmarks aren't lying. Claude Fable 5 is the best model on the market right now, and it isn't close.

Full disclosure, in the spirit of this blog: Fable 5 helped draft this post about itself, working from today's sources and its own commit log. I reviewed every claim. That sentence would have been a gimmick a year ago. Today it's just the workflow.

FAQ

What is the difference between Claude Fable 5 and Claude Mythos 5?
They are the same underlying Mythos-class model. Fable 5 is generally available with safety classifiers that route cybersecurity, biology/chemistry and distillation queries to Claude Opus 4.8. Mythos 5 has those safeguards lifted in some areas and is restricted to vetted Project Glasswing partners.
How much does Claude Fable 5 cost?
$10 per million input tokens and $50 per million output tokens — double Claude Opus 4.8 — with a 90% prompt-caching discount on input. It is included in paid Claude plans from June 9 to June 22, 2026, then requires usage credits from June 23.
Is Claude Fable 5 better than GPT-5.5 for coding?
On published benchmarks, yes: Fable 5 scores 80.3% on SWE-bench Pro versus 58.6% for GPT-5.5, and 29.3% on FrontierCode Diamond versus 5.7%. In my own launch-day test on a legacy CRM codebase it planned, executed and regression-tested a full module rebuild in one session.
What are Claude Fable 5's safeguards?
Separate classifier models detect cybersecurity, biology/chemistry and distillation requests and hand the response to Claude Opus 4.8 instead, triggering in under 5% of sessions on average. Over 1,000 hours of external red-teaming found no universal jailbreak so far.
