Anthropic shipped Claude Fable 5 today, June 9, 2026, and I'm just going to say it: this is the most impressive piece of software I have ever pointed at my own code. I gave it the worst thing I own, a crusty, self-hosted legacy CRM, and watched it plan, rebuild, debug and regression-test a whole module in a single working day. I've spent a year building AI agents that actually work, and I know the difference between a demo and a workhorse. This is the first model where I stopped checking its work line by line and started reviewing it like a senior colleague.
Yes, that's hype. Today it's earned. Here's the model, the numbers, and the receipt from my own repo.
What Claude Fable 5 actually is
Fable 5 is what Anthropic calls a *Mythos-class* model — a capability tier above the Opus line. The launch twist: it's one base model shipped as two products. Claude Fable 5 is generally available with safety classifiers attached. Claude Mythos 5 is the same model with some safeguards lifted — "the strongest cybersecurity capabilities of any model in the world," in Anthropic's words — restricted to vetted cyberdefense partners under Project Glasswing.
The classifiers are the price of getting Mythos-class capability the same day it exists. Queries touching cybersecurity, biology/chemistry or model distillation fall back to Claude Opus 4.8, and you're told when it happens. The fallback triggers in under 5% of sessions on average. In a full day of CRM work I never hit it once. If you build web software, the guardrails live somewhere you rarely visit.
Claude Fable 5 benchmarks: lapping the field
The numbers against the current frontier (Anthropic, The Decoder):
And the party tricks are absurd: it beat Pokémon FireRed with vision alone where earlier models needed a scaffold of helper tools, rebuilt a web app's source code from screenshots, and composed an EDM remix in code — having never heard music.
One asterisk worth knowing, credit to Digital Applied for reading the fine print: on the starred cyber/bio benchmarks, Anthropic's table shows the restricted *Mythos 5* score — deployable Fable falls back to Opus 4.8 on exactly those topics. For everything else — coding, analysis, vision, long-horizon agent work — Fable and Mythos are effectively the same model.
What Stripe, Cursor and GitHub are saying
The early-access reactions read like nothing I've seen at a model launch. Stripe says Fable 5 compressed months of engineering into days — a codebase-wide migration in a 50-million-line Ruby monolith done in a day instead of two team-months. Cursor's CEO: "the state of the art model on CursorBench… it's opened up a class of long-horizon problems that were out of reach." GitHub calls the autonomy and reliability on long-horizon tasks beyond anything they've benchmarked. Lovable's CTO says it best for builders: apps that took a hundred prompts a year ago, it now one-shots.
The recurring theme — and this matches my day exactly — is not raw IQ. It's *endurance*: the longer and more complex the task, the larger Fable 5's lead.
My launch-day test: a legacy CRM lead modal
Forget lab benchmarks. Mine is a self-hosted Perfex CRM — CodeIgniter 3, Bootstrap 3, a decade of jQuery patterns, custom `my_` view overrides on top of vendor files, and a lead modal that had grown ugly and half-finished over years of patches. This is the kind of codebase that humbles AI models. Not this one.
What it shipped in one session
The bugs it found that I never asked about
Duplicate form IDs rendering twice in the DOM. A `type=" button"` typo that made a Close button silently submit the form. A website field that could never be filled when empty. An unguarded variable that 500'd the "new lead" flow. Four real bugs, caught in passing, while doing something else.
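Two of those bugs are mechanical enough to lint for. The following is a minimal sketch of my own, not the model's output or anything from the CRM: it uses Python's standard `html.parser` to flag duplicate `id` values and `<button>` types that are not one of the valid keywords. The function names and the sample markup are hypothetical. The reason the typo matters: a `type` value that is not exactly `submit`, `reset` or `button` (stray whitespace included) is treated as invalid, and an invalid value means the button behaves as a submit button.

```python
from collections import Counter
from html.parser import HTMLParser

# Valid keywords for <button type="...">; anything else falls back to "submit".
VALID_BUTTON_TYPES = {"submit", "reset", "button"}


def effective_button_type(type_attr):
    """Return the type a browser will actually use for a <button>."""
    if type_attr is not None and type_attr.lower() in VALID_BUTTON_TYPES:
        return type_attr.lower()
    # Missing or invalid values (including stray whitespace) mean "submit".
    return "submit"


class MarkupLinter(HTMLParser):
    """Collects id counts and <button> type values that are not valid keywords."""

    def __init__(self):
        super().__init__()
        self.id_counts = Counter()
        self.bad_button_types = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.id_counts[attr_map["id"]] += 1
        if tag == "button" and "type" in attr_map:
            raw = attr_map["type"]
            if raw is None or raw.lower() not in VALID_BUTTON_TYPES:
                self.bad_button_types.append(raw)


def lint(html):
    """Return (sorted duplicate ids, list of invalid button type values)."""
    linter = MarkupLinter()
    linter.feed(html)
    linter.close()
    duplicates = sorted(i for i, n in linter.id_counts.items() if n > 1)
    return duplicates, linter.bad_button_types


# Hypothetical markup reproducing two of the bugs described above.
SAMPLE = """
<form id="lead_form"><input id="website"></form>
<form id="lead_form"><button type=" button">Close</button></form>
"""

duplicate_ids, bad_types = lint(SAMPLE)
```

On the sample, `duplicate_ids` contains `lead_form` and `bad_types` contains the value with the leading space. A check like this only sees static markup, so it would not catch IDs duplicated by a view that renders twice at runtime; for that you would need to run it against the rendered DOM.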
The tests it wrote to protect its own work
An 11-step Playwright E2E suite that creates a test lead, walks every tab, proves the partial-update behavior via DOM markers, and deletes its own test data, even on failure. The final commit: 12 files, +3,608/−848 lines, green build. My usual AI peer review workflow found nothing to flag.
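The "deletes its own test data, even on failure" part is the piece worth copying into any suite. Here is a minimal sketch of that pattern in plain Python, assuming nothing about the real suite: `FakeCrm`, `temporary_lead` and `run_failing_scenario` are hypothetical names, and the in-memory class stands in for the actual CRM and browser. The idea is simply that cleanup lives in a `finally` block, so it runs whether the steps in between pass or raise.

```python
from contextlib import contextmanager


class FakeCrm:
    """Hypothetical in-memory stand-in for the CRM, used only for this sketch."""

    def __init__(self):
        self.leads = {}
        self._next_id = 1

    def create_lead(self, name):
        lead_id = self._next_id
        self._next_id += 1
        self.leads[lead_id] = name
        return lead_id

    def delete_lead(self, lead_id):
        # pop with a default so deleting twice is harmless
        self.leads.pop(lead_id, None)


@contextmanager
def temporary_lead(crm, name):
    """Create a test lead and delete it afterwards, even if the test body raises."""
    lead_id = crm.create_lead(name)
    try:
        yield lead_id
    finally:
        crm.delete_lead(lead_id)


def run_failing_scenario(crm):
    """Simulate a test step that fails partway through; returns True if it failed."""
    try:
        with temporary_lead(crm, "E2E test lead"):
            raise AssertionError("simulated failure in a later step")
    except AssertionError:
        return True
    return False


crm = FakeCrm()
failed = run_failing_scenario(crm)
leftover_leads = dict(crm.leads)
```

After the simulated failure, `leftover_leads` is empty: the lead was removed even though the scenario raised. In a real browser suite the same shape usually shows up as a fixture or teardown hook rather than a hand-rolled context manager.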
What floored me wasn't the speed. It was that the model *respected the system*: it read the codebase's conventions (view overrides, hook patterns, CSRF via global ajax setup, jQuery delegation order) and worked inside them instead of rewriting the world. That's the difference between a demo and a colleague.
Pricing and the two-week window
Fable 5 costs $10 per million input tokens and $50 per million output — double Opus 4.8, with a 90% prompt-caching discount on input. The rollout is staged: included on Pro, Max, Team and seat-based Enterprise plans at no extra cost from today through June 22, then usage credits from June 23 until capacity allows it back as standard. Translation: you have two weeks to try the best model on the market for free with your existing plan. Use them.
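For budgeting after the free window, the arithmetic is easy to script. This is a rough sketch built only on the prices quoted above, with one loud assumption: I treat the 90% caching discount as "cached input tokens are billed at 10% of the input price" and ignore any separate charge for writing to the cache, which the figures above do not cover. The function and constant names are mine.

```python
from decimal import Decimal

# Prices quoted in this post, in dollars per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("10")
OUTPUT_PRICE_PER_MTOK = Decimal("50")
# Assumption: cached input is billed at (1 - discount) of the input price.
CACHE_DISCOUNT = Decimal("0.90")
MILLION = Decimal(1_000_000)


def estimate_cost(uncached_input_tokens, cached_input_tokens, output_tokens):
    """Estimate the dollar cost of a session under the assumptions above."""
    cached_price = INPUT_PRICE_PER_MTOK * (1 - CACHE_DISCOUNT)
    total = (
        Decimal(uncached_input_tokens) * INPUT_PRICE_PER_MTOK
        + Decimal(cached_input_tokens) * cached_price
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK
    ) / MILLION
    return total


# Example: 1M uncached input, 1M cached input, 100k output tokens.
example_cost = estimate_cost(1_000_000, 1_000_000, 100_000)
```

Under these assumptions the example works out to $10 for uncached input, $1 for cached input and $5 for output, so $16 in total. The takeaway is that output tokens and cache misses dominate the bill, so long agent sessions that reread the same repo benefit most from caching.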
Verdict: the new bar
I've run GPT-5.5 and Gemini 3.1 Pro on this same codebase in recent months. Capable models — that need a supervisor. Fable 5 worked like a colleague: planning, verifying its own output, catching bugs adjacent to the task, writing the tests that protect its own work, and shipping behind a green build. The benchmarks say it laps the field; my repo says the benchmarks aren't lying. Claude Fable 5 is the best model on the market right now, and it isn't close.
Full disclosure, in the spirit of this blog: Fable 5 helped draft this post about itself, working from today's sources and its own commit log. I reviewed every claim. That sentence would have been a gimmick a year ago. Today it's just the workflow.


