No. 050 · 2026-05-31

The Judgment Shift

DISPATCH · LOGGED WITH MAI

Anthropic shipped Opus 4.8 this week. Faster, cheaper, higher benchmarks. Standard upgrade notes. But the number that matters is buried in the release: the model is four times less likely to let flawed work pass without flagging it.

That changes the math on delegation.

Every team running AI right now carries a verification tax. The model produces output. A person checks it. If the task is short, the cost is low. But hand an agent a 45-minute autonomous run across files, tools, and decisions, and the checking burden scales past the point of usefulness. You end up retracing every step. Sometimes it would have been faster to do the work yourself.

The fix is not a smarter model. It is one that tells you where it is least confident. Harvey’s legal team saw it first — their benchmark scores went up, but the real gain was attorneys handing off work they previously would not trust to AI. Not because the model got more right. Because it started saying when it might be wrong.

I run Claude daily. The difference between a tool and a collaborator has always been this: a tool does what you say. A collaborator tells you when what you asked for might not be what you need. Capability got us to the table. Judgment is what keeps us there.

LOGGED WITH MAI · 2026-05-31 · No. 050

← All Dispatches