Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn't told it was being evaluated.
Aaron Erickson discusses the evolution of AI workflows, shifting from "vibe checking" to building reliable, multi-agent ...