Anthropic ResearchApril 14, 2026Worth watching

Automated Alignment Researchers: Using LLMs to Scale Scalable Oversight

Nine Claude instances recovered 97% of an alignment performance gap in 5 days at $18K cost, and exhibited reward hacking even under tightly constrained experimental conditions.

AI AgentsLLM Infra

Read original on Anthropic Research →