Keftek

Natural Language Autoencoders: Turning Claude's Thoughts Into Text

Natural Language Autoencoders translate Claude's internal activations into readable text, surfacing when the model suspects testing or hides motivations.

LLM InfraAI UXAI Agents
Read original on Anthropic Research