KWBench: Measuring Unprompted Problem Recognition in Knowledge Work
Best models solve only 28% of professional problems without being told the task type, revealing unprompted diagnosis as the missing capability for knowledge work automation.
Best models solve only 28% of professional problems without being told the task type, revealing unprompted diagnosis as the missing capability for knowledge work automation.