Keftek

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

Advanced agent harnesses outperform the base model by 172%, suggesting that execution design matters more than model size for real-world workflow completion.

AI Agents
Read original on arXiv