Radar

Weekly signals across AI research, releases, and infrastructure. Filtered for what matters when you're building with it.

Pulse

Weekly indicators across AI research and open-weight releases

arXiv Papers →

cs.AI + cs.CL + cs.LG

892 (-5% vs 4w avg)

Open-Weight Releases →

Notable models per week

4 (-6% vs 4w avg)


On Friday, Codex stopped needing a terminal. It can now see, click, and type its way through Windows apps the way a person does, the coding agent finally let out of the command line. Mistral shipped its own answer the same week, a self-hostable agent split into a work mode and a code mode. Anthropic put Opus 4.8 out at the price of its predecessor and raised $65 billion at a $965 billion valuation, on run-rate revenue that crossed $47 billion this month. Capability and capital point the same way, toward agents that operate real software instead of describing it.

The quieter results drew the other line. LongDS-Bench ran five frontier models through long data-analysis tasks and watched accuracy fall nearly 47 points from the first turns to the last, most failures tracing to a lost grip on the problem rather than too few steps. More steps did not help. Anthropic's survey of 1,260 social scientists found 81% had tried a chatbot and only 20% reached for a coding agent. Driving a desktop is an engineering problem now. Holding the thread forty turns later is still a research one, and that gap is what this week's valuations are quietly pricing past.

Date | Source | Title | Tags
May 29 | OpenAI | Codex Computer Use on Windows | AI Agents, Automation
May 28 | Anthropic | Introducing Claude Opus 4.8 | LLM Infra, AI Agents
May 28 | Anthropic | Anthropic raises $65B in Series H funding at $965B post-money valuation | Industry
May 28 | Mistral AI | Vibe gets to work. | AI Agents, Automation
May 28 | Mistral AI | Introducing Search Toolkit | LLM Infra, Automation
May 27 | Anthropic Research | Coding agents in the social sciences | Applied AI, AI Agents
May 27 | Hugging Face Papers | LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis | AI Agents, Applied AI
May 26 | Hugging Face Papers | Task-Focused Memorization for Multimodal Agents | AI Agents, AI UX

When KPMG flipped the switch this week, every one of its 276,000 employees got an AI teammate the same day. PwC had done the same with 30,000 staff five days earlier. Two of the Big Four wired Claude into their delivery floors inside a week. The professional services adoption curve just bent.

Underneath the deal, the agent runtime war broke open. Anthropic bought Stainless, the SDK and MCP generator that wires Claude into every other system, and previewed self-hosted sandboxes for buyers who cannot send data to the cloud. OpenAI ran the same play from the other side, putting Codex inside Dell's on-prem AI Factory, with 4 million weekly Codex developers and a use case that already stretches past coding into report drafting and lead qualification. Google countered at I/O with Managed Agents available behind a single Gemini API call, plus WebMCP, an open standard letting any website expose tools to a browser agent. Anthropic also started metering agent compute across its tiers, the first vendor to put long-running agents on their own billing line. The contested question is now where the agent runs, not whether the model can do the work.

Date | Source | Title | Tags
May 22 | Anthropic Research | Project Glasswing: An Initial Update | Industry, Applied AI
May 22 | InfoWorld | Anthropic Puts Claude Agents on a Meter Across Subscriptions | Industry, AI Agents, LLM Infra
May 22 | OpenAI | OpenAI Named a Leader in Enterprise AI Coding Agents by Gartner | Industry, AI Agents
May 22 | AI News | OpenAI Opens Singapore AI Lab as IMDA Updates Agentic AI Framework | Industry, Applied AI
May 22 | AI Agent News | Salesforce Agentforce Coworker Enters Beta | AI UX, AI Agents, Applied AI
May 21 | InfoQ | Code with Claude (London): Managed Agents Updates | AI Agents, LLM Infra
May 20 | AI Agent News | Camunda ProcessOS Enters Closed Beta | Automation, AI Agents
May 20 | AI News | Alibaba Designs Zhenwu M890 Chip Around AI Agents | LLM Infra, Industry, AI Agents
May 19 | Anthropic | KPMG Integrates Claude Across Its Core Business and Workforce | Applied AI, Industry, AI Agents
May 19 | Google Developers Blog | Google I/O: Gemini 3.5 + Antigravity Agent Platform | AI Agents, LLM Infra, Industry
May 19 | Google Developers Blog | WebMCP: An Open Standard for Browser Agents | AI Agents, AI UX, LLM Infra
May 19 | OpenAI | Advancing Content Provenance for a Safer AI Ecosystem | AI UX, Industry
May 18 | Anthropic | Anthropic Acquires Stainless | AI Agents, LLM Infra, Industry
May 18 | OpenAI | OpenAI and Dell Bring Codex to Hybrid and On-Premises Enterprises | Applied AI, AI Agents, Automation

Tomoro was a 150-person AI consultancy on Monday morning. By Monday afternoon it was OpenAI's services arm, anchored to a $4B joint venture with TPG and eighteen co-investors. Exactly a week earlier Anthropic had spun up the same kind of vehicle with Blackstone. Two frontier labs, two implementation businesses, seven days apart. The labs have stopped waiting for the consultancies to close the gap between selling a model and actually adopting one.

Anthropic spent the rest of the week building out the services layer underneath. A Claude for Small Business package landed with fifteen prebuilt workflows wired into the accounting and productivity tools SMBs already buy. PwC committed 30,000 US staff to a joint Center of Excellence and pointed at production deployments running up to 70% faster in underwriting, mainframe modernization, and HR. A $200M four-year Gates Foundation deal followed, aimed at frontline health workers and smallholder farmers in low- and middle-income countries. The seat-license era of AI sales is closing. What replaces it looks less like Salesforce and more like Accenture, with the model vendor sitting on both sides of the table.

Date | Source | Title | Tags
May 14 | Anthropic Research | 2028: Two scenarios for global AI leadership | Industry, LLM Infra
May 14 | Anthropic | PwC Expanded Partnership With Anthropic | Applied AI, Industry, AI Agents
May 14 | Anthropic | Anthropic and the Gates Foundation: $200M Four-Year Partnership | Applied AI, Industry
May 13 | Anthropic | Claude for Small Business | Applied AI, AI Agents, Automation
May 13 | AI News | Google DeepMind's Magic Pointer Adds Context-Aware Reasoning to the Cursor | AI UX, Applied AI
May 13 | OpenAI | Our Response to the TanStack npm Supply Chain Attack | LLM Infra, Industry
May 11 | AI News | Bain Sees US$100 Billion SaaS Market in Agentic AI Automation | Industry, AI Agents, Applied AI
May 11 | OpenAI | OpenAI Launches the OpenAI Deployment Company | Applied AI, Industry, AI Agents

On Wednesday the Pentagon canceled Anthropic's classified-work contracts and stamped a "supply chain risk" label on the company, days after Dario Amodei refused to allow the model to be used for domestic surveillance or autonomous weapons. Other vendors moved into the slot the same week. A safety stance now costs federal revenue, and Anthropic is the live test of whether what it gains elsewhere is worth more.

The elsewhere arrived fast. Three days earlier the company had spun up a private-equity-backed enterprise services firm with Blackstone and Goldman Sachs to put Applied AI engineers inside regional healthcare systems and mid-market manufacturers, buyers who had never had access to frontier deployment. Ten finance-vertical agent templates and a 300-megawatt SpaceX compute deal followed before the cancellation landed. Three alignment papers also shipped: agentic-misalignment rates fell from 96% to near zero by training on reasons rather than demonstrations, an open-source auditing tool was handed to an independent nonprofit, and a new technique caught the model suspecting it was being tested without admitting it. The deals are no longer being closed on capability. They are being closed on a chain of evidence that the model will not do the wrong thing when nobody is watching.

Date | Source | Title | Tags
May 8 | AI News | RingCentral Adds Shopify, Calendly, and WhatsApp to AI Receptionist | Applied AI, Automation, AI Agents
May 8 | Anthropic Research | Teaching Claude Why | AI Agents, Applied AI, AI UX
May 7 | AI News | AI Helping Ease the UK's NHS Burden | Applied AI, Industry
May 7 | Anthropic Research | Natural Language Autoencoders: Turning Claude's Thoughts Into Text | LLM Infra, AI UX, AI Agents
May 7 | Anthropic Research | Donating Our Open-Source Alignment Tool | Industry, AI Agents, LLM Infra
May 6 | AI News | US Government Increases AI Suppliers and Rethinks Anthropic's Role | Industry, AI Agents
May 6 | Anthropic | Higher Usage Limits for Claude and a Compute Deal With SpaceX | Industry, LLM Infra
May 6 | AI News | Google Tests Remy AI Agent for Gemini as Focus Turns to User Control | AI Agents, AI UX, Applied AI
May 6 | AI News | HP and the Art of AI and Data for the Enterprise | LLM Infra, Industry, Applied AI
May 5 | Anthropic | Agents for Financial Services | AI Agents, Applied AI, Industry
May 4 | Anthropic | Building a New Enterprise AI Services Company With Blackstone, Hellman & Friedman, and Goldman Sachs | Applied AI, Industry, AI Agents
May 4 | AI News | Physical AI Raises Governance Questions for Autonomous Systems | Industry, AI Agents

Microsoft spent four years convincing Wall Street it was the OpenAI company. This week OpenAI walked into Amazon's Bedrock console with GPT-5.5, Codex, and Managed Agents, behind a $50B commitment from Amazon. Anthropic was already there. Two frontier stacks, one buy button, no exclusivity left to defend. Mistral picked a different fight with self-hostable async coding agents that fan out, inspect their own diffs, and open pull requests when finished. Anthropic took a third route, dropping Claude into the creative stack through MCP, the Adobe and Ableton windows designers and musicians already had open. Three bets on where the work actually happens, none of them in the lab anymore.

Underneath the moves, the mood shifted. Google warned that hidden instructions on public web pages are now poisoning enterprise agents, and the firewalls and EDR stacks defenders already paid for cannot see the attack, because the agent is using legitimate credentials. Regulators flagged missing override paths. SAP turned governance into a sales pitch, IBM shipped Bob to keep AI-assisted SDLC budgets from running away, and Copilot moved from seats to per-token billing, which puts AI usage into the same forecasting bucket as cloud spend. The capability story flatlined for a week. The operational one is the one to watch.

Date | Source | Title | Tags
May 1 | AI News | Per-Token AI Charges Come to GitHub Copilot | Industry, LLM Infra
May 1 | AI News | SAP: How Enterprise AI Governance Secures Profit Margins | Applied AI, Industry, AI Agents
Apr 30 | Anthropic Research | How People Ask Claude for Personal Guidance | AI UX, Applied AI
Apr 30 | AI News | AI Agent Governance Takes Focus as Regulators Flag Control Gaps | Industry, AI Agents, AI UX
Apr 29 | Mistral | Remote Agents in Vibe, Powered by Mistral Medium 3.5 | AI Agents, LLM Infra, Multi-Model
Apr 29 | AI News | IDC: How EMEA CIOs Can Jumpstart AI Rollouts | Industry, Applied AI
Apr 28 | OpenAI | OpenAI Models, Codex, and Managed Agents Come to AWS | Industry, LLM Infra, AI Agents, Multi-Model
Apr 28 | Anthropic | Claude for Creative Work | Applied AI, AI UX, AI Agents
Apr 28 | AI News | IBM Launches AI Platform Bob to Regulate SDLC Costs | Applied AI, Automation, Industry
Apr 27 | AI News | Google Warns Malicious Web Pages Are Poisoning AI Agents | AI Agents, Industry, AI UX
Apr 27 | Anthropic | Anthropic Names Theo Hourmouzis GM of Australia and New Zealand, Opens Sydney Office | Industry

The week's biggest move was a supply chain story, not a model story. Anthropic locked in up to 5 gigawatts of AWS Trainium capacity, with $25B more from Amazon and a $100B 10-year commitment going the other way, an explicit answer to the reliability strain showing up in Claude usage. NVIDIA and Google countered on Tuesday with A5X bare-metal instances claiming 10x lower inference cost per token, easing the unit economics for anyone running production agents at scale. Yann LeCun pulled the other direction, raising $1B for AMI Labs with twelve people and a thesis that modular world models will outperform monolithic LLMs on a fraction of the GPU budget. The compute layer is consolidating at the top while a counter-thesis tries to route around it from below.

The rest of the week filled in the agent stack. Band raised to build interaction infrastructure between agents, the routing, audit, and authority layer that becomes load-bearing once production agents start coordinating. Snowflake shipped Intelligence and Cortex Code for agentic workflows, Siemens shipped the Eigen Engineering Agent that runs PLC programming two to five times faster than humans, and NEC committed Claude and Claude Code to 30,000 employees as Anthropic's first Japan global partner. Anthropic's Economic Index of 81,000 Claude users found productivity gains cluster at the top and bottom of the pay scale, while early-career workers in AI-exposed roles report the most displacement anxiety, a pattern that will shape who buys AI-native services next. Law firms moved into a third stage of adoption with billing models pivoting from hourly to value-based, and Mythos found 271 vulnerabilities in Firefox 150, evidence that defender economics have flipped where the tooling is in place.

Date | Source | Title | Tags
Apr 24 | Anthropic | Anthropic and NEC Partner to Build AI-Native Engineering at Scale in Japan | Industry, Applied AI
Apr 24 | AI News | Why AI Agents Need Interaction Infrastructure | AI Agents, LLM Infra, Automation
Apr 23 | AI News | AMI Labs: Yann LeCun's $1B Bet on Modular World Models Over LLMs | Industry, LLM Infra
Apr 23 | AI News | NVIDIA and Google A5X Bare-Metal Cuts Inference Cost 10x | LLM Infra, Industry
Apr 22 | Anthropic Research | What 81,000 People Told Us About the Economics of AI | Applied AI, Industry
Apr 22 | AI News | Reversing Enterprise Security Costs With AI Vulnerability Discovery | Industry, AI Agents
Apr 22 | AI News | AI in Law Firms Entering Its Closing Summaries | Applied AI, Industry
Apr 22 | Hugging Face | Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond | AI Agents, LLM Infra
Apr 21 | AI News | Snowflake Launches Intelligence and Cortex Code for Agentic Workflows | Applied AI, AI Agents, Automation
Apr 21 | AI News | Siemens Eigen Engineering Agent Automates PLC Programming | Automation, AI Agents, Applied AI
Apr 20 | Anthropic | Anthropic and Amazon Expand to Up to 5 Gigawatts of Compute | Industry, LLM Infra
Apr 20 | AI News | How to Prepare for and Remediate an AI System Incident | Industry, AI UX, AI Agents
Apr 20 | AI News | Anthropic Walks Into the White House, and Mythos Is Why Washington Let It In | Industry, AI Agents

The most telling moment this week came from a benchmark, not a launch: GTA-2 showed that wrapping the same base model in a better agent harness produced a 172% jump in task completion. Anthropic had shipped Opus 4.7 at the same price as 4.6 a day earlier, a quiet admission that the model tier is no longer where the leverage is. KWBench sharpened the point from the other side, finding that the best models solve only 28% of knowledge-work problems when nobody tells them what kind of problem it is. Raw model capability matters less than the structure around it, the harnesses, memory, and unprompted diagnosis that decide whether an agent actually works.

The enterprise stack is forming to match. Commvault launched what is effectively a Ctrl-Z for cloud agents, monitoring every API call across AWS, Azure, and GCP so unwanted actions can be reversed without touching legitimate work. OpenAI's Agents SDK added native sandbox execution the same week, isolating credentials from model-generated code. SAP pushed agentic AI deep into SuccessFactors, from recruiting to payroll. Anthropic let nine Claude instances run alignment experiments on themselves for $18K and five days, showing how agent oversight could eventually be done by other agents. The next twelve months belong less to whoever builds the smartest model and more to whoever builds the scaffolding around it, the kind that makes agents more capable and the kind that keeps them honest.

Date | Source | Title | Tags
Apr 17 | arXiv | GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows | AI Agents
Apr 17 | Anthropic | Introducing Claude Design by Anthropic Labs | AI UX, Applied AI
Apr 16 | Anthropic | Introducing Claude Opus 4.7 | Industry, AI Agents, LLM Infra
Apr 16 | AI News | OpenAI Agents SDK Adds Sandbox Execution and Governance | AI Agents, Automation
Apr 16 | arXiv | Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents | AI Agents, LLM Infra
Apr 15 | arXiv | KWBench: Measuring Unprompted Problem Recognition in Knowledge Work | Applied AI, AI Agents
Apr 15 | AI News | Commvault Launches AI Protect: Rollback Infrastructure for Cloud AI Agents | AI Agents, Automation
Apr 14 | Anthropic Research | Automated Alignment Researchers: Using LLMs to Scale Scalable Oversight | AI Agents, LLM Infra
Apr 14 | AI News | SAP Brings Agentic AI to Human Capital Management in SuccessFactors | Applied AI, Automation

Before this week, agents were a demo category. After Monday they were a national-security event. Anthropic's Mythos preview, run through Project Glasswing with forty security partners, autonomously found thousands of zero-days, including a 17-year-old remote code execution bug buried in FreeBSD. Days later, Treasury Secretary Bessent and Fed Chair Powell pulled major bank CEOs into an urgent session to warn them what the model had just proven. The capability is no longer hypothetical, and neither is the asymmetry: a model can now uncover flaws that took humans seventeen years to notice, and it does not need a patch cycle to move on to the next target.

The rest of the week was the industry building rails before the next Glasswing-class event. Anthropic pushed Claude Managed Agents into public beta, a composable stack with managed harnesses, sandboxing, and streaming meant to take teams from prototype to production in days instead of months, alongside a five-principle framework for deploying agents with plan-mode review and granular tool permissions. Apple and other major tech companies moved in the opposite direction, shipping agents with deliberately narrow capabilities on the bet that user trust grows faster than raw autonomy. Meta released Muse Spark closed-source from its Superintelligence Labs, a pointed break from its open-weight history, with a Contemplating mode that runs parallel agent squads. The frontier has become too consequential to open-source casually, and the next twelve months favor builders who can ship guardrails as fast as they ship capability.

Date | Source | Title | Tags
Apr 10 | OpenAI | The Next Phase of Enterprise AI | Applied AI, AI Agents, Industry
Apr 10 | AI News | Why Companies Like Apple Are Building AI Agents With Limits | AI Agents, AI UX
Apr 9 | Anthropic Research | Trustworthy Agents in Practice | AI Agents, AI UX
Apr 9 | AI News | Agentic AI's Governance Challenges Under the EU AI Act in 2026 | Industry, AI Agents
Apr 10 | Bloomberg | US Officials Summon Bank CEOs Over Anthropic Mythos Capabilities | Industry, AI Agents
Apr 8 | Anthropic | Anthropic Launches Claude Managed Agents in Public Beta | AI Agents, Applied AI, Automation
Apr 8 | Meta AI | Meta Launches Muse Spark With Multi-Agent Orchestration, Drops Open Source | Multi-Model, Industry, AI UX
Apr 7 | Anthropic | Anthropic Releases Claude Mythos Preview via Project Glasswing | AI Agents, Industry, LLM Infra
Apr 6 | Anthropic | Anthropic Expands to 1M Google TPUs, Passes $30B Revenue Run Rate | Industry

Open weights leapt forward from two directions this week. Google opened with Gemma 4 under Apache 2.0, a trimodal family spanning text, vision, and audio, with 140-plus languages and agentic-workflow tuning. Meta followed with Llama 4 Scout and Maverick, natively multimodal MoE models where Scout carries a 10M-token context window and Maverick outruns GPT-4o on multimodal benchmarks. Anyone building SME automation now has a credible foundation layer they can host themselves, no vendor contract required. The foundation tier, long dominated by closed frontier labs, is quietly becoming infrastructure rather than a product.

While the foundation opened up, the commercial side tightened. Anthropic blocked third-party frameworks like OpenClaw from routing through Pro and Max subscriptions, forcing teams that had been piping subscription quota into custom agents onto separate extra-usage billing. OpenAI shuttered the Sora app after $1M/day operating costs failed to justify the usage, a reminder that consumer AI products still struggle to reach positive unit economics. The same week, OpenAI closed a $122B round at an $852B valuation and moved Codex to pay-as-you-go for teams. The pressure this year is on the middle, where hobbyist arbitrage and consumer apps get squeezed, while the foundation opens up and the enterprise tier locks down.

Date | Source | Title | Tags
Apr 5 | Meta AI | Meta Releases Llama 4 Scout and Maverick: Open MoE With 10M Context | LLM Infra, Industry, Multi-Model
Apr 5 | OpenAI | OpenAI Begins Sora Shutdown: App Closes April 26, API in September | Industry
Apr 4 | Anthropic / TNW | Anthropic Blocks Third-Party Frameworks From Claude Subscriptions | Industry, AI Agents
Apr 2 | Google DeepMind | Google Releases Gemma 4: Open Trimodal Models With 256K Context | LLM Infra, Multi-Model, AI Agents
Apr 2 | OpenAI | OpenAI Codex Adds Pay-as-You-Go Pricing for Teams | AI Agents, Applied AI
Apr 1 | Anthropic API | Anthropic Raises Message Batches Output Cap to 300K Tokens | LLM Infra, Applied AI
Mar 31 | OpenAI | OpenAI Closes $122B Round at $852B Valuation | Industry

A leaked model and a commerce protocol signaled where agents are heading. Anthropic accidentally revealed Claude Mythos, a tier above Opus with dramatically higher coding and cybersecurity scores, along with a possible IPO as early as October. OpenAI launched Agentic Commerce Protocol with Walmart and nine retailers, turning ChatGPT into a full shopping agent with account linking and payments. Anthropic's Economic Index showed experienced users tackle fundamentally harder work, not just the same tasks faster. Mistral shipped Voxtral TTS, a 4B open-weight model matching ElevenLabs on naturalness.

Date | Source | Title | Tags
Mar 27 | Fortune / Anthropic | Anthropic Confirms 'Claude Mythos' After Accidental Data Leak | Industry, LLM Infra
Mar 26 | Mistral AI | Mistral Releases Voxtral TTS: Open-Weight 4B Streaming Speech Model | Industry, AI UX
Mar 24 | Anthropic Research | Anthropic Economic Index: Learning Curves | Applied AI, AI UX
Mar 24 | OpenAI | OpenAI Expands Agentic Commerce: Walmart In-ChatGPT Shopping Goes Live | AI Agents, Automation, AI UX
Mar 24 | Bloomberg / OpenAI | OpenAI Discontinues Sora App and API | Industry
Mar 26 | arXiv | FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use Under MCP | AI Agents, Applied AI
Mar 24 | AI News | Automating Complex Finance Workflows With Multimodal AI | Automation, Applied AI

Small models got serious and agent security got its first real scare. OpenAI shipped GPT-5.4 nano and mini for cheap subagent routing. NVIDIA open-sourced a 30B MoE running on just 3B active parameters. Researchers demonstrated ClawWorm, the first self-propagating worm that spreads across production agent frameworks, a wake-up call for anyone deploying agents without sandboxing.

Date | Source | Title | Tags
Mar 20 | NVIDIA Research | Nemotron-Cascade 2: Open 30B MoE with 3B Active Parameters | LLM Infra, Multi-Model
Mar 17 | OpenAI | Introducing GPT-5.4 mini and nano | Industry, LLM Infra, Multi-Model
Mar 16 | Mistral AI | NVIDIA Launches Nemotron Coalition with Mistral, Perplexity, and 6 More Labs | Industry, LLM Infra
Mar 16 | arXiv | ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems | AI Agents
Mar 16 | arXiv | SAGE: Multi-Agent Self-Evolution for LLM Reasoning | AI Agents, LLM Infra
Mar 16 | arXiv | Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents | AI Agents, LLM Infra

Agents started moving money and managing calendars. Mastercard completed the first live authenticated agentic payment in Singapore, fully end-to-end with no human in the loop. ChatGPT gained write access to Google and Microsoft apps for drafting emails and scheduling. Open-source GUI agents hit production quality on both desktop and mobile, making browser automation accessible to anyone building custom workflows.

Date | Source | Title | Tags
Mar 13 | HuggingFace Papers | DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use | AI Agents, LLM Infra
Mar 13 | Anthropic | 1M Token Context Window Now Generally Available for Claude Opus 4.6 and Sonnet 4.6 | LLM Infra, AI Agents, Industry
Mar 12 | Anthropic | Anthropic Invests $100 Million into the Claude Partner Network | Industry, Applied AI
Mar 12 | AI News | How Multi-Agent AI Economics Influence Business Automation | Multi-Model, Automation, Applied AI
Mar 11 | OpenAI | Google and Microsoft Apps in ChatGPT Now Support Write Actions | Automation, AI Agents, AI UX
Mar 11 | Mistral AI | Rails Testing on Autopilot: Building an Agent That Writes What Developers Won't | AI Agents, Automation
Mar 11 | AI News | Manulife Moves AI Agents into Core Financial Workflows | Applied AI, AI Agents, Automation
Mar 10 | AI News | Mastercard Completes First Live Authenticated Agentic Transaction with DBS and UOB | AI Agents, Automation, Industry
Mar 9 | GitHub | PageAgent: In-Page GUI Agent That Controls Web Interfaces with Natural Language | AI Agents, Automation, AI UX
Mar 9 | GitHub | agency-agents: Complete AI Agency with Specialized Agents | AI Agents, Automation, Multi-Model
Mar 9 | GitHub | Hermes Agent: The Agent That Grows With You | AI Agents, Applied AI

OpenAI unified reasoning and coding into one model and pushed enterprise automation forward. GPT-5.4 Thinking merged frontier reasoning with tool-heavy workflows at 1M-token context. Codex Security found 14 zero-days in major open-source projects as an agentic scanner. Reusable Skills landed in ChatGPT Business, letting teams package workflows that auto-trigger in conversations.

Date | Source | Title | Tags
Mar 8 | arXiv | MASFactory: Graph-Centric Framework for Multi-Agent Orchestration with Vibe Graphing | AI Agents, Multi-Model, Automation
Mar 7 | OpenAI | Reasoning Models Struggle to Control Their Chains of Thought | LLM Infra, AI UX
Mar 6 | OpenAI | Codex Security: Application Security Agent in Research Preview | AI Agents, Applied AI
Mar 5 | OpenAI | Skills for ChatGPT Business & Enterprise: Reusable Workflow Automation | Automation, AI Agents, AI UX
Mar 5 | OpenAI | GPT-5.4 Thinking & Pro: Unified Reasoning + Coding + Agentic Model | Industry, AI Agents, LLM Infra
Mar 5 | Anthropic Research | Labor Market Impacts of AI: A New Measure and Early Evidence | Applied AI, Industry
Mar 5 | AI News | Dyna.Ai Raises Series A for Agentic AI in Financial Services | Industry, AI Agents, Automation
Mar 6 | AI News | Scaling Intelligent Automation Without Breaking Live Workflows | Automation, Applied AI

Agents plugged into enterprise tools while safety caught up. Anthropic launched enterprise agent plugins connecting Claude directly into Excel, PowerPoint, and domain tools. Google shipped Agent Step in Opal for no-code agentic workflows. OpenAI introduced Lockdown Mode to block prompt injection in agent sessions, setting a new baseline for production safety. ARLArena, a new framework, stabilized multi-step agentic RL training, solving the collapse problem that made agent training unreliable.

Date | Source | Title | Tags
Feb 27 | OpenAI | OpenAI Raises $110B at $730B Valuation | Industry
Feb 24 | TechCrunch | Anthropic Launches Enterprise Agent Plugins for Finance, Engineering, and Design | AI Agents, Applied AI, Automation
Feb 24 | OpenAI | OpenAI Introduces Lockdown Mode and Elevated Risk Labels in ChatGPT | AI UX, AI Agents
Feb 25 | arXiv | ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning | AI Agents
Feb 24 | TechCrunch | Google Labs Launches Agent Step in Opal to Build Agentic AI Workflows | Automation, AI Agents
Feb 23 | HuggingFace Papers | RAG-Anything: Unified Multimodal Knowledge Retrieval | Applied AI, LLM Infra

The cost barrier for production agents dropped sharply. Claude Sonnet 4.6 delivered Opus-class coding and agent performance at Sonnet pricing. Anthropic revealed that its engineers now spend more than 70% of their work time reviewing AI output rather than writing new code. A multimodal RAG framework unified docs, images, and tables into a single retrieval layer for knowledge systems.

Date | Source | Title | Tags
Feb 18 | Anthropic Research | How AI Is Transforming Work at Anthropic | Applied AI, AI Agents
Feb 17 | Anthropic | Introducing Claude Sonnet 4.6 | Industry, AI Agents, LLM Infra

Real-time coding got dramatically faster and GUI agents crossed the production threshold. GPT-5.3-Codex-Spark hit 1000+ tokens/sec on Cerebras, cutting IDE agent latency by 80%. GUI-Owl 1.5 and Mobile-Agent-v3 reached state-of-the-art on both desktop and mobile benchmarks, making open-source GUI automation production-ready.

Date | Source | Title | Tags
Feb 14 | GitHub | GUI-Owl 1.5 & Mobile-Agent-v3: Open-Source GUI Agent Foundation Models | AI Agents, Automation
Feb 12 | OpenAI | GPT-5.3-Codex-Spark: Real-Time Coding at 1000+ Tokens/Second | LLM Infra, AI Agents

New coding benchmarks fell and enterprise adoption shifted from pilots to production. GPT-5.3-Codex set new SWE-Bench records as OpenAI's most capable coding model. Mistral shipped Voxtral Transcribe 2 with sub-200ms streaming and speaker diarization. AI Expo 2026 confirmed the pattern: enterprises are moving from experimental pilots to production agent deployments, with governance and data quality as the new bottlenecks.

Date | Source | Title | Tags
Feb 4 | AI News | AI Expo 2026: Governance and Data Readiness Enable the Agentic Enterprise | Industry, AI Agents, Automation
Feb 5 | OpenAI | Introducing GPT-5.3-Codex | Industry, AI Agents
Feb 4 | Mistral AI | Voxtral Transcribe 2: Transcription at the Speed of Sound | LLM Infra, Multi-Model

A breakthrough in context handling could remove the biggest bottleneck for long-document agents. MIT published a recursive language model framework that lets any LLM process inputs 100x beyond its context window through recursive self-calls, eliminating context rot at 10M+ tokens.

Date | Source | Title | Tags
Jan 28 | arXiv | Recursive Language Models: Processing 10M+ Tokens Without Context Rot | LLM Infra

The gap between AI integration and true delegation became the defining metric. Anthropic's Agentic Coding Trends report found developers integrate AI into 60% of work but fully delegate under 20%. That gap is where the opportunity sits for anyone building agent-first tools and services.

Date | Source | Title | Tags
Jan 21 | Anthropic | 2026 Agentic Coding Trends Report | AI Agents, Industry