Welcome to the Orchestration Era // Herbert Cuba Garcia


On May 28, 2026, Anthropic shipped Dynamic Workflows alongside Claude Opus 4.8. The feature lets Claude write an orchestration plan, break it into subtasks, spin up hundreds of parallel subagents, and — before reporting anything back — have separate agents verify the results. It handles codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, with the existing test suite as its quality bar.

That release crystallised something. The question in the industry stopped being "can agents do real work?" That was answered in H1. The new question: how do we compose agents into repeatable, verifiable workflows that raise quality while reducing human involvement?

I think that's the transition we're in right now. Not more agents. Better orchestration. The human is becoming a workflow-author — someone who designs the choreography, defines what "good enough" looks like, and lets the system handle execution. Closer to directing than coding.

The shift didn't happen overnight. It took roughly three years. And each phase along the way introduced its own capabilities, its own vocabulary, and its own expertise. Looking back from here, the arc is surprisingly clean.

[Figure: Timeline of applied AI from 2023 to 2027. Capability rises, human involvement shrinks, and the bottleneck migrates from capability to orchestration.]

Three years of AI in three axes

I'm going to tell this as a retrospective — from the first Claude release to today — with each era described along three axes:

What the AI can do — capability
What the human's job becomes — role
Where the bottleneck sits — constraint

A human-role spine runs through it: prompter → context-provider → harness-builder → reviewer → workflow-author. At each step, the human moved further from execution and closer to judgment.

2023 — The prompt as interface

Capability: You can instruct a model in natural language and get useful output. GPT-4 lands in March. Claude follows. People build in weekends what would have taken teams months.

Role: Prompter. You sit at a chat window and type instructions. "Prompt engineering" enters the vocabulary — part art, part cargo cult.

Bottleneck: Raw capability. Models hallucinate, lose the thread, can't use tools. RAG appears as a footnote — a way to shove relevant documents into the context window — but it's duct tape over a deeper problem. The models simply aren't reliable enough for anything beyond drafts and prototypes.

The vocabulary this era created: prompt engineering, zero-shot, few-shot, chain-of-thought, hallucination, temperature, system prompt. An entirely new literacy, built from scratch in months.

This is the era of wonder. And wonder is useful, but it's not production.

2024 — Tools and context, not just chat

Capability: Function calling and tool use become standard. Context windows expand to 128K tokens and beyond. Late in the year, reasoning models appear: OpenAI's o1 shows you can trade inference compute for better answers. The first agent frameworks over-promise spectacularly. AutoGPT, launched in spring 2023, had generated enormous excitement, and sustained use reveals the gap between "autonomous" and "useful." The loop runs, but it runs off the rails.

Role: Context-provider. The job shifts from crafting perfect prompts to assembling the right information. What documents does the model need? What tools should it call? You're curating inputs more than writing instructions.

Bottleneck: Context. The 128K window is impressive until you fill it with noise. The real skill becomes knowing what to include and what to leave out.
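To make "knowing what to include and what to leave out" concrete, here is a minimal sketch of budgeted context selection. Everything in it is an assumption for illustration: the function name select_context is hypothetical, relevance is crude word overlap, and a whitespace word count stands in for a real tokenizer.

```python
def select_context(query: str, docs: list[str], budget: int) -> list[str]:
    """Greedy selection: most query-relevant documents first, within a size budget.

    Relevance is plain word overlap and size is a whitespace word count,
    both stand-ins for a real retriever and a real tokenizer.
    """
    query_words = set(query.lower().split())

    def relevance(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))

    chosen: list[str] = []
    used = 0
    # sorted() is stable, so ties keep their original order.
    for doc in sorted(docs, key=relevance, reverse=True):
        size = len(doc.split())
        # Skip documents with no overlap at all: they are noise, not context.
        if relevance(doc) > 0 and used + size <= budget:
            chosen.append(doc)
            used += size
    return chosen


if __name__ == "__main__":
    corpus = [
        "refund policy for annual plans",
        "office party schedule",
        "how to request a refund",
    ]
    print(select_context("refund policy", corpus, budget=8))
```

The point of the sketch is the two filters: irrelevant material is dropped even when there is room for it, and relevant material is dropped once the budget is spent.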

The vocabulary this era created: function calling, tool use, reasoning models, context window, agent loop, retrieval-augmented generation. And a lesson the industry keeps relearning — autonomy without structure produces chaos, not value.

2025 — Harness > model

Capability: The year the scaffolding becomes as important as the model. Open-source LLMs explode — Qwen, DeepSeek, and others close the capability gap with proprietary models at stunning speed. But the bigger shift is structural.

In November 2024, Anthropic open-sources the Model Context Protocol — MCP — a standard for connecting AI models to external tools and data sources. Think of it as USB-C for AI. By March 2025, OpenAI adopts it. Google follows in April. By December, Anthropic donates MCP to the Linux Foundation under the newly formed Agentic AI Foundation, co-founded with Block and OpenAI. A protocol war that never happened. Instead: convergence.

Meanwhile, the discourse shifts. In June 2025, Shopify CEO Tobi Lutke frames the new discipline as "context engineering" — the art of providing all the context for the task to be plausibly solvable by the LLM. Andrej Karpathy amplifies the idea. Simon Willison endorses the term publicly. "Prompt engineering" starts to sound quaint — a label for the chat era applied to what is clearly a systems problem.

Role: Harness-builder. You're designing the system around the model. Memory management, tool selection, retrieval pipelines, guardrails. The model is one component. The harness is the product.

Bottleneck: Scaffolding. The model is good enough. The context protocol exists. But building a reliable harness — one that handles edge cases, manages state, recovers from errors — is genuinely hard engineering. Not prompt engineering. Software engineering.

The vocabulary this era created: context engineering, MCP, harness, scaffolding, agentic framework, tool orchestration. The expertise shifted from linguistics to systems design.

2026 H1 — Agentic production

Capability: Agents become reliable enough to deploy on real work with minimal supervision. Not demos. Not prototypes. Production tasks with actual stakes. Code review, migrations, data pipelines, content operations: agents handle them end-to-end with human oversight measured in minutes per day, not hours.

The framing I keep coming back to: infrastructure, not intelligence, is now the bottleneck for production agents. The models are there. What's missing is the plumbing — authentication, state management, error recovery, audit trails, deployment pipelines for agents the way we built deployment pipelines for software.

Role: Reviewer and steerer. Engineers run multiple agents in parallel: monitoring progress across workstreams, intervening when an agent gets stuck, steering priorities. The ratio of thinking to typing inverts.

The engineer at the terminal today isn't typing code anymore. They're commanding agents that do real, irreversible work — code merges, data migrations, production deployments. The interface looks deceptively simple. A chat window. A few instructions. But every move lands in production. The gap between what it looks like and what it does has never been wider.

Bottleneck: Reliability and infrastructure. Individual agents work. But they fail in ways that are hard to predict. They need the kind of operational maturity that took traditional software two decades to develop. Agents need it now.

The vocabulary this era created: agentic production, fleet management, agent reliability, human-in-the-loop (but barely). The expertise is now DevOps for agents — monitoring, fallbacks, deployment pipelines.

Back to H2 2026 — The orchestration era

No orchestra ever succeeded because it had the loudest instruments. It succeeded because the composition was right — the sequencing, the coordination, the interplay between sections. The capability was a given. The orchestration was the variable.

That's where we are now.

Dynamic Workflows is the canonical example. The pattern: an orchestrator plans the work, distributes it across specialised subagents running in parallel, and then, critically, has separate agents challenge and verify the results before anything gets reported back. Claude doesn't just do the work. It checks the work. Multiple times, from different angles.
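The plan, fan-out, verify shape can be sketched in a few lines. This is not how Dynamic Workflows is implemented, which the source does not describe; it is a toy under stated assumptions. The names plan, run_subagent, verify and orchestrate are hypothetical, and the "agents" are local stubs rather than model calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    subtask: str
    output: str
    verified: bool = False


def plan(task: str, parts: int) -> list[str]:
    """Hypothetical planner: split one task into numbered subtasks."""
    return [f"{task} [part {i + 1}/{parts}]" for i in range(parts)]


async def run_subagent(subtask: str) -> Result:
    """Stub worker; a real one would call a model and edit files."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return Result(subtask=subtask, output=f"done: {subtask}")


async def verify(result: Result) -> Result:
    """Stub verifier, separate from the worker; a real one might run the test suite."""
    await asyncio.sleep(0)
    result.verified = result.output == f"done: {result.subtask}"
    return result


async def orchestrate(task: str, parts: int = 3) -> list[Result]:
    subtasks = plan(task, parts)
    # Fan out: every subtask runs concurrently; gather keeps input order.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    # Verify everything before anything is reported back.
    checked = await asyncio.gather(*(verify(r) for r in results))
    failed = [r.subtask for r in checked if not r.verified]
    if failed:
        raise RuntimeError(f"verification failed for: {failed}")
    return list(checked)


if __name__ == "__main__":
    for r in asyncio.run(orchestrate("migrate module")):
        print(r.subtask, r.verified)
```

The design choice worth noticing is that orchestrate returns nothing until every result has passed a check it did not perform itself.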

Verification is the hidden protagonist of this entire three-year arc. Each era moved the bottleneck: capability → context → harness → reliability → and now, verification. Adversarial review and consensus between agents are what make the jump from "production" to "workflow" possible. Without them, you're scaling chaos. With them, you're scaling quality.

Evals become the moat. The teams that can define what "correct" looks like, measure it rigorously, and build it into their workflows will pull away from those that can't. The model is commodity. The eval is competitive advantage.
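Defining "correct", measuring it, and building it into a workflow can be as small as a list of checks and a threshold. The sketch below is one possible shape, not a reference to any particular eval framework: pass_rate, passes_gate and the toy system shout are all hypothetical names.

```python
from typing import Callable

# An eval case pairs an input with a check on the system's output.
EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose check accepts the system's output."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(1 for prompt, check in cases if check(system(prompt)))
    return passed / len(cases)


def passes_gate(
    system: Callable[[str], str], cases: list[EvalCase], threshold: float = 0.9
) -> bool:
    """The workflow step: results are only accepted above the threshold."""
    return pass_rate(system, cases) >= threshold


def shout(prompt: str) -> str:
    """Toy stand-in for the system under test."""
    return prompt.upper()


CASES: list[EvalCase] = [
    ("hello", lambda out: out == "HELLO"),
    ("mixed Case", lambda out: out.isupper()),
    ("keep length", lambda out: len(out) == len("keep length")),
    # Deliberately failing case, so the gate has something to catch.
    ("no digits", lambda out: any(ch.isdigit() for ch in out)),
]

if __name__ == "__main__":
    print(pass_rate(shout, CASES), passes_gate(shout, CASES))
```

Here three of the four checks pass, so the default 0.9 gate rejects the run. The hard part in practice is writing the checks, not the arithmetic.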

Standards matter here in a way they didn't before. MCP gave us agent-to-tool connectivity. What's emerging now is agent-to-agent connectivity — protocols for handoff, verification, and state management across multi-agent workflows. This is why workflows compose now in a way they simply couldn't in 2025.

And governance stops being a compliance checkbox. When agents modify a hundred files across a codebase, who is accountable? In regulated industries, ERP systems, anything where "the AI did it" is not an acceptable answer — audit trails and traceability become engineering requirements, not legal afterthoughts.

The vocabulary this era is creating: dynamic workflows, orchestration plan, adversarial verification, agent-to-agent protocol, workflow-author. The expertise is choreography — designing multi-agent systems that are not just functional but auditable and repeatable.

Where this leads — 2027

I'm framing this as open questions, not predictions.

The economics question: When workflows run hundreds of agents in parallel, tokens become the new unit economics. A single migration workflow might consume more compute in an afternoon than a developer uses in a month. Anthropic themselves note that Dynamic Workflows "can consume substantially more tokens than a typical session." Does agentic work create value faster than it consumes tokens? I think yes — but only if organisations treat it like infrastructure investment, not magic. Model routing — cheaper models for routine subtasks, expensive models for judgment calls — becomes a core competency. Local and on-prem execution re-enters the conversation, not just for cost, but for data sovereignty. The framing isn't "avoid tokens." It's FinOps for agents.
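A back-of-the-envelope sketch shows why routing matters at this scale. The tiers, prices and token counts below are invented for illustration and are not any vendor's price list; route, always_large and workflow_cost are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tiers and prices in USD per million tokens; not real prices.
PRICE_PER_MTOK = {"small": 1.0, "large": 15.0}


@dataclass(frozen=True)
class Subtask:
    name: str
    tokens: int
    needs_judgment: bool


def route(subtask: Subtask) -> str:
    """Cheaper tier for routine subtasks, expensive tier for judgment calls."""
    return "large" if subtask.needs_judgment else "small"


def always_large(subtask: Subtask) -> str:
    """Baseline: send everything to the expensive tier."""
    return "large"


def workflow_cost(subtasks: list[Subtask], router: Callable[[Subtask], str]) -> float:
    """Total cost in USD: sum tokens per tier, then price each tier once."""
    tokens_by_tier: dict[str, int] = {}
    for s in subtasks:
        tier = router(s)
        tokens_by_tier[tier] = tokens_by_tier.get(tier, 0) + s.tokens
    return sum(PRICE_PER_MTOK[t] * n / 1_000_000 for t, n in tokens_by_tier.items())


def example_workflow() -> list[Subtask]:
    """Invented workload: 100 routine rewrites and 5 judgment-heavy reviews."""
    routine = [Subtask(f"rewrite-{i}", 200_000, False) for i in range(100)]
    judgment = [Subtask(f"review-{i}", 200_000, True) for i in range(5)]
    return routine + judgment


if __name__ == "__main__":
    work = example_workflow()
    print("routed:", workflow_cost(work, route))
    print("all large:", workflow_cost(work, always_large))
```

Under these made-up numbers the routed workflow costs 35 dollars against 315 for sending everything to the large tier. The ratio depends entirely on the assumed prices and on how much of the work truly needs judgment.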

The autonomy question: Can workflows replace entire organisational functions end-to-end? Not individual tasks. Not chains of tasks. Entire functions that today require departments. Organisations aren't just technology — they're accountability structures, legal entities, cultural systems. The question isn't "can agents do the work?" It's "who signs off when they do?"

The human role at that point becomes outcome-owner. You define what success looks like. You're accountable for results. You may not touch the execution at all.

The through-line

Three years. Five eras. One consistent pattern: the human moved further from execution and closer to judgment at every step. Prompter → context-provider → harness-builder → reviewer → workflow-author → outcome-owner. Each era introduced new capabilities, new vocabulary, new expertise — and each one made the previous era's hard problems feel almost quaint.

Verification ties it all together. Every bottleneck in this timeline was, at its core, a verification problem in disguise. Can the model produce correct output? Can it use the right context? Can the harness catch errors? Can the workflow prove its results are sound?

The organisations that thrive in the orchestration era won't be the ones with the best models or the most agents. They'll be the ones that got good at defining, measuring, and verifying what "good" looks like.

That's always been the hard part. The tools just finally caught up.
