// INSIGHT 064 · 2026-07-03 · agentic-engineering · specification · 27 min read

Your Real Asset Moved Upstream

For fifty years, code was the asset and everything else was documentation. Agentic development quietly inverts that — and most organizations haven't noticed what they're now responsible for maintaining.

TL;DR
  • When an agent can regenerate code from a clear spec, the spec becomes the durable asset and the code becomes disposable output.
  • That makes spec drift a first-class production bug, and the spec — not the code — the thing you version, review, test, and protect.
  • Write specs broad to narrow: business context → ICP/brand → product strategy → personas → feature spec with testable acceptance criteria.
  • Match rigor to the work (three tiers, from one-person build to a governed expert team), and treat the agents as first-class collaborators building each layer with you.

For fifty years, code was the thing you protected. It was the asset, the artifact, the thing under version control that represented years of accumulated decisions. Everything else — requirements docs, design notes, the spec — was scaffolding. You wrote it to get to the code, and once the code existed, the scaffolding was allowed to rot. Nobody got fired for a stale requirements document. People got fired for losing the codebase.

Agentic development quietly inverts that relationship. And I don't think most organizations have noticed what they're now on the hook for maintaining.

The Inversion

In short

Because the same spec can regenerate the code in any language, framework, or era, your real intellectual property quietly moves from the codebase to the spec that produces it.

Here's the shift in one sentence: when an agent can regenerate the implementation from a clear specification, the code stops being the durable asset and the spec becomes it.

The inversion

Code becomes a disposable output. The spec becomes the thing you protect, version, and maintain — because it's what regenerates everything downstream.

Think about what that actually means. If I have a precise, well-structured spec, I can produce the code from it — this version, a version in a different language, a version on a different framework, a rewritten version six months from now when the underlying libraries have moved. The code is now an output. Downstream. Disposable in the specific sense that it can be regenerated on demand from something more durable sitting upstream.

Geoffrey Huntley has a technique he calls "Ralph" that makes this almost uncomfortably literal. It's a bash loop — feed the same spec file to a coding agent, over and over, in a fresh session each time. The agent has no memory between runs. The only thing that persists is the file. The spec isn't the input to the work anymore. The spec is the work, and the code is just what falls out the bottom each cycle.
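Here's a minimal sketch of that loop. Huntley's version is a bash loop; this is a Python rendering for illustration only. The `run_agent` callable is a hypothetical stand-in for whatever starts one fresh agent session, and the `max_runs` cap is my assumption so the sketch terminates.

```python
from pathlib import Path
from typing import Callable


def ralph_loop(spec_path: Path, run_agent: Callable[[str], None], max_runs: int) -> int:
    """Feed the same spec file to a fresh agent session, over and over.

    `run_agent` is hypothetical: it stands in for launching one new
    coding-agent session that receives the spec text and nothing else.
    """
    runs = 0
    for _ in range(max_runs):
        # Re-read the file every cycle: it is the only state that persists.
        spec_text = spec_path.read_text(encoding="utf-8")
        run_agent(spec_text)
        runs += 1
    return runs
```

Nothing is carried from one iteration to the next except the file on disk, which is the whole point: improve the spec and every later run improves with it.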

Once you see it, you can't unsee it. GitHub's spec-kit states the philosophy plainly: specifications become executable, generating the implementation rather than just guiding it. That's not a productivity tip. That's a change in what your organization's real intellectual property is.

Meanwhile, in science fiction

The Matrix — Neo vs. the Architect

For three films Neo fights inside the Matrix — bending the rules of the rendered world, beating agents, editing the running instance. And it never holds. We learn the system has already been rebuilt five times, because none of that touches the thing that actually generates the world: the specification the Architect holds. Neo can force a temporary change, but the spec sits one level up, with the power to regenerate the whole Matrix regardless of what happens on the inside.

That's the whole argument in one image. Fighting in the rendered world is editing the output — impressive, and reverted by the next regeneration. Durable change only comes from the level where the Architect and the Oracle sat all along: the specification. Patch the code and the system reverts; change the spec and everything downstream is rebuilt to match. And it isn't only about software — anything you can specify precisely enough to regenerate runs on the same logic.

What You're Actually Maintaining Now

In short

If the spec is the source of truth, it has to be maintained like code — reviewed, versioned, kept in sync — a discipline most teams don't yet have.

For most of software history, the honest answer to "where does the knowledge live" was: in the code, and in the heads of the people who wrote it. The spec was a snapshot taken at the beginning, accurate for about a week.

If the spec becomes the source of truth, that arrangement breaks in a useful way. The knowledge has to live in the spec, because the spec is what regenerates everything else. Which means the spec has to be maintained — kept current, kept correct, kept in sync with what the system actually does.

This is where I think the discomfort starts for a lot of teams. We know how to maintain code. We have fifty years of practice, tooling, review culture, and muscle memory around it. We have almost none of that for specs. We treat spec-writing as a phase, not a discipline. Something you do at the start and abandon.

The teams getting real leverage from agents are the ones flipping that. They review specs the way they used to review pull requests. They version specs deliberately. They treat a vague spec as a blocking defect, not a minor inconvenience to be cleaned up later in the prompt.

And though I'm using software language here — code, tests, review — the same claim holds outside engineering. Swap "code" for a campaign, a report, a data model, or an onboarding flow, and the sentence still reads true: the knowledge has to live in the spec that regenerates the work, and the spec has to be maintained. The vocabulary is software's; the principle isn't.

Spec Drift Is the New Production Bug

In short

When the spec regenerates the code, a spec that no longer matches reality stops being a stale doc and becomes a first-class production bug.

There's a failure mode that comes directly with this inversion, and it's worth naming because it's already showing up in practice.

Call it spec drift. The spec says the system does X. The code, through months of small changes and hotfixes and "we'll update the doc later," actually does Y. A concrete version of this: the spec required a confirmation email when a subscription was cancelled. The original code sent it. A refactor stripped it out quietly. No test existed to catch it because the tests were derived from the implementation, not the spec. Nobody noticed for two billing cycles — until a customer complained.

In the old world, that's a documentation problem — annoying, low stakes, the kind of thing you apologized for during onboarding. In a world where the spec regenerates the code, it becomes a first-class bug. If someone reruns generation from a spec that no longer matches reality, they don't get a stale document. They get a system that quietly reverts to an old set of assumptions.

Spec-kit ships a whole separate guide for what they call evolving specs, and the reason it exists is that this rot is real and already observed. The moment the spec is load-bearing, drift stops being cosmetic and starts being dangerous.

I find this clarifying rather than alarming. It tells you exactly where to put your attention. The question stops being "is our documentation up to date" — a question no one ever answered honestly — and becomes "is our source of truth correct," which is a question with teeth. You can't shrug at a wrong source of truth.
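To make the cancellation-email example concrete, here's a rough sketch of a check written from the spec sentence rather than from the code. Everything named here (`Outbox`, `cancel_subscription`, the template name) is hypothetical; the only thing taken from the example is the rule that cancelling must send a confirmation email.

```python
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Hypothetical in-memory stand-in for an email service."""
    sent: list[tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, template: str) -> None:
        self.sent.append((to, template))


def cancel_subscription(email: str, outbox: Outbox) -> str:
    """Hypothetical implementation under test."""
    # Billing teardown would happen here in a real system.
    outbox.send(email, "cancellation_confirmation")  # the line a refactor can drop
    return "cancelled"


def check_cancellation_sends_confirmation(cancel=cancel_subscription) -> bool:
    """Spec: a confirmation email is sent when a subscription is cancelled.

    The check is written from that sentence alone, so it stays valid
    however the implementation is refactored.
    """
    outbox = Outbox()
    cancel("sam@example.com", outbox)
    return ("sam@example.com", "cancellation_confirmation") in outbox.sent
```

A refactor that strips the `outbox.send` call still returns `"cancelled"`, but this check now returns `False`, because it asserts what the spec promised rather than what the old code happened to do.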

This Doesn't Mean More Bureaucracy

In short

This isn't a return to waterfall. The specs that win are shorter and living — the sharpest high-signal set, not the most exhaustive binder.

The obvious objection is that this sounds like a return to heavy upfront documentation, and we spent two decades escaping that for good reasons. I don't think that's what's happening.

Waterfall specs were long because they were trying to eliminate ambiguity for human implementers who would take months and might interpret things loosely. They were also write-once — a big document produced at the start and rarely touched again. What's emerging now is the opposite in two ways. The specs are shorter, because the highest-signal spec beats the most exhaustive one. And they're living, because they regenerate real output continuously rather than sitting in a folder.

There's good evidence that stuffing more into a spec actively backfires. Anthropic has been direct about this in their own guidance: an overloaded instruction file doesn't degrade gracefully, it causes the important instructions to get lost. The model starts ignoring the very rules you were trying to enforce. So the pressure isn't toward bigger specs. It's toward sharper ones — the smallest set of high-signal decisions that make the right output likely. That's a craft, and it's closer to writing a good API contract than writing a requirements binder.

A Worked Example: Speccing a New Ecommerce Brand

In short

Write a stack of specs broad to narrow — business context → ICP/brand → product strategy → personas → feature spec with testable acceptance criteria — so each layer decides for the one below.

Abstract arguments about specs are easy to nod along to and hard to act on. So let me build one. Say I'm launching a new ecommerce brand — call it a direct-to-consumer coffee company — and I want an agent to build the storefront. What does the spec actually look like?

The mistake most people make is starting at the wrong layer. They open a chat and type "build me an ecommerce site for a coffee brand," and the agent produces something generic and confidently wrong, because it filled every gap I left with an average of the internet. The fix isn't a longer prompt. It's a stack of specs, broad to narrow, where each layer constrains the one below it. The agent never has to guess, because the answer is always one layer up.

Here's the stack I'd actually write, top to bottom.

fig. — the spec stack, broad to narrow

  1. Business context: the why (goals, priorities)
       ↓ constrains
  2. ICP & brand: who it's for, how it feels
       ↓ constrains
  3. Product strategy: what's in scope, and out
       ↓ constrains
  4. Personas & jobs: success in human terms
       ↓ constrains
  5. Feature spec + acceptance criteria: runnable checks

Layer 1 — Business context (the why). This is the shortest and most important document, and almost nobody writes it down. It states what the business is trying to achieve, in terms that let every downstream decision resolve itself:

spec · business context

Goal: reach €40k monthly revenue within 12 months, selling premium single-origin coffee subscriptions. Margin matters more than volume — we'd rather have 800 loyal subscribers at €25/month than 5,000 one-time buyers. Success in year one is subscription retention above 70% at 6 months, not traffic.

Notice what that already decides. "Margin over volume" and "retention over traffic" will later tell the agent that the homepage should optimize for subscription sign-up, not for a sprawling catalog — without me ever writing "make the sign-up button prominent." The business context makes the UI decision for me, three layers down.

Layer 2 — Ideal customer profile and brand. Who is this for, and how does it feel:

spec · ICP & brand

ICP: urban professionals, 28–45, who already spend €4+ on café coffee daily and care about origin and craft. They're price-insensitive but quality-sensitive and taste-literate. Brand voice: confident, minimal, a little serious about coffee — think a specialty roaster, not a supermarket. No discount-driven urgency, no "SALE" banners; that cheapens the perception we're selling.

That "no SALE banners" line is a constraint. When the agent later builds a promotion component, it knows the shape of what's allowed. I didn't have to review the output and catch it — the spec caught it.

Layer 3 — Product strategy. What we're building and, crucially, what we're not:

spec · product strategy

Launch scope: subscription flow (weekly / biweekly / monthly), a 6-SKU catalog, and a "find your roast" quiz that routes newcomers to a starting product. Explicitly out of scope for launch: gift cards, a loyalty-points system, and multi-currency. We'll revisit those after we hit 300 subscribers.

The out-of-scope list is doing as much work as the in-scope one. It's the single most common thing missing from real-world specs, and its absence is why agents cheerfully build features nobody asked for.

Layer 4 — Personas and their jobs. Not demographics — the specific job each visitor is trying to get done:

spec · personas & jobs

  • The Newcomer: intrigued but doesn't know what to buy. Job: get a confident recommendation fast. The quiz is for them.
  • The Convert: had one bag, considering a subscription. Job: understand flexibility (can I pause? skip? change roast?) without emailing support.
  • The Regular: subscriber managing their plan. Job: change cadence or roast in under a minute, on mobile.

Each persona's "job" becomes a testable claim about the finished site. That's the bridge to the next layer.

Layer 5 — Feature spec + acceptance criteria (the narrow end). Now, and only now, do we get specific about one feature — say, the subscription management page for The Regular:

spec · feature

Subscription management page. A logged-in subscriber can, from one screen: change delivery cadence, swap roast, skip the next delivery, or pause. All changes take effect on the next billing cycle and show a confirmation of the new next-charge date. Must be usable one-handed on mobile. Out of scope: cancellation flow (separate spec).

And then the part that makes it a spec instead of a wish — the acceptance criteria, written as concrete cases:

acceptance criteria
  • Given a subscriber on a monthly plan, when they switch to biweekly, then the next-charge date updates and a confirmation states the new date and amount.
  • Given a subscriber, when they skip the next delivery, then no charge occurs on the upcoming cycle and the following cycle is unchanged.
  • Given a paused subscriber, when they view the page, then a single clear "resume" action is visible.

That Given/When/Then block is not decoration. It maps directly onto a test:

test

switching monthly→biweekly updates next_charge_date and returns a confirmation with the new date + amount

Which is exactly the mechanism the whole idea rests on: the agent generates the page from the feature spec, and the acceptance criteria generate the tests that prove the page does what the spec claims. When someone later changes the billing logic and the confirmation date silently breaks, the test fails — because the spec said, in a machine-checkable way, that it shouldn't.
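As a rough sketch, the first acceptance criterion could become a test like the one below. The `Subscription` model, the `switch_cadence` function, the 28-day "monthly" interval, and the rule that the next charge falls one interval after the last charge are all my assumptions for illustration; the spec above doesn't pin down the billing arithmetic, and a real feature spec would have to.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed billing intervals; "monthly" is simplified to 28 days for this sketch.
CADENCE_DAYS = {"weekly": 7, "biweekly": 14, "monthly": 28}


@dataclass
class Subscription:
    cadence: str
    last_charge_date: date
    amount_eur: int  # hypothetical per-charge price

    @property
    def next_charge_date(self) -> date:
        return self.last_charge_date + timedelta(days=CADENCE_DAYS[self.cadence])


def switch_cadence(sub: Subscription, new_cadence: str) -> dict:
    """Change cadence and return the confirmation the spec requires."""
    sub.cadence = new_cadence
    return {"next_charge_date": sub.next_charge_date, "amount_eur": sub.amount_eur}


def test_switch_monthly_to_biweekly() -> None:
    # Given a subscriber on a monthly plan
    sub = Subscription("monthly", date(2026, 7, 1), 25)
    before = sub.next_charge_date
    # When they switch to biweekly
    confirmation = switch_cadence(sub, "biweekly")
    # Then the next-charge date updates and the confirmation states date and amount
    assert sub.next_charge_date != before
    assert confirmation == {"next_charge_date": date(2026, 7, 15), "amount_eur": 25}
```

The Given/When/Then comments line up one-to-one with the criterion, which is what lets a reviewer confirm the test actually checks the spec and not something adjacent to it.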

None of this notation is new, and it's worth saying so: BDD and Gherkin have used Given/When/Then to turn specs into executable tests for well over a decade. What's new isn't the format — it's that an agent can now act on the spec directly, generating the implementation from it, instead of a human reading it and hand-translating it into code.

Read the stack back and the shape is clear: each layer answered a question so the layer below didn't have to guess. Business context decided the priorities; ICP and brand decided the constraints; product strategy drew the boundary; personas defined success in human terms; the feature spec turned one of those into runnable checks. That's what "broad to narrow" means in practice — not a longer prompt, a layered one.

The Stack Holds When the Stakes Are Higher

In short

The same stack works inside a regulated enterprise — the compliance constraints are exactly what belongs in the spec, making them visible, versioned, and auditable.

The coffee-brand example is greenfield. Does the same structure hold inside an existing enterprise system — with compliance requirements, legacy constraints, and a risk committee that signs off before anything ships? It does, and the mechanics barely change. The constraints are precisely the things that belong in the spec, which makes them visible, versioned, and auditable instead of buried in code comments and tribal knowledge.

A mid-size Nordic financial services company — composite, but drawn from real practice — used the stack to rebuild a claims-processing workflow. Layer 3's out-of-scope list explicitly excluded any automated denial decision, because compliance required human sign-off on every rejection. That one line prevented three months of building the wrong thing. Later, when regulators asked for evidence the denial logic hadn't changed between releases, the team pulled the spec diff — which acceptance criterion changed, by whom — alongside the test run proving everything else still held. An auditor can work with that. An auditor cannot work with a diff of 40,000 lines of claims-processing logic.

One boundary worth naming, because for a large enterprise it's most of the estate: this applies cleanly to systems you control and can regenerate. For vendor or COTS systems — your ERP, your HR platform — you can't regenerate the internals, so the spec covers the layer you do own: the integrations, the configuration, the data contracts. Speccing a Workday or SAP integration this way still pays off — the vendor owns the box, you own a versioned, testable spec of how you connect to it — it just covers your half of the boundary, not theirs.

The Agents Are in the Room Now

In short

Treat agents as first-class collaborators, not a vending machine — they help write each layer using the context of the ones above, so the context compounds.

One shift makes all of this work better, and it's easy to miss: the agents aren't outside the process waiting to be handed a finished spec. They're in the same environment as you — the same repo, the same files, the same context — working alongside you while you write. If you want real acceleration, treat them as first-class participants, not as a vending machine you drop a prompt into.

That changes how the stack actually gets built. You don't write all five layers alone and then hand them over. The agent helps write each layer using the context of the ones above it: draft the business context together, and it can propose an ICP that's consistent with it; settle the ICP, and it can draft a product strategy that respects both. It interviews you where you're vague, flags contradictions between layers, and drafts the routine parts so you spend your judgment on the decisions that actually need it. You still own the intent and the trade-offs. But the spec becomes something you and the agent produce together, each layer carrying the context of the last — which is exactly why the broad-to-narrow structure pays off. The context compounds, for you and for the agent working beside you.

Who Owns the Spec

In short

A load-bearing spec needs an owner, a review path, and version history — a four-row RACI, not a committee — and it becomes a real gate because runnable acceptance criteria give it teeth.

The moment the spec is load-bearing, "we should keep it updated" stops being enough. Something that regenerates production code needs an owner, a review path, and a version history — the same non-negotiables we long ago accepted for code.

In practice I think this looks less exotic than it sounds, because we already have the machinery. The spec lives in the repo, next to the code it generates, under the same version control. A change to a spec is a pull request. The ownership map is simpler than it sounds: the top layer — business context — belongs to whoever owns the product outcome, whether that's the CPO, the product director, or the business lead. Feature specs belong to the engineering lead or product manager for that domain. A spec PR gets a technical reviewer who checks whether the constraints are still true, and a product reviewer who checks whether the acceptance criteria still reflect what customers need. Changes touching a regulated boundary require a third sign-off: compliance or risk. That's a four-row RACI, not a committee.

The review itself is not a rubber stamp on prose. It checks the things that actually matter: is this constraint still true, does this acceptance criterion still hold, does this decision contradict one made elsewhere.
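The routing described above is small enough to write down. This is a sketch under assumptions, not a prescribed policy: the role names are placeholders, and sending every layer below business context to a single "domain lead" is a simplification of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecChange:
    layer: str  # e.g. "business_context" or "feature"
    touches_regulated_boundary: bool = False


def required_signoffs(change: SpecChange) -> list[str]:
    """Return who must approve a spec pull request (role names are placeholders)."""
    # The top layer belongs to the product-outcome owner; everything else
    # is routed to the domain lead in this simplified sketch.
    owner = "product_outcome_owner" if change.layer == "business_context" else "domain_lead"
    signoffs = [owner, "technical_reviewer", "product_reviewer"]
    if change.touches_regulated_boundary:
        signoffs.append("compliance_or_risk")  # the third reviewer sign-off
    return signoffs
```

For a feature spec that touches a regulated boundary, `required_signoffs(SpecChange("feature", True))` returns `["domain_lead", "technical_reviewer", "product_reviewer", "compliance_or_risk"]`: four roles, which is the whole RACI.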

None of that ownership logic is unique to code, either. A content library, a marketing system, or an internal process has the same shape — someone owns the top-level intent, someone owns each downstream spec, and changes flow through a review to the accountable owner. The tools differ (a doc store and an approval flow instead of a repo and a PR), but the RACI is identical. Wherever a spec regenerates real output, it needs an owner.

The mechanism that makes this real rather than aspirational is testing the spec, not just reading it. A spec that ends in concrete, runnable acceptance criteria can be checked against the system automatically — the spec claims the system does X under condition Y, and a test asserts exactly that. When the code drifts from the spec, the test fails. That's the difference between a spec you hope is current and a spec you know is current. Spec review becomes a real gate, not a courtesy, because the spec has teeth that can bite when it's wrong.

None of this requires reinventing your stack — the version control, the review process, the test runner already exist. But be honest about where the gap actually is: turning prose acceptance criteria into runnable tests against arbitrary system behavior is not fully off-the-shelf yet. For a subscription date calculation it's a normal unit test; for "the page feels trustworthy" it isn't a test at all — and that's not a maturity gap that better tooling will close, it's a structural ceiling. Some criteria are judgment calls that will always need a human, no matter how good the automation gets. The discipline is real and available today; full spec-to-test automation is partial by nature, not just early. The move is to write criteria concrete enough to test where you can, and to be explicit about which claims are human-judged rather than machine-checked — not to pretend the whole spec compiles into a test suite.
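One way to keep that distinction explicit is to record it on each criterion. The sketch below is illustrative only; `Criterion`, `evaluate`, and the report keys are names I made up, and the checks are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    text: str
    # A runnable check, or None when the claim is a human judgment call.
    check: Optional[Callable[[], bool]] = None


def evaluate(criteria: list[Criterion]) -> dict[str, list[str]]:
    """Run the machine-checkable criteria and list the rest for human review."""
    report: dict[str, list[str]] = {"passed": [], "failed": [], "human_review": []}
    for criterion in criteria:
        if criterion.check is None:
            report["human_review"].append(criterion.text)
        elif criterion.check():
            report["passed"].append(criterion.text)
        else:
            report["failed"].append(criterion.text)
    return report
```

A criterion like "the page feels trustworthy" goes in with `check=None` and lands in `human_review` every time, so nobody mistakes a green test run for a verdict on the whole spec.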

This Is a New Discipline — Match the Rigor to the Work

In short

Rigor is a dial, not a ladder. Match the tier — one-person build, a couple of experts, or a governed team — to the scale, speed, and stakes of the work.

I want to be honest that none of this comes for free. Spec-driven work is a new discipline, and like any discipline it takes reps and a shift in mindset before it feels natural. It's a different muscle from writing code, and most people — strong engineers included — are beginners at it right now.

The rigor is a dial, not a ladder. How formal you get should track the scale of the effort, how much raw speed you're after, and the safety and compliance stakes. Three tiers, roughly — and these are modes you choose to fit the work, not steps you climb toward some "best" one:

fig. — tiers of rigor, matched to scale & stakes

TIER 1: One-person build
Writes the top of the stack, interviewed by agents. Agents generate downward from the context above; human steers. No versioning, no review.
Speed: highest · cost: comprehension debt, black box

TIER 2: A couple of experts
Same build, but specs are versioned and both can edit. A short ADR captures every critical change. Enough process to catch drift, not enough to kill momentum.
Speed: strong · the pragmatic middle

TIER 3: Larger expert team
Explicit ownership per layer, a RACI, PR-to-owner on every change, ADRs, real coordination. Productive friction between experts.
Speed: slower · quality highest · scales

Tier 1 — the one-person build. One person writes the top of the stack and gets interviewed by their agents to sharpen it. The further down the layers you go, the more the agents generate from the context above — the human mostly steers and checks the output is right. No versioning, no review; there's only one engineer. This is fast, the way one-person armies always are. The cost is comprehension debt: the project runs like a black box inside the organization, understood by exactly one person.

Tier 2 — a couple of experts. Same way of building, but the specs are versioned now, and both people can create updates and tweak things. When something critical changes, they write a short ADR — an architecture decision record — so the reasoning survives for whoever reads it later. This is the pragmatic middle: enough process to keep tabs on spec drift and quality, not so much that it kills momentum.

Tier 3 — a larger expert team. Ownership gets explicit. One person is accountable for the business context — the top layer — and others own the layers beneath. A RACI tracks who owns what. Everything is versioned, and every change goes through a pull request to the accountable owner. ADRs matter even more here, and coordination becomes real work. Speed drops, but quality is highest, because you get productive friction between human experts. And it scales — there's a process that experienced and less-experienced spec writers can both follow.

None of these is the "right" one, and Tier 3 is not the goal. A weekend prototype run at Tier 3 is absurd; a core banking system run at Tier 1 is negligent. The skill is matching the tier to what you're actually building — and being willing to move up as the stakes rise, or down when they don't. As a rough budgeting anchor: Tier 1 is one person, Tier 2 is two or three, and Tier 3 is five or more plus a compliance or risk reviewer once a regulated boundary is in play.

The Governance Question a Risk Committee Will Ask

In short

The blast radius of regenerating from a wrong spec is governable with controls you already have: route it through change approval, inspect the spec diff, and let the test gate prove control effectiveness.

For a regulated enterprise there's one more question, and a risk committee asks it first: what's the blast radius when someone regenerates code from a spec that's wrong? The failure mode is real, since a stale spec can quietly revert a hard-won security decision, but it's governable with controls we already have.

Start with what the ungoverned failure looks like. Someone regenerates a service from a spec that never captured a control you added by hand (a rate limit, an audit-log write, a consent check), and it silently vanishes from the rebuild. Nobody sees a red flag, because the code compiled and the tests that existed passed; you find out at the next audit, or the next incident.

The control is one you already run. Regeneration that touches regulated systems goes through the normal change-approval gate, and what the gate inspects is the spec diff: which acceptance criteria changed, who changed them, sign-off on anything touching a regulated boundary, plus the test run proving the unchanged criteria still hold. That maps cleanly onto controls auditors already recognize: spec-diff-plus-sign-off is change management of the SOX, ISO 27001, and DORA kind, and the test gate is your evidence of control effectiveness. The risk doesn't disappear; it moves somewhere you can see it and put a control on it, instead of living unexamined in the gap between what the docs say and what the code does.
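A minimal sketch of what such a gate might inspect, assuming acceptance criteria are stored as an ID-to-text mapping. The function names, the `"compliance"` sign-off label, and the storage format are assumptions for illustration, not a reference to any particular tool.

```python
def diff_criteria(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two versions of a spec's acceptance criteria, keyed by criterion ID."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def gate_passes(
    diff: dict[str, list[str]],
    regulated_ids: set[str],
    signoffs: set[str],
    unchanged_tests_green: bool,
) -> bool:
    """Approve regeneration only if regulated changes are signed off and tests hold."""
    touched = set(diff["added"]) | set(diff["removed"]) | set(diff["changed"])
    # Any touched criterion on a regulated boundary needs a compliance sign-off.
    if touched & regulated_ids and "compliance" not in signoffs:
        return False
    # The test run must show the unchanged criteria still hold.
    return unchanged_tests_green
```

The output of `diff_criteria` is the artifact an auditor reads: a short list of criterion IDs that were added, removed, or changed, rather than a diff of the generated code.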

Humans Move Up the Value Chain

In short

When execution moves to the agent, people move up the value chain — into defining, constraining, and deciding. Leadership's job is to make that elevation legible, not let it feel like a demotion to documentation.

The instinct is to frame this as a loss — the agent writes the code, so what's left for the people? I think it's the opposite. When execution moves to the agent, the humans move up the value chain, into the work that was always the highest-value part and never got the recognition: deciding what to build, defining it precisely, holding the constraints, making the trade-offs. The spec is that work made visible.

There's a real risk in the transition, and it's cultural, not technical. If you tell a team of senior engineers that spec-writing is now the high-leverage work, some of them will hear "you want me to write documentation" — because for their whole careers, writing prose about code was the low-status task that rolled downhill. That perception is the thing to get ahead of. The people doing spec work are not being demoted to scribes. They're being moved to the front of the value chain, where their judgment sets the direction for everything the agents then execute.

The practical path into this for most teams starts with pairing, not training. An engineer who's strong at code review can spec-review alongside someone who's written a few already — it takes three to five cycles before the pattern becomes legible. The skill being developed is precision of language and the ability to write acceptance criteria specific enough to generate a test. That's a concrete, learnable muscle. It shows up in weeks, not months, and the feedback loop is immediate: the quality of the code the agent generates from a sharp spec versus a vague one is obvious on the first run.

But pairing only works once there's someone credible to pair with, and because this is a genuinely new discipline, that first person is rarely already on the team. Be honest about the bootstrap: bring in someone who has actually done this — an external practitioner, or an internal hire who's agentic-first, ideally with a deep enough technical background to write acceptance criteria that hold up. Have them run the first few specs and workshops, and let pairing propagate the muscle from there. Trying to invent the discipline from scratch, in-house, while also shipping is the slowest possible path.

There's a reason the emerging name for this is spec engineering, not spec writing. "Writing" implies documentation: describing something that already exists. Engineering is the discipline of building a system precise enough to behave predictably, and that's exactly what a good spec is. You're applying the same rigor, decomposition, and edge-case thinking you always did, just one level of abstraction up. You're no longer engineering the implementation; you're engineering the definition that generates it. The work got more leveraged, not less technical. A sharp spec is a harder artifact to produce than the code that falls out of it, because it has to be right about intent, not just syntax.

Leadership's job is to make that legible — to make the experts doing the spec work feel and understand the elevation, not just be told about it. It shows up in how you review: a sharp, minimal spec earns the same recognition a clean piece of code always did. It shows up in who you ask to write the hard ones — the load-bearing specs go to your strongest people, and everyone sees that. And it shows up in growth and titles, because the ability to define a problem precisely enough that an agent executes it cleanly is now one of the most valuable things a person on your team can do, and the org chart should say so. The teams that name this shift as a promotion get engaged experts who lean in. The teams that let it feel like a demotion to documentation get quiet resistance and correction loops.

What It Actually Changes

In short

Teams that make the shift get fewer rollbacks and fewer requirements fights — but the real prize is a defensible moat: the accumulated, well-maintained clarity about what the system should do and why.

Picture the shape of the change on a team that makes this shift. Before: the spec is a Jira ticket description nobody rereads, and success is measured in code shipped. After: spec review becomes part of the definition of done, each feature spec is reviewed before generation, versioned alongside the code, and covered by tests derived from its acceptance criteria. Two things follow almost mechanically. Rollbacks fall, because a regression that violates an acceptance criterion now fails a test before it ships instead of surprising you in production. And the requirements arguments dry up, because the spec is the argument — resolved once, written down, and versioned, instead of relitigated in standup every sprint. The change that tends to surprise people isn't speed. It's clarity: the team stops discovering what it meant to build after it's already built.

The deeper change is what this does to the defensible asset. If competitors can generate implementations as easily as you can, the code is no longer the moat. The moat is the accumulated, well-maintained clarity about what the system should do and why: the decisions, the constraints, the trade-offs already made and written down where they regenerate everything downstream. That's the thing that's hard to copy and expensive to rebuild. That's the thing worth protecting.

We Protected the Wrong Artifact

In short

None of this is really about code. The same spec-to-execution loop runs marketing, content, and product too — humans holding strategy, agents holding execution — and it's the near-future shape of all digital production. Building an organization around it is the work behind APEX.

We spent fifty years getting good at protecting the wrong artifact. Not wrong at the time — code genuinely was the scarce, valuable, hard-to-reproduce thing. But the scarcity moved. When execution gets cheap, the value climbs upstream to the definition, and the definition is the spec.

And here's the part that makes this bigger than a software story: none of it is really about code. Everything I've described is agnostic to the kind of digital work. The same spec-to-execution loop runs a marketing campaign, a content library, a data pipeline, a product surface. The layers change name, not shape — business context still constrains strategy, strategy still constrains the artifact, and an agent still generates the artifact from a spec sharp enough to check. Code was just the first place it became undeniable.

The near-future of digital production

Humans hold the strategy — intent, constraints, taste, trade-offs. Agents hold the execution — fast, tireless, regenerable. The spec is the contract between the two.

That's not a productivity hack bolted onto how we work today. It's a different division of labor, and the organizations that internalize it early will leave the ones that didn't straining to keep up.

Which is why the last mile isn't individual, it's organizational. Transforming a company to run this way means deliberately leaning into the highest tier of rigor and getting the one-person armies out of the critical path. The solo build is intoxicating precisely because it's so fast — but an organization made of black boxes each understood by exactly one person isn't a transformation, it's a pile of key-person risks waiting to walk out the door. Scaling this is a governance move, not a tooling one: shared ownership, versioned specs, review gates, the boring rigor that lets many people build on the same source of truth.

That organizational shift is the harder, more valuable half of the problem, and it's the one I've spent real time on. I've been building an operating model — APEX — for exactly this: how a team designs, specs, and verifies while agents execute and iterate, with the ownership and rigor that make it hold at scale. If this piece resonated, that's where the argument goes next.

The organizations that come out of this era ahead won't be the ones that generated code the fastest. They'll be the ones that understood, earlier than their competitors, that the artifact worth maintaining had changed underneath them — and started treating their specs like the primary artifacts they'd become.


© 2026 Herbert Cuba Garcia // built by markdown & AI