//the operating model
APEX_
Agentic Production Execution
A production-grade operating model for teams where humans design and verify while agents execute and iterate. Organizational scaffolding that makes agentic execution reliable, measurable, and improvable at scale.
//The APEX Cycle
The heartbeat of the framework. Three phases that repeat continuously — and a system that gets better every cycle.
01
HUMAN-FIRST
Strategic
~2–3x velocity
Humans design the system: specs, quality criteria, agent configuration, permissions. This is not overhead — it is the work. Agents can only execute within the boundaries humans draw.
→
02
AGENT-FIRST
Execution
~10–20x velocity
Agents execute against the specs. Agent-to-agent review loops resolve mechanical problems before a human ever sees the work. The single human role: verification.
→
03
LEARN
Reflection
~1–2x velocity
Evaluate → Reflect → Calibrate. Agents pre-compile the metrics; humans decide what to change. The output feeds back into Strategic — and the next cycle runs on a better system.
↺ calibration feeds back into Strategic — the cycle repeats on an evolved system
//1. Strategic — Three areas, nine domains
Every capability has a home. Every domain has a clear owner, clear boundaries, and clear artifacts. None falls through the cracks.
Platform
the foundation everything runs on
D1
Infrastructure
The harness decision, model strategy, compute, tool integrations. The most consequential choice in the framework.
D2
Operational Tooling
Dashboards, metrics pipelines, context generators. The bridge between measuring and acting.
D3
Security & Compliance
Least privilege, permission maps, audit trails. Invisible when working, catastrophic when neglected.
Spec
what humans specify & measure
D4
Business Context
The "why" and "for whom". Brand, personas, vision, constraints — everything agents need to produce domain-appropriate output.
D5
Spec Engineering
The translation layer between strategic thinking and executable work. The PRD is the single source of truth.
D6
QA Strategic
Humans define what "good" means at the system level. Measurement plans, evaluation criteria, definition of done.
Config
what humans configure for agents
D7
Agent Design
Identity files, skills, memory, behavior. The richer the identity, the less the agent infers — and inference is where drift happens.
D8
Orchestration Design
Routing rules, delegation chains, handoff protocols. Owns the traffic between agents — not what they do.
D9
QA Operational
Quality gates inside the execution loop. Designed by humans, enforced by a skeptical review agent — the generator never grades its own homework.
//2. Execution — The inner loop
Domain-agnostic. Humans see pre-validated work, not first drafts.
agents iterate in seconds — by the time work reaches a human, the mechanical problems are resolved
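As a rough illustration of the inner loop, the sketch below alternates a generator agent and a separate review agent until the reviewer reports no open issues or an iteration cap is reached. All names here (`inner_loop`, `LoopResult`, the toy agents, the cap of 5) are assumptions made for illustration; they are not part of APEX itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    draft: str
    iterations: int          # agent-to-agent rounds used
    needs_human_help: bool   # True if the cap was hit with issues still open


def inner_loop(
    spec: str,
    generate: Callable[[str, List[str]], str],
    review: Callable[[str, str], List[str]],
    max_iterations: int = 5,  # hypothetical cap, tune per implementation
) -> LoopResult:
    """Alternate generator and reviewer until the reviewer finds no issues."""
    issues: List[str] = []
    draft = ""
    for i in range(1, max_iterations + 1):
        draft = generate(spec, issues)  # generator sees the previous feedback
        issues = review(spec, draft)    # a different agent grades the draft
        if not issues:
            return LoopResult(draft, i, needs_human_help=False)
    # Cap reached: escalate instead of iterating forever.
    return LoopResult(draft, max_iterations, needs_human_help=True)


# Toy stand-ins for real agents (hypothetical behaviour, illustration only).
def toy_generate(spec: str, feedback: List[str]) -> str:
    return spec.upper() if feedback else spec


def toy_review(spec: str, draft: str) -> List[str]:
    return [] if draft.isupper() else ["use upper case"]


result = inner_loop("ship the landing page", toy_generate, toy_review)
print(result)
```

With the toy agents the loop converges on the second round, so the human would see only the corrected draft. Keeping `generate` and `review` as separate callables mirrors the rule that the generator never grades its own work.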
// 3. Reflection — The metrics
Principles without measurement are just opinions. These initial recommended metrics tell you whether the system is improving or degrading. Each implementation will need its own bespoke metrics.
M1
First-Pass Acceptance
Share of deliverables accepted at human verification without another round. The clearest signal of spec quality.
M2
Iteration Depth
Average agent-to-agent iterations per task. Watch the trend — decreasing means sharper specs.
M3
Human Touch Rate
Share of tasks needing human intervention outside the designed verification points. Should decrease over time.
M4
Calibration Impact
The meta-metric: change in the other metrics cycle over cycle. Flat impact = ceremony without learning.
M5
Cycle Time
Spec-to-verified-delivery, end to end. Shrinking cycle time is the clearest signal the system is maturing.
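One possible way to compute these metrics from per-task records is sketched below. The `TaskRecord` fields, the use of hours for cycle time, and the sample numbers are all assumptions for illustration; M4 is modelled here simply as the difference of each metric between two consecutive cycles.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TaskRecord:
    accepted_first_pass: bool    # feeds M1
    agent_iterations: int        # feeds M2
    unplanned_human_touch: bool  # feeds M3
    cycle_hours: float           # feeds M5 (spec to verified delivery)


def cycle_metrics(tasks: List[TaskRecord]) -> Dict[str, float]:
    """Aggregate M1, M2, M3 and M5 for one cycle."""
    if not tasks:
        raise ValueError("need at least one task record")
    n = len(tasks)
    return {
        "first_pass_acceptance": sum(t.accepted_first_pass for t in tasks) / n,
        "iteration_depth": mean(t.agent_iterations for t in tasks),
        "human_touch_rate": sum(t.unplanned_human_touch for t in tasks) / n,
        "cycle_time_hours": mean(t.cycle_hours for t in tasks),
    }


def calibration_impact(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """M4: change in every other metric from one cycle to the next."""
    return {name: curr[name] - prev[name] for name in curr}


# Made-up sample data for two consecutive cycles.
cycle_1 = cycle_metrics([
    TaskRecord(True, 4, True, 10.0),
    TaskRecord(False, 6, False, 14.0),
    TaskRecord(False, 5, False, 12.0),
    TaskRecord(True, 5, False, 12.0),
])
cycle_2 = cycle_metrics([
    TaskRecord(True, 3, False, 8.0),
    TaskRecord(True, 4, False, 8.0),
    TaskRecord(True, 3, False, 10.0),
    TaskRecord(False, 2, False, 6.0),
])
impact = calibration_impact(cycle_1, cycle_2)
print(impact)
```

In this made-up data, first-pass acceptance rises while iteration depth, human touch rate, and cycle time all fall, which is the direction the metric descriptions above treat as improvement. An impact dictionary of all zeros would correspond to the "ceremony without learning" case.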
//One agnostic framework, many different implementations
APEX is instantiated per use case. Same areas, domains, and phases — configured differently for fundamentally different work.
Product development
weekly cycles · autonomous harness
An Architect agent decomposes the PRD; Frontend and Integrator agents build in parallel; a QA agent codifies verified work into tests. Developers verify intent, not CSS.
Content production
daily cycles · autonomous harness
Brief → Research → Writer → skeptical Review agent → iterate → Editorial Lead verifies. The brand voice lives in the writer's identity file; briefs go in Monday morning, verified articles ship by afternoon.
Data & research
daily runs · DAG harness
A fixed, auditable pipeline: three analyst agents in parallel → Correlator → Report Writer → Compliance Checker → human sign-off. Determinism beats recoverability when compliance is on the line.
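The fixed pipeline described for data and research can be sketched as a dependency graph resolved with Python's standard `graphlib` module. The node names (`analyst_a`, `analyst_b`, `analyst_c` and so on) are hypothetical labels for the agents named above, and this is only one way a DAG harness might derive its execution order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each node maps to the set of nodes it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "analyst_a": set(),
    "analyst_b": set(),
    "analyst_c": set(),
    "correlator": {"analyst_a", "analyst_b", "analyst_c"},
    "report_writer": {"correlator"},
    "compliance_checker": {"report_writer"},
    "human_sign_off": {"compliance_checker"},
}


def execution_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into stages; nodes in the same stage can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a reproducible order
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(PIPELINE)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

Because the graph is declared up front and the stage order is derived from it, the same input always yields the same sequence of steps, which is what makes the run easy to audit.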
//The ten principles
01 Harness first. Your runtime choice sets all constraints. Decide it before you configure anything else.
02 Human in control of outcome. Not every step: the result. Design, verify, decide.
03 Quality in = quality out. Output is a direct function of specs, context, and criteria.
04 Agents review agents first. All work passes agent review before a human sees it.
05 Domain-mapped ownership. Quality gates map to expertise, not generic reviews.
06 Iterate often, iterate fast. Don't gate iterations behind human approval when agents can resolve them.
07 Least privilege. Every agent gets only the access it needs. No more.
08 Calibrate the system, not just the output. Repeated fixes mean the system needs to change.
09 Data-driven reflections. Agents report metrics. Humans decide on data, not gut feelings.
10 Think big, scale back. Design the whole system first. Remove what's premature. Keep the architecture.
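Principle 07, together with the permission maps and audit trails of D3, could be expressed as a deny-by-default lookup. The agent names, the `resource:action` permission strings, and the in-memory `AUDIT_LOG` below are hypothetical; a real implementation would depend on the chosen harness.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical permission map: each agent lists only what it needs.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "writer": frozenset({"read:brief", "write:draft"}),
    "reviewer": frozenset({"read:brief", "read:draft", "write:review"}),
}

# Minimal in-memory audit trail of (agent, action, allowed) tuples.
AUDIT_LOG: List[Tuple[str, str, bool]] = []


def is_allowed(agent: str, action: str) -> bool:
    """Deny by default: unknown agents and unlisted actions are refused."""
    allowed = action in PERMISSIONS.get(agent, frozenset())
    AUDIT_LOG.append((agent, action, allowed))  # record every decision
    return allowed


print(is_allowed("writer", "write:draft"))    # listed for the writer
print(is_allowed("writer", "write:review"))   # not listed for the writer
print(is_allowed("intern_bot", "read:brief")) # agent not in the map
```

Every check is logged whether it succeeds or not, so the audit trail shows denied attempts as well as granted ones.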
Go deeper
The complete reference and the three full walkthroughs live in the insights.