FAQ
This page answers the practical questions that come up once the framework clicks, starting with: how do I apply this to my team? The book is written for you, a human reader, first. But it also assumes you will put your own agent to work on it, and it is built so an agent can pick it up and carry the framework straight into your situation.
How do I apply this framework?
Start with something built into the site that is easy to miss. Every page here has a plain-markdown twin (/loops.md, /grading.md, and so on for each page), the whole book is concatenated at /llms-full.txt, and there is a map at /llms.txt. Agents love markdown, and so does everyone else, right? Hand the material to your agent and let it reason about your situation in these terms.
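If you want to script that handoff, the URL convention above can be sketched in a few lines of Python. This is a minimal sketch, not the site's own tooling: `BASE_URL` is a placeholder, the function names are made up for illustration, and the rule that the site root has no single-page twin is my assumption.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Placeholder address (assumption); substitute the site's real base URL.
BASE_URL = "https://example.com/"


def markdown_twin(page_url: str) -> str:
    """Map a page URL to its plain-markdown twin, e.g. /loops -> /loops.md."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        # Assumption: the root has no single twin; start from /llms.txt instead.
        raise ValueError("no markdown twin for the site root")
    if not path.endswith(".md"):
        path += ".md"
    # Drop any query string or fragment; the twin is addressed by path alone.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def agent_entry_points(base_url: str = BASE_URL) -> dict[str, str]:
    """The two whole-book entry points named in the text."""
    return {
        "map": urljoin(base_url, "/llms.txt"),
        "full_book": urljoin(base_url, "/llms-full.txt"),
    }
```

Hand the agent `agent_entry_points()["full_book"]` when you want the whole framework in context, or `markdown_twin(...)` of a single page when you want to keep the context small.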
I think of the book as a context trajectory as much as a text: the same pages you read are also a body of context you can hand to an agent, so that its questions, its diagnoses, the artifacts it drafts, all come back shaped by the framework instead of the statistical middle (slop) of everything ever written about AI and teams. Reading it yourself and putting an agent on it are not in competition. The reading is where your own judgment forms; the agent is how you bring that judgment to bear on your specifics without working through all of it by hand.
What that looks like in practice, roughly in the order the book moves:
- Organizational readiness. Hand the agent Loops and its four strategic questions, then your real context: the artifacts (architecture docs, codebases, a handful of recent pull requests), the work in flight (open stories, your recent agent sessions), and the thing none of those show, how decisions actually get made. Then ask it where your current loop sizes are choices and where they are accidents.
- Loop sizing. Give it a specific task off your board and the seven sizing criteria, and make it argue for a size and a set of gates instead of reaching for “loose” because that feels like progress.
- Harness inventory. Walk it through the harness clusters and the two backpressure layers against what you actually have wired up, so the gaps in product and domain backpressure stop being invisible. Why not let your agent propose extensions for the harness you’re using right now?
- Grading material. Point it at Grading and a slice of your domain, and have it draft the boring artifacts (rubrics, scenarios, and especially counterexamples) a loose loop needs before it can run.
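To make the last item concrete, here is one way a rubric and its counterexamples could be kept as data. Treat it as a sketch under loud assumptions: the `Rubric` class, the two criteria, and the sample strings are all invented for illustration, and the simple string checks stand in for whatever grader (a person, a test, a model) your team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rubric:
    """A written list of what "good" means for one kind of output."""

    # Criterion name -> check that returns True when the output passes.
    criteria: dict[str, Callable[[str], bool]]
    # Plausible-looking but wrong outputs kept on file.
    counterexamples: list[str] = field(default_factory=list)

    def failures(self, output: str) -> list[str]:
        """Names of the criteria this output fails, in rubric order."""
        return [name for name, check in self.criteria.items() if not check(output)]

    def leaks(self) -> list[str]:
        """Counterexamples the rubric wrongly accepts; each is a gap to close."""
        return [bad for bad in self.counterexamples if not self.failures(bad)]


# Hypothetical rubric for an onboarding text (criteria are illustrative only).
onboarding = Rubric(
    criteria={
        "names a real contact": lambda text: "@" in text,
        "mentions the deploy step": lambda text: "deploy" in text.lower(),
    },
    counterexamples=[
        "Welcome aboard! We value collaboration and continuous learning.",
    ],
)
```

The useful call is `leaks()`: if a counterexample passes every criterion, the rubric is too weak to hold a loose loop, and that is worth knowing before the loop runs, not after.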
One caveat, because the framework turns on it: an agent applying this to your org is itself a loop, and its read of your situation is a starting grip, not truth, in the way a rubric is not automatically truth. It will sound confident about a trust problem it cannot actually see, or wave through a loop size your blast radius does not support. The judgment about whether its diagnosis is any good is still yours, and supplying that judgment is the whole skill this framework is about. So treat the output as a strong first draft to argue with, rather than a verdict to adopt.
Do I need to be an engineer to apply it?
No, and treating the framework as an engineering-only topic would make me half-sad. Half of it is judgment about results rather than code: whether the thing was worth building, which of several variants is the right one, whether the output is the real thing or the plausible-generic version of it. Roles is the long version. The short version: product people, designers, and domain experts supply resistance that no compiler and no test suite can produce, and they supply it across the whole loop, not just at the review at the end.
Isn’t this just spec-driven development?
No, though the two get confused right now, because the industry is mid-swing back toward writing everything down before the agent moves. A spec is useful when it structures the search space. Let the spec replace that space, though, and you’ve turned the agent into a typist and paid for a search you never ran. Grading sorts out where specs help and where they hurt, and why it’s important that you move from the spec you write up front to the grading material you keep iterating on. The spec alone is just context.
Is Scrum dead?
Yes, but it matters which part dies. Scrum is risk containment, and the two-week sprint bundles two different risks under one cadence. The first is building the thing wrong: the wrong implementation, the wrong technical path. That was expensive to undo when a person typed it slowly by hand, so capping the damage at two weeks was a sound economic bet. Agents collapse that bet, because redoing the implementation is cheap now.
The second risk is building the wrong thing entirely: the wrong feature, a wrong read of what the work was even for. The sprint review caught that one through feedback from stakeholders and the product owner. It never had anything to do with typing speed. Agents arguably make it worse, because building the wrong thing fast is how you end up with sprawl and slop. So this half of the bet does not collapse. It gets sharper, and the check Scrum ran every two weeks now has to run closer to continuously.
What dies, then, is the calendar: the fixed ceremony that paced both risks at one rhythm. The judgment the review carried survives. It just moves into Grading and product backpressure instead of a recurring slot. Loops lets you set that cadence per task, where the sprint used to set one rhythm for everything at once. Loop sizing is a per-task dial, so it answers the cadence question. It does not answer the separate question of how to coordinate several people who each drive their own agents, which Scrum also handled.
In a sense, agentic engineering is the biggest lever yet for finally working in a truly agile way. Iteration is all we need.
We are debating which model to standardize on. Where does that fit?
Lower on the list. The measurement on Why and Harness (clawbench) finds that the setup around the model, the harness, moves the result about ten times more than the model choice does. Pick a capable frontier model, then put the argument where the leverage sits: the loop, the backpressure, and the grading around it.
Should our harnesses be standardized, or custom to each use case?
Both, but split along the right seam. The harness itself has to be specific. It encodes this codebase, this domain’s rules, these graders. A one-size harness handed to every team is the statistical middle in tooling form, the same slop you are working to keep out of the output. There is no generic harness worth having, for the same reason there is no generic product worth shipping.
What standardizes is the layer underneath, the disciplines rather than the device: how a team builds a harness and keeps it from rotting, the backpressure clusters it works through, the grading vocabulary, the readiness gate before a task goes loose. A hospital is the closest picture. Every patient gets a custom course of treatment, and nobody franchises that. What is standard is the process around it, the checklists, the sterile technique, the handover discipline. You standardize the process that produces safe variation, not the variation itself.
In between sits a parts catalog: shared skills, templates, reviewer subagents, grading sets other teams can lift and adapt. Treat those as building blocks to compose, never a finished machine to install. The finished machine is always local.
Why is the book so short?
Because tactics age in months and strategy doesn’t. Anything I could pin down about today’s exact tools, the flags, the model names, the way one harness wires up its hooks, would be wrong by the time you read it, and the book is meant to outlive that. So it stays at the level that holds (probably): how to size a loop, what context buys you, how to codify judgment. The book lives and grows, but it grows along that line.
There is a second reason, and it is the framework turned on itself. The book is context, and context you hand to an agent costs something. A bloated, tactical book would clog your own context window and your agent’s, and most of what clogged it would be the statistical middle anyway. I would rather it ground you: a small, dense body of context that shapes the questions your agent asks and the diagnoses it offers. Your agent can synthesize your specifics on its own, once the framework has it in its grip. The book does not need to carry them.
Why is this book a website?
Because the good old web still serves more devices, and more agents, than any other format. A PDF or an EPUB freezes the moment it ships; this thing keeps living, and the web is the only place where an update reaches you without a re-download. Every page here already comes with a plain-markdown twin, and the whole book is concatenated at /llms-full.txt, so your agent reads it as easily as you do.
Everyone is using their own agent now to bend concepts onto their own situation, their role, what they already know. A book fights that. Who wants to copy-and-paste their way through context-managing a book by hand, one that keeps changing under them? Hand an agent a URL and that problem disappears: you point it at the page, and it does the bending for you.