Helping software build itself

The Elastic Loop

A framework for everyone delegating work to machines.

Illustration: three figures pulling a tangle of tight, elastic and loose loops.

The Elastic Loop is a working model for delegating work, specifically building software products, to AI agents without losing track of it: when to keep the loop tight, when to let it stretch, and what has to be true before you let go. It is for engineers, but just as much for the product people, the designers, the domain experts, the people who shape how a team works, and the leaders deciding where the money goes. The breadth is the point. Once an agent does the building, two judgments decide the outcome: what is worth building, and whether what came back is any good. Both are spread across every one of those roles, not concentrated in whoever sits at the keyboard.

How far can you let go?

Chances are you have done this today: you asked an AI for something, read the answer, winced, rephrased, read again. That round trip is the loop, and it is the unit everything here is built from. You put in intent, the agent does some work, you check what came back, and what you learned shapes the next ask. The size of a loop is how much work happens between two of your looks: a sentence, a feature, a week of output.
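The round trip above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any real agent API: `agent` and `check` are hypothetical callables standing in for the model and for your own look at the result, and the toy versions exist purely so the sketch runs.

```python
from typing import Callable, Optional


def run_loop(
    intent: str,
    agent: Callable[[str], str],            # hypothetical: turns an ask into output
    check: Callable[[str], Optional[str]],  # hypothetical: None if good, else feedback
    max_turns: int = 5,
) -> tuple[str, int]:
    """Repeat ask -> work -> check until the check passes or turns run out."""
    ask = intent
    output = ""
    for turn in range(1, max_turns + 1):
        output = agent(ask)
        feedback = check(output)
        if feedback is None:
            return output, turn
        # What you learned shapes the next ask.
        ask = f"{intent}\nFeedback on last attempt: {feedback}"
    return output, max_turns


# Toy stand-ins so the sketch runs without any model behind it.
def toy_agent(ask: str) -> str:
    return "short summary" if "too long" in ask else "a very long rambling summary"


def toy_check(output: str) -> Optional[str]:
    return None if len(output) < 20 else "too long"


result, turns = run_loop("Summarize the release notes", toy_agent, toy_check)
print(result, turns)
```

In this sketch the size of the loop is whatever `agent` does in one call before `check` runs: a sentence, a feature, or a week of output.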

The version you ran this morning is the tightest one. Every turn happens under your eyes and every result is checked before it counts, and that mode works well. The question this framework keeps circling is how far the loop can stretch beyond it: first elastic, where you hand over a bounded chunk of work and check in at agreed points, then loose, where the agent works for hours or days on its own while you do something else. And with it comes the second question: what has to be true before you can let go that far?

The whole framework in one sentence

Intent opens the loop. Context grounds it. Backpressure keeps it useful. Verification closes it.

Opens. Somebody decides to delegate and states what good would actually look like.

Grounds. Context is what lets the agent stop guessing and start working from your specifics: your codebase, your customers, your constraints.

Keeps it useful. Backpressure is the word that stuck in agentic engineering for resistance: anything an agent’s output has to survive before it counts (Geoffrey Huntley and Moss Banay did much of the early work on it). A failing test is backpressure, and so is a designer’s critique or an acceptance scenario from someone who knows the domain. Plenty of the resistance that matters never touches a compiler. Engineers will know the word from streams and TCP, where it means downstream throttling upstream; this is a borrowing of the physics, not the protocol.

Closes. The outcome gets checked, and what the loop learned survives it. How expensive that check is for the human decides how many loops you can run at once; more on that in Harness.
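One way to picture backpressure is as a list of checks that output has to survive before it counts. The sketch below assumes a hypothetical `Check` type and two toy predicates; the `kind` labels and the predicates are illustrative stand-ins, not part of any real tool. The point it makes is that a human critique sits in the same list as a test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    kind: str                      # "technical" or "product" (assumed labels)
    passes: Callable[[str], bool]  # hypothetical predicate over the output


def survives(output: str, checks: list[Check]) -> tuple[bool, list[str]]:
    """Output counts only if it survives every check; also return what failed."""
    failed = [c.name for c in checks if not c.passes(output)]
    return (not failed, failed)


# Toy predicates: real ones would run a test suite or record a reviewer's verdict.
checks = [
    Check("unit tests", "technical", lambda out: "def " in out),
    Check("designer critique", "product", lambda out: "TODO" not in out),
]

ok, failed = survives("def price(): ...  # TODO rounding", checks)
print(ok, failed)
```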

Where does your next task belong?

Two questions decide where a task sits: how much line does the agent get, and what holds the output other than you? Read the grid on those two axes. Each step up adds one more dam between the agent and what ships.

| | Tight: every turn supervised · minutes | Elastic: checkpoints at agreed points · hours | Loose: days of autonomy · outcomes only |
| --- | --- | --- | --- |
| Full backpressure: technical + product / domain | High-risk work, research, regulated change | A delegated feature with product backpressure | The dark factory, done right: agents delivering asynchronously against automated checks, humans reviewing outcomes |
| Technical only: tests, CI, linters | Pair coding with tests, linting etc. | Delegated implementation, system-checked but slop-exposed | Slop at scale: looks like productivity while the output drifts generic |
| No automated checks: you hold the line | You are the dam, reading every turn | Output piles up between check-ins, nobody on the volume | Sprawl at scale: output multiplying faster than anyone can contain it |

Context floor: Loose only opens once the context can serve itself.
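The grid can also be read as a lookup table. The sketch below copies the cell labels from the grid; the key names (`"full"`, `"technical"`, `"none"`), the `place` function and the way the context floor is applied are assumptions made for illustration.

```python
# Cells copied from the grid; keys are (backpressure row, loop column).
GRID = {
    ("full", "tight"): "High-risk work, research, regulated change",
    ("full", "elastic"): "A delegated feature with product backpressure",
    ("full", "loose"): "The dark factory, done right",
    ("technical", "tight"): "Pair coding with tests, linting etc.",
    ("technical", "elastic"): "Delegated implementation, system-checked but slop-exposed",
    ("technical", "loose"): "Slop at scale",
    ("none", "tight"): "You are the dam, reading every turn",
    ("none", "elastic"): "Output piles up between check-ins, nobody on the volume",
    ("none", "loose"): "Sprawl at scale",
}


def place(backpressure: str, loop: str, context_serves_itself: bool = False) -> str:
    """Return the grid cell for a task, applying the context floor to the loose column."""
    if loop == "loose" and not context_serves_itself:
        # Assumed reading of the context floor: loose is not available yet.
        return "Below the context floor: loose is closed until the context can serve itself"
    return GRID[(backpressure, loop)]


print(place("technical", "elastic"))
print(place("full", "loose"))
print(place("full", "loose", context_serves_itself=True))
```

The two questions from the text map onto the two arguments: `loop` is how much line the agent gets, `backpressure` is what holds the output other than you.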

What is context?

A longer leash does not call for more context. The amount a task needs is fixed. What changes is how much of it the agent has to reach on its own, without you handing it over turn by turn from your own head.

So what is this context, concretely? Everything the agent would need to stop guessing: the decision history that evaporated in a chat thread three months ago, the business rules that live only in your most experienced colleague’s head, the design system, API docs that describe what the system does today rather than two years ago, who the customer is and what they have already tried. Most of this exists in any team that has shipped something. It just rarely exists in a form an agent can read, because heads, hallway conversations, and meeting recordings nobody rewatches do not count.

Freshness is part of the bar, and on the right side of the grid it is the sharp part. In a tight loop a stale document costs you one correction, because you watch the wrong turn happen. In a loose loop a stale document is worse than none, because the agent follows it with full conviction and nobody contradicts it for hours.
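A freshness bar that tightens as the loop loosens could look like the sketch below. The age limits, file names and dates are invented for illustration; the framework itself does not prescribe numbers.

```python
from datetime import date, timedelta

# Assumed thresholds, for illustration only: the looser the loop, the fresher
# the context has to be, because nobody is watching to correct a wrong turn.
MAX_AGE = {
    "tight": timedelta(days=365),
    "elastic": timedelta(days=90),
    "loose": timedelta(days=30),
}


def stale_docs(docs: dict[str, date], zone: str, today: date) -> list[str]:
    """Return the documents last reviewed longer ago than the zone allows."""
    limit = MAX_AGE[zone]
    return sorted(name for name, reviewed in docs.items() if today - reviewed > limit)


# Hypothetical context documents with their last review dates.
docs = {
    "api.md": date(2024, 1, 10),
    "design-system.md": date(2024, 5, 20),
}
today = date(2024, 6, 1)

print(stale_docs(docs, "tight", today))
print(stale_docs(docs, "loose", today))
```

The same document passes in a tight loop and blocks a loose one, which is the asymmetry the paragraph above describes.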

Moving that knowledge into a state where it stays available and current without a human pumping it in has become a discipline of its own, usually filed under context engineering. It is slower work than adding another check, which is why the floor sits where it does.

Two ways this goes wrong

Sprawl is the explosion. Slop is the slow collision. Both are what happens when a search process runs without backpressure, and only one of them announces itself.

A search process, because that is what an agentic loop is, essentially: it explores a space of possible solutions and looks for one or many variants that fit your intent (see Why).

Sprawl

Output without containment: variant inflation, refactoring PRs nobody asked for, backlogs gathering mold. It happens far from any repo too. Picture the team that generates thirty landing-page variants in an afternoon while nobody is assigned to review even one. The good news is that sprawl announces itself. You can see it in CI logs and PR volume and pull people back. Think of a pressure reactor: the agents generate pressure, and the containment wall is what makes a productive reaction possible in the first place.

Slop

The stock-photo feeling: output that looks plausible, reads smoothly, and could have come from any team prompting any model on any given day. The everyday version is the onboarding text that sounds like every onboarding text ever written. Slop does not announce itself. The damage shows up quarters later, in the market, in usage signals, in the competitor who built the thing properly. It gets squeezed from both sides: context up front keeps the agent from starting in the statistical middle of everything it has ever read, and product backpressure at the back pulls the output out of it.

One honest caveat, because sprawl has two natures and the grid only catches one. The kind that comes from missing checks (output that does not fit the system, broken refactors, architectural drift) is what the rows measure, so it shows up as a cell. The kind that comes from missing closure (output that passes every check but that nobody is assigned to read, like those thirty landing pages) can happen anywhere, even under full backpressure, because depth of checking is not the same as having someone there to close the loop. That second kind rides on closure, which I keep off the axes on purpose, since it applies to every cell equally.

Figure: The squeeze. Context positions the start outside the middle before any work happens; backpressure pulls the output the rest of the way in the same direction.

Labels in the figure: ungrounded start; grounded start; the statistical middle (plausible, generic, everyone’s output); your specific solution. Context positions the start: before any work happens, grounding places the agent at your specifics, outside the middle. Product & domain backpressure holds it there and pulls the output the rest of the way toward your product.

Three images I keep reaching for

The truffle

Delegation runs on taste: qualitative judgment built through sensory contact with the work, the kind you get from reading diffs, shipping mistakes, and watching real users get stuck. I reach for this one with engineers who review every turn because letting go feels like losing the craft. Tight is a legitimate mode, just not the only one. And I reach for it with everyone who judges output without building it: designers, domain experts, anyone whose “this is off” is worth more than a passing test suite. The talk this framework grew out of, How Does Truffle Taste?, was built around it.

The mirror

AI amplifies and reflects whatever organizational structure it lands in, including the parts you would rather it didn’t. This one is for leadership teams shopping for tools when the question really sits in the operating model, and for the people who own how the team works, scrum masters and coaches included, because the loop will mirror their rituals right back at them.

The pressure reactor

Backpressure is containment, and containment is what allows a productive reaction at all. This is the image for architecture and harness debates, where “guardrails” sounds like bureaucracy until you see the wall as the thing that lets you run the reactor hot.

Where to start

If you are new to all of this, start with Loops; if you never write code, Roles is your door in.

01 Loops

Tight, elastic, loose: three zones and how to size the loop for the task in front of you. None of the zones outranks the others, each just comes with different preconditions.

02 Why

Why stretch a loop past tight at all, and is the risk worth it? The payoff comes down to how well you can judge what comes back. The economic and technical case, now with measurement behind it.

03 Harness

The backpressure layers in full: what holds agent output honest against the system, and what holds it honest against the product.

04 Grading

Outcome grading is the new specification. Tests, rubrics, scenarios, golden examples of known-good output, and why a rubric is not automatically truth.

05 Roles

Every role carries judgment about agent work that nobody else can supply. What engineers, product people, designers, domain experts, the people who run the process, and leaders each bring to the loop.

Even machines.

The quiet part: agents have started applying this loop to themselves. Cursor’s First Proof and OpenAI’s harness engineering arrived at the same shape independently: decompose, parallelize, verify, iterate. Anthropic’s Fable model shows strong signs of being post-trained to put itself in reinforced loops, and the dynamic workflows feature builds harnessed loops on the fly for the task at hand. I am keeping this as a coda rather than a headline, because it is the most interesting thread here and the one I am least sure about.