The Elastic Loop
Helping software build itself

A framework for everyone delegating work to machines.

[Figure: three figures pulling a tangle of tight, elastic and loose loops]

The Elastic Loop is a working model for delegating work, specifically building software products, to AI agents without losing track of it: when to keep the loop tight, when to let it stretch, and what has to be true before you let go. It is for engineers, but just as much for the product people, the designers, the domain experts, the people who shape how a team works, and the leaders deciding where the money goes. The breadth is the point. Once an agent does the building, two judgments decide the outcome: what is worth building, and whether what came back is any good. Both are spread across every one of those roles, not concentrated in whoever sits at the keyboard.

How far can you let go?

Chances are you have done this today: you asked an AI for something, read the answer, winced, rephrased, read again. That round trip is the loop, and it is the unit everything here is built from. You put in intent, the agent does some work, you check what came back, and what you learned shapes the next ask. The size of a loop is how much work happens between two of your looks: a sentence, a feature, a week of output.

The version you ran this morning is the tightest one: every turn under your eyes, every result checked before it counts. That mode works well. The question this framework keeps circling is how far the loop can stretch beyond it: first elastic, where you hand over a bounded chunk of work and check in at agreed points, then loose, where the agent works for hours or days on its own while you do something else. The companion question is what has to be true before you can let go that far.

The whole framework in one sentence

Intent opens the loop. Context grounds it. Backpressure keeps it useful. Verification closes it.

Opens. Somebody decides to delegate and states what good would actually look like.

Grounds. Context is what lets the agent stop guessing and start working from your specifics: your codebase, your customers, your constraints.

Keeps it useful. Backpressure is the word that stuck in agentic engineering for resistance: anything an agent’s output has to survive before it counts (Geoffrey Huntley and Moss Banay did much of the early work on it). A failing test is backpressure, and so is a designer’s critique or an acceptance scenario from someone who knows the domain. Plenty of the resistance that matters never touches a compiler. Engineers will know the word from streams and TCP, where it means downstream throttling upstream; this is a borrowing of the physics, not the protocol.

Backpressure: The resistance an agent works against while it builds, well before any review at the end: a failing test or type error on the technical side, an acceptance scenario or rubric on the product side. Some of it reaches the agent automatically as a signal in the loop; some it imposes on itself by following a discipline set at the start, like writing the failing test first and working until it goes green. The more of it you can encode, the longer you can let the loop run. A red build is backpressure; the agent reads it and fixes the code. So are acceptance criteria: write them well and the agent works against your definition of good as it goes, instead of a person catching the miss at the end.

Closes. The outcome gets checked, and what the loop learned survives it. How expensive that check is for the human decides how many loops you can run at once; more on that in Harness.
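The four moves above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation of any real agent API: `run_loop`, `LoopResult`, the `agent` callable, and the shape of the checks are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical types: a check returns a complaint, or None when satisfied.
Check = Callable[[str], Optional[str]]
Agent = Callable[[str, str, list[str]], str]


@dataclass
class LoopResult:
    output: str
    turns: int
    accepted: bool
    lessons: list[str] = field(default_factory=list)


def run_loop(intent: str, context: str, agent: Agent, checks: list[Check],
             verify: Callable[[str], bool], max_turns: int = 5) -> LoopResult:
    """Intent opens, context grounds, checks push back, verify closes."""
    feedback: list[str] = []
    lessons: list[str] = []
    output = ""
    for turn in range(1, max_turns + 1):
        output = agent(intent, context, feedback)
        # Backpressure: every complaint goes back to the agent as a signal.
        feedback = [msg for check in checks if (msg := check(output)) is not None]
        if not feedback:
            # Verification: the outcome is checked before it counts.
            return LoopResult(output, turn, verify(output), lessons)
        lessons.extend(feedback)  # what the loop learned survives it
    return LoopResult(output, max_turns, False, lessons)


# Toy stand-ins so the sketch runs without any model behind it.
def toy_agent(intent: str, context: str, feedback: list[str]) -> str:
    return "draft with tests" if feedback else "draft"


def needs_tests(output: str) -> Optional[str]:
    return None if "tests" in output else "no tests found"


result = run_loop("add export", "repo notes", toy_agent, [needs_tests], lambda out: True)
```

The design choice worth noticing is that the checks run inside the loop and `verify` runs once at the end. The more the checks can catch on their own, the fewer turns need a person.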

Where does your next task belong?

Two questions decide where a task sits: how much line does the agent get, and what holds the output other than you? Read the grid on those two axes. Each step up adds one more dam between the agent and what ships.

Columns (how much line the agent gets)
Tight: every turn supervised · minutes
Elastic: checkpoints at agreed points · hours
Loose: days of autonomy · outcomes only

Full backpressure (technical + product / domain)
Tight: High-risk work, research, regulated change
Elastic: A delegated feature with product backpressure
Loose: The dark factory, done right: agents delivering asynchronously against automated checks, humans reviewing outcomes

Technical only (tests, CI, linters)
Tight: Pair coding with tests, linting etc.
Elastic: Delegated implementation, system-checked but slop-exposed
Loose: Slop at scale: looks like productivity while the output drifts generic

No automated checks (you hold the line)
Tight: You are the dam, reading every turn
Elastic: Output piles up between check-ins, nobody on the volume
Loose: Sprawl at scale: output multiplying faster than anyone can contain it

Context floor: Loose only opens once the context can serve itself
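Read as data, the grid is a lookup on two axes with one precondition on the loose column. The sketch below is one possible encoding, not part of the framework: the enum names, `place_task`, and the choice to raise an error when the context floor is not met are all assumptions made for the example.

```python
from enum import Enum


class Leash(Enum):
    TIGHT = "tight"      # every turn supervised, minutes
    ELASTIC = "elastic"  # checkpoints at agreed points, hours
    LOOSE = "loose"      # days of autonomy, outcomes only


class Backpressure(Enum):
    NONE = "no automated checks"
    TECHNICAL = "technical only"
    FULL = "technical + product / domain"


# Cell texts follow the grid; the dictionary layout is an assumption.
GRID = {
    (Backpressure.FULL, Leash.TIGHT): "high-risk work, research, regulated change",
    (Backpressure.FULL, Leash.ELASTIC): "a delegated feature with product backpressure",
    (Backpressure.FULL, Leash.LOOSE): "the dark factory, done right",
    (Backpressure.TECHNICAL, Leash.TIGHT): "pair coding with tests, linting etc.",
    (Backpressure.TECHNICAL, Leash.ELASTIC): "delegated implementation, system-checked but slop-exposed",
    (Backpressure.TECHNICAL, Leash.LOOSE): "slop at scale",
    (Backpressure.NONE, Leash.TIGHT): "you are the dam, reading every turn",
    (Backpressure.NONE, Leash.ELASTIC): "output piles up between check-ins",
    (Backpressure.NONE, Leash.LOOSE): "sprawl at scale",
}


def place_task(leash: Leash, backpressure: Backpressure, context_self_serve: bool) -> str:
    """Return the grid cell for a task, enforcing the context floor on loose loops."""
    if leash is Leash.LOOSE and not context_self_serve:
        raise ValueError("context floor: loose only opens once the context can serve itself")
    return GRID[(backpressure, leash)]


cell = place_task(Leash.ELASTIC, Backpressure.TECHNICAL, context_self_serve=False)
```

A lookup like this will not decide anything for you. Its use is that it forces both questions to be answered explicitly before the task is handed over.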

What is context?

A longer leash does not call for more context. The amount a task needs is fixed. What changes is how much of it the agent has to reach on its own, without you handing it over turn by turn from your own head.

So what is this context, concretely? Everything the agent would need to stop guessing: the decision history that evaporated in a chat thread three months ago, the business rules that live only in your most experienced colleague’s head, the design system, API docs that describe what the system does today rather than two years ago, who the customer is and what they have already tried. Most of this exists in any team that has shipped something. It just rarely exists in a form an agent can read, because heads, hallway conversations, and meeting recordings nobody rewatches do not count.

Freshness is part of the bar, and on the right side of the grid it is the sharp part. In a tight loop a stale document costs you one correction, because you watch the wrong turn happen. In a loose loop a stale document is worse than none, because the agent follows it with full conviction and nobody contradicts it for hours.
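One way to make that freshness bar concrete is to let the allowed age of a document shrink as the loop loosens. The sketch below assumes each document carries a `last_reviewed` date. That field, the `ContextDoc` type, and the thresholds in `MAX_AGE` are placeholders invented for illustration, since the text gives no numbers.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical thresholds: stricter as the loop gets looser.
MAX_AGE = {
    "tight": timedelta(days=365),
    "elastic": timedelta(days=90),
    "loose": timedelta(days=30),
}


@dataclass
class ContextDoc:
    name: str
    last_reviewed: date


def split_by_freshness(docs: list[ContextDoc], loop: str,
                       today: date) -> tuple[list[str], list[str]]:
    """Return (fresh, stale) document names for the given loop size."""
    limit = MAX_AGE[loop]
    fresh = [d.name for d in docs if today - d.last_reviewed <= limit]
    stale = [d.name for d in docs if today - d.last_reviewed > limit]
    return fresh, stale


docs = [
    ContextDoc("api-docs", date(2025, 5, 20)),
    ContextDoc("business-rules", date(2024, 12, 1)),
]
today = date(2025, 6, 1)

tight_fresh, tight_stale = split_by_freshness(docs, "tight", today)
loose_fresh, loose_stale = split_by_freshness(docs, "loose", today)
```

With these placeholder thresholds the same two documents pass for a tight loop, while the older one is held back from a loose loop, where nobody would be watching to contradict it.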

Moving that knowledge into a state where it stays available and current without a human pumping it in has become a discipline of its own, usually filed under context engineering. It is slower work than adding another check, which is why the floor sits where it does.

Two ways this goes wrong

Sprawl is the explosion. Slop is the slow collision. Both are what happens when a search process runs without backpressure, and only one of them announces itself.

A search process, because that is what an agentic loop is, essentially: it explores a space of possible solutions and looks for one or many variants that fit your intent (see Why).

Search space: The set of possible solutions a task allows. An agent does not translate a spec line by line; it samples from this space, pulled toward one solution by your context, or proposing a few in a planning step. Generating many on purpose and selecting the best is a discipline you opt into. There are many valid ways to implement a story. The agent is choosing among them, not transcribing the one right answer.

Sprawl

Output without containment: variant inflation, refactoring PRs nobody asked for, backlogs gathering mold. It happens far from any repo too. Picture the team that generates thirty landing-page variants in an afternoon while nobody is assigned to review even one. The good news is that sprawl announces itself. You can see it in CI logs and PR volume and pull people back. Think of a pressure reactor: the agents generate pressure, and the containment wall is what makes a productive reaction possible in the first place.

Slop

The stock-photo feeling: output that looks plausible, reads smoothly, and could have come from any team prompting any model on any given day. The everyday version is the onboarding text that sounds like every onboarding text ever written. Slop does not announce itself. The damage shows up quarters later, in the market, in usage signals, in the competitor who built the thing properly. It gets squeezed from both sides: context up front keeps the agent from starting in the statistical middle of everything it has ever read, and product backpressure at the back pulls the output out of it.

The statistical middle (slop): Output that converges on the bland average of everything the model has ever read: plausible, smooth, and indistinguishable from anyone else’s. The onboarding text that reads like every onboarding text ever written.

One honest caveat, because sprawl has two natures and the grid only catches one. The kind that comes from missing checks, output that does not fit the system, broken refactors, architectural drift, is what the rows measure, so it shows up as a cell. The kind that comes from missing closure, output that passes every check but that nobody is assigned to read, those thirty landing pages, can happen anywhere, even under full backpressure, because depth of checking is not the same as having someone there to shut the loop. That second kind rides on closure, which I keep off the axes on purpose, since it applies to every cell equally.
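The two natures of sprawl can be told apart mechanically if each piece of output records whether it passed its checks and whether anyone is assigned to read it. This is a minimal sketch under that assumption. `Artifact`, its fields, and `sprawl_report` are hypothetical names, not something the framework prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    name: str
    checks_passed: bool
    reviewer: Optional[str] = None  # who is assigned to close the loop, if anyone


def sprawl_report(artifacts: list[Artifact]) -> dict[str, list[str]]:
    """Separate sprawl from missing checks from sprawl from missing closure."""
    return {
        # The kind the grid rows measure: output that fails the system's checks.
        "missing_checks": [a.name for a in artifacts if not a.checks_passed],
        # The kind that can happen in any cell: green output nobody will read.
        "missing_closure": [
            a.name for a in artifacts if a.checks_passed and a.reviewer is None
        ],
    }


artifacts = [
    Artifact("refactor-pr", checks_passed=False, reviewer="dana"),
    Artifact("landing-page-07", checks_passed=True),
    Artifact("export-feature", checks_passed=True, reviewer="sam"),
]
report = sprawl_report(artifacts)
```

The second list is the one that stays invisible in CI, because everything in it is green.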

[Figure: The squeeze. Context positions the start outside the middle before any work happens; backpressure pulls the output the rest of the way in the same direction.]
Context positions the start: before any work happens, grounding places the agent at your specifics, outside the middle.
Product & domain backpressure holds it there and pulls the output the rest of the way toward your product.
Labels in the diagram: ungrounded start, grounded start, your specific solution, and the statistical middle (plausible, generic, everyone’s output).

Three images I keep reaching for

The truffle

Delegation runs on taste: qualitative judgment built through sensory contact with the work, the kind you get from reading diffs, shipping mistakes, and watching real users get stuck. I reach for this one with engineers who review every turn because letting go feels like losing the craft. Tight is a legitimate mode, just not the only one. And I reach for it with everyone who judges output without building it: designers, domain experts, anyone whose “this is off” is worth more than a passing test suite. The talk this framework grew out of, How Does Truffle Taste?, was built around it.

The mirror

AI amplifies and reflects whatever organizational structure it lands in, including the parts you would rather it didn’t. This one is for leadership teams shopping for tools when the question really sits in the operating model, and for the people who own how the team works, scrum masters and coaches included, because the loop will mirror their rituals right back at them.

The pressure reactor

Backpressure is containment, and containment is what allows a productive reaction at all. This is the image for architecture and harness debates, where “guardrails” sounds like bureaucracy until you see the wall as the thing that lets you run the reactor hot.

Where to start

If you are new to all of this, start with Loops; if you never write code, Roles is your door in.

Even machines.

The quiet part: agents have started applying this loop to themselves. Cursor’s First Proof and OpenAI’s harness engineering arrived at the same shape independently: decompose, parallelize, verify, iterate. Anthropic’s Fable model shows strong signs of being post-trained to put itself in reinforced loops, and the dynamic workflows feature builds harnessed loops on the fly for the task at hand. I am keeping this as a coda rather than a headline, because it is the most interesting thread here and the one I am least sure about.