Practical loop design guide
Updated Jul 30, 2026
By Editorial Automation Desk

What Is Loop Engineering for AI Coding Agents?

Learn loop engineering for AI coding agents through Addy Osmani's workflow context, a budget calculator, stop rules, and a downloadable proof-of-done JSON contract.

Best for: Developers, staff engineers, and platform teams adopting agentic coding workflows in Cursor, Claude Code, Codex, or custom CI agents.
Use this guide to: move from prompting agents task-by-task to designing durable loops that plan, change code, verify results, and stop safely.

Quick Answer

Loop engineering for AI coding agents means designing the repeatable control loop around the agent, not writing one better prompt. Addy Osmani's loop engineering approach is useful context, but the practical repo loop is Plan → Act → Observe → Verify → Stop, a stricter version of the act → observe → reason cycle. Give the agent a bounded goal, inspect tests or diffs, retry only with a changed strategy, and stop on token or cost caps, repeated failure, risky permission changes, or a required human checkpoint.

Workflow

Plan, act, observe, verify, retry or stop

Use this as the first loop design before adding schedules, subagents, or cloud runners.

Plan: Define the repository task, allowed files, verification command, budget, and approval boundary.
Act: Let the agent make the smallest useful change, create the artifact, or run the selected tool.
Observe: Read tests, logs, diffs, screenshots, telemetry, or tool output before choosing the next action.
Verify: Run the named check and compare the result with the done signal rather than relying on a summary.
Retry or stop: Retry with a changed strategy, escalate to a human, or stop when success, risk, repetition, or budget says so.

Pattern catalog

Loop engineering pattern matrix

Checked against practitioner and vendor guidance on July 30, 2026.

Pattern	Best for	Termination	Tool examples	Risk to control
Plan → execute → verify	Contained repo tasks with a clear success command such as tests, type checks, or builds	Stop when verification passes or the iteration cap is reached	Claude Code agent sessions, Cursor Agent mode, Codex with AGENTS.md verification rules	Unbounded retries that churn files without reading failing output
Retry with capped iterations	Flaky commands, migration steps, or setup tasks where one more attempt often succeeds	Hard cap on attempts per item, then escalate to a human	Claude Code /loop on a schedule, Cursor Automations with explicit max-run notes	Burning tokens on the same bad assumption instead of changing strategy
Evaluator → optimizer	Work with clear quality criteria: reviews, docs, test plans, or refactor proposals	Stop when the evaluator accepts the output or improvement stalls	Claude Code subagents (maker + checker), Codex subagents in .codex/agents/	Two agents agreeing on a polished but wrong answer without ground-truth checks
Explore → narrow → implement	Unfamiliar codebases, incident triage, or tasks where the first file guess is often wrong	Stop when the target files and change scope are identified, then switch to bounded edits	Read-only subagents, Ask mode, then Agent mode on a scoped branch	Endless exploration with no handoff to a bounded implementation loop
Scheduled wake-up loop	Recurring hygiene: dependency alerts, nightly test triage, changelog scans, backlog grooming	Each run ends with a summary, ticket, or no-op; the schedule does not imply infinite in-run retries	Claude Code /loop and cron, Cursor Automations, GitHub Actions agent jobs	Treating a cron job as a loop without in-run observation and stop rules
Human-in-the-loop checkpoint	Production changes, permission widening, schema migrations, or destructive operations	Pause until a named human approves, rejects, or narrows scope	Hooks before deploy, MCP approval gates, PR-required cloud agent output	Automating past the checkpoint because the loop 'almost' finished
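
The evaluator → optimizer row stops when the evaluator accepts the output or when improvement stalls. A minimal sketch of that termination rule follows. It assumes two hypothetical callables, `optimize` (the maker) and `evaluate` (the checker), plus a numeric score; the thresholds `accept_at` and `min_gain` and the round cap are illustrative values, not figures from any vendor.

```python
from typing import Callable, Tuple


def evaluator_optimizer(
    draft: str,
    optimize: Callable[[str, str], str],           # hypothetical maker: (draft, feedback) -> new draft
    evaluate: Callable[[str], Tuple[float, str]],  # hypothetical checker: draft -> (score, feedback)
    accept_at: float = 0.9,
    min_gain: float = 0.01,
    max_rounds: int = 5,
) -> Tuple[str, str]:
    """Return (final_draft, stop_reason)."""
    score, feedback = evaluate(draft)
    for _ in range(max_rounds):
        if score >= accept_at:
            return draft, "accepted"
        candidate = optimize(draft, feedback)
        new_score, new_feedback = evaluate(candidate)
        # Improvement stalled: keep the better draft and stop paying for rounds.
        if new_score - score < min_gain:
            return draft, "stalled"
        draft, score, feedback = candidate, new_score, new_feedback
    return draft, "accepted" if score >= accept_at else "round-cap"
```

To control the risk named in the table, the `evaluate` callable should lean on ground truth such as test or lint output rather than on a second model's opinion alone.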

Browser-based planning tool

Agent Loop Budget Calculator

Estimate the hard upper bound before an agent loop starts. No values leave your browser.

Maximum planned exposure

Tokens: 125,000
Tool calls: 40
Estimated token cost: $1.25
Risk level: Low

Recommended stop rule: Stop on success, the iteration cap, a repeated failure, or any request for wider permissions.

This is a planning ceiling, not a bill estimate. Cached tokens, output pricing, tool fees, retries outside the loop, and vendor-specific billing can change actual cost.
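
The page does not state the calculator's formula or inputs, so the sketch below is only one plausible way to compute a planning ceiling. The class `LoopBudget`, its field names, and the sample inputs (5 iterations, 25,000 tokens and 8 tool calls per iteration, 10 USD per million tokens) are assumptions chosen because they reproduce the token, tool-call, and cost figures shown above. The risk level is left out because its thresholds are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopBudget:
    """Hypothetical inputs for an upper-bound estimate; not the page's actual fields."""
    max_iterations: int
    tokens_per_iteration: int
    tool_calls_per_iteration: int
    usd_per_million_tokens: float

    def ceiling(self) -> dict:
        # Worst case: every iteration spends its full allowance.
        tokens = self.max_iterations * self.tokens_per_iteration
        return {
            "tokens": tokens,
            "tool_calls": self.max_iterations * self.tool_calls_per_iteration,
            "usd": round(tokens / 1_000_000 * self.usd_per_million_tokens, 2),
        }


print(LoopBudget(5, 25_000, 8, 10.0).ceiling())
# {'tokens': 125000, 'tool_calls': 40, 'usd': 1.25}
```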

Definition

Loop engineering is the outer control system

It governs an agent's repeated attempts; it is not merely asking the model to try again.

Operational definition

Loop engineering designs the outer plan → act → observe → verify cycle around an AI agent. A complete loop names its done signal, evidence gate, retry strategy, stop rules, token/time/cost budget, permission boundary, and escalation owner before execution starts.
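
One way to enforce "before execution starts" is to refuse to launch a loop whose specification has blanks. The sketch below is an illustration under assumptions: `LoopSpec`, its field names, and `missing_controls` are hypothetical names that mirror the seven items in the definition above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class LoopSpec:
    """One free-text entry per control named in the operational definition."""
    done_signal: str = ""
    evidence_gate: str = ""
    retry_strategy: str = ""
    stop_rules: str = ""
    budget: str = ""
    permission_boundary: str = ""
    escalation_owner: str = ""


def missing_controls(spec: LoopSpec) -> List[str]:
    """Return the names of controls that are still blank."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


spec = LoopSpec(done_signal="npm test exits 0", budget="125k tokens, 3 iterations")
print(missing_controls(spec))
```

A runner could treat any non-empty result as a reason not to start.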

Proof of done

Evidence gate for an AI coding agent loop

Use this KyenAI template to turn an agent's completion claim into an auditable state transition.

Bind evidence to source state

Record the commit SHA or immutable revision that every receipt was generated against.

Require fresh receipts

Reject test, build, review, or screenshot evidence produced before the current source revision.

Separate claim from verdict

Treat the agent summary as a claim; use commands, policies, or a named reviewer to decide the state transition.

Name every terminal state

End as verified, review-required, blocked, or stopped-by-budget instead of an ambiguous DONE message.

Proof-of-done contract

Copy the JSON template into your runner, replace the placeholders, and allow a terminal-state transition only after every required receipt is current and independently checked.

{
  "sourceRevision": "<git-sha>",
  "claimedState": "ready-for-review",
  "requiredGates": [
    { "id": "focused-tests", "command": "npm test", "fresh": true },
    { "id": "diff-review", "reviewer": "not-the-authoring-agent" }
  ],
  "allowedTerminalStates": [
    "verified", "review-required", "blocked", "stopped-by-budget"
  ]
}

Cautions

This is a KyenAI starter contract, not the Proof-or-Stop reference implementation.
A passed gate proves only the named check under its stated trust model; it does not prove semantic program correctness.
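
A runner could evaluate the contract roughly as follows. This is a sketch under stated assumptions, not part of the template: the function `decide_state`, the receipt fields (`gate`, `passed`, `sourceRevision`, `producedBy`), the sample revision `abc123`, and the mapping of each problem to `blocked` or `review-required` are all hypothetical. The `stopped-by-budget` state is assumed to be set by the loop itself, so it does not appear here.

```python
import json

CONTRACT = json.loads("""
{
  "sourceRevision": "abc123",
  "claimedState": "ready-for-review",
  "requiredGates": [
    { "id": "focused-tests", "command": "npm test", "fresh": true },
    { "id": "diff-review", "reviewer": "not-the-authoring-agent" }
  ],
  "allowedTerminalStates": [
    "verified", "review-required", "blocked", "stopped-by-budget"
  ]
}
""")


def decide_state(contract: dict, receipts: list, authoring_agent: str) -> str:
    """Map receipts to a terminal state; the receipt shape is hypothetical."""
    by_gate = {r["gate"]: r for r in receipts}
    state = "verified"
    for gate in contract["requiredGates"]:
        receipt = by_gate.get(gate["id"])
        # Missing, failed, or stale evidence blocks the transition outright.
        if (receipt is None or not receipt["passed"]
                or receipt["sourceRevision"] != contract["sourceRevision"]):
            state = "blocked"
            break
        # A review receipt produced by the authoring agent is not independent.
        if "reviewer" in gate and receipt["producedBy"] == authoring_agent:
            state = "review-required"
    if state not in contract["allowedTerminalStates"]:
        raise ValueError(f"unexpected terminal state: {state}")
    return state
```

The agent's `claimedState` is never read when deciding the verdict, which keeps the claim separate from the state transition.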

Download proof-of-done-contract.json

Practical example

Example: turn a flaky coding-agent task into a bounded loop

A team asks an AI coding agent to fix failing frontend tests after a dependency upgrade. The old prompt was: 'fix the tests.' The loop version names the files, the test command, the maximum attempts, and the stop rule.

Plan: inspect the failing test output and list likely files without editing.
Act: patch one cause at a time on a branch, keeping generated files out of scope.
Observe: run the specific failing test first, then the wider test file if the focused check passes.
Verify: run the project test command and capture the exact pass or fail state.
Retry/stop: retry once with a different hypothesis; after the second same failure, produce a handoff note instead of rewriting more code.

Implementation sketch

Pseudo-code for a bounded agent loop

Translate this control structure into LangGraph, a CI job, or your agent runner of choice.

Bounded plan-act-observe-verify loop

Shows the loop controls that matter: verification, attempt cap, repeated-failure detection, and human escalation.

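goal = "fix failing frontend tests after dependency upgrade"
verify = "npm test --prefix frontend -- guide"
max_iterations = 3
max_same_failures = 2


def bounded_loop(agent, run, touches_production_boundary, allowed_files, current_test_output):
    # agent, run, and touches_production_boundary are supplied by your runner.
    previous_failure_signature = None
    same_failure_count = 0

    for attempt in range(1, max_iterations + 1):
        plan = agent.plan(goal, current_test_output, allowed_files)
        patch = agent.act(plan)
        result = run(verify)

        if result.passed:
            return ("done", patch, result)

        # Count consecutive failures that share a signature.
        if result.failure_signature == previous_failure_signature:
            same_failure_count += 1
        else:
            same_failure_count = 1
        previous_failure_signature = result.failure_signature

        # Returning ends the loop; a human decides what happens next.
        if same_failure_count >= max_same_failures or touches_production_boundary(patch):
            return ("escalate: human review", patch, result)

        current_test_output = result.output

    return ("iteration cap reached", None, current_test_output)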

Cautions

Do not run unattended loops against production paths without an approval checkpoint.
Record real token, cost, and intervention numbers only when your runner actually measures them.

Stop rules

When should an AI agent stop the loop?

Termination is the safety feature. Add it before parallelism, schedules, or broader permissions.

Stop when the verification command passes and the required artifact exists.
Stop after two repeated failures with the same root cause unless a human approves a new strategy.
Stop before production changes, permission expansion, data deletion, or secret access.
Stop when the run reaches its token, time, cost, or parallel-agent budget.
Stop when the agent cannot explain the next action from observed evidence.
Stop when the loop would create a second page, branch, or tool path that competes with the original goal.
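
The stop rules above can be checked in one place between iterations. The sketch below is illustrative: `LoopState`, its field names, the reason strings, and the order of the checks (risk first) are assumptions. Only a token budget is modeled; time, cost, and parallel-agent budgets would follow the same shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    verification_passed: bool = False
    artifact_exists: bool = False
    same_failure_count: int = 0
    human_approved_new_strategy: bool = False
    next_action_is_risky: bool = False   # production, permissions, deletion, secrets
    tokens_used: int = 0
    token_budget: int = 125_000
    next_action_cites_evidence: bool = True
    creates_competing_path: bool = False


def stop_reason(s: LoopState) -> Optional[str]:
    """Return the first stop rule that fires, or None to keep looping."""
    rules = [
        (s.next_action_is_risky, "human-checkpoint"),
        (s.verification_passed and s.artifact_exists, "verified"),
        (s.same_failure_count >= 2 and not s.human_approved_new_strategy, "repeated-failure"),
        (s.tokens_used >= s.token_budget, "stopped-by-budget"),
        (not s.next_action_cites_evidence, "no-evidence-for-next-action"),
        (s.creates_competing_path, "scope-drift"),
    ]
    return next((reason for fired, reason in rules if fired), None)
```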

Anatomy

Five loop building blocks

Every durable coding-agent loop needs these controls before you scale parallelism or schedules.

Clear goal and done signal

State what finished means: passing command, merged PR, ticket filed, or report delivered.

Observable tools

Give the agent tests, linters, logs, diffs, or MCP access so each cycle produces evidence.

Context budget

Load repo instructions, skills, and scoped files deliberately; trim between iterations.

Termination and escalation

Cap iterations, name escalation paths, and stop when the same error repeats.

Cost and concurrency limits

Budget tokens, parallel agents, and runtime so loops cannot run unbounded overnight.
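
For the context budget block, "trim between iterations" can be as simple as keeping pinned instructions and dropping the oldest observations first. The sketch below assumes a hypothetical helper `trim_context` and uses character counts as a rough stand-in for tokens; a real runner would measure with its model's tokenizer.

```python
from typing import List


def trim_context(pinned: List[str], observations: List[str], max_chars: int) -> List[str]:
    """Keep every pinned item, then the newest observations that still fit."""
    remaining = max_chars - sum(len(p) for p in pinned)
    kept: List[str] = []
    for obs in reversed(observations):   # walk from newest to oldest
        if len(obs) > remaining:
            break                        # stop at the first one that does not fit
        kept.append(obs)
        remaining -= len(obs)
    return pinned + kept[::-1]           # restore chronological order


print(trim_context(["RULES"], ["aaaa", "bbb", "cc"], max_chars=10))
# ['RULES', 'bbb', 'cc']
```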

Decision and evidence path

Move from tool choice to verifiable policy

Compare Codex, Claude Code, Cursor, and Copilot
Choose the operating model and the two tools worth piloting.
Verify AGENTS.md, CLAUDE.md, Copilot, and Cursor rules
Check which instruction file each approved product surface actually reads.
Open the datasets, templates, and security checklists
Download versioned evidence and cite the underlying asset rather than this summary.

Evidence reviewed

Evidence reviewed for this practical loop design guide: "Loop Engineering"; "Building effective AI agents"; "What is loop engineering?". Decision scope: developers who want to move from prompting agents task-by-task to designing durable loops that plan, change code, verify results, and stop safely. Product behavior after Jul 30, 2026 and any benchmark result not shown on this page remain unverified.

Best next step

Write loop stop rules into AGENTS.md, CLAUDE.md, or Copilot instructions. Agent loops need repository instructions that name verification commands and human checkpoints.

Methodology and disclosure

KyenAI publishes this guide as an independent editorial reference, not as an endorsement page for any third-party tool. The comparisons, checklists, and workflow advice are based on visible source material from Addy Osmani, Anthropic, Kilo, and arXiv, plus the operational constraints named on the page. Product behavior, pricing, availability, and enterprise controls can change after the listed update date, so vendor-specific claims should be rechecked against the linked sources before procurement, migration, or production rollout.

What Is Loop Engineering?

Loop engineering is the practice of designing the repeatable control system around an AI agent: goal, context, tool access, observation, verification, retry policy, and stop rules. In coding work, that system should read repository evidence such as test output, compiler errors, diffs, logs, screenshots, or review comments before deciding the next step.

The reason it matters for software teams is simple: a coding agent can sound confident while still looping on the wrong file, rerunning the same failing command, or expanding scope without proof. A loop-engineered workflow makes the agent show evidence, change strategy after failure, and stop before cost, risk, or permissions drift out of bounds.

What Is Addy Osmani's Loop Engineering Approach?

The useful reading of Osmani's idea is that engineers should stop babysitting every agent turn and start designing the operating loop around the agent. The prompt still matters, but it is only one part of a system that also includes tools, feedback, memory, cost limits, and termination.

In a coding workflow, that means the loop must know what done looks like before the first edit happens. A good loop names the repository task, the allowed files or tools, the command that proves progress, the maximum attempts, and the human checkpoint for risky actions.

How Does Loop Engineering Work for AI Coding Agents?

Start with a plan, then let the agent act on the smallest useful change. The observe step reads evidence from the repository rather than trusting a fluent status update: failing tests, compiler output, diffs, logs, screenshots, or MCP tool responses. Verification turns that observation into a verdict. Retry only when the next attempt changes the strategy; otherwise stop and hand the evidence to a human.

This is why loop engineering is especially useful for AI coding agents. Software work already has observable signals: tests pass or fail, builds complete or fail, files changed or did not, and pull requests can be reviewed. The loop should attach to those signals instead of treating agent confidence as evidence.

Loop Engineering vs Prompt Engineering

Prompt engineering optimizes a single instruction or conversation. Loop engineering designs the repeated system around that instruction: context loading, tool access, action, observation, verification, retries, and stop conditions. A strong prompt can still fail inside a weak loop if the agent cannot see test output or has no rule for when to stop.

Loop engineering is also different from a cron job. A cron job runs a fixed command on a schedule. A coding-agent loop observes current state and decides the next action before it continues. If nothing inside the run observes evidence and changes strategy, it is scheduling, not loop engineering.

Where loops live in today's tools

Claude Code supports recurring work through /loop scheduling, hooks that fire at lifecycle points, subagents for split explore-implement-verify roles, and headless or CI-style runs that persist after a laptop closes. Cursor supports long-running cloud agents, parallel agents on isolated branches, and Automations triggered by GitHub, Slack, Linear, or schedules. Codex and similar agents implement loops through tool calls, subagents, and repository instructions that name verification commands.

The surface differs by vendor, but the architecture repeats: goal, context, tools, observation, adjustment, termination. Pick the tool by where your team already works, then design the loop around observable repo evidence rather than model charisma.

A Practical Loop Engineering Example

Suppose a dependency upgrade breaks frontend tests. A weak instruction says: fix the tests. A loop-engineered instruction says: inspect the failing output first, edit only the affected test or component files, run the focused test command, then run the wider project check if the focused command passes. Retry once with a different hypothesis, but stop after the same failure repeats twice.

That example matters because the loop is not trying to be autonomous in every direction. It is narrow, observable, and cheap to review. The agent can keep working through routine failure, but it must stop before rewriting unrelated files, touching production configuration, or burning more attempts on the same root cause.
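
"The same failure repeats" needs a working definition, because raw test output rarely matches byte for byte across runs. One possible approach, sketched below, masks details that vary between runs and hashes the rest. The helper `failure_signature` and its three masking patterns are assumptions; tune them to your own test runner's output.

```python
import hashlib
import re


def failure_signature(output: str) -> str:
    """Hash test output after masking details that vary between runs."""
    text = re.sub(r"0x[0-9a-fA-F]+", "<addr>", output)        # memory addresses
    text = re.sub(r"\d+(\.\d+)?\s*(ms|s)\b", "<dur>", text)    # durations
    text = re.sub(r":\d+(:\d+)?", ":<line>", text)             # file:line[:col]
    text = re.sub(r"\s+", " ", text).strip()                   # whitespace noise
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


first = failure_signature("FAIL src/App.test.tsx:42:7 expected 3 (12 ms)")
second = failure_signature("FAIL src/App.test.tsx:57:3 expected 3 (48 ms)")
print(first == second)
# True
```

Masking line numbers means an edit that only shifts the failing assertion still counts as the same failure, which is the behavior the retry rule relies on.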

When Should an AI Agent Stop the Loop?

Stop rules are not administrative decoration; they are the safety feature. Stop when verification passes, when an iteration cap is reached, when the same failure repeats, when the next action requires wider permissions, when cost crosses the budget, or when the agent can no longer tie its next action to observed evidence.

For team use, make the stop rule visible in AGENTS.md, CLAUDE.md, a Copilot instructions file, or the workflow configuration that starts the loop. The safest loops are boring to audit: every retry has a reason, every escalation has an owner, and every successful run leaves a concise artifact.

What counts as proof that an AI coding loop is done?

Treat DONE, tested, reviewed, and ready-to-merge as claims until current evidence supports the transition. Bind every test result, build receipt, diff review, or screenshot to the exact source revision it checked; reject stale receipts after the code changes; and record whether the run ended verified, review-required, blocked, or stopped by budget.

A July 2026 Proof-or-Stop preprint formalizes this as evidence-gated lifecycle control: fresh, source-state-bound, mechanically verifiable evidence must satisfy a gate before state can advance. The paper is a preprint with an evaluation limited to one model family, 24 ablation tasks, and a self-hosted corpus. In the paper, operational proof means evidence admissible under a stated trust model, not a guarantee of semantic program correctness. The downloadable KyenAI contract in this guide is an independent starter template, not the paper's reference implementation.

AI coding agent workflow checklist

Before scaling Automations, /loop schedules, or cloud agents, confirm the workflow has a named owner, a bounded goal, a verification command, a retry cap, a human checkpoint for risky actions, and a log that avoids secrets.

The checklist should answer five questions: what done looks like, what evidence the loop reads, what tools it may call, when it must stop, and who approves escalation. If any answer is missing, the workflow is still a prompt, not loop engineering.

Pilot on one repository task first. Record review time, token use, and human interventions actually observed. Add parallel agents or schedules only after a single-threaded loop is trustworthy.

Common Patterns and When to Use Them

Plan-execute-verify fits bounded repo tasks with a clear pass command. Retry-with-cap helps flaky setup steps but needs a hard attempt limit per item. Evaluator-optimizer pairs work well for reviews and docs when criteria are explicit. Explore-narrow prevents premature edits in unfamiliar code. Scheduled wake-up loops handle recurring triage. Human-in-the-loop checkpoints belong before production, permission widening, or destructive operations.

Anthropic's agent guidance recommends adding complexity only when simpler flows fail. Start with one loop on one repository task, measure review effort and token use, then add subagents or schedules only when the simpler loop stalls.

Recommended play

Start with one real repository task and a single plan-execute-verify loop before adding schedules or parallel agents.
Write the done signal as a command or artifact, not a vibe: passing tests, green build, opened PR, or filed ticket.
Cap iterations per item and escalate when the same failure repeats twice with the same root cause.
Separate exploration from implementation so read-only passes cannot mutate production paths.
Budget tokens and concurrency before running unattended cloud or scheduled loops.

When to prompt vs when to loop

Use this table to decide whether a task needs a durable loop or a single supervised agent session.

Area	Prompt once when	Design a loop when	Stop rule to add
Task shape	The steps are predictable and fit one focused session	The agent must read errors, revise, and re-run verification	Name the verification command and maximum iterations
Duration	You can stay at the keyboard for the whole task	Work should continue while you review other items or close the laptop	Set a schedule or queue with a summary artifact per run
Risk	Changes are reversible and confined to a local branch	The loop touches shared files, CI, production config, or permissions	Require a human checkpoint before merge or deploy
Cost	Token use is small and visible in one sitting	Retries, parallel agents, or long horizons can compound quickly	Set per-run and per-day budgets with automatic stop
Team workflow	One engineer needs a quick answer or small patch	A team wants repeatable triage, review, or hygiene across repos	Publish run logs without secrets and name an owner for loop drift

Execution steps

Name the goal and done signal

Write what finished means in observable terms: command output, PR state, ticket link, or report section. Avoid fuzzy goals like 'make it better' that let the loop run without a verdict.

Choose the first pattern

Default to plan-execute-verify for code changes. Add evaluator-optimizer only when review criteria are explicit. Reserve scheduled wake-up loops for recurring triage after the single-task loop works once.

Wire observation before speed

Give the agent tests, linters, build commands, diff review, or MCP tools that return ground truth. A loop without observation is just expensive repetition.

Set termination and escalation

Cap attempts per file or task, stop when the same error repeats, and name who approves production or permission changes. Document what the loop should do when blocked.

Write the loop into repository instructions

Record the verification command, retry cap, forbidden paths, and human checkpoint in AGENTS.md, CLAUDE.md, Copilot instructions, or the workflow file that launches the loop.

Pilot, measure, then parallelize

Run the loop on one repo task and record the review time, token use, and human interventions actually observed. Add parallel agents or cloud handoff only when single-threaded loops are trustworthy.
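
For the step "Write the loop into repository instructions", the four recorded items can be rendered from one source of truth so the instruction file and the runner do not drift apart. The sketch below is illustrative only: `render_loop_rules`, the heading text, and the bullet wording are assumptions, and none of the named tools is known to require this layout.

```python
from typing import List


def render_loop_rules(verify_cmd: str, retry_cap: int,
                      forbidden_paths: List[str], checkpoint_owner: str) -> str:
    """Build a plain-text section to paste into a repository instruction file."""
    lines = [
        "## Agent loop rules",
        f"- Verify with: `{verify_cmd}`",
        f"- Retry cap: {retry_cap} attempts per task, each with a changed strategy",
        "- Never edit: " + ", ".join(forbidden_paths),
        f"- Human checkpoint: {checkpoint_owner} approves production or permission changes",
    ]
    return "\n".join(lines) + "\n"


print(render_loop_rules("npm test --prefix frontend", 3, ["infra/", ".env"], "the platform on-call"))
```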

Common pitfalls

Fuzzy goals with no done signal

Translate goals into a verification command, required artifact, or explicit human acceptance step before the first unattended run.

Unbounded retries on the same mistake

Cap iterations per item and change strategy after repeated failures instead of paying for identical attempts.

Cron without an agent decision-maker

Ensure each run observes current state and chooses the next action; a fixed script on a timer is scheduling, not loop engineering.

Parallel agents on shared files

Isolate branches or assign disjoint ownership; merge results deliberately instead of letting agents overwrite each other.
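
Disjoint ownership for the parallel-agents pitfall can be checked before any agent starts. A minimal sketch, assuming a hypothetical mapping from agent name to the set of files it may edit:

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def ownership_conflicts(assignments: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """List every pair of agents whose assigned files overlap."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(sorted(assignments.items()), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


plan = {
    "agent-ui": {"src/App.tsx", "src/Button.tsx"},
    "agent-api": {"server/routes.ts"},
    "agent-tests": {"src/App.tsx", "src/App.test.tsx"},
}
print(ownership_conflicts(plan))
# [('agent-tests', 'agent-ui', {'src/App.tsx'})]
```

An empty result means the assignments are disjoint; anything else is a reason to re-scope or serialize the work.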

Implementation checklist

Write the goal and done signal in observable terms.
Pick plan-execute-verify as the default loop pattern.
Attach tests, linters, or builds as loop observation.
Cap iterations and name escalation for repeated failures.
Add human checkpoints before production or destructive actions.
Budget tokens and parallel agents before unattended runs.
Log outcomes without secrets and assign a loop owner.
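
For the last checklist item, run logs can pass through a redaction step before they are stored. The sketch below is deliberately narrow and is not a complete secret scanner: the helper `redact` and its single pattern (environment-style assignments whose key contains TOKEN, SECRET, PASSWORD, or API_KEY) are assumptions for illustration.

```python
import re

# Matches KEY=value where the key name suggests a credential.
_SECRET_ASSIGNMENT = re.compile(
    r"\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)\s*=\s*(\S+)",
    re.IGNORECASE,
)


def redact(line: str) -> str:
    """Replace the value of credential-like assignments with a placeholder."""
    return _SECRET_ASSIGNMENT.sub(r"\1=<redacted>", line)


print(redact("export GITHUB_TOKEN=ghp_abc123 && npm test"))
# export GITHUB_TOKEN=<redacted> && npm test
```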

Questions this guide answers

What is loop engineering for AI coding agents?

Loop engineering means designing the operating loop around the agent instead of writing one better prompt. For coding agents, the practical loop is Plan → Act → Observe → Verify → Stop: give the agent a bounded goal, inspect tests or diffs, retry only with a changed strategy, and stop on token caps, repeated failure, or a required human checkpoint.

When should an AI coding agent stop the loop?

Stop when verification passes, when an iteration cap is reached, when the same failure repeats, when the next action needs wider permissions, when cost crosses the budget, or when the agent can no longer tie its next action to observed evidence. Document stop rules in AGENTS.md, CLAUDE.md, Copilot instructions, or the workflow file that launches the loop.

How is loop engineering different from prompt engineering?

Prompt engineering improves a single instruction. Loop engineering designs the repeated system around that instruction: context loading, tool access, action, observation, verification, retry limits, and stop conditions. A strong prompt can still fail inside a weak loop if the agent cannot read test output or does not know when to stop.

What belongs on an AI coding agent workflow checklist?

Name the goal and done signal, pick plan-execute-verify as the default pattern, attach tests or linters as observation, cap iterations, add human checkpoints before production or destructive actions, budget tokens before unattended runs, and assign a loop owner who reviews logs without secrets.

What counts as proof that an AI coding agent loop is done?

An agent's DONE message is a claim, not proof. Require fresh evidence tied to the exact source revision: passing commands, required artifacts, an independently checked diff, and any named human approval. Record an explicit terminal state such as verified, review-required, blocked, or stopped-by-budget. A passed gate proves the named check under its trust model, not semantic correctness of the whole program.