
Chapter 2: The ReAct Loop — The Core Engine of an Agent

The core question: You issue a command — how does the Agent turn it into a final result after a dozen steps? The key to that journey is a loop that continuously drives itself forward.


Ordinary AI answers questions. An Agent solves tasks.

The gap between those two words is much larger than it sounds. Answering a question takes one step — "What's the temperature in Beijing today?" — the AI checks and tells you 23°C, done. But solving a task often requires a dozen steps, and the direction of each step depends on the result of the previous one.

"Help me organize all the TODO comments in the project, categorize them by module, and generate a list ready for review" — that's not a question, it's an engineering job.

What OpenClaw can do is exactly this kind of thing. And the fundamental reason it can do it is a core engine called the ReAct loop.

Understand this loop, and you understand the essential difference between an Agent and an ordinary LLM.


I. The Ceiling of "Answer Once"

Start with a familiar scenario. You open ChatGPT and ask it to analyze your project's code structure. It's smart — it tells you how to analyze it, even gives you a detailed analysis framework.

But if you actually want it to analyze it for you — it gets stuck. It can't see your code.

Even if you paste the code in and it tells you "there's a potential null pointer issue on line 47," and you want it to go ahead and fix it — it gets stuck again. It has no way to modify your files.

Even if you manually tell it how to fix it, come back tomorrow and it's a fresh start from zero.

This isn't ChatGPT being insufficiently smart. This is the ceiling of single-turn question-and-answer mode:

| Limitation | Root Cause |
| --- | --- |
| Cannot proactively gather information | No tools; can only rely on the user to feed it data manually |
| Cannot make sustained progress | Each response is independent; it can't build on the result of the previous step |
| Cannot self-correct | If the output goes wrong, it goes wrong; no mechanism to verify or adjust |
| Doesn't remember progress | Mid-task, the next conversation starts from zero |

The essence of single-turn dialogue is: you provide all the information, the AI produces an answer in one shot, and that's it.

But real tasks don't work that way. Real tasks are dynamic — you can't know all the information you'll need at the start, and the result of each step shapes the direction of the next. This requires a completely different way of working.


II. The Birth of the Loop: Observe → Think → Act

In October 2022, researchers from Princeton University and Google published a paper proposing a deceptively simple idea: let language models alternate between reasoning and acting.

The paper was called ReAct — Re for Reasoning, Act for Acting. The core insight: reasoning and action should not be separated. Think before acting, observe after acting, think again after observing — repeat until the task is done.

OpenClaw's core engine is built on this idea:

User Input

  ┌─────────────────────────────────────────────┐
  │  ① Observe                                  │
  │     Receive the current state:              │
  │     · What the user sent                    │
  │     · What the tool returned last time      │
  │     · What the conversation history holds   │
  └───────────────────┬─────────────────────────┘

  ┌─────────────────────────────────────────────┐
  │  ② Think                                    │
  │     Hand context to the model for reasoning:│
  │     · Where has the task progressed?        │
  │     · What should happen next?              │
  │     · Call a tool, or reply to the user?    │
  └──────────────┬──────────────┬───────────────┘
                 │              │
           Call a tool     Task complete
                 │              │
                 ↓              ↓
  ┌──────────────────┐      Send to user
  │  ③ Act           │     (loop ends)
  │     Execute tool │
  │     Append result│
  │     to history   │
  └──────┬───────────┘


    Back to ① Observe
   (with new observations)

This loop has one critically important design detail: the result of every tool call is appended to the conversation history.

What does that mean? On the fifth round of reasoning, the model can "see" everything that happened in the previous four — which tools were used, what results came back, what errors were encountered. It's not guessing what happened; it's reading a complete action log and deciding the next step based on that log.

This is the fundamental fork in capability between an Agent and an ordinary LLM. An ordinary LLM performs a one-shot computation; the ReAct loop performs continuous exploration — each step stands on the shoulders of all previous steps, progressively converging on the goal.
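The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenClaw's actual implementation: `call_model` and `TOOLS` are hypothetical stand-ins for the real model and tool registry.

```python
# A minimal ReAct loop sketch. `call_model` and `TOOLS` are hypothetical
# stand-ins for illustration, not OpenClaw's actual internals.

def call_model(history):
    # Stand-in model: if a tool result is already in the history, finish;
    # otherwise ask for a tool call.
    if any(m["role"] == "tool" for m in history):
        return {"type": "final", "text": "Found 1 TODO in module `core`."}
    return {"type": "tool_call", "name": "grep_todos", "args": {"path": "."}}

TOOLS = {
    "grep_todos": lambda path: "core/main.py:12: TODO refactor parser",
}

def react_loop(user_input, max_steps=10):
    history = [{"role": "user", "content": user_input}]       # ① Observe
    for _ in range(max_steps):
        decision = call_model(history)                        # ② Think
        if decision["type"] == "final":                       # Task complete
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["args"])  # ③ Act
        # The critical design detail: every tool result is appended to
        # history, so the next round of reasoning can "see" it.
        history.append({"role": "tool", "content": result})
    return "Step budget exhausted; reporting progress."

print(react_loop("Organize all TODO comments"))
```

Note that the loop itself contains no task-specific logic. All of the "intelligence" lives in the model's decision each round; the engine only routes observations in and actions out.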

Why Serial Execution

You might wonder: can't multiple tools run in parallel to speed things up?

No — at least not within the same task. The reason is causal consistency:

"First create a file, then write content into it"

If executed in parallel:
  Create file  →  not yet complete
  Write content  →  file doesn't exist, write fails  ✗

If executed serially:
  Create file  →  success  →  Write content  →  success  ✓

In real tasks, the meaning of operation B often depends on the result of operation A. An Agent isn't executing a script in a static environment — it's advancing work in a world that keeps changing because of its actions. Serial execution is the inevitable requirement of this dynamic world.

Where ReAct Sits Among Three Approaches

Before ReAct, two extreme approaches had been tried:

| School | Strategy | Fatal Weakness |
| --- | --- | --- |
| Plan-first | Generate a complete plan upfront, then execute step by step | The plan rests on assumptions about the future; hard to correct when the unexpected happens |
| React-only | Perceive-act-perceive-act, no reasoning layer | No goal orientation; easily trapped in local optima |
| ReAct | Every step is a complete observe-reason-act micro-loop | Serial execution; cost of deep loops grows with history length |

The flaw in the plan-first approach: if step 3's input depends on step 2's output, and step 2's result is completely unknown before execution, the plan is wishful thinking. The flaw in the react-only approach: no reasoning layer means no internal representation of the goal — the Agent can't tell whether it's getting closer to or drifting away from the target.

ReAct takes the middle road: act with the best current judgment, let each step's result validate or revise prior assumptions. This more closely mirrors how humans actually work through complex tasks.


III. How the Loop "Remembers"

The ReAct loop must span multiple steps to complete a task, which raises a critical question: how does the loop "know" at each step what it has already done?

The answer doesn't rely on the model's implicit memory — it relies on explicit conversation history.

Working Memory: Living in the History

After every tool call, the result is appended to the conversation history — like taking real-time notes for itself. On the tenth round of reasoning, the model can read everything that happened in the previous nine steps: which tools were used, what results came back, what errors were encountered, how those errors were handled.

This ever-growing chain of history is the loop's working memory.

This design has an important characteristic: complete transparency. You can open the history and see step by step what decision the Agent made at each point and why. No black box, no hidden state. Transparency isn't an added feature — it's a natural product of this architecture.

Three-Layer Memory Structure

Working memory has a physical limit — the model's context window. For tasks that span dozens of steps or multiple sessions, conversation history alone isn't enough. OpenClaw implements three layers of memory:

| Layer | Time Scope | Storage | Analogy |
| --- | --- | --- | --- |
| Working memory | Current session | Conversation history (in memory) | Work files spread open on the desk |
| Long-term memory | Across sessions | MEMORY.md (filesystem) | A work notebook in the drawer |
| World knowledge | Fetched on demand | Read via tools | A library; you go when you need it |

The core principle of three-layer memory: the right information appears at the right moment. Not stuffing all information into the context at once, but loading on demand. Just like you don't move the entire library to your desk when working — the desk holds the files for the current task; more resources are fetched when needed.
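The load-on-demand principle can be sketched as follows. The class and method names here are hypothetical, invented for illustration; only the MEMORY.md filename comes from the text above.

```python
import pathlib

# Illustrative sketch of the three layers; class and method names are
# hypothetical, not OpenClaw's actual API. The point is load-on-demand:
# only working memory lives in the context; the rest is fetched when asked.

class ThreeLayerMemory:
    def __init__(self, memory_path):
        self.working = []                             # layer 1: session history
        self.memory_path = pathlib.Path(memory_path)  # layer 2: MEMORY.md

    def note(self, entry):
        # Working memory: always present in the context window.
        self.working.append(entry)

    def recall_long_term(self):
        # Long-term memory: touch the filesystem only when asked.
        if self.memory_path.exists():
            return self.memory_path.read_text()
        return ""

    def fetch_world(self, tool, *args):
        # Layer 3: world knowledge arrives through a tool call, on demand.
        return tool(*args)
```

Nothing outside `self.working` occupies the context until the model explicitly reaches for it; that is the whole trick.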

When History Gets Too Long: Context Compression

When a task is complex enough and the loop runs deep enough, conversation history keeps growing and eventually approaches the physical limit of the context window. OpenClaw's response is progressive compression:

Preventive compression (checked after each round):
  Context pressure exceeds threshold  →  Proactively compress early records into a summary  →  Free space for subsequent rounds

Overflow recovery (when the limit has already been exceeded):
  Immediately compress early messages into a summary, retain the most recent N rounds in full  →  Continue execution

Compression is information distillation: preserving "what happened" while compressing "how each step was done." The cost is losing fine-grained details from earlier steps; the gain is that the task can keep moving forward.

Proactive prevention is more elegant than reactive recovery — the task ends before the context is exhausted, rather than being forcibly interrupted by an overflow.
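Preventive compression can be sketched like this. The threshold, the `keep_recent` count, and the summarizer are all illustrative choices, not OpenClaw's real values; in practice the summary would be generated by the model itself.

```python
# Sketch of preventive context compression. The threshold, `keep_recent`,
# and the summarizer are illustrative, not OpenClaw's real mechanism.

def summarize(messages):
    # Stand-in for a model-generated summary: preserve "what happened",
    # compress away "how each step was done".
    return {"role": "summary",
            "content": f"[{len(messages)} earlier messages compressed]"}

def compress_if_needed(history, limit=8, keep_recent=3):
    # Checked after each round: once the history crosses the threshold,
    # distill the early messages into one summary entry and keep the
    # most recent rounds in full.
    if len(history) <= limit:
        return history
    return [summarize(history[:-keep_recent])] + history[-keep_recent:]
```

Because the check runs after every round, compression happens before overflow rather than as a recovery from it, which is exactly the "preventive is more elegant than reactive" point.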


IV. How the Loop "Heals Itself"

Traditional programs throw exceptions when they encounter errors — reasonable in closed systems, where all possible error conditions can be enumerated upfront and handled with preset logic.

But an Agent works in an open world. External APIs time out, files may not exist, command output may have unexpected formats, user intent may reveal a mismatch only partway through execution — no one can enumerate all possible failure cases in advance.

ReAct's approach to errors is a philosophical reversal:

Traditional model:
  Execute  →  Error  →  Throw exception / crash / preset error handler  →  Terminate

ReAct model:
  Execute  →  Error  →  Error appended to conversation history

                        Next reasoning round "sees" the error
                        "That approach doesn't work, because X —
                         so I should try Y"

                        Change direction, keep moving

Errors are not triggers for program termination — they are signposts guiding the Agent toward the correct path.

The Agent doesn't need to enumerate all possible error conditions upfront; it only needs the reasoning ability to replan based on error information. This gives the Agent a robustness similar to how humans solve problems — not avoiding attempts out of fear of mistakes, but relying on the ability to learn from errors to progressively converge on the goal.

One easily overlooked detail: the quality of tool return values directly affects loop efficiency.

A clear error message — "File not found: path does not exist, current working directory is /home/user" — lets the Agent find the right direction immediately in the next round. A vague error — "Operation failed" — may require several rounds before the Agent can diagnose the true cause.

Tool designers need to recognize: a tool's return value is not just for the user to read — it's primarily for the model's reasoning in the next round.
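The difference is easy to show with two versions of the same hypothetical file-reading tool. Both function names and return strings are invented for illustration; only the error-message wording echoes the examples above.

```python
import os

# Two versions of a hypothetical tool. The only difference is the quality
# of the error string handed back to the model for its next round.

def read_file_vague(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # The model must spend extra rounds guessing what went wrong.
        return "Operation failed"

def read_file_clear(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # The model can correct course in the very next round.
        return (f"File not found: {path} does not exist, "
                f"current working directory is {os.getcwd()}")
```

With the second version, the next reasoning round already knows both what failed and where it was looking; with the first, diagnosing that is itself a multi-round task.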


V. How the Loop "Knows When to Stop"

An autonomous loop brings capability — and with it a new problem: when to stop?

Without a termination mechanism, an Agent might plunge endlessly in the wrong direction, consuming massive resources without realizing it. Or it might autonomously decide on a high-risk operation without obtaining user authorization. OpenClaw's answer is: constrained autonomy — not eliminating autonomy, but drawing clear boundaries around it.

Three Termination Conditions

| Termination Type | Trigger Condition | Agent's Behavior |
| --- | --- | --- |
| Normal termination | The model determines the task is complete | Generate the final reply and send it to the user |
| Safety termination | A resource or time boundary is reached | Report current progress and request instructions |
| Abnormal termination | Sustained failure with no recovery possible | Honestly report where it's stuck and why |

Safety termination and abnormal termination are equally important. An Agent that can clearly say "I can't go further from here, because X" is far more trustworthy than one that silently gets stuck. An honest failure report lets the user adjust the task description, provide additional information, or choose a different direction — rather than facing an unresponsive black box.
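The three termination paths can be sketched as one classifier around the loop. The step budget and failure-streak limit here are illustrative knobs, not OpenClaw's real defaults, and `steps` is a toy stand-in for the loop's per-round outcomes.

```python
# Sketch of the three termination paths. `step_budget` and `max_failures`
# are illustrative knobs, not OpenClaw's real defaults; each element of
# `steps` stands in for one round's outcome.

def run(steps, step_budget=20, max_failures=3):
    failures = 0
    for i, step in enumerate(steps):
        if i >= step_budget:          # safety: resource boundary reached
            return ("safety", "Budget reached; reporting progress and requesting instructions.")
        if step == "done":            # normal: model judges task complete
            return ("normal", "Task complete; sending final reply.")
        if step == "error":
            failures += 1
            if failures >= max_failures:  # abnormal: no recovery in sight
                return ("abnormal", "Stuck after repeated failures; reporting where and why.")
        else:
            failures = 0              # any progress resets the streak
    return ("safety", "Input exhausted; reporting progress.")
```

Note that both non-normal paths end in a report, never in silence; the honest failure message is part of the contract.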

Loop Detection: Recognizing "Spinning in Place"

There is a common form of runaway loop: the Agent repeats the same operations without making real progress. Tools keep failing but the Agent doesn't change direction, just retrying endlessly; or two operations cancel each other out, alternating back and forth while staying in the same place.

OpenClaw implements a loop detection mechanism — identifying repetitive behavior patterns with no progress, forcing a re-evaluation of strategy at the right moment, or requesting human intervention.

This is a safety net, not the primary flow-control mechanism. In the ideal case, the Agent's reasoning ability should self-identify and adjust before falling into a truly unproductive loop. When loop detection is triggered, it often signals that the task itself has some structural problem — either an inherent contradiction in the task definition, or an unreachable external resource required — the kind of situation that needs human intervention to redefine the problem.
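One simple form of such a detector: flag the loop when the same tool is called with the same arguments several times in a row. The window size and the exact-repetition criterion are an illustrative simplification, not OpenClaw's actual mechanism.

```python
from collections import deque

# Sketch of a no-progress detector: trigger when the last `window` tool
# calls are identical. Window size and criterion are illustrative only.

class LoopDetector:
    def __init__(self, window=3):
        self.recent = deque(maxlen=window)

    def record(self, tool_name, args):
        # Normalize the call into a hashable signature.
        self.recent.append((tool_name, tuple(sorted(args.items()))))
        # Safety net: fires only on identical calls with zero variation.
        return (len(self.recent) == self.recent.maxlen
                and len(set(self.recent)) == 1)
```

Real progress (a different tool, or the same tool with new arguments) breaks the streak, so the detector stays silent during legitimate retries that actually vary the approach.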

Human Intervention: Knowing When to Stop and Ask

Reducing intervention does not mean eliminating it. Some operations are irreversible — deleting files, sending external notifications, submitting changes that cannot be rolled back — the cost of the Agent deciding unilaterally is too high.

The framework for judging whether human confirmation is needed is simple:

  • Is it reversible? Reading and analyzing are safe; deleting and sending externally require confirmation.
  • What is the scope? Modifying a test file and modifying core configuration are on completely different risk scales.
  • Is there enough information? When information is incomplete, asking is wiser than guessing.

OpenClaw's Ask mode (off / on-miss / always) codifies this judgment framework into configurable behavioral rules. The recommendation is to start conservatively, and as you build trust in the Agent's behavior, gradually loosen the constraints — progressively building trust is more robust than delegating everything from the start.
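The judgment framework reduces to a small gate function. This is an assumed reading of the three modes, in particular of on-miss as "ask for anything outside a known-safe allowlist"; the allowlist contents and function name are invented for illustration.

```python
# Sketch of an Ask-mode gate. The interpretation of "on-miss" as an
# allowlist miss, and the allowlist itself, are assumptions for
# illustration, not OpenClaw's documented behavior.

SAFE_TOOLS = {"read_file", "grep", "list_dir"}   # reversible, read-only

def needs_confirmation(tool_name, mode):
    if mode == "off":
        return False                 # full autonomy
    if mode == "always":
        return True                  # maximum caution
    if mode == "on-miss":
        # Ask only for tools outside the known-safe set.
        return tool_name not in SAFE_TOOLS
    raise ValueError(f"unknown ask mode: {mode}")
```

Starting with `always` and relaxing toward `on-miss` as trust builds mirrors the "start conservatively" recommendation above.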


Summary

The ReAct loop answers a fundamental question: how do you evolve AI from "answer once" to "see a task through to completion"?

Not by making the model smarter, but through an architectural design: turning the result of every action into fuel for the next thought, turning every error into a signpost for adjusting direction, and stringing the entire process into a traceable action log.

| Core Loop Capability | How It's Implemented |
| --- | --- |
| Multi-step sustained progress | Results appended to history; each step decides based on that foundation |
| Cross-step state memory | Three-layer memory structure: working / long-term / world knowledge |
| Error self-healing | Errors are observations; failure is information, not termination |
| Controlled autonomy | Termination conditions + loop detection + human intervention timing |

This loop is the operational foundation for OpenClaw's other five pillars. The prompt system equips it with identity and rules; the tool system provides the hands and feet for interacting with the world; the message loop manages its concurrent scheduling; the unified gateway receives input from every channel; the security sandbox vets every tool call.

One message comes in, six pillars work in concert, the ReAct loop turns — this is what a "digital living system" actually looks like in operation.



Licensed under CC BY-NC-SA 4.0