# Agent Architectures
ReAct, plan-and-execute, reflection loops, multi-agent systems, and orchestration strategies.
## The 80/20
Agent architectures define how LLMs interact with tools, plan actions, and coordinate multiple reasoning steps. Most production systems use one of four core patterns: ReAct for simple tool use, plan-and-execute for complex multi-step tasks, reflection loops for self-improvement, and multi-agent systems for specialized coordination.
The key insight is that different tasks require different architectural approaches. Simple queries work with basic ReAct, complex workflows need planning, and tasks requiring quality need reflection. Choose the simplest architecture that meets your requirements—more complex patterns add latency and failure modes.
## The Agent Architecture Problem
Traditional LLMs are stateless—they receive a prompt, generate a response, and forget everything. But many real-world tasks require multiple steps, tool usage, memory, and the ability to recover from errors. Consider booking a flight:
- Search for flights on a specific date
- Compare prices across airlines
- Check seat availability
- Handle booking errors (sold out, payment issues)
- Confirm reservation details
This requires calling multiple APIs, maintaining state across steps, handling failures, and potentially replanning when things go wrong. A single LLM call can't handle this complexity.
```mermaid
graph TD
    subgraph "Single LLM Call"
        P[Prompt] --> L[LLM] --> R[Response]
    end
    subgraph "Agent Architecture"
        P2[Task] --> A[Agent]
        A --> T1[Tool Call 1]
        A --> T2[Tool Call 2]
        A --> T3[Tool Call 3]
        T1 --> A
        T2 --> A
        T3 --> A
        A --> R2[Final Result]
    end
end
```
Agent architectures solve this by adding control flow, memory, and tool integration around the LLM. The architecture determines how the agent reasons, acts, and learns from its actions.
## Core Agent Patterns
Four patterns handle the majority of agent use cases in production systems. Each makes different tradeoffs between simplicity, capability, and reliability.
### ReAct (Reasoning + Acting)
ReAct alternates between reasoning (thinking about what to do) and acting (using tools or taking actions). The agent follows a simple loop: think, act, observe, repeat until the task is complete.
```python
def react_loop(task, tools, max_steps=10):
    context = f"Task: {task}"
    for step in range(max_steps):
        # Reasoning step
        thought = llm.generate(f"{context}\nThought:")
        # Acting step
        action = llm.generate(f"{context}\nThought: {thought}\nAction:")
        if action.startswith("FINISH"):
            return action.replace("FINISH: ", "")
        # Observation step
        result = execute_tool(action, tools)
        context += f"\nThought: {thought}\nAction: {action}\nObservation: {result}"
    return "Task incomplete after maximum steps"
```
Example ReAct trace:
```
Task: What's the weather in San Francisco and should I bring an umbrella?

Thought: I need to check the current weather in San Francisco to see if it's raining or likely to rain.
Action: get_weather(location="San Francisco, CA")
Observation: Current weather in San Francisco: 72°F, partly cloudy, 10% chance of rain, wind 8 mph

Thought: The weather looks good with only a 10% chance of rain. I should provide a recommendation about the umbrella.
Action: FINISH: The weather in San Francisco is 72°F and partly cloudy with only a 10% chance of rain. You probably don't need an umbrella today, but it's always good to check the forecast again before heading out.
```
Strengths:
- Simple to implement and debug
- Works well for straightforward tool use
- Low latency for simple tasks
- Easy to add new tools
Weaknesses:
- No long-term planning
- Can get stuck in loops
- Inefficient for complex multi-step tasks
- Limited error recovery
ReAct works best for tasks that can be solved with 1-5 tool calls where each step naturally follows from the previous observation.
### Plan-and-Execute
Plan-and-execute separates high-level planning from step-by-step execution. The agent first creates a complete plan, then executes each step, potentially replanning when things go wrong.
```python
def plan_and_execute(task, tools):
    # Planning phase
    plan = llm.generate(f"""
    Task: {task}
    Available tools: {list(tools.keys())}
    Create a step-by-step plan to complete this task:
    """)
    steps = parse_plan(plan)
    results = []
    # Execution phase
    for i, step in enumerate(steps):
        try:
            result = execute_step(step, tools, results)
            results.append(result)
        except Exception as e:
            # Replanning on failure
            remaining_steps = steps[i:]
            new_plan = replan(task, remaining_steps, results, str(e))
            steps = steps[:i] + parse_plan(new_plan)
    return synthesize_results(results)

def replan(task, failed_steps, completed_results, error):
    return llm.generate(f"""
    Original task: {task}
    Completed so far: {completed_results}
    Failed step: {failed_steps[0]}
    Error: {error}
    Create a new plan to complete the remaining task:
    """)
```
Example Plan-and-Execute:
```
Task: Research and book a flight from NYC to London for next Friday

Plan:
1. Search for flights from NYC to London on [date]
2. Compare prices and flight times
3. Check baggage policies for top 3 options
4. Select best option based on price and convenience
5. Initiate booking process
6. Handle payment and confirmation

Execution:
Step 1: search_flights(origin="NYC", destination="London", date="2024-03-22")
Result: Found 15 flights, prices $450-$890
Step 2: compare_flights(top_n=3)
Result: British Airways $650 (direct), Virgin $580 (1 stop), Delta $720 (direct)
Step 3: get_baggage_policy(airlines=["British Airways", "Virgin", "Delta"])
Result: BA: 1 free bag, Virgin: 1 free bag, Delta: 1 free bag
Step 4: analyze_options()
Result: Virgin offers best value at $580 with acceptable 1-stop
Step 5: initiate_booking(flight_id="VS123", passenger_details=...)
Result: Booking initiated, payment required
Step 6: process_payment(booking_id="ABC123")
Result: Payment successful, confirmation #DEF456
```
Strengths:
- Handles complex multi-step tasks well
- Can replan when things go wrong
- More efficient than ReAct for complex workflows
- Clear separation of planning and execution
Weaknesses:
- Higher latency due to planning overhead
- Plans can become outdated quickly
- More complex to implement and debug
- May over-plan for simple tasks
Plan-and-execute excels at complex workflows with 5+ steps where upfront planning saves time and reduces errors.
### Reflection Loops
Reflection loops add self-critique and improvement to agent behavior. The agent performs an action, evaluates the result, and iteratively improves until reaching a satisfactory outcome.
```python
import re

def reflection_loop(task, tools, max_iterations=3):
    current_attempt = ""
    critique = ""
    for iteration in range(max_iterations):
        # Generate or improve attempt
        if iteration == 0:
            current_attempt = llm.generate(f"Task: {task}\nSolution:")
        else:
            current_attempt = llm.generate(f"""
            Task: {task}
            Previous attempt: {current_attempt}
            Critique: {critique}
            Improved solution:
            """)
        # Self-critique
        critique = llm.generate(f"""
        Task: {task}
        Solution: {current_attempt}
        Critique this solution. What could be improved?
        Rate quality 1-10 and explain issues:
        """)
        # Check if good enough
        if extract_rating(critique) >= 8:
            return current_attempt
    return current_attempt

def extract_rating(critique_text):
    # Extract numerical rating from critique; default to 5 if none found
    match = re.search(r'(\d+)/10|(\d+) out of 10', critique_text)
    return int(match.group(1) or match.group(2)) if match else 5
```
Example Reflection Loop:
```
Task: Write a professional email declining a job offer

Iteration 1:
Solution: "Thanks but no thanks. I found something better."
Critique: Rating 3/10. Too casual and potentially rude. Lacks professionalism, gratitude, and doesn't maintain relationships.

Iteration 2:
Solution: "Thank you for the offer. After careful consideration, I've decided to pursue another opportunity that better aligns with my career goals. I appreciate the time you invested in the process."
Critique: Rating 7/10. Much better tone and professionalism. Could add more specific gratitude and leave door open for future opportunities.

Iteration 3:
Solution: "Thank you so much for extending the offer for the [Position] role at [Company]. I truly appreciate the time you and your team invested in getting to know me throughout the interview process. After careful consideration, I've decided to accept another position that more closely aligns with my long-term career objectives. I was impressed by [Company] and hope we might have the opportunity to work together in the future."
Critique: Rating 9/10. Excellent professional tone, specific gratitude, maintains relationships.
```
Strengths:
- Improves output quality through iteration
- Self-correcting behavior
- Works well for creative and analytical tasks
- Can catch and fix its own mistakes
Weaknesses:
- High latency due to multiple LLM calls
- Expensive (3x+ the cost of single attempts)
- May over-optimize or get stuck in loops
- Critique quality depends on model capability
Reflection loops are ideal for high-stakes outputs where quality matters more than speed—writing, analysis, code review, and creative tasks.
### Multi-Agent Systems
Multi-agent systems coordinate multiple specialized agents to handle complex tasks. Each agent has specific capabilities and they communicate to achieve shared goals.
```python
class MultiAgentSystem:
    def __init__(self):
        self.agents = {
            'researcher': ResearchAgent(),
            'writer': WritingAgent(),
            'critic': CriticAgent(),
            'coordinator': CoordinatorAgent()
        }
        self.shared_memory = {}

    def execute_task(self, task):
        # Coordinator decides task breakdown
        plan = self.agents['coordinator'].create_plan(task)
        for step in plan.steps:
            agent_name = step.assigned_agent
            agent = self.agents[agent_name]
            # Execute step with access to shared memory
            result = agent.execute(step.instruction, self.shared_memory)
            # Update shared memory
            self.shared_memory[step.output_key] = result
            # Allow other agents to react/provide feedback
            if step.requires_review:
                feedback = self.agents['critic'].review(result)
                if feedback.needs_revision:
                    result = agent.revise(result, feedback.suggestions)
                    self.shared_memory[step.output_key] = result
        return self.agents['coordinator'].synthesize_results(self.shared_memory)

class ResearchAgent:
    def execute(self, instruction, shared_memory):
        # Specialized for information gathering
        return self.search_and_analyze(instruction)

class WritingAgent:
    def execute(self, instruction, shared_memory):
        # Specialized for content creation
        research_data = shared_memory.get('research_results', '')
        return self.write_content(instruction, research_data)
```
Example Multi-Agent Workflow:
```
Task: Create a comprehensive market analysis report for electric vehicles

Coordinator Plan:
1. Researcher: Gather market data, competitor analysis, trends
2. Researcher: Collect regulatory information and policy impacts
3. Writer: Create executive summary based on research
4. Writer: Write detailed analysis sections
5. Critic: Review report for accuracy and completeness
6. Writer: Revise based on feedback
7. Coordinator: Compile final report

Execution:
Researcher → Gathers EV market data ($X billion market, Y% growth)
Researcher → Finds policy info (tax incentives, emission regulations)
Writer → Creates executive summary using research data
Critic → Reviews: "Missing competitive positioning analysis"
Writer → Adds competitive analysis section
Coordinator → Compiles polished final report
```
Strengths:
- Handles very complex, multi-faceted tasks
- Agents can specialize and improve independently
- Parallel execution possible for some tasks
- Natural division of labor
Weaknesses:
- High complexity to implement and debug
- Coordination overhead and potential conflicts
- Expensive due to multiple agent calls
- Communication protocols can become complex
Multi-agent systems work best for complex projects requiring diverse skills—research reports, software development, content creation pipelines, and collaborative analysis.
## Implementation Considerations

### Context Management

All agent patterns must handle context growth as conversations extend. Long-running agents can exceed context windows quickly.
```python
class ContextManager:
    def __init__(self, max_tokens=8000):
        self.max_tokens = max_tokens
        self.conversation_history = []

    def add_interaction(self, thought, action, observation):
        interaction = {
            'thought': thought,
            'action': action,
            'observation': observation,
            'tokens': estimate_tokens(thought + action + observation)
        }
        self.conversation_history.append(interaction)
        self._trim_if_needed()

    def _trim_if_needed(self):
        total_tokens = sum(i['tokens'] for i in self.conversation_history)
        if total_tokens > self.max_tokens:
            # Keep recent interactions, summarize older ones
            recent = self.conversation_history[-5:]  # Keep last 5
            older = self.conversation_history[:-5]
            summary = self._summarize_interactions(older)
            self.conversation_history = [{'summary': summary}] + recent
```
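`estimate_tokens` above is left undefined. As a stand-in, a crude characters-per-token heuristic is often good enough for trimming decisions; the four-characters-per-token ratio is an assumption for English prose, and real tokenizers vary:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)
```

For exact counts, swap in the tokenizer of the model you are actually calling.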
### Error Handling

Production agents need robust error handling for tool failures, API timeouts, and invalid responses.
```python
import time

def robust_tool_call(tool_name, params, max_retries=3):
    # `tools`, `ToolTimeout`, and `ToolError` are assumed to be defined elsewhere
    for attempt in range(max_retries):
        try:
            return tools[tool_name](**params)
        except ToolTimeout:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            return f"Tool {tool_name} timed out after {max_retries} attempts"
        except ToolError as e:
            return f"Tool error: {str(e)}"
        except Exception as e:
            if attempt < max_retries - 1:
                continue
            return f"Unexpected error: {str(e)}"
```
### Cost Optimization

Agent architectures can be expensive due to multiple LLM calls. Optimize by caching, using smaller models for simple tasks, and batching when possible.
```python
class CostOptimizedAgent:
    def __init__(self):
        self.cache = {}
        self.small_model = "gpt-3.5-turbo"  # For simple tasks
        self.large_model = "gpt-4"          # For complex reasoning

    def choose_model(self, task_complexity):
        if task_complexity < 0.5:
            return self.small_model
        return self.large_model

    def cached_call(self, prompt, model):
        cache_key = hash(prompt + model)
        if cache_key in self.cache:
            return self.cache[cache_key]
        result = llm.generate(prompt, model=model)
        self.cache[cache_key] = result
        return result
```
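The hand-rolled cache above can also be expressed with the standard library's `functools.lru_cache`, which adds bounded size and eviction for free. A sketch, where `cached_generate` and its canned response are illustrative stand-ins for a real model call:

```python
from functools import lru_cache

call_count = {"n": 0}  # track how often the "model" is actually invoked

@lru_cache(maxsize=1024)
def cached_generate(prompt: str, model: str) -> str:
    call_count["n"] += 1
    return f"[{model}] response to: {prompt}"  # stand-in for llm.generate

first = cached_generate("summarize the report", "small-model")
second = cached_generate("summarize the report", "small-model")  # cache hit
```

Note that `lru_cache` requires hashable arguments, so structured tool parameters must be serialized before caching.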
## When to Use Which Pattern

### Decision Framework
```mermaid
graph TD
    A[Agent Task] --> B{Number of Steps?}
    B -->|1-3 steps| C{Quality Critical?}
    B -->|4-10 steps| D{Complex Planning Needed?}
    B -->|10+ steps| E{Multiple Specializations?}
    C -->|No| F[ReAct]
    C -->|Yes| G[Reflection Loop]
    D -->|Yes| H[Plan-and-Execute]
    D -->|No| F
    E -->|Yes| I[Multi-Agent System]
    E -->|No| H
```
### Task-Pattern Mapping
| Task Type | Best Pattern | Why |
|---|---|---|
| Simple Q&A with tools | ReAct | Direct tool use, minimal steps |
| Data analysis | Reflection Loop | Quality matters, iterative improvement |
| Complex workflows | Plan-and-Execute | Multi-step coordination needed |
| Creative projects | Multi-Agent | Diverse skills (research, writing, critique) |
| Code generation | Reflection Loop | Quality critical, self-debugging |
| Research tasks | Plan-and-Execute | Structured information gathering |
| Real-time chat | ReAct | Low latency requirements |
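The decision framework above can be collapsed into a small routing helper. This is a sketch of this section's heuristics, not a standard API; the trait names and thresholds are the ones used in the diagram:

```python
def choose_pattern(num_steps: int, quality_critical: bool = False,
                   needs_planning: bool = False, multi_skill: bool = False) -> str:
    """Map task traits to the simplest adequate pattern (mirrors the decision graph)."""
    if num_steps <= 3:
        return "Reflection Loop" if quality_critical else "ReAct"
    if num_steps <= 10:
        return "Plan-and-Execute" if needs_planning else "ReAct"
    return "Multi-Agent System" if multi_skill else "Plan-and-Execute"
```

In practice the trait estimates themselves may come from a cheap classifier call before the main agent runs.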
### Performance Characteristics
| Pattern | Latency | Cost | Quality | Complexity |
|---|---|---|---|---|
| ReAct | Low | Low | Medium | Low |
| Plan-and-Execute | Medium | Medium | High | Medium |
| Reflection Loop | High | High | Very High | Medium |
| Multi-Agent | Very High | Very High | Very High | Very High |
## Common Challenges

### Tool Selection and Routing

As agents gain access to more tools, choosing the right tool becomes critical. Poor tool selection leads to inefficient workflows and errors.
```python
class ToolRouter:
    def __init__(self, tools):
        self.tools = tools
        self.tool_descriptions = {
            name: tool.description for name, tool in tools.items()
        }

    def format_tool_descriptions(self):
        return "\n".join(f"- {name}: {desc}"
                         for name, desc in self.tool_descriptions.items())

    def select_tool(self, task, context):
        prompt = f"""
        Task: {task}
        Context: {context}
        Available tools:
        {self.format_tool_descriptions()}
        Which tool is most appropriate? Respond with just the tool name.
        """
        selected = llm.generate(prompt).strip()
        if selected not in self.tools:
            # Fallback to similarity matching
            selected = self.find_most_similar_tool(task)
        return selected
```
### Loop Detection and Prevention

Agents can get stuck in loops, repeatedly trying the same failed action. Implement loop detection and circuit breakers.
```python
class LoopDetector:
    def __init__(self, max_repeats=3):
        self.action_history = []
        self.max_repeats = max_repeats

    def check_action(self, action):
        self.action_history.append(action)
        # Keep only recent history
        if len(self.action_history) > 10:
            self.action_history = self.action_history[-10:]
        # Count recent repeats
        recent_actions = self.action_history[-self.max_repeats:]
        if len(set(recent_actions)) == 1 and len(recent_actions) == self.max_repeats:
            return False  # Loop detected
        return True  # Action allowed
```
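The exact-repeat check above misses alternating loops (tool A, tool B, tool A, tool B), which real agents also fall into. A sketch of a period-based variant that catches both; the `max_period` cutoff is an arbitrary choice:

```python
def detect_cycle(history, max_period=3):
    """True if the tail of `history` repeats with some period <= max_period."""
    actions = list(history)
    for period in range(1, max_period + 1):
        if len(actions) >= 2 * period:
            tail = actions[-2 * period:]
            # A cycle of this period means the last two windows are identical
            if tail[:period] == tail[period:]:
                return True
    return False
```

On detection, a circuit breaker might force a replanning step or return a partial answer rather than burn further tool calls.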
### State Management Across Steps

Complex agents need to maintain state across multiple interactions while avoiding context window overflow.
```python
class AgentState:
    def __init__(self):
        self.working_memory = {}   # Current task context
        self.episodic_memory = []  # Past interactions
        self.semantic_memory = {}  # Learned facts/patterns

    def update_working_memory(self, key, value):
        self.working_memory[key] = value
        # Prevent memory overflow
        if len(str(self.working_memory)) > 4000:  # ~1000 tokens
            self._compress_working_memory()

    def _compress_working_memory(self):
        # Summarize older entries
        summary = llm.generate(f"Summarize key points: {self.working_memory}")
        self.working_memory = {'summary': summary}
```
## Appendix: Additional Patterns

### Memory Patterns

Episodic Memory - Store and retrieve past experiences for learning and context.
```python
import time

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def store_episode(self, situation, action, outcome, success):
        episode = {
            'situation': situation,
            'action': action,
            'outcome': outcome,
            'success': success,
            'timestamp': time.time()
        }
        self.episodes.append(episode)

    def retrieve_similar(self, current_situation, k=3):
        # Use embedding similarity to find relevant past episodes
        similarities = []
        for episode in self.episodes:
            sim = cosine_similarity(
                embed(current_situation),
                embed(episode['situation'])
            )
            similarities.append((sim, episode))
        # Sort on the score alone; comparing the episode dicts on ties would raise
        similarities.sort(key=lambda pair: pair[0], reverse=True)
        return similarities[:k]
```
Working Memory - Manage short-term context and attention.
```python
class WorkingMemory:
    def __init__(self, capacity=7):  # Miller's magic number
        self.items = []
        self.capacity = capacity

    def add_item(self, item, importance=1.0):
        self.items.append({'content': item, 'importance': importance})
        if len(self.items) > self.capacity:
            # Remove least important item
            self.items.sort(key=lambda x: x['importance'])
            self.items = self.items[1:]

    def get_context(self):
        return [item['content'] for item in self.items]
```
### Advanced Reasoning Patterns
Tree of Thoughts - Explore multiple reasoning paths simultaneously.
```python
def tree_of_thoughts(problem, depth=3, breadth=3):
    class ThoughtNode:
        def __init__(self, thought, parent=None):
            self.thought = thought
            self.parent = parent
            self.children = []
            self.value = None

    root = ThoughtNode("Initial problem analysis")

    def expand_node(node, current_depth):
        if current_depth >= depth:
            return
        # Generate multiple next thoughts
        thoughts = llm.generate(f"""
        Problem: {problem}
        Current reasoning: {node.thought}
        Generate {breadth} different next reasoning steps:
        """).split('\n')
        for thought in thoughts[:breadth]:
            child = ThoughtNode(thought.strip(), node)
            node.children.append(child)
            expand_node(child, current_depth + 1)

    expand_node(root, 0)

    # Evaluate all leaf nodes and backpropagate
    def evaluate_path(node):
        if not node.children:  # Leaf node
            path = []
            current = node
            while current:
                path.append(current.thought)
                current = current.parent
            score = llm.generate(f"""
            Problem: {problem}
            Reasoning path: {' -> '.join(reversed(path))}
            Rate this reasoning path 1-10:
            """)
            return float(score.strip())
        return max(evaluate_path(child) for child in node.children)

    return evaluate_path(root)
```
### Control Flow Patterns
State Machines - Explicit state management for complex agent behavior.
```python
class AgentStateMachine:
    def __init__(self):
        self.state = 'IDLE'
        self.transitions = {
            'IDLE': ['PLANNING', 'RESPONDING'],
            'PLANNING': ['EXECUTING', 'REPLANNING'],
            'EXECUTING': ['PLANNING', 'COMPLETED', 'ERROR'],
            'ERROR': ['PLANNING', 'COMPLETED'],
            'COMPLETED': ['IDLE']
        }

    def transition(self, new_state, context=None):
        if new_state in self.transitions[self.state]:
            old_state = self.state
            self.state = new_state
            self._on_state_change(old_state, new_state, context)
        else:
            raise ValueError(f"Invalid transition from {self.state} to {new_state}")

    def _on_state_change(self, old_state, new_state, context):
        # Handle state-specific logic
        if new_state == 'PLANNING':
            self._create_plan(context)
        elif new_state == 'EXECUTING':
            self._execute_current_step()
```
Behavior Trees - Hierarchical decision structures from game AI.
```python
class BehaviorNode:
    def execute(self, context):
        raise NotImplementedError

class SequenceNode(BehaviorNode):
    def __init__(self, children):
        self.children = children

    def execute(self, context):
        for child in self.children:
            result = child.execute(context)
            if result != 'SUCCESS':
                return result
        return 'SUCCESS'

class SelectorNode(BehaviorNode):
    def __init__(self, children):
        self.children = children

    def execute(self, context):
        for child in self.children:
            result = child.execute(context)
            if result == 'SUCCESS':
                return result
        return 'FAILURE'

class ActionNode(BehaviorNode):
    def __init__(self, action_func):
        self.action_func = action_func

    def execute(self, context):
        return self.action_func(context)
```
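Composed, selectors give fallback behavior for free. This standalone sketch (node classes repeated inline so it runs on its own; `check_cache` and `fetch` are hypothetical actions) tries a cached answer first and falls back to a fetch only on a miss:

```python
class SequenceNode:
    def __init__(self, children):
        self.children = children
    def execute(self, context):
        for child in self.children:
            result = child.execute(context)
            if result != 'SUCCESS':
                return result
        return 'SUCCESS'

class SelectorNode:
    def __init__(self, children):
        self.children = children
    def execute(self, context):
        for child in self.children:
            if child.execute(context) == 'SUCCESS':
                return 'SUCCESS'
        return 'FAILURE'

class ActionNode:
    def __init__(self, action_func):
        self.action_func = action_func
    def execute(self, context):
        return self.action_func(context)

def check_cache(context):  # hypothetical: succeed only if an answer is cached
    return 'SUCCESS' if 'cached_answer' in context else 'FAILURE'

def fetch(context):        # hypothetical: stand-in for a real tool call
    context['answer'] = 'fresh data'
    return 'SUCCESS'

# Selector tries children in order, stopping at the first success
tree = SelectorNode([ActionNode(check_cache), ActionNode(fetch)])
context = {}
status = tree.execute(context)  # cache miss, so fetch runs
```

Because the selector short-circuits, a warm cache skips the fetch action entirely, which is exactly the fallback semantics described above.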
These additional patterns provide specialized solutions for complex agent behaviors, but the four core patterns (ReAct, Plan-and-Execute, Reflection Loops, Multi-Agent) handle the majority of production use cases. Choose the simplest pattern that meets your requirements, and consider these advanced patterns only when the core patterns prove insufficient.