Understanding Agentic AI Memory

Learn how memory enables AI agents to retain context, recall past interactions, access knowledge, and execute complex tasks across sessions. Explore working, episodic, semantic, procedural, retrieval, and shared memory patterns used in modern agentic AI systems.

Types of Agentic Memory 🧠

Agentic memory is typically divided into several categories based on what is stored, how long it is retained, and how it is used during reasoning.

Memory Hierarchy in Agentic Systems

graph TD
    A[Agent Memory]

    A --> B["Short-Term Memory 💾"]
    A --> C["Long-Term Memory 🛢"]

    C --> D["Episodic Memory 📅"]
    C --> E["Semantic Memory 📚"]
    C --> F["Procedural Memory 🚗"]

    D --> G["Retrieval Layer 📁"]
    E --> G
    F --> G

    G --> H["Vector Database ↗️"]

The Three Memory Types for an Agent

Dimension	Short-term 💾	Long-term 🛢	RAG 📁
Location	`Context window` (in-model attention)	External memory store (`DB`, `KV store`, `graph`, profile store)	External `vector DB`, `search index`, documents
Scope	Current conversation/session	Cross-session user, agent, or workflow memory	Knowledge corpus (documents, manuals, code, web pages)
Capacity	Limited by context length	Small to medium (important facts, summaries, preferences)	Effectively unlimited
Access	Automatic, always attended	Explicit read/write operations	Retrieval at inference time
Latency	Zero additional latency	One or more storage lookups	Embedding + retrieval latency
Persistence	Volatile: Lost when context expires	Persistent	Persistent
Typical Content	Current reasoning state, recent messages	User preferences, agent experiences, task history	Reference knowledge and source documents
Update Frequency	Every turn	Event-driven writes	Re-indexing / document updates
Staleness Risk	None while in context	Can become outdated	Depends on corpus freshness
Failure Mode	Context overflow	Memory not written or recalled	Retrieval misses, poor chunking, low recall

Real-World Agent Memory Usage

graph TD
    A["User Query ❓"]

    A --> B["Working Memory 💾"]

    B --> C["Reasoning Engine 🧠"]

    C --> D["Retrieval Layer 📁"]

    D --> E["Vector Database ↗️"]
    D --> F["Knowledge Graph 🔗"]
    D --> G["Document Store 🗃"]

    E --> H["Semantic Memory  📚"]
    F --> H
    G --> H

    C --> I["Episodic Memory 📅"]
    C --> J["Procedural Memory 🚗"]

    H --> C
    I --> C
    J --> C

    C --> K["Tool Calls 🧰"]
    C --> L["Agent Response 💬"]

Details of Agentic Memory Types

1. Short-Term Memory (`STM`) 💾 : Working Memory

Stores information needed for the current task or conversation.

STM is everything currently inside the model's context window.
It is limited by the model's context length, typically 8K to 200K+ tokens depending on the model.

Characteristics

Session-scoped: it resets entirely when the session ends
Zero lookup latency: Fast access
Temporary: An agent with only short-term memory forgets everything between sessions.
Usually stored in the context window

Use Cases

Chatbots
Task execution
Multi-turn conversations

Examples

User: Book a flight to Berlin

Memory:
- Departure: Munich
- Date: 15 July
- Airline preference: Lufthansa
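
The context-length limit above can be illustrated with a small buffer that evicts the oldest messages once a token budget is exceeded. This is a minimal sketch, not how any particular framework manages context: the `WorkingMemory` class is hypothetical, and counting one token per whitespace-separated word is a simplifying assumption (real tokenizers differ).

```python
from collections import deque


class WorkingMemory:
    """Keeps the most recent messages that fit a fixed token budget."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.messages = deque()

    @staticmethod
    def count_tokens(text):
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def total_tokens(self):
        return sum(self.count_tokens(m) for m in self.messages)

    def add(self, message):
        self.messages.append(message)
        # Evict the oldest messages until the buffer fits the budget again.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()

    def context(self):
        # The text that would be placed in the model's context window.
        return "\n".join(self.messages)


memory = WorkingMemory(max_tokens=12)
memory.add("User: Book a flight to Berlin")
memory.add("Departure: Munich")
memory.add("Date: 15 July")
memory.add("Airline preference: Lufthansa")  # pushes the first message out
```

With a budget of 12 word-tokens, the fourth message forces the original user request out of the buffer, which is the kind of context overflow the comparison table lists as the short-term failure mode.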

2. Long-Term Memory (LTM) 🛢

Long-term memory is an external store that survives across sessions.

The agent must decide, mid-session, which pieces of the current context are worth persisting.

Common strategies are:

Summarise the conversation at session end and store the summary.
Extract structured facts (name, preference, decision) as key-value pairs.
Store the raw history up to a depth limit and evict older entries.
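
Two of the strategies above (key-value facts and a depth-limited raw history) can be sketched as follows. The `LongTermMemory` class and its method names are hypothetical, and the JSON string stands in for whatever external store is actually used.

```python
import json
from collections import deque


class LongTermMemory:
    """Key-value facts plus a bounded raw history, serialisable to JSON."""

    def __init__(self, history_depth):
        self.history_depth = history_depth
        self.facts = {}
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self.history = deque(maxlen=history_depth)

    def remember_fact(self, key, value):
        self.facts[key] = value

    def append_history(self, entry):
        self.history.append(entry)

    def to_json(self):
        # Assumption: a JSON blob is what gets written to the external store.
        return json.dumps({
            "history_depth": self.history_depth,
            "facts": self.facts,
            "history": list(self.history),
        })

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        restored = cls(data["history_depth"])
        restored.facts = data["facts"]
        restored.history.extend(data["history"])
        return restored


ltm = LongTermMemory(history_depth=2)
ltm.remember_fact("airline_preference", "Lufthansa")
for turn in ["turn 1", "turn 2", "turn 3"]:
    ltm.append_history(turn)

# A later session restores the memory from the stored payload.
next_session = LongTermMemory.from_json(ltm.to_json())
```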

LTM is typically stored externally in:

Vector DB
Knowledge Graph
SQL Database
Object storage, e.g. S3

Examples

User preferences
Historical interactions
Enterprise documents
Learned workflows

LTM is an umbrella category that includes:

Episodic memory
Semantic memory
Procedural memory

2.1 Episodic Memory : Past Experiences 📅

Stores past experiences and interactions.

Inspired by human memory.

Characteristics

Experience-based
Time-oriented
Helps personalize the user experience
Learns from previous interactions

Use Cases

Personal assistants
Customer support agents
Learning agents

Examples

Episode:
User asked about Kubernetes
Agent recommended EKS
User preferred self-hosted cluster

Later:

Agent remembers:
"Last time you preferred self-managed Kubernetes."

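The episode above could be recorded and recalled roughly like this. It is a sketch under stated assumptions: the `Episode` and `EpisodicMemory` names, the exact-match topic lookup, and the dates are all illustrative, and a real system would more likely recall episodes by similarity search.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Episode:
    timestamp: datetime
    topic: str
    summary: str


class EpisodicMemory:
    """Time-ordered log of past interactions, recalled by topic."""

    def __init__(self):
        self.episodes = []

    def record(self, timestamp, topic, summary):
        self.episodes.append(Episode(timestamp, topic, summary))

    def recall(self, topic):
        # Return matching episodes, most recent first.
        matches = [e for e in self.episodes if e.topic == topic]
        return sorted(matches, key=lambda e: e.timestamp, reverse=True)


episodic = EpisodicMemory()
# Hypothetical dates, chosen only to give the episodes an order.
episodic.record(datetime(2026, 5, 1), "kubernetes", "Agent recommended EKS")
episodic.record(datetime(2026, 5, 2), "kubernetes", "User preferred self-hosted cluster")

latest = episodic.recall("kubernetes")[0]
```
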
2.2 Semantic Memory : Knowledge 📚

Stores facts and knowledge.

Characteristics

Fact-based
Not tied to specific events
Usually stored in:
- Vector databases
- Knowledge graphs
- Knowledge bases

Use Cases

RAG systems
Enterprise knowledge assistants
Search agents

Examples

Berlin is the capital of Germany.

Docker is a container platform.

TensorRT optimizes deep learning inference on NVIDIA GPUs.

Episodic vs Semantic Memory

Episodic	Semantic
"User asked about Java yesterday"	"Java was released in 1995"
Experience	Fact
Event-based	Knowledge-based
Personalized	General knowledge

2.3 Procedural Memory 🚗 : Process / Skill

Stores how to perform tasks.

Characteristics

Skill-based
Process-oriented
Often encoded as workflows or plans

Use Cases

Workflow automation
Multi-agent orchestration
Business process agents

Examples

How to deploy Kubernetes:

1. Create cluster
2. Configure networking
3. Deploy workloads
4. Monitor health

Workflow:
Validate input
→ Query API
→ Format result
→ Return response
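
The workflow above can be stored as an ordered list of callables and replayed on demand. This is only a sketch: the `weather_lookup` procedure, its payload fields, and the stubbed API result are hypothetical.

```python
def validate_input(request):
    if not request.get("city"):
        raise ValueError("city is required")
    return request


def query_api(request):
    # Hypothetical stand-in for a real API call; returns a fixed value.
    return {"city": request["city"], "temp_c": 21}


def format_result(data):
    return f"{data['city']}: {data['temp_c']} C"


# Procedural memory: named procedures mapped to their ordered steps.
PROCEDURES = {
    "weather_lookup": [validate_input, query_api, format_result],
}


def run_procedure(name, payload):
    """Run each step in order, feeding every step the previous step's output."""
    result = payload
    for step in PROCEDURES[name]:
        result = step(result)
    return result  # the "Return response" step


response = run_procedure("weather_lookup", {"city": "Berlin"})
```

Keeping the steps as data rather than hard-coded control flow means an agent can add, reorder, or reuse procedures without changing the runner.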

3. Retrieval Memory 📁 : RAG

Retrieval memory treats a large external corpus as memory that is too big to fit in the context window. Instead of loading it all, the agent queries it and injects only the relevant chunks at inference time.

Memory is retrieved dynamically when needed.

RAG + Agents

RAG systems provide external memory.

Agentic systems add:

Planning
Reasoning
Dynamic retrieval
Adaptive execution

A modern research agent may:

Generate search queries
Retrieve documents
Rank relevance
Summarize findings
Detect knowledge gaps
Retrieve additional context
Revise conclusions

This creates recursive information acquisition loops.
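
A toy version of such a loop is sketched below. Everything in it is an assumption made for illustration: the two-entry `CORPUS`, the exact-key `retrieve` lookup, and the gap detector, which simply treats any mentioned but not yet retrieved corpus key as a knowledge gap. A real agent would use an LLM for query generation and gap detection.

```python
# Hypothetical mini-corpus keyed by topic.
CORPUS = {
    "eks": "EKS is a managed Kubernetes service; see also Fargate.",
    "fargate": "Fargate runs containers without managing servers.",
}


def retrieve(query):
    return CORPUS.get(query.lower())


def detect_gaps(text, seen):
    """Topics mentioned in the text that have not been retrieved yet."""
    words = {w.strip(".,;").lower() for w in text.split()}
    return sorted(k for k in CORPUS if k in words and k not in seen)


def research(topic, max_rounds=3):
    pending, findings = [topic], {}
    for _ in range(max_rounds):
        if not pending:
            break  # no knowledge gaps left
        next_pending = []
        for query in pending:
            key = query.lower()
            if key in findings:
                continue
            doc = retrieve(query)
            if doc is None:
                continue
            findings[key] = doc
            # Follow-up queries come from gaps found in what was just read.
            next_pending.extend(detect_gaps(doc, findings))
        pending = next_pending
    return findings


findings = research("EKS")
```

The `max_rounds` cap is the important design choice: without a bound, a recursive acquisition loop can keep issuing queries indefinitely.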

A RAG pipeline has two separable phases:

1. Indexing (offline)

Documents are chunked into passages (typically 200–500 tokens).
Each chunk is passed through an embedding model to produce a dense vector.
All vectors are stored in a vector database or index (Pinecone, Chroma, FAISS, etc.).

2. Retrieval (online, per query)

The user's query (or a reformulated version of it) is embedded using the same embedding model.
The vector database performs approximate nearest-neighbour (ANN) search to find the top-k most semantically similar chunks.
These chunks are injected into the context window — they become short-term memory for this inference step.
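
Both phases can be sketched end to end with the standard library. Note the substitutions: `embed` is a toy bag-of-words counter rather than a real embedding model, and `retrieve` does an exact scan rather than ANN search, so it only matches on shared words. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size):
    """Split text into passages of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    # Toy embedding (assumption): sparse word counts, not a dense vector.
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(documents, chunk_size):
    # Phase 1, indexing (offline): chunk, embed, store.
    return [(chunk, embed(chunk))
            for doc in documents
            for chunk in chunk_text(doc, chunk_size)]


def retrieve(index, query, k):
    # Phase 2, retrieval (online): embed the query the same way, rank chunks.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


documents = [
    "User prefers Java and Spring Boot.",
    "Berlin is the capital of Germany.",
    "Docker is a container platform.",
]
index = build_index(documents, chunk_size=50)
top = retrieve(index, "Does the user prefer Java?", k=1)
```

Because the toy embedding is lexical, a query such as "What's my favorite programming language?" would not match the stored preference here; closing that gap is what a learned dense embedding model is for.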

graph TD
    A[User Query]
    B[Generate Embedding]
    C["Vector Database ↗️"]
    D[Similarity Search]
    E[Top-K Retrieved Memories]
    F[Context Assembly]
    G[LLM Agent]
    H[Response]

    A --> B
    B --> D
    D --> C
    C --> E
    E --> F
    A --> F
    F --> G
    G --> H

Example

User:
What's my favorite programming language?

Retrieve:
"User prefers Java and Spring Boot."

This is the foundation of most RAG-based agents.

4. Shared Memory 🗂 : Multi-Agent Systems

Used when multiple agents collaborate.

All agents can access and update the same state.


graph TD
    User --> Orchestrator

    SharedMemory[("Shared Memory 🗂 <br/>Redis / Vector DB")]
    ResearchAgent["Research Agent 🤖"]
    CodingAgent["Coding Agent 🤖"]
    ValidationAgent["Validation Agent 🤖"]

    Orchestrator --> ResearchAgent
    Orchestrator --> CodingAgent
    Orchestrator --> ValidationAgent

    ResearchAgent --> SharedMemory
    CodingAgent --> SharedMemory
    ValidationAgent --> SharedMemory

Example

{
  "customer_id": 123,
  "issue": "payment failure",
  "status": "investigating"
}
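
A shared state like the one above needs coordinated writes when several agents update it. The sketch below uses an in-process dictionary guarded by a lock as a stand-in for Redis or another shared store; the `SharedMemory` class, the agent names, and the `last_updated_by` field are hypothetical.

```python
import threading


class SharedMemory:
    """In-process stand-in for a shared store that all agents read and write."""

    def __init__(self):
        self._state = {}
        self._lock = threading.Lock()

    def update(self, agent, **fields):
        # The lock keeps concurrent agent writes from interleaving.
        with self._lock:
            self._state.update(fields)
            self._state["last_updated_by"] = agent

    def snapshot(self):
        # Return a copy so callers cannot mutate the shared state directly.
        with self._lock:
            return dict(self._state)


shared = SharedMemory()
shared.update("research_agent", customer_id=123, issue="payment failure")
shared.update("validation_agent", status="investigating")
view = shared.snapshot()
```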

5. Knowledge Graph (KG)

A knowledge graph is a directed, labelled graph in which every fact is stored as a triple:

(subject, predicate, object)

Nodes = entities
Edges = relationships

Example

Hitesh ── lives_in ──> Munich
Hitesh ── interested_in ──> AWS
Hitesh ── preparing_for ──> NVIDIA Agentic AI Certification

The KG holds the structured relational world model — the stable, interconnected facts about entities and their relationships.

Subject	Predicate	Object
Hitesh	lives_in	Munich
Munich	located_in	Germany
AWS	provides	Bedrock

This allows agents to perform multi-hop reasoning such as:

"Which cloud technologies is Hitesh likely interested in?"

Querying

A knowledge graph answers this by traversal: the query is expressed as a path pattern, and the graph engine follows typed edges to find the answer.

The result is not a chunk of text but a structured subgraph: a set of entity–relation triples that can be serialised into natural language and injected into the agent's context as grounded, auditable facts.

Hitesh
  -> preparing_for -> NVIDIA Agentic AI
  -> interested_in -> AWS
  -> works_as -> Software Engineer
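
Traversal over triples can be sketched with a plain list and a path of predicates. The triples come from the examples in this section; the `objects` and `follow_path` helpers are hypothetical, and a real graph engine would use an indexed store and a query language instead of a linear scan.

```python
TRIPLES = [
    ("Hitesh", "lives_in", "Munich"),
    ("Munich", "located_in", "Germany"),
    ("Hitesh", "interested_in", "AWS"),
    ("AWS", "provides", "Bedrock"),
]


def objects(subject, predicate):
    """All objects reachable from subject over one edge of the given type."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow_path(start, predicates):
    """Follow a chain of typed edges; return end nodes and the hops taken."""
    frontier, hops = [start], []
    for predicate in predicates:
        next_frontier = []
        for node in frontier:
            for obj in objects(node, predicate):
                hops.append((node, predicate, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return frontier, hops


# Two-hop query: what do the platforms Hitesh is interested in provide?
answers, subgraph = follow_path("Hitesh", ["interested_in", "provides"])
```

The returned `subgraph` is the auditable part: each hop is an explicit triple that can be shown alongside the answer.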

Usage

The KG sits alongside, not instead of, the other memory types.

Short-term context holds the live reasoning trace.
Long-term key-value memory holds user-specific episodic facts.

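How a KG Outperforms RAG

RAG retrieves text chunks by vector similarity: it finds passages that are semantically close to the query.

This works well for factual lookup ("what does the paper say about attention?") but tends to fail for relational reasoning across multiple hops ("who is the parent company of Ada's employer's acquirer?"), because no single chunk is likely to contain the whole chain of relationships. A knowledge graph answers such questions by following one typed edge per hop.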

Knowledge Graph vs RAG

The graph answers "what is connected to what?"

The RAG system answers "show me the detailed documentation."

Feature	Knowledge Graph	RAG
Stores	Facts & relationships	Documents & chunks
Retrieval	Graph traversal	Semantic similarity
Best for	Multi-hop reasoning	Knowledge lookup
Explainability	High	Medium
Handles relationships	Excellent	Indirect
Handles unstructured text	Poor	Excellent

Stateful Orchestration for Multi-step Tasks

Stateful orchestration means the orchestrator maintains a persistent, serialisable state object that captures the full execution context.

Problem with Stateless Orchestration

A stateless orchestrator would:

Re-derive everything from scratch on every LLM call.
Have no memory of which steps have run, what results came back, or where it failed.
Break on any task longer than a single context window.
Be unable to resume a failed step or show its progress.
Be unable to hand control back to a human safely mid-task.

With State Object

At any moment you can pause, kill, resume, or audit the task by reading that object.

The agent stores this state in a persistent database:


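  state = {
    "goal": "original user intent",
    # Ordered step names; results are keyed by the same names.
    "plan": [
      "step_1",
      "step_2",
      "step_3"
    ],
    "current_step": 2,     # 0-based index into plan
    "results": {},         # outputs of completed steps, keyed by step name
    "errors": {
      "error": [],         # reason for each failed attempt
      "retry_count": 0
    },
    "status": "running"    # one of: running | paused | done | failed
  }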

The State Has Five Jobs

1. Durable memory across steps.

When step 2 needs the output of step 1, it reads it from state.results["step_1"] — not from the context window, which may have been truncated.

This decouples step execution from context length limits.

2. Step pointer and plan.

The current_step field is the index into plan[]. After each step completes, the orchestrator increments the pointer and persists the new state before dispatching the next step.

This means a crash mid-execution loses at most one step — on restart, the orchestrator reads the state, sees the pointer, and resumes exactly where it left off.
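
The persist-then-advance behaviour can be sketched with a JSON file standing in for the state store. The file layout and the `save_state`, `load_state`, and `complete_step` helpers are assumptions for illustration; writing to a temporary file and swapping it in with `os.replace` is one way to avoid leaving a half-written state behind if the process dies mid-write.

```python
import json
import os
import tempfile


def save_state(path, state):
    # Write to a temp file first, then swap it in, so a crash mid-write
    # does not leave a truncated state file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_state(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def complete_step(path, state, result):
    """Record the result, advance the pointer, and persist before moving on."""
    step_name = state["plan"][state["current_step"]]
    state["results"][step_name] = result
    state["current_step"] += 1
    save_state(path, state)


state_path = os.path.join(tempfile.mkdtemp(), "state.json")
state = {"goal": "demo", "plan": ["step_1", "step_2"],
         "current_step": 0, "results": {}}
save_state(state_path, state)
complete_step(state_path, state, "output of step 1")

# Simulated restart: a fresh process only has the persisted file.
resumed = load_state(state_path)
next_step = resumed["plan"][resumed["current_step"]]
```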

3. Error tracking and retry budget.

The errors object records every failed attempt with its reason (errors.error[]) and the number of retries used (errors.retry_count). The orchestrator checks it before deciding whether to retry, skip, or escalate.

Without this, a retry loop could spin forever or give up too early.

4. Status machine.

The status field (running, paused, done, failed) is the canonical signal for all external parties — a dashboard, a human reviewer, a parent orchestrator — about what this task is doing.

Transitions are always explicit: an orchestrator can only move from running to paused by writing the state, never implicitly.
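
Explicit transitions can be enforced with a small lookup table. The set of allowed transitions below is an assumption (the text names the four statuses but does not say which moves are legal), and the `transition` helper is hypothetical.

```python
# Assumed transition table: done and failed are treated as terminal.
ALLOWED_TRANSITIONS = {
    "running": {"paused", "done", "failed"},
    "paused": {"running"},
    "done": set(),
    "failed": set(),
}


def transition(state, new_status):
    """Change status only along an allowed edge; reject anything else."""
    current = state["status"]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    state["status"] = new_status
    return state


task = {"status": "running"}
transition(task, "paused")   # e.g. waiting for human approval
transition(task, "running")  # approval received, work resumes
```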

5. Checkpoint for human-in-the-loop.

When a step requires human approval, the orchestrator sets status = "paused" and stops.

The state is fully persisted. The human reviews, modifies if needed, and signals approval by updating the state and resuming. No context is lost.

Execution Flow

Load state from store
Decide: which step to run (or are we done / failed?)
Dispatch: call the tool or sub-agent for current_step
Write result into state.results, advance pointer, persist
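
The four steps above, plus a retry budget, can be sketched as a single loop. This is an illustrative sketch rather than a reference implementation: the in-memory `STORE` stands in for a persistent database, tools are plain functions that receive earlier results, and resetting `retry_count` after a successful step is a design choice made here.

```python
import copy

STORE = {}  # hypothetical stand-in for a persistent database


def persist(task_id, state):
    STORE[task_id] = copy.deepcopy(state)


def run(task_id, tools, max_retries=2):
    state = copy.deepcopy(STORE[task_id])                 # 1. load
    while state["status"] == "running":
        if state["current_step"] >= len(state["plan"]):   # 2. decide
            state["status"] = "done"
            persist(task_id, state)
            break
        step = state["plan"][state["current_step"]]
        try:
            result = tools[step](state["results"])        # 3. dispatch
        except Exception as exc:
            state["errors"]["error"].append(f"{step}: {exc}")
            state["errors"]["retry_count"] += 1
            if state["errors"]["retry_count"] > max_retries:
                state["status"] = "failed"                # budget exhausted
            persist(task_id, state)
            continue
        state["results"][step] = result                   # 4. write result,
        state["current_step"] += 1                        #    advance pointer,
        state["errors"]["retry_count"] = 0
        persist(task_id, state)                           #    persist
    return state


calls = {"step_2": 0}


def step_1(results):
    return 2


def step_2(results):
    # Fails on the first attempt to exercise the retry path.
    calls["step_2"] += 1
    if calls["step_2"] == 1:
        raise RuntimeError("temporary outage")
    return results["step_1"] * 10


persist("task-1", {
    "goal": "demo",
    "plan": ["step_1", "step_2"],
    "current_step": 0,
    "results": {},
    "errors": {"error": [], "retry_count": 0},
    "status": "running",
})
final = run("task-1", {"step_1": step_1, "step_2": step_2})
```

Because `step_2` reads `results["step_1"]` from the state rather than from the conversation, the loop keeps working even if earlier context has been truncated.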

flowchart TD
    A([User goal received]) --> B[Initialise state\ngoal · plan · step=0 · status=running]
    B --> C[(Persist state\nto store)]
    C --> D{All steps\ncomplete?}
    D -- No --> E[Load state]
    E --> F[Dispatch current_step\ntool call or sub-agent]
    F --> G{Success?}
    G -- Yes --> H[Write result to\nstate.results]
    H --> I[Advance step pointer]
    I --> J[(Persist state)]
    J --> K{Human approval\nneeded?}
    K -- No --> D
    K -- Yes --> L[Set status=paused\nPersist state]
    L --> M([Human reviews\ninspects · approves · overrides])
    M --> N[Set status=running\nPersist state]
    N --> D
    G -- No --> O{Retry\nbudget left?}
    O -- Yes --> P[Log error\nincrement retry count]
    P --> J
    O -- No --> Q[Set status=failed\nPersist state]
    Q --> R([Escalate or terminate])
    D -- Yes --> S[Set status=done\nPersist final state]
    S --> T([Return aggregated results])

    style A fill:#EEEDFE,stroke:#534AB7,color:#3C3489
    style T fill:#E1F5EE,stroke:#1D9E75,color:#085041
    style R fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
    style M fill:#FAEEDA,stroke:#BA7517,color:#633806
    style C fill:#F1EFE8,stroke:#5F5E5A,color:#444441
    style J fill:#F1EFE8,stroke:#5F5E5A,color:#444441
    style L fill:#F1EFE8,stroke:#5F5E5A,color:#444441
    style N fill:#F1EFE8,stroke:#5F5E5A,color:#444441
    style Q fill:#F1EFE8,stroke:#5F5E5A,color:#444441
    style S fill:#F1EFE8,stroke:#5F5E5A,color:#444441

Stateful orchestration is the right choice when:

The task has more steps than fit in one context window.
Steps take significant wall-clock time (humans, external APIs), so crashes must be recoverable.
Human review is required at intermediate checkpoints.
Steps are expensive enough that re-running from scratch on failure is unacceptable.

Final Words

Choosing the right memory type for the use case is critical.

Which memory type stores facts and knowledge?

Semantic Memory

Which memory type stores previous interactions?

Episodic Memory

Which memory type stores workflows and skills?

Procedural Memory

Which memory type holds current conversation context?

Working (Short-Term) Memory

Which memory type enables personalization from previous user interactions?

Episodic Memory

Written by Hitesh Sahu, a passionate developer and blogger.

Sun May 31 2026
