
Error Analysis for Agentic AI

Learn how to systematically diagnose, measure, and improve failures in Agentic AI systems using error analysis. Discover how traces, component-level evaluations, root cause analysis, and observability help identify bottlenecks and drive continuous improvement in AI agent performance.


Error Analysis for Agentic AI 🔍

Error classification — the first decision

Everything downstream depends on classifying the error correctly, because the wrong classification wastes retries on permanent failures and gives up too early on transient ones.

1. Transient errors ⚠️

Temporary conditions the system will recover from without any change:

Network timeouts
503 Service Unavailable
Temporary resource exhaustion

Handling

These should be retried.

flowchart TD

    Error["Transient Error ⚠️"] --> Backoff["Exponential Backoff with Jitter"]
    Backoff--> Final["Success ✅ or Fail ❌"]

Retries with exponential backoff and jitter

The retry pattern protects against occasional failures.

The formula is:

wait = \min(cap,\ base \times 2^{n}) + \mathrm{random}(0,\ base)

Where:

n = the retry attempt number
base = 1s
cap = 30s
max_retries = 3
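Plugging the example values into the formula makes the schedule concrete. The worked example below assumes n counts retries from 0 and that random(0, base) is drawn uniformly between 0 and base:

```latex
\begin{aligned}
n = 0:&\quad wait = \min(30,\ 1 \times 2^{0}) + \mathrm{random}(0, 1) \in [1, 2]\ \text{s} \\
n = 1:&\quad wait = \min(30,\ 1 \times 2^{1}) + \mathrm{random}(0, 1) \in [2, 3]\ \text{s} \\
n = 2:&\quad wait = \min(30,\ 1 \times 2^{2}) + \mathrm{random}(0, 1) \in [4, 5]\ \text{s}
\end{aligned}
```

With max_retries = 3, the worst-case total wait is therefore about 2 + 3 + 5 = 10 s. The cap never binds with these settings, because base × 2^n first exceeds 30 s at n = 5 (2^5 = 32).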

The min(cap, ...) term keeps the wait from growing without bound.

The + random(...) is jitter — critically important in systems where multiple agents retry simultaneously.

Without jitter, all agents back off to the same interval and then hammer the service in a thundering herd at exactly the same moment.

Jitter spreads the retries out, so the recovering service sees requests arrive gradually instead of all at once.

Caution

Never retry without a budget: max_retries is non-negotiable.

An unbounded retry loop will starve other work and can take down a service more effectively than the original failure did.
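Here is a minimal Python sketch of the formula together with the retry budget. It assumes a hypothetical TransientError exception marks retryable failures. Any other exception is treated as permanent and propagates on the first attempt. The sleep and jitter functions are injectable, so the schedule can be tested without real waiting. The names retry_with_backoff and TransientError are illustrative, not from a specific library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 503s, ...)."""


def retry_with_backoff(
    fn: Callable[[], T],
    base: float = 1.0,
    cap: float = 30.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call fn, retrying TransientError with capped exponential backoff plus jitter.

    Any other exception is treated as permanent and propagates immediately.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_retries:
                raise  # budget exhausted: surface the last transient error
            # wait = min(cap, base * 2^n) + random(0, base)
            delay = min(cap, base * 2 ** attempt) + jitter(0, base)
            sleep(delay)
            attempt += 1
```

Passing jitter=lambda a, b: 0.0 makes the delays deterministic (1 s, 2 s, 4 s with the defaults). That is handy in tests but should not ship, since it removes the thundering-herd protection.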

1.1 Rate-limit errors (`429`)

API limit exceeded (429 Too Many Requests)

This is a special case. These errors are transient, but the retry timing is dictated by the server via the Retry-After header, not by your backoff formula.

Handling

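Always honour the Retry-After header when scheduling the retry. The header is optional on a 429, so if it is missing, fall back to the backoff formula above.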
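Under the HTTP specification, Retry-After carries either a number of seconds or an HTTP-date. The Python sketch below turns either form into a wait time using only the standard library. The function name retry_after_seconds is illustrative. Returning None when the header is absent or unparseable lets the caller fall back to its own backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value into seconds to wait, or None if unusable.

    Accepts delay-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    if header is None:
        return None
    value = header.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # dates in the past mean "retry now"
```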

2. Permanent errors 🚨

These indicate a fundamental problem with the request itself:

404 Not Found
400 Bad Request
401 Unauthorized
Schema validation failures

Retrying these is pointless: the same request will fail the same way, and each attempt wastes budget.

Handling

Log, skip, and try a fallback or escalate.

flowchart TD

    Error["Permanent errors 🚨"] --> Log["Log Failure"]--> Final["fallback or escalate ❌"]
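The categories above can be encoded as a small lookup that picks a handling strategy before anything is retried. In the Python sketch below, which status codes count as transient is a policy decision. The set used (503 from the text, plus 408, 502 and 504) is therefore an assumption. Treating every other code, including 500, as permanent is a deliberately conservative default that you may want to relax.

```python
from enum import Enum


class ErrorClass(Enum):
    TRANSIENT = "transient"        # retry with exponential backoff + jitter
    RATE_LIMITED = "rate_limited"  # retry when the server's Retry-After says so
    PERMANENT = "permanent"        # log, skip, fall back or escalate


# Assumption: a policy choice, not a standard list.
TRANSIENT_STATUSES = frozenset({408, 502, 503, 504})


def classify_status(status: int) -> ErrorClass:
    """Map the HTTP status of a failed call to a handling strategy."""
    if status == 429:
        return ErrorClass.RATE_LIMITED
    if status in TRANSIENT_STATUSES:
        return ErrorClass.TRANSIENT
    # 400, 401, 404 and anything unrecognised: retrying will not help.
    return ErrorClass.PERMANENT
```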

3. Circuit breaker ❗

The circuit breaker protects against sustained outages.

The circuit breaker sits in front of every downstream dependency call and tracks the failure rate in a rolling time window.

Why it exists

Without a circuit breaker, every incoming request triggers retries against a downed service. Those retries consume threads, connections, and budget, and can cause cascading failures in upstream systems that are waiting for responses.

stateDiagram-v2
    [*] --> CLOSED

    CLOSED --> OPEN : Failure rate exceeds threshold
    OPEN --> HALF_OPEN : Recovery timeout expires

    HALF_OPEN --> CLOSED : Probe request succeeds
    HALF_OPEN --> OPEN : Probe request fails

    state CLOSED {
        [*] --> Healthy
        Healthy : Requests pass through
    }

    state OPEN {
        [*] --> FastFail
        FastFail : Reject requests immediately
    }

    state HALF_OPEN {
        [*] --> Probe
        Probe : Allow limited probe requests
    }

While the breaker is CLOSED, requests pass through normally. When failures exceed a threshold, it flips to OPEN and fast-fails all subsequent calls immediately, without making any network request.

After a configured timeout, it enters HALF-OPEN and allows one probe request through.

If the probe succeeds, it resets to CLOSED.
If it fails, it returns to OPEN and resets the timer.

flowchart TD

    Request[Incoming Request]

    Request --> CB{Circuit State?}

    CB -->|CLOSED| Service[Call Downstream Service]

    Service -->|Success| Success[Return Response]

    Service -->|Failure| FailureCounter[Increment Failure Counter]

    FailureCounter --> Threshold{Threshold Reached?}

    Threshold -->|No| Error[Return Error]
    Threshold -->|Yes| Open[Open Circuit]

    CB -->|OPEN| FastFail[Fast Fail Immediately]

    Open --> Timer[Start Recovery Timer]

    Timer --> HalfOpen[HALF-OPEN]

    HalfOpen --> Probe[Allow Limited Probe Request]

    Probe --> ProbeResult{Probe Success?}

    ProbeResult -->|Yes| CloseCircuit[Close Circuit]
    ProbeResult -->|No| ReopenCircuit[Reopen Circuit]

    CloseCircuit --> Success
    ReopenCircuit --> Timer

Handling

You need one circuit breaker per downstream dependency, never a shared instance.

If service A goes down, only A's breaker should trip; calls to a healthy service B keep flowing through B's own breaker.
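The Python sketch below implements the state machine above in a minimal, single-threaded form. For brevity it trips on N consecutive failures rather than on a failure rate over a rolling window. It also takes an injectable clock. The class and parameter names are illustrative, not a specific library's API. A production version would additionally need locking so that only one probe runs while HALF_OPEN.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when the breaker fast-fails without calling the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = State.HALF_OPEN  # timeout expired: let one probe through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success, including a successful probe, closes the circuit.
        self.state = State.CLOSED
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            # Trip (or re-trip after a failed probe) and restart the recovery timer.
            self.state = State.OPEN
            self.opened_at = self.clock()
            self.failures = 0
```

To follow the one-breaker-per-dependency rule, keep a separate instance per service. For example, use a dict that maps each (hypothetical) dependency name, such as "search_api" or "llm_api", to its own CircuitBreaker.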

4. Fallback chains: degrading gracefully ⛔

A fallback chain encodes the hierarchy of what to try when each level fails.

The key design principles:

The fallback must be genuinely useful.

Design each tier to return something the user can act on when the tier above it fails.

A fallback that returns an empty response or throws a different error is not a fallback; it is a delayed failure.

Transparency is mandatory.

If you returned cached data that may be stale, say so. If you fell back to a weaker model, say so.

Silent degradation is a trust violation — the user believes they got a primary-quality response when they did not.

The final fallback must always succeed.

The bottom of the chain is a safe default — a static template, a human escalation alert, a "service temporarily unavailable" message.

This tier should have no external dependencies and never throw.
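The three principles translate fairly directly into code. In the Python sketch below, each tier is a (name, callable) pair. The tiers in the test (a primary model, then a cache) are hypothetical. Empty answers are skipped rather than returned. The result records which tier answered and whether the answer is degraded, so the UI can say so. The static default at the bottom makes no calls and cannot throw.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FallbackResult:
    value: str
    source: str      # which tier answered, so the caller can disclose it
    degraded: bool   # True whenever the primary (first) tier did not answer


def run_fallback_chain(
    tiers: Sequence[Tuple[str, Callable[[], Optional[str]]]],
    static_default: str,
) -> FallbackResult:
    """Try each tier in order; the static default at the bottom never throws."""
    for i, (name, tier) in enumerate(tiers):
        try:
            value = tier()
        except Exception:
            continue  # a real system would log the failure here
        if value:  # an empty answer is a delayed failure, not a fallback
            return FallbackResult(value, name, degraded=i > 0)
    return FallbackResult(static_default, "static_default", degraded=True)
```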

How Top Teams Decide What to Fix Next

One of the biggest challenges in building agentic systems is not creating the first version.

It is improving it.

Almost every AI engineer has experienced this:

graph TD
    A[Build Workflow 🤖]
    --> B[Test Workflow 🔍]

    B --> C[Disappointing Results 👎🏻]

    C --> D[Now What?]

The problem is that agentic workflows contain many moving parts.

Without a systematic process, teams often spend weeks optimizing the wrong component.

This is where Error Analysis becomes one of the most valuable skills in Agentic AI engineering.

Why Guessing Fails 🤔

Many teams optimize based on intuition:

"This feels like a prompt issue."

"This feels like a retrieval issue."

"This feels like the model is weak."

Sometimes they are right.

Often they are not.

The danger is spending months improving a component that contributes very little to overall performance.

The Engineering Mindset 🎯

Strong AI teams think like performance engineers.

Instead of asking:

What can we improve?

They ask:

What should we improve?

Those are very different questions.

The first creates endless work.

The second creates measurable progress.

Traces: The X-Ray of Agentic Systems

To diagnose failures, we need visibility into intermediate outputs.

The record of these intermediate outputs is called a trace.

Trace 🧾

A trace contains every step executed by an agent.

One of the most valuable debugging techniques is trace inspection.

A trace records:

prompts
intermediate reasoning
tool calls
retrieval outputs
state transitions
memory updates

Example:

{
  "query": "Recent developments in black hole science",

  "search_terms": [
    "event horizon telescope discoveries",
    "black hole imaging research"
  ],

  "search_results": [
    "https://astrokidnews.com/...",
    "https://spacefunblog.com/..."
  ],

  "selected_sources": [
    "...",
    "..."
  ],

  "summary": "..."
}

Instead of examining only the final answer, we inspect every intermediate step.

Span vs Trace

Two terms frequently appear in observability systems.

Span

A single step.

Example:

Generate Search Terms
Fetch Web Results

Trace

The complete execution path.

Example, a search workflow:

graph LR
    A[Search Terms]
    --> B[Search Results]
    --> C[Selected Sources]
    --> D[Summary]

A trace is simply a collection of spans.
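The Python sketch below is a minimal in-process illustration of the span/trace vocabulary. The class and field names are illustrative; observability tools such as OpenTelemetry use the same concepts with richer APIs. Each span records one step's inputs, outputs, duration and error, and the trace is the ordered list of spans for one run. The usage at the bottom rebuilds part of the black-hole example trace from the JSON above.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One step of an agent run: its inputs, outputs, timing and any error."""
    name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)
    duration_s: float = 0.0
    error: Optional[str] = None


@dataclass
class Trace:
    """The complete execution path of one run: an ordered list of spans."""
    query: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, **inputs: Any) -> Iterator[Span]:
        """Record one step; the caller fills span.outputs inside the with-block."""
        current = Span(name=name, inputs=inputs)
        start = time.perf_counter()
        try:
            yield current
        except Exception as exc:
            current.error = repr(exc)  # failed steps stay in the trace
            raise
        finally:
            current.duration_s = time.perf_counter() - start
            self.spans.append(current)


trace = Trace(query="Recent developments in black hole science")
with trace.span("generate_search_terms", query=trace.query) as terms_span:
    terms_span.outputs["search_terms"] = [
        "event horizon telescope discoveries",
        "black hole imaging research",
    ]
with trace.span("web_search", terms=terms_span.outputs["search_terms"]) as search_span:
    search_span.outputs["search_results"] = [
        "https://astrokidnews.com/...",
        "https://spacefunblog.com/...",
    ]
```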

The Error Analysis Flywheel

The best teams repeatedly execute:

graph TD
    A[Build 🤖]
    --> B[Observe 👀]

    B --> C[Trace Analysis 🔎]

    C --> D[Identify Bottleneck ⚠️]

    D --> E[Improve Component 🔨]

    E --> F[Measure Again 📋]

    F --> A

Each iteration makes the system incrementally better.

Error Analysis Workflow

A structured approach looks like this:

graph TD
    A[Bad Final Output 👎🏻]
    --> B[Inspect Trace 🔍]

    B --> C[Identify Weak Component ⚠️]

    C --> D[Count Frequency 📋]

    D --> E[Prioritize Fixes 📝]

    E --> F[Improve System 📈]

This turns debugging into a data-driven process.

Practical Example

Suppose we ask a research agent:

Write a report on recent developments in black hole science.

The generated report misses several important discoveries.

At first glance:

Output Quality = Poor

But that tells us nothing about why.

The root cause could exist anywhere in the workflow.


graph TD
    A[User Query] --> B[Generate Search Terms] 
    B --> C[Web Search] 
    C --> D[Select Best Sources] 
    D --> E[Fetch Documents] 
    E --> F[Summarize] 
    F --> G[Generate Report]

Suppose our agent generated:

1. Search Terms

Black hole theories Einstein
Event Horizon Telescope radio

Question:

Would a human expert use these search terms?

If yes:

Search Term Generation = Good

Move on.

2. Search Results

Returned URLs:

AstroKidNews
SpaceFunBlog
SpaceBot2000

Question:

Would a human researcher use these sources?

Probably not.

A human would likely prefer:

Nature
Science
arXiv
NASA
ESA

Now we have a clue.

Search Results = Weak

The problem may not be the search terms.

The problem may be the search engine or ranking strategy.

Focus Only on Failures

One subtle but important recommendation:

Do not waste time analyzing successful runs.

Suppose:

Run	Result
1	Good
2	Good
3	Poor
4	Good
5	Poor

Focus on:

Run 3
Run 5

These contain the information you need.

This is why it is called:

Error Analysis

The goal is understanding failure modes.

Build an Error Analysis Spreadsheet

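One of the simplest and most effective tools is a spreadsheet in Excel or Google Sheets, with one row per query and one column per pipeline component, each cell marked Good or Bad.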

Example:

Query	Search Terms	Search Results	Source Selection	Final Output
Black Holes	Good	Bad	Good	Bad
Seattle Housing	Good	Bad	Good	Bad
Fruit Harvesting Robots	Bad	Bad	Bad	Bad

Now count failures.
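Once the sheet is exported as CSV, counting failures takes only a few lines. The Python sketch below uses the three example rows. It assumes the column headers match the table and that each cell holds Good or Bad.

```python
import csv
import io
from typing import Dict, List

# The three rows from the spreadsheet above, as a CSV export.
SHEET = """Query,Search Terms,Search Results,Source Selection,Final Output
Black Holes,Good,Bad,Good,Bad
Seattle Housing,Good,Bad,Good,Bad
Fruit Harvesting Robots,Bad,Bad,Bad,Bad
"""

COMPONENTS = ["Search Terms", "Search Results", "Source Selection"]


def component_error_rates(rows: List[Dict[str, str]], components: List[str]) -> Dict[str, float]:
    """Share of failing traces in which each component was judged 'Bad'."""
    failing = [r for r in rows if r["Final Output"] == "Bad"]
    if not failing:
        return {c: 0.0 for c in components}
    return {c: sum(r[c] == "Bad" for r in failing) / len(failing) for c in components}


rows = list(csv.DictReader(io.StringIO(SHEET)))
rates = component_error_rates(rows, COMPONENTS)
for component, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{component:<18}{rate:.0%}")
```

On this tiny sample, Search Results is Bad in every failing row (100%), while Search Terms and Source Selection are each at 33%. Because one trace can have several bad components, the rates do not need to sum to 100%.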

Example Statistics

Component	Error Rate
Search Terms	5%
Search Results	45%
Source Selection	10%
Summarization	8%

This immediately tells us:

Search Results are the largest source of failure.

Quantifying Error Rates

If:

100 failing traces analyzed
45 traces contain poor search results

Then:

ErrorRate_{search} = \frac{45}{100} = 0.45 = 45\%

This provides objective evidence.

Instead of:

"I think search is the issue."

you can say:

"Search contributes to 45% of failures."

Prioritization Matrix

Not every problem deserves immediate attention.

A useful framework is:

Component	Error Rate	Easy to Fix?
Search Terms	5%	Yes
Search Results	45%	Yes
Summarization	8%	No

Prioritize:

High Error Rate
+
Easy Improvement

This often delivers the largest gains.
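The "High Error Rate + Easy Improvement" rule can be turned into a sortable score. The Python sketch below uses the matrix values above. The multiplicative discount for hard fixes (hard_fix_discount) is a hypothetical knob, not a standard formula.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    component: str
    error_rate: float  # share of failing traces, from error analysis
    easy_to_fix: bool


def prioritize(candidates: List[Candidate], hard_fix_discount: float = 0.5) -> List[Candidate]:
    """Rank fixes by error rate, discounting the ones that are hard to fix."""
    def score(c: Candidate) -> float:
        return c.error_rate * (1.0 if c.easy_to_fix else hard_fix_discount)

    return sorted(candidates, key=score, reverse=True)


matrix = [
    Candidate("Search Terms", 0.05, True),
    Candidate("Search Results", 0.45, True),
    Candidate("Summarization", 0.08, False),
]
print([c.component for c in prioritize(matrix)])
```

With a discount of 0.5, this ranks Search Results (0.45) first, then Search Terms (0.05), then Summarization (0.04). The last two are close: raising the discount to 0.7 gives Summarization 0.056 and swaps their order. The score supports judgment rather than replacing it.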

Example Improvements

After identifying weak search results, possible fixes include:

1. Better Search Provider

Replace:

Search Engine A

with:

Search Engine B

2. Improved Ranking

Add:

rerank_results()

before source selection.
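The text names rerank_results() but does not define it. The Python sketch below shows one hypothetical way to fill it in. It assumes each result arrives with a relevance score from the search provider and adds a boost for a small allowlist of trusted domains; the boost values are invented. Production rerankers often use a learned model instead, such as a cross-encoder, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical trust boosts for the domains suggested in this article.
DOMAIN_BOOST: Dict[str, float] = {
    "nature.com": 1.0,
    "science.org": 1.0,
    "arxiv.org": 0.8,
    "nasa.gov": 0.8,
}


def _boost(url: str) -> float:
    """Boost for a URL whose host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for domain, boost in DOMAIN_BOOST.items():
        if host == domain or host.endswith("." + domain):
            return boost
    return 0.0


def rerank_results(results: List[Tuple[str, float]]) -> List[str]:
    """Re-order (url, provider_score) pairs by provider score plus domain boost."""
    ranked = sorted(results, key=lambda r: r[1] + _boost(r[0]), reverse=True)
    return [url for url, _ in ranked]
```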

3. Domain Filtering

Restrict searches to:

nature.com
science.org
arxiv.org
nasa.gov
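The restriction can be applied in the search query, if the provider supports site filters, or as a post-filter on the returned URLs. The Python sketch below shows a standard-library post-filter. Matching on the parsed hostname, rather than on a substring of the URL, rejects look-alikes such as nature.com.evil.example.

```python
from typing import Iterable, List
from urllib.parse import urlparse

ALLOWED_DOMAINS = ("nature.com", "science.org", "arxiv.org", "nasa.gov")


def is_allowed(url: str, allowed: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_results(urls: List[str]) -> List[str]:
    """Keep only search results from allowed domains."""
    return [u for u in urls if is_allowed(u)]
```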

These targeted improvements are only possible because error analysis revealed the bottleneck.

Error Analysis vs Prompt Engineering

Many beginners immediately modify prompts.

But consider:

Bad Search Results

Will a better summarization prompt help?

Probably not.

The summarizer can only work with the information it receives.

Error analysis prevents optimization in the wrong place.

Final Thoughts

Agentic systems contain many components:

planners
retrievers
search engines
evaluators
memory systems
tool callers
generators

When performance is poor, almost any component could be responsible.

Error analysis provides a systematic way to identify:

Which component is failing
How often it fails
Whether it is worth fixing

Without error analysis:

Optimization = Guessing

With error analysis:

Optimization = Evidence-Based Engineering

And that distinction often determines whether an AI team improves a system in days or spends months optimizing the wrong thing.

Handling Failures

Retries

Retries handle the most common case in distributed systems: transient failures that resolve on their own.

Circuit breakers

Circuit breakers prevent the retry pattern from amplifying sustained outages into cascading failures across the system.

Fallback

Fallback chains ensure the system degrades to reduced but functional capability rather than complete unavailability.

Together they form a defence-in-depth strategy: retries for noise, circuit breakers for sustained outages, fallbacks for the cases where recovery isn't possible within the task's time budget.

All three should be present in any production agent system that calls external services.

Written by Hitesh Sahu, a passionate developer and blogger.

Sun May 31 2026
