I’m bullish on agents. And for analytics, agents change everything: the data we analyze, how we analyze it, and what we can do with the insights.
Think about software before the AI era. It was built with navigation, screens, and buttons, optimized for a human to do a job, like sending marketing emails or contacting sales prospects.
If we assume much of that work will be taken over by an agent, the interface doesn’t need to look the same anymore (if it exists at all). The workflows triggered will also change. The value of a product is no longer measured by how much useful work it helps a human do, but by how productive agents can be inside it. And the analysis process? Agents are there to help with that too.
And because agents work differently from humans, much of how we analyze them will differ from traditional analytics.
Let’s look at how this shift transforms the three core pillars of analytics: data engineering, generating insights, and taking action.
Data Engineering
The Data Looks Different
The most obvious thing about analytics data for agent products is that it will look very different.
Before, we had users, sessions, events, and properties.
Now we have traces and spans, with metadata about models, tokens, and tools.
For some agents, there will be a human user with an associated session. For others there won’t.
A span with metadata resembles an event with properties in some ways, but the critical difference is hierarchy: traditional analytics flattens data into a linear list of events, destroying the parent-child context that explains why an agent took a specific step.
Insights at the Moment of Tracking
An advantage of agents is that we can instruct them to annotate the data they log. I believe new tactical patterns will emerge around this.
For example, a support agent that can’t answer a question might be able to reflect on whether the information was missing or the question was unclear. This reflection can be sent along in the span for more informed inspection and analysis later.
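One way such an annotation might look in practice; the reason labels here are made up for illustration, not a standard taxonomy:

```python
def annotate_unanswered(span_metadata: dict, found_documents: int) -> dict:
    """Attach the agent's self-diagnosis to the span it is about to log.
    Hypothetical heuristic: if retrieval found nothing, the knowledge base
    was missing the information; otherwise the question was likely unclear."""
    if found_documents == 0:
        span_metadata["failure_reason"] = "information_missing"
    else:
        span_metadata["failure_reason"] = "question_unclear"
    return span_metadata

# Later, an analyst (or analysis agent) can group failures by reason
# instead of re-deriving the diagnosis from raw transcripts.
```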
Insights and Intelligence
Goodbye Pirates
The AARRR pirate metrics have been a reliable framework for thinking about KPIs and metrics for SaaS products. But they’re not a fit for agents.
Use cases will be more varied, and the metrics correspondingly more nuanced.
Evals will be a first-class consideration, and might end up directly replacing the concept of a KPI. But how will they be mapped to business outcomes? Which will we focus on and why? I believe the long-term frameworks for thinking about Evals are yet to be established.
Model cost becomes a concern of the product builder. Infinite scaling of SaaS is gone - now every session spends dollars. So we’ll get used to looking at token efficiency, and quality/cost trade-offs.
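To make that concrete, a toy per-session cost and token-efficiency calculation, with entirely made-up prices:

```python
# Hypothetical prices in dollars per 1K tokens; real prices vary by model.
PRICE_PER_1K = {"input": 0.005, "output": 0.015}

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollars spent by one agent session."""
    return (input_tokens / 1000) * PRICE_PER_1K["input"] + \
           (output_tokens / 1000) * PRICE_PER_1K["output"]

def tokens_per_success(total_tokens: int, successful_runs: int) -> float:
    """One possible token-efficiency metric: spend per successful outcome."""
    return total_tokens / successful_runs
```

The quality/cost trade-off then becomes a two-column comparison: eval score per configuration against numbers like these.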
Finally I believe “time” will make a comeback. Session length sucked as a KPI: it was unreliable, hard to interpret, and easy to game. But with agents, we genuinely want to understand how fast they are - because time costs money.
Sidenote: do “Safety” metrics become the responsibility of the analyst too?
Goodbye Funnels
Since both the metrics and the data have changed, it makes sense that we’ll be armed with a completely new suite of favourite chart types.
Funnels were great for linear user journeys, but they’re a poor fit for branching agentic traces.
Waterfalls are becoming established as the UI for trace viewing.
Clustering tables will be helpful in diagnosing categories of failure.
Pareto fronts will be there when you need to examine the trade-off between eval scores and token cost.
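As an illustration of the Pareto-front idea, a small function that keeps only the configurations not dominated on both axes (higher eval score, lower token cost), run on invented data:

```python
def pareto_front(configs):
    """Each config is (name, eval_score, token_cost). Keep configs where no
    other config is at least as good on both axes and strictly better on one."""
    front = []
    for name, score, cost in configs:
        dominated = any(
            s >= score and c <= cost and (s > score or c < cost)
            for _, s, c in configs
        )
        if not dominated:
            front.append(name)
    return front

configs = [
    ("small",  0.72,  400),   # cheap, decent score
    ("medium", 0.81,  900),
    ("large",  0.83, 2500),   # best score, priciest
    ("tuned",  0.70, 1000),   # dominated by "small": worse score, higher cost
]
```

Everything on the front is a defensible choice; everything off it is strictly worse than some alternative.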
Hello Analysis Agent
Here’s the good news. You’re not going to be alone getting to grips with all these new metrics and charts.
Because the UI isn’t going to be a chart builder anymore. Now you get a prompt box.
Your personal analysis agent will build your chart for you. It’ll build your dashboard. It will even generate a first-pass interpretation of the results.
And that’ll leave you as the expert in your business - strategizing, feeding in domain knowledge, guiding the agent on what to look into, and figuring out how to action the findings (more on that in a minute).
Your analysis agent can work at night too - always checking the latest trends, and fuelling your analysis backlog.
Actionability and Experimentation
The biggest challenge in analytics has always been turning insights into actions. Changing your product based on what you learn from the data. For agents, I expect this to become easier.
Model Changes
Firstly, the AI models agents use are constantly being updated and improved. Every time an agent switches to a new model, that’s effectively an experiment that needs to be measured. You can’t avoid this decision: you’ll always need to choose which model to use.
A typical implementation of model choice is the “model router”: a step in the agent that decides whether to use a simpler, cheaper model or a more complex, expensive one. So we’ll always be analyzing whether the router configuration is getting us value for money, and adjusting it accordingly.
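A router of that kind can be as simple as a threshold on an estimated request complexity; the model names and threshold below are placeholders:

```python
def route_model(complexity: float, threshold: float = 0.5) -> str:
    """Send easy requests to a cheap model, hard ones to an expensive one.
    How `complexity` is estimated (heuristics, a classifier, ...) is the
    interesting part, and is out of scope here."""
    return "cheap-model" if complexity < threshold else "expensive-model"

# Logging the routing decision into the span's metadata is what makes
# the value-for-money analysis possible later:
span_metadata = {"router_threshold": 0.5, "model": route_model(0.3)}
```

Tuning the threshold against eval scores and cost data is exactly the kind of recurring analysis this section is about.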
Evals Offline
Evals also change the way we think about taking action, because they can be run offline on historical or synthetic data. This means for some agents, we will be able to change a prompt, and run an offline counterfactual analysis of how the eval would have changed, before deploying the updated prompt. This loop reduces the friction to action a change - we have a playground for trying out ideas.
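The loop can be sketched as a replay harness; `run_agent` and `score` here are stand-ins for your own candidate configuration and eval function:

```python
def offline_eval(run_agent, score, dataset):
    """Replay historical (or synthetic) cases through a candidate agent
    configuration and return the mean eval score, before any deploy.
    Each case is a dict with "input" and "expected" keys (an assumed shape)."""
    scores = [score(case["input"], run_agent(case["input"]), case["expected"])
              for case in dataset]
    return sum(scores) / len(scores)

# Comparing two prompts is then a counterfactual A/B on the same data:
# offline_eval(agent_with_new_prompt, score, history) vs. the old prompt.
```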
Self Improving Agents
The most exciting concept within actionable agent analytics is the idea that an agent could improve itself.
Conceptually this requires a loop between previous agent runs and future decisions.
This loop will probably include human or agentic analysis as a middle step before the learnings are fed back in, likely via a coding agent.
But there’s also an approach where an agent references previous data online and adapts accordingly when making decisions, cutting out some or all of the analysis steps.
Likely we’ll see a hybrid approach, but it’ll be interesting to see where the lines are drawn.
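The “reference previous data online” variant might look like picking the historically best-performing strategy at decision time, with no offline analysis step; the success-rate heuristic below is purely illustrative:

```python
from collections import defaultdict

def best_strategy(history):
    """history: list of (strategy_name, succeeded) pairs from past runs.
    Pick the strategy with the highest observed success rate."""
    stats = defaultdict(lambda: [0, 0])   # name -> [successes, attempts]
    for name, succeeded in history:
        stats[name][0] += int(succeeded)
        stats[name][1] += 1
    return max(stats, key=lambda n: stats[n][0] / stats[n][1])

history = [("retrieve_then_answer", True), ("retrieve_then_answer", False),
           ("answer_directly", True)]
# best_strategy(history) favours "answer_directly" (1/1 beats 1/2)
```

The hybrid version would keep a loop like this for fast adaptation while still routing hard cases through human or agentic analysis.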
What’s Next?
In this Substack, I’ll dig into these topics in detail, and I’ll share the best playbooks I come up with for analyzing the agent you’re building.
I’m Timothy Daniell, founder of TwoTail.AI, an analytics tool built for analyzing AI Agents. If you’re working on an agent, I’d love to talk to you. You can reach me here: https://www.linkedin.com/in/timothydaniell/