For 15 years, SaaS founders sailed happily on the Pirate metrics ship.
AARRR (Acquisition, Activation, Retention, Referral, Revenue) was the perfect framework for the Web 2.0 era. It gave us a logical way to organize our dashboards because software was linear. A human landed on a page, clicked a button, and either converted or churned.
But as I build TwoTail.AI, I’m realizing that Pirates don’t make sense for Agents.
Agents don’t have linear funnels. They have loops. They don’t just “convert”; they reason. You can have high Retention (the agent keeps running) but zero Efficacy (it’s stuck in a hallucination loop). And you always pay a price (tokens).
If we want to measure the new paradigm, we need a new framework.
I call it the Dolphin Metrics (EEEE!).
Here is the dashboard structure I’m using to analyze Agents.
1. Engagement (Who?)
This is the only category that survives from the old world. Before we care how smart the agent is, we need to know if it’s being used.
- The Metrics: Session counts, Active Users (DAU/MAU), and Invocation Rate.
- The Question: Is the agent actually being called, or is it gathering dust?
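As a sketch of how these roll up, here is a minimal way to compute DAU and invocation rate from a session log. The log shape (`user_id`, date, invocation count) is a hypothetical schema for illustration, not TwoTail.AI's actual data model.

```python
from datetime import date

# Hypothetical session log: (user_id, day, invocations that day).
sessions = [
    ("u1", date(2024, 5, 1), 3),
    ("u2", date(2024, 5, 1), 1),
    ("u1", date(2024, 5, 2), 5),
]

def daily_active_users(sessions, day):
    """Count distinct users who invoked the agent on a given day."""
    return len({user for user, d, _ in sessions if d == day})

def invocation_rate(sessions, day):
    """Average invocations per active user on a given day."""
    counts = [n for _, d, n in sessions if d == day]
    return sum(counts) / len(counts) if counts else 0.0
```

The same pair of functions answers both questions at once: zero DAU means the agent is gathering dust; a DAU with a near-zero invocation rate means users tried it once and stopped.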
2. Execution (How?)
This is where traditional analytics breaks. We need to open the “Black Box” of the agent’s logic. We aren’t measuring clicks anymore; we are measuring the Trace.
- The Metrics: Latency per step, Tool usage frequency, ReAct loop depth (how many thought steps did it take?), and Error rates.
- The Question: Where are the bottlenecks? Did the agent take the scenic route to get to an answer? We are looking for “Reasoning Quality”—analyzing the path, not just the destination.
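To make "measuring the Trace" concrete, here is a minimal sketch over an assumed trace format: a list of steps, each tagged as a reasoning step or a tool call, with per-step latency. The field names are illustrative, not a standard.

```python
from collections import Counter

# Hypothetical agent trace: each step is a "thought" (reasoning)
# or a "tool" call, with latency in milliseconds.
trace = [
    {"kind": "thought", "latency_ms": 420},
    {"kind": "tool", "tool": "web_search", "latency_ms": 900},
    {"kind": "thought", "latency_ms": 380},
    {"kind": "tool", "tool": "web_search", "latency_ms": 850},
    {"kind": "thought", "latency_ms": 300},
]

def loop_depth(trace):
    """ReAct loop depth: how many thought steps the agent took."""
    return sum(1 for step in trace if step["kind"] == "thought")

def tool_frequency(trace):
    """How often each tool was called in this trace."""
    return Counter(step["tool"] for step in trace if step["kind"] == "tool")

def total_latency_ms(trace):
    """End-to-end latency: the sum of every step's latency."""
    return sum(step["latency_ms"] for step in trace)
```

Aggregating these per-trace numbers across sessions is what surfaces the "scenic routes": a rising average loop depth or a tool called twice per trace when once should do.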
3. Efficacy (What?)
Old software was binary: it worked or it crashed. Agents are probabilistic: they can mostly work, partially work, or confidently fail. “Efficacy” is the measurement of quality.
- The Metrics: Eval scores (e.g., “Hallucination Rate,” “Answer Relevance”), User Acceptance Rate (Did the user copy the code?), and CSAT.
- The Question: Did the agent actually solve the problem? This is arguably the most important metric, and the hardest to automate.
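The part that can be automated looks something like the sketch below: aggregating per-response eval records into an acceptance rate and a mean relevance score. The record shape (an LLM-judge relevance score plus a boolean "did the user accept the output") is an assumption for illustration.

```python
# Hypothetical eval records: one per agent response, with a judge's
# relevance score (0-1) and whether the user accepted the output
# (e.g. copied the code it produced).
evals = [
    {"relevance": 0.92, "accepted": True},
    {"relevance": 0.35, "accepted": False},
    {"relevance": 0.81, "accepted": True},
]

def acceptance_rate(evals):
    """Share of responses the user actually accepted."""
    return sum(e["accepted"] for e in evals) / len(evals)

def mean_relevance(evals):
    """Average judge-assigned relevance across responses."""
    return sum(e["relevance"] for e in evals) / len(evals)
```

The hard part isn't the arithmetic, it's trusting the inputs: acceptance is a noisy proxy, and judge scores need their own evals.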
4. Economics (How much?)
In SaaS, scaling a database was cheap. In the Agent world, intelligence costs money. Every “thought” burns tokens. A highly effective agent that costs $5.00 per query to run is a failed product.
- The Metrics: Cost per Session, Tokens per Task, Model Efficiency (GPT-4 vs Llama-3 routing).
- The Question: Is the value generated worth the compute cost?
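The cost math itself is simple once you log tokens per model call. A minimal sketch, with made-up model names and per-million-token prices (real prices vary by provider and change often):

```python
# Hypothetical per-1M-token prices; real provider pricing differs.
PRICE_PER_M = {
    "big-model":   {"input": 5.00, "output": 15.00},
    "small-model": {"input": 0.20, "output": 0.60},
}

def call_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single model call."""
    p = PRICE_PER_M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def session_cost(calls):
    """Cost per Session: sum every model call in the session."""
    return sum(call_cost(m, i, o) for m, i, o in calls)

# One session that routed a hard step to the big model and a
# cheap step to the small one.
cost = session_cost([("big-model", 1_000, 500), ("small-model", 2_000, 1_000)])
```

Run this over every session and the routing question answers itself: if most calls land on the big model for tasks the small one handles, Cost per Session is where it shows up first.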
The New Dashboard
The next time you sit down to build an analytics view for your AI product, stop trying to force it into a Funnel.
Think like a Dolphin: Engagement shows you the users, Execution optimizes the workflow, Efficacy proves the value, Economics makes the business add up.
I’m Timothy Daniell, founder of TwoTail.AI, an analytics tool built for analyzing AI Agents. If you’re working on an agent, I’d love to talk to you. You can reach me here: https://www.linkedin.com/in/timothydaniell/