Agent Analytics Articles

Guides, comparisons, and deep dives on agent observability, failure clustering, and LLM evaluation.

Comparisons
Agent analytics pricing compared, 2026 edition
Side-by-side pricing for LangSmith, Langfuse, Arize Phoenix, Braintrust, and TwoTail — real numbers, what each plan includes, and where the gotchas are.
Timothy Daniell · 8 min read
Best agent observability tools in 2026
A direct, factual comparison of the leading agent observability and analytics tools in 2026 — LangSmith, Langfuse, Phoenix, Braintrust, Helicone, and TwoTail.
Timothy Daniell · 12 min read
Guides
How to debug AI agent failures — a practical playbook
A systematic playbook for debugging AI agent failures: reproduce, cluster, locate the broken step, form a hypothesis, and test the fix before shipping.
Timothy Daniell · 10 min read
Fundamentals
What is agent analytics? A guide for AI & LLM agent teams
Agent analytics is the practice of analyzing trace data from AI agents to diagnose failures, measure quality, and improve outcomes. Here's how it works.
Timothy Daniell · 9 min read
How will agent analytics be different?
Agents change analytics at every layer: hierarchical traces replace events, evals replace KPIs, and the UI becomes a prompt box backed by an analysis agent.
Timothy Daniell · 8 min read
Frameworks
The 3 types of agent optimization experiment
Three layers to experiment on when optimizing AI agents: semantic (prompts), hyperparameter (model config), and architecture (system shape).
Timothy Daniell · 6 min read
Dolphin Metrics: the pirate metrics for the agent era
AARRR doesn't fit AI agents. Dolphin Metrics, built on the 4 E's (Engagement, Execution, Efficacy, Economics), is the framework that replaces it.
Timothy Daniell · 5 min read
Experiments
Agent optimization experiment #3: evals and proximity
Three proximity evals (category overlap, embedding similarity, LLM-as-judge) used to diagnose Wiki Racer failures. They revealed that the agent oscillates near its targets.
Timothy Daniell · 7 min read
Agent optimization experiment #2: loops, hallucinations, and model routers
Second Wiki Racer run: fixed looping, caught hallucinations, split planning and execution across models. Win rate went from 50% to 74%.
Timothy Daniell · 5 min read
Agent optimization experiment #1: prompt change
First Wiki Racer optimization attempt: adding 'expert strategies' to the prompt backfired, nearly doubling loop failures. Here's what happened.
Timothy Daniell · 4 min read
Visualization
The funnel chart of agent analytics
Funnels assume linear journeys agents don't have. Replace them with trace waterfalls, semantic clusters, Pareto fronts, and eval heatmaps.
Timothy Daniell · 7 min read