Articles — TwoTail

Comparisons

Agent analytics pricing compared, 2026 edition

Side-by-side pricing for LangSmith, Langfuse, Arize Phoenix, Braintrust, and TwoTail — real numbers, what each plan includes, and where the gotchas are.

Timothy Daniell · 8 min read

Comparisons

Best agent observability tools in 2026

A direct, factual comparison of the leading agent observability and analytics tools in 2026 — LangSmith, Langfuse, Phoenix, Braintrust, Helicone, and TwoTail.

Timothy Daniell · 12 min read

Guides

How to debug AI agent failures — a practical playbook

A systematic playbook for debugging AI agent failures: reproduce, cluster, locate the broken step, form a hypothesis, and test the fix before shipping.

Timothy Daniell · 10 min read

Fundamentals

What is agent analytics? A guide for AI & LLM agent teams

Agent analytics is the practice of analyzing trace data from AI agents to diagnose failures, measure quality, and improve outcomes. Here's how it works.

Timothy Daniell · 9 min read

Frameworks

The 3 types of agent optimization experiment

Three layers to experiment on when optimizing AI agents: semantic (prompts), hyperparameter (model config), and architecture (system shape).

Timothy Daniell · 6 min read

Experiments

Agent optimization experiment #3: evals and proximity

Three proximity evals (category overlap, embedding similarity, LLM-as-judge) used to diagnose Wiki Racer failures. The finding: the agent oscillates near its targets.

Timothy Daniell · 7 min read

Experiments

Agent optimization experiment #2: loops, hallucinations, and model routers

Second Wiki Racer run: fixed looping, caught hallucinations, split planning and execution across models. Win rate went from 50% to 74%.

Timothy Daniell · 5 min read

Experiments

Agent optimization experiment #1: prompt change

First Wiki Racer optimization attempt: adding 'expert strategies' to the prompt backfired, nearly doubling loop failures. Here's what happened.

Timothy Daniell · 4 min read

Visualization

The funnel chart of agent analytics

Funnels assume linear journeys agents don't have. Replace them with trace waterfalls, semantic clusters, Pareto fronts, and eval heatmaps.

Timothy Daniell · 7 min read

Frameworks

Dolphin Metrics: the pirate metrics for the agent era

AARRR doesn't fit AI agents. Dolphin Metrics (the 4 E's: Engagement, Execution, Efficacy, Economics) is the framework that replaces it.

Timothy Daniell · 5 min read

Fundamentals

How will agent analytics be different?

Agents change analytics at every layer: hierarchical traces replace events, evals replace KPIs, and the UI becomes a prompt box backed by an analysis agent.

Timothy Daniell · 8 min read
