What shall we analyze?
Recent Charts
Autonomy
Evals
Last 30 days
Sandbox
New sandbox run
1. Type
2. Task
3. Variants
4. Evals
5. Inputs
6. Review
What do you want to test?
Sandbox run
Tradeoffs
Recommendation
Datasets
Static collections of (input, output, eval) rows. Used for golden sets, calibration, annotation, and sandbox runs.
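As a rough illustration of the row shape described above, a dataset could be modeled as a fixed list of records with three fields. This is only a sketch: the class name DatasetRow, the helper rows_for_eval, the string types, and the sample rows are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRow:
    """One static row. Field names mirror the (input, output, eval) description;
    the string types are an assumption made for this sketch."""
    input: str
    output: str
    eval: str


# Hypothetical golden set: made-up rows, for illustration only.
golden_set = [
    DatasetRow(input="What is 2 + 2?", output="4", eval="exact_match"),
    DatasetRow(input="Summarize the ticket.", output="User cannot log in.", eval="rubric"),
]


def rows_for_eval(rows, eval_name):
    """Return the rows whose eval field equals eval_name (hypothetical helper)."""
    return [row for row in rows if row.eval == eval_name]


exact_rows = rows_for_eval(golden_set, "exact_match")
```

Freezing the dataclass reflects the "static" wording: a row cannot be modified after it is created, so a golden set stays stable across runs.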
Vocabulary
Help the analyst understand your agent's structure and goals
Strategy
Describe your agent's goals and what you're optimizing for
Terminology
Define business terms, KPIs, and domain-specific concepts
Data Structure
Span hierarchy and metadata fields (auto-generated from your traces)