From agent activity to measurable improvement
- Watch agent sessions and repository activity
- Capture prompts, tool calls, diffs, test output, failures, and retries
- Build a timeline of what happened and why
- Score task completion against the captured evidence, such as diffs and test output
- Produce failure taxonomies and improvement notes
- Feed the findings into the next run, training loop, or product decision
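
As a rough illustration, the steps above could be sketched as a small pipeline. This is a minimal sketch, not the product's actual implementation: the `Event` record, its field names, the event kinds, and the functions `build_timeline`, `score_completion`, `failure_taxonomy`, and `improvement_notes` are all hypothetical names chosen for this example. It also assumes that "completion" can be approximated by whether the last recorded test run passed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One captured item from an agent session (hypothetical schema)."""
    ts: float         # seconds since session start
    kind: str         # e.g. "prompt", "tool_call", "diff", "test_output", "failure", "retry"
    detail: str = ""  # free-text payload or failure label
    ok: bool = True   # only meaningful for "test_output" events in this sketch


def build_timeline(events):
    """Order captured events chronologically."""
    return sorted(events, key=lambda e: e.ts)


def score_completion(timeline):
    """Return 1.0 if the last recorded test run passed, 0.0 if it failed,
    and None when no test output was captured (no evidence to score)."""
    tests = [e for e in timeline if e.kind == "test_output"]
    if not tests:
        return None
    return 1.0 if tests[-1].ok else 0.0


def failure_taxonomy(timeline):
    """Count failure events by their label."""
    return Counter(e.detail for e in timeline if e.kind == "failure")


def improvement_notes(taxonomy):
    """Turn failure counts into short notes, most frequent first."""
    return [f"{label}: seen {n}x" for label, n in taxonomy.most_common()]


# Made-up sample session, deliberately out of order to show the sort.
events = [
    Event(6.0, "test_output", "all passed", ok=True),
    Event(0.0, "prompt", "fix failing parser test"),
    Event(2.5, "tool_call", "run tests"),
    Event(3.0, "test_output", "1 failed", ok=False),
    Event(3.1, "failure", "assertion_error"),
    Event(4.0, "retry", "second attempt"),
    Event(5.0, "diff", "parser.py +3 -1"),
]

timeline = build_timeline(events)
score = score_completion(timeline)
notes = improvement_notes(failure_taxonomy(timeline))
print(score, notes)
```

Scoring from the last test run rather than from the agent's final message keeps the score tied to something recorded in the session. Returning `None` when no tests ran keeps "no evidence" distinct from "failed".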