Benchmark Overview
We provide a benchmark to evaluate the planning capabilities of state-of-the-art agentic models.
Available Benchmarks
DeepPlanning Benchmark
Evaluates the agent’s ability to handle complex, multi-step planning tasks that require reasoning and constraint satisfaction.
The DeepPlanning benchmark includes two major task categories:
- Travel Planning: Complete travel itinerary planning with multiple constraints
- Shopping Planning: Optimal shopping plan generation with budget and preference management
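To make "constraint satisfaction" concrete, the sketch below checks a candidate shopping plan against a budget and a set of required categories. It is illustrative only: the `Item` fields, the `check_plan` helper, and the sample data are hypothetical and do not reflect the DeepPlanning task format or its scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical item schema, assumed for this example only.
    name: str
    price: float
    category: str


def check_plan(items, budget, required_categories):
    """Return a list of violated constraints; an empty list means the plan is feasible."""
    violations = []

    # Budget constraint: the total price must not exceed the budget.
    total = sum(item.price for item in items)
    if total > budget:
        violations.append(f"over budget: {total:.2f} > {budget:.2f}")

    # Coverage constraint: every required category must appear at least once.
    covered = {item.category for item in items}
    for category in sorted(set(required_categories) - covered):
        violations.append(f"missing category: {category}")

    return violations


plan = [Item("tent", 120.0, "shelter"), Item("stove", 45.5, "cooking")]
violations = check_plan(
    plan, budget=150.0, required_categories=["shelter", "cooking", "lighting"]
)
for violation in violations:
    print(violation)
```

A plan that returns an empty list satisfies both constraints. A Travel Planning check could follow the same pattern with different constraints, such as dates or opening hours.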