Benchmark Overview

We provide a benchmark to evaluate the planning capabilities of state-of-the-art agentic models.

Available Benchmarks

DeepPlanning Benchmark

DeepPlanning evaluates an agent's ability to handle complex, multi-step planning tasks that require reasoning and constraint satisfaction.

The DeepPlanning benchmark includes two major task categories:

  • Travel Planning: Complete travel itinerary planning with multiple constraints
  • Shopping Planning: Optimal shopping plan generation with budget and preference management
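To make "constraint satisfaction" concrete, here is a minimal Python sketch of how a shopping plan might be checked against a budget and a set of required categories. It is an illustration only and not the benchmark's actual scoring code: the `Item` class, the `check_plan` function, and the sample data are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical item record; the benchmark's real schema may differ.
    name: str
    category: str
    price: float


def check_plan(items, budget, required_categories):
    """Return a list of constraint violations (empty if the plan is valid)."""
    violations = []

    # Budget constraint: the total price must not exceed the budget.
    total = sum(item.price for item in items)
    if total > budget:
        violations.append(f"over budget: {total:.2f} > {budget:.2f}")

    # Preference constraint: every required category must be covered.
    covered = {item.category for item in items}
    for category in sorted(set(required_categories) - covered):
        violations.append(f"missing category: {category}")

    return violations


# Illustrative data, not taken from the benchmark.
plan = [Item("tent", "shelter", 120.0), Item("stove", "cooking", 45.5)]
ok = check_plan(plan, budget=200.0, required_categories={"shelter", "cooking"})
bad = check_plan(plan, budget=150.0, required_categories={"shelter", "lighting"})
```

Here `ok` is an empty list because the plan meets both constraints, while `bad` reports one budget violation and one missing category. A travel itinerary could be checked the same way, with constraints on dates or durations in place of prices.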