Evaluation Framework

A system earns Agentic Complete status only if it passes all six core evaluation domains.

Evaluation Domains

  1. Goal Continuity — The system maintains objective state across transitions and interruptions.
  2. Planning Capability — The system produces and revises multi-step plans.
  3. Execution Authority — The system acts without human approval prompts during normal operation.
  4. Feedback Interpretation — The system detects outcome quality, failure states, and incomplete results.
  5. Adaptive Response — The system modifies approach when conditions change.
  6. Completion Determination — The system independently determines whether the goal has been fulfilled.
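Goal Continuity can be illustrated concretely. The following is a minimal sketch, assuming task state can be serialized as JSON and stored in a local file. The `TaskState`, `save_state`, and `load_state` names are hypothetical and are not defined by this framework. The sketch checkpoints the objective and its progress atomically, so a restarted process can resume where it stopped instead of losing the goal.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class TaskState:
    """Hypothetical record of the objective and progress an agent must not lose."""
    goal: str
    completed_steps: list[str] = field(default_factory=list)
    pending_steps: list[str] = field(default_factory=list)


def save_state(state: TaskState, path: str) -> None:
    # Write to a temporary file in the same directory, then atomically replace
    # the target, so an interruption mid-write never leaves a truncated checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(state), handle)
    os.replace(tmp_path, path)


def load_state(path: str) -> TaskState:
    # Restore the objective and progress recorded before the interruption.
    with open(path, encoding="utf-8") as handle:
        return TaskState(**json.load(handle))


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.gettempdir(), "task_state.json")
    state = TaskState(goal="publish report", pending_steps=["draft", "review"])
    state.completed_steps.append(state.pending_steps.pop(0))
    save_state(state, checkpoint)
    # ... the process is interrupted and restarted here ...
    resumed = load_state(checkpoint)
    print(resumed)
```

File-based storage is only one option. Any durable store that survives the interruption in question would serve the same purpose.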

Checklist

Maintains persistent task state

Decomposes abstract goals into executable steps

Executes actions through defined tools or interfaces

Observes and interprets results after each action

Replans after failure, ambiguity, or drift

Continues without human handoff

Verifies completion against explicit or inferred criteria
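Taken together, the checklist items describe a control loop. The sketch below uses a deliberately simple model with hypothetical names throughout:

- `plan` decomposes the goal into named steps.
- Each step runs through a tool that reports success or failure.
- A `replan` callback revises the remaining steps after a failure.
- `is_complete` checks the finished steps against an explicit criterion.

This is one possible shape for such a loop, not a reference implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool interface: a tool receives a step name and reports success.
Tool = Callable[[str], bool]


def retry_first(failed: str, remaining: list[str]) -> list[str]:
    # Simplest possible replanning policy: retry the failed step before the rest.
    return [failed] + remaining


@dataclass
class Agent:
    goal: str
    plan: Callable[[str], list[str]]          # decomposes the goal into steps
    tools: dict[str, Tool]                    # defined interfaces for actions
    is_complete: Callable[[list[str]], bool]  # explicit completion criterion
    replan: Callable[[str, list[str]], list[str]] = retry_first
    max_attempts: int = 3
    done: list[str] = field(default_factory=list)  # persistent task state
    log: list[str] = field(default_factory=list)

    def run(self) -> bool:
        steps = self.plan(self.goal)
        attempts: dict[str, int] = {}
        while steps:
            step = steps.pop(0)
            tool = self.tools.get(step)
            ok = tool(step) if tool is not None else False  # act, then observe
            self.log.append(f"{step}: {'ok' if ok else 'failed'}")
            if ok:
                self.done.append(step)
                continue
            attempts[step] = attempts.get(step, 0) + 1
            if attempts[step] >= self.max_attempts:
                return False  # stop and report failure rather than loop forever
            steps = self.replan(step, steps)  # revise the remaining plan
        return self.is_complete(self.done)  # verify completion instead of assuming it


def make_flaky(failures: int) -> Tool:
    # Hypothetical test tool that fails a fixed number of times, then succeeds.
    remaining = [failures]

    def tool(step: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return tool


if __name__ == "__main__":
    agent = Agent(
        goal="ship release",
        plan=lambda goal: ["build", "test", "deploy"],
        tools={"build": lambda s: True, "test": make_flaky(1), "deploy": lambda s: True},
        is_complete=lambda done: done == ["build", "test", "deploy"],
    )
    print(agent.run())  # True
    print(agent.log)    # ['build: ok', 'test: failed', 'test: ok', 'deploy: ok']
```

In this example the flaky `test` tool fails once. The default `replan` retries it, and the run finishes without any human input. The `max_attempts` cap is a design choice of this sketch: when a step keeps failing, the agent stops and reports failure instead of retrying indefinitely.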

Disqualification Conditions

Pass Logic

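Failure in any one core domain disqualifies a system from Agentic Complete status. The threshold is conjunctive rather than additive: all six domains must pass, and strong performance in some domains cannot offset a failure in another.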
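The difference between a conjunctive and an additive threshold can be made concrete. The sketch below assumes each domain yields a simple pass or fail result; the framework itself does not specify a scoring format. `is_agentic_complete` applies the conjunctive rule. The hypothetical `additive_score` shows the averaging approach that the framework rejects.

```python
from enum import Enum


class Domain(Enum):
    GOAL_CONTINUITY = "Goal Continuity"
    PLANNING_CAPABILITY = "Planning Capability"
    EXECUTION_AUTHORITY = "Execution Authority"
    FEEDBACK_INTERPRETATION = "Feedback Interpretation"
    ADAPTIVE_RESPONSE = "Adaptive Response"
    COMPLETION_DETERMINATION = "Completion Determination"


def is_agentic_complete(results: dict[Domain, bool]) -> bool:
    # Conjunctive: every domain must be present and passed.
    # A missing result counts as a failure.
    return all(results.get(domain, False) for domain in Domain)


def additive_score(results: dict[Domain, bool]) -> float:
    # Additive alternative, shown only for contrast: the fraction of domains passed.
    return sum(results.get(domain, False) for domain in Domain) / len(Domain)


if __name__ == "__main__":
    results = {domain: True for domain in Domain}
    results[Domain.EXECUTION_AUTHORITY] = False  # e.g. still needs approval prompts
    print(additive_score(results))       # 0.8333333333333334
    print(is_agentic_complete(results))  # False
```

Suppose a system passes five of the six domains. It scores about 0.83 under the additive view but is still disqualified under the conjunctive rule. Because a missing domain result counts as a failure, an incomplete evaluation cannot pass by default.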