Evaluation Framework
A system earns Agentic Complete status only if it passes every core evaluation domain.
Evaluation Domains
- Goal Continuity — The system maintains objective state across transitions and interruptions.
- Planning Capability — The system produces and revises multi-step plans.
- Execution Authority — The system acts without human approval prompts during normal operation.
- Feedback Interpretation — The system detects outcome quality, failure states, and incomplete results.
- Adaptive Response — The system modifies approach when conditions change.
- Completion Determination — The system independently determines whether the goal has been fulfilled.
Checklist
- Maintains persistent task state
- Decomposes abstract goals into executable steps
- Executes actions through defined tools or interfaces
- Observes and interprets results after each action
- Replans after failure, ambiguity, or drift
- Continues without human handoff
- Verifies completion against explicit or inferred criteria
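The checklist items can be read as the stages of a single control loop. The sketch below is one hypothetical way to arrange them in Python, not a reference implementation: `TaskState`, `run_agent`, the `decompose` and `is_complete` callbacks, the tool registry, and the `max_steps` cap are all assumed names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Persistent task state carried across steps (hypothetical structure)."""
    goal: str
    plan: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_agent(
    state: TaskState,
    decompose: Callable[[TaskState], List[str]],
    tools: Dict[str, Callable[[], bool]],
    is_complete: Callable[[TaskState], bool],
    max_steps: int = 20,
) -> TaskState:
    """Plan, act, observe, and replan until the goal is verified or steps run out."""
    state.plan = decompose(state)              # decompose the goal into steps
    for _ in range(max_steps):                 # no human handoff inside the loop
        if is_complete(state):                 # verify completion
            break
        if not state.plan:
            state.plan = decompose(state)      # replan when the plan is exhausted
            if not state.plan:
                break                          # nothing left to try
        step = state.plan.pop(0)
        tool = tools.get(step)
        succeeded = tool() if tool else False  # execute through a defined tool
        state.history.append(f"{step}:{'ok' if succeeded else 'fail'}")  # observe
        if not succeeded:
            state.plan = decompose(state)      # replan after failure
    state.done = is_complete(state)
    return state


# Usage: "fetch" fails once, so the loop must replan and retry it.
attempts = {"fetch": 0}


def fetch() -> bool:
    attempts["fetch"] += 1
    return attempts["fetch"] > 1               # fails on the first call only


def render() -> bool:
    return True


def decompose(state: TaskState) -> List[str]:
    # Plan only the steps that have not yet succeeded.
    return [s for s in ("fetch", "render") if f"{s}:ok" not in state.history]


def is_complete(state: TaskState) -> bool:
    return all(f"{s}:ok" in state.history for s in ("fetch", "render"))


final = run_agent(
    TaskState(goal="publish report"),
    decompose,
    {"fetch": fetch, "render": render},
    is_complete,
)
print(final.done, final.history)
```

In this run the first `fetch` fails, the loop replans, and the retry succeeds, so the script prints `True ['fetch:fail', 'fetch:ok', 'render:ok']`. The step cap is a safety assumption of the sketch; the checklist itself does not prescribe one.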
Disqualification Conditions
- Human approval is required between planning and execution.
- The system cannot recover from common failure states.
- The system cannot determine whether the task has ended.
- The system loses continuity when environment state changes.
- The system executes actions but cannot revise strategy.
Pass Logic
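Failure in any core domain disqualifies a system from Agentic Complete status. The threshold is conjunctive rather than additive: every domain must pass, and strong performance in one domain does not offset failure in another.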
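The conjunctive rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Domain` enum, `is_agentic_complete`, `failed_domains`, and the use of one boolean per domain are hypothetical choices, since the framework does not specify how each domain is scored.

```python
from enum import Enum
from typing import List, Mapping


class Domain(Enum):
    """The six core evaluation domains (member names are illustrative)."""
    GOAL_CONTINUITY = "Goal Continuity"
    PLANNING_CAPABILITY = "Planning Capability"
    EXECUTION_AUTHORITY = "Execution Authority"
    FEEDBACK_INTERPRETATION = "Feedback Interpretation"
    ADAPTIVE_RESPONSE = "Adaptive Response"
    COMPLETION_DETERMINATION = "Completion Determination"


def is_agentic_complete(results: Mapping[Domain, bool]) -> bool:
    """Conjunctive pass logic: every core domain must pass.

    A domain missing from `results` counts as a failure
    (an assumption of this sketch).
    """
    return all(results.get(domain, False) for domain in Domain)


def failed_domains(results: Mapping[Domain, bool]) -> List[Domain]:
    """List the domains that block Agentic Complete status."""
    return [domain for domain in Domain if not results.get(domain, False)]


# Example: five of six domains pass.
results = {domain: True for domain in Domain}
results[Domain.ADAPTIVE_RESPONSE] = False

print(is_agentic_complete(results))
print([domain.value for domain in failed_domains(results)])
```

With five of six domains passing, `is_agentic_complete` still returns `False`, and `failed_domains` reports only Adaptive Response. An additive rule, such as a pass count compared against a cutoff, could accept this system; the conjunctive rule cannot.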