AI Agent Capability Inventories: Knowing What the Delegate Can Really Do

An AI agent is often introduced by what it appears to do: answer questions, draft messages, update files, browse pages, open tickets, or prepare code changes. That description is useful for a demo, but it is too loose for operations. A delegate that can browse a knowledge base is not the same as a delegate that can browse the open web. A delegate that can prepare a record update is not the same as one that can apply it. A coding agent that can edit a sandbox branch is not the same as one that can merge into a protected branch.

A capability inventory is the plain practice of writing down what a particular agent lane can actually do, what it must not do, what context it is allowed to use, and what kind of human review should surround it. It sits before AI Agent Routing because routing only works when the routes point to known delegates. It also sits beside AI Agent Permissions and AI Agent Tool Contracts because a capability is built from both authority (what the agent is permitted to do) and handles (the tools and systems it can touch).

Capability Is Not One Thing

Teams sometimes describe an agent as capable or not capable of a whole category of work. That is usually too broad. The useful question is not whether the agent can handle support, research, finance, software, or operations. The useful question is which slice of that work has stable inputs, approved sources, bounded tools, clear success criteria, and a review path.

A research delegate may be capable of collecting source evidence from an approved corpus, summarizing tradeoffs, and marking uncertainty. It may not be capable of interpreting unapproved sources as policy or publishing final recommendations. A support delegate may be capable of drafting a reply from a current policy source. It may not be capable of promising refunds, changing account status, or inventing an exception. A coding delegate may be capable of reproducing a bug, editing a narrow file set, and running focused tests. It may not be capable of changing dependencies or deployment scripts without a separate approval.

This difference matters because an agent can be excellent inside a lane and dangerous outside it. A capability inventory keeps the conversation from collapsing into a blanket verdict of trust or distrust. It gives the workflow a more precise language: this delegate can inspect, draft, compare, prepare, validate, escalate, or act under these conditions. Anything outside that shape needs a different route, a different permission, or a human owner.
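One way to make that vocabulary concrete is to record the verbs as data. The sketch below is only an illustration: the Action and Lane names, the "support-drafting" lane, and its out-of-scope list are hypothetical, chosen to mirror the support example above rather than taken from any real system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # The verbs named in the text; a real inventory may need others.
    INSPECT = "inspect"
    DRAFT = "draft"
    COMPARE = "compare"
    PREPARE = "prepare"
    VALIDATE = "validate"
    ESCALATE = "escalate"
    ACT = "act"


@dataclass(frozen=True)
class Lane:
    name: str
    allowed: frozenset        # actions this delegate may take
    out_of_scope: tuple = ()  # work that needs another route or a human owner

    def can(self, action: Action) -> bool:
        """True only if the action is explicitly listed for this lane."""
        return action in self.allowed


# Hypothetical lane: drafts replies from policy, never acts on the account.
support_drafting = Lane(
    name="support-drafting",
    allowed=frozenset({Action.INSPECT, Action.DRAFT, Action.ESCALATE}),
    out_of_scope=("promise refunds", "change account status", "invent exceptions"),
)
```

Because the check is an allow-list, an action that nobody thought to record is treated as outside the lane, which matches the idea that anything outside the described shape needs a different route.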

Name The Handles The Agent Uses

A capability inventory should begin with handles. Which tools can the agent call? Which repositories, records, document shelves, queues, browsers, calendars, or messaging systems can it touch? Which of those handles are read-only, draft-only, or state-changing? Which ones are experimental? Which ones return structured evidence, and which ones return messy content that needs interpretation?

AI Agent Tool Contracts explains the tool side in detail. The inventory is where those contracts become operational. A tool named search is not enough. The inventory should say whether search reaches approved internal documents, public web pages, customer records, archived material, or all of those at once. A tool named update is not enough. The inventory should say whether it prepares an update for review, writes to a staging record, writes to production, or creates a request for another system.
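A handle entry can carry exactly the distinctions described above: what the tool reaches, whether it is read-only, draft-only, or state-changing, and what calling it actually does. The following is a minimal sketch under stated assumptions; the Handle fields and the three example entries (including the "close_ticket" tool) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    READ_ONLY = "read-only"
    DRAFT_ONLY = "draft-only"
    STATE_CHANGING = "state-changing"


@dataclass(frozen=True)
class Handle:
    tool: str            # the name the agent calls
    reaches: tuple       # what the tool can actually touch
    mode: AccessMode
    effect: str          # what a call does, in plain words
    experimental: bool = False


# Hypothetical entries for one lane.
HANDLES = [
    Handle("search", ("approved internal documents",), AccessMode.READ_ONLY,
           "returns matching passages"),
    Handle("update", ("staging record",), AccessMode.DRAFT_ONLY,
           "prepares an update for review"),
    Handle("close_ticket", ("support queue",), AccessMode.STATE_CHANGING,
           "closes a ticket in the live queue", experimental=True),
]


def state_changing(handles):
    """List the tools that can change state, so they get the closest look."""
    return [h.tool for h in handles if h.mode is AccessMode.STATE_CHANGING]
```

Here `state_changing(HANDLES)` picks out only the ticket-closing tool, which is the kind of question a reviewer or router is likely to ask first.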

The same habit applies to non-tool capabilities. If the agent can remember preferences, the inventory should say which preferences and where they come from. If it can use a browser, the inventory should say which pages are in scope and which actions are stop conditions. If it can create artifacts, the inventory should say what those artifacts must include. AI Agent Artifact Design becomes easier when the inventory already names the work product.

Record Context Sources And Blind Spots

Capability depends on context as much as tools. A delegate with the right tool and the wrong source can produce confident errors. A delegate with current sources and no permission to inspect the relevant record may stop correctly, but it is not capable of completing the task.

The inventory should name the sources the agent is expected to use and the sources it should avoid. A policy agent may use the current policy collection, not old training decks. A coding agent may use the repository, test output, and issue thread, not unrelated package examples from the web unless the task asks for them. A finance operations agent may use a ledger export and approval record, not a casual Slack summary. AI Agent Knowledge Bases covers the maintained shelf; the inventory says which shelf belongs to this lane.

Blind spots deserve the same attention. If the agent cannot see private attachments, live production records, historical exceptions, or customer-specific constraints, that is part of its capability profile. A blind spot does not always make the agent useless. It may simply mean the agent can prepare a draft and mark what needs human confirmation. But if the blind spot is hidden, the workflow will confuse partial work with complete work.
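Sources and blind spots can be checked per run once they are written down. The sketch below assumes a hypothetical ContextProfile record and a check_run helper; the source names echo the finance example above and are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextProfile:
    use: frozenset          # sources the lane is expected to use
    avoid: frozenset        # sources the lane should not rely on
    blind_spots: frozenset  # inputs the agent cannot see


def check_run(profile, sources_cited, inputs_needed):
    """Return (avoided sources that were cited, inputs needing human confirmation)."""
    avoided_but_cited = sorted(set(sources_cited) & profile.avoid)
    needs_confirmation = sorted(set(inputs_needed) & profile.blind_spots)
    return avoided_but_cited, needs_confirmation


# Hypothetical profile for a finance operations lane.
finance_ops = ContextProfile(
    use=frozenset({"ledger export", "approval record"}),
    avoid=frozenset({"chat summary"}),
    blind_spots=frozenset({"private attachments", "historical exceptions"}),
)

flagged, confirm = check_run(
    finance_ops,
    sources_cited=["ledger export", "chat summary"],
    inputs_needed=["approval record", "historical exceptions"],
)
```

In this run, `flagged` holds the chat summary and `confirm` holds the historical exceptions, so the output can be labeled as a draft with a named item for a human to confirm instead of passing as complete work.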

Include The Review Burden

An inventory that only lists what the agent can produce is incomplete. It should also describe what review the output requires. Some capabilities are valuable only when the review burden is reasonable. If the delegate saves ten minutes of drafting but forces a reviewer to spend twenty minutes reconstructing the evidence, the capability is weaker than it looks.
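The arithmetic in that example is worth writing out, since it is the figure an inventory entry would ideally carry. This is a deliberately small sketch; the function name is hypothetical and the numbers are the ones from the paragraph above.

```python
def net_minutes_saved(drafting_minutes_saved, review_minutes_added):
    """Time saved by the delegate minus the review time it creates."""
    return drafting_minutes_saved - review_minutes_added


# Ten minutes of drafting saved, twenty minutes spent reconstructing evidence.
result = net_minutes_saved(10, 20)  # -10: the run costs time overall
```

A negative result does not necessarily mean the capability should be dropped. It may mean the artifact needs a better source trail so the reviewer does not have to rebuild the evidence.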

Human Review for AI Agents is the natural companion here. The inventory can describe the expected review surface: source trail, diff, proposed action, validation result, confidence boundary, or escalation note. It can also name the reviewer role. A policy reviewer, engineering reviewer, operations owner, and requester are not interchangeable. If every capability routes to a generic human queue, the inventory has hidden an operating cost.
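The review surface and reviewer role can sit next to each capability, which makes a generic queue easy to spot. The sketch below is an assumption about how such a record might look; ReviewRequirement, GENERIC_ROLES, and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRequirement:
    capability: str
    surface: tuple       # what the reviewer is shown, e.g. source trail, diff
    reviewer_role: str   # a named role, not "a human"


# Roles that say nothing about who actually pays the review cost.
GENERIC_ROLES = {"human", "anyone", "generic queue"}


def hidden_review_costs(requirements):
    """Capabilities whose review is unassigned or has nothing to inspect."""
    return [
        r.capability
        for r in requirements
        if r.reviewer_role.lower() in GENERIC_ROLES or not r.surface
    ]


requirements = [
    ReviewRequirement("draft policy reply", ("source trail", "proposed reply"),
                      "policy reviewer"),
    ReviewRequirement("prepare record update", ("diff",), "generic queue"),
]

unowned = hidden_review_costs(requirements)
```

With these entries, `unowned` contains only the record update, the one capability that routes to a generic queue.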

Review burden also changes with maturity. A new delegate may require close review for every run. After the workflow has evaluations, observability, accepted artifacts, and a history of correct stops, some low-risk runs may need lighter review. That change should update the inventory rather than live as tribal memory.

Keep Limits As First-Class Data

The most useful capability inventories are honest about limits. They do not read like product marketing. They say where the delegate is brittle, which tasks are out of scope, which data is stale or partial, which tools are slow, and which conditions require escalation.

This is especially important for model behavior. AI Agent Model Selection helps choose a model lane, but the inventory should record what that choice means for the workflow. A cheaper lane may be capable of classification but not subtle synthesis. A stronger lane may be capable of long reasoning but too slow for a queue that needs quick triage. A browser-capable lane may be useful for gathering evidence but too risky for forms that can submit irreversible actions.

Limits should be written in operational language. Instead of saying the agent is “not reliable with ambiguity,” say that it must stop when the request lacks a record identifier, when two records match the same customer, when the governing source cannot be found, or when the proposed action would exceed a named permission boundary. Limits are useful when they can be recognized during a run.
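Limits phrased this way translate almost directly into checks that can run before the agent acts. The sketch below encodes the four stop conditions from the paragraph above; the Request fields and the numeric PERMISSION_LIMIT are hypothetical stand-ins for whatever a real workflow records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    record_id: Optional[str]
    matching_records: int          # records that match the same customer
    governing_source_found: bool
    proposed_amount: float         # size of the proposed action


PERMISSION_LIMIT = 100.0  # hypothetical named permission boundary


def stop_reasons(req, limit=PERMISSION_LIMIT):
    """Return every recognizable reason to stop; an empty list means proceed."""
    reasons = []
    if not req.record_id:
        reasons.append("missing record identifier")
    if req.matching_records > 1:
        reasons.append("multiple records match the same customer")
    if not req.governing_source_found:
        reasons.append("governing source not found")
    if req.proposed_amount > limit:
        reasons.append("proposed action exceeds permission boundary")
    return reasons
```

Returning all reasons instead of the first one gives the escalation note a complete picture, which tends to shorten the human follow-up.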

Use The Inventory During Routing And Change

A capability inventory is not a document to admire once and forget. It should be used when a task is routed, when a permission changes, when a tool contract changes, when a new source is added, and when an incident or near miss reveals a gap.
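At routing time, the inventory can answer a narrow question: does this lane list everything the task needs? The following sketch assumes hypothetical LaneEntry and Task records and made-up handle names such as "repo:read"; it is one possible shape, not a prescribed router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneEntry:
    name: str
    handles: frozenset
    sources: frozenset
    owner: str


@dataclass(frozen=True)
class Task:
    needs_handles: frozenset
    needs_sources: frozenset


def gaps(task, lane):
    """What the task needs that the lane's inventory does not list."""
    return {
        "handles": sorted(task.needs_handles - lane.handles),
        "sources": sorted(task.needs_sources - lane.sources),
    }


def route(task, lanes):
    """Name of the first lane with no gaps, or None to hand off to a human owner."""
    for lane in lanes:
        missing = gaps(task, lane)
        if not missing["handles"] and not missing["sources"]:
            return lane.name
    return None


# Hypothetical lane: a coding delegate limited to a sandbox.
coding_sandbox = LaneEntry(
    name="coding-sandbox",
    handles=frozenset({"repo:read", "repo:write-sandbox", "tests:run"}),
    sources=frozenset({"repository", "issue thread", "test output"}),
    owner="engineering reviewer",
)
```

A task that needs a handle the lane does not list, such as a deployment write, returns None here and goes to a person instead of being stretched to fit.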

AI Agent Change Management matters because capabilities drift. A model upgrade can change output quality. A tool can add a field or remove one. A knowledge base can move. A reviewer queue can become overloaded. A prompt can gain a new instruction that expands scope quietly. If the inventory is not updated, routing decisions will point to yesterday’s agent.
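Drift of this kind can be caught by comparing what the inventory recorded with what is currently deployed. The sketch below uses only the Python standard library; the configuration keys and values ("model-a", "policies/current", and so on) are hypothetical placeholders.

```python
import hashlib
import json

_MISSING = object()  # sentinel so a missing key differs from a key set to None


def fingerprint(config):
    """Stable SHA-256 hash of a JSON-serializable config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_keys(recorded, observed):
    """Keys whose values differ between the inventory and the live setup."""
    keys = set(recorded) | set(observed)
    return sorted(
        k for k in keys
        if recorded.get(k, _MISSING) != observed.get(k, _MISSING)
    )


# Hypothetical snapshot stored with the inventory, and the current deployment.
recorded = {
    "model": "model-a",
    "tools": ["search", "update"],
    "knowledge_base": "policies/current",
}
observed = dict(recorded, model="model-b")

changed = drifted_keys(recorded, observed)
```

A cheap fingerprint comparison can run on every deploy, with `drifted_keys` used only when the hashes differ to show which part of the inventory needs a fresh look.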

The inventory also makes decommissioning easier later. AI Agent Decommissioning asks what the lane touched, what credentials it used, and what work needs redirecting. A current capability inventory answers much of that before shutdown becomes urgent.

A good inventory does not need to be long. It needs to be specific enough that a person can decide whether a task belongs in this lane. It names the handles, context, authority, review burden, limits, and owner. That small act turns an impressive demo into an accountable delegate. The team no longer asks, in a vague way, whether the agent can do the work. It asks whether this run fits the capabilities that have actually been designed.

JJ Ben-Joseph
