An AI agent can only work with the world it can see. That sounds obvious until a task fails because the agent saw too much of the wrong thing and too little of the right thing.
People often talk about context windows as if they are storage boxes. A larger box sounds better. More documents, more chat history, more logs, more screenshots, more tickets, more files. Surely the agent will do better if nothing is missing.
Sometimes it will. Often it will not. A context window is not only a container. It is the agent’s working room. If the room is full of old drafts, stale decisions, duplicate files, private notes, abandoned plans, and loosely related background, the agent has to infer what matters. The model may still produce a confident answer, but the confidence may be built from clutter.
Good delegation is not dumping everything into view. It is choosing the working set.

The working set is the agent’s desk
A working set is the small collection of materials the agent should actively use for a task. It may include a user request, a relevant file, a short policy note, a recent error log, a customer record, an API reference, a design decision, or a previous run summary. The important part is not the format. The important part is that the materials belong to the job.
The desk metaphor helps. A person writing a careful memo may have three documents open and a folder nearby. They do not cover the desk with every document the company has ever produced. If they need more, they retrieve it. If something becomes irrelevant, they move it aside.
Agents need the same discipline. They can search, read, summarize, and retrieve, but they still benefit from a clean active surface. A good working set tells the agent what to treat as live evidence. It reduces the chance that the model will blend old and new instructions, mistake examples for requirements, or let a noisy transcript overpower the actual task.
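The desk idea is easy to sketch. In this hypothetical Python fragment (all names invented), the desk is a small active list, and retiring material moves it aside rather than deleting it:

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    name: str      # e.g. "checkout_errors.log"
    role: str      # "live", "background", or "example"
    content: str

@dataclass
class WorkingSet:
    """The agent's desk: a small, deliberate collection of task materials."""
    active: list = field(default_factory=list)
    shelved: list = field(default_factory=list)

    def add(self, material: Material) -> None:
        self.active.append(material)

    def retire(self, name: str) -> None:
        """Move material aside without deleting it; it can come back later."""
        for m in list(self.active):
            if m.name == name:
                self.active.remove(m)
                self.shelved.append(m)

    def live_evidence(self) -> list:
        """Only the material the agent should treat as live for this task."""
        return [m for m in self.active if m.role == "live"]
```

Retiring an old draft does not destroy it; it simply stops being live evidence on the active surface.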
AI Agent Memory and Context explains what agents should remember and what they should forget. The working set is the practical version of that idea during a single task. It is memory with a job to do.

Bigger context changes the failure mode
A larger context window can be useful when the task genuinely depends on many pieces. A code migration may need several files. A legal or compliance review may need policy, exception history, and source documents. A customer support agent may need account context, recent messages, and product behavior. In those cases, the extra room prevents blind spots.
But bigger context does not remove the need for selection. It changes the failure mode. Instead of missing a key detail, the agent may find a misleading one. Instead of asking for a source, it may rely on a stale source that was included accidentally. Instead of following the latest instruction, it may reconcile contradictory instructions into something nobody intended.
This is why context design is an operations problem, not only a model capability problem. The question is not “How much can the model hold?” The better question is “What should be present when this decision is made?”
For repeated workflows, answer that question in the runbook. AI Agent Runbooks is the natural companion to this guide because a runbook can specify the working set: which files to read first, which records to ignore, which logs are authoritative, when to retrieve more, and when to stop because the needed context is not available.
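A runbook's answer can be as plain as a declaration. The entry below is a sketch with invented names and paths, but the shape is the point: the working set is specified before the model runs.

```python
# A hypothetical runbook entry: the working set, declared up front.
CHECKOUT_TRIAGE_RUNBOOK = {
    "task": "triage checkout errors",
    "read_first": ["services/checkout/handler.py", "logs/checkout-latest.log"],
    "authoritative": ["logs/checkout-latest.log"],  # the log that wins on conflict
    "ignore": ["logs/archive/"],                    # stale records: do not load
    "retrieve_more_when": "a stack trace names a file not yet in the set",
    "stop_when": "reproduction steps are missing and cannot be retrieved",
}
```

Nothing here is clever. That is the virtue: the decisions about what to read, what to ignore, and when to stop are made once, in review, instead of improvised on every run.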

Old context can be more dangerous than missing context
Missing context often reveals itself. The agent says it cannot find a file, asks a clarifying question, or makes a gap visible in the handoff. Old context can be quieter. It looks like knowledge.
An agent may see last month’s product requirement beside this week’s decision. It may read an outdated policy before a newer correction. It may include a resolved bug as if it is still open. It may use a previous user’s preference in a new user’s task. None of this requires the agent to be malicious or careless. It only requires the working set to contain sediment.
The fix is not to delete history. History matters. The fix is to mark authority. A task should make it clear which source is current, which source is background, which source is only an example, and which source should not be used unless requested. Humans do this casually with phrases like “ignore the old draft” or “the customer changed their mind.” Agents need that same clarity in the materials they receive.
Dates help, but dates are not enough. A document can be recent and wrong. A policy can be old and still authoritative. A decision can be informal and final. The working set should explain status, not only chronology.
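In code, status is a field, not an inference. This sketch uses invented records: status decides whether a source is admitted at all, and dates only order what survives. Note that the old policy outranks the newer superseded requirement.

```python
from datetime import date

# Hypothetical source records: status is explicit, not inferred from dates.
sources = [
    {"name": "refund-policy-v1", "date": date(2021, 3, 1), "status": "authoritative"},
    {"name": "prq-march",        "date": date(2024, 3, 4), "status": "superseded"},
    {"name": "decision-note",    "date": date(2024, 3, 8), "status": "authoritative"},
    {"name": "old-draft",        "date": date(2024, 2, 1), "status": "do_not_use"},
]

def usable(sources):
    """Status decides admission; recency only breaks ties among survivors."""
    allowed = [s for s in sources if s["status"] == "authoritative"]
    return sorted(allowed, key=lambda s: s["date"], reverse=True)
```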

Retrieval should be deliberate
Many agent systems use retrieval to bring in relevant material as the task unfolds. This is powerful because the agent does not need everything at once. It can search the archive, read the top candidates, and expand the working set when the task asks for it.
Retrieval also needs judgment. A search result is not the same as evidence. A similar title is not the same as a relevant source. A long document is not necessarily the best document. The agent should know whether it is allowed to browse broadly, whether it should prefer official sources, whether private records require approval, and how to report what it used.
Good retrieval leaves tracks. If the agent changes a file because it found a policy, the handoff should name the policy. If it summarizes a customer issue, it should say which records were read. If it skips a likely source because access was restricted, it should say that. The point is not to burden the human with a full transcript. The point is to make trust inspectable.
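A retrieval step that leaves tracks might look like this sketch (the matching logic and names are hypothetical and deliberately simple): every candidate is either read or skipped for a stated reason, and the trace travels with the results into the handoff.

```python
def retrieve(query, archive, allowed):
    """Fetch candidate sources and leave tracks for the handoff.

    `archive` maps source name -> text; `allowed` is the set of names
    this run may read without further approval.
    """
    trace = {"query": query, "read": [], "skipped": []}
    results = []
    for name, text in archive.items():
        if query not in text:
            continue  # not a candidate at all
        if name not in allowed:
            # A likely source was skipped: say so, rather than staying silent.
            trace["skipped"].append({"source": name, "reason": "access restricted"})
            continue
        trace["read"].append(name)
        results.append((name, text))
    return results, trace
```

The reviewer never needs the full transcript. The trace names what was used and what was withheld, which is exactly what makes the result inspectable.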
AI Agent Observability is useful here because context is part of the trace. A reviewer should be able to see not only what the agent did, but what it had in view when it did it.

Privacy is a context decision
Context windows create privacy risk because adding material to the working set can expose it to tools, logs, reviewers, and downstream summaries. The safest data is not the data that was included carefully. It is the data that did not need to be included at all.
This matters for personal agents and workplace agents alike. A calendar task may not need the content of every meeting note. A support task may not need payment details. A code task may not need production secrets. A research task may not need private customer names. The working set should be shaped by minimum useful context, not maximum available context.
Permissions help, but permissions are only part of the answer. AI Agent Permissions describes what an agent may access and do. Context discipline describes what it should actually see for this run. A person with a company badge may be allowed into many rooms. That does not mean every meeting should happen in the records archive.
When sensitive material is necessary, it should be named as sensitive, limited to the task, and removed from later summaries unless the workflow explicitly requires it. Memory should not become a junk drawer for private details that happened to pass through a task once.
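Minimum useful context can be enforced mechanically. In this sketch, a hypothetical allowlist decides which fields of a record ever reach the window; everything else is dropped before the task starts, not scrubbed afterward.

```python
# Hypothetical allowlist for a support task: the fields the job actually needs.
SUPPORT_TASK_FIELDS = {"ticket_id", "product", "recent_messages"}

def minimum_context(record, needed=SUPPORT_TASK_FIELDS):
    """Keep only the fields the task needs; the rest never enters the window."""
    return {k: v for k, v in record.items() if k in needed}
```

The payment details in the example below are never excluded by a later summary step, because they are never included in the first place.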

The human still frames the room
The agent may be able to gather context, but the human still frames the room. A good assignment says what the agent should treat as the live problem, what sources matter, what uncertainty remains acceptable, and what should trigger a question. It also says what success looks like after the context has been used.
This framing does not have to be long. It has to be specific. Instead of saying “review this whole project,” a better assignment might say that the agent should inspect the files related to checkout errors, treat the latest failing log as authoritative, avoid unrelated refactors, and report any missing reproduction steps. That is context design. It narrows the working set before the model starts filling gaps with guesswork.
The mature use of agent context feels less like hoarding and more like mise en place. The right ingredients are on the counter. The pantry still exists. The cook can fetch more, but the active surface stays clear enough to work on.
When an agent fails, do not only ask whether the model was strong enough. Ask what it could see. Ask what it could not see. Ask what stale material sat beside the live request. Ask whether the runbook taught it how to retrieve more. Ask whether private context was included because it was necessary or because nobody took the time to exclude it.
The context window is where delegation becomes concrete. It is the difference between asking a capable helper to work at a clear desk and asking them to make sense of a storage room with a deadline.