Context Engineering

1. What it is

Context engineering is the practice of curating, governing, and maintaining what an AI system is allowed to see — before it sees it.

It is not a tool. It is not a model technique. It is not a prompt-engineering pattern. It is a discipline that lives outside the model: the editorial work of deciding which documents enter the retrieval surface, the metadata that lets the AI tell them apart, the written rules for how the corpus is maintained, and the audit trail that makes the system legible to clients, regulators, and your own future team.

Every AI deployment has context engineering happening to it. The question is whether the discipline is applied deliberately by humans who know the domain, or applied accidentally by whatever defaults the vendor shipped.

2. Why it matters: Wave 1 versus Wave 2

The first wave of AI adoption assumed quality was a model problem. "The output is unreliable — we need a better model." When the better model was just as unreliable, the diagnosis shifted: "We need a better tool." When the better tool produced the same drift, organizations reached for prompt engineering, output linters, post-hoc filters, and vendor swaps. Each layer was a patch on the symptom.

The second wave is recognizing that AI quality is a context problem. What the model is allowed to see, and what it's trained to weight, determines what it produces — more than which model was bought.

A 2017 tax memo cited as current isn't a model failure. No model knew the document had been superseded; the firm did. A renewal email opening with "Dear valued customer" — a phrase the brand voice guide explicitly forbids — isn't a writing failure. The model wasn't asked to read the brand voice guide; it was asked to learn from a corpus that included a 2022 campaign that nobody remembers writing.

The discipline that catches the citation is the same one that catches PII in retrieval, vendor lock-in, and tone drift. It is not a discipline of model selection. It is a discipline of context.

3. The four practitioner moves

There are four moves that consistently separate AI deployments that hold up from ones that drift. They show up across domains — legal, marketing, customer service, internal knowledge — but the moves are the same.

Curation. The editorial work of deciding what enters the index and what doesn't. Curation is human work — the people in the room who know which documents represent the firm's voice, which engagements contain client information you've committed not to retain, which memos have been superseded. Curation is not a one-time pass at launch; it is an ongoing practice with a written rubric.

Metadata. The information about documents that lets the AI distinguish them. Authorship. Currency. Supersession. Format. Audience. Sensitivity. Metadata turns a document collection into a retrieval surface that supports judgment. Without it, every document looks the same age, the same authority, the same applicability — which is to say, the AI cannot reason about them.
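As a minimal sketch of what such a record might look like, the example below models the six attributes named above as a small Python data class. The class name, field names, and sample document ids are all hypothetical and chosen for illustration; they are not a standard schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocumentMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    doc_id: str
    author: str                           # authorship
    effective_date: date                  # currency
    superseded_by: Optional[str] = None   # supersession: id of the replacing doc
    doc_format: str = "memo"              # format
    audience: str = "internal"            # audience
    sensitivity: str = "normal"           # sensitivity


def is_current(meta: DocumentMetadata) -> bool:
    """A document counts as current only if nothing has superseded it."""
    return meta.superseded_by is None


# Invented sample records: an old memo that points at its replacement.
memo_2017 = DocumentMetadata("tax-memo-2017", "A. Partner", date(2017, 3, 1),
                             superseded_by="tax-memo-2023")
memo_2023 = DocumentMetadata("tax-memo-2023", "A. Partner", date(2023, 6, 15))

current_ids = [m.doc_id for m in (memo_2017, memo_2023) if is_current(m)]
```

With a record like this attached to each document, a retrieval step has something concrete to filter on, and the 2017 memo no longer looks the same age as its replacement.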

Governance. The written rules that say what is index-eligible, who decides, how supersession is recorded, what gets pulled, and how new documents enter. Governance is the discipline that survives staff turnover, vendor changes, and project pauses. The rubric you write for one team's AI rollout becomes the standard the rest of the organization adopts.
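One way to keep a written rubric honest is to mirror it as a short list of checks, each paired with the reason a document fails. The sketch below assumes documents are plain dictionaries; the three rules and the key names (`author`, `superseded_by`, `restricted_client_data`) are hypothetical stand-ins for whatever a real rubric says.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, object]
Rule = Tuple[str, Callable[[Document], bool]]

# Each rule is (reason reported when the rule fails, check that must pass).
# These three rules are illustrative assumptions, not a recommended rubric.
RUBRIC: List[Rule] = [
    ("has no named author", lambda d: bool(d.get("author"))),
    ("has been superseded", lambda d: d.get("superseded_by") is None),
    ("contains client data we committed not to retain",
     lambda d: not d.get("restricted_client_data", False)),
]


def check_eligibility(doc: Document) -> Tuple[bool, List[str]]:
    """Return (eligible, reasons for rejection) for one document."""
    reasons = [reason for reason, passes in RUBRIC if not passes(doc)]
    return (not reasons, reasons)


# Invented example: an authored memo that has since been replaced.
old_memo = {"author": "A. Partner", "superseded_by": "tax-memo-2023"}
eligible, reasons = check_eligibility(old_memo)
```

Returning the reasons alongside the verdict means every rejection carries its own explanation, which is the part that tends to survive staff turnover.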

Audit. The ability to answer "how is this AI grounded?" with a defensible story. Audit means you can show what's in the index, what's not, when it was last refreshed, who decided what's eligible. Audit is what makes context engineering legible to clients, regulators, partners, and the next vendor pitching you a magic model.
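A simple way to make those questions answerable is an append-only log of index decisions, summarised on demand. The following is a sketch under assumptions: the `IndexDecision` record, its fields, and the sample entries are hypothetical, and a real system would persist the log rather than hold it in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class IndexDecision:
    """One hypothetical log entry: who decided what about which document."""
    doc_id: str
    included: bool
    decided_by: str
    decided_on: date
    reason: str


def audit_summary(log: List[IndexDecision]) -> Dict[str, object]:
    """Reduce the log to the latest decision per document and report on it."""
    latest: Dict[str, IndexDecision] = {}
    for entry in sorted(log, key=lambda e: e.decided_on):
        latest[entry.doc_id] = entry  # later decisions overwrite earlier ones
    return {
        "in_index": sorted(d.doc_id for d in latest.values() if d.included),
        "excluded": sorted(d.doc_id for d in latest.values() if not d.included),
        "last_refreshed": max((e.decided_on for e in log), default=None),
        "deciders": sorted({d.decided_by for d in latest.values()}),
    }


# Invented sample log: two documents loaded, one later pulled.
log = [
    IndexDecision("tax-memo-2017", True, "rubric-owner", date(2024, 1, 10), "initial load"),
    IndexDecision("tax-memo-2023", True, "rubric-owner", date(2024, 1, 10), "initial load"),
    IndexDecision("tax-memo-2017", False, "rubric-owner", date(2024, 4, 2), "superseded"),
]
report = audit_summary(log)
```

Because entries are never edited, the log also preserves the history of each decision, not just the current state of the index.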

These four moves compound. Curation without metadata is a list. Metadata without governance is a snapshot that decays. Governance without audit is a story you can't defend. Audit without curation is documentation of nothing.

4. What it isn't

Three things context engineering gets confused with, and shouldn't be:

Prompt engineering. A "current as of" instruction in a system prompt cannot cure a retrieval surface that has no notion of currency. The surface doesn't know which documents are current, so the instruction has nothing to act on. Prompts work on top of context; they don't compensate for missing context.

Better models. The same model can drift when it runs through a stock plugin and stay disciplined when it runs against a curated RAG corpus. The variable is not the model. Practitioners who upgrade models hoping for fewer failures are usually solving the wrong problem.

Vendor selection. A vendor can host the infrastructure — the index, the embeddings, the retrieval, the orchestration. The discipline must stay inside your organization. If the vendor does not expose enough surface for you to apply curation, metadata, governance, and audit yourself, that is the answer to the procurement question.

5. The SMB lens

Context engineering scales differently for small businesses than for enterprises.

Enterprises throw teams at it: data governance offices, AI committees, RACI matrices for index changes, weekly reviews. The infrastructure is real; the cost is overhead. SMBs do not have those teams — and do not need them. What an SMB needs is a working version of the same discipline at one-tenth the surface area.

A practical starter set for a 25–200 person organization:

A one-page eligibility rubric for the index, written by the team that uses the AI. Not by a committee.
A default-current approach to metadata: the newest version of a document supersedes older versions unless someone explicitly records otherwise.
A single owner for the rubric — the person who can speak to what's in the index when a client asks.
A quarterly re-audit cadence, with a weekly review for high-velocity content categories.
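The default-current rule in the starter set can be sketched in a few lines. This example assumes each document version carries a `family` key that groups versions of the same document, and that explicit exceptions are kept in a `pinned` mapping; both names, and the sample data, are hypothetical.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    family: str      # hypothetical key grouping versions of one document
    doc_id: str
    effective: date


def resolve_current(versions: List[Version],
                    pinned: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each family to its current doc_id: newest wins unless pinned."""
    newest: Dict[str, Version] = {}
    for v in versions:
        best = newest.get(v.family)
        if best is None or v.effective > best.effective:
            newest[v.family] = v
    resolved = {family: v.doc_id for family, v in newest.items()}
    for family, doc_id in (pinned or {}).items():
        if family in resolved:
            resolved[family] = doc_id  # explicit override beats the default
    return resolved


# Invented sample data: a newer brand guide exists but is still a draft.
versions = [
    Version("pricing-sheet", "pricing-2022", date(2022, 1, 1)),
    Version("pricing-sheet", "pricing-2024", date(2024, 1, 1)),
    Version("brand-guide", "brand-v1", date(2021, 5, 1)),
    Version("brand-guide", "brand-v2-draft", date(2024, 3, 1)),
]
current = resolve_current(versions, pinned={"brand-guide": "brand-v1"})
```

The pin is the "explicitly recorded otherwise" case: the newer brand guide is a draft, so the rubric owner keeps the older version current until it is approved.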

The SMB advantage is that you can change the rubric on Tuesday and have it adopted by Friday. The cost of a flawed rubric is small; the cost of no rubric compounds. Use the speed.

6. Failure modes to recognize

The same handful of failure modes show up across domains. Recognizing the symptom is the first step toward applying the discipline.

Drift. The AI cites stale or superseded information because the retrieval surface has no notion of currency. Symptom: a partner finds a 2017 memo cited as authoritative; a customer-success manager finds a renewal email referencing pricing from two years ago. Fix: metadata for currency, plus a supersession record.

Patchwork. Post-hoc filters and wrappers compensate for what is missing inside the system. Symptom: a "currently effective" tag kept in a spreadsheet alongside the AI; an output linter scanning for forbidden phrases. Fix: move the governance into the retrieval surface, not next to it.

Lock-in. The discipline lives inside a vendor's system, not yours. Symptom: you can't audit the index, can't tell when training data was last refreshed, can't articulate to a client how their data is protected without saying "our vendor protects it." Fix: stay vendor-agnostic on the discipline; use vendors for infrastructure.

Voice drift. The AI sounds vaguely off because the corpus is undifferentiated. Symptom: drafts that pass a style-guide check but read like a different company. Fix: format / audience / voice tagging on the corpus.
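As a rough illustration of the tagging fix, the sketch below selects drafting exemplars only from documents whose tags match the task at hand. The tag names (`audience`, `format`, `voice`) and their values are assumptions made for this example.

```python
from typing import Dict, List

Doc = Dict[str, str]


def select_exemplars(corpus: List[Doc], *, audience: str, doc_format: str,
                     voice: str = "approved") -> List[str]:
    """Return ids of documents whose tags match audience, format, and voice."""
    return [
        d["doc_id"] for d in corpus
        if d.get("audience") == audience
        and d.get("format") == doc_format
        and d.get("voice") == voice
    ]


# Invented corpus: the old campaign is tagged so it stops acting as a model.
corpus = [
    {"doc_id": "renewal-2024", "audience": "customer", "format": "email", "voice": "approved"},
    {"doc_id": "campaign-2022", "audience": "customer", "format": "email", "voice": "legacy"},
    {"doc_id": "board-update", "audience": "internal", "format": "memo", "voice": "approved"},
]
exemplars = select_exemplars(corpus, audience="customer", doc_format="email")
```

Untagged documents match nothing here, which makes the gaps in tagging visible instead of letting them blend into the results.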

Hedge fatigue. Maintaining two grounding stories — vendor for one practice area, internal for another — until the team can't say which is the standard. Symptom: clients hear different answers depending on who they ask. Fix: hard exit decision on pilots; one defensible answer per question.

7. Where to start

If you are starting from scratch:

Inventory the documents the AI will be allowed to see. Do not index everything; most of "everything" is wrong.
Write a one-page eligibility rubric with the team that will use the AI. The rubric is the artifact that survives the project.
Add metadata for the three things that always matter: currency, authorship, supersession. Other tags can come later.
Pick one owner for the rubric. They make the calls; everyone else proposes.
Set a cadence: a quarterly re-audit, with a weekly review for high-velocity content categories.
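The cadence in the last step can be made checkable with a small helper that lists which documents are due for review. This is a sketch under assumptions: "quarterly" is approximated as 91 days, and the set of high-velocity categories is hypothetical and would come from the rubric.

```python
from datetime import date, timedelta
from typing import Dict, List

QUARTERLY = timedelta(days=91)   # assumption: a quarter approximated as 91 days
WEEKLY = timedelta(days=7)
HIGH_VELOCITY = {"pricing", "promotions"}  # hypothetical categories


def due_for_review(last_reviewed: Dict[str, date],
                   categories: Dict[str, str],
                   today: date) -> List[str]:
    """Return ids whose review interval has elapsed as of `today`."""
    due = []
    for doc_id, reviewed in last_reviewed.items():
        fast = categories.get(doc_id) in HIGH_VELOCITY
        interval = WEEKLY if fast else QUARTERLY
        if today - reviewed >= interval:
            due.append(doc_id)
    return sorted(due)


# Invented sample data.
last_reviewed = {
    "pricing-2024": date(2024, 5, 1),
    "brand-v1": date(2024, 5, 1),
    "tax-memo-2023": date(2024, 1, 15),
}
categories = {"pricing-2024": "pricing", "brand-v1": "brand", "tax-memo-2023": "tax"}
due = due_for_review(last_reviewed, categories, today=date(2024, 5, 10))
```

Passing `today` in as an argument, instead of reading the clock inside the function, keeps the check reproducible when someone later asks why a document was or was not flagged.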

If you are inheriting a system:

Audit what is in the index. Do not fix anything yet — just see what is there.
Trace one drift incident. The root cause is almost always missing metadata or missing governance, not a model issue.
Replace the post-hoc patches with surface-level governance. Cheaper to fix once than to maintain forever.
Establish ownership and cadence before adding new documents.
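For the first step, a read-only pass over the existing index is often enough to show where the gaps are. The sketch below counts how many documents lack each of the three core metadata fields. It assumes the index can be exported as a list of dictionaries, and the field names are hypothetical. A key that is present with the value `None` is treated as recorded (for example, "checked, not superseded"), while an absent key is treated as missing.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field names for currency, authorship, and supersession.
REQUIRED_FIELDS = ("effective_date", "author", "superseded_by")


def metadata_gaps(index: List[Dict[str, object]]) -> Dict[str, int]:
    """Count documents missing each required field, without changing anything."""
    gaps: Counter = Counter()
    for doc in index:
        for field in REQUIRED_FIELDS:
            if field not in doc:
                gaps[field] += 1
    return dict(gaps)


# Invented sample export of an inherited index.
index = [
    {"doc_id": "a", "author": "x", "effective_date": "2017-03-01"},
    {"doc_id": "b"},
    {"doc_id": "c", "author": "y", "effective_date": "2023-06-15", "superseded_by": None},
]
gaps = metadata_gaps(index)
```

A count like this gives the drift trace in the second step a starting point: the fields with the most gaps are the likeliest root causes.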

The order matters. Inventory before metadata. Metadata before governance. Governance before audit. Audit before scale.

8. Why this is the moat

Tools change. Models change. Vendors change. The discipline of context engineering is what doesn't change — and it is what your team can keep developing as the surface area shifts beneath it. Organizations that build the discipline early can adopt new infrastructure without losing trust in their AI; organizations that skip the discipline find themselves re-litigating the same failure modes every time the stack changes.

The agents are only as good as the context you give them. The context is the work.

Last verified: 2026-05-08

