Context Engineering: Why Prompt Engineering Is Dead (And What Replaced It for Knowledge Workers)

The Tweet That Killed Prompt Engineering

On June 19, 2025, Tobi Lütke, the CEO of Shopify, posted on X that he preferred the term "context engineering" over "prompt engineering." He described it as "the art of providing all the context for the task to be plausibly solvable by the LLM." Six days later, Andrej Karpathy, one of the most respected voices in AI, amplified the term. His definition was sharper: "context engineering is the delicate art and science of filling the context window with just the right information for the next step." (Karpathy, 2025)

The phrase itself wasn't new. Walden Yan at Cognition, the team behind the autonomous coding agent Devin, had been writing about it earlier in the year. But June 2025 was when the label went mainstream. By mid-2025, Gartner had baked it into its analyst briefings with a simple line: "context engineering is in, prompt engineering is out." (Gartner, 2025)

What happened wasn't a rebrand. It was a correction. The AI community quietly admitted that the skill called "prompt engineering" had always been a subset of something bigger, and that the subset was no longer the interesting part. A prompt is one component. Context is the whole room.

This matters because knowledge workers have spent two years learning the wrong thing. They memorized prompt templates. They collected "ultimate prompt" Twitter threads. They treated the prompt like a spell. That effort is not useless, but it is no longer sufficient. The question isn't how you phrase your request. The question is what you put next to your request.

What Context Engineering Actually Means

Here's the plainest definition: context engineering is the practice of deciding, assembling, and delivering everything an AI model needs to do a task well, before the model runs.

Think of it like briefing a new consultant. A bad brief is a one-line email. A good brief includes the company background, the relevant history, the files they'll need, who the stakeholders are, what success looks like, and what's out of scope. If you hire a brilliant consultant and give them a bad brief, you get a mediocre deliverable. The same is true of AI.

The consultant analogy is Addy Osmani's, from his essay "Context Engineering: Bringing Engineering Discipline to Prompts," which remains one of the cleanest write-ups on the shift. His point is that prompt engineering optimized the one-line email. Context engineering optimizes the entire briefing package.

Practically, this covers a lot of ground. It includes the system prompt (who the model is), the retrieval layer (what documents it can see), persistent memory (what it remembers about you), tool use (what actions it can take), attachments (what files you loaded for this session), and conversation history (what was already said). Every one of these is a lever. Every lever affects output.

The reason this bundle got a new name is that you can't get great results by optimizing only one lever anymore. You have to think about the stack.

Prompt Engineering Wasn't Wrong. It Was Just Incomplete.

It's tempting to treat this as a generational shift where everything old is wrong. That's lazy framing. Prompt engineering techniques still work. Chain-of-thought, few-shot examples, role assignment, explicit output formats: all of it still moves the needle.

What changed is the ceiling. In 2023, a well-phrased prompt could lift a response from muddled to usable because the underlying models were easily confused by ambiguity. You could turn GPT-3.5 from a bumbling intern into a coherent analyst with the right sentence structure. That gap was real, and prompt engineering exploited it.

Frontier models in 2026 don't need the hand-holding. Claude, GPT-5, and Gemini 2.5 understand ambiguous requests reasonably well. The marginal return on phrasing has dropped. But the marginal return on supplying relevant source material, scoped memory, and curated examples has gone up sharply. The leverage moved.

Here's the comparison, laid out.

Dimension	Prompt Engineering	Context Engineering
What you tune	The wording of your request	The entire input stack fed to the model
Primary unit	A sentence	A bundle: system prompt, documents, memory, tools, history
Who it's for	Anyone using a chat box	Anyone whose output quality depends on AI
Skill required	Good writing, pattern recognition	Curation, information architecture, judgment
When it fails	The model misunderstands the instruction	The model understands fine but lacks the facts, examples, or history to answer well
Fix when stuck	Rephrase, add examples, specify output format	Add the right source, trim the wrong sources, adjust memory, scope the retrieval
Peak era	2022 to 2024	2025 onward

Notice the last row. Prompt engineering didn't die because it was wrong. It died because the bottleneck moved somewhere else.

The 6 Layers of Context

To do context engineering deliberately, you have to know what you're engineering. Every modern AI interaction pulls from six layers, whether you think about them or not. The skill is knowing which ones to adjust.

Layer	Purpose	Example
System prompt	Defines who the model is, what rules it follows, what tone it takes	A `CLAUDE.md` file in your repo, Cursor's `.cursorrules`, or a custom GPT instruction like "You are a senior editor. Prefer active voice. Never use em-dashes."
Persistent memory	Things the model remembers about you across conversations	ChatGPT's memory feature storing your profession, writing style, and ongoing projects
Retrieval (RAG)	Pulls relevant chunks from a larger knowledge base on demand	Asking your AI "what did I highlight about network effects last month?" and it fetches the exact passages
Tool use	Lets the model take actions or fetch live data	The model calls a calculator, runs code, searches the web, or queries your calendar
Attachments	Files, images, or URLs loaded into this specific session	A PDF contract you drop in to get reviewed, or a screenshot you paste to debug
Conversation history	What's already been said in this thread	The back-and-forth above your current message, including earlier corrections and preferences

A well-engineered context uses all six deliberately. A poorly engineered context dumps everything into one layer (usually attachments or the conversation history) and hopes the model sorts it out.
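One way to check whether a context leans on a single layer is to measure each layer's share of the total. The sketch below is illustrative only: `ContextBundle`, `layer_shares`, `dominant_layer`, and the 0.8 threshold are assumed names and values, and character counts stand in for tokens.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """One field per layer from the table above (all names are hypothetical)."""
    system_prompt: str = ""
    memory: list[str] = field(default_factory=list)
    retrieved: list[str] = field(default_factory=list)
    tool_results: list[str] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)


def layer_shares(bundle: ContextBundle) -> dict[str, float]:
    """Return each layer's share of total characters (a crude proxy for tokens)."""
    sizes = {
        "system_prompt": len(bundle.system_prompt),
        "memory": sum(len(item) for item in bundle.memory),
        "retrieved": sum(len(item) for item in bundle.retrieved),
        "tool_results": sum(len(item) for item in bundle.tool_results),
        "attachments": sum(len(item) for item in bundle.attachments),
        "history": sum(len(item) for item in bundle.history),
    }
    total = sum(sizes.values())
    if total == 0:
        return {name: 0.0 for name in sizes}
    return {name: size / total for name, size in sizes.items()}


def dominant_layer(bundle: ContextBundle, threshold: float = 0.8):
    """Name the layer holding more than `threshold` of the context, or None."""
    shares = layer_shares(bundle)
    name, share = max(shares.items(), key=lambda pair: pair[1])
    return name if share > threshold else None
```

If `dominant_layer` returns `"attachments"` or `"history"`, that is a hint the context was dumped rather than assembled. The threshold is arbitrary and worth tuning to your own tasks.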

The mistake most knowledge workers make is treating AI as a chat interface when it's actually a context assembler. The chat is the tip. The iceberg is what you feed into it before you type.

For a related angle on how personal information architecture shapes AI usefulness, see Personal Context Management: The Missing Layer Between You and AI.

Why Bigger Context Windows Made This Worse, Not Better

In 2023, a 100K-token context window was exotic. By 2026, 1M-token windows are common. You can drop the full text of War and Peace into a single prompt. So the natural assumption is that context engineering is getting easier. More room, less triage, right?

Wrong. It got harder.

The foundational paper here is Liu et al. (2024), "Lost in the Middle: How Language Models Use Long Contexts," published in TACL. The researchers tested whether models could find and use specific information depending on where it was placed in a long context. The finding was uncomfortable: performance is U-shaped. Models pay the most attention to information at the very beginning and the very end of the context. Information in the middle gets systematically underweighted, sometimes ignored entirely. (Liu et al., 2024)

Put a critical instruction in the middle of a 50-page document and the model may act as if it never saw it. That's not a bug you can prompt your way out of.
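A common response to the U-shaped pattern is to reorder material so the strongest passages sit at the two ends. The sketch below is a heuristic under stated assumptions, not a method taken from Liu et al.: `place_at_edges` is a hypothetical helper, and it assumes you have already ranked your passages from most to least relevant.

```python
def place_at_edges(ranked: list[str]) -> list[str]:
    """Reorder passages ranked best-first so the strongest sit at the two ends.

    Items at even positions (ranks 1, 3, 5, ...) fill the front in order.
    Items at odd positions (ranks 2, 4, ...) fill the back in reverse.
    The weakest passages therefore land in the middle.
    """
    front = ranked[0::2]
    back = ranked[1::2]
    return front + back[::-1]


# Usage: "A" is the most relevant passage, "E" the least.
ordered = place_at_edges(["A", "B", "C", "D", "E"])
print(ordered)  # ['A', 'C', 'E', 'D', 'B']
```

The best passage opens the context, the second best closes it, and the least relevant one sits in the middle, where an underweighted read costs the least.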

Then, in 2025, Chroma published "Context Rot: How Increasing Input Tokens Impacts LLM Performance." They tested 18 frontier models, including GPT-4.1, Claude Opus 4, and Gemini 2.5. The result was consistent across every model: performance degraded as input grew, well before the context window was anywhere near full. A 200K-token window could exhibit serious rot by 50K tokens. The model technically "saw" everything. It acted as if it hadn't.

This is why more context isn't better context. It's why dumping your entire Google Drive into a prompt doesn't work, even when the window allows it. The engineering discipline is knowing what to exclude, not just what to include.
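Deciding what to exclude can be made concrete as a budget: score your passages, then keep only the best ones that fit. The sketch below is illustrative. `estimate_tokens` and `select_within_budget` are hypothetical helpers, the four-characters-per-token ratio is a rough assumption rather than a real tokenizer, and the scores are whatever relevance judgment you supply.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def select_within_budget(scored: list[tuple[float, str]], budget: int) -> list[str]:
    """Keep the highest-scored passages whose estimated tokens fit in `budget`."""
    kept: list[str] = []
    used = 0
    for _score, passage in sorted(scored, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(passage)
        if used + cost > budget:
            continue  # this passage does not fit; a shorter one later still might
        kept.append(passage)
        used += cost
    return kept
```

Setting the budget well below the window size is the point of the exercise: the window's limit says how much the model can accept, not how much it can use well.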

This is the hidden cost of the 1M-token era. The window grew faster than the models' ability to use it. And it turned "what should I leave out?" into the most valuable question in the stack.

The Skill Nobody Named: Curation

If context rot is the problem, curation is the solution. And curation happens to be a skill most knowledge workers already practice, without calling it that.

Every time you highlight a passage in an article, you're curating. You're saying: this matters. The rest is background. When you annotate a PDF, bookmark a paper, or save a quote, you're doing the same thing. You're building a signal-to-noise filter over a world full of text.

The problem until recently was that this curation was trapped. Your highlights lived in one app. Your Kindle notes lived in another. Your web research lived in your browser history. When you sat down to brief an AI, you couldn't actually pull any of it into the context window efficiently. You ended up re-reading everything or, worse, pasting in raw sources and hoping for the best.

Context engineering as a discipline has a huge gap exactly here. Companies solved it by building internal knowledge bases and RAG pipelines. But individual knowledge workers don't have an engineering team. They have the same problem (too much source material, not enough signal) and none of the infrastructure.

This is why reading tools that capture highlights durably have quietly become AI infrastructure. Glasp's web highlighter exists to solve exactly this: it turns your reading into structured, retrievable context. When you highlight a paragraph in a blog post, that highlight becomes a piece of context you can hand to any AI later, filtered by topic, by source, by date.

The same principle applies to long-form reading. Your Kindle highlights are arguably the highest-quality signal you've ever generated about what matters to you. You paid attention long enough to highlight them. That's a costly filter, and it's wasted if the highlights sit in a closed system.

For a broader treatment of why curated reading outperforms dumped documents, see The Hidden Cost of Information Overload: Why Your Brain Needs a Second Layer.

Context Engineering for Individuals (Not Just Engineers)

Most writing on context engineering targets developers. It's about building production AI systems: how to shape a system prompt for a coding agent, how to chunk documents for retrieval, how to wire up tool calls. That's useful if you ship software. It's less useful if you're a consultant, researcher, writer, analyst, or student trying to get better AI output.

But the same discipline applies. You just run it by hand.

You design system prompts, informally. Every custom GPT, every Claude Project, every CLAUDE.md-style instruction file you set up is a system prompt. When you write "you are my research assistant, I work on renewable energy policy, prefer skeptical summaries," you're doing system prompt design. Do it deliberately.

You manage memory. ChatGPT's memory feature and Claude's projects both let you pin facts that persist across conversations. Most people either ignore this (and lose continuity) or dump everything into it (and create noise). The right move is to curate memory like you'd curate a resume: only the things you want the model to use every time.

You do retrieval, manually. Pasting the right article into a chat is manual RAG. The question is where "the right article" comes from. If it comes from frantically scrolling your browser history, you have no retrieval system. If it comes from a library of passages you've already flagged as interesting, you have one.

You load attachments intentionally. The temptation is to upload the whole book. The better move is to upload the 40 pages you actually highlighted. You're bypassing context rot by doing the filtering upstream.

You manage conversation history. Long threads get worse over time because stale messages keep taking up room in the context and pull the model's attention away from the current task. Starting a fresh thread for a new subtask, with a clean brief, often outperforms continuing the mega-thread.

None of this requires engineering skill. It requires the same skill good researchers and good journalists already have: knowing what to include, what to cut, and what to pull from where.

Your Highlights Are Your Competitive Context

Here's the part that's underrated.

Most people treat their notes and highlights as memory aids. Things to go back to someday. That framing made sense in 2010, when going back to them was the only way to use them. It's obsolete in 2026.

Your highlights are now a feed that can be handed to AI. Every passage you've flagged, every quote you've saved, every annotation you've made is a piece of context. And because you generated it by paying attention, it's higher-signal than anything scraped at random from the web.

Think about what this means competitively. Two knowledge workers use the same AI model. One has three years of structured reading and highlighting. The other has three years of browser tabs they never revisited. When they ask the AI the same question, the first person can feed it their own curated corpus. The second person is stuck with the model's generic training data and whatever they can remember to paste in. The gap is not a prompting gap. It's a context gap.

This is why Glasp has been shifting in how it positions itself. The original pitch was a social web highlighter: highlight things, see what others highlighted, build a reader identity. All still true. But the deeper value now is that every highlight is a context token waiting to be used. Your reading history compounds into a personal RAG corpus, one paragraph at a time.

When you pair this with Glasp's AI chat, the workflow becomes closer to what engineers build for their companies. You highlight as you read. Later, you ask questions and the AI pulls from what you actually cared about, not from a generic web index. That's context engineering, except the context is your own library.
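The retrieval step in that workflow can be sketched in a few lines. This is a toy illustration, not how Glasp or any production system works: `Highlight`, `words`, and `retrieve` are hypothetical names, and plain word overlap stands in for the embedding-based search that real retrieval pipelines typically use.

```python
import re
from dataclasses import dataclass


@dataclass
class Highlight:
    text: str
    source: str


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, highlights: list[Highlight], k: int = 3) -> list[Highlight]:
    """Rank highlights by how many question words they share; drop zero matches."""
    query = words(question)
    scored = [(len(query & words(h.text)), h) for h in highlights]
    matches = [(score, h) for score, h in scored if score > 0]
    # list.sort is stable, so ties keep their original (for example, chronological) order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in matches[:k]]
```

Even this crude version shows the design choice that matters: the search space is only what you highlighted, so everything it returns has already passed your own filter.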

For more on how this flips the reading-AI relationship, see The AI Reading Assistant That Doesn't Do the Reading for You.

A Simple Framework to Engineer Context for Any AI Task

Enough theory. Here's a concrete workflow you can run the next time you open a chat.

Step 1: Define the job before you type. One sentence. What does done look like? "Draft a 500-word memo summarizing the three main arguments against a four-day workweek, written for a skeptical COO." That's a job. "Help me with this article" is not.

Step 2: Gather your sources, then cut them. Pull the materials that actually bear on the task. If you have highlights on the topic, start there, not with the full articles. If you have memory set up, check whether it already contains useful background. Leave out anything that's only tangentially related. Context rot is real.

Step 3: Set the role and rules. Before the task, tell the model who it is and what rules apply. "You're editing for a skeptical COO. No jargon. No hedging. Numbers before adjectives." This is the system prompt layer. It takes ten seconds and changes the tone of everything that follows.

Step 4: Feed the task plus the bundle, in order. Put the most important context first and the task last. Because of the Lost in the Middle effect, you want the sharpest material at the beginning, the instruction at the end, and only supporting material in between. The middle is a swamp.

Step 5: Iterate on context, not phrasing. If the output is bad, resist the urge to rewrite your prompt twelve ways. Ask instead: did I give it the right material? Was there a passage I forgot? Was there a source that was misleading? Adjust the inputs, re-run, and watch the quality jump.

Do this a few dozen times and it becomes reflexive. You'll stop asking "how do I prompt this?" and start asking "what does the model need to see before it answers?" That shift is the whole discipline.
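The five steps can be sketched as a small assembly function. It is a minimal illustration under stated assumptions: `build_prompt` and the "Rules", "Source", and "Task" labels are hypothetical, and it expects `sources` to arrive already cut and ordered by you (Steps 2 and 4).

```python
def build_prompt(job: str, rules: list[str], sources: list[str]) -> str:
    """Assemble rules first, sources in the given order, and the task last."""
    if not job.strip():
        raise ValueError("Define the job before you type (Step 1).")
    parts: list[str] = []
    if rules:
        # Step 3: the role and rules open the context.
        parts.append("Rules:\n" + "\n".join(f"- {rule}" for rule in rules))
    for number, source in enumerate(sources, start=1):
        # Step 2: only the curated sources, in the caller's order.
        parts.append(f"Source {number}:\n{source}")
    # Step 4: the task closes the context.
    parts.append("Task:\n" + job)
    return "\n\n".join(parts)


prompt = build_prompt(
    job="Draft a 500-word memo summarizing the three main arguments "
        "against a four-day workweek, written for a skeptical COO.",
    rules=["No jargon.", "No hedging.", "Numbers before adjectives."],
    sources=["(your first highlight)", "(your second highlight)"],
)
print(prompt)
```

Step 5 then becomes a matter of changing the `sources` argument and re-running, rather than rewording `job`.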

Frequently Asked Questions

Is prompt engineering actually dead?

The phrase is retiring. The techniques under the phrase still work. Chain-of-thought, few-shot examples, and clear output formats are all still useful. What's dead is the idea that good phrasing alone gets you great output. In 2026, phrasing is a minor lever. Context assembly is the major one. When people say "prompt engineering is dead," this is what they mean.

Do I need to be technical to do context engineering?

No. The engineering metaphor throws some people off, but it just means doing the work deliberately instead of by accident. A consultant preparing a brief, a journalist researching a piece, a student organizing source material for an essay: these are all context engineering in disguise. The core skill is curation and judgment. The technical version is just the same skill applied to system prompts, RAG pipelines, and memory stores.

What's the difference between context engineering and RAG?

RAG (retrieval-augmented generation) is one layer of context engineering, specifically the retrieval layer. It's the machinery that pulls relevant chunks from a knowledge base when needed. Context engineering is the broader discipline that includes RAG, plus system prompts, memory, tool use, attachments, and conversation history. RAG is a technique. Context engineering is the practice.

Won't bigger context windows eventually solve this?

They haven't so far, and the evidence suggests they won't. Liu et al. (2024) showed that models underweight the middle of long contexts. Chroma's 2025 study showed that all 18 frontier models tested degrade well before the window fills. The bottleneck isn't window size. It's attention allocation inside the window. Curation stays valuable even if windows grow another 10x.

How does this relate to AI "memory" features?

Memory (like ChatGPT's persistent memory or Claude's projects) is one layer of context. It's what the model knows about you across sessions. Context engineering includes memory but is broader. Memory is the always-on layer. Retrieval, attachments, and system prompts are the per-task layers. A good context engineer uses all of them together.

What should I stop doing?

Stop hoarding prompt templates. Stop pasting full documents when highlighted passages would do. Stop starting conversations with no system prompt and wondering why the tone is off. Stop treating the chat box as the only surface. The chat box is the final centimeter of a much longer pipeline, and that pipeline is where the quality gains live.

Where do highlights fit into this?

Highlights are the rawest, cheapest form of personal context. Every time you highlight something, you're pre-filtering noise out of your own future AI sessions. Tools that capture highlights durably (across articles, PDFs, Kindle books, and YouTube transcripts) turn your reading into reusable context. That's why reading-capture tools and AI tools are converging.

Isn't this just fancy note-taking?

Partly. The difference is that traditional note-taking is optimized for you rereading your notes. Context engineering is optimized for a model consuming your notes. The format requirements are different (structure, atomicity, retrievability matter more), but the underlying practice of capturing what's worth remembering is the same. Good note-takers have a head start here.

Conclusion: The New Literacy

Every era of computing has had a literacy that separated amateurs from serious users. In the 2000s, it was learning to search Google well. In the 2010s, it was learning to structure information in apps like Notion or Airtable. In 2026, it's learning to engineer context for AI.

The people who figure this out will pull far ahead of the people who don't. Not because they have better access to models (everyone has the same models), but because they show up to every task with better material. They know what to feed in. They know what to leave out. They know where their best source on a topic is, because they bothered to capture it months ago.

This is why curation is quietly becoming the most valuable metaskill of the AI era. Every highlight you save, every passage you annotate, every piece of reading you actually process is a deposit into a personal context engine. The future of AI productivity isn't people with secret prompts. It's people with thoughtful libraries.

You already do the reading. You already have opinions about what matters. The only question is whether any of it sticks around long enough to be useful to your future self, and to the AI working alongside you. The tools exist. The habit is the hard part.

Pick something worth reading today. Highlight the parts that matter. That is context engineering. Everything else is technique.