AI for Long-Form Writing: The 5-Stage Workflow That Beats One-Shot Prompts

Why One-Shot AI Writing Sounds Like AI

Forty percent of work-related messages people send ChatGPT are about writing. That number comes from the OpenAI and NBER joint study released in September 2025, which analyzed a privacy-preserving sample of consumer ChatGPT traffic. The same paper turned up something more interesting. Of those writing messages, roughly two thirds were people modifying text they already had, not asking the model to generate something from a blank page.

That ratio is the quiet truth of how AI writing actually works. Most users have already learned, through painful trial and error, that asking a chatbot to "write a 2,000-word essay on X" produces something that reads like a 2,000-word essay on X. Generic. Bloated. Confidently wrong about the small things. The drafts are always grammatically clean and almost always forgettable.

The reason is structural, not magical. Long-form writing is not one task. It is at least five tasks: figuring out what you actually want to say, deciding the order to say it, finding the voice to say it in, stress-testing whether it holds up, and finishing the prose. When you mash all five into a single prompt, the model averages everything. Average audience. Average argument. Average sentences. Average voice. The output is the median of every essay on the topic that ever ended up in training data.

This is not a prompt engineering problem you can solve with a longer prompt. It is a workflow problem. The fix is to break the work back into the stages it always was, and use the AI for what it is genuinely good at inside each stage. That is the workflow this article describes. The names are mine. The pattern, once you see it, will look obvious. Most good methods do.

The 5-Stage Workflow Overview

Here is the whole workflow on one page. Five stages, each with a narrow scope and a clear handoff. Bold on first use because we will treat these as the proprietary terms of the method.

Brief → Skeleton → Voice → Pressure-Test → Polish

Stage	AI Role	Human Role	Output
1. Brief	Interviewer, asking clarifying questions	Decide audience, argument, success criteria	One-page brief document
2. Skeleton	Generator of contrasting outlines	Pick the structure that fits your argument	Headed outline with section beats
3. Voice	Style analyst extracting rules from samples	Provide 3 to 5 samples of your best work	A list of voice rules in plain language
4. Pressure-Test	Hostile editor and skeptic	Decide which critiques to act on	Marked-up draft with weak spots flagged
5. Polish	Pattern flagger only, no rewriting	Make every micro-edit by hand	Finished piece that still sounds like you

Two things to notice about this table. First, the AI role changes at every stage. It is not the same tool five times. It is a different collaborator each pass. Second, the human role gets larger toward the end, not smaller. The Brief stage is mostly about deciding things. The Polish stage is entirely you. The shape of the work is an inverse pyramid where AI does more at the start and you do more at the finish.

This is the opposite of how most people use AI for writing, which is to do nothing for an hour, type a long prompt, then spend twenty minutes lightly editing whatever comes back. That order has it backwards. The most expensive thinking belongs at the front, where it is cheap to redo, not at the end, where you have committed to a draft that was wrong from the second sentence.

Stage 1: Brief, the Context You Refuse to Skip

The Brief is what almost nobody writes and what almost everyone needs. It is one page, written by you, that the model reads before any prose generation happens. Without it, every later stage is guessing.

A working Brief has six fields. Audience, in one sentence with enough texture that it is not "everyone." Core argument, in one sentence, the version you would say out loud. Success criteria, what would make this piece worth publishing. Banned phrases, the AI tells and tired metaphors you do not want to see. Voice references, three to five existing pieces (yours, or writers you admire, or both). And constraints, length and format and any non-negotiables.

Here is the template I use. It is plain, which is the point.

# Brief: [working title]

## Audience
One sentence describing who is reading this. Include their existing
knowledge level and what they came looking for.

## Core argument
The single sentence the entire piece exists to make. If you cannot
write it in one sentence, the piece is not ready.

## Success criteria
- What does a reader do, share, or believe differently?
- What would make this piece worth their 14 minutes?

## Banned phrases
- "in today's fast-paced world"
- "let's dive in"
- "game-changer"
- (add the AI tells you personally hate)

## Voice references
- [Link to one of your own pieces]
- [Link to a piece by a writer you admire]
- [One more, ideally in a different register]

## Constraints
- Length: 2,500 words
- Tone: opinionated but not snide
- Must include: one table, three concrete examples

Notice that the Brief is not a prompt. It is a context document. This is the same idea I argued for at length in Context Engineering: The Skill That Replaces Prompt Engineering. Briefs are context-engineering assets. They sit upstream of every prompt the rest of the way.

The Brief stage is also where you decide if AI should help at all. Some pieces, the ones that come from a place that is genuinely yours and not yet articulated, are worse for any model involvement before you have a draft. The Brief is how you find out which kind of piece this is. If you cannot write the core argument in one sentence, no model will figure it out for you.

If you use Glasp's web highlighter the way I do, the Brief stage is also where your saved highlights become source material. Pull five highlights that touch the topic, paste them into the Brief, and you have evidence and quotes ready to feed every later stage.

Stage 2: Skeleton, Working Backwards from the Conclusion

Once the Brief exists, the Skeleton stage is fast and cheap. The job is not to write prose. The job is to produce three to five outlines that argue the same point in different shapes, then pick the one that matches what you actually want to say.

The reason this beats writing prose immediately is structural. Outlines are cheap to throw away. Drafts are not. If you write 800 words before realizing the structure is wrong, you will probably keep the 800 words anyway because you wrote them. That is sunk cost dressed up as commitment. Outlines do not trigger that bias because there is nothing to lose.

The prompt I use at this stage is short.

You are an outline generator, not a writer. Read the Brief below.
Then produce three contrasting outlines for this piece. Each outline
should make the same core argument but use a different structural
strategy:

1. Chronological / narrative
2. Claim then evidence
3. Problem then mechanism then implication

For each outline, give me:
- Section headings (4 to 6 sections)
- One sentence describing the beat of each section
- A note on which audience this structure serves best

Do not write any prose. Outlines only.

[paste Brief here]

What you get back is three skeletons. Read them with the Brief open. The right one is usually obvious within thirty seconds. Sometimes the right outline is a hybrid of two of them, and the model is a useful sounding board for that synthesis. Sometimes none of the three are right, which is itself information. It usually means the Brief was vague.

This is also the stage where I find ChatGPT slightly outperforms Claude on raw structural variety. Claude tends to give three outlines that feel like cousins. GPT gives three that feel like strangers. For the Skeleton stage, strangers are useful. The full reasoning on which model fits which task is in The AI Task and Model Matrix.

Stage 3: Voice, Why "Write in My Style" Doesn't Work

This is the stage that makes or breaks whether the finished piece sounds like a human wrote it. Most people use the wrong prompt here. The wrong prompt is "write this in my style," because the model has no idea what your style is, and even if you have written hundreds of pieces in its training set, what it knows about your style is averaged with the styles of every adjacent writer it learned alongside you.

The fix is a two-step. First, have the model extract concrete style rules from samples you choose. Second, have it write to those extracted rules, not to "your voice."

Here is the meta-prompt that does the extraction.

You are a style analyst, not a critic. I'm going to paste three pieces
of writing below. Read all three carefully and produce a style profile
of the author's voice as a list of concrete, falsifiable rules.

For each rule:
- State it specifically (not "uses short sentences" but
  "60% of sentences are under 18 words")
- Give one example from the samples
- Note when the author breaks the rule (every voice has exceptions)

Cover at least:
- Sentence length distribution
- Paragraph length and rhythm
- Word choice patterns (do they prefer concrete or abstract nouns?)
- Verbs (active or passive, strong or weak?)
- Use of contractions, sentence fragments, lists
- Opening and closing patterns
- Words and phrases the author avoids

Do not interpret the content. Only describe the style.

[paste sample 1]
---
[paste sample 2]
---
[paste sample 3]

What comes back is a list of fifteen to twenty rules. Some will be wrong. Some will be obvious. A few will be things you did not know you did. Read the list, delete the rules that are wrong, sharpen the ones that are vague, and you now have a style document you can hand to any later prompt as constraint.

This works because the model is much better at describing patterns than at generating from a felt sense it does not have. When you ask for "your voice," you are asking for a felt sense. When you ask it to write to fifteen specific rules, you are asking for a pattern match. Pattern matching is what these systems do well.

The sample selection matters. Use three to five pieces of your best writing on related topics, not a random grab bag. If the new piece is opinionated, do not feed it your neutral how-to guides. The voice profile averages whatever you give it.

Stage 4: Pressure-Test, Make AI the Skeptic

By the end of Stage 3 you have a draft. It might be the model's draft following your voice rules, or your own draft after using the voice rules as a self-edit checklist. Either way, the draft now needs to survive a hostile reader. This is what AI is unreasonably good at if you ask correctly.

The default failure mode is an agreeable model. Out of the box, both Claude and ChatGPT will tell you your draft is great with a few minor suggestions. They are RLHF-tuned to be helpful, and saying "this argument has a hole you could drive a truck through" is not the path of least friction. You have to instruct them out of agreeableness.

Here are the six prompts I run at this stage. I run them one at a time, in separate threads, because mixing them dilutes each.

1. "What's the strongest counterargument to the central claim in this
   piece? Steelman it. Don't argue back yet, just state the strongest
   version of the opposing view."

2. "You are a hostile editor at a magazine known for cutting copy
   ruthlessly. Mark every sentence that does not earn its place.
   Quote the sentence and explain why it goes."

3. "Where in this piece am I assuming the reader already agrees with me?
   Quote the specific sentences where I'm taking shared ground for
   granted."

4. "What evidence is missing from this piece that a skeptical reader
   would demand? List specific claims that need a citation, a number,
   or an example I haven't provided."

5. "Where am I burying the lede? Specifically: what's the single most
   interesting sentence in this piece, and how far down does it appear?
   Should it be earlier?"

6. "Imagine it's 12 months from now and this article has aged badly.
   What changed about the topic that made the piece wrong? Which
   specific paragraphs are most exposed to that future?"

Run them. You will get a marked-up version of your draft from each prompt. Most of the critiques will be wrong or weak, which is fine. You only need a few of them to land. The hostile editor prompt almost always finds three or four sentences that should die. The "burying the lede" prompt almost always reorders something useful.

If you have a Glasp library going, this stage is also where the AI chat feature over your highlights earns its keep. Asking the chat "what counter-evidence sits in my own highlights against this draft's argument" is a different question than asking a fresh model, and a more honest one, because the answer comes from sources you already chose to trust.

For more prompt patterns in this family, see Prompt Patterns for Thinking.

Stage 5: Polish, the Final Pass That AI Should Not Do

This is the stage where most workflows ruin everything they built. The temptation is real. You have a near-finished draft. The model is right there. One more pass to clean it up, smooth the rough edges, fix the awkward sentences. It will take three minutes.

Do not do it.

The reason is the same reason "write in my style" does not work. A polish pass is the most voice-sensitive operation in writing. It is where rhythm, word choice, and the small idiosyncrasies that make prose sound like a person live. When you ask a model to polish, it averages those micro-decisions out. You get back a draft that is technically smoother and feels less like you. The reader will not be able to name what changed. They will only feel that something is off.

What AI should do at the Polish stage is flag candidates, not edit. Use this prompt.

You are a style auditor, not an editor. Read the draft below. Do not
rewrite anything. Produce a list of:

- Every sentence over 25 words
- Every paragraph that opens with the same word as the previous paragraph
- Every nominalization where a verb would be stronger ("made a decision"
  vs "decided")
- Every weak verb ("there is", "it is", "this is")
- Every adverb that could be cut
- Every metaphor or cliche that feels generic

Quote the offending sentences. Suggest nothing.

You will get a long list. Walk through it sentence by sentence and decide. Most flagged sentences are fine. Some are not. The decision of which is which is your voice. The act of making it, fifty times in a row, is what produces a piece of writing that someone can recognize as yours from the first paragraph.

Verlyn Klinkenborg, in Several Short Sentences About Writing, has a line I think about constantly: "The longer the sentence, the less it means." That is not literally true, but the sensibility is. Long sentences hide. Short sentences commit. The Polish stage is where you commit. A model cannot commit on your behalf because the commitment is what it does not have.

Stephen King says it more bluntly in On Writing: "Kill your darlings, kill your darlings, even when it breaks your egocentric little scribbler's heart, kill your darlings." Use the model to find the darlings. Use yourself to kill them.

Putting the Workflow on a Single Page

Here is the cheat sheet. Print it, paste it above your monitor, refer to it during the next long-form piece you write.

Stage	Time Budget	AI Role	Human Role	Tools
1. Brief	30 to 45 min	Interviewer	Decide audience, argument, banned phrases	Markdown doc, Glasp highlights
2. Skeleton	15 to 30 min	Outline generator	Pick the structural fit	ChatGPT (variety)
3. Voice	30 to 45 min	Style rule extractor	Curate 3 to 5 samples	Claude (extraction quality)
4. Pressure-Test	30 to 45 min	Hostile editor	Decide which critiques to act on	Claude, six separate threads
5. Polish	30 to 60 min	Pattern flagger only	Every micro-edit by hand	You, with coffee

Total time for a 2,500-word piece runs 2 to 4 hours. Compare that to thirty minutes of one-shot prompting, and the math looks bad until you compare the outputs. The one-shot version goes nowhere because nobody finishes reading it. The five-stage version actually gets shared.

A useful rhythm if you write regularly: keep a Glasp collection going for whatever topic you are circling. When the highlights cross some critical mass (roughly five strong ones), open a Brief and walk the workflow. The highlights become evidence in Stage 1, source material in Stage 4, and counterweight when the model gets too agreeable. The pipeline runs on its own once the highlighting habit is in place.

Frequently Asked Questions {#frequently-asked-questions}

Does this workflow work for emails or short pieces?

No. Anything under about 1,500 words does not need five stages. The overhead eats the benefit. For an email or a short post, write the Brief in your head, skip the Skeleton, and go straight to drafting. The workflow is built for pieces where structural decisions matter more than sentence-level decisions, and short pieces are the opposite.

Which AI should I use at which stage?

Honest call after using both extensively. Claude tends to be stronger at Brief, Voice extraction, and Pressure-Test, mostly because it follows long structured instructions more reliably and is less eager to please at the Pressure-Test stage. ChatGPT tends to be stronger at Skeleton because it produces more genuinely varied outline structures. Either works for Polish flagging since the prompt is mechanical. Avoid Perplexity for any of these stages. It is a research tool, not a writing partner. The longer answer is in The AI Task and Model Matrix.

Will AI-detection tools flag the output?

If you actually do Stage 5 by hand, the personal patterns survive and detection tools have less to grab onto. The Princeton GEO paper from KDD 2024 (Aggarwal et al.) studied how language models cite and reproduce style. The takeaway relevant here is that voice is closer to a fingerprint than a recipe. Models trained on aggregate text struggle to fake a specific writer's micro-decisions, which is exactly why the Polish stage matters. That said, if your writing is high stakes (legal, academic, journalistic), no workflow guarantees you past detection. Use AI as scaffolding, not as the writer.

How long does this workflow really take?

Two to four hours for a 2,500-word post. Compare to thirty minutes of one-shot prompting plus the time you would spend rewriting the bad draft you got, which is usually another hour, and the gap shrinks. The trade is not really speed. The trade is whether the finished piece is worth publishing.

Can I skip the Brief stage if I'm in a hurry?

No. Skipping Brief is the single most reliable way to produce a draft that reads like AI. Every other stage depends on the Brief being clear. Without it, the Skeleton stage gives you outlines for a piece that is not the piece you wanted. The Voice stage extracts rules that get applied to the wrong content. The Pressure-Test stage critiques the wrong argument. If you have ten minutes total, spend nine on the Brief and one on a single skeleton, and you will end up further along than if you spent ten minutes prompting from scratch.

Conclusion {#conclusion}

The reason one-shot AI writing fails is not that the models are bad. The reason is that long-form writing is five jobs, and asking any model to do five jobs in a single shot produces the average of all five. Brief, Skeleton, Voice, Pressure-Test, Polish. Each one a narrow scope. Each one a different collaborator. The human role grows toward the end, where voice lives, instead of shrinking.

If you write enough that any of this matters, the workflow pays for itself in the first piece. If you write rarely, save this article and pull it up the next time the topic feels too big for one prompt.

Either way: stop asking the model to write the essay. Start asking it to interview you, outline against you, extract voice from your samples, attack the draft, and flag the patterns. Then write the thing. The model is the room. You are still the writer.