Desirable Difficulties: Why Effortful Learning Outlasts Easy Learning

The Fluency Illusion

Open a textbook chapter you've already read twice. Run your eyes down the page. The sentences feel familiar. Each paragraph clicks into place. You close the book convinced you know the material.

A week later, you can barely retrieve the main argument.

Cognitive psychologists call this the fluency illusion, or the illusion of competence, and it's one of the biggest reasons so much study time is wasted. When information is processed smoothly, your brain reads the smoothness as evidence of mastery. It isn't. Smoothness is just smoothness.

The data is brutal. Dunlosky and colleagues' 2013 review in Psychological Science in the Public Interest rated ten common study techniques by the strength of their evidence. Highlighting and re-reading, two of the most popular study methods, landed in the lowest ("low utility") tier. Practice testing and distributed practice were the only two rated "high utility." The methods learners love produce weak learning. The methods learners avoid produce strong learning.

In a classic massed-vs-spaced study, learners who studied a list four times in one session felt more confident than those who spread the same time across four sessions. On an immediate test, the massed group did slightly better. A week later they were destroyed: the spacers remembered roughly twice as much. Re-reading shows the same pattern. Callender and McDaniel (2009) found that re-reading textbook chapters produced little or no benefit on comprehension tests. Familiarity isn't memory. Recognition isn't recall.

The illusion is structural. Your brain uses ease of processing as a heuristic for "I know this." The heuristic just happens to be wrong for predicting future retrieval. Learning to distrust your sense of mastery is the first step toward studying things that stay studied.

What Bjork Actually Discovered

Robert A. Bjork and Elizabeth Ligon Bjork have spent roughly forty years untangling why this happens. Their 1992 paper "A New Theory of Disuse and an Old Theory of Stimulus Fluctuation" introduced the framework that explains the results above.

The theory splits memory into two separate dimensions.

Storage strength measures how deeply a memory is wired in. It's a function of how much you've engaged with the material across different contexts and over time. Storage strength can only go up.

Retrieval strength measures how easily you can pull that memory up right now. It fluctuates wildly. It rises when you've just studied something, falls without use, and depends heavily on the cues available in the current moment.

The fluency illusion lives in the gap between these two. When you re-read a chapter, retrieval strength shoots up because the material is right in front of you. Storage strength barely moves. The moment retrieval strength fades, there's nothing underneath to support recall.

Robert Bjork's 1994 chapter on memory and metamemory in training gave the field its name. He argued that storage strength grows most when you retrieve information under difficult conditions, not when information is re-presented while it's still easy to access. Difficulty during practice, paradoxically, is what creates lasting learning. Hence: desirable difficulties.

Soderstrom and Bjork's 2015 paper, "Learning versus Performance," sharpened the distinction. Performance is what you can do during practice, today. Learning is the relatively permanent change in your ability to do it later, in a new context. Many popular study habits boost performance at the expense of learning.

This distinction is the core of everything that follows. If your session feels effortless, retrieval strength is doing the work and storage strength is going nowhere. If it feels effortful in a productive way (you're searching, failing, recovering, reaching), storage strength is climbing. The discomfort is the deposit.

This is what ties together the methods Glasp readers already know. Active recall, spaced repetition, the Feynman technique, the protégé effect, and the blurting method aren't five unrelated tricks. They're five different ways of forcing your brain to do the kind of effortful work that builds storage strength. Desirable difficulties is the meta-principle. Everything else is implementation.

Feels Easy but Doesn't Work	Feels Hard but Works
Re-reading the same chapter	Closing the book and writing what you remember
Massed practice (cramming)	Distributed practice over days or weeks
Studying one topic to mastery before moving on	Interleaving multiple topics in a single session
Reading worked examples	Generating answers before checking
Practicing the exact same problem type	Mixing problem types and contexts
Highlighting whole paragraphs	Highlighting sparsely and writing your own marginalia
Recognizing terms in a glossary	Producing definitions from memory

Spacing

The spacing effect is the oldest and most replicated finding in this whole literature. Hermann Ebbinghaus described it in 1885. It still works.

The claim is simple. If you have ten total minutes to study a piece of material, you'll remember more of it on a delayed test by splitting the ten minutes across several sessions than by spending all ten in one block. The total time is identical. The distribution is what matters.

Cepeda et al. (2006) reviewed 317 experiments from 184 articles and found the spacing benefit held across a wide range of study-to-test delays. A rough rule of thumb, consistent with later work by the same group, is to space sessions at about 10-20% of the time you want to retain the material. Want to remember something for a month? Review every 3-6 days. For a year, stretch the intervals out.
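The rule of thumb above is plain arithmetic. Here is a minimal Python sketch of it; the function name `review_gap_days` and the 10%/20% defaults are illustrative, taken straight from the rough rule rather than from any published scheduler.

```python
def review_gap_days(retention_days: float,
                    low: float = 0.10, high: float = 0.20) -> tuple[float, float]:
    """Return the (shortest, longest) suggested gap between reviews, in days.

    Applies the rough rule of thumb from the text: space reviews at about
    10-20% of how long you want to remember the material.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return retention_days * low, retention_days * high


for label, days in [("one month", 30), ("one year", 365)]:
    shortest, longest = review_gap_days(days)
    print(f"{label}: review every {shortest:.0f}-{longest:.0f} days")
```

For a one-month horizon this gives the 3-6 day range quoted above; for a year it gives gaps of roughly five to ten weeks.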

The mechanism connects directly to the storage/retrieval split. When you study something twice in a row, the second exposure happens while retrieval strength is still high, so the practice is essentially free and storage gains are minimal. When you study it again three days later, retrieval strength has decayed. Pulling the memory up takes work. That work is the deposit.
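To make the storage/retrieval argument concrete, here is a toy simulation in Python. It is a hypothetical illustration, not the Bjorks' own formulation: the decay rule (retrieval strength fades exponentially, more slowly when storage is high) and the gain rule (each study session adds `1 - retrieval` to storage) are assumptions chosen only to mirror the idea in the paragraph above.

```python
import math


def simulate(study_days: list[float]) -> tuple[float, float]:
    """Toy storage/retrieval-strength model (illustrative, not Bjork's equations).

    Returns (final storage strength, average retrieval strength at the start
    of each study session). A high average means practice felt easy.
    """
    if not study_days:
        raise ValueError("need at least one study session")
    storage = 0.0
    retrieval = 0.0  # brand-new material: nothing to pull up yet
    last_day = None
    ease = []
    for day in sorted(study_days):
        if last_day is not None:
            # Assumed decay: retrieval fades with time; more storage slows the fade.
            retrieval *= math.exp(-(day - last_day) / (1.0 + storage))
        ease.append(retrieval)
        # Assumed gain rule: the weaker retrieval is right now, the bigger the deposit.
        storage += 1.0 - retrieval
        retrieval = 1.0  # just studied, so fully accessible again
        last_day = day
    return storage, sum(ease) / len(ease)


for name, days in [("massed", [0, 0, 0, 0]), ("spaced", [0, 3, 6, 9])]:
    storage, ease = simulate(days)
    print(f"{name}: storage={storage:.2f}, practice ease={ease:.2f}")
```

With these assumed rules, the massed schedule feels easier (retrieval is already high at each repetition) but banks far less storage than the spaced one. That result follows from the rules chosen, so treat it as an illustration of the argument, not evidence for it.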

In practice, spacing fights two enemies: cramming, and review you never schedule (which therefore never happens). The fix is calendar-level. Pick the things you actually want to retain, give each a recurring review slot, and trust the schedule over your sense of what needs more attention. That sense is, again, mostly fluency talking.

Interleaving

Interleaving means mixing different topics or problem types within a single study session, rather than studying one to mastery before starting the next. If you're learning algebra, you don't do twenty quadratic problems in a row. You do a quadratic, then a system of equations, then a function transformation, then back to a quadratic.
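The difference between blocked and interleaved order is easy to see in code. This Python sketch uses simple round-robin interleaving (one problem of each type, then repeat); a shuffled mix is another common option. The problem labels are hypothetical placeholders.

```python
from itertools import chain, zip_longest


def blocked(problems: dict[str, list[str]]) -> list[str]:
    """All problems of one type, then all of the next type."""
    return list(chain.from_iterable(problems.values()))


def interleaved(problems: dict[str, list[str]]) -> list[str]:
    """Round-robin: one problem of each type per pass until every list is used up."""
    passes = zip_longest(*problems.values())  # shorter lists are padded with None
    return [p for p in chain.from_iterable(passes) if p is not None]


problems = {
    "quadratic": ["Q1", "Q2", "Q3"],
    "system of equations": ["S1", "S2", "S3"],
    "function transformation": ["T1", "T2"],
}
print(blocked(problems))
print(interleaved(problems))
```

In the interleaved order, every return to a problem type comes after items of other types, so interleaving also spaces each type automatically.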

It feels worse. Performance during the session drops. Learners regularly rate interleaved practice as less effective even after they've measurably learned more from it. This metacognitive mismatch is one of the strongest illusions in the field.

Doug Rohrer and Kelli Taylor (2007) gave college students math problems (finding the volumes of different geometric solids) either blocked (all of one type, then all of the next) or interleaved. During practice, blocked students scored higher. On a test a week later, interleaved students more than doubled their scores. Rohrer and colleagues have since replicated the effect in real classrooms, including with middle school students, across several kinds of math problems, with effect sizes that should embarrass any curriculum still organized around blocked practice.

Why does it work? Two mechanisms. First, interleaving forces discrimination: you can't just apply the same procedure on autopilot, you have to figure out which procedure each problem calls for. That discrimination is the skill you actually need on a test or in real work. Second, interleaving spaces each topic by definition: the gap between problem-type-A items is filled with problem-type-B items, so each return to A involves real retrieval.

For self-directed learners, interleaving is easier than it sounds and harder than it looks. You don't need a fancy scheduler. You just need to refuse the urge to "finish" a topic before moving on. Read a chapter on stoicism, then a chapter on probability, then a chapter on UI design, then back to stoicism. Your brain will protest. Your brain is wrong.

Retrieval Practice

If you only adopt one desirable difficulty, make it this one. Retrieval practice is the act of pulling information out of memory rather than pushing it back in; the benefit it produces is called the testing effect. It's the engine behind active recall, flashcards, the blurting method, and much of what works in education.

Henry Roediger III and Jeffrey Karpicke's 2006 paper in Psychological Science, "Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention," is the canonical demonstration. Students read a prose passage, then either re-read it or took a free-recall test on it. Five minutes later, the re-readers won. Two days later, the testers won. A week later, the testers won by a wide margin. Same total time, different work: every retrieval, even a failed one, modified the underlying memory in a way re-presentation couldn't.

The exact mechanism is still debated, but one common account goes like this. When you successfully pull a memory up, it gets re-encoded along with whatever cues are currently active, which strengthens it and broadens the set of cues that can trigger it later. Failure helps too: an unsuccessful search primes you to encode the correct answer more deeply once you see it (the pretesting effect, demonstrated by Richland, Kornell, and Kao in 2009).

Retrieval works in dozens of forms: closed-book recall, flashcards, free-form summary writing, teaching aloud, the Feynman technique. What they share is the underlying move: produce the answer before you see it.

After you've highlighted a chapter with Glasp's web highlighter, close the source and try to reconstruct the argument from your highlights alone. Then open the page and check. The gap between what you produced and what's there is your storage-strength deficit. Closing that gap is the work.
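If you want to track that gap across sessions, a tiny score is enough. This is a minimal Python sketch assuming you self-grade: after writing your reconstruction, reopen the page and note which highlights you reproduced. The function name `recall_gap` and the sample highlights are hypothetical.

```python
def recall_gap(highlights: list[str], reproduced: set[int]) -> tuple[float, list[str]]:
    """Score a closed-book reconstruction against your own highlights.

    `reproduced` holds the indices of highlights you judged you got back,
    self-graded after reopening the page. Returns the recall rate and the
    highlights you missed, which become the focus of your next review.
    """
    if not highlights:
        raise ValueError("need at least one highlight to check against")
    missed = [h for i, h in enumerate(highlights) if i not in reproduced]
    rate = 1 - len(missed) / len(highlights)
    return rate, missed


highlights = [
    "Storage strength only goes up",
    "Retrieval strength rises and falls with time and cues",
    "Re-reading boosts retrieval strength, not storage strength",
    "Difficult retrieval is what builds storage strength",
]
# Self-graded: you reproduced the first and third points from memory.
rate, missed = recall_gap(highlights, reproduced={0, 2})
print(f"recalled {rate:.0%}")
for point in missed:
    print("review next time:", point)
```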

Generation

The generation effect is a close cousin of retrieval practice, distinct enough to deserve its own section. Slamecka and Graf (1978) showed that learners who generated material themselves remembered it better than learners who simply read the same material.

Their original experiment was almost embarrassingly simple. One group read pairs like "lamp - light." Another saw "lamp - l___" and had to generate "light." On the later test, the generators won by a wide margin, even though the read group had seen the answer outright.
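The two conditions are easy to reproduce as flashcard prompts. Here is a minimal Python sketch, assuming a first-letter fragment format like the lamp - l___ example; the function names and the extra word pairs are hypothetical, not the original study's materials.

```python
def fragment_cue(cue: str, target: str, keep: int = 1) -> str:
    """Build a generate-condition prompt such as 'lamp - l____'.

    Shows the first `keep` letters of the target and blanks the rest, so the
    learner has to produce the word instead of reading it.
    """
    if not 0 < keep < len(target):
        raise ValueError("keep must show at least one letter and hide at least one")
    return f"{cue} - {target[:keep]}{'_' * (len(target) - keep)}"


def read_condition(cue: str, target: str) -> str:
    """The comparison condition: the answer is simply shown."""
    return f"{cue} - {target}"


def is_correct(answer: str, target: str) -> bool:
    """Case- and whitespace-insensitive check of the learner's answer."""
    return answer.strip().lower() == target.lower()


pairs = [("lamp", "light"), ("hot", "cold"), ("bread", "butter")]
for cue, target in pairs:
    print(read_condition(cue, target), "|", fragment_cue(cue, target))
```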

The principle scales up. Reading a worked proof builds less competence than attempting the proof and then comparing. Reading someone's book summary teaches less than writing your own. Watching someone code teaches less than typing the code yourself, hitting an error, and figuring out why.

Generation is desirable difficulty in pure form. It deliberately withholds information you'd be happy to receive, forcing you to produce it. The frustration of half-remembering, half-guessing is the mechanism. By the time you check your answer, you've already done the work that converts exposure into encoding.

Highlighting can be a generation tool or a passive one. Yellow-bombing every paragraph is passive: you're outsourcing judgment to a future re-read. Sparse, deliberate highlighting is generative: you're forcing present-you to commit to "this matters more than that," which is itself synthesis. The science of highlighting goes deeper.

Varied Practice

The fifth difficulty is variability. Practice the same skill in slightly different forms, contexts, and conditions, rather than under identical conditions every time.

Kerr and Booth's 1978 study with eight-year-olds throwing beanbags became the textbook example. One group practiced from a single distance. Another practiced from a mix of distances that didn't include the test distance. The mixed group, despite never practicing the exact target throw, outperformed the single-distance group on it. They'd built a more general motor representation.

Cognitive learning shows the same pattern. Vary the wording of definitions you study. Vary the contexts in which you encounter a new term. Read about a concept in two different fields rather than two different chapters of the same textbook.

Variability supports transfer: the ability to use what you've learned outside the conditions in which you learned it. Transfer is what most learners want and what most study habits actively prevent. If you practice a skill in one form, you encode the form along with the skill, and you'll struggle when the form changes. Variability decouples the skill from any single form.

Difficulty	Mechanism	Concrete Example	Key Study
Spacing	Forces real retrieval as memory decays	Review notes on day 1, 3, 7, 21 instead of four times today	Cepeda et al. (2006) meta-analysis
Interleaving	Forces discrimination between problem types	Mix algebra, geometry, and stats in one session	Rohrer & Taylor (2007)
Retrieval Practice	Re-encodes memory with new cues	Close the book, write what you remember, then check	Roediger & Karpicke (2006)
Generation	Producing forces deeper encoding than reading	Predict the answer before reading the explanation	Slamecka & Graf (1978)
Varied Practice	Builds context-independent representations	Apply the same concept in 3 different domains	Kerr & Booth (1978)

When Difficulties Become Undesirable

The word "desirable" matters. Not all difficulty helps.

A difficulty becomes undesirable when the learner can't engage with it productively. If you can't decode the words at all, slowing your reading further won't help. If you don't know what a derivative is, interleaving derivative and integration problems is noise. If your retrieval practice is so far past your storage strength that you produce nothing, you're not retrieving, you're flailing.

Bjork's framing: a difficulty is desirable if it engages effortful processes the learner is capable of executing. There has to be enough scaffolding underneath for the struggle to land. Push past that line and you've moved into cognitive overload.

The right zone is where you can produce something, even if it's incomplete. You're reaching, not falling. You finish a session tired but with material to compare against the source. A useful target: failing on roughly 20-40% of retrieval attempts, not 0% (too easy) or 90% (too hard).

Two practical adjustments. When something's genuinely beyond you, don't romanticize the struggle: read the explanation, get the scaffolding in place, then come back. When something feels too easy, don't let the comfort fool you: increase the spacing, mix in harder variants, or move to a more generative format.
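The 20-40% target and the two adjustments can be written as a small decision rule. This Python sketch is a simplification under stated assumptions: it treats any failure rate above 40% as too hard, although the text only singles out rates near 90% as clearly past the line, and the thresholds are rules of thumb rather than validated cutoffs. The names `difficulty_zone` and `ADVICE` are hypothetical.

```python
def difficulty_zone(attempts: int, failures: int,
                    low: float = 0.20, high: float = 0.40) -> str:
    """Label a practice session by its retrieval failure rate.

    The 20-40% band is the rough target from the text, not a validated cutoff.
    """
    if attempts <= 0 or not 0 <= failures <= attempts:
        raise ValueError("need attempts > 0 and 0 <= failures <= attempts")
    rate = failures / attempts
    if rate < low:
        return "too easy"
    if rate > high:
        return "too hard"
    return "desirable"


# The two adjustments from the text, keyed by zone.
ADVICE = {
    "too easy": "lengthen the spacing, mix in harder variants, or switch to a more generative format",
    "too hard": "read the explanation, get the scaffolding in place, then come back",
    "desirable": "keep going: you're reaching, not falling",
}

for attempts, failures in [(20, 1), (20, 6), (20, 15)]:
    zone = difficulty_zone(attempts, failures)
    print(f"{failures}/{attempts} missed -> {zone}: {ADVICE[zone]}")
```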

A useful 2x2: storage strength on one axis, retrieval strength on the other. High storage / high retrieval is recently practiced and well-learned. High storage / low retrieval is where retrieval practice does the most good, because the search is hard but the deposit is real. Low storage / high retrieval is the dangerous one: stuff you just re-read and feel you know but haven't built. Cramming lives here. Low storage / low retrieval is genuinely new material, where you need scaffolding first.
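The 2x2 reads naturally as a lookup table. Neither strength can be measured directly, so the two booleans in this Python sketch are stand-ins; one hypothetical proxy is whether you could recall the material after a long gap without review (storage) and whether it is easy to pull up right now (retrieval). The advice strings paraphrase the quadrants above; the one for the well-learned quadrant is an added suggestion based on this piece's spacing advice.

```python
def quadrant(storage_high: bool, retrieval_high: bool) -> str:
    """Map the storage/retrieval 2x2 to a suggested next step."""
    advice = {
        (True, True): "well learned and fresh: push the next review further out",
        (True, False): "best spot for retrieval practice: hard search, real deposit",
        (False, True): "fluency trap (cramming): close the source and test yourself",
        (False, False): "new material: get scaffolding first, then add difficulty",
    }
    return advice[(storage_high, retrieval_high)]


# You just re-read a chapter and it feels familiar, but you never tested yourself:
print(quadrant(storage_high=False, retrieval_high=True))
```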

Designing Your Routine for Desirable Difficulties

Knowing the principles doesn't make them automatic. You have to engineer them in, because the path of least resistance will always be the comfortable, low-storage option. Here's how to bake the five difficulties into a self-directed learning system.

Highlight to generate, not to mark. Use Glasp's web highlighter to mark sentences sparingly, then write a brief note in your own words explaining why this passage matters and how it connects to what you already know. The sparse selection is discrimination. The note is generation. Highlighting without notes is the passive trap.

Use AI chat for retrieval, not for explanations. The wrong way to use an AI assistant is to ask it to summarize a chapter you haven't read. The right way is to read the chapter, close it, write your own summary, then paste your summary into Glasp's AI chat and ask it to grade your reconstruction against the source. You did the retrieval. The AI does the comparison.

Space with Kindle highlights, not memory. Your sense of when to review is broken: it's run by recency and emotion, not by storage curves. Schedule reviews on a calendar, pull up highlights from a book you read three weeks ago, and try to reconstruct the argument before scrolling. The mechanical schedule is what protects spacing from your fluency-driven instincts.

Interleave with the community feed. Don't read four articles in a row from the same person on the same topic. Cycle: cognitive science, then entrepreneurship, then writing, then back to cognitive science a day later. Discrimination across domains is a stronger workout than depth in one.

Watch a YouTube Summary, then test yourself. Close the page and write three things you'd want to remember in a year. Compare to the transcript. The gap is where your work is.

Vary the format. Read the same idea in a book, a paper, a thread, and a video. Each medium encodes the idea differently, and your brain has to abstract across them. That's the point of varied practice.

The system view lives in our companion piece on building a learning OS. Desirable difficulties is the principle. The OS is the daily mechanics.

Frequently Asked Questions

Isn't highlighting passive? Why does Glasp lean on it?

Highlighting can be passive, and most highlighting is. The fix isn't to abandon it; it's to do it generatively. Highlight sparsely (one or two passages per page). Write a note for each highlight in your own words. Treat your highlights as future retrieval cues, not as a substitute for reading. Done that way, highlighting is a generation task and a discrimination task at once.

Should I struggle on every problem?

No. The goal is productive struggle, not flailing. If you're producing nothing, you've gone past the desirable zone into overload. Back up, get the scaffolding in place, then return to the difficulty. A useful target is 20-40% retrieval failure during practice.

How long should I space my reviews?

Roughly 10-20% of the time you want to retain the material. For a one-month horizon, review every 3-6 days. For a year, stretch out (1 day, 1 week, 1 month, 3 months). When a review feels easy, the next can be longer. When it feels hard, shorten the gap. Using a schedule beats using your gut.
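Here is a minimal Python sketch of that adaptive rule. Doubling the gap after an easy review and halving it after a hard one are illustrative assumptions, not values from research or from any particular spaced-repetition app, and the start date is hypothetical. "Easy" should mean you retrieved the material cleanly with the source closed, not that it looked familiar.

```python
from datetime import date, timedelta


def next_gap(gap_days: int, review: str) -> int:
    """Adjust the gap after a review: longer if easy, shorter if hard.

    Doubling and halving are illustrative choices, not research-derived values.
    """
    if review == "easy":
        return gap_days * 2
    if review == "ok":
        return gap_days
    if review == "hard":
        return max(1, gap_days // 2)  # never drop below one day
    raise ValueError("review must be 'easy', 'ok', or 'hard'")


def plan_reviews(start: date, reviews: list[str], first_gap: int = 1) -> list[date]:
    """Return the date of each review, given how each one went, in order."""
    dates = []
    gap, day = first_gap, start
    for review in reviews:
        day += timedelta(days=gap)
        dates.append(day)
        gap = next_gap(gap, review)  # this review's outcome sets the next gap
    return dates


for d in plan_reviews(date(2024, 1, 1), ["easy", "easy", "hard", "easy"]):
    print(d.isoformat())
```

In this sample run the reviews land on January 2, 4, 8, and 10: the gap grows while reviews go well and shrinks after the hard one.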

Does this apply to skills as well as facts?

Yes, possibly more strongly. Variability research came from motor learning, and interleaving was studied there, under the name contextual interference, before it reached math classrooms. Any skill with discrimination, transfer, and retrieval components benefits: coding, writing, design, language, instruments, athletic technique. The form of practice changes; the principle doesn't.

Why does my school still teach the easy way?

Because the easy way looks better in the short run. Massed practice and re-reading produce higher scores on quizzes given right after instruction, and worse scores a month later. Most schools don't measure delayed retention. Soderstrom and Bjork (2015) make this point at length: performance during instruction is an unreliable guide to learning, and learners and teachers alike routinely mistake one for the other. The confusion is structural, not personal. As a self-directed learner, you don't have to wait for institutions to catch up.

Conclusion

The principle to walk away with: if your studying feels easy, you're probably not studying. Effort is the price of durable learning, and the brain pays out only when you've earned it through retrieval, spacing, interleaving, generation, or variability.

This doesn't mean grinding harder; it means grinding smarter. Most learners are already putting in the time. They're just spending it on activities that maximize fluency and minimize storage. Swap one re-read for a closed-book reconstruction. Swap one cram session for four spaced reviews. Swap one block of identical problems for a mix. Each swap trades short-term comfort for long-term retention.

Glasp was built around this trade: sparse highlighting, marginal notes, AI-graded reconstructions, spaced reviews of past reads, and a community feed that interleaves topics by default. Each is a small piece of friction designed to convert exposure into storage.

Tomorrow, pick the easiest thing in your routine and replace it with the harder version. That swap is the whole game.