The Curiosity Graph: A Map of What Humans Find Worth Remembering

Three Maps of Human Attention

For roughly thirty years, the internet has been quietly producing maps of what we pay attention to. Not maps of geography, but maps of human interest. They run constantly in the background of every keystroke, scroll, and tap. Most people never see them as maps. We just call them "the algorithm" or "the feed" and move on.

There are three of them that actually matter, and they are categorically different.

The first is Google, a map of demand. Every query is a person, somewhere, raising a hand and asking a question. Multiplied across billions of sessions per day, the map shows what humanity wants to know, ranked by frequency, time, and place. Search trends are the cleanest aggregate demand curve we have ever had for human curiosity in the abstract.

The second is social media, a map of virality. Likes, shares, replies, watch-time, retweets, dwell. The map shows what spreads, what catches, what provokes. Platforms then feed the top of that map back into more people's feeds, which sharpens it, which feeds it again. The map is recursive by design.

The third is the one almost nobody talks about as a map at all. It is the corpus of highlights: the passages humans have personally marked as meaningful while reading. On paper, this used to be marginalia, underlines, dog-ears. Online, it became the highlighter. On platforms like Glasp, it became public, social, and aggregated. This is a map of intentionality, of what people found worth keeping.

Three maps. Three categorically different signals. In 2026, only one of them still works the way it was designed to.

What Each Map Actually Records

It's easy to lump these together as "data the internet collects about us," but the underlying acts are different in ways that matter.

A search is a question asked under uncertainty. The user doesn't yet know the thing they're querying. Search records desire-to-know, which is one of the most truthful behaviors we have. The bias is huge but human: we search what we don't have. That's why Google Trends data has been found to correlate with everything from flu outbreaks to election turnout. The map measures the gap between what we have and what we want to know next.

A like or share is a reaction to content already presented. The user has been served something by a system, and the click is a vote on the system's choice. Engagement is never raw human interest. It is human reaction to content the platform decided to show, ranked by what the platform decided was likely to provoke a reaction. The map measures responsiveness inside an engineered environment.

A highlight is something else again. The reader has chosen the source, opened it, and read far enough to reach the passage before deciding to mark it. Three actions: select source, attend to text, mark a fragment for the future. Each act is voluntary, deliberate, and asymmetric in cost. You can scroll a feed for an hour and not highlight a sentence. Most people, in fact, never do.

Here is the comparison that the rest of this essay returns to:

Map	What it records	The asymmetry of faking it	Status in the AI era
Google (queries)	Demand: what humans want answered	Cheap to fake at scale, hard to launder back through the index without distorting it	Degrading: the corpus it ranks is now flooded with AI-generated text
Social media (engagement)	Virality: what reaction the algorithm can manufacture	Industrial: bot farms, engagement pods, paid amplification, AI-written replies	Degraded and engineered
Glasp (highlights)	Intentionality: what a human chose to keep	Faking the artifact is trivial; faking the underlying cognition is not	Holds up, and arguably gets more valuable

This table is the spine. Every later section is an argument about a row.

Why Two of the Maps Are Going Bad

Maps go bad when the territory they describe stops matching what they record. That's what is happening to Google and to social engagement, for different but related reasons.

Google's territory is text on the open web. That territory is being flooded. Generative models can produce credible-sounding paragraphs at near-zero marginal cost. Sites optimized for ad revenue have noticed. SEO farms have noticed. Affiliate networks have noticed. The result is a category of content that veteran observers like Charlie Warzel and Casey Newton have started calling "AI slop": prose that is grammatical, generic, and almost completely empty of original signal. It exists to be indexed, not read.

Worse, this content folds back into training data. Ilia Shumailov and colleagues' 2024 paper in Nature, "AI models collapse when trained on recursively generated data," which built on their 2023 preprint "The Curse of Recursion: Training on Generated Data Makes Models Forget," showed formally what many had suspected: when models train on the outputs of earlier models, the tails of the distribution collapse. Rare ideas, edge cases, and minority perspectives disappear first. Veselovsky and colleagues had shown in 2023 that crowd-worker datasets, long treated as gold-standard human input, were already being silently completed with ChatGPT outputs. The maps the models are drawn from are being filled in by other maps.
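
To make the tail-collapse mechanism concrete, here is a toy sketch. It is not the method from the Shumailov paper: it simply refits a categorical distribution to a finite sample of its own output, generation after generation. The "idea" categories, their probabilities, the sample size, and the function name are all illustrative assumptions.

```python
import random
from collections import Counter

def resample_generations(weights, sample_size, generations, seed=0):
    """Repeatedly 'retrain' a categorical model on its own samples.

    Each generation draws `sample_size` items from the current distribution,
    then replaces the distribution with the empirical frequencies of that draw.
    Returns the number of categories still alive after each generation.
    """
    rng = random.Random(seed)
    categories = list(weights)
    probs = [weights[c] for c in categories]
    support_sizes = []
    for _ in range(generations):
        draw = rng.choices(categories, weights=probs, k=sample_size)
        counts = Counter(draw)
        # Refit: a category that drew zero samples gets probability zero,
        # so it effectively never comes back in later generations.
        probs = [counts[c] / sample_size for c in categories]
        support_sizes.append(sum(1 for p in probs if p > 0))
    return support_sizes

# Hypothetical distribution: two common ideas and twenty rare ones.
ideas = {"common_a": 0.45, "common_b": 0.45}
ideas.update({f"rare_{i}": 0.005 for i in range(20)})

sizes = resample_generations(ideas, sample_size=200, generations=30)
print(sizes)  # count of surviving categories, generation by generation
```

In this toy setup the common categories survive while the rare ones drop out and stay out, which is the shape of the "rare ideas disappear first" claim.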

This isn't just an AI training problem. It's a search problem. Google's index has always been an index of what humans wrote. It is now, partially, an index of what models wrote about what humans might want. The signal-to-noise ratio of clicks, dwell time, and back-clicks is dropping. The map is still being drawn, but the territory underneath is mutating.

Social media's territory is engagement, and engagement was never the same as interest. Cory Doctorow's "enshittification" essay (2023) is the cleanest framing. Platforms start by serving users, pivot to serving advertisers, then pivot to serving themselves. At each step, the metrics shift to favor whatever extracts the most value at the lowest cost. Attention farms emerge. Engagement-bait formats dominate. Bots scale because bots are cheap. AI-written replies, AI-generated reaction videos, and AI-cloned creators all compound the trend.

The more engineered the environment, the worse engagement performs as a proxy for human interest. A like in 2009 probably came from a person who had read the post. A like in 2025 might come from a bot, a bored thumb, a paid pod, or an AI agent operating an account on behalf of a brand. The territory got simulated, and the map didn't notice.

So we are left with two of the three big public signals of human attention quietly losing fidelity at the same time. This is not an attack on either platform. It's a description of an entropic process that any attention-monetizing system eventually undergoes when content production becomes too cheap.

Why Highlighting Stays Honest

The third map holds up because of how it's made.

A highlight requires three filters in sequence. First, a person decides to read a piece of text long enough to encounter the passage. Second, the passage has to land: it has to feel important, true, useful, beautiful, dangerous, or otherwise worth remembering. Third, the person has to act on the feeling and mark it. Each step has cost.

Filter	What it requires of a human	What AI would have to fake to bypass it
Read	Time, attention, sustained focus on a single source	A synthetic reader that maintains a coherent reading session, on a real source, traceable to a real account
Feel meaningful	A subjective judgment that this fragment is worth keeping	A model of which fragments a real human reader would find meaningful, deployed at human pace
Preserve	A deliberate gesture, often public on Glasp	A long-lived, consistent identity that highlights with patterns indistinguishable from a real person

Producing the artifact of a highlight is trivial: it's just a text span and a timestamp. Producing the signal that a highlight is supposed to encode is hard, because the signal is "a human attended to and chose this." To fake that at scale you don't just need synthetic text. You need synthetic readers who synthetically attended, on real source URLs, with consistent taste, against a population baseline that anomaly detection can't pick out. Each layer of the fake collapses one of the filters and breaks the signal.

This is a different argument from "AI can't write." AI can write. The point is that highlighting is not primarily a writing activity. It's a reading and selecting activity, and reading at human pace is one of the few things synthetic agents are still genuinely bad at faking economically. You can spin up a thousand AI accounts that post; spinning up a thousand AI accounts that read books, year after year, with stable preferences that match a real intellectual life, is a much harder problem, and the moment you try, you've built something so close to a person that the distinction stops mattering.

There's also a deeper, older point here. Mortimer Adler, in How to Read a Book (1940/1972), argued that writing in the margins is the act that turns reading from passive consumption into a conversation with the author. The highlight is the modern descendant of his marginalia. It is the visible residue of the reader thinking. You can't get that residue without the thinking. The artifact is cheap. The cognition behind it isn't.

From Marks to Map: The Curiosity Graph

A single highlight is a private moment. A million highlights, public and timestamped and connected to identities and texts, is something else. It is a graph.

Think of it as a bipartite structure: on one side, sources (books, essays, papers, videos, transcripts). On the other side, readers. Edges between them are highlights, weighted by how many distinct readers chose the same passage, when they chose it, what else they highlighted around it, and what they wrote in their own annotations. Aggregate this across years and you get the Curiosity Graph: a continuously updated map of which fragments of which texts have been deemed worth keeping, by whom, and in what intellectual neighborhoods.
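
As a minimal sketch of that structure, assuming nothing about how Glasp actually stores its data, the bipartite graph can be modeled with two indexes: passages mapped to the distinct readers who marked them, and readers mapped to the sources they marked. The `Highlight` fields, the function names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Highlight:
    reader_id: str   # hypothetical stable reader identity
    source_id: str   # hypothetical source key, e.g. a URL or book ID
    passage: str     # the highlighted text span
    year: int        # when the highlight was made

def build_graph(highlights):
    """Index highlights as a bipartite reader-source graph.

    Returns (readers_by_passage, sources_by_reader):
    - readers_by_passage maps (source_id, passage) -> set of distinct readers
    - sources_by_reader maps reader_id -> set of sources that reader marked
    """
    readers_by_passage = defaultdict(set)
    sources_by_reader = defaultdict(set)
    for h in highlights:
        readers_by_passage[(h.source_id, h.passage)].add(h.reader_id)
        sources_by_reader[h.reader_id].add(h.source_id)
    return readers_by_passage, sources_by_reader

def passage_weights(readers_by_passage):
    """Weight each passage by the number of distinct readers who marked it."""
    return {key: len(readers) for key, readers in readers_by_passage.items()}

# Made-up sample data for illustration only.
highlights = [
    Highlight("ana", "essay-1", "passage A", 2019),
    Highlight("min", "essay-1", "passage A", 2022),
    Highlight("ana", "essay-1", "passage A", 2023),  # same reader again
    Highlight("min", "book-7", "passage B", 2021),
]

readers_by_passage, sources_by_reader = build_graph(highlights)
weights = passage_weights(readers_by_passage)
print(weights)
```

Counting distinct readers rather than raw marks is a deliberate choice in this sketch: one reader highlighting the same passage twice should not look like two independent judgments.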

Three properties make this graph unusually well-behaved.

It's stable across time. Heavily highlighted passages in books from 1990 still get highlighted today. The most-marked sentences in Meditations, in The Origin of Species, in Thinking, Fast and Slow, don't churn the way social-media trends churn. The graph has weeks-old activity layered over decades-old shape. That stability is a feature: it means the map measures something more durable than what is currently popular.

It's distributed across languages and contexts. A reader in São Paulo highlighting the same paragraph from a 2014 essay that a reader in Seoul also highlights, three years apart, isn't responding to a feed or a trending topic. They're each independently encountering the text and marking it. When that pattern repeats at scale, it's evidence that the passage is doing something real.

It's interpretable. Unlike clicks or watch-time, highlights come with the actual text attached. You can read the map directly. You don't need to model what users meant; the highlighted span is the meaning. This is rare in attention data and makes the graph unusually useful as both a public record and a research substrate.

To make the durability concrete:

Source	Year	Pattern
Meditations, Marcus Aurelius	~170 CE / ongoing	Stoic passages on impermanence and judgment dominate highlights across decades
Thinking, Fast and Slow, Kahneman	2011 / ongoing	The same handful of "System 1 vs. System 2" passages remain top-highlighted year over year
Paul Graham essays	2003-present	Lines about doing things that don't scale, on schedules, and on starting startups recur as highlight clusters
The Almanack of Naval Ravikant	2020 / ongoing	Specific aphorisms cluster in highlights regardless of when readers encounter the book

Notice what this isn't: a list of trending posts, a leaderboard of viral threads, an "engagement chart." It's a map of stable human meaning, drawn slowly, by individual readers, one passage at a time. It is closer in spirit to the way we've described collective intelligence than to anything that lives on a feed.

The Curiosity Graph in the AI Era

Two facts collide in 2026.

Fact one: training data for large models is contaminated. Models that learn from the open web learn, increasingly, from the residue of other models. Shumailov's recursion result is not a worst-case scenario; it's an asymptote.

Fact two: the most valuable signal for any AI system that wants to be useful to humans is "what humans actually find meaningful." That signal cannot be inferred reliably from page text alone, because the page text is partly synthetic. It cannot be inferred from clicks, because clicks are gameable. It can be inferred quite well from highlights, because highlights are the rare data type where humans deliberately said "this part, not the rest."

A corpus of public highlights has properties that AI products desperately need:

High-quality salience labels. Every highlight is a human-validated "this matters" tag on a specific text span.
Provenance chained to a source. Highlights are tied to URLs and books, so the map is grounded in real, attributable text.
Reader identity continuity. Over time, an individual reader's highlight history forms a coherent intellectual signature that's expensive to fake.
Cross-source linking. Readers who highlight passage A in one book and passage B in another book create implicit semantic links that no single text could express.
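
The cross-source linking property can be sketched as a co-highlight count: for every pair of passages from different sources, count how many readers marked both. This is an illustrative sketch under assumed inputs, not a description of any real pipeline. The tuple layout, the function name, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cross_source_links(highlights):
    """Count readers who marked both passages of each cross-source pair.

    `highlights` is an iterable of (reader_id, source_id, passage_id) tuples.
    Returns a Counter keyed by a sorted pair of (source_id, passage_id) nodes.
    """
    passages_by_reader = defaultdict(set)
    for reader_id, source_id, passage_id in highlights:
        passages_by_reader[reader_id].add((source_id, passage_id))

    links = Counter()
    for passages in passages_by_reader.values():
        for a, b in combinations(sorted(passages), 2):
            if a[0] != b[0]:  # keep only links that span two different sources
                links[(a, b)] += 1
    return links

# Made-up sample data: r1 and r2 each connect book-1/A with book-2/B;
# r3 only highlights within a single book, so contributes no cross-source link.
sample = [
    ("r1", "book-1", "A"), ("r1", "book-2", "B"),
    ("r2", "book-1", "A"), ("r2", "book-2", "B"),
    ("r3", "book-1", "A"), ("r3", "book-1", "C"),
]
links = cross_source_links(sample)
print(links)
```

A link that recurs across many independent readers is the kind of implicit semantic connection described above: neither book states it, but the readers' combined behavior does.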

Compare this to a scrape of the open web in 2026. The scrape is bigger. The highlight corpus is honest. For training, for retrieval, for ground-truth alignment, honesty wins over size past a certain threshold. We've made that argument elsewhere about why Glasp matters as collective intelligence in the age of AI; the Curiosity Graph is the structural form of that argument.

There's a second-order effect. As AI products race to feel personal, they need user-specific signals about what each user finds meaningful. Highlights are among the cleanest such signals a user can voluntarily produce. A reader who has highlighted five hundred passages over two years has handed an AI system a precise, dense, opinionated index of their intellectual life. We've called this personal context management: the practice of building the input layer that lets AI work on your behalf. The Curiosity Graph is its public-facing twin.

Glasp as Public Infrastructure for Human Meaning

Most note-taking apps treat highlights as a personal artifact. You highlight in Notion, in Obsidian, in Apple Notes, in Readwise. The highlight goes into your private store. It might surface in your daily review. It might never be seen by another person.

That model is fine for personal knowledge management. It's wrong for the Curiosity Graph.

The graph only exists if highlights are public by default. A private highlight is a closed point on a map nobody can read. A public highlight is a coordinate. The architectural decision to make highlights social-by-default, attached to identity, browseable, and aggregated across readers is what turns the practice from private literacy into public infrastructure. We argued for this evolution from a second brain to a shared brain: the shift from optimizing personal recall to contributing to a collective record.

This is what makes Glasp's web highlighter different in kind from a private highlighting tool. The same gesture, public, becomes a different thing. You highlight a sentence in a Paul Graham essay. The highlight goes into your reader profile. It also joins the cluster of every other reader who ever marked that sentence. It also strengthens the inferred salience of that span across the index. It also becomes one node in a network that connects every reader who has ever highlighted in that intellectual neighborhood. None of that happens in a private notebook.

The same logic extends across formats. YouTube Summary lets readers mark segments of videos. Kindle highlights brings book annotations into the same graph. The community layer is where the graph becomes legible: where you can see what others are reading, what they're marking, and what passages cluster around the questions you're chasing.

Public highlighting is, in this sense, an act of small civic generosity. You're contributing one labeled coordinate to a map that becomes more valuable with every coordinate. That's the structural argument for why we're building Glasp, and the structural reason that public-by-default is not a feature but the foundation.

What This Means for Being Human Now

Herbert Simon, writing in 1971, gave us the line that still governs the internet: "What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention." For most of the last fifty years, attention has been the scarce resource and information has been the abundant one.

In 2026 we are crossing into a new regime. Information is no longer just abundant; it's being produced at near-zero marginal cost, often without a human author at all. The output of generative models is growing fast enough that it may soon exceed the entire human-authored corpus that preceded it. In that regime, "information" is no longer the meaningful unit. The meaningful unit is "what a human cared about, on purpose, and was willing to mark."

Walter Benjamin, in 1935, worried that mechanical reproduction would erase the "aura" of the original artwork. He was half-right and half-wrong. Reproduction did flatten the visual arts in the way he predicted. It also created a new kind of aura, attached to provenance: signed prints, verified originals, authenticated artifacts. The same dynamic is playing out with text. Generative reproduction is flattening the field. What gets the new aura is the verified human gesture: the marked passage, the personal note, the public commitment to "this is the part I cared about."

A highlight, in this light, is a small claim of presence. I read this. I was here. This sentence was worth holding onto. Multiply that across a hundred million reading sessions and you have something like a public record of human meaning, drawn slowly, surviving the era. We called this, in another piece, the greatest legacy for future generations: not the books, but the inheritance of having read them, structured so the next reader can find what mattered.

When AI can produce text fluently, "what humans cared about" becomes the rare signal. The Curiosity Graph is what that signal looks like when you draw the map.

Frequently Asked Questions

Isn't this overstated? Highlights are just a personal note-taking practice.

At the level of one reader, yes. The argument is about aggregation. A million private notebooks are a million private notebooks. A million public highlights, attached to identities and sources, are a network. The claim isn't that any single highlight is profound. It's that the network of highlights, aggregated, has properties (intentionality, durability, provenance) that no other public attention dataset has, and that those properties become more valuable as other datasets degrade.

Doesn't Goodreads, Pocket, or Readwise already do this?

Each captures a slice. Goodreads tracks what you read, not what you marked inside it. Pocket archived links to read later, mostly without granular passage-level data. Readwise is excellent at private highlight management and import, but its design center is personal recall, not public aggregation. The Curiosity Graph requires public-by-default highlights at passage granularity, attached to identity, across sources. That combination is what Glasp is built around. The difference isn't features; it's whether the data forms a graph at all.

Can AI fake highlights?

It can fake the artifact. A bot can mark a span on a page and call it a highlight. What it has a much harder time faking is the underlying behavior: a sustained reading history with consistent taste, on real sources, at human pace, with a stable identity and patterns that match a real intellectual life. The signal isn't the highlight by itself. It's the highlight's place in a long pattern of behavior. Faking the artifact is cheap; faking the cognition behind a multi-year reading life is, for now, prohibitively expensive. The asymmetry is the whole point.

What about privacy?

Public highlights are voluntary. Readers can highlight privately, share selectively, or contribute fully to the public graph. The Curiosity Graph argument is about what happens when readers choose to make their highlights public. It's not a claim that private highlighting is worthless; it's a claim that public highlighting produces a categorically different artifact, and that artifact is the one that becomes infrastructure.

Does my one highlight matter?

In the same way one vote matters. Individually, the marginal signal is small. Collectively, the only reason the graph exists is that millions of small signals add up. If you highlight a paragraph that nobody else has marked, you've created a new edge. If you highlight a paragraph that ten thousand others have marked, you've reinforced an existing one. Both are useful. The graph doesn't care about big gestures; it cares about real ones.

Is this just a fancy way of saying "user data is valuable"?

It's the opposite. Most platforms treat user behavior as private exhaust to be monetized. The Curiosity Graph treats public, voluntary intellectual gestures as a shared resource to be aggregated for the benefit of everyone who reads. The model isn't extraction. It's a public commons whose value comes from being legible to all participants. That's also why we've framed Glasp as part of a broader learning OS: an open layer for how humans read, mark, and share understanding.

Conclusion

Three maps of human attention. One records demand. One records virality. One records intentionality. The first two are degrading under the pressure of cheap synthetic content and engineered engagement. The third holds up because it depends on something synthetic content cannot economically reproduce: actually reading, finding meaning, and choosing to preserve.

Aggregated, those choices form the Curiosity Graph. It is a slow, durable, interpretable map of what humans across decades of reading have found worth remembering. As AI fluency floods every channel, this map becomes more valuable, not less, because it is the most authentic remaining ground truth of human intellectual life.

Every public highlight you make adds one point to that map. One coordinate, attached to a real reader, on a real source, marking a real fragment of meaning. Multiply that across the next decade of reading and the result is something the era genuinely needs: a public record of human attention that survives the synthetic flood.

If that's a project you want to be part of, the gesture is small. Pick something worth reading. Mark the passage that lands. Make it public. Open Glasp's web highlighter and contribute one coordinate. Browse the community and follow the readers whose maps overlap with yours. The graph grows one honest mark at a time. That's the only way it ever could.