Research as a Growth Channel

In 2026, Glasp started doing something that isn't in any startup growth playbook we've ever read: we began publishing research papers on arXiv.

Not blog posts dressed up with charts. Actual papers, with methods sections, pre-registered thresholds, held-out test sets, and public repositories. Written by a team you can count on one hand, in between everything else a startup demands.

This chapter is about why we did it, what we found, and why we think original research might be one of the most underrated growth channels of the AI era.

Why a Startup Publishes Research

The honest answer has two halves, one idealistic and one strategic, and we'd be telling the story wrong if we hid either.

The idealistic half: it's the mission at a different altitude. Glasp exists to make learning public, to ensure that what one person figures out can benefit the next. For years that meant individual highlights and notes. But after millions of people had saved millions of highlights, the platform itself had learned something about how humans read, and keeping that locked in a private database felt like a violation of our own premise. If a user's highlights deserve to outlive the moment, so do the patterns across all of them.

The strategic half: in the answer-engine era we described in Chapter 6, original research is among the most citable content that exists. Answer engines are hungry for primary sources, for claims that come with evidence attached. A thousand blog posts repeat each other; a paper with novel findings is the thing they all end up citing. Publishing research is differentiation that cannot be copied quickly, because the only way to copy it is to do the work.

What the Highlights Taught Us

Our main research line asked a deceptively simple question: when you highlight a passage, how much of that choice is you?

The intuition we started with, and that most of the personal-knowledge-management world shares, is that highlighting is deeply personal. Your highlights are your intellectual fingerprint. An AI trained on them should be able to predict what you'll find important in ways no generic model could.

The data said something more interesting. When different people highlight the same article, they agree far more than they differ. What stands out in a text mostly stands out to everyone; salience is largely shared, social rather than idiosyncratic. The individuality is real, but it doesn't live where we expected. It lives in selection: which documents you choose to engage with in the first place, which topics you return to, what you decide is worth your attention at all. And that selection behavior turns out to be remarkably stable over time, less like a mood and more like a trait.

In other words: within a document, we read like a crowd. Across documents, we read like ourselves.
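To make "agree far more than they differ" concrete, here is a minimal sketch of one way within-document agreement could be measured. It is an illustration, not the method from the papers: the Jaccard overlap metric, the function names, and the toy data are all assumptions chosen for simplicity.

```python
from itertools import combinations


def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap of two highlight sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_agreement(highlights_by_reader: dict[str, set[int]]) -> float:
    """Average Jaccard overlap across every pair of readers of one article."""
    pairs = list(combinations(highlights_by_reader.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data, invented for illustration: the sentence indices that each
# hypothetical reader highlighted in the same article.
article = {
    "reader_a": {2, 5, 9},
    "reader_b": {2, 5, 11},
    "reader_c": {2, 5, 9, 11},
}

agreement = mean_pairwise_agreement(article)
print(f"mean pairwise agreement: {agreement:.2f}")
```

A score near 1 would mean readers of the same article pick nearly the same passages; a score near 0 would mean their highlights barely overlap. Any real analysis would also need a baseline, since long or heavily highlighted articles produce some overlap by chance.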

Finding this meant breaking some of our own assumptions, including ones we had been excited about. An early version of one analysis seemed to show individual highlighting styles beating the crowd; our own audit found bugs and leakage in that result, and we retracted and rebuilt the work before publishing. The honest version of the paper was different from the one we had hoped to write, and it was stronger for it.
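The chapter does not say which kind of leakage the audit found. One common kind, in data shaped like this, is the same article appearing in both the training and the test set, so a model looks good by remembering the article instead of learning the reader. The sketch below shows one guard against that, under stated assumptions: the record layout, the `assign_split` helper, and the 20% test fraction are hypothetical and not taken from the published repositories.

```python
import hashlib


def assign_split(doc_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a whole document to 'train' or 'test'.

    Hashing the document ID means every highlight from one article lands
    on the same side of the split, on every run.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


# Hypothetical records: (reader_id, doc_id, highlighted_sentence_index)
records = [
    ("u1", "doc-a", 3),
    ("u2", "doc-a", 3),
    ("u1", "doc-b", 7),
    ("u3", "doc-c", 1),
    ("u2", "doc-d", 4),
    ("u3", "doc-e", 9),
]

train = [r for r in records if assign_split(r[1]) == "train"]
test = [r for r in records if assign_split(r[1]) == "test"]

# The audit-style check: no article may sit on both sides of the split.
train_docs = {doc_id for _, doc_id, _ in train}
test_docs = {doc_id for _, doc_id, _ in test}
assert train_docs.isdisjoint(test_docs), "document leaked across the split"
```

Splitting by document rather than by individual highlight is the design choice that matters here. A random split over rows would scatter one article's highlights across both sets and quietly inflate any held-out score.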

A Natural Experiment in AEO

We also turned the research lens on ourselves.

The shift from search engines to answer engines, the one that forced the strategy change in Chapter 6, is exactly the kind of event researchers call a natural experiment. We were living inside it, with our own traffic and citation data as the laboratory. So we studied the transition rigorously and published that analysis too.

There was something satisfyingly recursive about it: the growth strategy itself became open knowledge. The same way we once turned user interviews into case studies, we turned a platform shift into a paper anyone can read, check, and build on.

Open by Default

Every paper went out with a public repository. This was Chapter 5's open-source instinct carried to its conclusion: we had open-sourced tools, and now we were open-sourcing findings.

The reasons are the same ones that made open-sourcing our AI tools work. Transparency builds trust, and trust compounds, a principle we'll come back to in the next chapter. Researchers who can verify your work become advocates for it. And in an era when AI systems increasingly decide which sources to lean on, a track record of verifiable, honestly reported research is the deepest trust signal we know how to send.

There's also a discipline benefit we didn't fully anticipate. Knowing the work will be public, with the analysis open to inspection, forces a level of rigor that internal dashboards never demand. Publishing made us more honest with ourselves about what our data does and doesn't show.

What Founders Can Take From This

A few transferable lessons, for anyone sitting on product data and wondering what to do with it.

Your product data probably contains publishable insight. Not engagement metrics, which interest no one outside your boardroom, but genuine questions about human behavior that only your vantage point can answer. We could study how people highlight because we're where people highlight. Whatever your product is, you're the world's best-positioned observer of something.

Rigor is the price of admission, and the price is higher than it is for content marketing. Be prepared to question your favorite hypothesis hardest, to have your best finding dissolve under audit, and to publish the honest result instead of the exciting one. We found that out firsthand. The retracted-and-rebuilt paper taught us more than a smooth success would have.

And the payoff is unlike other channels. A viral feature spikes and decays. A research finding, once cited, keeps being cited; it becomes part of how a field talks about a topic. It's the compound effect again, operating on the longest time horizon we've found yet.

For a company whose founding question was how knowledge outlives the person who found it, publishing research isn't a detour from the mission. It might be the most direct expression of it we've shipped.