The Mythos Effect: How to Stress-Test Your Own Mental Models with AI

The Mythos Wake-Up Call

The most uncomfortable line in Anthropic's Mythos Preview report wasn't the part about thousands of zero-day vulnerabilities. It was a quieter sentence buried inside the technical write-up: the bugs Mythos surfaced had, in many cases, survived decades of human review and over five million automated tests.

That phrase matters. It says something specific about how dangerous flaws hide. They don't hide by being clever. They hide by being unexamined. The 17-year-old FreeBSD NFS bug (CVE-2026-4747), the 27-year-old OpenBSD TCP SACK vulnerability, and the 16-year-old FFmpeg flaw all sat in plain sight, in code most of the internet was running, while attention moved on to newer modules. The systems weren't unaudited. They were audited by people who had stopped looking at certain assumptions.

Read that paragraph again with one substitution: replace "code" with "your worldview." When was the last time you ran a fresh audit on a belief you formed a decade ago? Most people, if they're honest, can't name one. The defaults they set in their twenties about money, relationships, career, learning, politics, or how the world works are still running in the background, shaping decisions, never re-examined since they went into production.

This is the Mythos effect. Not the model itself, but the lesson it broadcasts. If a more capable analyzer can surface decades-old flaws in some of the most-reviewed code on earth, a similar kind of analysis can plausibly surface flaws in the much less rigorously audited operating system you carry around in your head.

The good news is you don't need access to Mythos to do this. You need a corpus and a workflow. Most people already have the corpus. Almost no one has the workflow.

Why Beliefs Are More Like Old Codebases Than We Want to Admit

Programmers have known for a long time that the most dangerous code isn't the new code. It's the code nobody touches. Stable, "battle-tested," running in production. Right up until the moment a researcher discovers that one of the assumptions baked into it became false five years ago, and now the whole tower is exposed.

Beliefs have the same pathology. Three properties make them especially vulnerable.

First, they're load-bearing. A belief about, say, what kind of work makes you happy isn't a discrete claim. It's an assumption that other decisions depend on. Career moves, financial planning, where you live. Like a deeply-imported library, when it shifts, everything downstream shifts with it. This makes people resistant to re-examination, because the cost of the audit feels enormous.

Second, they're rarely tested in adversarial conditions. Most beliefs get exercised only in environments that confirm them. You spend time with people who share your priors, read sources that reinforce your framing, work in industries where certain assumptions are local consensus. Like code that's never been fuzz-tested, the inputs that would reveal the bug never arrive. That's true even for very smart people, especially as their environments become more homogeneous over time.

Third, they were optimized for conditions that may no longer exist. The belief you formed in 2015 about how careers work was reasonable for 2015. The world it was responding to had different ground rules. Tech industry hiring norms, social media incentives, the geography of remote work, the cost of capital, the speed at which AI is changing professional moats. All of those have shifted, sometimes radically. A belief optimized for an older environment running in the current environment is the cognitive equivalent of a 17-year-old kernel module that nobody updated.

What Mythos demonstrated, with uncomfortable clarity, is that the cost of running stale code looks like zero until the day something exposes it. Then it goes vertical.

The Five Vulnerability Classes in Personal Cognition

Cognitive science has, over the last fifty years, mapped most of the major ways human reasoning goes quietly wrong. Five categories matter most for the kind of audit this article is about.

Stale priors. Beliefs you formed under one set of conditions that you've never re-examined as conditions changed. These are the most common and the most expensive. They feel reliable precisely because they've never been challenged. Daniel Kahneman's account of fast, automatic System 1 thinking suggests why: long-held conclusions get reused like cached results instead of being re-derived.

Confirmation drift. Over time, you increasingly read, watch, and surround yourself with sources that share your existing framing. Your inputs narrow without you noticing. By year ten, you're not so much reasoning about a question as restating a position from inside a tighter and tighter information bubble. This was Cass Sunstein's diagnosis in his work on group polarization.

Framework allegiance. You learned an intellectual framework (a leadership model, an economic school, a productivity system, a worldview) and started filtering everything through it. The framework becomes invisible. You stop noticing it's a framework at all. It just feels like "how things are." Robert Kegan's developmental theory describes this as being "subject to" a mental construct rather than holding it as an "object" you can examine.

Availability bias. What's mentally accessible feels truer than what's not. Recent events, vivid examples, things your social circle talks about, all get systematically over-weighted. Older data, distant geographies, low-status sources get under-weighted. Tversky and Kahneman demonstrated this experimentally in the 1970s, and it hasn't gotten better with social media.

Sunk-cost beliefs. You've publicly committed to a position, made decisions based on it, told other people. Now changing your mind has social cost on top of cognitive cost. So the belief gets defended past the point where the evidence would otherwise have moved you. The defense feels like rigor. It's actually momentum.

None of these are personal flaws. They're how human cognition works. The point isn't to feel bad about having them. It's to notice that they accumulate exactly the way bugs accumulate in old code: silently, predictably, and waiting for the right input to expose them.

What Counts as a Mental Model Vulnerability

It helps to be specific about what we're actually scanning for. Not every shift in opinion is a vulnerability. Not every comfortable belief is a bug.

Useful indicators:

Vulnerability Class	What it looks like	The Mythos analogue
Stale prior	A view like "I always thought X about my industry" that was never updated	Decades-old code path nobody touched
Confirmation drift	All your highlights on a topic come from sources that agree	Test suite that only checks happy paths
Framework allegiance	You can't articulate the framework's failure modes	Hidden assumption never written down
Availability bias	Recent vivid story drives outsized confidence	Cache-only logic that ignores history
Sunk-cost belief	"I've already said publicly that..."	Legacy API kept alive for compatibility

A useful test: would a thoughtful person who didn't share your priors call this a load-bearing assumption you've never validated? If yes, that's the target.

What you're not looking for: every place you might be wrong. That set is infinite. The audit is targeted. Look for assumptions that other decisions depend on, that haven't been tested, and that you could realistically re-examine.
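Those three criteria can be written down as a simple filter. The sketch below is illustrative only: the `Belief` record, its field names, and the two-year staleness threshold are assumptions made for this example, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Belief:
    """A hypothetical record for one belief you might audit."""
    statement: str
    dependent_decisions: List[str] = field(default_factory=list)
    last_tested: Optional[date] = None  # None means never deliberately tested
    can_reexamine: bool = True          # could you realistically revisit it?


def is_audit_target(belief: Belief, today: date, stale_after_years: int = 2) -> bool:
    """Return True when a belief meets all three criteria from the text."""
    # Criterion 1: other decisions depend on it.
    load_bearing = len(belief.dependent_decisions) > 0
    # Criterion 2: never tested, or not tested recently (assumed threshold).
    if belief.last_tested is None:
        untested = True
    else:
        untested = (today - belief.last_tested).days > stale_after_years * 365
    # Criterion 3: you could realistically re-examine it.
    return load_bearing and untested and belief.can_reexamine


beliefs = [
    Belief("Remote work hurts careers", ["where I live", "which jobs I apply to"]),
    Belief("Tabs are better than spaces"),  # nothing depends on it
]
targets = [b.statement for b in beliefs if is_audit_target(b, date(2026, 1, 1))]
print(targets)
```

Only the first belief passes, because the second has no decisions resting on it. The filter is deliberately strict: failing any one criterion takes a belief off the list, which keeps the audit targeted.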

Why You Can't Run This Scan in Your Head

Here's the awkward part. The thing you're trying to audit (your own thinking) is the same thing doing the audit. That's a problem.

When you self-examine, you tend to find what you expect to find. The same cognitive shortcuts that produced the vulnerabilities are running the inspection. Self-audit without external grounding is like a codebase running its own tests against assumptions baked into the codebase. It will pass.

Three external grounding mechanisms work.

The first is other people, specifically people who hold different priors. This is why intellectual diversity in your network matters. But it's slow, socially expensive, and most people's networks are less diverse than they think. So it's usually insufficient on its own.

The second is time. Read what you wrote a decade ago. The you-now is, in a meaningful sense, an outside auditor of the you-then. The problem is that time-based audits are infrequent and depend on having a written record. Most people don't have one in any usable form.

The third, new in the last two years, is AI-assisted retrieval over your own corpus. This is the option that's underused, even though the infrastructure is now cheap. Feed an AI everything you've highlighted, annotated, and noted over the years, and ask it specific structured questions. The AI doesn't know which conclusions you want to keep. It doesn't have your social investment in your past positions. It can surface contradictions, gaps, and unexamined dependencies at a pace human self-reflection can't match.

This is the same logic explored in the chat-with-your-notes (personal RAG) approach, applied with a different intent. The goal isn't to summarize your notes. It's to interrogate them.

What you're building, in effect, is a personal version of what Anthropic built for code analysis. A system that can scan your own deposits of thinking and flag the places where the assumptions look stale.

Highlights as Your Personal Source Code

For the scan to work, you need a corpus. Not just any text. Specifically: text you marked because it meant something at the time.

This is what makes Glasp's web highlighter and Kindle import useful for this purpose. The data they produce is unusually high signal. You only highlighted sentences that felt important. Each highlight is, in effect, a small commit message from your past self: "this is worth keeping."

Three years of consistent highlighting is the equivalent of a personal source repository. You can see your influences, the arguments you found compelling, the framings you adopted, the writers you returned to. The history isn't visible in your head. It's smeared across forgotten browsing sessions. But on the page, in a database, it's just data.

A few specific reasons highlights work better than other formats for this audit:

They're you-curated. Unlike browser history (full of noise), highlights are pre-filtered by your attention.
They're source-attributed. You can trace every claim back to who said it, and check whether you've drifted into echo chambers.
They're time-stamped. You can see which framings you adopted in 2021 versus 2024 and ask whether the underlying claims still hold.
They're small enough for AI to load. A few thousand short highlights will usually fit in a modern long-context window.
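Those properties suggest a simple shape for each record. The sketch below uses a hypothetical schema, not Glasp's actual export format: the `Highlight` fields, the helper functions, and the sample rows are all invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Highlight:
    """One highlight with the attribution and timestamp the audit relies on."""
    text: str
    author: str
    source: str
    marked_on: date


def author_share(highlights: List[Highlight]) -> Dict[str, float]:
    """Fraction of highlights contributed by each author."""
    counts = Counter(h.author for h in highlights)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()} if total else {}


def authors_by_year(highlights: List[Highlight]) -> Dict[int, List[str]]:
    """Sorted unique authors for each year in which something was marked."""
    by_year = defaultdict(set)
    for h in highlights:
        by_year[h.marked_on.year].add(h.author)
    return {year: sorted(authors) for year, authors in sorted(by_year.items())}


corpus = [
    Highlight("Specialists out-earn generalists.", "Author A", "Blog post", date(2021, 3, 2)),
    Highlight("Depth compounds over a career.", "Author A", "Book", date(2021, 9, 14)),
    Highlight("Range beats early specialization.", "Author B", "Podcast", date(2024, 5, 30)),
    Highlight("Go deep, then go wide.", "Author A", "Newsletter", date(2024, 11, 8)),
]

print(author_share(corpus))
print(authors_by_year(corpus))
```

In this toy corpus one author accounts for three of the four highlights, and a second voice only appears in 2024. That is the kind of pattern the source and timestamp fields make visible.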

Building a second brain was the foundational case for the practice. What's changed is that the second brain is now actually useful for something beyond retrieval. It's useful for self-audit. And personal context management is the broader frame for why this matters in the AI era.

YouTube and podcast highlights matter here too, because a lot of mental model formation happens through audio and video. If your reading log shows balanced sources but your watch history is monocultural, that's a vulnerability the reading log alone won't expose. YouTube Summary timestamps mean video sources can sit in the same corpus, scannable on equal terms.

A Practical Personal Vulnerability Scan

Here's a workflow you can run quarterly. It takes roughly an hour. None of it is heroic, and the parts that look effortful in description go quickly in practice.

Step 1: Choose a belief that's load-bearing. Not a casual opinion. A belief that several decisions in your life depend on. Examples: "in my field, the most important skill is X," "people in my situation should optimize for Y," "the right way to learn Z is the one I've been using." Write the belief in one sentence.

Step 2: Date the belief. Roughly when did you form it? Under what conditions? What sources did you trust at the time? This step alone often surfaces something. A belief you formed in 2014, based on writers you stopped following in 2018, is operating under data that may no longer apply.

Step 3: Query your corpus for supporting evidence. Using AI Chat over your highlights, ask: "show me every highlight I've marked that supports this belief, with sources and dates." Read the list. Notice the distribution. Are all your supporting highlights from the same three writers? From a single intellectual tradition? Are they all from a five-year window?

Step 4: Query for contradicting evidence. This is the step most people skip and where most of the value lives. Ask: "find any highlight in my corpus that contradicts or complicates the claim that X." Be specific about what would count as contradiction. The AI will surface things you marked but didn't integrate. Often things you flagged as interesting but never followed through on.

Step 5: Look at the gap. What evidence is missing? If you couldn't find any contradiction in your own corpus, that's not a clean bill of health. It's a sign you've been reading inside a bubble. Step outside it. The point of the gap isn't to disprove yourself. It's to identify where you don't have data.

Step 6: Decide what to do. Options: update the belief, stress-test it against new sources, commit to reading one or two contradicting works in the next month, or explicitly mark it as "currently uncertain" and proceed with that label. The decision matters less than the labeling. You're moving the belief from invisible-default to visible-choice.
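The distribution checks in Steps 3 to 5 can be made mechanical once the AI has returned its lists. The following is a minimal sketch under stated assumptions: the `Evidence` record, the flag names, and the 0.8 share threshold are hypothetical, and the input lists stand in for whatever your retrieval tool returns.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    """One retrieved highlight, reduced to the fields the checks need."""
    author: str
    year: int  # year the highlight was marked


def top_author_share(items: List[Evidence], top_n: int = 3) -> float:
    """Fraction of items that come from the top_n most frequent authors."""
    if not items:
        return 0.0
    counts = Counter(item.author for item in items)
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(items)


def window_years(items: List[Evidence]) -> int:
    """Calendar years spanned by the items, counting both endpoints."""
    if not items:
        return 0
    years = [item.year for item in items]
    return max(years) - min(years) + 1


def audit_flags(
    supporting: List[Evidence],
    contradicting: List[Evidence],
    share_limit: float = 0.8,  # assumed threshold, tune to taste
    narrow_window: int = 5,    # the "five-year window" from Step 3
) -> List[str]:
    """Return warning labels for Steps 3 to 5; an empty list means no flags."""
    flags = []
    if supporting and top_author_share(supporting) >= share_limit:
        flags.append("concentrated_sources")
    if supporting and window_years(supporting) <= narrow_window:
        flags.append("narrow_window")
    if not contradicting:
        # Step 5: a gap in the corpus, not a clean bill of health.
        flags.append("no_contradictions")
    return flags


supporting = [
    Evidence("Writer A", 2014),
    Evidence("Writer A", 2015),
    Evidence("Writer B", 2015),
    Evidence("Writer C", 2016),
]
print(audit_flags(supporting, contradicting=[]))
```

With this sample input all three flags fire: the support comes from only three writers, it sits inside a three-year window, and nothing in the corpus pushes back. The flags are prompts for Step 6, not verdicts on the belief.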

A version of this is what good investors, scientists, and operators already do informally. The Mythos lesson is that the informal version is no longer enough. The audit is now cheap enough that not doing it is the expensive option.

Pitfalls and How to Avoid Them

A few traps to watch for, because the workflow is easy to misuse.

Don't make it confessional. The goal isn't to feel bad about past beliefs. It's to detect drift and update. Treat each finding clinically. Update the model and move on.

Don't let the AI tell you what to believe. The AI isn't an oracle. It's a retrieval engine over your own corpus, plus general knowledge. When you ask "is X true," you'll get a fluent-sounding answer that may be wrong. Use it to surface evidence, not to issue verdicts. The judgment stays yours. The AI thinking trap is real and applies here too.

Don't audit everything at once. Pick one belief per quarter, four a year at most. Trying to re-examine your entire worldview at once is both exhausting and useless, because each deep belief needs its own focused attention.

Don't confuse newness with rightness. A contemporary view isn't automatically better than an older one. Some of your stale-looking priors are stale because they're right and the world keeps proving it. The point of the audit isn't to chase trends. It's to verify, deliberately, which conclusions still earn their position.

Don't skip the source check. When AI surfaces a contradicting highlight, look at where you originally read it. Sometimes the contradiction is real. Sometimes you marked it because it was provocative, not because it was correct. Source quality still matters.

The workflow works best as a quiet habit, not a dramatic ritual. Quarterly, an hour, one belief. Over five years, that's twenty beliefs deliberately re-examined. Compare that to the implicit alternative: zero.

Frequently Asked Questions

Isn't this just an excuse to second-guess myself constantly?

No, and the design of the workflow is meant to prevent exactly that. You're auditing one load-bearing belief per quarter, not running a perpetual self-critique. The risk you want to avoid is the opposite: never auditing anything. People who change their minds constantly aren't running this audit. They're being blown around by the latest take. The point is structured re-examination at a slow cadence.

What if I find that most of my beliefs are well-supported?

Then you've confirmed something useful, and you can stop worrying about that belief for now. The audit doesn't have to find vulnerabilities every time, just as a code audit doesn't have to find bugs every time. Negative results have value. The bigger problem would be never running the scan at all and assuming everything's fine.

How is this different from journaling or self-reflection?

Two things. First, it uses an external corpus (your highlights), which is much less subject to in-the-moment bias than introspection. Second, it uses AI for retrieval, which can scan years of input in seconds. Journaling tends to surface what's salient to you right now. This workflow surfaces what was salient to you over time, which is a much wider and more honest dataset.

Do I need years of highlights to make this work?

No, but more is better. Even six months of consistent highlighting on a few topics is enough to surface patterns. The first audit will be lighter. By year two, the corpus is dense enough that AI retrieval starts to feel uncanny. If you're starting from zero, the answer is to start now and run a small audit in three months.

Doesn't AI just confirm my existing views by default?

It depends on how you query. If you ask "is my belief X correct," yes, you'll often get a flattering, hedged answer. If you ask "find every passage I've highlighted that complicates this," you get something useful. The query structure is the whole game. Treat the AI as a retrieval engine, not a judge.
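The difference between the two query styles is easier to see side by side. The sketch below only builds prompt strings and calls no AI service. The function names and exact wording are illustrative assumptions, adapted from the prompts in the workflow steps.

```python
def verdict_prompt(belief: str) -> str:
    """The weak form: invites the model to act as a judge."""
    return f'Is my belief correct? Belief: "{belief}"'


def retrieval_prompt(belief: str, relation: str = "complicates") -> str:
    """The stronger form: asks only for passages from your own corpus."""
    allowed = ("supports", "contradicts", "complicates")
    if relation not in allowed:
        raise ValueError(f"relation must be one of {allowed}")
    return (
        f"Using only my highlights, list every passage that {relation} "
        f'the claim: "{belief}". '
        "For each passage, include the source and the date I marked it. "
        "Do not give an opinion on whether the claim is true."
    )


belief = "Specializing early is the safest career strategy."
print(verdict_prompt(belief))
print(retrieval_prompt(belief))
```

The retrieval version constrains the answer to evidence you can check yourself and explicitly withholds the verdict, which keeps the judgment on your side of the conversation.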

What if my highlights are scattered across multiple tools?

That's a workflow problem worth solving, but not a blocker for starting. Pick the largest single source and audit against that first. Over time, consolidating into one queryable corpus makes the audit much more powerful. In Glasp, for example, Kindle import and YouTube Summary feed into the same system.

What about beliefs I've never written down?

Those are the hardest, by design. The whole point of building a highlight corpus is that it externalizes what would otherwise stay only in your head. Beliefs that have never made it into your corpus are also beliefs you've never engaged with deliberately enough to mark. They aren't permanently out of the audit's reach, but they only become legible once you start to externalize them, which is itself a useful practice.

Conclusion: Schedule the Next Scan

Anthropic's Mythos Preview is going to be discussed for years in the context of cybersecurity. The more durable lesson it offers, though, isn't about code. It's about review.

The bugs Mythos found weren't sophisticated. They were old. They survived because nobody re-examined them. Once a more capable analyzer arrived, with the patience to look at what humans had stopped looking at, the bugs surfaced almost immediately. The flaws were in the code, but the real vulnerability was the absence of audit.

Your mental models live under the same conditions. The defaults you set years ago are running in the background, shaping decisions, never re-examined since you set them. They're not necessarily wrong. They're just unchecked. Some of them are almost certainly bugs. You don't know which ones until you scan.

The tools to run that scan now exist. They didn't five years ago. A corpus of your own highlights, plus AI that can query it, plus a structured quarterly workflow. That's the whole stack. It costs less than an hour every three months. Almost nothing about that is technically hard. The difficulty is making it a habit.

Pick one load-bearing belief this week. Date it. Run the scan. See what comes back. You may end up confirming the belief, in which case you've earned the right to keep it for the next round. Or you may find a stale prior that's been quietly producing bad downstream decisions, in which case you've just bought yourself the most valuable kind of upgrade.

Glasp's web highlighter, Kindle highlights import, and AI chat over your corpus are designed for exactly this kind of work. Start the corpus today. Run the first scan in three months. Don't wait for a Mythos-equivalent revelation to find out which of your defaults is broken.

The model isn't your auditor. You are. The corpus is what makes the audit possible.