Prefer to listen? I gotcha covered.

Sam Harris thinks AI is going to take all our jobs. In his February 2026 episode, he tells us we’re not ready for what AI is about to do to the economy. In his conversation with Will MacAskill a month later, the two of them work through what a post-scarcity world might look like. The implication is that one is coming, and soon.

Harris is a crazy smart guy. He’s also wrong, and he’s got plenty of company. Sam Altman told an audience at OpenAI’s DevDay last October that a farmer from fifty years ago wouldn’t consider the jobs AI replaces “real work” in the first place. Microsoft’s AI boss says AI will replace every white-collar job in eighteen months or less. The elite consensus has converged on a story where AI eats the labor market, productivity gains accrue to capital, and the rest of us figure out UBI and meaning.

White-collar work, where you’re sitting down at a computer — either being, you know, a lawyer, or an accountant, or a project manager, or a marketing person — most of those tasks will be fully automated by an AI within the next 12 to 18 months.

Mustafa Suleyman, CEO of Microsoft AI

The story is wrong on both ends. AI cannot do most of the work people are claiming it can do—not now, and not for reasons that more compute or more training data are going to fix. And while we’re arguing about whether it can, we’re hollowing out the talent pipeline that produces the people who actually can. The end-of-work crowd is staring at the wrong horizon.

First, what does AI actually do well, and where does it fail?

Start with the cleanest empirical finding in the space. METR, a nonprofit AI evaluation lab, ran about 230 tasks past frontier AI agents, measuring success rate against the time it takes a human expert to complete the same task. Task length alone predicts the outcome with an R² of 0.83, close to deterministic. Current models hit nearly 100% success on tasks under four minutes of human-expert time. They fall under 10% on tasks that take humans more than four hours.

That’s the whole shape of it. AI is competent on bounded tasks with clear instructions and falls apart on long-horizon work requiring sustained judgment, and the relationship between the two is mathematically lawful: not a hot take, not a vibe, a curve you can plot and extrapolate.
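To make that shape concrete rather than just cited, here is a minimal sketch, assuming a logistic curve in log task-length with made-up horizon and slope parameters (illustrative values, not METR's fitted ones): success is near-certain for short tasks and collapses for long ones.

```python
import math

# Illustrative only: a logistic curve in log task-length, the general shape of
# "near-certain on short tasks, near-zero on long ones." The horizon and slope
# below are assumed for illustration, not taken from METR's published fit.
def p_success(task_minutes, horizon_minutes=60.0, slope=1.5):
    """Modeled probability an agent completes a task, given how long the same
    task takes a human expert."""
    x = math.log(task_minutes / horizon_minutes)
    return 1.0 / (1.0 + math.exp(slope * x))

for minutes in (2, 15, 60, 240, 480):
    print(f"{minutes:>4} min task -> {p_success(minutes):.0%} modeled success")
```

Plug in a two-minute task and the modeled success rate is near 100%; plug in an eight-hour task and it is in the low single digits. That is the cliff the benchmark numbers describe.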

It gets worse on the productivity side. METR also ran a randomized controlled trial (gold-standard methodology, the same design used in clinical drug trials) on sixteen experienced open-source developers working on their own repositories. Half their tasks were assigned AI tools, half weren’t. The developers expected a 24% speedup. After the study, they believed they’d gotten about 20%. They had actually been 19% slower with AI than without it. The expectation-versus-reality gap is itself a finding: experienced practitioners cannot tell, without controlled measurement, that the tool is hurting them.

Then there’s what happens when AI is asked to do work that requires judgment rather than retrieval. A Stanford RegLab study found general-purpose LLMs hallucinated on legal queries between 58% and 88% of the time. On the holding of a case, the actual judgment a court rendered, they hallucinated at least 75% of the time. On precedential relationships between cases, the models did “no better than random guessing.” A 2025 follow-up study tested purpose-built legal tools with retrieval augmentation and proprietary databases. Lexis+ AI hallucinated 17% of the time. Westlaw’s AI-Assisted Research, 33%. Architecture matters. It doesn’t solve the problem.

Medicine produces the same pattern. Mass General Brigham tested twenty-one LLMs on clinical reasoning. Given complete patient data, they delivered correct final diagnoses about 90% of the time. Asked to produce appropriate differential diagnoses (the actual judgment work, the list of plausible alternatives a physician must weigh), they failed more than 80% of the time. The senior author called differential diagnosis “the most important part of medicine.” The models can pattern-match to a known answer. They cannot navigate uncertainty.

Code, the domain everyone keeps insisting AI has solved, shows the same fingerprint. GitClear analyzed 211 million lines of code across five years. Refactoring, the work of consolidating and structuring, collapsed from 25% of all changes in 2021 to under 10% in 2024. Code duplication blocks rose eightfold. For the first time in the dataset’s history, copy-pasted lines exceeded refactored lines. AI generates new code competently. It does not exercise architectural judgment.

Across legal, medical, and software domains, the pattern is identical. AI is competent at retrieval and pattern-matching against well-formed inputs. It fails at differential judgment, architectural decisions, and synthesis under uncertainty. This is not three different problems. It is the same problem showing up wherever judgment is the actual deliverable.

This is not a training problem. It’s an architectural one.

The standard rebuttal arrives on schedule: more data, more compute, bigger models. The current limitations are temporary, the optimists insist. Scale will solve them. Except it won’t. There are three separate walls between current systems and the capabilities the end-of-work crowd assumes are coming, and “more X” doesn’t get past any of them.

The first is that scaling laws describe diminishing returns, not compounding ones. The relationship between compute and performance is a power law: gains are roughly logarithmic in resources, so holding the rate of improvement steady requires exponentially more compute, not marginally more. Each new increment of capability costs more than the last, and the curve flattens. The popular intuition that “we’ll just throw more GPUs at it” has the math backwards. We are already on the part of the curve where order-of-magnitude investments produce incremental returns.
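To see why “just add GPUs” runs into arithmetic, here is a minimal sketch with made-up constants (the power-law form is generic; none of these numbers come from any published scaling-law fit). The point is the shape: each additional order of magnitude of compute buys a smaller improvement than the one before.

```python
# Minimal sketch with made-up constants, not a fit from any published paper.
# Generic power-law scaling: loss(C) = L_inf + a * C^(-alpha).
def loss(compute_flop, l_inf=1.7, a=10.0, alpha=0.05):
    return l_inf + a * compute_flop ** (-alpha)

prev = None
for exp in range(20, 27):  # compute budgets from 1e20 to 1e26 FLOP
    c = 10.0 ** exp
    current = loss(c)
    if prev is None:
        print(f"1e{exp} FLOP -> loss {current:.3f}")
    else:
        print(f"1e{exp} FLOP -> loss {current:.3f}  (gain from 10x more compute: {prev - current:.3f})")
    prev = current
```

Run it and the per-decade gain shrinks on every line. That shrinking column is what “the curve flattens” means in practice: the same multiplicative investment keeps buying less.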

The second is the data wall. Researchers at Epoch AI project that the public stock of human-generated text will be effectively exhausted somewhere between 2026 and 2032 at current training rates. The industry is already discarding up to 99% of the web data it crawls because quality, not quantity, is the binding constraint. Synthetic data, models training on the output of other models, is the proposed patch, but its long-term effects on capability are poorly understood and the early signals are not encouraging.

The third is the reasoning wall, and this is the one that should end the conversation. Apple researchers tested frontier reasoning models (OpenAI’s o1 and o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking) on controllable puzzles where complexity could be precisely scaled. They published the results in a paper titled, with admirable bluntness, “The Illusion of Thinking.” The finding: at high complexity, performance doesn’t degrade gracefully. It collapses. Completely. To zero.

More striking still, the models exhibit what the researchers call a counter-intuitive scaling limit. Reasoning effort increases with problem complexity up to a point, and then declines, despite the models having adequate token budget to keep working. The systems give up—not because they ran out of room, but because they cannot sustain the inferential chain. A more recent theoretical synthesis formalizes this with impossibility theorems: diagonalization arguments showing that no enumerable model class can be universally hallucination-free, and that context, reasoning, retrieval, and multimodal grounding each follow identifiable degradation laws determined by architectural constraints.

Three walls, and they’re not the same wall. Compute hits exponential decay, data hits exhaustion, and reasoning hits collapse—different architectural failures with different mechanisms, which means scaling past one buys you nothing on the other two. None of them is a training problem.

Organizations face a perfect storm. Their most experienced professionals are leaving while the mechanism for creating new skilled workers has been automated away. This creates what systems thinkers call a ‘delayed feedback problem’—the immediate efficiency gains mask longer-term consequences that won’t become apparent until knowledge gaps emerge during complex challenges.

Dr. Cornelia Walther, Senior Research Fellow, Wharton AI & Analytics Initiative

If AI can’t do senior-level work—and it can’t—then the question is what happens when companies use it to eliminate the junior work that produces senior people.

Every senior practitioner in every field was once a junior who learned by doing the work. They sat through the dull cases, the routine memos, the boilerplate code, the rounds where nothing interesting happened. They built judgment by accumulating thousands of small reps under the supervision of someone who already had it. That apprenticeship, formal or informal, named or unnamed, is how expertise propagates.

Matt Beane has been making this argument for years. In The Skill Code, the UCSB technology management professor and Stanford Digital Economy Lab fellow lays out the underlying mechanism in detail: skill is built through what he calls the three Cs (challenge, complexity, and connection) and intelligent machines, when deployed thoughtlessly, sever all three. Summarizing the pattern across more than thirty professions surveyed in his and related research, from surgical robotics to investment banking to bomb disposal, Beane puts it directly: “We’re getting more productivity from experts with intelligent technologies at the expense of novice involvement, which blocks their skill development.”

Andrew McAfee, the MIT economist who’s been one of AI’s more measured optimists, made the same point at the HBR Strategy Summit earlier this year, and Fortune amplified it last week under the headline that automating Gen Z entry-level jobs could backfire and cost companies their future workforce. The Yale School of Management’s leadership group published a piece days ago making the same case: aggressive entry-level compression weakens organizations’ own talent pipelines.

Cornelia Walther at Wharton has put the cleanest frame on it. In her Knowledge@Wharton piece, she opens with a marketing director celebrating a 40% productivity gain from AI tools, who hasn’t hired a junior copywriter in two years, and whose three senior writers are approaching retirement. Walther’s diagnosis, quoted in Fortune, names the systems-thinking concept that captures the shape of the problem: a delayed feedback problem. The immediate efficiency gains mask longer-term consequences that won’t become apparent until knowledge gaps emerge during complex challenges, at which point there will be no one in the building who knows how to handle them.

The numbers are already moving. Korn Ferry’s 2026 Talent Acquisition Trends report finds 37% of companies plan to replace entry-level roles with AI, 58% for back-office positions. UK tech graduate roles fell 46% in 2024, with projections of a further 53% drop by 2026. US junior software-development postings are down 67%. The Korn Ferry authors put it directly: your board loves the cost savings today; they’ll hate the leadership crisis it brings on its tail.

This is happening now. It is measurable. It is not speculation about a future labor market: it is the labor market we currently have, sliding into a configuration that will be very difficult to reverse fifteen years from now when the senior bench runs dry.

Sam Harris’s mistake—and Altman’s, and Suleyman’s, and the entire end-of-work chorus—is assuming a capability that isn’t coming and ignoring a pipeline collapse that already is. The actual scenario isn’t post-scarcity utopia. It’s hollowed-out organizations running powerful tools they cannot supervise, with no bench behind them, presided over by aging experts whose replacements were never trained because the work that would have trained them got automated away.

The interns are out of a job. The subject-matter experts are not—not now, not soon, probably not ever, given what we now know about the architectural limits of these systems. The question worth asking is not how we’ll all live without work. It’s where the next generation of people who actually understand the work is going to come from.

We have about fifteen years to figure that out. We are not currently using them well.



5 thoughts on “Sam Harris Is Wrong About AI. The Truth Is We’re Screwed.”

  1. Great article that accords with my observations and experience. My only quibbles are (1) I don’t think we have as long as 15 years and (2) the clock started ticking before ChatGPT was unleashed on the world.

    The unwillingness to pay novices to hang around picking up the necessary informal learning one gets by being thrown in the deep end doing real work with/under senior people has gotten steadily worse over my whole lifetime.

    The movie The Secret of My Success came out in 1987 and starts with Michael J. Fox being unable to get hired without experience yet having no way to get experience. Internships and co-ops filled the gap for a while, but it still looks like there aren’t enough spaces for all the people who would need them to replace, let alone grow, the ranks of currently senior people.

    We’re already at the problem point in some areas of STEM that was being forecast in 1999. We didn’t need STEM, the buzzword, for coursework or degrees. We need a few specialized areas that require the apprenticeship model: a good decade with good senior people.

    A worsening problem is needing the senior people to both get things done and do the mentoring/guiding of novices who need to explore, yet have guardrails to get to a right answer for a right reason in reasonable time (weeks, not minutes). Good mentoring takes time and wanting to do it.

    1. I’m still laughing my ass off about “no white collar jobs within 18 months.” There’s marketing hype and then there’s just straight up bullshit delivered completely straight-faced to an audience lapping it up without any demonstrable evidence of engaged critical thinking faculties.

      1. Everywhere I hang professionally at the moment is much more worried about two things than about being replaced by AI: the retirements exposing the gap left by 2008 (far too few of us who are 50ish, because people took jobs to pay bills and then never came back to the field; we have 65+ and under 40), and the alarming turnover among novices who aren’t buying into paying dues for 10-15 years because they think 3 years is a long time.

        For a while, it was looking OK because folks were “retiring” and yet still working as 50-75% contractors for a good decade to do work and provide mentorship. That’s not happening now. Folks retire at 60 and are really gone.

  2. Related in my mind is https://open.substack.com/pub/tawnyameans/p/what-students-want-is-not-what-we?utm_source=share&utm_medium=android&r=1hv15x

    “Students are saying that they want AI integrated into the substantive intellectual work of their discipline — strategy formation, decision-making under uncertainty, the parts of business education where judgment lives. They are notably less interested in being trained as proficient users of the tool itself.”

    “Employers are not asking for graduates who can use AI efficiently. They are asking for graduates who can think, adapt, and work with other humans effectively, with AI as one of several tools available to them.”

    “designing AI integration that develops judgment, that scaffolds genuine cognitive engagement, that treats students as partners in their own learning — is significantly harder. It requires the pedagogical expertise most faculty were never given. It requires assessment redesign that most institutions have not yet undertaken. It requires faculty development that goes beyond a one-hour webinar.”

  3. Ryan knows all the nuance in this infographic; this is not a knock against him for choosing to tell a clear story over including all the nuance.

    Since I’m leaving this infographic everywhere this morning: for the readers at home, this is a nuanced take on the distinctions between AI as a whole and Generative AI.

    https://zenodo.org/records/19201825
