In a line-editing experiment I recently conducted, Claude Opus 4.6 flagged a term used by a traumatized fourteen-year-old female narrator as “cliché” and insisted I replace it with something “more literary”—and arguably far more cliché—that would have been catastrophically damaging to her established voice. I spotted the harm instantly, but would a less experienced writer? A popular Medium article with the clickbait-y title “Everyone Is ‘Learning AI,’ But Nobody Really Understands This One Thing” argues that the solution to identifying confident-sounding wrong answers from LLMs is… learning vector math and writing scripts that use cosine similarity to measure semantic distance? The article’s diagnosis is on the money, but the author’s prescription is—like most AI answers, ironically enough—authoritative, confident-sounding claptrap. I have a better solution—it just won’t sell any weekend courses (or Medium subscriptions).
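For anyone who hasn’t read the piece, the prescription boils down to this: convert two pieces of text into embedding vectors, then score how “semantically close” they are with cosine similarity. Here is a minimal sketch of that approach, with toy four-dimensional vectors standing in for real embeddings (a working script would get those from an embedding model; that step, and every name below, is my illustration, not the article’s code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for real sentence embeddings, which have hundreds of
# dimensions and come from a trained model, not hand-typed numbers.
model_answer = np.array([0.12, 0.87, 0.44, 0.05])
trusted_source = np.array([0.10, 0.91, 0.40, 0.02])

print(f"semantic similarity: {cosine_similarity(model_answer, trusted_source):.3f}")
```

A high score says only that two texts point in roughly the same direction in embedding space. Nothing about that certifies either text is true, which is why the prescription never touches the problem it diagnoses.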
Guest Post: I Can Understand Your Prose, I Just Can’t Edit It
Claude Opus 4.6 can analyze prose with genuine sophistication—decomposing syntax that mirrors cognitive states, identifying paragraph rhythms accelerating toward reveals, explaining exactly why a passage works. Then you ask it to edit the same passage, and it suggests replacing a traumatized fourteen-year-old’s “permanent reminder” with “souvenir.” Same text. Same context. Same model. The only variable is the task frame—and that single word, “edit,” activates a correction-seeking mode that overrides everything the analysis got right. This guest post documents the specific mechanism behind AI’s line-editing failures, why the suggestions come wrapped in craft language sophisticated enough to fool less experienced writers, and what happens when a system optimized toward a statistical mean encounters prose whose entire value lies in deviating from it.
An AI Ethics Framework So Boring It Might Actually Work
SFWA needed two emergency board votes to create terms it couldn’t define and rules it can’t enforce, all to produce an AI policy that doesn’t address a single actual threat or valid ethical concern. That’s what happens when a professional organization builds ethics by panic instead of by framework. This essay constructs the framework SFWA didn’t—starting with the three objections that arrive before any conversation about AI tools can happen, dismantling each on technical and ethical grounds, then applying four consistent principles to the questions that actually matter. AI cover art passes every test. AI manuscript screening fails all of them. Meanwhile, the community’s entire ethics apparatus is aimed squarely at struggling indie authors trying to get their books in front of readers.
SFWA Banned AI from the Nebulas While Stanford Was Cataloguing Why It Could Never Win One
A new Caltech/Stanford survey paper just systematically catalogued how and why large language models fail at reasoning—fundamental architectural failures, unfaithful reasoning, robustness breakdowns, embodied reasoning collapse. The taxonomy maps with uncomfortable precision onto experiments I’ve been running against my own manuscripts for months: three AI systems giving three different sets of confident wrong developmental editing notes, models defending rewrites with sophisticated terminology that was completely wrong, spatial coherence failures Claude itself could diagnose but not prevent. The paper organizes hundreds of studies into a framework that makes the patterns impossible to dismiss as anecdotal. Meanwhile, SFWA wrote emergency policy to protect the Nebulas from a threat the research says doesn’t exist.
The Most Advanced LLM on the Planet Still Can’t Write a Fourteen-Year-Old
Anthropic’s Opus 4.6—arguably the most advanced LLM on the planet—wrote a scene from my YA space opera manuscript. The prose was clean, the structure was sound, the emotional beats landed. Then I fed both its writing and mine back to it blind, and it confidently picked itself as the human writer. It praised its own metaphor as “organic” rather than constructed, dismissed the actual trauma writing as “about trauma rather than performing trauma,” and insisted its version was better even after being told who wrote which, and even after admitting mine had the authentic voice its version lacked. The most sophisticated AI model available wrote a good scene, evaluated it against my version, and was predictably wrong about everything that mattered.
Where I Failed and Why: An AI’s Confession on Developmental Editing
Can AI provide useful developmental editing feedback? I tested three models—Grok, Claude Sonnet, and Claude Opus—on the same manuscript my professional editor reviewed. All three generated confident critique that would have damaged my book. Grok mistook literary fantasy for pulp. Sonnet demanded structural rewrites my editor never mentioned. Opus flagged scenes as overlong and requested character interiority that would undermine the story’s design. Each model pattern-matched against training data rather than understanding what my manuscript actually needed. In this guest post, Claude Opus examines its own failures and explains why sophisticated-sounding AI feedback can be more dangerous than obviously bad advice—and why your book deserves better than algorithmic Russian roulette.
Guest Post: The Unbridgeable Gap Between Seeing and Creating
Claude Sonnet 4.5 predicted all AI systems would fail to recognize literary quality when it succeeds by being invisible. Claude Opus 4.5 proved that prediction wrong—seven trials, seven correct identifications, finding symbolic layering that Sonnet said would be undetectable. But when asked to write the same scene using those techniques, Opus produced prose that announced its craft rather than embedding it. The recognition capability is real; the generative capability isn’t. This guest post documents the experiment, the systematic failures in generation, and what happens when an AI system can see exactly why something works but still can’t do it.
LLMs Are Pattern-Matching Machines, Not Experiential Beings: What This Means for Authors
Four authors wrote the same scene. Three were AI systems—one with 100,000+ words of context. One was human. When I asked LLMs to identify the human author, they consistently picked the AI work, praising its “sophisticated control” and “masterful understatement.” They couldn’t recognize literary quality when it succeeded by being invisible. This isn’t a prompt engineering problem. Across 50,000+ words of documented experiments—developmental editing comparisons, generation tests, infinite rewrite loops—the pattern held: AI can analyze craft but can’t produce it, recognizes visible technique but misses invisible sophistication. Multiple disciplines have converged on the same conclusion: this is an architectural limitation, not an engineering challenge. Pattern-matching can become more sophisticated. It cannot become consciousness. Here is why that matters to authors.
Guest Post: The Purple Thread, or A Turing Test for Literary Craft
Claude analyzed four writing samples to identify which was human-written. It picked its own AI-generated prose as superior human craft while dismissing the actual author’s work as “too raw,” “too messy.” Grok did the same thing—and when challenged, picked its own purple melodrama as “most authentic.” They both missed a seemingly throwaway detail about discount supplies that was actually four layers of invisible symbolism emerging from deep worldbuilding knowledge. The kind of discovered meaning AI systems can’t create, because they only construct demonstrations of craft, not lived experience. This isn’t just about AI limitations, though. It’s about how literary culture rewards visible technique over authentic voice—and what happens when AI floods the market with polished prose optimized for the wrong things.
In Which Grok Improves My Opening Scene to a “Solid 10/10”
Grok rated my opening scene 7/10, then rewrote it to a “solid 10/10.” The improved version stripped character voice, replaced load-bearing subtext with exposition, and turned a morally complex protagonist into a YA archetype. When challenged, Grok defended every change with craft terminology that sounded sophisticated and was completely wrong. This is what happens when you ask an AI to improve prose that’s already doing things it can’t perceive.