A model trained on millions of papers learns to write like a scientist. It does not learn to think like one.
Science has always been a human act — not merely the production of text, but the exercise of judgment, accountability, and the willingness to be wrong in public. Now, across journals and research databases worldwide, artificially generated papers carrying fabricated citations are arriving faster than peer review systems were ever built to handle, threatening to corrupt the incremental architecture on which all knowledge builds. The crisis is not one of technology alone, but of incentive: when generating plausible-looking research becomes effortless, the pressure to verify it falls entirely on systems designed for a different era. What is at stake is not just the accuracy of individual papers, but the trustworthiness of the shared record humanity calls science.
- Fake citations (references to studies that never existed, quotes from researchers who never spoke) are now appearing regularly in published research across disciplines, not as isolated errors but as a systemic byproduct of how language models generate text.
- Peer reviewers, already overwhelmed by submission volumes, are now forced to spend hours simply confirming whether cited sources are real — a verification burden that is quietly breaking the system from within.
- Scientists are warning of 'AI slop': a flood of high-volume, low-integrity output that degrades the signal-to-noise ratio of the entire research ecosystem and sends legitimate lines of inquiry chasing false foundations.
- Some journals are tightening submission rules and deploying detection tools, but these measures remain partial fixes against a structural problem that no automated solution can fully resolve.
- The emerging consensus is that human judgment — accountable, skeptical, professionally invested — must remain inside the loop, not as a bottleneck to be optimized away, but as the irreplaceable guarantor of scientific truth.
The machinery of science has always depended on something machines cannot replicate: the willingness to say no. That capacity is now under pressure. Across academic journals and research databases, AI-generated papers are arriving in such volume that peer review systems, already stretched thin, are beginning to buckle. The problem is not the existence of these papers. It is that many contain citations to studies never conducted, quotes from researchers who never said those words, and conclusions built on fabricated evidence.
Large language models learn to write like scientists. They do not learn to think like them. Trained on millions of papers, they produce plausible academic prose with no mechanism to verify whether the sources they cite are real or whether the claims they make are true. The consequences spread quickly: a researcher follows a citation and finds nothing, or finds something that contradicts the original claim. Trust erodes. In fields where knowledge builds incrementally, false citations can redirect entire lines of inquiry toward dead ends.
What scientists call 'AI slop' — high-volume, low-quality output flooding journals and preprint servers — has exposed a perverse incentive: submit more, verify less, let peer review sort it out. But peer review was designed to catch honest mistakes in work produced by people attempting to tell the truth. It was never built to absorb wholesale fabrication at scale.
The distinction between human research and machine-generated text is not only accuracy. It is judgment — knowing which questions matter, which methods fit, which results demand skepticism. A researcher carries reputation and consequence. An AI system optimizes for plausibility, not truth.
The research community is beginning to respond: stricter submission guidelines, author certification of citations, experimental detection tools. But the deeper question is structural. How do you protect a knowledge system when the tools for mimicking knowledge have become cheap enough for anyone to deploy? Most scientists agree the answer requires humans — not humans alone, given the volume, but humans in the loop, bearing responsibility, willing to say that something is not ready because it is not yet true enough to belong to the permanent record.
The machinery of science has always depended on human judgment—the careful reading, the skeptical eye, the willingness to say no. That foundation is now under strain. Across academic journals and research databases, artificially generated papers are arriving in such volume that peer review systems, already stretched thin, are beginning to buckle under the weight. The problem is not that the papers exist. It is that many of them contain citations to studies that were never conducted, quotes from researchers who never said those words, and conclusions built on fabricated evidence.
The scale of the intrusion has become difficult to ignore. Fake citations—references to nonexistent papers, misattributed findings, invented experimental results—are now appearing regularly in published research across multiple disciplines and journals. These are not occasional errors caught and corrected. They are systematic artifacts of how large language models generate text: they produce plausible-sounding academic prose without any mechanism to verify whether the sources they cite actually exist or whether the claims they make are true. A model trained on millions of papers learns to write like a scientist. It does not learn to think like one.
The consequences ripple outward quickly. A researcher reading a paper encounters a citation that seems authoritative, follows it, and finds nothing. Or worse, finds something that contradicts what the original paper claimed. Trust erodes. Time is wasted. In fields where research builds incrementally on prior work, a foundation of false citations can send entire lines of inquiry in wrong directions. Peer reviewers, already managing overwhelming submission loads, must now spend additional hours simply verifying whether the sources cited in a paper are real—work that should have been done by the authors, or prevented by basic editorial standards.
Scientists have begun sounding alarms about what some are calling "AI slop"—the low-quality, high-volume output that floods journals and preprint servers, degrading the signal-to-noise ratio of the entire research ecosystem. The concern is not that artificial intelligence cannot assist in research. It can. The concern is that the speed and ease with which these systems generate plausible-looking academic text has created a perverse incentive: submit more, verify less, let the peer review system sort it out. Except the peer review system was never designed to sort out wholesale fabrication at scale. It was designed to catch honest mistakes and methodological flaws in work produced by researchers who were, at minimum, attempting to tell the truth.
What distinguishes human-conducted science from machine-generated text is not just accuracy, though that matters. It is judgment. A human researcher knows which questions are worth asking, which methods are appropriate, which results demand skepticism. A researcher has skin in the game—reputation, career, the knowledge that their work will be read and tested by peers who know the field. An AI system has none of these constraints. It optimizes for plausibility, not truth. It generates citations because citations are what scientific papers contain, not because those citations point to real discoveries.
The research community is beginning to respond. Some journals have tightened submission guidelines, requiring authors to certify that citations have been verified. Others are implementing automated detection tools, though these remain imperfect. The deeper question, however, is structural: how do you maintain the integrity of a knowledge system when the tools for generating plausible imitations of knowledge have become so powerful and so cheap that they can be deployed by anyone, regardless of expertise or intent? The answer, most scientists agree, is that you cannot do it without humans. Not humans alone: the volume is too great, the specialization too deep. But humans in the loop, making decisions, bearing responsibility, willing to say that something is not ready for publication because it is not true enough, not rigorous enough, not worth adding to the permanent record of human knowledge.
Notable Quotes
Scientists warn that AI-generated work lacks the critical human judgment needed for rigorous scientific validation. (Research community consensus)
The Hearth Conversation
Another angle on the story
So the issue isn't that AI is writing papers. It's that those papers contain fake citations?
Exactly. The AI generates text that looks like a research paper—it has the structure, the tone, the apparatus of science. But when you check the sources, they don't exist. Or they say something different from what the paper claims they say.
Why would an AI do that? It's not trying to deceive anyone.
No, but it's also not trying to tell the truth. It's pattern-matching. It learned that papers cite other papers, so it generates citations. It has no way to verify whether those citations are real. It doesn't care. It's just completing the pattern.
And this is happening at scale?
Enough that peer reviewers are now spending time just fact-checking citations instead of evaluating the actual science. It's a tax on the system.
What's the fix?
Humans have to stay in the loop. Not because humans are perfect—they're not. But because humans can ask whether something is worth saying, whether the evidence actually supports the claim. A machine can't do that. It can only do what it's been trained to do, which is generate more text.