A doctor following a guideline may be treating patients based on research that never existed.
In the long tradition of human error finding new vessels, artificial intelligence has introduced a novel form of scholarly corruption: citations that look real, sound real, and point to nothing at all. A Columbia University researcher, embarrassed by a fabricated reference in his own work, turned that moment into a sweeping investigation — and found that across nearly 2.5 million biomedical papers, the rate of invented citations has grown twelvefold in just three years. What troubles observers most is not the scale of the deception, but where it is landing: inside the clinical guidelines that physicians consult when deciding how to treat living patients.
- The fabrication rate has surged from 1 in 2,828 papers in 2023 to 1 in 277 in early 2026, a trajectory that researchers describe as the visible tip of a much larger iceberg.
- AI language models generate references with perfect structural plausibility — correct authors, journals, and dates — making them nearly invisible to peer reviewers working under time pressure.
- One 2025 paper in an oncology journal had 60% of its references fabricated, and patterns consistent with paper mills have emerged, including one author pair behind eleven papers in a single surgical journal.
- Fabricated citations are now infiltrating systematic reviews and clinical guidelines, meaning treatment decisions for real patients may rest on research that never existed.
- Former JAMA editors have called for outright retraction of affected papers and classification of fabricated citations as formal research misconduct, while a four-point reform plan urges automated verification, integrity metadata, retroactive screening, and new integrity database categories.
Maxim Topaz did not set out to expose a crisis. He set out to correct an embarrassment. After a journal flagged a citation in his own paper — plausible in every detail, attributed to a real researcher, and entirely nonexistent — he realized he had no idea how it got there. An AI writing tool had invented it, and he had not caught it. That moment of humiliation became the foundation for something much larger.
Working with colleagues at Columbia University, Topaz built an automated verification system and ran it across nearly 2.5 million biomedical papers published between January 2023 and February 2026. Checking 125.6 million references against PubMed, Crossref, OpenAlex, and Google Scholar, they identified 4,046 fabricated citations spread across more than 2,800 papers. The rate of fabrication had jumped more than twelvefold — from roughly 4 per 10,000 papers in 2023 to nearly 57 per 10,000 in early 2026. The spike tracked almost precisely with the lag between widespread AI adoption and the time those papers cleared peer review.
The mechanism is both simple and insidious. Large language models hallucinate — they produce text that sounds authoritative while being entirely invented. A fake citation can pass every visual check: right format, right names, right journal, right year. Peer reviewers, relying on spot-checks rather than exhaustive verification, routinely miss them. One 2025 paper on surgical techniques had 60% of its references fabricated, each one tailored to the paper's narrow topic and attributed to real specialists. Patterns consistent with paper mills also surfaced — one author pair appeared across eleven papers in a single journal, with fabricated references distributed among them.
The concern deepens as those citations migrate downstream. Topaz's team found fabricated references appearing not just in individual papers but in systematic reviews and clinical guidelines — the documents physicians consult when making treatment decisions. A guideline can appear authoritative while its evidentiary foundation is phantom. Writing in The Lancet, former JAMA editors Howard Bauchner and Frederick Rivara called for outright retraction of affected papers and formal classification of fabricated citations as research misconduct.
Topaz and his team proposed four remedies: automated reference verification before peer review begins, integrity metadata added to indexing records, retroactive screening of already-published literature, and a dedicated category for fabricated references in research integrity databases. Harvard's Arjun Manrai put the stakes simply: "If our scientific foundation erodes, we are in deep trouble." The fabrication rate is still climbing, and the fake citations are already shaping the documents that guide clinical care.
Maxim Topaz was embarrassed. After he submitted a paper to a journal, the editorial staff came back with a problem: one of his citations didn't exist. The reference was plausible, correctly formatted, attributed to a real researcher, and bore a believable publication date. But when they tried to find it, it wasn't there. Topaz regularly uses AI tools to polish his writing, and he realized he had no idea where the fake citation had come from: the language model had simply invented it, and he hadn't caught it.
That moment of humiliation became the seed for a larger investigation. Topaz, a researcher at Columbia University, and his colleagues developed an automated system to verify references across biomedical literature. They scanned nearly 2.5 million papers published between January 2023 and February 2026, checking 125.6 million references against multiple databases: PubMed, Crossref, OpenAlex, and Google Scholar. What they found was alarming. Across more than 2,800 papers, they identified 4,046 fabricated citations—references that had no corresponding publication anywhere.
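The article does not describe how the team's verification system works internally, so the following is only a minimal sketch of the general idea: compare each cited title against several bibliographic indexes and flag the ones that match nothing. Everything in it is an assumption made for illustration. The function names, the 0.9 similarity threshold, and the in-memory lists standing in for databases such as PubMed or Crossref are hypothetical, and the titles are obvious placeholders rather than real publications.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse punctuation so trivial differences do not block a match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def best_match(title, known_titles):
    """Return the highest similarity (0.0 to 1.0) between a cited title and any known title."""
    target = normalize(title)
    scores = (SequenceMatcher(None, target, normalize(k)).ratio() for k in known_titles)
    return max(scores, default=0.0)


def flag_unverified(references, indexes, threshold=0.9):
    """Return cited titles that have no close match in any of the indexes.

    `indexes` maps a hypothetical index name to a list of known titles.
    The threshold is an arbitrary illustrative value, not one from the study.
    """
    flagged = []
    for ref in references:
        if all(best_match(ref, titles) < threshold for titles in indexes.values()):
            flagged.append(ref)
    return flagged


# Placeholder data: local lists stand in for real bibliographic databases.
indexes = {
    "index_a": ["Placeholder title one: a randomized trial"],
    "index_b": ["Placeholder title two, a cohort study"],
}
references = [
    "Placeholder Title One - A Randomized Trial",   # differs only in case and punctuation
    "Placeholder title three that appears nowhere",  # no counterpart in any index
]

flagged = flag_unverified(references, indexes)
print(flagged)
```

A real pipeline would presumably also compare authors, journal, year, and identifiers such as DOIs, and would send flagged items for manual review rather than treating a failed lookup as proof of fabrication.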
The trajectory is what makes this genuinely frightening. In 2023, roughly one in 2,828 biomedical papers contained at least one fabricated reference. By 2025, that ratio had tightened to one in 458. In the first seven weeks of 2026 alone, it was one in 277. The fabrication rate itself jumped more than twelvefold, from 4.0 per 10,000 papers in 2023 to 56.9 per 10,000 papers in early 2026. The spike in mid-2024 aligned precisely with the lag between when large language models became widely adopted and when those papers made their way through peer review and into publication. Topaz told colleagues he suspects they are "catching just the tip of the iceberg."
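The figures above come in two forms, "one in N papers" and a rate per 10,000 papers, which makes them hard to compare at a glance. The short sketch below converts the "one in N" ratios to a per-10,000 scale and computes the ratio of the two reported rates. It uses only numbers quoted in the article, and the helper name `per_10k` is made up for illustration. The article does not spell out how the two measures relate (for instance, whether the per-10,000 rate counts fabricated references rather than affected papers), so the sketch makes no attempt to reconcile them.

```python
def per_10k(one_in_n):
    """Convert a 'one in N papers' ratio into papers per 10,000."""
    return 10_000 / one_in_n


# "One in N" figures quoted in the article: papers with at least one fabricated reference.
ratios = {"2023": 2828, "2025": 458, "early 2026": 277}
for period, n in ratios.items():
    print(f"{period}: 1 in {n} is about {per_10k(n):.1f} per 10,000 papers")

# Fold change between the two fabrication rates quoted in the article.
fold_change = 56.9 / 4.0
print(f"56.9 vs 4.0 per 10,000: {fold_change:.1f}x")
```

On these numbers the "one in N" ratios work out to roughly 3.5, 21.8, and 36.1 per 10,000 papers, and the ratio of the two quoted rates is about 14.2, which is consistent with the "more than twelvefold" description.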
The mechanics of the problem are straightforward and insidious. Large language models are prone to what researchers call hallucination—they generate text that sounds authoritative and plausible but is entirely fabricated. A fake citation can have the right structure, the right author names, the right journal title, the right year. It passes the eye test. Peer reviewers, working under time pressure and relying on spot-checks rather than exhaustive verification, often miss it. One particularly stark example emerged from a 2025 paper on ureteroileal anastomotic techniques published in an open-access oncology journal. Sixty percent of its references were fabricated. Each one was tailored to the paper's narrow surgical topic and attributed to real urologists. The researchers also found patterns consistent with paper mills—an author duo appearing across eleven papers in a single surgical journal in 2025, with fifteen fabricated references distributed among them.
The implications ripple outward in concentric circles of concern. At the most immediate level, fabricated citations undermine the integrity of individual papers. But Topaz's team has begun finding something worse: fabricated citations are now appearing in systematic reviews and clinical guidelines. These are not obscure documents read by specialists. They are the foundation on which clinicians make treatment decisions. A doctor consulting a guideline to determine how to treat a patient may be following recommendations that cite research that never existed. The guideline appears authoritative. The citations appear real. But the evidence base is phantom.
In an accompanying editorial published in The Lancet, Howard Bauchner and Frederick Rivara—both former editors of JAMA—argued that papers containing fabricated references should be retracted outright. They characterized fabricated citations as research misconduct, plain and simple. The responsibility, they wrote, falls primarily on authors to verify their own work before submission. But they also acknowledged a hard truth: as AI tools become more embedded in the writing process, the problem will only worsen unless systemic safeguards are put in place.
Topaz and his colleagues proposed four concrete steps. First, publishers should implement automated reference verification before papers enter peer review, catching problems before they ever reach reviewers. Second, indexing services should add integrity metadata to article records, flagging papers with known issues. Third, publishers should retroactively screen already-published research to identify and correct or retract papers with fabricated citations. Fourth, fabricated references should become an official category in research integrity databases, creating a public record of the problem.
Arjun Manrai, a researcher at Harvard Medical School who was not involved in the study, framed the stakes plainly: "If our scientific foundation erodes, we are in deep trouble." The concern is not abstract. It is about whether the body of published research—the accumulated knowledge that medicine relies on—can still be trusted. Topaz emphasized that the responsibility for verification falls on everyone in the chain: authors, journals, indexing services, and the AI tool developers themselves. But the clock is ticking. The fabrication rate is accelerating, and the fake citations are moving downstream into the documents that shape clinical practice.
Notable Quotes
"I think we are catching just the tip of the iceberg." - Maxim Topaz, Columbia University
"If our scientific foundation erodes, we are in deep trouble." - Arjun Manrai, Harvard Medical School
The Hearth Conversation: Another perspective on the story
How did this problem even get discovered? It seems like something that could hide indefinitely.
A journal flagged a fake citation in Topaz's own paper after he submitted it. That personal embarrassment drove him to build a verification system and look at the broader pattern. Without that moment, we might still be in the dark about the scale.
So the AI isn't being malicious—it's just making things up because it can.
Exactly. Language models generate text that sounds right. They don't know the difference between a real paper and a plausible-sounding fake one. They're pattern-matching machines, not fact-checkers. And when a human author uses them to draft citations without verification, the hallucination passes through.
The 60 percent fabrication rate in that oncology paper—how does something like that get published?
Peer review is not exhaustive. Reviewers spot-check citations. If the fake ones are well-formatted and attributed to real researchers in the field, they look legitimate. The system assumes authors have done their homework. That assumption is breaking down.
What worries you most about this?
The downstream effect. When fabricated citations end up in clinical guidelines, a doctor following that guideline is making treatment decisions based on research that doesn't exist. That's not just an academic problem anymore. That's patient safety.
Can this be fixed?
Yes, but it requires work at every level. Automated verification before publication. Retroactive screening of existing papers. Authors taking responsibility for their citations. And honestly, people need to be more skeptical of AI-generated text, even when it looks polished.
Is this the end of trusting published research?
Not if we act now. But the window is closing. The fabrication rate is accelerating. If we don't build verification systems into the publishing pipeline, we could reach a point where the scientific literature becomes unreliable.