AI-generated fake citations are flooding scientific literature across publications, scientists warn

The citations at the end of a research paper should represent a solid foundation of existing knowledge about a particular field, a pool of peer-reviewed sources built over years of research and study. However, with the increasing use of AI and large language models in writing research papers, there's a growing chance that the citation someone clicks on may not even exist, and that the study, the source, or even the researchers themselves could be entirely fake.

In a recent study posted to the arXiv preprint server, researchers audited millions of papers and found that an estimated 146,900 hallucinated citations were present in research papers hosted on four major scientific repositories—arXiv, bioRxiv, SSRN, and PubMed Central. These numbers were for 2025 alone.

The hallucinated citations were not limited to a handful of bad apples but appeared across many papers, each containing a small number of fake references, pointing to a broader pattern of researchers using AI yet failing to fact-check the output.

Scientific research advances by building on prior discoveries, where each new finding depends on what has already been established. In this space, the rapid growth of AI use and the accompanying hallucinations show no sign of slowing down, which raises serious concerns.

Hallucinating intelligence

Generative AI tools built on large language models are quite good at producing information that sounds plausible and realistic, yet is completely fabricated or incorrect. These models are trained on massive datasets to learn patterns, which they then use to predict the next word and generate new content.

As a result, they can sometimes produce output that follows learned patterns rather than actual facts, such as a reference that looks plausible but matches no real publication.

Hallucinated content isn't limited to scientific literature; it also appears in government reports, legal filings, and even news articles from renowned media publications.

Scientists have previously studied AI hallucinations, but most studies were either conducted under laboratory conditions or confined to small samples or narrow domains. The actual scale and impact of such mistakes, particularly within scientific literature, were still unclear.

Exposing the non-existent

In this study, the team conducted a large-scale audit of 111 million references drawn from 2.5 million scientific papers. Using a mix of automated and manual checks, they searched for citation titles that could not be linked to any real publication.

Over 95% of the references were successfully matched. For the remaining entries, they corrected typing errors using AI until a match appeared, and for the few mystery titles still left, they turned to Google Scholar to ensure no obscure publications were missed.
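The matching step described above can be illustrated with a minimal Python sketch. It is not the researchers' pipeline: the study used AI to correct typing errors and Google Scholar as a final check, whereas this example swaps in the standard library's difflib for the fuzzy step. The function names, the 0.9 similarity threshold, and the tiny in-memory index are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def build_index(known_titles):
    """Map normalized titles to their original form (hypothetical helper)."""
    return {normalize_title(t): t for t in known_titles}


def match_title(cited, index, threshold=0.9):
    """Classify one cited title as 'exact', 'fuzzy', or 'unmatched'.

    The threshold is an assumed value, not one taken from the study.
    """
    key = normalize_title(cited)
    if key in index:
        return "exact", index[key]

    # Fuzzy pass: tolerate small typing errors in the cited title.
    best_key, best_score = None, 0.0
    for candidate in index:
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_key, best_score = candidate, score

    if best_key is not None and best_score >= threshold:
        return "fuzzy", index[best_key]

    # Still unmatched: a real audit would need a manual check here.
    return "unmatched", None
```

A title that stays "unmatched" after both passes is only a candidate for a hallucinated reference. In the study, such leftovers were still checked against Google Scholar so that obscure but real publications were not miscounted.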

To isolate AI's role, the team also looked at unmatched citation rates from before 2023, prior to the widespread adoption of ChatGPT, Gemini, and other large language models. This gave them a baseline for measuring how much of the problem could be attributed to AI rather than human error.
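The baseline idea can be sketched as simple arithmetic. The article does not describe the study's actual estimation method, so the plain subtraction below is an assumption, and the function names and numbers in the usage note are hypothetical.

```python
def excess_unmatched_rate(unmatched: int, total: int, baseline_rate: float) -> float:
    """Share of references left unmatched beyond the pre-LLM baseline.

    Assumes the baseline rate captures ordinary human error, so anything
    above it is attributed to AI. This is a simplification.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    observed_rate = unmatched / total
    # Clamp at zero: a period below baseline shows no excess.
    return max(observed_rate - baseline_rate, 0.0)


def estimated_excess_citations(unmatched: int, total: int, baseline_rate: float) -> float:
    """Scale the excess rate back up to a count of references."""
    return excess_unmatched_rate(unmatched, total, baseline_rate) * total
```

With made-up numbers, a baseline of 0.5% unmatched references and a later period with 80 unmatched out of 10,000 would leave an excess of about 30 references attributed to AI under this assumption.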

The audit revealed a sharp surge in fake, non-existent citations appearing in serious scientific papers, especially from mid-2024 onward.

The study found that early-career scientists and small teams were most likely to include these fake citations, and in some cases, these same researchers saw their productivity roughly triple since the advent of AI.

Another pattern also emerged: hallucinated references tended to disproportionately credit scholars who were already prominent and male, suggesting that errors generated by LLMs may reinforce existing inequalities in scientific recognition.

The data exposed existing gaps in guardrails, such as preprint moderation, journal editors, and peer review, which could catch only a small fraction of these errors. For example, while arXiv moderation caught some issues, an estimated 78.8% of non-existent citations still passed through and appeared on the platform.

The researchers warn that hallucinations are steadily infiltrating knowledge production at scale, threatening both its reliability and equity. Without intervention, their impact could spread beyond future scientific discovery into policy and public understanding.
