AI-generated nonsense is leaking into scientific journals

In February, an absurd, AI-generated rat penis somehow snuck its way into a since-retracted Frontiers in Cell and Developmental Biology article. Now that odd travesty seems like it may just be a particularly loud example of a more persistent problem brewing in scientific literature. Journals are currently at a crossroads on how best to respond to researchers using popular but factually questionable generative AI tools to help draft manuscripts or produce images. Detecting evidence of AI use isn’t always easy, but a new report from 404 Media this week shows what appears to be dozens of partially AI-generated published articles hiding in plain sight. The dead giveaway? Commonly uttered, computer-generated jargon.

404 Media searched for the AI-generated phrase “As of my last knowledge update” in Google Scholar’s public database and reportedly found 115 different articles that appeared to have relied on copied and pasted AI model outputs. That string of words is one of many turns of phrase often churned out by large language models like OpenAI’s ChatGPT. In this case, the “knowledge update” refers to the period when a model’s reference data was updated. Other common generative-AI phrases include “As an AI language model” and “regenerate response.” Outside of academic literature, these AI artifacts have appeared scattered in Amazon product reviews and across social media platforms.
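The same kind of phrase search can be run over any text you already have on hand. What follows is a minimal sketch, not 404 Media's method: the function name `find_telltale_phrases`, the helper `_normalize`, and the sample sentence are all hypothetical, and the phrase list contains only the examples quoted in this article.

```python
import re

# Telltale phrases quoted in this article. Illustrative, not exhaustive.
TELLTALE_PHRASES = [
    "As of my last knowledge update",
    "As an AI language model",
    "regenerate response",
    "I don't have access to real-time data",
]


def _normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.replace("\u2019", "'").lower()
    return re.sub(r"\s+", " ", text)


def find_telltale_phrases(text: str) -> list[str]:
    """Return the entries of TELLTALE_PHRASES that occur in the text."""
    haystack = _normalize(text)
    return [phrase for phrase in TELLTALE_PHRASES if _normalize(phrase) in haystack]


# Hypothetical sample text, with a line break inside the phrase.
sample = (
    "As of my last knowledge\nupdate, lithium metal batteries "
    "remain an active research area."
)
print(find_telltale_phrases(sample))
```

A match is only a flag for human review. A paper that studies AI models may quote these phrases on purpose, so a hit does not by itself show that the text was pasted from a chatbot.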

Several of the papers cited by 404 Media appeared to copy the AI text directly into peer-reviewed papers purporting to explain complex research topics like quantum entanglement and the performance of lithium metal batteries. Other examples of journal articles appearing to include the common generative AI phrase “I don’t have access to real-time data” were also shared on X, formerly Twitter, over the weekend. At least some of the examples reviewed by PopSci did appear to be in relation to research into AI models. The AI utterances, in other words, were part of the subject material in those instances. 

Though several of these phrases appeared in reputable, well-known journals, 404 Media claims the majority of the examples it found stemmed from small, so-called “paper mills” that specialize in rapidly publishing papers, often for a fee and without scientific scrutiny or scrupulous peer review. Researchers have claimed the proliferation of these paper mills has contributed to an increase in bogus or plagiarized academic findings in recent years.

Unreliable AI-generated claims could lead to more retractions  

The recent examples of apparent AI-generated text appearing in published journal articles come amid an uptick in retractions generally. A recent Nature analysis of research papers published last year found more than 10,000 retractions, more than any year previously measured. Though the bulk of those cases weren’t tied to AI-generated content, concerned researchers have feared for years that increased use of these tools could lead to more false or misleading content making it past the peer review process. In the embarrassing rat penis case, the bizarre images and nonsensical AI-produced labels like “dissiliced” and “testtomcels” managed to slip by multiple reviewers either unnoticed or unreported.

There’s good reason to believe articles submitted with AI-generated text may become more commonplace. Back in 2014, the publishers IEEE and Springer together removed more than 120 articles found to have included nonsensical computer-generated language. The prevalence of AI-generated text in journals has almost surely increased in the decade since then as more sophisticated and easier-to-use tools like OpenAI’s ChatGPT have gained wider adoption.

A 2023 Nature survey of 1,600 scientists found that around 30% of respondents admitted to using AI tools to help them write manuscripts. And while phrases like “As an AI algorithm” are dead giveaways exposing a sentence’s large language model (LLM) origin, many other more subtle uses of the technology are harder to root out. Detection models used to identify AI-generated text have proven frustratingly inadequate.

Those who support permitting AI-generated text in some instances say it can help non-native speakers express themselves more clearly and potentially lower language barriers. Others argue the tools, if used responsibly, could speed up publication times and increase overall efficiency. But publishing inaccurate data or fabricated findings generated by these models risks damaging a journal’s reputation in the long term. A recent paper published in Current Osteoporosis Reports comparing review articles written by humans with ones generated by ChatGPT found the AI-generated examples were often easier to read. At the same time, the AI-generated articles were also filled with inaccurate references.

“ChatGPT was pretty convincing with some of the phony statements it made, to be honest,” Indiana University School of Medicine professor and paper author Melissa Kacena said in a recent interview with Time. “It used the proper syntax and integrated them with proper statements in a paragraph, so sometimes there were no warning bells.”

Journals should agree on common standards around generative AI

Major publishers still aren’t aligned on whether or not to allow AI-generated text in the first place. Since 2022, journals published by Science have strictly prohibited AI-generated text or images unless an editor has first approved them. Nature, on the other hand, released a statement last year saying it wouldn’t allow AI-generated images or videos in its journals, but would permit AI-generated text in certain scenarios. JAMA currently allows AI-generated text but requires researchers to disclose when it appears and what specific models were used.

These policy divergences can create unnecessary confusion both for researchers submitting works and reviewers tasked with vetting them. Researchers already have an incentive to use tools at their disposal to help publish articles quickly and boost their overall number of published works. An agreed-upon standard around AI-generated content among large journals would set clear boundaries for researchers to follow. The larger established journals can also further separate themselves from less scrupulous paper mills by drawing firm lines around certain uses of the technology or prohibiting it entirely in cases where it is used to make factual claims.