What a massive database of retracted papers reveals about science publishing’s ‘death penalty’
Better editorial oversight, not more flawed papers, might explain a flood of retractions
Nearly a decade ago, headlines highlighted a disturbing trend in science: The number of articles retracted by journals had increased 10-fold during the previous 10 years. Fraud accounted for some 60% of those retractions; one offender, anesthesiologist Joachim Boldt, had racked up almost 90 retractions after investigators concluded he had fabricated data and committed other ethical violations. Boldt may have even harmed patients by encouraging the adoption of an unproven surgical treatment. Science, it seemed, faced a mushrooming crisis.
The alarming news came with some caveats. Although statistics were sketchy, retractions appeared to be relatively rare, involving only about two of every 10,000 papers. Sometimes the reason for the withdrawal was honest error, not deliberate fraud. And whether suspect papers were becoming more common—or journals were just getting better at recognizing and reporting them—wasn’t clear.
Still, the surge in retractions led many observers to call on publishers, editors, and other gatekeepers to make greater efforts to stamp out bad science. The attention also helped catalyze an effort by two longtime health journalists—Ivan Oransky and Adam Marcus, who founded the blog Retraction Watch, based in New York City—to get more insight into just how many scientific papers were being withdrawn, and why. They began to assemble a list of retractions.
That list, formally released to the public this week as a searchable database, is now the largest and most comprehensive of its kind. It includes more than 18,000 retracted papers and conference abstracts dating back to the 1970s (and even one paper from 1756 involving Benjamin Franklin). It is not a perfect window into the world of retractions. Not all publishers, for instance, publicize or clearly label papers they have retracted, or explain why they did so. And determining which author is responsible for a paper’s fatal flaws can be difficult.
Still, the data trove has enabled Science, working with Retraction Watch, to gain unusual insight into one of scientific publishing’s most consequential but shrouded practices. Our analysis of about 10,500 retracted journal articles shows the number of retractions has continued to grow, but it also challenges some worrying perceptions that continue today. The rise of retractions seems to reflect not so much an epidemic of fraud as a community trying to police itself.
Among the most notable findings:
Although the absolute number of annual retractions has grown, the rate of increase has slowed.
The data confirm that the absolute number of retractions has risen over the past few decades, from fewer than 100 annually before 2000 to nearly 1000 in 2014. But retractions remain relatively rare: Only about four of every 10,000 papers are now retracted. And although the rate roughly doubled from 2003 to 2009, it has remained level since 2012. In part, that trend reflects a rising denominator: The total number of scientific papers published annually more than doubled from 2003 to 2016.
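The effect of a rising denominator can be sketched in a few lines. The counts below are hypothetical, chosen only to make the arithmetic visible, and are not figures from the Retraction Watch database; the function name is likewise an assumption made for illustration.

```python
def retractions_per_10k(retractions: int, papers_published: int) -> float:
    """Return the number of retractions per 10,000 published papers."""
    if papers_published <= 0:
        raise ValueError("papers_published must be positive")
    return 10_000 * retractions / papers_published


# Hypothetical counts (NOT taken from the database): retractions double,
# but so does the number of papers published.
hypothetical_years = {
    "year A": {"retractions": 400, "papers": 1_000_000},
    "year B": {"retractions": 800, "papers": 2_000_000},
}

rates = {
    label: retractions_per_10k(counts["retractions"], counts["papers"])
    for label, counts in hypothetical_years.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} retractions per 10,000 papers")
```

In this toy case the absolute number of retractions doubles while the rate stays at four per 10,000, which is the pattern a growing literature can produce.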
Much of the rise appears to reflect improved oversight at a growing number of journals.
Overall, the number of journals that report retractions has grown. In 1997, just 44 journals reported retracting a paper. By 2016, that number had grown more than 10-fold, to 488. But among journals that have published at least one retraction annually, the average number of retractions per journal has remained largely flat since 1997. Given the simultaneous rise in retractions, that pattern suggests journals are collectively doing more to police papers, says Daniele Fanelli, a lecturer in research methods at the London School of Economics and Political Science who has co-written several studies of retractions. (The number per journal would have increased, he argues, if the growth in retractions resulted primarily from a rising proportion of flawed papers.)
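One way to see the logic of that argument is a rough decomposition: total retractions equal the number of journals reporting retractions multiplied by the average number per such journal. The sketch below uses the two journal counts reported here (44 and 488); the per-journal average is a hypothetical placeholder, and the decomposition itself is a simplification for illustration, not the method used in the analysis.

```python
def total_retractions(journals_retracting: int, mean_per_journal: float) -> float:
    """Total retractions as (journals reporting retractions) x (mean per journal)."""
    return journals_retracting * mean_per_journal


# Journal counts as reported in the article.
journals_1997 = 44
journals_2016 = 488

# Hypothetical flat average per retracting journal (an assumption for
# illustration only; the article does not give this number).
hypothetical_mean_per_journal = 2.0

journal_growth = journals_2016 / journals_1997
total_growth = (
    total_retractions(journals_2016, hypothetical_mean_per_journal)
    / total_retractions(journals_1997, hypothetical_mean_per_journal)
)

print(f"Growth in journals reporting retractions: {journal_growth:.1f}x")
print(f"Growth in total retractions at a flat average: {total_growth:.1f}x")
```

If the per-journal average is flat, all of the growth in the total comes from more journals reporting retractions. Had a larger share of papers been flawed, the average per journal would be expected to rise as well.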
“Retractions have increased because editorial practices are improving and journals are trying to encourage editors to take retractions seriously,” says Nicholas Steneck, a research ethics expert at the University of Michigan in Ann Arbor. Scientists have kept the pressure on journals by pointing out flaws in papers on public websites such as PubPeer.
In general, journals with high impact factors—a measure of how often papers are cited—have taken the lead in policing their papers after publication. In 2004, just one-fourth of a sampling of high-impact biomedical journals reported having policies on publishing retractions, according to the Journal of the Medical Library Association (JMLA). Then, in 2009, the Committee on Publication Ethics (COPE), a nonprofit group in Eastleigh, U.K., that now advises more than 12,000 journal editors and publishers, released a model policy for how journals should handle retractions. By 2015, two-thirds of 147 high-impact journals, most of them biomedical titles, had adopted such policies, JMLA reported. Proponents of such policies say they can help journal editors handle reports of flawed papers more consistently and effectively—if the policies are followed.
Journals with lower impact factors also appear to be stepping up their standards, Steneck says. Many journals now use software to detect plagiarism in manuscripts before publication, which can prevent retractions later.
But evidence suggests more editors should step up.
A disturbingly large portion of papers, about 2%, contain “problematic” scientific images that experts readily identified as deliberately manipulated, according to a 2016 study in mBio in which Elisabeth Bik of Stanford University in Palo Alto, California, and colleagues examined 20,000 papers. What’s more, our analysis showed that most of the 12,000 journals recorded in Clarivate’s widely used Web of Science database of scientific articles have not reported a single retraction since 2003.
Relatively few authors are responsible for a disproportionate number of retractions.
Just 500 of more than 30,000 authors named in the retraction database (which includes co-authors) account for about one-quarter of the 10,500 retractions we analyzed. One hundred of those authors have 13 or more retractions each. Those withdrawals are usually the result of deliberate misconduct, not errors.
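Because the database counts co-authors, a concentration figure like this has to be computed over distinct papers rather than by summing per-author tallies, which would count a paper once for each of its authors. The following is a minimal sketch of one way to do that, assuming a hypothetical mapping from paper IDs to author names; the data and the function name are invented for illustration and do not reflect the database's actual schema.

```python
from collections import Counter
from typing import Mapping, Set


def share_covered_by_top_authors(
    paper_authors: Mapping[str, Set[str]], top_n: int
) -> float:
    """Fraction of retracted papers with at least one of the top_n authors.

    Authors are ranked by how many retracted papers list them. Each paper
    is counted once, even if several top authors appear on it.
    """
    if not paper_authors:
        raise ValueError("paper_authors must not be empty")
    counts = Counter(
        author for authors in paper_authors.values() for author in authors
    )
    top_authors = {author for author, _ in counts.most_common(top_n)}
    covered = sum(1 for authors in paper_authors.values() if authors & top_authors)
    return covered / len(paper_authors)


# Hypothetical toy data: author "A" appears on three of eight retracted papers.
hypothetical_papers = {
    "p1": {"A", "B"},
    "p2": {"A"},
    "p3": {"A", "C"},
    "p4": {"D"},
    "p5": {"E", "F"},
    "p6": {"G"},
    "p7": {"H"},
    "p8": {"I"},
}

share = share_covered_by_top_authors(hypothetical_papers, top_n=1)
print(f"Share of papers involving the top author: {share:.3f}")
```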
Nations with smaller scientific communities appear to have a bigger problem with retractions.
Retraction rates differ by country, and variations can reflect idiosyncratic factors, such as a particularly active group of whistleblowers publicizing suspect papers. Such confounding factors make comparing retraction rates across countries harder, Fanelli says. But generally, authors working in countries that have developed policies and institutions for handling and enforcing rules against research misconduct tend to have fewer retractions, he and his colleagues reported in PLOS ONE in 2015.
A retraction does not always signal scientific misbehavior.
Many scientists and members of the public tend to assume a retraction means a researcher has committed research misconduct. But the Retraction Watch data suggest that impression can be misleading.
The database includes a detailed taxonomy of reasons for retractions, taken from retraction notices (although a minority of notices don’t specify the reason for withdrawal). Overall, nearly 40% of retraction notices did not mention fraud or other kinds of misconduct. Instead, the papers were retracted because of errors, problems with reproducibility, and other issues.
About half of all retractions do appear to have involved fabrication, falsification, or plagiarism—behaviors that fall within the U.S. government’s definition of scientific misconduct. Behaviors widely understood within science to be dishonest and unethical, but which fall outside the U.S. misconduct definition, seem to account for another 10%. Those behaviors include forged authorship, fake peer reviews, and failure to obtain approval from institutional review boards for research on human subjects or animals. (Such retractions have increased as a share of all retractions, and some experts argue the United States should expand its definition of scientific misconduct to cover those behaviors.)
Determining exactly why a paper was withdrawn can be challenging. About 2% of retraction notices, for example, give a vague reason that suggests misconduct, such as an “ethical violation by the author.” In some of those cases, authors worried about damage to their reputations—and perhaps even the threat of libel lawsuits—have persuaded editors to keep the language vague. Other notices are fudged: They state a specific reason, such as lack of review board oversight, but Retraction Watch later independently discovered that investigators had actually determined the paper to be fraudulent.
Ironically, the stigma associated with retraction may make the literature harder to clean up.
Because a retraction is often considered an indication of wrongdoing, many researchers are understandably sensitive when one of their papers is questioned. That stigma, however, might be leading to practices that undermine efforts to protect the integrity of the scientific literature.
Journal editors may hesitate to hand down the death penalty—even when it’s justified. For instance, some papers that once might have been retracted for an honest error or problematic practices are now being “corrected” instead, says Hilda Bastian, who formerly consulted on the U.S. National Library of Medicine’s PubMed database and is now pursuing a doctorate in health science at Bond University in Gold Coast, Australia. (The Retraction Watch database lists some corrections but does not comprehensively track them.) The correction notices can often leave readers wondering what to think. “It’s hard to work out—are you retracting the article or not?” Bastian says.
COPE has issued guidelines to clarify when a paper should be corrected, when it should be retracted, and what details the notices should provide. But editors must still make case-by-case judgments, says Chris Graf, the group’s co-chair and director of research integrity and publishing ethics at Wiley, the scientific publisher based in Hoboken, New Jersey.
A concerted effort to reduce the stigma associated with retractions could allow editors to make better decisions. “We need to be pretty clear that a retraction in the published literature is not the equivalent of, or a finding of, research misconduct,” Graf says. “It is to serve a [different] purpose, which is to correct the published record.”
One helpful reform, some commentators say, would be for journals to follow a standardized nomenclature that would give more details in retraction and correction notices. The notices should specify the nature of a paper’s problems and who was responsible—the authors or the journal itself. Reserving the fraught term “retraction” for papers involving intentional misconduct and devising alternatives for other problems might also prompt more authors to step forward and flag their papers that contain errors, some experts posit.
Such discussions underscore how far the dialogue around retractions has advanced since those disturbing headlines from nearly a decade ago. And although the Retraction Watch database has brought new data to the discussions, it also serves as a reminder of how much researchers still don’t understand about the prevalence, causes, and impacts of retractions. Data gaps mean “you have to take the entire literature [on retractions] with a grain of salt,” Bastian says. “Nobody knows what all the retracted articles are. The publishers don’t make that easy.”
Bastian is incredulous that Oransky’s and Marcus’s “passion project” is, so far, the most comprehensive source of information about a key issue in scientific publishing. A database of retractions “is a really serious and necessary piece of infrastructure,” she says. But the lack of long-term funding for such efforts means that infrastructure is “fragile, and it shouldn’t be.”
Ferric Fang, a clinical microbiologist at the University of Washington in Seattle who has studied retractions, says he hopes people will use the new database “to look more closely at how science works, when it doesn’t work right, and how it can work better.” And he believes transparent reporting of retractions can only help make science stronger. “We learn,” he says, “from our mistakes.”
One publisher, more than 7000 retractions
Some 40% of the retractions in the Retraction Watch database have a single curious origin. Over the past decade, one publisher—the Institute of Electrical and Electronics Engineers (IEEE) in New York City—has quietly retracted thousands of conference abstracts.
Most of the abstracts are from IEEE conferences that took place between 2009 and 2011. The 2011 International Conference on E-Business and E-Government alone resulted in retractions of more than 1200 abstracts. In all, IEEE has retracted more than 7300 such abstracts. Most of the authors are based in China, and their papers covered topics as diverse as physical sciences, business, technology, and social sciences.
Many of the retraction notices offer few specifics about the reason. For example, the notice for retracting “The Study on Simulating Binaural Room Impulse Response” says simply: “After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE’s Publication Principles.”
So what happened? IEEE hasn’t given many details. The group, which sponsors more than 1700 conferences each year, requires peer review of all abstracts and papers before publication. But several years ago, in its decades-old catalog of abstracts, IEEE staff started to notice thousands of summaries that “did not meet our guidelines,” according to a spokesperson. The spokesperson wouldn’t disclose how they noticed the issue, “for reasons of operational integrity.”
The episode may reflect the more rapid and less intensive form of peer review that conference submissions often undergo compared with papers submitted to traditional journals, says computer scientist Lior Pachter of the California Institute of Technology in Pasadena. The accelerated timetable “allows for quick turnaround for ideas and quick sharing,” he says, but it also can mean that mistakes slip through.
To prevent future mass retractions, IEEE says it has formed a committee of staff and volunteer experts to serve as “gatekeepers” for conference materials and provide an additional level of quality control. That sounds like a good step, Pachter says. Researchers in quick-moving fields such as computer science “know and have known for a long time that many [conference] papers are problematic,” he notes. And “people don’t want to have garbage in their conferences.”
A scientist’s fraudulent studies put patients at risk
In biomedical science, most papers that lead to retractions don’t threaten anyone’s life. But medical studies published by a once-prominent anesthesiologist offer a troubling exception, a story of how flawed and fraudulent papers can put patients in danger.
By 2010, Joachim Boldt was a research leader at Klinikum Ludwigshafen, an academic teaching hospital in Germany. That year, a sharp-eyed reader noticed a suspicious figure in one of Boldt’s 2009 publications, in the journal Anesthesia & Analgesia. A later investigation revealed that Boldt had likely fabricated data, ignored ethics rules, and committed other kinds of misconduct in 98 articles he published with co-authors. All but two are now retracted.
Many of those studies had supported the effectiveness of intravenous solutions containing hydroxyethyl starch, or hetastarch, which doctors use to stabilize the blood pressure of patients during and after surgery or trauma. Although hetastarch and related products have been in widespread use since the 1960s, they are controversial. Study findings have suggested such products can have side effects including kidney damage and death. But Boldt’s research appeared to show that a particular form, containing synthetic molecules called colloids, was safe.
That conclusion is now in doubt. In 2011, after the scandal, a group of medical societies in the United Kingdom withdrew their influential guidelines on intravenous fluid therapy—which endorsed colloids—because they included references to four of Boldt’s tainted papers.
Joachim Stumpp, Boldt’s former boss at Klinikum Ludwigshafen, told Science and Retraction Watch that, as far as he knows, investigators have found “no reported [cases] of serious impairment or injury of any patient” treated by Boldt (whose current whereabouts and work Science could not determine). But many other patients around the world likely were touched by the fraud, says Christian Wiedermann, an intensive care specialist at the Private University for Health Sciences, Medical Informatics and Technology in Hall in Tyrol, Austria, who has written several papers about the scandal. Although proving that particular patients suffered or died because of the misconduct would likely require expensive, large-scale studies, he says, “Logic dictates that, shamefully, patient harm must undoubtedly have resulted from Boldt’s actions.”
Researchers from the University of Manitoba in Winnipeg, Canada, have tried to assess how Boldt’s misconduct might have distorted the scientific literature. Their analysis, published in 2013 in The Journal of the American Medical Association, examined 38 published articles, including seven by Boldt, comparing the use of fluids containing hetastarch with three other types of volume expanders. Taken together, the findings of the studies—which involved nearly 11,000 critically ill patients—suggested hetastarch solutions were as safe as the other fluids. But when the researchers excluded the 590 patients in Boldt’s studies on hetastarch, a darker picture emerged: The fluids posed modest but statistically significantly greater risks of kidney damage and death.
The future of hetastarch treatments remains uncertain. This year, the European Medicines Agency, which has already curtailed the use of the products, proposed banning them outright.
German authorities reportedly considered bringing criminal charges against Boldt but have not.
Fallout for co-authors
In 2011, chemist Bernhard Biersack, a postdoctoral fellow at the University of Bayreuth in Germany, struck a promising deal to collaborate with a well-funded cancer scientist based in the United States. They launched a multiyear partnership that Biersack says produced 12 journal articles, including seven that reported original research.
But the seemingly productive alliance soon became Biersack’s worst nightmare: He had unknowingly signed on with a scientist who would be found guilty of scientific misconduct. That researcher, Fazlul Sarkar, formerly of Wayne State University in Detroit, Michigan, has now had more than 30 of his papers retracted.
You don’t have to look hard for other cases of collaborators ensnared in scandals over fraudulent publications. In one high-profile case, social psychologist Diederik Stapel’s tendency to make up entire experiments led to dozens of retracted papers, most of which included junior collaborators.
So how do such disasters affect their careers? The short answer is: It depends.
Some collaborators face a frustrating struggle to clear their names. Thomas Hall, a professor of accounting at the University of Texas in Arlington, has repeatedly implored the publisher of a 2002 paper he co-wrote to reconsider its 2015 decision to retract it. Hall says the paper was withdrawn simply because another author, James Hunton, was found guilty of sweeping misconduct. Hall argues the results reported in their paper are valid and have been supported by later research. (The publisher, the American Accounting Association, didn’t respond to requests for comment.)
In other cases, co-authors escape relatively unscathed. Biersack, for instance, has not been a co-author on any of Sarkar’s retracted papers. Still, when Biersack learned about the misconduct, he was worried: Sarkar had contributed data and wording to some of his publications. So “I checked my papers with him again,” he says. “I could not find mistakes.”
Biersack remains a postdoc in Bayreuth, working on a temporary contract. He says he has seen no signs that his collaboration with Sarkar has held him back; no referees have mentioned it in their reviews of his work, for example.
His experience is consistent with findings reported by Joshua Krieger and colleagues at Harvard Business School in Boston in 2017. They showed that more prominent authors of papers retracted for fraud or misconduct often face greater penalties, in the form of fewer citations to their previous work, than do less prominent authors. But a different study, reported in 2013 by Ginger Zhe Jin of the University of Maryland in College Park and colleagues, found that when it’s not obvious who on a research team was to blame for a retraction, the less prominent co-authors experience larger declines in citations.
To avoid possible career damage, Krieger suggests scientists build a portfolio of papers that includes ones written with different co-authors, which can help make a researcher “less sensitive to the discrediting of any one paper or researcher.” But even if a co-author is hit with retractions, Biersack says, “it does not mean the end of your career.”
Volunteer watchdogs pushed a small country up the rankings
Drumroll, please: The countries that top our rankings of most retractions by nation are … Iran and Romania.
Why? They’re not among the world’s leaders in the absolute number of retractions—that dubious honor goes to the United States and China.
But ranking countries that way can be misleading. The United States and China fund many researchers who together publish many papers, which in turn can increase the number of papers that must be retracted.
Instead, Science and Retraction Watch created two measures that allow consistent comparisons across countries. The first is retractions per dollar of national research funding from 2003 to 2016, which is a proxy for the size of a nation’s scientific establishment. The second is retractions per paper published.
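A small sketch shows how the two normalized measures can reorder a ranking based on raw counts. The countries and totals below are entirely hypothetical, and the class and field names are assumptions made for illustration; they are not the figures or the code behind the analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CountryTotals:
    """Hypothetical national totals over a fixed period."""
    name: str
    retractions: int
    funding_usd: float  # total national research funding over the period
    papers: int         # total papers published over the period


def rank_by(
    countries: List[CountryTotals], key: Callable[[CountryTotals], float]
) -> List[str]:
    """Return country names ordered from highest to lowest on the given measure."""
    return [c.name for c in sorted(countries, key=key, reverse=True)]


# Invented numbers, for illustration only.
hypothetical = [
    CountryTotals("Country X", retractions=2_000, funding_usd=400e9, papers=4_000_000),
    CountryTotals("Country Y", retractions=150, funding_usd=3e9, papers=200_000),
    CountryTotals("Country Z", retractions=300, funding_usd=20e9, papers=250_000),
]

by_count = rank_by(hypothetical, lambda c: c.retractions)
by_funding = rank_by(hypothetical, lambda c: c.retractions / c.funding_usd)
by_papers = rank_by(hypothetical, lambda c: c.retractions / c.papers)

print("By absolute count:      ", by_count)
print("Per research dollar:    ", by_funding)
print("Per paper published:    ", by_papers)
```

In this toy example the country with the most retractions ranks last on both normalized measures, and the two normalized measures disagree about which country comes first.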
By the funding measure, Romania takes the top spot. (The United States falls to 34th, and China to 14th.) But the story doesn’t end there: Romania’s leading rate of retractions per research dollar probably reflects the outsize effect of some dogged watchdogs—a small band of researchers who have been politely but firmly contacting journals to point out suspected plagiarism by Romanian authors. That activism has led to dozens of retracted papers.
The effort was launched in 2013 by Stefan Hobai of the University of Medicine and Pharmacy in Târgu Mureş, Romania, who dubbed it, with no intentional irony, the Project dedicated to arrest of the name decline of the Romanian achievement (PANDORA) in biomedical publishing. Hobai tells Science and Retraction Watch that he acted because the editors of Acta Medica Marisiensis—published by the same university Hobai works for—ignored 17 messages in which he reported articles suspected of plagiarism.
Since then, PANDORA’s few members, who other than Hobai have remained anonymous, have posted allegations on two blogs, including side-by-side comparisons of similar text. The editor of Acta Medica Marisiensis has questioned Hobai’s motives and called his allegations “vicious.” Editors at three other publications have acted on PANDORA’s allegations, but not always in ideal fashion: They often simply removed plagiarized papers from their websites with no notification or reason, leaving no trace that the paper ever existed. That practice runs contrary to guidelines issued by the Committee on Publication Ethics, an international group that advises journal editors.
PANDORA is just one example of bands of researchers taking it upon themselves to clean up the literature. Some are subject specific, and many, such as PubPeer, are international. Others—such as VroniPlag, which began as a way to crowdsource suspected plagiarism in theses in Germany, and PANDORA—are country specific.
By the second measure—retractions per paper published—Iran tops the leaderboard and Romania drops to second (among countries that published at least 100,000 papers from 2003 to 2016; see graphic, left). Iran’s position may reflect several high-profile scandals involving fake peer review. But this analysis may overstate Iran’s retraction rate. That’s because the incidence was calculated using a tally of published papers developed by the U.S. National Science Foundation. That count includes only papers published in English. If it also included papers published in Farsi—Iran’s national language—the rate could change.