Should we believe the headline, "Drinking four cups of coffee daily lowers risk of death"? How about, "Mouthwash May Trigger Diabetes. . ."? Should we really eat more, not less, fat? And what should we make of data that suggest people with spouses live longer?
These sorts of conclusions, from supposedly scientific studies, seem to vary from month to month, leading to ever-shifting "expert" recommendations. However, most of their admonitions are based on flawed research that produces results worthy of daytime TV.
Misleading research is costly to society both directly, because much of it is funded by the federal government, and indirectly, when it gives rise to unwise, harmful public policy.
Social science studies are notorious offenders. A landmark study in the journal Nature Human Behaviour in August reported the results of efforts to replicate 21 social science studies published in the prestigious journals Nature and Science between 2010 and 2015.
The multi-national team actually "conducted high-powered replications of the 21 experimental social science studies — using sample sizes around five times larger than the original sample sizes" and found that "62% of the replications show an effect in the same direction as the original studies." One out of the four Nature papers and seven of the seventeen Science papers evaluated did not replicate, a shocking result for two prestigious scientific journals. The authors noted two kinds of flaws in the original studies: false positives and inflated effect sizes.
Science is supposed to be self-correcting. Smart editors. Peer review. Competition from other labs. But when we see that university research claims – published in the crème de la crème of scientific journals, no less – are so often wrong, there must be systematic problems. One of them is outright fraud – "advocacy research" that has methodological flaws or intentionally misinterprets the results.
Another is the abject failure of peer review, which is especially prevalent at "social science" journals. The tale of three scholars who tested the integrity of journals' peer review is revealing. They wrote 20 fake papers using fashionable jargon to argue for ridiculous conclusions, and tried to get them placed in high-profile journals in fields including gender studies, queer studies, and fat studies. Their success rate was remarkable: By the time they took their experiment public on [October 2nd], seven of their articles had been accepted for publication by ostensibly serious peer-reviewed journals. Seven more were still going through various stages of the review process. Only six had been rejected.
The articles were designed to be an easy call for reviewers to reject. For example, one dismissed "western astronomy" as sexist and imperialist, and made a case for physics departments to study feminist astrology or practice interpretative dance instead.
Another way to cheat is to publish in "predatory journals," which will publish virtually anything for a hefty fee. They are commonly used by ideologues who are trying to further some social or economic agenda. Even when their "research" is discredited or retracted, it continues to be cited by activists.
A subtler manifestation of dishonesty in research is what amounts to statistical cheating. Here is how it works. Suppose you try to answer one question – by asking about levels of coffee consumption, for example, to test whether drinking a certain amount a day is associated with more or less cancer; or whether being married is associated with increased longevity – and test the results with appropriate statistical methods. At the conventional 0.05 significance threshold, there is a 5% chance of getting a (nominally) statistically significant result purely by chance, even when no real effect exists (meaning that the finding isn't real).
If you try to answer two questions, the probability of getting at least one such false positive is about 10%. Three questions, about 14%. The more you test, the more likely you'll get a statistical false positive, and researchers exploit this phenomenon. Researchers have thereby created what is for them a winning "science-business model": Ask a lot of questions, look for associations that may or may not be real, and publish the result. Just how many questions are typical in a university research project? It varies by subject area, and researchers have become masters at gaming their methodology. Often, they ask thousands of questions. And they get away with it, because there are no scientific research cops.
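The arithmetic behind those percentages can be sketched in a few lines of Python. This is only an illustration under two simplifying assumptions that are not spelled out above: the questions are statistically independent, and none of them reflects a real effect. The function name `familywise_error_rate` is ours, chosen for this example.

```python
def familywise_error_rate(num_questions: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests.

    Assumes every tested association is truly null and the tests are
    independent; real studies rarely satisfy both assumptions exactly.
    """
    if num_questions < 0:
        raise ValueError("num_questions must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** num_questions.
    return 1.0 - (1.0 - alpha) ** num_questions


# Print the chance of at least one spurious "finding" as questions pile up.
for k in (1, 2, 3, 20, 100, 1000):
    print(f"{k:>5} questions: {familywise_error_rate(k):.1%}")
```

Under these assumptions, the chance of at least one spurious "finding" is already roughly 64% at 20 questions, and it is close to certainty long before the count reaches the thousands.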
In the absence of outright, proven fraud or plagiarism, universities provide little oversight over their scientists, in contrast to industry, where quality-control monitoring is de rigueur. Universities claim that peer review is sufficient, but as discussed above, in many fields it is unreliable or, at best, spotty. The peers are in on the game. In a research-publishing version of The Emperor's New Clothes, editors wink and nod if the researcher seems to be following the rules. And there are no consequences if a researcher's findings are repudiated by others' subsequent research. The researchers' ultimate product is a published paper. The way the game operates is publish, get grants (thanks, taxpayers) and progress up the academic ladder.
The realization that there's something rotten in academic epidemiology research, in particular, is hardly new. As long ago as 2002, two epidemiologists at the University of Bristol (U.K.) wrote in a journal article:
When a large number of associations can be looked at in a dataset where only a few real associations exist, a P value of 0.05 is compatible with the large majority of findings still being false positives [that is, spurious]. These false positive findings are the true products of data dredging, resulting from simply looking at too many possible associations.
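A back-of-the-envelope calculation shows how that can happen. The sketch below is ours, not the Bristol authors', and every number in it is hypothetical: 1,000 associations tested, only 10 of them real, a 0.05 significance threshold, and an assumed 80% chance (statistical power) of detecting each real association.

```python
def expected_false_positive_share(
    num_associations: int,
    num_real: int,
    alpha: float = 0.05,
    power: float = 0.8,
) -> float:
    """Expected fraction of 'significant' findings that are spurious.

    All inputs are illustrative assumptions, not figures from any study.
    """
    if not 0 <= num_real <= num_associations:
        raise ValueError("num_real must be between 0 and num_associations")
    # Null associations that cross the threshold purely by chance.
    false_positives = (num_associations - num_real) * alpha
    # Real associations that the study manages to detect.
    true_positives = num_real * power
    total = false_positives + true_positives
    return false_positives / total if total > 0 else 0.0


share = expected_false_positive_share(num_associations=1000, num_real=10)
print(f"Expected share of significant findings that are false: {share:.0%}")
```

With these made-up inputs, about 49.5 false positives are expected against 8 true ones, so the large majority of "significant" results would be spurious, which is the pattern the quoted passage describes.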
An equally disreputable practice related to data dredging is formulating a Hypothesis After the Result is Known, or HARKing, which might be thought of as making up a story to fit the data, instead of the correct practice of accumulating data and seeing whether it confirms or contradicts your a priori hypothesis.
Together, data dredging and HARKing violate the fundamental scientific principle that a scientist must start with a hypothesis, not concoct one after the data set has undergone analysis.
Data dredging and HARKing that yield invalid results can also occur in laboratory animal experiments, as explained by Dr. Josh Bloom, a chemist at the American Council on Science and Health. Those phenomena apply as well to clinical studies.
This situation creates a self-serving, self-aggrandizing process. Researchers have been thriving by churning out this junk science since at least the early 1990s, and as most of the work is government funded, it's ripping off taxpayers as well as misleading them. It's a kind of business model in which the dishonest researchers win, and you lose: You lose on the initial cost of the research, the flawed policy implications, and the opportunity costs.
Because editors and peer-reviewers of research articles have failed to end widespread statistical malpractice, it will fall to government funding agencies – or their appropriators, the Congress – to cut off support for studies with flawed design; and to universities, which must stop rewarding the publication of bad research. Only last month a tenured professor at Cornell University was forced to resign for data dredging and HARKing, but to truly turn the tide, we will need pressure from many directions.
Dr. S. Stanley Young is a statistician who has worked at pharmaceutical companies and the National Institute of Statistical Sciences on questions of applied statistics. He is an adjunct professor at several universities and a member of the EPA's Science Advisory Board. Henry I. Miller, a physician and molecular biologist, is a Senior Fellow at the Pacific Research Institute in San Francisco. He was the founding director of the FDA's Office of Biotechnology. Follow him on Twitter @henryimiller.