Scientists, institutions and journals have been increasingly evaluated statistically, by metrics that focus on the number of published reports rather than on their content, raising a concern that this approach interferes with the progress of biomedical research. To offset this effect, we propose to use the R-factor, a metric that indicates whether a report or its conclusions have been verified.
The ability of academic scientists to keep their jobs, be promoted, or receive funding has become increasingly dependent on three statistical parameters: the number of their publications, how often these publications have been cited, and the impact factor of the journals in which these publications appeared (Abbott et al. 2010, Hall 2012, Van Noorden 2010, Sahel 2011). The reliance on these parameters varies among countries and institutions (Abbott et al. 2010, Hall 2012, Sahel 2011, Van Noorden 2010), but the administrative convenience of the statistical approach suggests that it will continue to spread (Abbott et al. 2010, Van Noorden 2010). A growing concern is that this approach interferes with the progress of biomedical research by forcing publication prematurely, before the veracity of the findings has been verified (Abbott et al. 2010, Fang, Steen, and Casadevall 2012, Lawrence 2007, Ioannidis 2005b, Ioannidis 2005a, Young, Ioannidis, and Al-Ubaydli 2008). As a result, the number of reports that are irreproducible and thus potentially misleading, especially to non-experts in the field, has grown large enough (Begley and Ellis 2012, Ioannidis 2005b, Ioannidis 2005a) to prompt calls for action to solve this problem (Couzin-Frankel 2012, https://www.scienceexchange.com/reproducibility, http://openscienceframework.org/project/EZcUj/wiki/home).
A systemic solution would be to offset the parameters that encourage publication with one or more parameters that evaluate what is reported. Currently, this function is served by the citation index of a report and the impact factor of the journal in which the report appeared. However, the citation index can be misleading, if only because it increases even if the report is cited as being irreproducible or wrong (Lawrence 2007). The utility of impact factors, which are average citation indexes for the papers published by a journal over the preceding two years, has also been questioned, especially when they are used as a tool to evaluate individual scientists (Lawrence 2007, Sahel 2011, Editorial 2013).
We propose to use a measure termed the R-factor, which would indicate how many studies attempted to verify a given article - that is, to determine whether the results can be reproduced or the main conclusions confirmed - and what the outcome was. A newly published article would have an R-factor of 0. If another article finds that the experiments described in the article can be repeated with similar results, and/or that the main conclusions or predictions are correct, then the R-factor becomes 1. If neither of these conditions is met, the R-factor would remain 0. As more studies attempt to verify the article, the R-factor would change to a value between 0 and 1. For example, if ten studies attempt to verify a report and all successfully do so, its R-factor would be 1 (10/10). If two of them fail, the R-factor would be 0.8 (8/10), and if all find it irreproducible, the R-factor would be 0 (0/10). The number of studies used to calculate the R-factor would be indicated in brackets next to it, such as 0.8 (10). The R-factor is applicable to any report that makes a testable conclusion, whether the study is experimental or theoretical, and would punish neither the authors who conducted rigorous research but made wrong interpretations nor the authors who reached right conclusions for a wrong reason. The R-factor of a scientist, institution, or journal would be the average of the R-factors of the papers they have published.
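The arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `r_factor`, `format_r_factor`, and `average_r_factor` are hypothetical, and counting a paper with no verification attempts as 0 when averaging simply follows the convention that a newly published article has an R-factor of 0.

```python
from statistics import mean


def r_factor(confirmed: int, attempts: int) -> float:
    """R-factor of one report: confirming studies / studies that attempted verification."""
    if attempts < 0 or not 0 <= confirmed <= attempts:
        raise ValueError("need 0 <= confirmed <= attempts")
    if attempts == 0:
        # A newly published, still untested article has an R-factor of 0.
        return 0.0
    return confirmed / attempts


def format_r_factor(confirmed: int, attempts: int) -> str:
    """Render the score with the number of studies in brackets, e.g. '0.8 (10)'."""
    return f"{r_factor(confirmed, attempts):.2g} ({attempts})"


def average_r_factor(papers: list[tuple[int, int]]) -> float:
    """R-factor of a scientist, institution, or journal: the mean over their papers.

    Each paper is a (confirmed, attempts) pair. Assumption: untested papers
    count as 0, like any other newly published article.
    """
    if not papers:
        raise ValueError("at least one paper is required")
    return mean(r_factor(c, a) for c, a in papers)


if __name__ == "__main__":
    print(format_r_factor(10, 10))  # all ten studies confirm the report
    print(format_r_factor(8, 10))   # two of ten studies fail to confirm it
    print(format_r_factor(0, 10))   # no study confirms it
```

One consequence of this convention is that an author with many untested papers would have a low average; whether untested papers should instead be excluded from the average is a design choice the proposal leaves open.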
We suggest that by giving an explicit numerical value to the veracity of scientific reports, the R-factor would make biomedical research more rigorous and efficient, and its results and conclusions more accessible and transparent outside of a specific research field. For example, the need to explain a low R-factor at the next evaluation would make a scientist think twice before publishing a study that calls for further verification. Having an R-factor assigned to each publication would bring the discussion about the veracity of studies from the grapevine into public view and for the public benefit. An outsider to a field could use the R-factor as a guide to focus on more reliable publications without the need to seek the opinions of insiders. The possibility of receiving an R-factor of 0 (n) could be used as a deterrent against an overly enthusiastic colleague or advisor who pushes for publishing the results before they are verified. Science journals would also be more attentive to the content of manuscripts to avoid hurting their R-factor, while individuals and institutions would pride themselves on the quality of their research by citing the R-factor along with their citation indexes.
Our optimistic view raises three practical questions: How feasible is it to determine the R-factors, who would do that and keep the scores, and would the R-factor cause more harm than good?
In theory, since the R-factor is a simple ratio of the publications that confirm the report in question to all publications that attempt to verify it, calculating it should be relatively straightforward for an expert in the research field. It would require obtaining the citation index of the report and determining which of the citing articles attempted to verify the results and how many of them were successful. Some experts would not even need to resort to the citation index, as they know the published and unpublished history of their field by heart. In practice, determining whether a study has been verified would be easy for some articles but not for others, as has been outlined in detail in a previous proposal to introduce a metric for evaluating the reproducibility of scientific publications (Hartshorne and Schachner 2012). The ease would depend on whether the experimental procedures are described in sufficient detail to reproduce them, whether the conclusions are formulated explicitly enough to be verifiable, whether the experimental setting can be recapitulated without specialized expertise (Bissell 2013) and at reasonable expense, and whether the results of verification are published, which is often not the case. We suggest that the incentive to increase their R-factor would encourage scientists to describe the experimental conditions in sufficient detail and to formulate their conclusions unambiguously. The use of the R-factor in evaluating scientists and institutions would encourage authors and editors to publish reports that attempt to verify previous studies.
Who would calculate the R-factor and keep the scores? The R-factor can be calculated by individual scientists, scientific societies, bibliometric companies such as Elsevier and Thomson Reuters, reproducibility initiatives (Couzin-Frankel 2012, https://www.scienceexchange.com/reproducibility, http://openscienceframework.org/project/EZcUj/wiki/home), and evaluation committees. The variety of potential sources implies the need to aggregate the resulting R-factors in an accessible way, as is currently done with citation indexes. This function can be fulfilled by an open-access resource with the required expertise (Hartshorne and Schachner 2012). For example, the NCBI, which has expertise in analyzing and annotating scientific reports, could include the R-factor as a field for the papers referenced in PubMed. A natural solution would also be to link the R-factor to the citation indexes. Introducing three types of citations - positive, if the cited report is verified; negative, if it is not; and neutral, if the report is mentioned without evaluation - would make the citation index more meaningful and would allow the R-factor of a report to be computed in real time. We feel that once the R-factor enters the public domain, the opportunities to keep the scores and use them would evolve beyond what we can now envision.
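To illustrate how typed citations would let the R-factor be computed directly from a citation record, here is a minimal Python sketch. The `CitationType` enumeration and the `r_factor_from_citations` function are hypothetical names introduced only for this illustration; no existing citation database is known to us to expose such a field.

```python
from collections import Counter
from enum import Enum


class CitationType(Enum):
    """Hypothetical labels for the three proposed types of citation."""
    POSITIVE = "positive"  # the citing study verified the cited report
    NEGATIVE = "negative"  # the citing study failed to verify it
    NEUTRAL = "neutral"    # the report is mentioned without evaluation


def r_factor_from_citations(citations):
    """Return (R-factor, number of verification attempts) for one report.

    Neutral citations add to the citation index but are ignored here,
    because they do not attempt to verify the report.
    """
    counts = Counter(citations)
    positive = counts[CitationType.POSITIVE]
    negative = counts[CitationType.NEGATIVE]
    attempts = positive + negative
    value = positive / attempts if attempts else 0.0
    return value, attempts


if __name__ == "__main__":
    # A report cited 25 times: 8 confirmations, 2 failures, 15 plain mentions.
    record = (
        [CitationType.POSITIVE] * 8
        + [CitationType.NEGATIVE] * 2
        + [CitationType.NEUTRAL] * 15
    )
    value, attempts = r_factor_from_citations(record)
    print(f"citation index: {len(record)}, R-factor: {value:.2g} ({attempts})")
```

In this example the citation index is 25 while the R-factor rests on only ten of those citations, which is the distinction the typed citations are meant to expose.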
One concern is whether using the R-factor would do more harm than good, for example, by preventing reports of unorthodox ideas, by being used as a tool to undermine someone's reputation, or by maligning studies that others failed to reproduce for lack of expertise. We feel that the transparency of calculating the R-factor - the papers used to calculate it are all in the public domain - would make using it for non-scientific purposes difficult. As for new ideas, the R-factor would help a non-expert to distinguish hypotheses and ideas that have been confirmed from those that are presented or accepted as established facts without sufficient verification. We understand at the same time that science is a human activity, meaning that the R-factor can be misused, as is the case with other apparently benign tools, including citation indexes and impact factors.
We hope, however, that introducing an explicit and quantitative measure that focuses on the veracity of scientific reports and the validity of their conclusions would offset at many levels - from the bench to the editorial board - the push to publish no matter what and thus would accelerate progress in biomedical research. We invite the scientific community and the institutions that evaluate the scientific literature to give the R-factor a try.
We thank David Vaux, Daniela Cimini, and Martin Schwartz for their comments and discussions.
Fang, F. C., R. G. Steen, and A. Casadevall. 2012. "Misconduct accounts for the majority of retracted scientific publications." Proc Natl Acad Sci U S A no. 109 (42):17028-33. doi: 10.1073/pnas.1212247109.
http://openscienceframework.org/project/EZcUj/wiki/home. Open Science Framework Reproducibility Project.
http://www.scienceexchange.com/reproducibility. Science Exchange Reproducibility Initiative.
Showing 2 Reviews
The present culture where quantity is the universal
criterion damages all of science, not only biomedical research. Anything that
might mitigate this condition is well worth trying, and the proposed R-Factor
would address the problem head-on by introducing a measure of reliability.
Universal availability of R-Factor data would also be a
powerful discouragement of deliberate faking of results.
I’m not clear how this would work: “The R-factor . . . would not punish the authors that conducted
rigorous research but made wrong interpretations, nor the authors who made
right conclusions for a wrong reason”. Surely a wrong interpretation =
conclusion shows up as not reproducible?
Might R-Factors add to the difficulty that truly
ground-breaking advances encounter, things that presage a scientific revolution
because they are counter to accepted beliefs? “Cold fusion” was officially
dismissed almost immediately because many would-be replications failed. But a
significant number of researchers continue to achieve positive results in the
general area of “low energy nuclear reactions” (LENR), and many of the early
non-replications were by individuals or groups in physics or nuclear science
who were not competent in the pertinent electrochemical and thermal techniques.
Moreover, the continuing research in this field tends to be published in
non-mainstream places because mainstream reviewers continue to regard the field
as spurious. So the most potentially important work might garner low R-scores
and be hindered even more than is presently the case.
How would indirect replications be handled? There are
comparatively few publications that report attempts to replicate precisely.
Most commonly, the soundness of published work is tested when others attempt to
use it to advance further. When that works, it suggests that the work was
indeed sound. When it doesn’t work, it may not be that the earlier publication
was unsound; the problem may lie with the attempted new advance.
On the other hand, proceeding further with apparent success
does not necessarily mean that the relied-upon earlier work was actually sound.
Much work can seem to be advancing even though the fundamental paradigm is
mistaken. Enormous numbers of publications have been generated in HIV/AIDS
research even though the basic premise that HIV causes AIDS is wrong (The Case
against HIV, http://thecaseagainsthiv.net). Similarly, the literature on
human-caused global warming is huge and apparently mutually reinforcing even
though the basic belief is at best unproven, that carbon dioxide is the chief
forcer of warming (Henry H. Bauer, Dogmatism
in Science and Medicine: How Dominant Theories Monopolize Research and
Stifle the Search for Truth, McFarland 2012; A politically liberal
global-warming skeptic?, http://wp.me/a2VG42-f).
The only way to determine whether R-Factors are feasible,
and whether their benefit meets expectations, and whether there are negative
consequences, and to discover possible unintended consequences, is to try them
out. Interest and collaborations might be found in several places:
Specifically for medical matters, the Cochrane Collaboration
(http://www.cochrane.org) was established more than two decades ago as an
independent body free from conflicts of interest to evaluate the actual efficacy
and safety of contemporary practices. Published Cochrane Reviews might
constitute a database for testing the R-Factor concept. People who have worked
in the Cochrane Collaboration might be valuable collaborators toward putting
the R-Factor idea into practice.
Testing the R-Factor concept seems a natural for research in
Science & Technology Studies, eminently feasible as the basis for thesis
and dissertation projects. Practices associated with Citation Indexing have
long been a significant aspect of Science & Technology Studies, and
R-Factor studies would be a natural corollary of this sub-specialty. An obvious
program would be to apply R-Factor analyses to topics in which there are highly
cited articles and to compare and contrast the Citation scores with R-Factor
scores. Since voluminous citation is associated with famous blunders as well as
with major advances, one might expect to find a bimodal distribution of
highly-cited articles, with clusters at both the high and the low ends of the
Establishment of the Citation Index and the associated work
in Science & Technology Studies has enabled the latter to become visible to
practicing scientists with potentially more impact on actual practices in
science than were achieved by academic philosophy of science or history of
science or sociology of science. Substantial development of the R-Factor by
scholarship in Science & Technology Studies might well mediate significant
impact of R-Factor scores on actual scientific practice.
This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.