
Infectious disease epidemiologist and microbiologist, aspirational barista. mlipsitc@hsph.harvard.edu Director @CCDD_HSPH

Jun 6, 2020 1:32 PM

In light of the retractions, it’s worth remembering: peer review is one imperfect part of the at-least-four-part safety net that keeps science functioning. Layer 1 is basic ethics among investigators: don’t make up or misrepresent data.

Layer 2 is related to but distinct from layer 1. Try your best to be the one who finds the limitations or flaws in your findings, and especially your interpretations, before anyone else does. Fix flaws and limitations if you can; highlight them in the Discussion if you can’t http://bostonreview.net/science-nature/marc-lipsitch-good-science-good-science

Layer 3 is transparency – to the extent practicable, make code, data, and other resources for replication available to others. This is costly: sharing usable code is time-consuming and trades off with other forms of productivity. “Open Science” is a good principle, but not a free one.

I like the idea of open science but think reasonable people can disagree about where to resolve the tradeoffs involved. As @rafalab pointed out in our discussion last week, new tools make it easier and should be used and taught.

The next layer is peer review. Depending on how well the other layers have been done, and on how much time the peer reviewers and editors devote, this can be a strong part of the net (probably strongest when least needed, since transparency enhances peer review) or a fallible one.

As noted, peer review can’t usually prove fraud, but it can sniff out a need for more info to disprove it – as post-publication peer review did for those two @surgisphere papers. Peer review can’t find all methodological flaws, because peer reviewers are not omniscient and …

…have limited time, especially in a pandemic. Post-publication peer review is another layer (its timing is ambiguous in the preprint era: before or after formal publication). And finally, replication by other labs is the last layer.

As before, replication is likely to be most effective when it is least needed – when transparency has been scrupulously observed, making replication easier. But both causally (being transparent with your data and code helps you find flaws) and temperamentally…

(those who are transparent are also, in my experience, more careful), here again replication is most needed where it is hardest to do.

Overall, perhaps my metaphor was a little off. Given the interdependence of these parts of the process, a better metaphor is that they are all strands in one safety net, and when several strands are weak, they weaken the whole thing.

But certainly the general point is that saying “peer review can/should do x” is realistic, if at all, only in the context of the other parts of scientific functioning.

Corollary to layer 1: if using someone else’s data, don’t let excitement at the prospect of papers without the hassle of gathering the data yourself blind you to due diligence on data quality and provenance. As a frequent user of data collected by others, I recognize the risk.