All Scientific Research Should Be Crowdsourced

Very few days go by without a new article describing the limits of published scientific research.  The headline cases involve scientists who plagiarize or outright fabricate data.  Yet, in my experience, most scientists are quite ethical, meticulous, hard-working, and genuinely concerned with finding the truth.  Still, non-scientists would likely be surprised to learn that a large number of scientific studies are actually false.  An Amgen study found that 46 out of 53 studies with ‘landmark’ findings could not be replicated.  A team at Bayer painted a slightly more optimistic picture, with 43 out of 65 studies revealing inconsistencies when tested independently.  Scientific journals continue to accept articles based on the novelty and projected impact of the submission, yet simulations illustrate how this bias toward publishing novel results likely leads to an environment where most published results are false.  My home discipline of psychology is currently doing some soul-searching: it is a relatively open secret that many results are difficult to reproduce, and a systematic reproducibility project is now under way.

Crowdsourcing is, and always has been, the solution.  Indeed, the phrase at the bottom of Google Scholar, “standing on the shoulders of giants”, acknowledges that science has always been about crowdsourcing, as every scholar collaborates with the scholars who came before.  Findings are not produced in a vacuum; they build upon (or challenge) previous findings.  Replication by others, which effectively crowdsources the verification of results, is at the heart of the scientific method.  It is perhaps a sign of the narcissism of our age that scientists are encouraged to believe they discover things largely independently, and so feel compelled to attack when their findings are challenged.  Yet a willingness to be wrong is essential to learning: we cannot learn to walk without falling or learn about relationships without heartbreak.  When science becomes more about ego, career, and grant money, it naturally becomes less accurate.  Insisting that findings be crowdsourced solves this.  No single study, paper, or research group can prove anything by itself.

Crowdsourcing is not simply averaging the opinions of the masses, as those who argue against that straw man would have you believe.  Mathematically, crowdsourcing is about reducing the influence of sources of error, and there is a great deal of academic research on this topic.  A good crowdsourcing algorithm does not weight all inputs equally; instead, it seeks to identify clustered sources of error, which is why aggregating across people with diverse personalities, perspectives, or job functions produces better results.  Inputs need to carry some signal amid the noise, and their errors need to be uncorrelated.  The unfortunate assumption in most research is that error is uncorrelated statistical noise that can be dealt with using statistical tests.  Yet error also arises from the unconscious biases of researchers, the sheer number of researchers hunting for novel findings, the degrees of freedom a researcher has in trying to prove a hypothesis, the non-randomness of sampling, and the sheer volume of statistical tests a researcher can choose from.  Given all these other sources of error, it is no wonder that many findings are false.  A good crowdsourcing algorithm would weight evidence such that a result counts as true only when it has been shown by multiple researchers using multiple methods, multiple samples, multiple statistical tests, and multiple paradigms.  This requires crowdsourcing: no single person can do all of this, and even if they could, they would still represent a single source of error.
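To make the uncorrelated-error point concrete, here is a minimal simulation sketch (the scenario and numbers are purely illustrative, not from any study mentioned above).  When each study's error is an independent draw, averaging 50 studies shrinks the error by roughly a factor of sqrt(50); when the studies share a clustered bias, such as a common flawed paradigm, no amount of averaging removes it:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 10.0                      # the true quantity the studies estimate
n_studies, n_sims = 50, 2000

# Independent errors: each study's error is its own random draw,
# so averaging across studies washes the noise out.
independent = truth + rng.normal(0.0, 2.0, size=(n_sims, n_studies))

# Clustered errors: every study also shares one common bias
# (e.g., the same flawed paradigm), which averaging cannot remove.
shared_bias = rng.normal(0.0, 2.0, size=(n_sims, 1))
clustered = truth + shared_bias + rng.normal(0.0, 2.0, size=(n_sims, n_studies))

for label, data in [("independent", independent), ("clustered", clustered)]:
    estimate = data.mean(axis=1)  # naive equal-weight aggregation
    rmse = np.sqrt(np.mean((estimate - truth) ** 2))
    print(f"{label:>11} errors: RMSE of the average = {rmse:.2f}")
```

Running this prints an RMSE near 0.28 for the independent case but near 2.0 for the clustered case, which is exactly why a good aggregator must identify and down-weight clustered sources of error rather than treat every input as an independent vote.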

Technology enables crowdsourcing to be conducted far more efficiently, as successful science crowdsourcing projects like Galaxy Zoo, Foldit, SETI@home, and psychology’s reproducibility project have shown.  Trends like citizen science, the quantified self, open access publishing, and interdisciplinarity improve the diversity of perspectives, which mathematically improves the ability to find truth.  Every meta-analysis, and Nate Silver’s success in aggregating polls in the last election, takes advantage of the mathematical principle that underlies crowdsourcing: aggregating across independent sources of error yields estimates closer to the truth.  In our daily lives, we all crowdsource knowledge we are uncertain about, looking for confirmation from multiple independent sources when we are skeptical.  The same skepticism serves scientists well, and scientists should embrace being wrong, confident that the broader truth will be revealed when all data are aggregated intelligently and all perspectives are valued.  Crowdsourcing is not some new technique that threatens to fundamentally change scientific research.  Rather, it is an extension of the collective effort of knowledge aggregation that is the heart of science, and scientists should embrace it as such.
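For the curious, here is a sketch of the aggregation math that meta-analysis typically uses, the fixed-effect inverse-variance weighting (the effect sizes and standard errors below are hypothetical, invented for illustration):

```python
import numpy as np

def fixed_effect_pool(estimates, std_errors):
    """Pool study estimates by inverse-variance weighting (fixed-effect model).

    Noisier studies (larger standard errors) receive less weight -- the
    simplest instance of an aggregator that does not weight inputs equally.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from five studies.
pooled, se = fixed_effect_pool(
    estimates=[0.42, 0.10, 0.35, 0.28, 0.51],
    std_errors=[0.10, 0.30, 0.15, 0.12, 0.25],
)
print(f"pooled effect = {pooled:.3f} +/- {se:.3f}")
```

Note that the pooled standard error is smaller than that of any single study: precisely the sense in which aggregating across sources of error produces estimates closer to the truth.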

- Ravi Iyer
