Good science gone wrong?

Paul Gertler, professor, Haas School of Business and School of Public Health | August 3, 2015

Most scientists want to tell the truth. We want to help people by answering important questions, and sharing what we learn. But the research endeavor is big and messy. And as we’ve learned from the climate change and HIV/AIDS debates, there will always be folks who favor controversy, dogma, and press coverage over scientific consensus.

Sadly, last week we saw a major step backward in global health, with the launch of a media frenzy around children’s deworming.

The blogosphere is buzzing about a group of epidemiologists at the London School of Hygiene and Tropical Medicine, led by Alex Aiken, that carried out the verification and reanalysis of a landmark 2004 trial of school-based deworming.

The original study, by Michael Kremer and Ted Miguel, measured school participation across two groups of children in Western Kenya: those who received deworming treatments, and those who did not (although the control group received the pills a few years later). The study was unique because Miguel and Kremer took care to measure the spillover effects of deworming: treating some kids can interrupt transmission for nearby children who are not treated, which provides an important benefit.

The punchline from the original study was profound: kids who received deworming were absent from school 25% less than those in the control group. Essentially, healthier kids were spending more time in the classroom. A long-term follow-up with the same study population suggests that, on average, kids who were dewormed are earning 20% more as adults.

Parallel studies find similar results, including an evaluation of mass deworming in Uganda and a retrospective evaluation of a deworming campaign in the Southern United States, implemented in the early 1900s by Rockefeller.

There is a consensus among many academic researchers, the World Health Organization, and other technical experts that children’s deworming campaigns represent good value for money, particularly in communities with a heavy burden of worm infections. As a result, more than 100 million Kenyan and Indian kids are now receiving deworming medicines, with support from their governments.

But the re-analysis by the London School suggests a departure from this consensus, and indeed a departure from standard statistical methods. The team does verify that deworming increases school attendance in a “pure” replication of the study, but they also carry out a statistical re-analysis (Davey et al) that introduces several unconventional judgments into the mix.

They show that you can eliminate the impact of deworming on school attendance — if you torture the data. In summary, they:

  • redefine the way the treatment was assigned in the study in a way that is not consistent with the protocols that were used to assign treatment,
  • massage the measures used to estimate school absenteeism (i.e., use non-standard and controversial reweightings of the data), and
  • break the sample into sub-groups, which dramatically reduces statistical power
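The cost of the third choice can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses a normal approximation for a simple two-arm comparison of means. The effect size (0.2 standard deviations) and the sample sizes are hypothetical values chosen for illustration, not figures from the Kenya trial, and the approximation ignores the school-level clustering in the actual study design.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means, in standard-deviation
    units (hypothetical here). n_per_arm is the number of independent
    observations in each arm.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_arm / 2) ** 0.5
    # Probability the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical numbers: the same true effect, analyzed once on the full
# sample and once on a subgroup containing half of it.
full_sample_power = approx_power(effect_size=0.2, n_per_arm=400)
subgroup_power = approx_power(effect_size=0.2, n_per_arm=200)

print(f"full sample: {full_sample_power:.2f}")
print(f"half sample: {subgroup_power:.2f}")
```

Under these assumed numbers, halving the sample takes the chance of detecting a real effect from roughly 0.8 to roughly 0.5, so a null result in a subgroup is weak evidence that the effect is absent.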

This fishing exercise was “successful” in getting results that eliminated the effects. One wonders how many other unreported analyses were done until the authors were able to find this particular set of approaches.

Most of my colleagues in the academic community would reject this sort of data mining. I’ve reviewed the replications myself, and I have major concerns with their methodological choices. And this isn’t about a difference between public health and economics. I sit in a public health school and in an economics department, and my colleagues on both sides of campus use the same statistics. We both run randomized, controlled trials. And while we may use different jargon, we share the same methods.

I’m not the only researcher questioning the choices made by Aiken and Davey (see posts from Chris Blattman, Berk Ozler, Michael Clemens, and Alexander Berger).

And I do think that replications are an essential part of science. Existing data should be reanalyzed by independent, outside researchers. This is how we uncover truth. The recent case of LaCour and Green is a great example of this: two graduate students discovered inconsistencies in a study of voter behavior in California, which prompted retraction of the flawed publication.

But here’s the rub: what if the replication, itself, is widely seen as flawed? I think that replications should be held to the same standards as any other study.

In an ideal world, journalists would catch this subtlety. But science requires a great deal of judgment, and one researcher’s judgment might not stand up to the broader community’s scrutiny. This is why an independent peer review process is so important. It makes me wonder why Aiken et al decided to publish their work in the International Journal of Epidemiology. The editor-in-chief of the IJE is a colleague in the same department as the replication authors. While there is no evidence that the Journal loosened its standards for this paper, it does make one wonder. Frankly, for an issue as sensitive as children’s deworming, the optics would be better had the authors chosen a more independent venue for publishing their work.

If the media’s process of self-correction works, journalists will start reporting some of the more critical views of this reanalysis. But so far, they have narrowly focused on the replication’s findings, without giving a voice to the other academics (including the original study authors) who are raising concerns.

So as of last week, journalists and bloggers were reporting that deworming no longer holds up. They are claiming that it has been “debunked” as a cost-effective way to improve children’s school participation. Yet these reporters never contacted the original study authors, nor did they look at other studies that found similar effects in other settings. This is irresponsible, because government policy is highly vulnerable to claims made by the media.

In the early 2000s, I was a policymaker at the World Bank – chief economist for human development. My job involved reviewing the evidence and deciding how to apply it. We funded programs like mass deworming based on a large amount of evidence from multiple studies. But we would never consider overturning an evidence-based program simply because of one bad replication.

For full disclosure, I should also mention that I am at the same university but not the same department as one of the original 2004 study authors, and I chaired the board of the International Initiative for Impact Evaluation (3ie), the organization that paid the researchers to conduct this replication. I am no longer involved with 3ie, and I was not part of the decision to fund the study. However, I have been involved in other replications funded by 3ie, and the experience has not been positive. As an academic community, we need to come together and set common standards for replication. This is not something we should outsource to researchers for hire.

A final note, on transparency: some articles suggest that data and code from the 2004 study were just recently released — but in fact they were made available more than 8 years ago (in January 2007) and have been shared with dozens of scholars. Back in 2007, Miguel and Kremer did acknowledge a series of minor rounding errors and a coding error in the original work, which caused them to overestimate the impacts of deworming on nearby untreated children.

But they took the time to correct their errors, and they published an updated set of results online. They also actively disseminated the new results to key stakeholders. This makes the story a bit messier, but that’s the process of discovery. It’s never linear.

It would be great if every research group (and journalist!) showed such a commitment to openness and scholarly values. Without this commitment, we will probably keep rolling back our progress in science — rather than advancing the frontiers.

Comments to “Good science gone wrong?”

  1. I am really amazed at the amount of criticism that this review is getting from the initial study camp. As a research team member, I saw things that I did not like when I got to see the original study from 2004, like how unethical it is to deprive some schools included in the study of the deworming treatment, even though they got it later. Maybe this lies in the fact that it was a study led by economists, willing to prove a point to justify massive amounts of funding being drawn from public-health institutions to end in pharma companies. Maybe I am wrong.

    But it is very interesting to see how Mr. Gertler is very quick to indicate some sort of bias from the IJE towards the replicating authors, but very quickly dismisses any correlation between his acid article and the fact that one of the original authors works in the same university as him…is it not the same…ah!, no, different department…I see.

    I also understand that not all the data obtained in the original study was used in the 2004 study and therefore the result was also “massaged,” so we have the same accusation going both ways. Everyone knows how a P value can be swerved one way or another if needed.

    This whole question will only be answered when more replications of the study are done, not by dismissing an article and vilifying the authors just because they have challenged the results of the one and only study about this matter.

  2. Journalists can mostly be excused from blame in this controversy because their news source, The Journal of Epidemiology, has a solid reputation and it is rather unrealistic to expect journalists to attempt to dig deeper into a detailed science article written by-and-for specialists with PhD degrees. However, in their coverage, journalists should have run a counterpoint quote from the authors of the original study to provide balance.

    The top editorial staff at the Journal of Epidemiology (G Davey Smith, S Ebrahim, E Thorn, J Ferrie) must be called to account as to why and how the decision was made that the reinterpretation met scientific standards and got green-lighted for publication in this prestigious journal.

    It’s simply not enough that the original study authors who felt maligned got their grievance published in response. There must be an independent assessment of this controversy to both maintain the esteemed status of the Journal of Epidemiology and to verify the efficacy of this and by extension other deworming programs.

    Macroparasitic diseases are a primary and fundamental bane of human existence …. so on a personal/intuitive level, would not the data-reinterpreting skeptics of deworming’s beneficial outcomes actually have chosen if in the same situation to have had their own kids dewormed?!

  3. How on earth have the original authors and their friends gotten it into their heads that the replication authors were trying their hardest to prove the original study wrong? They weren’t. They simply did a new analysis of the data using methods that made sense to them (and that make sense to many epidemiologists and economists). That’s all.

    The media buzz is not the fault of the study authors and the vast majority (if not all) of the blogs that have since been written on the topic are by people who are vehemently criticizing the replication. The replication authors state very clearly that the case for deworming should not be based on one study and that one needs to look at the whole body of evidence in order to make an informed decision about whether it ‘works.’ Your implication that the publication of the replications in IJE was an inside job is completely unprofessional.
