How Do We Deal with Non-Confirmatory Results?

Fig. 1. Photo of some members of the research team on one of our experimental small islands. Photo by J. Losos.

Thanks to Nick for doing a new research post when our paper “Consumer responses to experimental pulsed subsidies in isolated vs. connected habitats” first came out. Here I want to give some backstory on the road to publication (all views are my own).

This was an epic experiment overall: 52 experimental units, 4+ years, thousands of person-minutes of lizard surveys, thousands of food web stable isotope samples, several tons of seaweed, and one hurricane that washed it all away.

I think the most interesting thing about this paper is that we did not find what we expected.

For some biological background: a meta-analysis of largely observational studies (Yang et al. 2010) found that populations increase the most, and the fastest, when consumers respond to resource pulses (brief, unpredictable periods of resource superabundance) via both aggregation and reproduction. To test the prediction that the numerical response would be slower and smaller without aggregation, in the current study we manipulated seaweed on mainlands (as in our previous studies, e.g., Spiller et al. 2010, Wright et al. 2013) and also on very small islands (Fig. 1), where aggregation on ecological time scales is not possible.

Despite a bigger N this time around, we did not replicate the numerical response on mainlands that we saw in Spiller et al. (2010). In other words, more seaweed did not result in more lizards on mainlands. In contrast, we saw fast and large population gains on small islands. We did replicate the timing and magnitude of the diet shifts, indicating that lizards were consuming the subsidized resources. So whether resource pulses translate into more individuals is context-dependent, even with the same researchers using the same methods on the same species. In the discussion we talk about what could be driving these differences.

Now to my main story with this post: what happens when you have un-tidy, non-confirmatory results? The first reviews, at a top-tier ecology journal, were very positive about the generality of the questions and the realistic temporal and spatial scale of the experiment. We were rejected for not being able to explain the mechanisms; fair enough. However, the same critique would have applied even if we had confirmatory results, and I don’t think we would have drawn it then, or at least it would not have weighed so heavily on the editorial decision. We next tried a second-tier ecology journal, and were rejected without review.

I was up for tenure the year this paper was going through the review process. Pretty much the only way the paper would be accepted pre-dossier would be to go back to the first journal and accept their original offer to shunt it to their online-only sister journal. I have happily made that call in the past, given different trade-offs. In this case, I felt the rejections were largely driven by the non-confirmatory results, which I stubbornly believed did not compromise the quality of the paper. To me, good science is asking good questions (i.e., questions rooted in theory) with good design; the value of a paper is not predicated on the outcome of the study. I asked some senior profs in my department for advice and got both “a published paper is better than no paper” and “do what you would have done regardless of tenure.” I went with the latter because, at that point, I felt one paper was not going to make or break my diverse contributions over the prior five years.

I decided to try next at The American Naturalist for a couple of reasons. One, their checklist for authors signals values similar to mine, such as asking authors to indicate whether the study was pre-registered. Two, by chance, Dan Bolnick, the current Editor-in-Chief, was in my session at the ESA meeting, where he announced that he would be holding “office hours” to promote submissions to AmNat. I had never pitched a paper to an editor before, but this was made easier since (also by chance) I know Dan from grad school.

I gave Dan my two-minute pitch, emphasizing that we had unexpected results that we couldn’t fully explain. He opened his response with “I sympathize…,” and I braced for the polite rejection. But he meant that he literally sympathized: he had published a study with confirmatory results in a high-profile journal, only to have a later replication with more data come back non-confirmatory and end up several tiers down. He encouraged me to submit (with no guarantees, of course), and I leaned hard into our unexpected results and lack of replication, from the cover letter through the supplemental material, being as transparent as possible. It was still a tough and long review process, and the paper has several real limitations, but I am gratified that it got into a top journal on its merits as planned, warts and all, without spin.

I think we haven’t seen a replication crisis in ecology for two main reasons. One is that big field experiments like our pulsed-subsidy studies are rarely repeated (for lots of reasons); the other is that ecologists are very comfortable with context-dependency. But how often is a lack of replication due to real biological differences that are useful to understand (as I argue was the case in our paper) vs. the statistical issues that plague other disciplines? Ecologists are often taught to cope with non-confirmatory results by reframing to “tell the story you have,” which runs the risk of HARKing (hypothesizing after the results are known), one of the four horsemen of the reproducibility crisis. Preferences for confirmatory results help drive these practices. In our study, the questions, hypotheses, and design were essentially pre-registered in the grant that funded the work, and staying committed to a plan regardless of the results is the best defense against the garden of forking paths.

As for studies rarely being repeated in the first place, I am haunted by a review of restoration studies by Vaughn and Young (2010), which found that fewer than 5% of studies were initiated in more than one year, and that 76% of the studies that did use multiple years found different results in different years. To me this means that we should not inhale too deeply on any single study, that we should focus more on replication and less on novelty, and that our inability to replicate some of the results of Spiller et al. (2010) is a feature, not a bug!

If you are interested in learning more about this system, check out Piovia-Scott et al. (2019), which shows that the strength of top-down control by lizards varies predictably over the course of the pulse.


4 Comments

  1. Rick Wallach

    Much enjoyed this; I inhabit the arts myself (and an overgrown Miami property with all kinds of introduced and hybrid anoles), so it’s nice to see a scientist come to terms with the same sorts of capricious peer-review habits as, say, literary critics.

    But I did enjoy the “El jardín de senderos que se bifurcan” reference and the paper to which it referred. Elliptical Borges references always delight me. Speaking of which, perhaps the solution to the replication crisis may be found in another Borges story, “Tlön, Uqbar, Orbis Tertius.” I suspect the “hrön” could well become ecological confirmability’s answer to the WIMP in dark matter physics.

    Well, it’s worth a try, innit?

    • Amber Wright

      Having ecologists read more Borges certainly couldn’t hurt!

  2. Amber, thanks for this very interesting comment. Personally I am a big believer in chance, variation, and context dependence. In my experience, the year-to-year variation in basic field/natural history observations is extreme. Any single-year study would grossly misrepresent reality. Similarly, in lab-based functional morphology, we have found among- and even within-individual variance in functional attributes equivalent to differences identified in the literature as diagnostic of large taxonomic groups! I believe that the systemic emphasis on quantity over quality has largely driven these problems, including the sense that replicating prior results is less likely to advance one’s career than a novel result, no matter how bogus. But then, I am a cynic.

    • Amber Wright

      I agree that career pressures play a role. I have never been particularly strategic, but I am looking forward to making this the core of my mission now that I am transitioning to mid-career. I aim to promote these values in my own work, in how I train students, and in my role as a reviewer of grants, papers, and job applicants. We are the field, we reflect it, we can change it.
