o2r student assistant shares her impressions of reproducibility, ready to start a career in research
15 Jun 2020 | By Laura Goulier
“Geoscientist with experience in or willingness to learn R programming for reproducible research wanted!”
I had just completed a beginner course in R programming for my master’s thesis and saw my chance to develop this knowledge further and enter the field of geoinformatics, even to get a little away from the pure ecology of my master’s studies in landscape ecology. I had never heard the words “reproducible research” before, nor any reason why the topic might be important. So I took the job and worked my way in. After a couple of months I had to realise that, in order to publish my master’s thesis, the journal obliged me to make all code and data openly available so that other researchers could fully understand and reuse my analysis. And there I was, a landscape ecologist who had believed I had nothing to do with reproducible research. Apparently it is important after all, yet not that simple.
During my work in the o2r project I experienced first hand the whole range of reasons why people struggle so much to make their work reproducible for others. The main argument, also for me, was the giant amount of additional work. Is it really worth it, I thought? I also believed I had my own structure while scripting, and that it would be much easier to script primarily so that I understand my analysis myself, not so that other people do. “I would have to spend an entire extra year on my PhD, just to prepare all scripts again for everyone to comprehend”, some PhD students from the atmospheric sciences told me. The desire for reproducibility in research is not always met with an open door. But maybe it is the same as for everything else. A clean method of working should always be the goal. Students in school should write cleanly so that the teacher can understand their essays. Every company needs a well organised structure to be successful. Scripting so that only I and no one else can understand what has been calculated may have its benefits in the short term, as I understand my own work because of the embedded history and context. After two years at the latest, however, not even I could look through my work and answer specific questions about my calculations. If we are honest, it happens far too often that we no longer know exactly what we were thinking at the time we made that one small change or attempted to fix that nasty bug. We tend to lose track of which scripts contain which results, how a certain parameter was calculated, or what the results would look like if we changed certain values. Getting it right from the beginning is not an extra effort though; it is just a change in the way we work, which saves us time in the long run. And not only us, but also many others who no longer need to find answers to the same questions or redo complex analyses themselves.
Now that I have finished my master’s thesis, my time in the o2r project is over and I am starting my PhD in terrestrial data analytics at Jülich Research Centre, investigating the impact of human water use on atmospheric extremes. During my job interview, they asked me for quite a lot of details about my work at o2r, about limitations and obstacles, about difficulties and successes. This signals to me that reproducibility is not only gladly implemented, but is also an inevitable change that everyone must consider and adapt to, even if it is sometimes bothersome and entails some difficulties that we did not have to think about before. For me, reproducibility also has a social component: doing things not only for oneself, but also to make others’ work easier and to let them benefit from one’s own methods. Into my PhD, I am taking along the aim of further improving my way of working towards best practice, because it certainly takes a lot of training. As a beginner in academia, I strongly hope to be helped by detailed insights into the scripts of more experienced scientists in order to facilitate my own research.
The o2r team thanks Laura for her contribution to the project. She did great work bridging between geoinformatics and landscape ecology and contributed greatly, among other things, to a paper on platforms for reproducible research. We wish her the best of luck for her future academic career!