Reproducible Research at EGU GA - A short course recap
03 May 2017 | By Daniel Nüst, Vicky Steeves, Rémi Rampin
At last week’s EGU General Assembly, members of the o2r and ReproZip projects organized the short course “Reproducible computational research in the publication cycle”. This post is a recap of the course by Daniel Nüst, Vicky Steeves, and Rémi Rampin.
All materials for the course are published in an Open Science Framework repository at https://osf.io/umy6g/ and you can learn about the motivation for the course on the course page at EGU.
The short course was divided into two parts: a practical introduction to selected tools supporting computational reproducibility, and talks by stakeholders in the scientific publication process followed by a lively panel discussion.
In the first part, Daniel and Vicky began by sharing some literature on reproducible research (RR) with the roughly 30 participants. After all, the participants should take home something useful, and a reading list seemed reasonable both for RR newcomers and for researchers writing about reproducibility aspects in upcoming papers.
Then Daniel fired up a console and took a deep dive into using containers to encapsulate environments for reproducible computational research. He started with a very quick introduction to Docker and then demonstrated some containers useful to researchers, namely Jupyter Notebook and RStudio.
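For readers who were not in the room, here is a minimal sketch of what starting such containers can look like. It is not the course material itself: the image names `rocker/rstudio` and `jupyter/base-notebook`, the ports, and the placeholder password are our assumptions for illustration, and the helper `docker_run_command` is hypothetical. The script only assembles and prints the `docker run` commands, it does not execute them.

```python
import shlex


def docker_run_command(image, host_port, container_port, env=None):
    """Build (but do not execute) a `docker run` command as an argument list.

    Hypothetical helper for illustration only.
    """
    # --rm removes the container on exit, -p maps a host port to a container port
    cmd = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}"]
    # Pass environment variables into the container in a stable order
    for key, value in sorted((env or {}).items()):
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd


# Assumed image names and ports; adjust to the images you actually use.
rstudio_cmd = docker_run_command(
    "rocker/rstudio", 8787, 8787, env={"PASSWORD": "change-me"}
)
jupyter_cmd = docker_run_command("jupyter/base-notebook", 8888, 8888)

for cmd in (rstudio_cmd, jupyter_cmd):
    # shlex.join quotes the arguments so the line can be pasted into a shell
    print(shlex.join(cmd))
```

Pasting one of the printed lines into a terminal with Docker installed should start the respective service, which is then reachable in a browser on the mapped port.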
The material presented by Daniel is a starting point for an Author Carpentry lesson, which is currently being developed on GitHub, so he highly appreciates any feedback, especially from short course attendees. We were surprised to learn that a good portion of the participants already had some experience with Docker. But even better was realizing that a few actually hacked along as Daniel raced through command-line interface examples! This “raw” approach to packaging research in containers was contrasted in the second section.
.@nordholmen forked author carpentry to make a lesson for us today! About to look at rstudio & jupyter notebooks w/ Docker! #egu2017 pic.twitter.com/ekgYuJPkS6
— Vicky Steeves (@VickySteeves) April 24, 2017
Under the title “ReproZip for geospatial analyses”, Vicky and Rémi showcased ReproZip, a tool for automatically tracing and packaging scientific analyses for easily achieved computational reproducibility. The resulting file is a ReproZip package (.rpz), which can be easily shared due to its small size, and contains everything necessary to reproduce research (input files, environmental information etc.) across different operating systems. They demonstrated their various unpackers and showed how these .rpz files can be used for reproducibility and archiving. They also demoed their brand new user interface for the first time in Europe.
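The trace, pack and unpack cycle described above can be sketched as a short sequence of command-line calls. The following is our own illustration and not taken from the course materials: the analysis command `python analysis.py`, the package name `analysis.rpz`, the target directory, and the helper `reprozip_workflow` are hypothetical, and the Docker unpacker is just one of the available unpackers. The script prints the commands and does not run them.

```python
import shlex


def reprozip_workflow(command, package="analysis.rpz", target="analysis-docker"):
    """Return the command lines for tracing, packing and unpacking an analysis.

    Hypothetical helper for illustration only; nothing is executed.
    """
    return [
        # 1. Run the analysis once while ReproZip traces files and dependencies
        ["reprozip", "trace", *command],
        # 2. Bundle everything the trace found into a single .rpz package
        ["reprozip", "pack", package],
        # 3. On another machine: unpack the package with the Docker unpacker
        ["reprounzip", "docker", "setup", package, target],
        # 4. Re-run the original analysis inside the unpacked environment
        ["reprounzip", "docker", "run", target],
    ]


steps = reprozip_workflow(["python", "analysis.py"])

for step in steps:
    print(shlex.join(step))
```

Steps 1 and 2 happen on the machine where the analysis was originally run, steps 3 and 4 wherever someone wants to reproduce it.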
The materials presented by Vicky and Rémi are also available both on the Open Science Framework and on the ReproZip examples website.
@edzerpebesma @benmarwick @o2r_project And @VickySteeves and @remram44 showing #reprozip pic.twitter.com/4hxpEsmqPN
— Daniel Nüst (@nordholmen) April 24, 2017
The practical demonstrations paved the way for the second part of the short course, which was more abstract yet proved to excellently demonstrate the breadth of reproducible research. Selected speakers provided their perspectives on the topic of reproducing scientific papers in the broader context of the scientific publication cycle. In short talks they each took on a specific role in the scholarly publication process and shared their experience as researcher, infrastructure provider, publisher, reviewer, librarian, or editor. The speakers:
- Edzer Pebesma talked about his experiences as journal editor for JStatSoft as well as Computers & Geosciences. As author of the prize-winning “one-click reproduce” concept and initiator of o2r, he also shared his original motivation to enter the area of reproducible research: annoyance at not being able to easily share the full integrated material of his work.
- Tobias Weigel from the German national climate computing center introduced the challenges and limitations for a supercomputer facility which provides crucial resources for reproducibility.
- David Ham shared the priorities of the journal Geoscientific Model Development (GMD), where he is editor, when it comes to reproducibility, and the issues they face. Proper provenance and citations are examples of the former, the ephemerality of code and data of the latter.
- Xenia van Edig led us through the stages of Open Access that Copernicus went and is going through as a publisher: from public data (1.0) via interactive articles and public peer review (2.0) to the future of open science and executable papers (3.0).
- Vicky Steeves advertised the expertise of librarians worldwide in supporting research in all aspects, including reproducibility, writing grants, or data management plans, but also pointed out the necessity to support scientists with proper tools and teach the required skills.
- Daniel Nüst (research software engineer perspective)
All speakers touched on the topic of scientific culture, which was seen as changing towards more openness, but with still quite some way to go. The cultural aspects and larger scale challenges were a recurring topic in the panel discussion after the short talks. These aspects included resistance to sharing supplemental material, so that journals cannot make sharing everything mandatory, for example because of unwillingness (fear of stealing) or because authors might not be allowed to do so. A member of the audience shared that in their experience as a publisher, requiring data and software publication did not result in a decrease in submissions when accompanied by transparent and helpful author guidelines. Such guidelines for both data and code are lacking for many journals but are a means to improve the overall situation - and make the lives of editors simpler. When the progress of the last years on Open Data was pointed out as largely a top-down political endeavour, the contrast to Open Source as a bottom-up grassroots initiative became clear. Nevertheless, the hope was expressed that with the success of Open Data, things might go smoother with Open Source in science.
A further topic the discussion covered for some time was credit, and the need to update the ways researchers get and give credit as part of grant-based funding and publishing scholarly articles, though it was also pointed out that RR is about “doing the right thing”. Credit and culture were seen as closely linked topics, which can only be tackled by improving the education of scientists, both as authors and reviewers(!), and by spreading the word about the importance of reproducibility for all of science, not least in the light of the Marches for Science taking place just a few days before the short course.
While one could say we were mostly preaching to the choir, it was great to see an interest in the topic of reproducible research amongst EGU attendees. As the first workshop of its kind at the EGU General Assembly, it was hopefully a step towards even higher visibility of and interest in RR as a crucial topic in today’s research.
We thank the short course attendees and invited speakers for turning the first afternoon of EGU 2017 into an instructive and diverting few hours. Will there be a reproducible research short course next year at EGU? We don’t know yet, but please do get in touch if you would like to support the planning. It could be worth providing a longer course targeted at early career scientists, giving the next generation the tools to work reproducibly.