New article published in International Journal of Geographical Information Science
17 Dec 2018 | By Markus Konkol, Daniel Nüst
A few weeks ago, a new journal article written by o2r team member Markus was published. In our last article, we talked about the reproducibility of papers submitted to the AGILE conference. We checked whether the papers had materials attached and whether these materials were complete. The results were rather unfortunate. In our newest article, we went one step further and tried to re-run the analyses of articles that had code and data in the supplements.
Markus Konkol, Christian Kray & Max Pfeiffer (2019). Computational reproducibility in geoscientific papers: Insights from a series of studies with geoscientists and a reproduction study, International Journal of Geographical Information Science, 33:2, 408-429, DOI: 10.1080/13658816.2018.1508687
The article builds upon our paper corpus for demonstrating the o2r platform. Feel free to distribute this piece of research to whoever might be interested. Feedback is always welcome.
Here is a non-specialist summary:
Recreating scientific data analysis is hard, but important. To learn more about the state of reproducibility in geosciences, we conducted several studies. We contacted over 150 geoscientists who publish and read articles based on code and data. We learned that as readers they often would like to have access to these materials, but as authors they often do not have the time or expertise to make them available.
We also collected articles which use computational analyses and tried to execute the attached code. This was not as easy as it sounds! In this publication, we describe the numerous issues we ran into in a structured way, along with our experiences. Some issues were pretty easy to solve, such as installing a missing library. Others were more demanding and required deep knowledge of the code, which is, as you might imagine, highly time-consuming. Further issues were missing materials (code snippets, data subsets) and flawed functionalities. In some cases, we contacted the original authors who were, and this was a positive outcome, mostly willing to help.
We also compared the figures we got out of the code with those contained in the original article. Bad news: we found several differences related to the design of the figures, as well as results that deviated from those described in the paper.
OK, this is interesting, but why is it important? We argue that a key advantage of open reproducible research is that you can reuse existing materials. Apparently, this is usually not possible without significant effort. Our goal is not to blame authors. We are very happy that they shared their materials. But they shared them with a specific purpose in mind: making code and data available and reusable so that others can build upon them. One incentive in this context is an increased number of citations, one of the main currencies for researchers.
To facilitate that, we suggest some guidelines to avoid the issues we encountered during our reproducibility study, such as using Executable Research Compendia (ever heard of them? :)).