The Python library geoextent by the o2r project team was selected for presentation at the 2021 EarthCube annual meeting in the peer-reviewed notebooks session. In this blog post, student assistant Sebastian reports from the event.
geoextent presented at the 2021 EarthCube annual meeting
Notebooks as scholarly objects, database interoperability, FAIR workflows, connecting data and code, and tools for geosciences research were some of the topics discussed at the 2021 EarthCube annual meeting. At the event, o2r team members Sebastian and Daniel presented geoextent, a Python library designed to extract the temporal and spatial extent from data files.
We presented the library in response to the second call for notebooks for the digital proceedings of the EarthCube annual meeting, a call that follows the growing interest of the geosciences research community in reproducible workflows.
Exploring research data repositories with geoextent
The notebook, accessible through Binder, includes an introduction to geoextent’s usage and a case study in which we explored more than 300 Zenodo repositories (over 25,000 files!) with geoextent. An initial exploration of Zenodo’s API showed that spatial metadata is rarely available, which hinders data integration and discovery. The objective of our case study was to verify whether we can increase the current percentage of repositories with geospatial information on Zenodo by using geoextent.
Screenshot of presentation showing the current state of spatial metadata in Zenodo
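To make the idea of a spatial extent concrete, here is a minimal sketch of how a bounding box can be derived from a GeoJSON file and how the boxes of several files can be merged into one extent for a whole repository. It uses only the Python standard library and is not geoextent's implementation or API; the function names (iter_positions, geojson_bbox, merge_bboxes) and the sample data are made up for illustration, and the sketch assumes FeatureCollections with coordinates in longitude/latitude order.

```python
import json


def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(text):
    """Return [min_x, min_y, max_x, max_y] for a GeoJSON FeatureCollection string.

    Assumption: geometries carry a "coordinates" member; GeometryCollection
    objects are not handled in this sketch.
    """
    data = json.loads(text)
    xs, ys = [], []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        for x, y in iter_positions(geometry.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    if not xs:
        return None  # no spatial information found
    return [min(xs), min(ys), max(xs), max(ys)]


def merge_bboxes(bboxes):
    """Merge per-file bounding boxes into one extent, skipping files without one."""
    boxes = [b for b in bboxes if b is not None]
    if not boxes:
        return None
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


# Hypothetical sample data standing in for two files of one repository
file_a = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [7.60, 51.95]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString",
                      "coordinates": [[7.62, 51.90], [7.65, 51.97]]}},
    ],
})
file_b = json.dumps({"type": "FeatureCollection", "features": []})

repository_extent = merge_bboxes([geojson_bbox(file_a), geojson_bbox(file_b)])
print(repository_extent)
```

Files without any usable geometry return None and are skipped in the merge, which mirrors the situation in a repository where only some of the deposited files carry spatial information.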
Our results suggest that geoextent could be used to add spatial metadata to repositories by extracting it directly from the files deposited in them.
However, we identified a series of challenges for this approach, including geospatial information being stored in ambiguous formats (e.g., CSV and .asc files) or incorrectly georeferenced files in specialized formats (e.g., missing coordinate reference system or flipped coordinates).
This case study also provides information for the further development of geoextent to support more file formats and fix.
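Problems such as flipped coordinates or a missing coordinate reference system can sometimes be spotted with a simple range check on the extracted bounding box. The following sketch is one possible heuristic, not the check used in our case study or in geoextent; the function name classify_bbox and the example boxes are hypothetical, and the box is assumed to be given as [min_lon, min_lat, max_lon, max_lat] in degrees.

```python
def classify_bbox(bbox):
    """Classify a bounding box given as [min_lon, min_lat, max_lon, max_lat].

    Returns "plausible", "possibly flipped", or "out of range".
    """
    min_x, min_y, max_x, max_y = bbox

    def valid(lon_lo, lat_lo, lon_hi, lat_hi):
        # Longitudes must lie in [-180, 180], latitudes in [-90, 90]
        return (-180 <= lon_lo <= lon_hi <= 180) and (-90 <= lat_lo <= lat_hi <= 90)

    if valid(min_x, min_y, max_x, max_y):
        return "plausible"
    if valid(min_y, min_x, max_y, max_x):
        # Only valid after swapping the axes: latitude and longitude
        # were probably stored in the wrong order
        return "possibly flipped"
    # Values far outside degree ranges often indicate projected coordinates
    # whose coordinate reference system was not declared
    return "out of range"


print(classify_bbox([-105.3, 39.9, -105.1, 40.1]))
print(classify_bbox([39.9, -105.3, 40.1, -105.1]))
print(classify_bbox([405000, 5750000, 406000, 5760000]))
```

The heuristic has a blind spot: when both axes fall within ±90 degrees, a flipped box still looks plausible, so a check like this can only flag some of the incorrectly georeferenced files.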
Screenshot of presentation showing the results of our case study with geoextent
For more information about geoextent you can follow these links:
EarthCube meeting 2021
In addition to presenting geoextent, participating in the event gave us insight into notebooks as research objects and scientific publications. A panel discussed how the guidelines, the review process, and the selected notebooks have evolved since the first call. In the same panel, representatives of the Jupyter, R Markdown, and Matlab communities presented different tools to share research results and how they could be integrated better within the context of scientific publications.
Among the other 18 accepted notebooks we found interesting tools, for example cf_xarray, used to simplify the usage of Climate and Forecast (CF) compliant datasets by improving the metadata of files, a methodology to access OpenTopography’s Cloud Optimized GeoTIFF data for topography information, and an educational platform to learn about glaciers. All of these studies give us a picture of different geosciences research questions and how they are presented in fully reproducible workflows.