Introducing geoextent

geoextent is an easy-to-use Python library for extracting the geospatial extent from data files in multiple formats.

Take a look at the source code on GitHub, the library on PyPI and the documentation website. You can try out geoextent in interactive notebooks on mybinder.org by clicking the following Binder link.

Binder

Here is a small example of how to use geoextent:

geoextent -b -t -input= 'cities_NL.csv'

The output shows the file format, the coordinate reference system (CRS), the time interval and the rectangular bounding box extracted from the file, as follows:

{'format': 'text/csv',
 'crs': '4326',
 'tbox': ['30.09.2018', '30.09.2018'],
 'bbox': [4.3175, 51.434444, 6.574722, 53.217222]}
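The idea behind the bbox and tbox values can be illustrated with a minimal sketch. It assumes a CSV file with hypothetical longitude, latitude and date columns (dates in ISO 8601 form): the bounding box is the minimum and maximum of each coordinate, in the [min_lon, min_lat, max_lon, max_lat] order shown above, and the time interval runs from the earliest to the latest date. This is not geoextent's implementation and does not reproduce whatever column detection or CRS handling the library performs; it only shows the computation.

```python
import csv
import io
from datetime import datetime


def extent_from_csv(text, lon_col="longitude", lat_col="latitude", time_col="date"):
    """Compute a bbox [min_lon, min_lat, max_lon, max_lat] and a tbox
    [earliest, latest] from CSV text.

    The default column names are hypothetical, not geoextent's detection logic.
    """
    lons, lats, times = [], [], []
    for row in csv.DictReader(io.StringIO(text)):
        lons.append(float(row[lon_col]))
        lats.append(float(row[lat_col]))
        # Assumption: dates are ISO 8601 strings such as 2018-09-30.
        times.append(datetime.strptime(row[time_col], "%Y-%m-%d"))
    if not lons:
        raise ValueError("no data rows found")
    return {
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        # Formatted as DD.MM.YYYY to match the tbox in the output above.
        "tbox": [min(times).strftime("%d.%m.%Y"), max(times).strftime("%d.%m.%Y")],
    }


# Made-up sample rows, not the cities_NL.csv file from Zenodo.
sample = """name,longitude,latitude,date
a,4.5,52.0,2018-09-30
b,6.0,53.0,2018-09-29
c,5.0,51.5,2018-10-01
"""
print(extent_from_csv(sample))
# {'bbox': [4.5, 51.5, 6.0, 53.0], 'tbox': ['29.09.2018', '01.10.2018']}
```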

The input file used above was obtained from Zenodo. The map below, based on OpenStreetMap, shows the area of the extracted bounding box.

screenshot of example map

You can also get quick usage instructions on the command line:

geoextent --help
geoextent is a Python library for extracting geospatial and temporal extents of a file or a directory of multiple geospatial data formats.

usage: geoextent [-h] [-formats] [-b] [-t] [-input= '[filepath|input file]']

optional arguments:
  -h, --help            show help message and exit
  -formats              show supported formats
  -b, --bounding-box    extract spatial extent (bounding box)
  -t, --time-box        extract temporal extent
  -input= INPUT= [INPUT= ...]
                        input file or path

By default, both bounding box and temporal extent are extracted.

Examples:

geoextent path/to/geofile.ext
geoextent -b path/to/directory_with_geospatial_data
geoextent -t path/to/file_with_temporal_extent
geoextent -b -t path/to/geospatial_files


Supported formats:
- GeoJSON (.geojson)
- Tabular data (.csv)
- Shapefile (.shp)
- GeoTIFF (.geotiff, .tif)
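The default described in the help text (both extents are extracted unless -b or -t narrows the request) and the list of supported formats can be sketched with Python's argparse. This is a hypothetical stand-in, not geoextent's CLI code: it reuses only the documented -b/-t options and file extensions, takes a plain positional path instead of -input=, and handles single files only, not directories. The function name plan and the format labels are illustrative.

```python
import argparse
from pathlib import Path

# Extensions from the "Supported formats" list; the labels are illustrative,
# not geoextent's internal registry.
SUPPORTED = {
    ".geojson": "GeoJSON",
    ".csv": "Tabular data",
    ".shp": "Shapefile",
    ".geotiff": "GeoTIFF",
    ".tif": "GeoTIFF",
}


def plan(argv):
    """Parse -b/-t and a file path; with neither flag, request both extents."""
    parser = argparse.ArgumentParser(prog="geoextent-sketch")
    parser.add_argument("-b", "--bounding-box", action="store_true")
    parser.add_argument("-t", "--time-box", action="store_true")
    parser.add_argument("path")
    args = parser.parse_args(argv)

    want_bbox, want_tbox = args.bounding_box, args.time_box
    if not (want_bbox or want_tbox):
        # Documented default: extract both bounding box and temporal extent.
        want_bbox = want_tbox = True

    # Pick the format from the (case-insensitive) file extension.
    fmt = SUPPORTED.get(Path(args.path).suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported format: {args.path}")
    return {"format": fmt, "bbox": want_bbox, "tbox": want_tbox}


print(plan(["path/to/cities_NL.csv"]))
# {'format': 'Tabular data', 'bbox': True, 'tbox': True}
```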

Motivation

Geospatial properties of data can serve as a useful integrator of diverse datasets and can improve their discoverability. However, spatial and temporal metadata are rarely used in common data repositories such as Zenodo. Users may ask: what data is available for my area of interest over a specific time interval? This question formed the initial idea for a library that can serve as the basis for integrating geospatial metadata into data repositories. Because a core function is the extraction of the geospatial extent, we named it geoextent. The extents extracted with the library can be added to record metadata, which will allow users, in particular researchers, to find relevant data with less time and effort.
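To illustrate how extracted extents could answer that question, here is a minimal sketch of a search over record metadata. It assumes each hypothetical record stores a bbox in the [min_lon, min_lat, max_lon, max_lat] order of the example output and a tbox as a pair of dates; it is not a Zenodo API. A record matches when its box intersects the area of interest and its time interval overlaps the requested period.

```python
from datetime import date


def bbox_overlaps(a, b):
    """True if two [min_lon, min_lat, max_lon, max_lat] boxes intersect
    (touching edges count). Boxes crossing the antimeridian are not handled."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_overlaps(a, b):
    """True if two closed [start, end] date intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_records(records, area, period):
    """Return the ids of records whose extents match the area and period."""
    return [
        r["id"]
        for r in records
        if bbox_overlaps(r["bbox"], area) and interval_overlaps(r["tbox"], period)
    ]


# Hypothetical records with made-up extents.
records = [
    {"id": "rec-1", "bbox": [4.3, 51.4, 6.6, 53.2],
     "tbox": [date(2018, 9, 30), date(2018, 9, 30)]},
    {"id": "rec-2", "bbox": [-10.0, 35.0, -5.0, 40.0],
     "tbox": [date(2018, 1, 1), date(2018, 12, 31)]},
]
area = [5.0, 52.0, 5.5, 52.5]
period = [date(2018, 9, 1), date(2018, 10, 31)]
print(find_records(records, area, period))  # ['rec-1']
```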

Origins

The library’s source code is based on two group projects (Cerca Trova and Die Gruppe 1) of the study project Enhancing discovery of geospatial datasets in data repositories. We decided to develop the library in Python because we plan to integrate it with o2r’s metadata extraction and processing tool o2r-meta.

Process of creating the codebase

Luckily, we did not have to start from scratch but could build geoextent as a reimplementation of existing prototypes. We roughly followed these steps:

Current features

For more examples, see the documentation.

Next steps

As an immediate next step, we want to integrate the extraction of extents into o2r-meta so that users creating an Executable Research Compendium (ERC) will have to create less metadata manually. We also hope that geoextent is useful to others, and we have plenty of ideas for extending the library. For example, since geoextent is a Python project, we would like to explore integrating it into Zenodo. Most importantly, we will add support for multiple files and directories, as well as further data formats (see the project issues on GitHub). We welcome your ideas, feature requests, comments, and of course contributions!

Cite this blog post as Yousef Qamaz, Daniel Nüst. "Introducing geoextent" (2020) in Opening Reproducible Research: a research project website and blog. Daniel Nüst, Marc Schutzeichel, Markus Konkol (eds). Zenodo. doi:10.5281/zenodo.1485437

If you want to discuss the article above, find us on Twitter: @o2r_project

Except where otherwise noted, site content created by the o2r project is licensed under a Creative Commons Attribution 4.0 International License.