User guide: ERC creation

This user guide explains how to create an ERC by hand. In some places it is therefore limited to the mandatory elements. A fundamental goal of the ERC specification is to be simple enough to allow manual ERC creation, as demonstrated in this document. The guide is intended to make the ERC easier to understand, especially for authors of scientific publications. For tools or services that create and validate ERCs, please see the developer guide.

Note

This is a draft. If you have comments or suggestions, please file them in the issue tracker. If you have concrete changes, please fork the git repo and submit a pull request.

Step 1: create workspace

Do your research and create something useful that works for you. The ERC specification places no restrictions on the contents of a workspace, but guidelines and best practices exist and should be followed during the research itself, i.e. even before packaging the workspace in an ERC.

Code and versioning

If the base directory contains a script file or source code used to conduct the packaged analysis, we recommend managing this code with distributed version control; see the Software Carpentry guidelines. In that case, the base directory should contain a copy of the complete repository.

Workspace structure

The contents of the base directory should follow common guidelines for project organisation. Some useful resources are

Step 2: create image container for runtime

To create a working ERC you must include a complete environment description and an executable image.

We recommend using Docker to achieve these goals, i.e. providing a Dockerfile and a Docker image saved as a tarball archive file.

See the runtime manifest and image for detailed requirements.

In practice, our tool containerit may help you create a first Dockerfile, which you can then adjust as needed, e.g. to install a dependency from an unsupported source. It is important that the Dockerfile includes a command that creates the display file from the main file, i.e. one that executes the workflow and renders the output document, and that it uses a suitable working directory within the container. The working directory is important for supporting features such as file substitution.

The following Dockerfile snippet switches to the required working directory /erc and uses rmarkdown::render(..) to generate the display file display.html from the main file main.Rmd. The output directory is set explicitly to the working directory, just to be sure.

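# Use the ERC working directory inside the container
WORKDIR /erc/
# Render main.Rmd to display.html; the exec-form CMD is a JSON array and is kept on one line
CMD ["R", "--vanilla", "-e", "rmarkdown::render(input = \"/erc/main.Rmd\", output_format = rmarkdown::html_document(), output_dir = \"/erc\", output_file = \"display.html\")"]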
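
The Docker image tarball recommended in this step can be produced from the Dockerfile with the standard `docker build` and `docker save` commands. The following is a minimal sketch, assuming Docker is installed on the host; the image tag `erc-example` and the helper names are hypothetical choices made for illustration, and `image.tar` is simply the file name used in the example file tree later in this guide.

```python
import subprocess
from pathlib import Path


def image_commands(workspace, tag="erc-example", tarball="image.tar"):
    """Return the docker commands that build and export the runtime image.

    `tag` is a hypothetical image name chosen for this sketch.
    """
    workspace = Path(workspace)
    # Build the image from the Dockerfile located in the workspace.
    build = ["docker", "build", "--tag", tag, str(workspace)]
    # Export the built image as a tarball next to the Dockerfile.
    save = ["docker", "save", "--output", str(workspace / tarball), tag]
    return [build, save]


def create_image(workspace, **kwargs):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for command in image_commands(workspace, **kwargs):
        subprocess.run(command, check=True)
```

Calling `create_image(".")` from the workspace would run both commands; `image_commands(".")` only returns them, which is useful for checking what would be executed.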

Step 3: create metadata

ERC metadata

Structural and administrative metadata must be put into the ERC configuration file erc.yml as defined in the specification.

When creating the ERC manually, you can obtain a version 4 UUID to use as the id in the ERC configuration file from an online service, e.g. uuidgenerator, or from one of the many implementations available for common programming languages.
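
As one example of such an implementation, Python's standard library can generate a version 4 UUID. This is a small sketch; the helper name `new_erc_id` is hypothetical, and the printed line assumes the identifier is stored under a key named `id`, so check the specification for the exact layout of erc.yml.

```python
import uuid


def new_erc_id():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


erc_id = new_erc_id()
# Assumed form of the identifier line in erc.yml; all other
# fields of the configuration file are defined in the specification.
print(f"id: {erc_id}")
```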

License metadata

Please consult your employer or legal department to find a suitable license for your work. Make sure you hold the copyright for any code that you want to release under a license of your choice, and that the license is compatible with the license conditions of any data or software you use.

A good discussion of the legal aspects of reproducible research is given in Victoria Stodden’s “The Legal Framework for Reproducible Scientific Research: Licensing and Copyright” (public preprint).

Further resources, linked here without endorsement and without having been checked:

choosealicense.com (for code)
opendefinition.org (for code, data, text)
A short lecture on Open Licensing by Lorena A. Barba

License information must be put into the ERC configuration file erc.yml as defined in the specification.

Content metadata

Content metadata are used for making your work findable. Properties for the content metadata are defined in the specification and must be put into the metadata.json file.

Secondary metadata

As of now, we do not recommend creating secondary metadata by hand.

Secondary metadata are used by third-party services, e.g. repositories that define their own obligatory metadata. In general, they can be added in different formats to support different use cases, and they should be stored in a directory named .erc.

More information on secondary metadata can be found in the preservation section.

Step 4: validate

You can also use the container created in step 2 for validation. Run the analysis in the container, copy the analysis output to a temporary directory on the host system, and then compare the original workspace with the temporary directory according to the validation rules to ensure a complete replication.
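
The comparison step can be sketched in Python as follows. This sketch assumes that a file counts as replicated when it is byte-identical in both directories; the validation rules in the specification are authoritative and may be more lenient or exclude certain files. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def compare_directories(original, replica):
    """Report files that are missing, unexpected, or different in replica."""
    a, b = file_hashes(original), file_hashes(replica)
    return {
        "missing": sorted(set(a) - set(b)),
        "unexpected": sorted(set(b) - set(a)),
        "different": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }
```

If all three lists in the result are empty, the two directories contain the same files with the same content.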

Step 5: create bag

To create a package that is suitable for storage in an archive or repository, ERCs must be bundled as BagIt bags. See the preservation section for detailed background on the purpose of BagIt and other digital preservation aspects.

Third-party tools for creating BagIt bags

Bagger (version 2.7, Java-based, with UI)
bagit-python (Python package)

Creating the bag

In this guide we will create the bag manually by using the Library of Congress’s (LoC) tool Bagger, listed above.

Start by selecting “Create new bag” from the main menu and proceed with “<no profile>”.
Add your files with the “+” button.
Uncheck the “Standard” feature in the Bag-Info-Editor on the right and add ERC-Version with the version you want to use, e.g. 1. Optionally fill out additional Bag-Info metadata, e.g. contact information.
Save your bag using the main menu.
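
To illustrate what a tool such as Bagger writes to disk, the following standard-library sketch copies a workspace into the `data` directory of a new bag and creates the tag files and MD5 manifests. It is a simplified illustration and not a replacement for Bagger or bagit-python; the function names are hypothetical, and the default BagIt version `0.97` is an assumption that should be adjusted to the version your archive expects.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path):
    """Return the MD5 hex digest of a file."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def write_manifest(bag, name, files):
    """Write '<checksum>  <path relative to bag>' lines for the given files."""
    lines = [f"{md5sum(f)}  {f.relative_to(bag).as_posix()}\n" for f in files]
    (bag / name).write_text("".join(lines), encoding="utf-8")


def create_bag(workspace, bag, erc_version="1", bagit_version="0.97"):
    """Copy workspace into bag/data and write the BagIt tag files."""
    workspace, bag = Path(workspace), Path(bag)
    # copytree creates bag/ and bag/data/; it fails if bag/data exists.
    shutil.copytree(workspace, bag / "data")
    (bag / "bagit.txt").write_text(
        f"BagIt-Version: {bagit_version}\n"
        "Tag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "bag-info.txt").write_text(
        f"ERC-Version: {erc_version}\n", encoding="utf-8"
    )
    # Payload manifest: one line per file below data/.
    payload = sorted(p for p in (bag / "data").rglob("*") if p.is_file())
    write_manifest(bag, "manifest-md5.txt", payload)
    # Tag manifest: checksums of the tag files written above.
    tags = [bag / n for n in ("bag-info.txt", "bagit.txt", "manifest-md5.txt")]
    write_manifest(bag, "tagmanifest-md5.txt", tags)
    return bag
```

For example, `create_bag("workspace", "my-erc-bag")` would create the directory `my-erc-bag` with the payload under `data` and the four BagIt text files next to it.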

Validating the bag

A file tree for the final bagged ERC may look like this:

├── bag-info.txt
├── bagit.txt
├── data
│   ├── .erc/metadata.json
│   ├── erc.yml
│   ├── manuscript.Rmd
│   ├── display.html
│   ├── Dockerfile
│   └── image.tar
├── manifest-md5.txt
└── tagmanifest-md5.txt

You can validate your bag with Bagger by loading the bag and then clicking “Validate Bag” in the main menu. Bagger checks that the BagIt-related files are complete, verifies the integrity of the data files by computing their checksums (hashes), and reports any issues it finds.
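
The integrity part of this check can be sketched with the Python standard library. The sketch only recomputes the MD5 checksums of the files listed in `manifest-md5.txt` and does not cover everything Bagger validates, e.g. it does not detect payload files that are absent from the manifest. The function name `verify_manifest` is hypothetical.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag, manifest="manifest-md5.txt"):
    """Return a list of problems found while checking manifest entries."""
    bag = Path(bag)
    problems = []
    text = (bag / manifest).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form '<checksum>  <path relative to bag>'.
        expected, relative = line.split(maxsplit=1)
        target = bag / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif hashlib.md5(target.read_bytes()).hexdigest() != expected.lower():
            problems.append(f"checksum mismatch: {relative}")
    return problems
```

An empty list means that every file listed in the manifest exists and matches its recorded checksum.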