Templates

Important: this section applies when you are running your own instance of the FAIR Data Station.

An important part of the metadata registration form is the metadata.xlsx file, which is bundled as a resource (src/main/resources/metadata.xlsx). The current version can be found here. This Excel file contains all terms in the Terms sheet, while package restrictions can be recorded in one or more other sheets. Each level contains some obligatory elements, such as the identifier or description of an object. Whether an element is obligatory can easily be changed through the Requirement column.

The metadata file

The Excel file is embedded in the JAR package. On startup, the file is copied into the fairds_storage folder. You can modify this copy, and after a restart the application automatically uses it for all validation. New packages can be added as rows above or below the existing ones.

Sheets overview:

| Sheet name | Description |
|---|---|
| Regex | Contains a lookup list of commonly used regex variables for the other sheets. |
| Terms | Contains all the terms used in the metadata. |
| ISA-levels | Contains all the packages used in the metadata. |

The metadata.xlsx file contains two reserved sheets: Regex and Terms.
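You can read the sheet names directly from the workbook to check which sheets a given metadata.xlsx contains and which ones are not reserved. An .xlsx file is a ZIP archive whose xl/workbook.xml part lists the sheets. The Python sketch below uses only the standard library to do this. It is not FAIR Data Station code. The path fairds_storage/metadata.xlsx and the case-insensitive match on "Regex" and "Terms" are assumptions made for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the SpreadsheetML elements inside xl/workbook.xml
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Reserved sheet names assumed by this sketch, compared case-insensitively
RESERVED = {"regex", "terms"}


def sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx workbook in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]


def split_sheets(names):
    """Split sheet names into (reserved sheets, package sheets)."""
    reserved = [n for n in names if n.lower() in RESERVED]
    packages = [n for n in names if n.lower() not in RESERVED]
    return reserved, packages


if __name__ == "__main__":
    # Hypothetical location of the copy made at startup
    path = Path("fairds_storage") / "metadata.xlsx"
    if path.exists():
        reserved, packages = split_sheets(sheet_names(path))
        print("reserved sheets:", reserved)
        print("package sheets:", packages)
    else:
        print(f"{path} not found")
```

Reading workbook.xml directly avoids a third-party dependency. A library such as openpyxl would be the more natural choice if you also need the cell contents.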

The Regex sheet contains the following columns

| Short Hand Form | Long Form | Example | Name | Description |
|---|---|---|---|---|
| {file} | .* | GXB01322.fast5 | Internal variable for file objects | Match a file |
| {text} | .* | This is a random example text. | Any text pattern | This matches any text. |
| {dna} | [ACGTUWSMKRYBDHVN]+ | AAAGGGTGGAAA | DNA pattern | Match a string of nucleotides |
| {email} | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ | example.email@adress.com | Email | An email address |

In this table, {file} is not restricted. When {dna} is used in one of the terms, it is automatically translated to the regex [ACGTUWSMKRYBDHVN]+. Short hand forms for more complex regexes, such as {email}, keep the Terms sheet much cleaner.
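The short hand substitution can be sketched in a few lines of Python. This illustrates the idea and is not the FAIR Data Station implementation. The lookup table is copied from the example rows above. Two behaviours are choices made for this sketch: each long form is wrapped in a non-capturing group, and unknown names are left untouched so that quantifiers such as \d{2} (used in the Citation term) are not mistaken for placeholders.

```python
import re

# Short hand forms and long forms copied from the example Regex sheet
SHORTHANDS = {
    "file": r".*",
    "text": r".*",
    "dna": r"[ACGTUWSMKRYBDHVN]+",
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(syntax, shorthands=SHORTHANDS):
    """Replace every known {name} in a value syntax by its long form."""
    def repl(match):
        long_form = shorthands.get(match.group(1))
        if long_form is None:
            # Not a short hand form (e.g. the {2} in \d{2}): keep it as is
            return match.group(0)
        # A non-capturing group keeps the long form intact when it is
        # combined with other text or quantifiers in the same syntax
        return f"(?:{long_form})"
    return PLACEHOLDER.sub(repl, syntax)


def is_valid(value, syntax, shorthands=SHORTHANDS):
    """Return True when the whole value matches the expanded syntax."""
    return re.fullmatch(expand(syntax, shorthands), value) is not None
```

For example, expand("{dna}") yields (?:[ACGTUWSMKRYBDHVN]+). The example values from the table pass, so is_valid("AAAGGGTGGAAA", "{dna}") is True.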

The Terms sheet contains the following columns

Columns: Item label, Value syntax, Example, Preferred unit, URL and Definition. Example rows (empty cells are omitted):

Item label: 16S recovered
Value syntax: (No|Yes)
Example: Yes
Definition: Can a 16S gene be recovered from the submitted bin, SAG or MAG?

Item label: 16S recovery software
Definition: Tools used for 16S rRNA gene extraction. Add names and versions of software(s), parameters used

Item label: Chlorophyll Sensor
Value syntax: {number}
Example: 5 mg Chl/m3
Preferred unit: mg Chl/m3
Definition: Fluorescence of the water measured in volts and converted to milligrammes of chlorophyll per cubic metre. Format: ##.####, SDN:P02:75:CPWC, SDN:P06:46:UMMC for mg Chl/m3. Example: 0.066.

Item label: Citation
Value syntax: ^(doi\:)?\d{2}\.\d{4}.*$
Definition: Citation of the Sample Registry (HTML version) at the PANGAEA. Example: doi.pangaea.de/10.1594/PANGAEA.76752.

Item label: Demultiplexed forward file
Value syntax: {file}
Example: NG-13425_Fyig_005_lib124679_5331_4_1.fastq.gz
Preferred unit: .fq.gz|.fastq.gz|fastq.bz2|fq.bz2
URL: http://fairbydesign.nl/ontology/file
Definition: File path or name of the forward reads when working with demultiplexed reads

Item label: Demultiplexed reverse file
Value syntax: {file}
Example: NG-13425_Fyig_005_lib124679_5331_4_2.fastq.gz
Preferred unit: .fq.gz|.fastq.gz|fastq.bz2|fq.bz2
URL: http://fairbydesign.nl/ontology/file
Definition: File name of the reverse reads with demultiplexed reads

Item label: Department
Value syntax: {text}
Example: Laboratory of Systems and Synthetic Biology
URL: http://schema.org/department
Definition: The department this person belongs to

Item label: Depth
Value syntax: {number}
Example: 10m
Preferred unit: m
URL: https://w3id.org/mixs/terms/0000018
Definition: The distance below the surface of the water at which a measurement was made or a sample was collected. Format: ####.##, Positive below the sea surface. SDN:P06:46:ULAA for m. Example: 14.71
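Each row above pairs a Value syntax with an Example, and some rows also set a Preferred unit. According to the column definitions that follow, the unit becomes part of the validation regex and examples are checked at startup. The Python sketch below shows one way such a startup check could look. It makes two assumptions. First, the {number} long form is hypothetical, because it is not part of the Regex sheet excerpt. Second, syntax and unit are combined by appending the escaped unit alternatives after optional whitespace, which may differ from what FAIR Data Station actually does.

```python
import re

# Long forms used by this sketch. {file} and {text} come from the Regex
# sheet example; {number} is a hypothetical pattern added for illustration.
SHORTHANDS = {
    "file": r".*",
    "text": r".*",
    "number": r"-?\d+(?:\.\d+)?",
}


def expand(syntax):
    """Replace known {name} placeholders by their long form."""
    def repl(match):
        long_form = SHORTHANDS.get(match.group(1))
        return match.group(0) if long_form is None else f"(?:{long_form})"
    return re.sub(r"\{(\w+)\}", repl, syntax)


def term_regex(syntax, preferred_unit=""):
    """Build the validation regex for one term.

    Assumption: the '|'-separated units are matched literally and must
    follow the value, optionally separated by whitespace."""
    pattern = expand(syntax)
    if preferred_unit:
        units = "|".join(re.escape(u) for u in preferred_unit.split("|"))
        pattern += rf"\s*(?:{units})"
    return pattern


# (Item label, Value syntax, Example, Preferred unit) from the rows above
TERMS = [
    ("16S recovered", "(No|Yes)", "Yes", ""),
    ("Chlorophyll Sensor", "{number}", "5 mg Chl/m3", "mg Chl/m3"),
    ("Demultiplexed forward file", "{file}",
     "NG-13425_Fyig_005_lib124679_5331_4_1.fastq.gz",
     ".fq.gz|.fastq.gz|fastq.bz2|fq.bz2"),
    ("Department", "{text}", "Laboratory of Systems and Synthetic Biology", ""),
    ("Depth", "{number}", "10m", "m"),
]


def check_examples(terms):
    """Return the labels whose example does not match their own regex."""
    return [label for label, syntax, example, unit in terms
            if re.fullmatch(term_regex(syntax, unit), example) is None]
```

With these rows, check_examples(TERMS) returns an empty list. A file named reads.bam would be rejected for the demultiplexed file terms because it ends in none of the listed extensions.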

Each column of the Terms sheet is defined as follows:

Item label: A human-readable label, used in the Excel headers and as the rdfs:label of the property in the ontology.

Value syntax: The expected format of the value (numeric, date, string, unit or regular expression). It is validated through regular expressions against the provided example (e.g., {float} {unit}).

Example: An example value showing how it should be written (e.g., 410 parts per million). It is also used for validation during startup.

Preferred unit: The unit of measurement that is preferred but not obligatory. When it is set, it automatically becomes part of the regex used for validation. It can contain a list of units separated by a ‘|’.

URL: The RDF property URL. When no URL is defined, the default URL space + / + structured comment name is used instead.

Definition: The definition of the structured comment name.
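The URL fallback rule can be illustrated with a small Python function. This section does not specify the default URL space or how a structured comment name is derived from the item label, so both are hypothetical in the sketch below.

```python
# Hypothetical default URL space; this section does not state the real one
DEFAULT_URL_SPACE = "http://example.org/fairds"


def property_url(item_label, url="", base=DEFAULT_URL_SPACE):
    """Return the RDF property URL of a term.

    The URL column wins when it is filled in; otherwise the URL is built as
    default URL space + '/' + structured comment name."""
    if url.strip():
        return url.strip()
    # Assumption: the structured comment name is the item label in lower
    # case with runs of whitespace replaced by underscores
    name = "_".join(item_label.lower().split())
    return base.rstrip("/") + "/" + name
```

With these assumptions, property_url("Department", "http://schema.org/department") keeps the schema.org URL, and property_url("16S recovered") falls back to http://example.org/fairds/16s_recovered.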

The other sheets contain the following information

All packages can be placed in a single sheet. As a best practice, however, packages can also be split over separate sheets, which makes it easier to manage the different levels of metadata.

For each package you can define the following properties:

Level: The level of the metadata package (Investigation, Study, ObservationUnit, Sample or Assay).

Package name: The name of the package (e.g., default, air, soil, water). Terms of the default package are used in all packages of that level.

Item label: The name of the term as defined in the Terms sheet.

Requirement: The requirement of the term (mandatory, optional or recommended). This can vary per package.

The levels (Investigation, Study, ObservationUnit, Sample and Assay) are fixed. The content of the optional properties can be freely adjusted. New ObservationUnit, Sample and Assay types can be created by adding new rows with a different Package name. These packages extend the core (default) package, which contains the elements shared among all packages of that level.

For example:

| Level | Package name | Item label | Requirement |
|---|---|---|---|
| Assay | default | sample preparation | optional |
| Assay | default | notes | optional |
| Assay | Amplicon library | Demultiplexed forward file | optional |
| Assay | Amplicon library | Demultiplexed reverse file | optional |
| Assay | Amplicon library | Forward primer | mandatory |
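The sketch below resolves the terms of one package from these rows by combining the default rows of a level with the rows of the requested package. Two behaviours are assumptions of this Python sketch rather than documented behaviour: a package row overrides a default row with the same item label, and level and requirement values are checked against the lists in this section.

```python
# Rows copied from the example table: (Level, Package name, Item label, Requirement)
ROWS = [
    ("Assay", "default", "sample preparation", "optional"),
    ("Assay", "default", "notes", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed forward file", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed reverse file", "optional"),
    ("Assay", "Amplicon library", "Forward primer", "mandatory"),
]

LEVELS = {"Investigation", "Study", "ObservationUnit", "Sample", "Assay"}
REQUIREMENTS = {"mandatory", "optional", "recommended"}


def validate_rows(rows):
    """Raise ValueError for an unknown level or requirement value."""
    for level, package, label, requirement in rows:
        if level not in LEVELS:
            raise ValueError(f"{label}: unknown level {level!r}")
        if requirement not in REQUIREMENTS:
            raise ValueError(f"{label}: unknown requirement {requirement!r}")


def resolve_package(rows, level, package):
    """Return {item label: requirement} for one package of one level.

    The level's 'default' rows are applied first and the package's own rows
    second, so (by assumption) a package row overrides a default row with
    the same item label."""
    validate_rows(rows)
    terms = {}
    for name in ("default", package):
        for row_level, row_package, label, requirement in rows:
            if row_level == level and row_package == name:
                terms[label] = requirement
    return terms


def mandatory_terms(terms):
    """List the item labels that must be filled in."""
    return [label for label, req in terms.items() if req == "mandatory"]
```

For the example rows, resolve_package(ROWS, "Assay", "Amplicon library") yields the two default terms followed by the three Amplicon library terms. Only Forward primer is mandatory.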