Templates

Important: this page applies when you run your own FAIR Data Station instance.

The metadata file

The metadata.xlsx file defines all metadata fields, validation rules, and package structures used in the FAIR Data Station. It is a core configuration file that controls how metadata templates are generated and validated. The default version of this Excel file is embedded in the application .jar file (src/main/resources/metadata.xlsx). It can also be downloaded here. When the application starts, the Excel file is copied to the folder from which the application is run (origin_folder/fairds_storage/metadata.xlsx). You can modify this copy, and the application will use the modified version for template generation and validation. New packages can be added above or below the existing ones.

Sheets overview:

| Sheet name | Description |
| --- | --- |
| Regex | Contains a lookup list of commonly used regex variables for the other sheets. |
| Terms | Contains all the terms used in the metadata. |
| Metadata levels | Contains all the packages used in the metadata. |

The regex and terms sheets are required and should not be removed.

The regex sheet

The regex sheet defines reusable validation patterns that can be referenced in the terms sheet using shorthand notation (e.g. {dna}, {email}). It contains the following columns:

| Short Hand Form | Long Form | Example | Name | Description |
| --- | --- | --- | --- | --- |
| {file} | .* | GXB01322.fast5 | Internal variable for file objects | Match a file |
| {text} | .* | This is a random example text. | Any text pattern | This matches any text. |
| {dna} | [ACGTUWSMKRYBDHVN]+ | AAAGGGTGGAAA | DNA pattern | Match a string of nucleotides |
| {email} | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ | example.email@adress.com | Email | An email address |

For example, {file} maps to .* and therefore does not restrict the value, whereas {dna} maps to the regular expression [ACGTUWSMKRYBDHVN]+. Using shorthand patterns such as {email} instead of complex regular expressions keeps the terms sheet readable and easier to maintain.
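The shorthand mechanism can be sketched in Python as follows. This is a minimal illustration, not the application's actual implementation: the names SHORTHANDS, expand_syntax and is_valid are hypothetical, and matching the whole value (rather than a substring) is an assumption.

```python
import re

# Shorthand patterns copied from the regex sheet rows listed in this section.
SHORTHANDS = {
    "{file}": r".*",
    "{text}": r".*",
    "{dna}": r"[ACGTUWSMKRYBDHVN]+",
    "{email}": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
}


def expand_syntax(value_syntax: str) -> str:
    """Replace every known shorthand in a value syntax with its long form."""
    expanded = value_syntax
    for short, long_form in SHORTHANDS.items():
        expanded = expanded.replace(short, long_form)
    return expanded


def is_valid(value: str, value_syntax: str) -> bool:
    """Return True when the whole value matches the expanded pattern.

    Assumption: validation uses full-string matching.
    """
    return re.fullmatch(expand_syntax(value_syntax), value) is not None


if __name__ == "__main__":
    print(expand_syntax("{dna}"))
    print(is_valid("AAAGGGTGGAAA", "{dna}"))
    print(is_valid("AAAXGG", "{dna}"))
```

Keeping the long forms in one lookup table means a pattern such as the email expression only has to be corrected in one place.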

The terms sheet

The terms sheet defines the terms that can be referenced in the metadata level sheets by their item label. It contains the following columns:

| Item label | Value syntax | Example | Preferred unit | URL | Definition |
| --- | --- | --- | --- | --- | --- |
| 16S recovered | (No\|Yes) | Yes | | | Can a 16S gene be recovered from the submitted bin, SAG or MAG? |
| 16S recovery software | | | | | Tools used for 16S rRNA gene extraction. Add names and versions of software(s), parameters used |
| Chlorophyll Sensor | {number} | 5 mg Chl/m3 | mg Chl/m3 | | Fluorescence of the water measured in volts and converted to milligrammes of chlorophyll per cubic metre. Format: ##.####, SDN:P02:75:CPWC, SDN:P06:46:UMMC for mg Chl/m3. Example: 0.066. |
| Citation | ^(doi\:)?\d{2}\.\d{4}.*$ | | | | Citation of the Sample Registry (HTML version) at the PANGAEA. Example: doi.pangaea.de/10.1594/PANGAEA.76752. |
| Demultiplexed forward file | {file} | NG-13425_Fyig_005_lib124679_5331_4_1.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File path or name of the forward reads when working with demultiplexed reads |
| Demultiplexed reverse file | {file} | NG-13425_Fyig_005_lib124679_5331_4_2.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File name of the reverse reads with demultiplexed reads |
| Department | {text} | Laboratory of Systems and Synthetic Biology | | http://schema.org/department | The department this person belongs to |
| Depth | {number} | 10m | m | https://w3id.org/mixs/0000018/ | The distance below the surface of the water at which a measurement was made or a sample was collected. Format: ####.##, Positive below the sea surface. SDN:P06:46:ULAA for m. Example: 14.71 |

The columns are defined as follows:

| Column | Description |
| --- | --- |
| Item label | A human-readable label that is used in the Excel headers and in the ontology as rdfs:label for the properties. |
| Value syntax | The format of the value (numeric, date, string, unit or regular expression), expressed as a regular expression that can use shorthand patterns (e.g., {float} {unit}) and checked against the provided example. |
| Example | An example value (e.g., 410 parts per million). It is also used for validation during startup. |
| Preferred unit | The unit of measurement that is preferred but not obligatory. When set, it automatically becomes part of the regex used for validation. It can contain a list of alternatives separated by the pipe character. |
| URL | The RDF property URL, when defined. Otherwise the default URL space followed by / and the structured comment name is used. |
| Definition | The definition of the structured comment name. |
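The startup check of each term's example against its value syntax and preferred unit could look roughly like the sketch below. How the application actually combines the unit with the value syntax is not described here, so everything in this block is an assumption: the Term class, the NUMBER pattern standing in for {number}, and the rule that the unit follows the value after optional whitespace are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    label: str
    value_syntax: str
    example: str
    preferred_unit: str = ""


# Hypothetical long form for {number}; the real regex sheet may define it differently.
NUMBER = r"[-+]?\d+(?:\.\d+)?"


def build_pattern(term: Term) -> str:
    """Build a validation regex from the value syntax and the preferred unit."""
    pattern = term.value_syntax.replace("{number}", NUMBER)
    if term.preferred_unit:
        # Assumption: the unit alternatives (split on '|') follow the value,
        # separated from it by optional whitespace.
        units = "|".join(re.escape(u) for u in term.preferred_unit.split("|"))
        pattern = "(?:" + pattern + r")\s*(?:" + units + ")"
    return pattern


def example_is_valid(term: Term) -> bool:
    """Return True when the term's example matches its own pattern."""
    return re.fullmatch(build_pattern(term), term.example) is not None


if __name__ == "__main__":
    terms = [
        Term("Depth", "{number}", "10m", "m"),
        Term("Chlorophyll Sensor", "{number}", "5 mg Chl/m3", "mg Chl/m3"),
        Term("16S recovered", "(No|Yes)", "Yes"),
    ]
    for term in terms:
        print(term.label, example_is_valid(term))
```

Checking every example at startup catches a mistyped pattern or unit in the sheet before any user-supplied metadata is validated against it.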

The metadata level sheets

The other sheets correspond to the different metadata levels. They define which metadata fields are used at each level and whether they are mandatory, optional, or recommended. The sheets contain the following columns:

| Column | Description |
| --- | --- |
| Level | The level of the metadata package (Investigation, Study, ObservationUnit, Sample or Assay). |
| Package name | The name of the package (e.g., default, air, soil, water). Terms in the default package are used in all packages of that level. |
| Item label | The name of the term as defined in the terms sheet. |
| Requirement | The requirement level of the term (mandatory, optional or recommended). This can vary per package. |

Packages can be defined within a single sheet or, for easier management, spread across multiple sheets, one per metadata level. The metadata levels (Investigation, Study, ObservationUnit, Sample and Assay) are fixed. The content of the optional properties can be freely adjusted. New ObservationUnit, Sample and Assay types can be created by adding rows with a new Package name. Such packages extend the core (default) package, inheriting the fields that are shared across all packages at that level.

For example:

| Level | Package name | Item label | Requirement |
| --- | --- | --- | --- |
| Assay | default | sample preparation | optional |
| Assay | default | notes | optional |
| Assay | Amplicon library | Demultiplexed forward file | optional |
| Assay | Amplicon library | Demultiplexed reverse file | optional |
| Assay | Amplicon library | Forward primer | mandatory |
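The inheritance of default terms can be illustrated with a short Python sketch that uses the example rows from this section. The function name fields_for is hypothetical, and the rule that a package-specific row overrides a default row with the same item label is an assumption, not documented behaviour.

```python
# Rows of a metadata level sheet: (Level, Package name, Item label, Requirement).
ROWS = [
    ("Assay", "default", "sample preparation", "optional"),
    ("Assay", "default", "notes", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed forward file", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed reverse file", "optional"),
    ("Assay", "Amplicon library", "Forward primer", "mandatory"),
]


def fields_for(level, package, rows=ROWS):
    """Return {item label: requirement} for a package, including default terms."""
    fields = {}
    # Default terms of the level come first and apply to every package.
    for row_level, row_package, label, requirement in rows:
        if row_level == level and row_package == "default":
            fields[label] = requirement
    # Package-specific terms are added afterwards.
    # Assumption: they override a default term with the same item label.
    if package != "default":
        for row_level, row_package, label, requirement in rows:
            if row_level == level and row_package == package:
                fields[label] = requirement
    return fields


if __name__ == "__main__":
    for label, requirement in fields_for("Assay", "Amplicon library").items():
        print(label, requirement)
```

With these rows, an Amplicon library assay ends up with five fields: the two default ones plus the three defined for the package itself.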