Templates#
Important when running your own FAIR Data Station instance
The metadata file#
The metadata.xlsx file defines all metadata fields, validation rules, and package structures used in the FAIR Data Station. It is a core configuration file that controls how metadata templates are generated and validated.
The default version of this Excel file is embedded in the application .jar file (src/main/resources/metadata.xlsx). It can also be downloaded here. When the application starts, the Excel file is copied to the location the application is run from (origin_folder/fairds_storage/metadata.xlsx). This copy can be modified, and the application will use the updated version for template generation and validation. New packages can be added by appending them above or below the existing ones.
Sheets overview:
| Sheet name | Description |
|---|---|
| Regex | Contains a lookup list of commonly used regex variables for the other sheets. |
| Terms | Contains all the terms used in the metadata. |
| Metadata levels | Contains all the packages used in the metadata. |
The regex and terms sheets are required and should not be removed.
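As a quick sanity check before starting the application with a modified file, the required sheets can be verified from the workbook's sheet names. The following is a minimal sketch, not part of the FAIR Data Station itself: the function name `missing_required_sheets` is hypothetical, and the case-insensitive comparison is an assumption.

```python
# Sheets that the documentation marks as required.
REQUIRED_SHEETS = {"regex", "terms"}


def missing_required_sheets(sheet_names):
    """Return the required sheets that are absent from a workbook.

    Names are compared case-insensitively, which is an assumption;
    the application may be stricter.
    """
    present = {name.strip().lower() for name in sheet_names}
    return sorted(REQUIRED_SHEETS - present)


print(missing_required_sheets(["Regex", "Terms", "Assay"]))  # []
print(missing_required_sheets(["Terms", "Assay"]))           # ['regex']
```

The list of sheet names can come from any spreadsheet library; the check itself does not depend on how the workbook is read.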
The regex sheet#
The regex sheet defines reusable validation patterns that can be referenced in the terms sheet using shorthand notation (e.g. {dna}, {email}). It contains the following columns:
| Short Hand Form | Long Form | Example | Name | Description |
|---|---|---|---|---|
| {file} | | | Internal variable for file objects | Match a file |
| | | | Any text pattern | This matches any text. |
| {dna} | [ACGTUWSMKRYBDHVN]+ | | DNA pattern | Match a string of nucleotides |
| {email} | | | An email address | |
For example, {file} is not restricted in this context but {dna} maps automatically to the regular expression [ACGTUWSMKRYBDHVN]+. Using shorthand patterns such as {email} instead of complex regular expressions keeps the terms sheet readable and easier to maintain.
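The substitution of shorthand patterns can be sketched as a simple lookup-and-replace step. This is an illustration under stated assumptions, not the application's actual implementation: only the `{dna}` long form comes from the text above, while the `{email}` pattern, the `REGEX_LOOKUP` dictionary, and the `expand_shorthand` function are hypothetical.

```python
import re

# Only the {dna} entry is taken from the documentation; the {email}
# pattern is a simplified, assumed stand-in.
REGEX_LOOKUP = {
    "dna": r"[ACGTUWSMKRYBDHVN]+",
    "email": r"[^@\s]+@[^@\s]+\.[^@\s]+",
}

# Placeholders start with a letter or underscore, so numeric
# quantifiers such as {3} are left untouched.
SHORTHAND = re.compile(r"\{([A-Za-z_]\w*)\}")


def expand_shorthand(value_syntax, lookup=REGEX_LOOKUP):
    """Replace every {name} placeholder with its long form."""
    def substitute(match):
        name = match.group(1)
        if name not in lookup:
            raise KeyError(f"unknown shorthand: {{{name}}}")
        # Wrap in a non-capturing group so surrounding syntax is unaffected.
        return f"(?:{lookup[name]})"
    return SHORTHAND.sub(substitute, value_syntax)


pattern = expand_shorthand("{dna}")
print(pattern)                               # (?:[ACGTUWSMKRYBDHVN]+)
print(bool(re.fullmatch(pattern, "ACGTN")))  # True
print(bool(re.fullmatch(pattern, "ACGTX")))  # False
```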
The terms sheet#
The terms sheet defines a list of terms that can be referenced in the metadata levels sheets using the item label. It contains the following columns:
| Item label | Value syntax | Example | Preferred unit | URL | Definition |
|---|---|---|---|---|---|
| 16S recovered | (No\|Yes) | Yes | | | Can a 16S gene be recovered from the submitted bin, SAG or MAG? |
| 16S recovery software | | | | | Tools used for 16S rRNA gene extraction. Add names and versions of software(s), parameters used |
| Chlorophyll Sensor | | 5 mg Chl/m3 | mg Chl/m3 | | Fluorescence of the water measured in volts and converted to milligrammes of chlorophyll per cubic metre. Format: ##.####, SDN:P02:75:CPWC, SDN:P06:46:UMMC for mg Chl/m3. Example: 0.066. |
| Citation | | | | | Citation of the Sample Registry (HTML version) at the PANGAEA. Example: doi.pangaea.de/10.1594/PANGAEA.76752. |
| Demultiplexed forward file | | NG-13425_Fyig_005_lib124679_5331_4_1.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File path or name of the forward reads when working with demultiplexed reads |
| Demultiplexed reverse file | | NG-13425_Fyig_005_lib124679_5331_4_2.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File name of the reverse reads with demultiplexed reads |
| Department | | Laboratory of Systems and Synthetic Biology | | | The department this person belongs to |
| Depth | | 10m | m | | The distance below the surface of the water at which a measurement was made or a sample was collected. Format: ####.##, Positive below the sea surface. SDN:P06:46:ULAA for m. Example: 14.71 |
- Item label
A human-readable label used in the Excel headers and, in the ontology, as the rdfs:label of the property.
- Value syntax
The expected format of the value (numeric, date, string, unit or regular expression), expressed as a regular expression or shorthand (e.g., {float} {unit}) and checked against the provided example.
- Example
An example value (e.g., 410 parts per million). It is also validated against the value syntax during startup.
- Preferred unit
The unit of measurement that is preferred but not obligatory. When set, it automatically becomes part of the validation regex and can contain several alternatives separated by '|'.
- URL
The RDF property URL, when defined. Otherwise, the default URL space followed by / and the structured comment name is used.
- Definition
The definition of the structured comment name.
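The interplay between value syntax, preferred unit, and example can be illustrated with a small validation sketch. How the application actually merges the unit into the regex is not documented here, so the scheme below (value, optional whitespace, then one of the '|'-separated units) is an assumption, as are the names `build_pattern`, `example_is_valid`, and the `FLOAT` pattern standing in for a `{float}` shorthand.

```python
import re

# Hypothetical long form for a {float} shorthand.
FLOAT = r"[+-]?\d+(?:\.\d+)?"


def build_pattern(value_regex, preferred_unit=""):
    """Combine a value regex with '|'-separated preferred units.

    Assumed scheme: the value, optional whitespace, then exactly
    one of the listed units, each matched literally.
    """
    if not preferred_unit:
        return value_regex
    units = "|".join(re.escape(unit) for unit in preferred_unit.split("|"))
    return rf"(?:{value_regex})\s*(?:{units})"


def example_is_valid(example, value_regex, preferred_unit=""):
    """Check that the whole example string matches the combined pattern."""
    pattern = build_pattern(value_regex, preferred_unit)
    return re.fullmatch(pattern, example) is not None


print(example_is_valid("5 mg Chl/m3", FLOAT, "mg Chl/m3"))  # True
print(example_is_valid("10m", FLOAT, "m"))                  # True
print(example_is_valid("10 km", FLOAT, "m"))                # False
```

Running such a check over every row of the terms sheet before deploying a modified metadata.xlsx can surface examples that no longer match their syntax.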
The metadata level sheets#
The other sheets correspond to the different metadata levels. They define which metadata fields are used at each level and whether they are mandatory, optional, or recommended. The sheets contain the following columns:
- Level
The level of the metadata package (Investigation, Study, ObservationUnit, Sample or Assay)
- Package name
The name of the package (e.g., default, air, soil, water, etc.). Default terms are used in all packages of that level.
- Item label
The name of the term as defined in the terms sheet.
- Requirement
The requirement level of the term (mandatory, optional or recommended), which can vary per package.
Packages can be defined within a single sheet or split across multiple sheets, one per metadata level, for easier management. The metadata levels (Investigation, Study, ObservationUnit, Sample and Assay) are fixed. The content of the optional properties can be freely adjusted. New ObservationUnit, Sample and Assay types can be created by adding rows with a new Package name. These packages extend the core package and inherit the fields that are shared across all packages on that level.
For example:
| Level | Package name | Item label | Requirement |
|---|---|---|---|
| Assay | default | sample preparation | optional |
| Assay | default | notes | optional |
| Assay | Amplicon library | Demultiplexed forward file | optional |
| Assay | Amplicon library | Demultiplexed reverse file | optional |
| Assay | Amplicon library | Forward primer | mandatory |
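The inheritance of default terms can be sketched with the example rows above. This is a minimal illustration, not the application's code: the function `fields_for` is hypothetical, and letting a package-specific row override a default row with the same item label is an assumed precedence.

```python
# The example rows: (level, package name, item label, requirement).
ROWS = [
    ("Assay", "default", "sample preparation", "optional"),
    ("Assay", "default", "notes", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed forward file", "optional"),
    ("Assay", "Amplicon library", "Demultiplexed reverse file", "optional"),
    ("Assay", "Amplicon library", "Forward primer", "mandatory"),
]


def fields_for(rows, level, package):
    """Return {item label: requirement} for a package, including defaults."""
    fields = {}
    # Default rows are applied first so that package-specific rows
    # can override them (assumed precedence).
    for wanted in ("default", package):
        for row_level, row_package, label, requirement in rows:
            if row_level == level and row_package == wanted:
                fields[label] = requirement
    return fields


for label, requirement in fields_for(ROWS, "Assay", "Amplicon library").items():
    print(f"{label}: {requirement}")
```

For the Amplicon library package this yields the two default Assay terms followed by the three package-specific ones.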