Templates
Important when you are running your own instance of the FAIR Data Station
An important part of the metadata registration form is the metadata.xlsx file that is bundled with the application resources (src/main/resources/metadata.xlsx). The current version can be found here. This Excel file contains all the terms (in the Terms sheet), and package restrictions can be recorded in one or more additional sheets. Each level contains some obligatory elements, such as the identifier or description of an object; which elements are obligatory can easily be changed through the Requirement column.
The metadata file
The Excel file is embedded in the jar package. On startup, it is copied into the fairds_storage folder. You can modify this copy; after a restart, the application automatically uses it for all its validation. New packages can be added by inserting their rows above or below the existing ones.
Sheets overview:
| Sheet name | Description |
|---|---|
| Regex | Contains a lookup list of commonly used regex variables for the other sheets. |
| Terms | Contains all the terms used in the metadata. |
| ISA-levels | Contains all the packages used in the metadata. |
There are two reserved sheets in the metadata.xlsx file: the Regex sheet and the Terms sheet.
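Because only these two sheets are reserved, a loader has to separate them from the sheets that hold packages. The sketch below shows one way to do that; split_sheets is a hypothetical name, and the case-insensitive comparison is an assumption (this page writes the sheet names both as "Regex" and "regex").

```python
# Reserved sheet names from the overview table; compared case-insensitively (assumption).
RESERVED_SHEETS = {"regex", "terms"}


def split_sheets(sheet_names):
    """Split workbook sheet names into (reserved sheets, package sheets), keeping their order."""
    reserved = [name for name in sheet_names if name.lower() in RESERVED_SHEETS]
    packages = [name for name in sheet_names if name.lower() not in RESERVED_SHEETS]
    return reserved, packages
```

For the sheets listed in the overview, split_sheets(["Regex", "Terms", "ISA-levels"]) returns (["Regex", "Terms"], ["ISA-levels"]).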
The regex sheet contains the following columns
| Short Hand Form | Long Form | Example | Name | Description |
|---|---|---|---|---|
| {file} | | | Internal variable for file objects | Match a file |
| | | | Any text pattern | This matches any text. |
| {dna} | [ACGTUWSMKRYBDHVN]+ | | DNA pattern | Match a string of nucleotides |
| {email} | | | An email address | |
Here you can see that {file} is not restricted in this context, whereas {dna}, when used in one of the terms, is automatically translated into the regex [ACGTUWSMKRYBDHVN]+. Short-hand forms for more complex regexes, for example {email}, keep the Terms sheet much cleaner.
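The short-hand substitution can be illustrated with a small Python sketch. It is not the FAIR Data Station implementation: REGEX_SHEET, expand and is_valid are hypothetical names, only the {dna} long form is taken from this page, and the {float} pattern is an assumption.

```python
import re

# Hypothetical in-memory copy of the Regex sheet (short-hand name -> long form).
# Only {dna} comes from this page; the {float} pattern is an assumption.
REGEX_SHEET = {
    "dna": "[ACGTUWSMKRYBDHVN]+",
    "float": r"[+-]?[0-9]+(?:\.[0-9]+)?",
}

# Placeholders must start with a letter, so quantifiers such as {4} or {2,3} are left alone.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def expand(value_syntax: str, lookup: dict) -> str:
    """Replace every {name} placeholder with its long form, wrapped in a non-capturing group."""
    def substitute(match):
        name = match.group(1)
        if name not in lookup:
            raise KeyError(f"unknown short-hand form: {{{name}}}")
        return f"(?:{lookup[name]})"
    return PLACEHOLDER.sub(substitute, value_syntax)


def is_valid(value: str, value_syntax: str) -> bool:
    """True if the whole value matches the expanded value syntax."""
    return re.fullmatch(expand(value_syntax, REGEX_SHEET), value) is not None
```

With this table, is_valid("ACGTN", "{dna}") is True and is_valid("ACGTX", "{dna}") is False. Wrapping each long form in a non-capturing group keeps alternations inside a long form from leaking into the surrounding pattern.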
The term sheet contains the following columns
| Item label | Value syntax | Example | Preferred unit | URL | Definition |
|---|---|---|---|---|---|
| 16S recovered | (No\|Yes) | Yes | | | Can a 16S gene be recovered from the submitted bin, SAG or MAG? |
| 16S recovery software | | | | | Tools used for 16S rRNA gene extraction. Add names and versions of software(s), parameters used |
| Chlorophyll Sensor | | 5 mg Chl/m3 | mg Chl/m3 | | Fluorescence of the water measured in volts and converted to milligrammes of chlorophyll per cubic metre. Format: ##.####, SDN:P02:75:CPWC, SDN:P06:46:UMMC for mg Chl/m3. Example: 0.066. |
| Citation | | | | | Citation of the Sample Registry (HTML version) at the PANGAEA. Example: doi.pangaea.de/10.1594/PANGAEA.76752. |
| Demultiplexed forward file | | NG-13425_Fyig_005_lib124679_5331_4_1.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File path or name of the forward reads when working with demultiplexed reads |
| Demultiplexed reverse file | | NG-13425_Fyig_005_lib124679_5331_4_2.fastq.gz | .fq.gz\|.fastq.gz\|fastq.bz2\|fq.bz2 | | File name of the reverse reads with demultiplexed reads |
| Department | | Laboratory of Systems and Synthetic Biology | | | The department this person belongs to |
| Depth | | 10m | m | | The distance below the surface of the water at which a measurement was made or a sample was collected. Format: ####.##, Positive below the sea surface. SDN:P06:46:ULAA for m. Example: 14.71 |
- Item label: A human-readable label, used in the Excel headers and as the rdfs:label of the property in the ontology.
- Value syntax: The format of the value (numeric, date, string, unit or regular expression), expressed as a regular expression (e.g., {float} {unit}) and validated against the provided example.
- Example: An example value showing how it is written (e.g., 410 parts per million); it is also used for validation during startup.
- Preferred unit: The unit of measurement that is preferred but not obligatory. When set, it automatically becomes part of the validation regex. It can contain a list of units separated by a '|'.
- URL: The RDF property URL. When left empty, the default URL space + / + structured comment name is used.
- Definition: The definition of the structured comment name.
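To illustrate how the Value syntax, Preferred unit, Example and URL columns could work together, here is a small sketch. It assumes that a {unit} placeholder expands to an alternation of the preferred units and that {float} has the long form shown; DEFAULT_URL_SPACE, the structured_comment_name field and all function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical default URL space; the real value is defined by the FAIR-DS instance.
DEFAULT_URL_SPACE = "http://example.org/fairds"

# Assumed long form for {float}; check the Regex sheet for the real definition.
REGEX_SHEET = {"float": r"[+-]?[0-9]+(?:\.[0-9]+)?"}


@dataclass
class Term:
    item_label: str
    value_syntax: str
    example: str
    preferred_unit: str = ""  # alternatives separated by '|'
    url: str = ""
    structured_comment_name: str = ""  # hypothetical field used for the URL fallback

    def regex(self) -> str:
        pattern = self.value_syntax
        # Assumption: {unit} expands to an alternation of the (escaped) preferred units.
        units = "|".join(re.escape(u) for u in self.preferred_unit.split("|") if u)
        pattern = pattern.replace("{unit}", f"(?:{units})")
        for name, long_form in REGEX_SHEET.items():
            pattern = pattern.replace("{" + name + "}", f"(?:{long_form})")
        return pattern

    def property_url(self) -> str:
        # Use the URL column if filled in, else <URL space>/<structured comment name>.
        return self.url or f"{DEFAULT_URL_SPACE}/{self.structured_comment_name}"


def example_is_valid(term: Term) -> bool:
    """Startup-style check: the Example column must fully match the term's regex."""
    return re.fullmatch(term.regex(), term.example) is not None
```

Assuming a value syntax of {float} {unit} for the Chlorophyll Sensor row, its example 5 mg Chl/m3 passes this check, while a value such as 5 kg would be rejected because kg is not listed as a preferred unit.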
The other sheets contain the following information
All packages can be placed in a single sheet. As a best practice, however, packages can also be split over separate sheets, which makes it easier to manage the different levels of metadata.
For each package you can define the following properties:
- Level: The level of the metadata package (Investigation, Study, ObservationUnit, Sample or Assay).
- Package name: The name of the package (e.g., default, air, soil, water). Terms in the default package are used in all packages of that level.
- Item label: The name of the term as defined in the Terms sheet.
- Requirement: The requirement of the term (mandatory, optional or recommended), which can vary per package.
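These four properties map naturally onto a record with a few sanity checks. The following is a minimal sketch, assuming the rows have already been read from the sheet; PackageRow and validate_row are hypothetical names, and requiring every item label to exist in the Terms sheet is an assumption about how such a loader would behave.

```python
from typing import List, NamedTuple, Set

# The fixed levels and the allowed requirement values listed on this page.
LEVELS = {"Investigation", "Study", "ObservationUnit", "Sample", "Assay"}
REQUIREMENTS = {"mandatory", "optional", "recommended"}


class PackageRow(NamedTuple):
    level: str
    package_name: str
    item_label: str
    requirement: str


def validate_row(row: PackageRow, known_terms: Set[str]) -> List[str]:
    """Return the problems found in one package row (an empty list means valid)."""
    problems = []
    if row.level not in LEVELS:
        problems.append(f"unknown level: {row.level}")
    if not row.package_name:
        problems.append("missing package name")
    # Assumption: every item label must be defined in the Terms sheet.
    if row.item_label not in known_terms:
        problems.append(f"item label not in Terms sheet: {row.item_label}")
    if row.requirement not in REQUIREMENTS:
        problems.append(f"invalid requirement: {row.requirement}")
    return problems
```

For instance, a row with level Assays and requirement required is reported twice, once for each invalid value.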
The levels Investigation, Study, ObservationUnit, Sample and Assay are fixed. The content of the optional properties can be freely adjusted. New ObservationUnit, Sample and Assay types can be created by adding new rows with a different Package name. These packages extend the core package, which contains the elements shared among all the packages defined.
For example:
| Level | Package name | Item label | Requirement |
|---|---|---|---|
| Assay | default | sample preparation | optional |
| Assay | default | notes | optional |
| Assay | Amplicon library | Demultiplexed forward file | optional |
| Assay | Amplicon library | Demultiplexed reverse file | optional |
| Assay | Amplicon library | Forward primer | mandatory |
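In this example, an Amplicon library assay gets the two default Assay terms plus its own three terms. The sketch below performs that merge on the rows of the example table; effective_terms is a hypothetical helper, and the rule that a package-specific requirement overrides the default one for the same term is an assumption.

```python
from typing import NamedTuple


class PackageRow(NamedTuple):
    level: str
    package_name: str
    item_label: str
    requirement: str


# The rows of the example table.
ROWS = [
    PackageRow("Assay", "default", "sample preparation", "optional"),
    PackageRow("Assay", "default", "notes", "optional"),
    PackageRow("Assay", "Amplicon library", "Demultiplexed forward file", "optional"),
    PackageRow("Assay", "Amplicon library", "Demultiplexed reverse file", "optional"),
    PackageRow("Assay", "Amplicon library", "Forward primer", "mandatory"),
]


def effective_terms(rows, level, package_name):
    """Merge the default package of a level with one specific package.

    Default rows are applied first, so (by assumption) a package row for the
    same item label overrides the default requirement.
    """
    terms = {}
    for wanted in ("default", package_name):
        for row in rows:
            if row.level == level and row.package_name == wanted:
                terms[row.item_label] = row.requirement
    return terms
```

Here effective_terms(ROWS, "Assay", "Amplicon library") yields five terms, of which only Forward primer is mandatory.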