Metadata Configurator#

Introduction#

The ISA metadata framework standard specifies an abstract model to capture experimental metadata using three core levels, Investigation, Study and Assay.

The FAIR DS organises this metadata according to this standard. The ‘Investigation’ level provides the project context. A ‘Study’ is a unit of research and an ‘Assay’ an analytical measurement.

In this workshop, the metadata of the following Investigation is used as an example: Parbie PK et al., 2021.

So, let’s pretend we are about to start this Investigation. We define the aim of the study and the approach, and specify the sample-specific metadata. Using a simple four-step process, the FAIR Data Station provides guidance on the management of the metadata. The steps are:

  1. Selection of appropriate metadata standards resulting in spreadsheet template generation

  2. Recording of metadata using the spreadsheet template

  3. Validation of metadata content according to template requirements

  4. Data FAIRification through generation of a FAIR machine actionable metadata resource

Step 1. Provide context to the investigation#

Start up the FAIR DS tool, available at https://fairds.fairbydesign.nl, and click on Metadata Configurator.

| Step 1 | Step 2 | Step 3 |
|---|---|---|
| Create an ISA metadata Excel template | Recording of your metadata | Validation of registered Field/Value pairs |

Investigation#

The Identifier, Title and the Description of this Investigation would be something like: (you can copy/paste the various items in the Investigation textbox of the FAIR-DS tool).

Identifier: demo1
Title: Dysbiotic Fecal Microbiome in HIV-1 Infected Individuals in Ghana
Description: The aim is to investigate the composition of the gut microbiome in HIV-1 infected individuals undergoing antiretroviral therapy in Ghana, West Africa. Despite effective control of viral replication, these individuals often experience non-AIDS-related diseases like cardiovascular and metabolic disorders.

Note: After adding (your) name and e-mail, make sure you push the Add button.

An Investigation can encompass one or more Studies and Studies can encompass multiple Assays.

Study#

Now unfold the Study textbox (at the bottom left of the Investigation textbox).

In our example there is one Study – but you can imagine that there will be a follow-up study at a later stage. Here we add the specifics of the particular study. In this case:

Identifier: control_vs_infected
Title: Comparison of the Fecal Microbiome of HIV-1 Infected Individuals in Ghana with seronegative controls
Description: The study will involve 55 HIV-1 infected adults (HIV+) and 55 seronegative controls (HIV-) matched for age and gender.

Paste this text in the textbox, and we have already finished step 1 of 3!
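The Investigation and Study entered above form a small hierarchy. As a rough illustration only, the sketch below models that hierarchy with Python dataclasses. The class and attribute names are assumptions made for this example and do not reflect how the FAIR-DS tool stores the information internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # Hypothetical minimal Assay: an analytical measurement.
    identifier: str


@dataclass
class Study:
    # A unit of research; it can encompass multiple Assays.
    identifier: str
    title: str
    description: str = ""
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # The project context; it can encompass one or more Studies.
    identifier: str
    title: str
    description: str = ""
    studies: List[Study] = field(default_factory=list)


investigation = Investigation(
    identifier="demo1",
    title="Dysbiotic Fecal Microbiome in HIV-1 Infected Individuals in Ghana",
)

# Our example has a single Study; a follow-up Study could be appended later.
investigation.studies.append(
    Study(
        identifier="control_vs_infected",
        title=(
            "Comparison of the Fecal Microbiome of HIV-1 Infected "
            "Individuals in Ghana with seronegative controls"
        ),
    )
)

print(investigation.identifier, [s.identifier for s in investigation.studies])
```

The list of Assays is left empty here because, as in the tool, the lower levels can be added at a later stage.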

Figure 2: Investigation/Study textbox with Investigation in collapsed state

If you now push the Generate workbook button, it will export the Investigation and Study into an Excel workbook. As mentioned, the Observation Unit, Sample and Assay can be integrated at a later stage. If you keep the window open, we can move on and integrate Step 2 directly.

Step 2: Using the spreadsheet templates#

Observation Unit#

Next, we need to define the Observation Units (in the official ISA ontology called: source-material) from which the samples (ISA: sample-material) will be taken. An observation unit can be a fermenter, a plant, a group of animals under condition A etc.

Note

Here we plan to take one sample from each of 110 matched individuals: 55 HIV-1 positive and 55 HIV-1 negative. We can opt to define them as 110 individual observation units. This would make sense if, for instance, we ran a time series experiment with multiple samples per individual.
Alternatively, we can collapse them into two groups, “HIV-1 infected” and “seronegative controls”. This choice has consequences for the metadata.
For instance, if we define them as 110 individual observation units, the “date of birth” becomes a fixed attribute directly linked to the individual (the observation unit). If we collapse them into two groups, “Host age” becomes a variable which should be directly linked to the sample taken.
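To make the trade-off concrete, here is a minimal sketch of the two designs. The unit identifiers (`hiv_pos_001` and so on) and the dictionary keys are hypothetical names chosen for this illustration; only the group sizes, the group names and the two fields come from the text above.

```python
N_INFECTED = 55
N_CONTROLS = 55


def per_individual_design():
    """One observation unit per participant; date of birth is fixed on the unit."""
    units = [f"hiv_pos_{i:03d}" for i in range(1, N_INFECTED + 1)]
    units += [f"hiv_neg_{i:03d}" for i in range(1, N_CONTROLS + 1)]
    return {
        "observation_units": units,
        "unit_level_fields": ["date of birth"],
        "sample_level_fields": [],
    }


def grouped_design():
    """Two group-level observation units; host age is recorded with each sample."""
    return {
        "observation_units": ["HIV-1 infected", "seronegative controls"],
        "unit_level_fields": [],
        "sample_level_fields": ["host age"],
    }


for design in (per_individual_design(), grouped_design()):
    print(len(design["observation_units"]), design["sample_level_fields"])
```

In the first design the age-related information lives with the observation unit, in the second it moves down to the sample.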

Let’s unfold Observation Unit Information, click the drop-down menu, search for a package and select the default package. If you click the Generate workbook button now, it will export a new Excel file containing the Investigation, Study and Observation Unit.

Note

If you click the Export button inside the Observation Unit, the program exports only the observation unit sheet of the complete workbook (Figure 3). This can be a handy feature when we want to amend our experimental design with more groups, but here we need to have the complete workbook first.

Figure 3: column headers represent interoperable Field names or attributes

Sample#

On the website we move on to sample level by unfolding Sample Information at the bottom left of the textbox.

The FAIR-DS currently has 40+ minimal information models (packages) to choose from, and the most appropriate in this case is the “human gut” package. Mandatory Fields are selected by default; others can be selected to further enrich the metadata.

Figure 4: Human gut package selection. Mandatory human gut fields are selected by default.

Let’s go back to the metadata per sample available from the original study (ENA Accession: SAMD00244418).

Sample metadata

| Attribute/Field_Name | Value |
|---|---|
| Organism | human gut metagenome |
| Sample Accession | SAMD00244418 |
| Sample Title | 16s rDNA sequence from fecal sample of non-HIV-1 infected male from Koforidua, Ghana, sample ID HG-P-001-KO-A |
| Center Name | AIDS Research Center, National Institute of Infectious Diseases |
| Sample Alias | SAMD00244418 |
| Broker Name | DDBJ |
| ART Status At Baseline | Control |
| Secondary Accession | DRS176859 |
| Education | Primary school |
| Occupation | Trader |
| Collection Date | 23/03/2018 |
| Sample Name | 001A |
| Env Broad Scale | human gut |
| Host Disease Stat | HIV-1 negative |
| Sex | male |
| Geo Loc Name | Ghana:Koforidua |
| Marital Status | Married |
| ART Drugs Current | Control |
| Env Local Scale | human gut environment |
| HIV Risk Exposure | Heterosexual |
| Project Name | Dysbiotic fecal microbiome in HIV-1 infected individuals in Ghana |
| Host | Homo sapiens |
| Env Medium | fecal material |
| Age | 50 |
| Bio Sample Model | MIMARKS.survey.human-gut |
| ENA-FIRST-PUBLIC | 26/03/2021 |
| ENA-LAST-UPDATE | 05/11/2023 |

A number of these Fields represent sample-specific metadata (that we would like to record on site while sampling). Although the terms are understandable for humans, they have interoperability issues. If we, for instance, concentrate on the block marked in yellow (Host, Env Medium and Age), we see that the correct “interoperable” terminology is different.

| Field | Value | Interoperability issue (ontology) |
|---|---|---|
| Host | Homo sapiens | “ncbi taxonomy id” (9606) |
| Env Medium | fecal material | “environmental medium” |
| Age | 50 | Host age |
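The three translations listed above can be thought of as a simple lookup. The sketch below is only an illustration of that idea: the dictionaries `FIELD_MAP` and `VALUE_MAP` and the function `to_interoperable` are hypothetical names, and the mapping covers just the three fields discussed here rather than any official conversion table.

```python
# Ad hoc field name -> interoperable field name (taken from the three rows above).
FIELD_MAP = {
    "Host": "ncbi taxonomy id",
    "Env Medium": "environmental medium",
    "Age": "host age",
}

# Values that also need translating: (ad hoc field, ad hoc value) -> new value.
VALUE_MAP = {
    ("Host", "Homo sapiens"): "9606",
}


def to_interoperable(record):
    """Rename known fields and translate known values; pass everything else through."""
    converted = {}
    for name, value in record.items():
        new_name = FIELD_MAP.get(name, name)
        converted[new_name] = VALUE_MAP.get((name, value), value)
    return converted


original = {"Host": "Homo sapiens", "Env Medium": "fecal material", "Age": "50"}
converted = to_interoperable(original)
print(converted)
```

Fields that are not in the lookup are left unchanged, so the function can be applied to a full sample record without losing information.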

The selected minimal information model on the other hand does contain the correct Field names and the restrictions on the linked values. These restrictions are also directly available in the resultant Excel workbook (Figure 5).

Figure 5: Metadata field info box

Note

To increase the interoperability of your metadata, you have the option to either systematically review this list, which incidentally helps you identify crucial metadata you might not have considered and could enhance your design, or filter specifically for terms related to, in this case, “host” (Figure 6).

Figure 6: Overview of the sample metadata selection (filter for host and select host age)
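The selection logic (mandatory fields always included, optional fields found by filtering and then ticked) can be sketched as follows. This is a toy model, not the tool's implementation: the field list is a small illustrative subset, and which fields are flagged as mandatory is an assumption made purely for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    mandatory: bool


# Illustrative subset only; the mandatory flags are assumed, not taken
# from the real "human gut" package.
PACKAGE_FIELDS = [
    Field("collection date", True),
    Field("environmental medium", True),
    Field("host age", False),
    Field("host disease status", False),
]


def filter_fields(fields, term):
    """Return the fields whose name contains the search term (case-insensitive)."""
    term = term.lower()
    return [f for f in fields if term in f.name.lower()]


def build_selection(fields, chosen_optional):
    """Mandatory fields are always kept; optional ones only when chosen."""
    return [f.name for f in fields if f.mandatory or f.name in chosen_optional]


host_related = [f.name for f in filter_fields(PACKAGE_FIELDS, "host")]
selection = build_selection(PACKAGE_FIELDS, {"host age"})
print(host_related)
print(selection)
```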

Finally, the planned Assays#

From the publication we learn that 16S rDNA sequences were obtained from fecal samples. On the website we move on to “Assay” information by unfolding Assay Information at the bottom left of the textbox and selecting “Amplicon demultiplexed”. Again, there are mandatory and optional fields for you to customize.

Figure 7: Assay metadata field info box.

When you are done you can download the Excel workbook and start filling in your sample metadata.

Hint

You can download a template Excel workbook created by this tutorial here.

Step 3: Metadata Validation#

Interoperable Field/value pairs in the metadata often have specific, limited options you can choose from, and a mistake is easily made. To check for these restrictions, we have built a validation tool.

It uses regular expressions to check, for each restricted Field/value pair, that the value is in the correct range and has the right format. You simply upload your Excel workbook and the tool checks for these mistakes.

You can perform this check at any point in the metadata acquisition process, and it is recommended to conduct regular checks throughout this process.

In the example below, the non-existent biosafety level 5 was entered, so the validation tool flags the entry. Entering '5' for biosafety level triggers a prompt suggesting an accepted value, such as '1', '2', '3', '4', or 'unknown'.

Figure 8: Example of Metadata Validation Error - The validation tool displays an error message, guiding the user to correct it according to the accepted standards.
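The kind of check described here can be approximated with Python's standard `re` module. The sketch below is a simplified stand-in, not the validation tool itself: the `RULES` table, the `validate` function and the exact pattern are assumptions, with the pattern derived from the accepted values mentioned in the biosafety example.

```python
import re

# Hypothetical rule table: field name -> compiled pattern for allowed values.
RULES = {
    "biosafety level": re.compile(r"[1-4]|unknown"),
}


def validate(field_name, value):
    """Return None if the value is acceptable, otherwise an error message."""
    pattern = RULES.get(field_name)
    if pattern is None:
        return None  # no restriction registered for this field
    if pattern.fullmatch(value.strip()):
        return None
    return (
        f"Invalid value {value!r} for {field_name!r}; "
        f"expected a value matching: {pattern.pattern}"
    )


for candidate in ("2", "unknown", "5"):
    print(candidate, "->", validate("biosafety level", candidate))
```

Using `fullmatch` rather than `search` matters here: it ensures the whole cell content conforms, so an entry such as '15' is rejected even though it contains an allowed digit.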

Hint

In the next chapter, we will embark on an exploration of the validation tool.