Metadata Configurator#
Introduction#
The ISA metadata framework standard specifies an abstract model to capture experimental metadata using three core levels: Investigation, Study and Assay.
The FAIR DS organises this metadata according to this standard. The ‘Investigation’ level provides the project context. A ‘Study’ is a unit of research and an ‘Assay’ an analytical measurement.
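As a rough illustration of how the three core levels nest, the hierarchy can be sketched with plain Python dataclasses. This is only a simplified sketch, not the FAIR DS data model: the class and attribute names are assumptions, the Observation Unit and Sample levels used later in this tutorial are left out, and the assay identifier is hypothetical. The Investigation and Study values are the ones used in the example further down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """An analytical measurement."""
    identifier: str


@dataclass
class Study:
    """A unit of research that groups assays."""
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def count_assays(investigation: Investigation) -> int:
    """Count all assays across the studies of an investigation."""
    return sum(len(study.assays) for study in investigation.studies)


investigation = Investigation(
    identifier="demo1",
    title="Dysbiotic Fecal Microbiome in HIV-1 Infected Individuals in Ghana",
)
study = Study(
    identifier="control_vs_infected",
    title=(
        "Comparison of the Fecal Microbiome of HIV-1 Infected Individuals "
        "in Ghana with seronegative controls"
    ),
)
# Hypothetical assay identifier, used only for illustration.
study.assays.append(Assay(identifier="amplicon_assay_1"))
investigation.studies.append(study)
```

Each level holds a list of the level below it, so a follow-up study can be appended to the same investigation later without touching the existing one.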
In this workshop, the metadata of the following Investigation is used as an example: Parbie PK et al., 2021.
So, let’s pretend we are about to start this Investigation. We define the aim of the study and the approach, and specify the sample-specific metadata. Using a simple four-step process, the FAIR Data Station provides guidance on the management of the metadata. The steps are:
Selection of appropriate metadata standards resulting in spreadsheet template generation
Recording of metadata using the spreadsheet template
Validation of metadata content according to template requirements
Data FAIRification through generation of a FAIR machine actionable metadata resource
Step 1. Provide context to the investigation#
Start up the FAIR DS tool, available at https://fairds.fairbydesign.nl, and click on Metadata Configurator.
Step 1 | Step 2 | Step 3 |
---|---|---|
Create an ISA metadata Excel template | Recording of your metadata | Validation of registered Field/Value pairs |
Investigation#
The Identifier, Title and Description of this Investigation would be something like the following (you can copy and paste the various items into the Investigation textbox of the FAIR-DS tool):
Identifier: demo1
Title: Dysbiotic Fecal Microbiome in HIV-1 Infected Individuals in Ghana
Description: The aim is to investigate the composition of the gut microbiome in HIV-1 infected individuals undergoing antiretroviral therapy in Ghana, West Africa. Despite effective control of viral replication, these individuals often experience non-AIDS-related diseases like cardiovascular and metabolic disorders.
Note: After adding your name and e-mail address, make sure you push the add button.
Package selection (Investigation)#
At each level the metadata can vary depending on the packages selected. At the investigation level click on ‘Select a package’ and choose default.
Within a package not all terms have to be selected (by default only mandatory and recommended terms are selected). Terms from other packages are available to be selected in case you want to mix and match different packages together and re-use terms.
Once this is done you can go to the next section.
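The selection rule described above (mandatory and recommended terms preselected, terms from other packages added on request) can be sketched as follows. This is a minimal sketch, not the FAIR DS implementation: the `Term` class, the function names, the catalogue contents and the requirement levels assigned to each term are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Term:
    name: str
    package: str
    requirement: str  # "mandatory", "recommended" or "optional"


# Hypothetical catalogue; the real packages and requirement levels may differ.
CATALOGUE = [
    Term("identifier", "default", "mandatory"),
    Term("title", "default", "mandatory"),
    Term("description", "default", "recommended"),
    Term("funding", "default", "optional"),
    Term("host age", "human gut", "optional"),
]


def default_selection(catalogue: Iterable[Term], package: str) -> List[str]:
    """Preselect the mandatory and recommended terms of one package."""
    return [
        term.name
        for term in catalogue
        if term.package == package
        and term.requirement in ("mandatory", "recommended")
    ]


def with_extra_terms(
    selection: List[str], catalogue: Iterable[Term], extra_names: Iterable[str]
) -> List[str]:
    """Add terms from any package to a selection, rejecting unknown names."""
    known = {term.name for term in catalogue}
    result = list(selection)
    for name in extra_names:
        if name not in known:
            raise KeyError(f"unknown term: {name}")
        if name not in result:
            result.append(name)
    return result


selected = default_selection(CATALOGUE, "default")
mixed = with_extra_terms(selected, CATALOGUE, ["host age"])
```

Here `selected` holds only the preselected terms of the default package, while `mixed` additionally re-uses a term from a different package.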
Study#
Now unfold the Study textbox (at the bottom left of the Investigation textbox).
In our example there is one Study – but you can imagine that there will be a follow-up study at a later stage. Here we add the specifics of the particular study. In this case:
Identifier: control_vs_infected
Title: Comparison of the Fecal Microbiome of HIV-1 Infected Individuals in Ghana with seronegative controls
Description: The study will involve 55 HIV-1 infected adults (HIV+) and 55 seronegative controls (HIV-) matched for age and gender.
Paste this text in the textbox, and we have already finished step 1 of 3!
Figure 2. Investigation/Study textbox with Investigation in collapsed state
If you now push the Generate workbook button, it will export the Investigation and Study into an Excel workbook. As mentioned, the Observation Unit, Sample and Assay can be integrated at a later stage. If you keep the window open, we can move on and integrate Step 2 directly.
Package selection (Study)#
As was done for the investigation, a package needs to be selected for the study as well. Select default in the dropdown menu.
Step 2: Using the spreadsheet templates#
Observation Unit#
Next, we need to define the Observation Units (called source-material in the official ISA ontology) from which the samples (ISA: sample-material) will be taken. An observation unit can be a fermenter, a plant, a group of animals under condition A, etc.
Note
Here we plan to take one sample from each of 55 HIV-1 positive and 55 paired HIV-1 negative individuals. We can opt to define them as 110 individual observation units. This would make sense if, for instance, we ran a time-series experiment involving multiple samples per individual.
Alternatively, we can collapse them into two groups, “HIV-1 infected” and “seronegative controls”. This choice has consequences for the metadata.
For instance, if we define them as 110 individual observation units, “date of birth” becomes a fixed attribute directly linked to the individual (the observation unit). If we collapse them into two groups, “Host age” becomes a variable that should be linked directly to the sample taken.
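The consequence of the two designs can be made concrete with a small sketch. It is only an illustration under stated assumptions: the dictionary layout, the function name and the generated identifiers are hypothetical, and the attribute values are left empty (`None`) rather than invented.

```python
def build_observation_units(individuals, per_individual):
    """Group individuals into observation units.

    per_individual=True  -> one unit per individual; "date of birth" is a
                            fixed attribute of the unit.
    per_individual=False -> one unit per group; "host age" is recorded on
                            each sample instead.
    """
    units = {}
    for person in individuals:
        if per_individual:
            unit = units.setdefault(
                person["id"],
                {"fixed": {"date of birth": person["date_of_birth"]}, "samples": []},
            )
            unit["samples"].append({"sample_id": person["sample_id"]})
        else:
            unit = units.setdefault(person["group"], {"fixed": {}, "samples": []})
            unit["samples"].append(
                {"sample_id": person["sample_id"], "host age": person["host_age"]}
            )
    return units


# 55 + 55 placeholder individuals with hypothetical identifiers.
individuals = [
    {
        "id": f"{prefix}_{number:03d}",
        "group": group,
        "sample_id": f"{prefix}_{number:03d}_s1",
        "date_of_birth": None,  # value not filled in on purpose
        "host_age": None,       # value not filled in on purpose
    }
    for prefix, group in (("pos", "HIV-1 infected"), ("neg", "seronegative controls"))
    for number in range(1, 56)
]

individual_design = build_observation_units(individuals, per_individual=True)
grouped_design = build_observation_units(individuals, per_individual=False)
```

With the first design there are 110 units with one sample each; with the second there are two units with 55 samples each, and the age travels with the sample.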
Let’s unfold Observation Unit Information, click the drop-down menu, search for a package and select the default package. If you click the Generate workbook button now, it will export a new Excel file containing the Investigation, Study and Observation Unit.
Note
If you click the Export button inside the Observation Unit section, the program exports only the observation unit sheet of the complete workbook (Figure 3). This can be handy when we want to amend our experimental design with more groups, but here we need to have the complete workbook first.
Figure 3: column headers represent interoperable Field names or attributes
Sample#
On the website we move on to sample level by unfolding Sample Information at the bottom left of the textbox.
The FAIR-DS currently has 40+ minimal information models (packages) to choose from and the most appropriate is in this case the “human gut” package. Mandatory Fields are selected by default, others can be selected to further enrich the metadata.
Figure 4: Human gut package selection. Mandatory human gut fields are selected by default.
Let’s go back to the metadata per sample available from the original study (ENA Accession: SAMD00244418).
Sample metadata
Attribute/Field_Name | Value |
---|---|
Organism | human gut metagenome |
Sample Accession | SAMD00244418 |
Sample Title | 16s rDNA sequence from fecal sample of non-HIV-1 infected male from Koforidua, Ghana, sample ID HG-P-001-KO-A |
Center Name | AIDS Research Center, National Institute of Infectious Diseases |
Sample Alias | SAMD00244418 |
Broker Name | DDBJ |
ART Status At Baseline | Control |
Secondary Accession | DRS176859 |
Education | Primary school |
Occupation | Trader |
Collection Date | 23/03/2018 |
Sample Name | 001A |
Env Broad Scale | human gut |
Host Disease Stat | HIV-1 negative |
Sex | male |
Geo Loc Name | Ghana:Koforidua |
Marital Status | Married |
ART Drugs Current | Control |
Env Local Scale | human gut environment |
HIV Risk Exposure | Heterosexual |
Project Name | Dysbiotic fecal microbiome in HIV-1 infected individuals in Ghana |
Host | Homo sapiens |
Env Medium | fecal material |
Age | 50 |
Bio Sample Model | MIMARKS.survey.human-gut |
ENA-FIRST-PUBLIC | 26/03/2021 |
ENA-LAST-UPDATE | 05/11/2023 |
A number of these Fields represent sample-specific metadata (that we would like to record on site while sampling). Although the terms are understandable for humans, they have interoperability issues. If we concentrate, for instance, on the block marked in yellow (Host, Env Medium and Age), we see that the correct “interoperable” terminology is different.
Field | Value | Interoperability issue (ontology) |
---|---|---|
Host | Homo sapiens | “ncbi taxonomy id” (9606) |
Env Medium | fecal material | “environmental medium” |
Age | 50 | Host age |
The selected minimal information model on the other hand does contain the correct Field names and the restrictions on the linked values. These restrictions are also directly available in the resultant Excel workbook (Figure 5).
Figure 5: Metadata field info box
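To illustrate what the move to interoperable Field names amounts to, the three fields discussed above can be converted with a small lookup. This is a hedged sketch rather than part of the FAIR DS: the dictionary names and the function are assumptions, and only the three mappings and the taxonomy id 9606 listed in the table above are used.

```python
# Free-form field name -> interoperable field name (from the table above).
INTEROPERABLE_NAMES = {
    "Host": "ncbi taxonomy id",
    "Env Medium": "environmental medium",
    "Age": "host age",
}

# Species name -> NCBI taxonomy id; only the entry given in this tutorial.
TAXONOMY_IDS = {"Homo sapiens": 9606}


def to_interoperable(record):
    """Rename known fields and replace the host name by its taxonomy id.

    Unknown field names are kept unchanged; an unknown host name raises
    a KeyError so that it is not silently passed through.
    """
    converted = {}
    for field_name, value in record.items():
        new_name = INTEROPERABLE_NAMES.get(field_name, field_name)
        if field_name == "Host":
            value = TAXONOMY_IDS[value]
        converted[new_name] = value
    return converted


record = {"Host": "Homo sapiens", "Env Medium": "fecal material", "Age": "50"}
converted = to_interoperable(record)
```

Selecting the right package in the Metadata Configurator makes this kind of after-the-fact renaming unnecessary, because the template already carries the interoperable names.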
To increase the interoperability of your metadata, you can either review this list systematically, which incidentally helps you identify crucial metadata you might not have considered and could improve your design, or filter specifically for terms related to, in this case, “host” (Figure 6).
Figure 6: Overview of the sample metadata selection (Filter for host and select host age)
Finally, the planned Assays#
From the publication we learn that 16S rDNA sequences were obtained from fecal samples. On the website we move on to the Assay level by unfolding Assay Information at the bottom left of the textbox and selecting “Amplicon demultiplexed”. Again, there are mandatory and optional fields for you to customize.
Figure 7: Assay metadata field info box.
When you are done you can download the Excel workbook and start filling in your sample metadata.
Hint
You can download the template Excel workbook created in this tutorial here
Step 3: Metadata Validation#
In the next chapter, we will explore the validation tool.