Semantically Defining Populations for 'Omics Research
The study of populations is central to 'omics research, whether sequencing environmental samples, controlling for population structure when looking for genetic variation within a species, or studying the evolution of large clades. Researchers use different operational definitions of populations and communities, arising from the highly varied ways of creating operational taxonomic units (OTUs) and, in some cases, from the use of unclustered sequences. Different clustering methods (Swarm, UCLUST, CD-HIT, etc.) can produce very different OTUs even within one study type, possibly affecting interpretation and undermining reproducibility. The Population and Community Ontology (PCO) offers the semantics to clarify exactly which collection of organisms (i.e., ecological community or population) was used in an investigation. When combined with methods for standardizing observational data from the Biological Collections Ontology (BCO), protocol classes from the Ontology for Biomedical Investigations (OBI), and characterization of environments from the Environment Ontology (ENVO), PCO can fully describe the methods used to derive organismal or species-based (i.e., taxonomic) OTUs used for biodiversity analysis and monitoring. PCO is not well suited to describing “OTUs” based on sequence variants that may or may not map to population- or individual-level variation (e.g., the output of some clustering algorithms). In such cases, the Sequence Ontology (SO) may be more appropriate. This presentation will describe the key ontology design patterns used in the PCO and provide examples of how and when PCO and related ontologies should be used in 'omics research, with a focus on environmental/metagenomic sequencing applications.