W12-02: TraitBank: semantic integration of biodiversity data from diverse sources

Title: W12-02: TraitBank: semantic integration of biodiversity data from diverse sources
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Schulz K, Hammock J
Conference Name: International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published: 11/30/16
Publisher: CEUR-ws.org Volume 1747
Other Numbers: Vol-1747|urn:nbn:de:0074-1747-1

Easy access to large amounts of biodiversity data has the potential to transform research across the life sciences. However, most of the data generated so far are not easily integrated or repurposed due to a lack of standardization in how scientists talk about the characteristics of organisms, how they describe the context of their observations, and how they document the methods with which the data were collected. TraitBank (eol.org/traitbank) addresses this impediment by linking information aggregated from diverse sources to community-developed ontologies and controlled vocabularies. These post hoc annotations help to organize distributed, heterogeneous knowledge into a lightweight, scalable semantic framework supporting retrieval and reuse for a variety of applications, ranging from large-scale synthetic analyses of biodiversity to linked data products and hands-on data science in the classroom. The TraitBank data store currently holds over 11 million measurements and facts for more than 1.7 million taxa including animals, plants, fungi, and microbes. These data are mobilized from major biodiversity information systems (e.g., International Union for Conservation of Nature, Ocean Biogeographic Information System, Paleobiology Database), open literature repositories (e.g., Dryad, Ecological Archives, Pangaea), label data from natural history collections, and legacy/unpublished data sets. TraitBank subject coverage is very broad, ranging from distribution, ecology, and life history to morphology and physiology. Data can be downloaded via CSV files or a JSON-LD service. Reuse and redistribution with attribution to the original data sources are encouraged.
TraitBank complements taxon- or subject-specific knowledge management systems by filling gaps (both in taxonomic and trait space), by recruiting new types of data (e.g., from text-mining, citizen-science, and specimen data digitization efforts), and by integrating knowledge across the entire tree of life and multiple scientific domains. The emerging semantic framework will facilitate data discovery, support queries across data sets, and advance data integration and exchange among projects, thus making more biodiversity data available for use in scientific and policy-oriented applications.
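
As a rough illustration of the post hoc annotation idea described above, the following Python sketch maps inconsistent trait labels from two sources onto shared term identifiers, so that one query can span both data sets. It is not TraitBank's actual pipeline or schema: the `TERM_MAP` table, the `example.org` URIs, the column names, and the sample rows are all invented for the example.

```python
import csv
import io

# Hypothetical mapping from source-specific trait labels to shared term URIs.
# The example.org URIs are placeholders, not real ontology terms.
BODY_MASS = "http://example.org/terms/body_mass"
TERM_MAP = {
    "body mass": BODY_MASS,
    "weight": BODY_MASS,
    "habitat": "http://example.org/terms/habitat",
}

# Invented sample data standing in for two independently formatted sources.
SOURCE_A = "taxon,trait,value\nTaxon A,Body mass,120\n"
SOURCE_B = "taxon,trait,value\nTaxon A,weight,118\nTaxon A,colour,brown\n"


def annotate(rows, source):
    """Attach a shared term URI to each raw record, keeping its provenance."""
    annotated = []
    for row in rows:
        label = row["trait"].strip().lower()
        uri = TERM_MAP.get(label)
        if uri is None:
            continue  # labels without a mapping are skipped in this sketch
        annotated.append(
            {
                "taxon": row["taxon"],
                "term": uri,
                "value": row["value"],
                "source": source,
            }
        )
    return annotated


def load(text, source):
    """Parse CSV text and annotate its rows."""
    return annotate(csv.DictReader(io.StringIO(text)), source)


records = load(SOURCE_A, "source_a") + load(SOURCE_B, "source_b")

# One query now covers both sources, even though their labels differed.
body_mass = [r for r in records if r["term"] == BODY_MASS]
for r in body_mass:
    print(r["taxon"], r["value"], r["source"])
```

Keeping the `source` field on every record reflects the attribution requirement mentioned in the abstract; a real system would also need to reconcile units and taxon names, which this sketch leaves out.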