ICBO 2016 Workshop #W13 | |
Title |
Semantic Metadata with ISO 11179 Metadata Registry Standard |
Workshop type |
Tutorial |
Organizer |
Denise Warzel, National Cancer Institute |
Co-organizer(s) | Dianne Reeves, Gilberto Fragoso |
Website | N/A |
Workshop Abstract |
The ISO/IEC 11179 Metadata Registry (MDR) standard (11179) takes advantage of ontologies both to help make the meaning of data unambiguous and to provide a mechanism by which one can leverage an ontology to gain a deeper understanding of the data through related concepts. However, many people find the 11179 metamodel daunting, partly because the standard is composed of six parts. This puts off those who might benefit from it, and with it the use of ontologies for understanding data. This tutorial will explain the 11179 metamodel and its primary classes, with emphasis on the classes that express data semantics. It will teach attendees how to decompose the meaning of a data field into its 11179 parts in order to register that meaning in a publicly available repository, including how to use an ontology to make the meaning unambiguous. In 2000, the National Cancer Institute (NCI) adopted the practice of recording data element metadata based on 11179 to enable language- and application-independent representation and redistribution of common data elements (CDEs). These CDEs are used to describe the data in NCI clinical trials. The CDEs are entered into NCI’s 11179-based metadata registry, the cancer Data Standards Registry and Repository (caDSR), and are made available in human- and machine-readable formats. In caDSR, NCI uses the NCI Thesaurus (NCIt), a cancer-specific terminology resource created by NCI Enterprise Vocabulary Services (EVS), to ensure that the meaning of each data element is unambiguous. Each of the semantic classes in 11179 is linked to NCIt: the Object Class, which describes the thing in the real world that the data is about; the Property, which describes the attribute of that thing being observed or measured; and, when the field is constrained to an enumerated list, each value in the list. By linking these semantic classes to NCIt (or any other terminology or ontology), a unique expression of the meaning of each instance of the data is created.
We use NCIt for two primary reasons. First, the definitions of its terms are created by NCI in a cancer-specific context, so decomposing the meaning of the data yields a cancer-specific usage. Second, the use of a single terminology enabled the development of 11179-based algorithms that help independent curators avoid inadvertently duplicating CDEs. Creating metadata elements for cancer research data provides unambiguous semantics and representational rules wherever those elements are used. The use of such semantic metadata, by NCI and others, holds promise for the identification, aggregation, and use of disparate datasets for big data analysis. Using proven teaching materials developed and currently used by NCI to train our community curators, we will take attendees through a short introduction to metadata, the development of CDEs using 11179, and the use of CDEs to create collections of fields to be collected together. We will use NCI tools during the workshop. |
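To make the decomposition concrete, here is a minimal Python sketch of a data element whose Object Class, Property and enumerated values each point to terminology concepts, plus a toy duplicate check that compares those concept links rather than the element's wording. It only illustrates the idea behind concept-based duplicate detection. It is not NCI's caDSR data model or curation algorithm, and the class names (`ConceptRef`, `DataElement`, `ToyRegistry`) and all concept codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# All concept codes in this sketch (C_PATIENT, C_GENDER, ...) are hypothetical
# placeholders, not real NCIt codes.


@dataclass(frozen=True)
class ConceptRef:
    code: str  # terminology concept identifier
    name: str  # preferred name, for human readers only


SemanticKey = Tuple[Tuple[str, ...], Tuple[str, ...], FrozenSet[str]]


@dataclass(frozen=True)
class DataElement:
    """Simplified view of an 11179 data element (CDE) and its concept links."""
    long_name: str
    object_class: Tuple[ConceptRef, ...]  # the real-world thing the data is about
    prop: Tuple[ConceptRef, ...]          # the attribute being observed or measured
    # (value, value meaning) pairs, used only when the field is an enumerated list
    permissible_values: Tuple[Tuple[str, ConceptRef], ...] = ()

    def semantic_key(self) -> SemanticKey:
        """Concept codes that define the element's meaning, ignoring its wording."""
        return (
            tuple(c.code for c in self.object_class),
            tuple(c.code for c in self.prop),
            frozenset(c.code for _, c in self.permissible_values),
        )


class ToyRegistry:
    """Hypothetical registry that flags elements whose concepts match an existing one."""

    def __init__(self) -> None:
        self._by_key: Dict[SemanticKey, DataElement] = {}

    def register(self, element: DataElement) -> Optional[DataElement]:
        """Store a new element, or return the already-registered one with the same meaning."""
        key = element.semantic_key()
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = element
        return existing


# Usage: two differently worded fields that decompose to the same concepts.
patient = (ConceptRef("C_PATIENT", "Patient"),)
gender = (ConceptRef("C_GENDER", "Gender"),)
values = (("M", ConceptRef("C_MALE", "Male")), ("F", ConceptRef("C_FEMALE", "Female")))

registry = ToyRegistry()
first = DataElement("Patient Gender Category", patient, gender, values)
second = DataElement("Sex of Patient", patient, gender, values)
print(registry.register(first))            # None: a new meaning
print(registry.register(second) is first)  # True: flagged as a likely duplicate
```

Keying on concept codes rather than names is what lets two curators who word a field differently still collide. The sketch deliberately simplifies: 11179 separates the data element concept (Object Class plus Property) from the value domain, and a real registry would also have to handle qualifier concepts and differing value domains.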
Rationale |
A tutorial offers a unique hands-on opportunity to learn about and use controlled terminology to describe data in a standardized way. Data semantics is an area of data management that is often neglected in formal training. We hope this opportunity will encourage young researchers to promote the use of structured metadata to describe data, along with ontologies for unambiguous, domain-specific data semantics. Attendees will experience NCI’s approach to data semantics, which may encourage them to access NCI public data such as The Cancer Genome Atlas (TCGA). They will better understand how to use CDEs and the terminology concepts “inside” them, and, drawing on their understanding of ontologies, they may find new uses for the related concepts. When used in conjunction with the data, the “built-in” semantics of a CDE data dictionary can help them build new applications for knowledge expansion, as the University of Pittsburgh has done in its personalized medicine database. The ISO standard is often considered unapproachable, so the opportunity to learn it from experts in a workshop format, rather than a presentation, could help more people adopt it. Our hope is that attendees will return to their places of work or study and, through the understanding they have gained, promote the use of structured data semantics based on ontologies and 11179, as well as an understanding of how precisely linked concepts can be used in new and interesting ways. Those who use the same 11179 CDE metadata structure can independently create compatible data that is ready to exploit ontologies for its interpretation. Using this standard to create data dictionaries makes it possible to semi-automate the production of Semantic Web-ready data, tagged with the related concepts. The hands-on nature of the tutorial, together with the public availability of NCI’s caDSR, will, we hope, stimulate interest in the power of 11179 and ontologies for data semantics. |
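The idea that an 11179 data dictionary allows semi-automated production of Semantic Web-ready data can be illustrated with a small Python sketch that turns one record into N-Triples, using a field's Object Class, Property and value-meaning concepts as tags. The namespaces, the dictionary layout and the function names (`tag_record`, `iri`, `literal`) are assumptions made for illustration. This is not an NCI tool or export format; a production pipeline would more likely use an RDF library such as rdflib and the terminology's real concept IRIs.

```python
# Hypothetical namespaces: EX is an invented base for data records, and CONCEPT
# stands in for a terminology namespace such as NCIt. Codes are placeholders.
EX = "http://example.org/data/"
CONCEPT = "http://example.org/concept/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical data dictionary: field name -> concept links of the field's CDE.
DATA_DICTIONARY = {
    "gender": {
        "object_class": "C_PATIENT",
        "property": "C_GENDER",
        "values": {"M": "C_MALE", "F": "C_FEMALE"},
    },
}


def iri(uri: str) -> str:
    return f"<{uri}>"


def literal(value: object) -> str:
    # Escape backslashes, quotes and line breaks so the value is a valid
    # N-Triples string literal.
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def tag_record(record_id: str, record: dict, dictionary: dict = DATA_DICTIONARY) -> list:
    """Return N-Triples lines tagging each dictionary-described field with its concepts."""
    subject = iri(EX + record_id)
    triples = []
    for field, value in record.items():
        entry = dictionary.get(field)
        if entry is None:
            continue  # no CDE in the dictionary, so nothing to tag the value with
        # The record describes an instance of the Object Class concept.
        triples.append(f"{subject} {iri(RDF_TYPE)} {iri(CONCEPT + entry['object_class'])} .")
        # The Property concept becomes the predicate; an enumerated value maps to its
        # value-meaning concept, anything else is kept as a plain literal.
        meaning = entry.get("values", {}).get(value)
        obj = iri(CONCEPT + meaning) if meaning else literal(value)
        triples.append(f"{subject} {iri(CONCEPT + entry['property'])} {obj} .")
    return list(dict.fromkeys(triples))  # drop repeated rdf:type lines, keep order


for line in tag_record("patient-001", {"gender": "F", "site_note": "n/a"}):
    print(line)
```

Because the tags come from the dictionary rather than from the data values themselves, any dataset described by the same CDEs can be exported the same way; the field without a CDE (`site_note`) is simply skipped.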
Funding source (if any) | N/A |