IT606: Measuring the importance of annotation granularity to the detection of semantic similarity between phenotype profiles

Title: IT606: Measuring the importance of annotation granularity to the detection of semantic similarity between phenotype profiles
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Manda P, Balhoff JP, Vision TJ
Conference Name: International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published: 11/30/16
Publisher: CEUR-ws.org Volume 1747
Other Numbers: Vol-1747|urn:nbn:de:0074-1747-1
Abstract

In phenotype annotations curated from the biological and medical literature, considerable human effort must be invested to select ontological classes that capture the expressivity of the original natural language descriptions, and finer annotation granularity can also entail higher computational costs for particular reasoning tasks. Do coarse annotations suffice for certain applications? Here, we measure how annotation granularity affects the statistical behavior of semantic similarity metrics. We use a randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype annotations in the Phenoscape Knowledgebase. We compared query profiles having variable proportions of matching phenotypes to subject database profiles using both pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic similarity metrics, and compared statistical performance for three different levels of annotation granularity: entities alone, entities plus attributes, and entities plus qualities (with implicit attributes). All four metrics examined showed more extreme values than expected by chance when approximately half the annotations matched between the query and subject profiles, with a more sudden decline for pairwise statistics and a more gradual one for the groupwise statistics. Annotation granularity had a negligible effect on the position of the threshold at which matches could be discriminated from noise. These results suggest that coarse annotations of phenotypes, at the level of entities with or without attributes, may be sufficient to identify phenotype profiles with statistically significant semantic similarity.

URL: http://ceur-ws.org/Vol-1747/IT606_ICBO2016.pdf