Abstract

<p>In phenotype annotations curated from the biological and medical literature, considerable human effort must be invested to select ontological classes that capture the expressivity of the original natural language descriptions, and annotation granularity can also entail higher computational costs for particular reasoning tasks. Do coarse annotations suffice for certain applications? Here, we measure how annotation granularity affects the statistical behavior of semantic similarity metrics. We used a randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype annotations in the Phenoscape Knowledgebase. We compared query profiles having variable proportions of matching phenotypes to subject database profiles using both pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic similarity metrics, and compared statistical performance for three different levels of annotation granularity: entities alone, entities plus attributes, and entities plus qualities (with implicit attributes). All four metrics examined showed more extreme values than expected by chance when approximately half the annotations matched between the query and subject profiles, with a more sudden decline for the pairwise statistics and a more gradual one for the groupwise statistics. Annotation granularity had a negligible effect on the position of the threshold at which matches could be discriminated from noise. These results suggest that coarse annotations of phenotypes, at the level of entities with or without attributes, may be sufficient to identify phenotype profiles with statistically significant semantic similarity.</p>
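The abstract contrasts edge-based (Jaccard) and node-based (Resnik) similarity, each in a pairwise and a groupwise form. The following is only a minimal sketch of those ideas on a hypothetical toy ontology, not the authors' implementation or the Phenoscape Knowledgebase code. The term names, the tiny annotation corpus used to estimate information content (IC), and the choice to compute groupwise Jaccard as the overlap of the two profiles' ancestor closures are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology (assumption): each term maps to its direct parents in a DAG.
PARENTS: Dict[str, List[str]] = {
    "anatomical_entity": [],
    "fin": ["anatomical_entity"],
    "pectoral_fin": ["fin"],
    "pelvic_fin": ["fin"],
    "bone": ["anatomical_entity"],
    "fin_ray": ["bone", "fin"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its transitive ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def jaccard(a: str, b: str) -> float:
    """Edge-based pairwise similarity: overlap of the two ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def information_content(annotations: List[str]) -> Dict[str, float]:
    """IC(c) = -log p(c), with p(c) the fraction of annotations to c or its descendants."""
    counts = {term: 0 for term in PARENTS}
    for term in annotations:
        for anc in ancestors(term):
            counts[anc] += 1
    total = len(annotations)
    # Terms never reached by any annotation get no IC entry.
    return {t: -math.log(n / total) for t, n in counts.items() if n > 0}


def resnik(a: str, b: str, ic: Dict[str, float]) -> float:
    """Node-based pairwise similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max((ic[c] for c in common if c in ic), default=0.0)


def groupwise_jaccard(profile_a: List[str], profile_b: List[str]) -> float:
    """Groupwise Jaccard (assumed form): overlap of the ancestor closures of two profiles."""
    closure_a = set().union(*(ancestors(t) for t in profile_a))
    closure_b = set().union(*(ancestors(t) for t in profile_b))
    return len(closure_a & closure_b) / len(closure_a | closure_b)


# Hypothetical annotation corpus used only to estimate IC.
corpus = ["pectoral_fin", "pelvic_fin", "fin_ray", "bone", "fin"]
ic = information_content(corpus)
print(jaccard("pectoral_fin", "pelvic_fin"))                # 0.5
print(round(resnik("pectoral_fin", "pelvic_fin", ic), 3))   # 0.223
print(groupwise_jaccard(["pectoral_fin", "bone"], ["pelvic_fin", "fin_ray"]))  # 0.5
```

In this toy setup, sibling terms such as `pectoral_fin` and `pelvic_fin` share only `fin` and the root, so their Resnik score is the IC of `fin` (about 0.223), while terms whose only shared ancestor is the root score 0. Coarser annotations shrink the ancestor sets being compared, which is one intuition for why granularity could shift these metrics; how large that effect is in practice is the question the paper addresses empirically.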

Year of Publication
2016
Conference Name
International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published
11/30/16
Publisher
CEUR-ws.org Volume 1747
Other Numbers
Vol-1747|urn:nbn:de:0074-1747-1
URL
http://ceur-ws.org/Vol-1747/IT606_ICBO2016.pdf