| Abstract |
<p>In phenotype annotations curated from the biological and medical literature, considerable human effort must be invested to select ontological classes that capture the expressivity of the original natural language descriptions, and finer annotation granularity can also entail higher computational costs for particular reasoning tasks. Do coarse annotations suffice for certain applications? Here, we measure how annotation granularity affects the statistical behavior of semantic similarity metrics. We use a randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype annotations in the Phenoscape Knowledgebase. We compared query profiles having variable proportions of matching phenotypes to subject database profiles using both pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic similarity metrics, and compared statistical performance for three different levels of annotation granularity: entities alone, entities plus attributes, and entities plus qualities (with implicit attributes). All four metrics examined showed more extreme values than expected by chance when approximately half the annotations matched between the query and subject profiles, with a more sudden decline for the pairwise statistics and a more gradual one for the groupwise statistics. Annotation granularity had a negligible effect on the position of the threshold at which matches could be discriminated from noise. These results suggest that coarse annotations of phenotypes, at the level of entities with or without attributes, may be sufficient to identify phenotype profiles with statistically significant semantic similarity.</p> |
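The abstract names the four metrics without defining them. As a rough illustration only, the sketch below uses common textbook definitions, which may differ in detail from the paper's implementation: pairwise Jaccard as the overlap of two classes' ancestor sets, pairwise Resnik as the information content (IC) of the most informative common ancestor, a best-match average to turn pairwise scores into a profile-level score, and groupwise Jaccard as the overlap of the ancestor sets of whole profiles. The toy ontology, the annotation corpus, and the helper names (`ancestors`, `jaccard`, `information_content`, `resnik`, `best_match_average`, `groupwise_jaccard`) are hypothetical and are not taken from the Phenoscape Knowledgebase.

```python
import math

# Hypothetical toy ontology: each class maps to its direct superclasses.
PARENTS = {
    "anatomical entity": [],
    "fin": ["anatomical entity"],
    "pectoral fin": ["fin"],
    "pelvic fin": ["fin"],
    "bone": ["anatomical entity"],
    "fin ray": ["fin", "bone"],
}


def ancestors(cls):
    """Return cls together with all of its superclasses (reflexive closure)."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Pairwise Jaccard: shared ancestors divided by all ancestors of a and b."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def information_content(annotations):
    """IC(c) = -log p(c), where p(c) is the fraction of annotations subsumed by c."""
    counts = {c: 0 for c in PARENTS}
    for ann in annotations:
        for c in ancestors(ann):
            counts[c] += 1
    total = len(annotations)
    return {c: -math.log(n / total) for c, n in counts.items() if n > 0}


def resnik(a, b, ic):
    """Pairwise Resnik: IC of the most informative common ancestor.

    Assumes a (or b) occurs in the corpus used to build ic, so every
    common ancestor has a nonzero count and therefore an IC entry.
    """
    return max(ic[c] for c in ancestors(a) & ancestors(b))


def best_match_average(query, subject, sim):
    """Profile score from pairwise scores: symmetric average of best matches."""
    q_best = sum(max(sim(q, s) for s in subject) for q in query) / len(query)
    s_best = sum(max(sim(s, q) for q in query) for s in subject) / len(subject)
    return (q_best + s_best) / 2


def groupwise_jaccard(query, subject):
    """Groupwise Jaccard: compare the pooled ancestor sets of the two profiles."""
    anc_q = set().union(*(ancestors(c) for c in query))
    anc_s = set().union(*(ancestors(c) for c in subject))
    return len(anc_q & anc_s) / len(anc_q | anc_s)


if __name__ == "__main__":
    corpus = ["pectoral fin", "pelvic fin", "fin ray", "bone", "fin ray"]
    ic = information_content(corpus)
    print(jaccard("pectoral fin", "pelvic fin"))  # 0.5
    print(round(resnik("pectoral fin", "pelvic fin", ic), 3))  # 0.223
    print(groupwise_jaccard(["pectoral fin"], ["pelvic fin", "bone"]))  # 0.4
```

The contrast the abstract draws is visible even at this scale. The pairwise route scores each annotation against its best counterpart and then averages. The groupwise route pools each profile's ancestors before comparing, so a partial mismatch in one annotation is diluted across the whole profile rather than scored on its own.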
| Year of Publication |
2016
|
| Conference Name |
International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
|
| Date Published |
11/30/16
|
| Publisher |
CEUR-WS.org, Volume 1747
|
| Other Numbers |
Vol-1747; urn:nbn:de:0074-1747-1
|
| URL |
http://ceur-ws.org/Vol-1747/IT606_ICBO2016.pdf
|