Abstract

<p>Named Entity Recognition (NER) in biomedical literature is a very active research area. NER is a crucial component of biomedical text mining because it enables information retrieval, reasoning and knowledge discovery. Much research in this area has focused on semantic type categories such as "DNA", "RNA", "proteins" and "genes". However, disease NER, and human disease NER in particular, has not yet received the attention it needs. Traditional machine learning approaches lack the precision required for disease NER because of their dependence on token-level features, sentence-level features and the integration of features such as orthographic, contextual and linguistic features. This paper proposes a method for disease NER that uses sentence- and token-level features with Conditional Random Fields (CRFs), using the NCBI disease corpus. Our system uses a rich feature set that includes orthographic, contextual, affix, bigram, part-of-speech and stem-based features. With these feature sets, our approach achieves a maximum F-score of 94% on the training set under 10-fold cross-validation for semantic labeling of the NCBI disease corpus. On the testing and development corpora, the model achieves F-scores of 88% and 85%, respectively.</p>
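
The token-level features named in the abstract (orthographic, contextual, affix and bigram) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, feature names, affix length of 3 and the `<BOS>`/`<EOS>` boundary markers are all assumptions made for illustration. Part-of-speech and stem features are omitted because they require an external tagger and stemmer. Feature dictionaries of this kind are the usual per-token input to a CRF sequence labeler.

```python
def token_features(tokens, i):
    """Build an illustrative feature dict for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    lower = tok.lower()

    # Orthographic and affix features describe the token itself.
    feats = {
        "lower": lower,
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "prefix3": lower[:3],   # affix length 3 is an assumption
        "suffix3": lower[-3:],
    }

    # Contextual features: neighbouring tokens, with boundary markers.
    prev_tok = tokens[i - 1].lower() if i > 0 else "<BOS>"
    next_tok = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    feats["prev"] = prev_tok
    feats["next"] = next_tok

    # Bigram features: the token paired with each neighbour.
    feats["bigram_prev"] = prev_tok + "_" + lower
    feats["bigram_next"] = lower + "_" + next_tok
    return feats


def sentence_features(tokens):
    """Return one feature dict per token in a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["Patients", "with", "ataxia-telangiectasia", "were", "studied"]
    for tok, feats in zip(sentence, sentence_features(sentence)):
        print(tok, feats)
```

Each sentence becomes a list of feature dictionaries, one per token, which a CRF toolkit could consume alongside per-token disease labels.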

Year of Publication
2016
Conference Name
International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published
11/30/16
Publisher
CEUR-ws.org Volume 1747
Other Numbers
Vol-1747|urn:nbn:de:0074-1747-1
URL
http://ceur-ws.org/Vol-1747/BP02_ICBO2016.pdf