BIT103: Scalable Text Mining Assisted Curation of PTM Proteoforms in the Protein Ontology | International Conference on Biological Ontology & BioCreative

Abstract	The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of 820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
Year of Publication	2016
Conference Name	International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)
Date Published	11/30/16
Publisher	CEUR-ws.org Volume 1747
Other Numbers	Vol-1747 | urn:nbn:de:0074-1747-1
URL	http://ceur-ws.org/Vol-1747/BIT103_ICBO2016.pdf