A Silver Standard Corpus of Human Phenotype-Gene Relations

View Researcher's Other Codes

Disclaimer: The provided code links for this paper are external links. Science Nest has no responsibility for the accuracy, legality or content of these links. Also, by downloading this code(s), you agree to comply with the terms of use as set out by the author(s) of the code(s).

Please contact us in case of a broken link from here

Authors Andre Lamurias, Diana Sousa, Francisco M. Couto
Journal/Conference Name NAACL 2019 6
Paper Category
Paper Abstract Human phenotype-gene relations are fundamental to fully understand the origin of some phenotypic abnormalities and their associated diseases. Biomedical literature is the most comprehensive source of these relations, however, we need Relation Extraction tools to automatically recognize them. Most of these tools require an annotated corpus and to the best of our knowledge, there is no corpus available annotated with human phenotype-gene relations. This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. The corpus consists of 1712 abstracts, 5676 human phenotype annotations, 13835 gene annotations, and 4283 relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. By using the corpus we were able to obtain promising results with two state-of-the-art deep learning tools, namely 78.05% of precision. The PGR corpus was made publicly available to the research community.
Date of publication 2019
Code Programming Language Python

Copyright Researcher 2022