TY - GEN
T1 - End-to-End Relation Extraction of Pharmacokinetic Estimates from the Scientific Literature
AU - Hernandez, Ferran Gonzalez
AU - Smith, Victoria C.
AU - Nguyen, Quang
AU - Cordero, José Antonio
AU - Ballester, Maria Rosa
AU - Duran, Màrius
AU - Solé, Albert
AU - Chotsiri, Palang
AU - Wattanakul, Thanaporn
AU - Mundin, Gill
AU - Lilaonitkul, Watjana
AU - Standing, Joseph F.
AU - Kloprogge, Frank
N1 - Publisher Copyright: ©2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - The lack of comprehensive and standardised databases containing Pharmacokinetic (PK) parameters presents a challenge in the drug development pipeline. Efficiently managing the increasing volume of published PK parameters requires automated approaches that centralise information from diverse studies. In this work, we present the Pharmacokinetic Relation Extraction Dataset (PRED), a novel, manually curated corpus developed by pharmacometricians and NLP specialists, covering multiple types of PK parameters and numerical expressions reported in open-access scientific articles. PRED covers annotations for various entities and relations involved in PK parameter measurements from 3,600 sentences. We also introduce an end-to-end relation extraction model based on BioBERT, which is trained with joint named entity recognition (NER) and relation extraction objectives. The optimal pipeline achieved a micro-average F1-score of 94% for NER and over 85% F1-score across all relation types. This work represents the first resource for training and evaluating models for PK end-to-end extraction across multiple parameters and study types. We make our corpus and model openly available to accelerate the construction of large PK databases and to support similar endeavours in other scientific disciplines.
UR - http://www.scopus.com/inward/record.url?scp=85204447279&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85204447279
T3 - BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks
SP - 144
EP - 154
BT - BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks
A2 - Demner-Fushman, Dina
A2 - Ananiadou, Sophia
A2 - Miwa, Makoto
A2 - Roberts, Kirk
A2 - Tsujii, Junichi
PB - Association for Computational Linguistics (ACL)
T2 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, BioNLP 2024
Y2 - 16 August 2024
ER -