Scientific claim verification with fine-tuned NLI models
This paper introduces the foundation for the third component of a pioneering open-source scientific question-answering system. The system is designed to provide referenced, automatically vetted, and verifiable answers in the scientific domain, where hallucinations and misinformation are intolerable. This Verification Engine is based on models fine-tuned for the Natural Language Inference task using an additionally processed SciFact dataset. Our experiments, involving eight fine-tuned models based on RoBERTa Large, XLM RoBERTa Large, DeBERTa, and DeBERTa SQuAD, show promising results. Notably, the DeBERTa model fine-tuned on our dataset achieved the highest F1 score of 88%. Furthermore, evaluating our best model on the HealthVer dataset resulted in an F1 score of 48%, outperforming other models by more than 12%. Additionally, our model demonstrated superior performance with a 7% absolute increase in F1 score compared to the best-performing GPT-4 model on the same test set in a zero-shot regime. These findings suggest that our system can significantly enhance scientists' productivity while fostering trust in the use of generative language models in scientific environments.
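The verification step the abstract describes — an NLI model judging a claim against retrieved evidence — can be sketched as follows. This is a minimal illustration only: the label set, verdict names, and confidence threshold below are assumptions for the sketch, not the paper's exact configuration.

```python
# Illustrative sketch of a Verification Engine decision step: an NLI
# model scores an (evidence, claim) pair, and its labels are mapped to
# claim-verification verdicts. The label names, verdict names, and the
# 0.5 threshold are assumptions, not the paper's exact configuration.

def verdict_from_nli(label_probs: dict, threshold: float = 0.5) -> str:
    """Map NLI label probabilities to a claim-verification verdict."""
    # Pick the most probable NLI label.
    label, prob = max(label_probs.items(), key=lambda kv: kv[1])
    # Low-confidence predictions fall back to "not enough info".
    if prob < threshold:
        return "NOT ENOUGH INFO"
    return {
        "entailment": "SUPPORTED",
        "contradiction": "REFUTED",
        "neutral": "NOT ENOUGH INFO",
    }[label]

# Example: the NLI model is confident the evidence entails the claim.
print(verdict_from_nli({"entailment": 0.91, "contradiction": 0.04, "neutral": 0.05}))  # → SUPPORTED
```

In practice the probabilities would come from a fine-tuned model such as the DeBERTa variant the paper evaluates, applied to each retrieved evidence sentence.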
English
2024
This work is licensed under the terms of the
Creative Commons CC BY-NC-ND 4.0 - Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License.
http://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
claim verification, deep learning models, natural language inference, PubMed, SciFact dataset