Prototipo de sistema de reconocimiento de entidades para la extracción de información en fuentes no estructuradas

Prado-Valiño, Francisco

Use this link to cite:

http://hdl.handle.net/2183/34023

Files

PradoValino_Francisco_TFG_2023.pdf (1.31 MB)

Identifiers

URI: http://hdl.handle.net/2183/34023

Publication date

2023

Authors

Prado-Valiño, Francisco

Advisors

Vilares, Jesús

Vilares, David

Other responsibilities

Universidade da Coruña. Facultade de Informática

Type of academic work

Bachelor's thesis (TFG)

Academic degree

Grao en Enxeñaría Informática

Abstract

Research in the biomedical field requires the study of huge amounts of unstructured textual information, which costs medical experts a great deal of time and resources. For this reason, there is great interest in developing systems capable of automating these tasks through Text Mining. One of the key tasks of Text Mining is Entity Recognition, which extracts entities of interest from texts and classifies them into pre-established categories. Our work applies these techniques to automate the detection of entities in unstructured clinical texts. We focus on the field of antimicrobial resistance, specifically on bacterial resistance to antibiotics. This work is part of the GRALENIA project, which aims to improve the digital management of antimicrobial resistance in hospitals. The main objective is to implement a prototype of the Entity Recognition system responsible for pre-identifying and pre-tagging symptomatic expressions of interest (symptoms, diseases, etc.) in clinical reports. Human annotators will use the results in later stages to train machine learning models that identify the expressions of interest more robustly. Because data and an evaluation corpus specific to the scope of the parent project are scarce, we add as a further objective the study of this lack of resources and of possible alternative solutions, which will later have to be adapted to the real data.

Keywords

Entity recognition; Natural language processing; Information extraction; Language models; Biomedical term extraction

Rights

Attribution 3.0 Spain

Collections

Traballos académicos (FIC)

Except where otherwise noted, this item's license is described as Attribution 3.0 Spain