GALIASdoc: Automatic Intermediate Language Generator for fast Syntactic Analysis over massive document sets

Abstract

[Abstract]: GALIASdoc is a software system for extracting relevant information from large volumes of documents that share common formats but come from heterogeneous sources. The extracted data are ready for use by other applications, such as content management systems (CMS), enterprise resource planning (ERP) systems, and databases. The system identifies the document model in order to locate the semantic information the document contains. During ingestion, it first produces a plain-text version of the document, applying optical character recognition (OCR) when necessary. The model includes geometric data that define the areas of interest in the document. The registered software has been in operational use since 2020 under two exploitation contracts signed with companies in the ICT sector.

Description

Intellectual property registration (software)

Rights

Rights holder: Universidade da Coruña (100%)