Authors: Dafonte, Carlos; Carneiro, Víctor; Gómez García, Ángel; Trabazo-Sardón, Diego; Santoveña, Raúl; Silvelo, Arturo; Nóvoa, Francisco; Fernández, Diego; Manteiga, Minia
Dates: 2026-01-23; 2026-01-23; 2020
URI: https://hdl.handle.net/2183/47072
Description: Registration of the intellectual property (of a software)
[Abstract]: The GALIASdoc software is a system for extracting relevant information from large volumes of documents with common formats and heterogeneous origins. The data obtained are ready to be exploited by other applications such as content management systems (CMS), enterprise resource planning (ERP) systems, databases, and similar platforms. The system is responsible for identifying the document model in order to locate the semantic information it contains. During the ingestion process, an initial version in text format is obtained, applying optical character recognition (OCR) techniques when necessary. The model includes geometric data defining the areas of interest presented in the document. This record has been in operational use since 2020 through the signing of two exploitation contracts with companies in the ICT sector.
Language: eng
Right holders: Universidade da Coruña (100%)
Keywords: Information extraction; Document processing; Software
Title: GALIASdoc: Automatic Intermediate Language Generator for fast Syntactic Analysis over massive document sets
Type: other
Access: open access