Space/time-efficient RDF stores based on circular suffix sorting

UDC.coleccion: Investigación [es_ES]
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información [es_ES]
UDC.endPage: 5683 [es_ES]
UDC.grupoInv: Laboratorio de Bases de Datos (LBD) [es_ES]
UDC.journalTitle: The Journal of Supercomputing [es_ES]
UDC.startPage: 5643 [es_ES]
UDC.volume: 79 [es_ES]
dc.contributor.author: Brisaboa, Nieves R.
dc.contributor.author: Cerdeira-Pena, Ana
dc.contributor.author: Bernardo, Guillermo de
dc.contributor.author: Fariña, Antonio
dc.contributor.author: Navarro, Gonzalo
dc.date.accessioned: 2023-12-18T15:18:13Z
dc.date.embargoEndDate: 2024-04-01 [es_ES]
dc.date.embargoLift: 2024-04-01
dc.date.issued: 2023-03
dc.description: This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11227-022-04890-w [es_ES]
dc.description.abstract: [Abstract]: The resource description framework (RDF) has gained popularity as a format for the standardized publication and exchange of information in the Web of Data. In this paper, we introduce RDFCSA, a compressed representation of RDF datasets that in addition supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting on those strings, so that triple-pattern queries reduce to prefix searching on the string set. The RDF store is then represented compactly using a compressed suffix array (CSA), a proven technology in text indexing that efficiently supports prefix searches. Our experiments show that RDFCSA is competitive with state-of-the-art alternatives. It compresses the raw data to 60% of its size, close to the most compact alternatives. While most alternatives perform better on some kinds of triple patterns than on others, RDFCSA features fast and consistent query times, a few microseconds per result in all cases. This enables efficiently supporting join queries by using either merge- or chaining-join strategies over the triple patterns, coupled with some specific optimizations such as variable filling. Our experiments on binary joins show that RDFCSA is faster than the alternatives in most cases. [es_ES]
dc.description.sponsorship: Funding for the Spanish group: projects funded by MCIN/AEI/10.13039/501100011033: PDC2021-121239-C31 (FLATCITY-POC)-“NextGenerationEU”/PRTR; PDC2021-120917-C21 (SIGTRANS)-“NextGenerationEU”/PRTR; PID2020-114635RB-I00 (EXTRACompact); PID2019-105221RB-C41 (MAGIST); PID2021-122554OB-C33 (OASSIS-UDC); and TED2021-129245B-C21 (PLAGEMIS-UDC); grant ED431C 2021/53 (GRC) funded by GAIN/Xunta de Galicia; and grant ED431G 2019/01 (CSI) funded by Xunta de Galicia, FEDER Galicia 2014-2020 80%, SXU 20%; Gonzalo Navarro is partially funded by Fondecyt 1-200038, and by ANID - Millennium Science Initiative Program - Code ICN17_002. [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431C 2021/53 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431G 2019/01 [es_ES]
dc.description.sponsorship: Chile. Fondo Nacional de Desarrollo Científico y Tecnológico (Fondecyt); 1-200038.
dc.description.sponsorship: Chile. Agencia Nacional de Investigación y Desarrollo; ICN17_002
dc.identifier.citation: Brisaboa, N.R., Cerdeira-Pena, A., de Bernardo, G. et al. Space/time-efficient RDF stores based on circular suffix sorting. J Supercomput 79, 5643–5683 (2023). https://doi.org/10.1007/s11227-022-04890-w [es_ES]
dc.identifier.doi: 10.1007/s11227-022-04890-w
dc.identifier.issn: 1573-0484
dc.identifier.uri: http://hdl.handle.net/2183/34535
dc.language.iso: eng [es_ES]
dc.publisher: Springer Nature [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PDC2021-121239-C31/ES/FLATCITY-POC [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PDC2021-120917-C21/ES/SIGTRANS [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-114635RB-I00/ES/EXPLOTACION ENRIQUECIDA DE TRAYECTORIAS CON ESTRUCTURAS DE DATOS COMPACTAS Y GIS/ [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-105221RB-C41/ES/VISUALIZACION Y EXPLORACION BASADA EN FLUJOS Y ANALITICA DE BIG DATA ESPACIAL [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122554OB-C33/ES/OASSIS-UDC: HACIA ORGANIZACIONES SOFTWARE MÁS SOSTENIBLES: UN ENFOQUE HOLÍSTICO PARA PROMOVER LA SOSTENIBILIDAD ECONÓMICA, HUMANA Y MEDIOAMBIENTAL [es_ES]
dc.relation.uri: https://doi.org/10.1007/s11227-022-04890-w [es_ES]
dc.rights: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.subject: Compact data structures [es_ES]
dc.subject: RDF [es_ES]
dc.subject: CSA [es_ES]
dc.subject: Web of data [es_ES]
dc.title: Space/time-efficient RDF stores based on circular suffix sorting [es_ES]
dc.type: journal article [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 42f2c226-9868-4516-8efd-2cd3c6692034
relation.isAuthorOfPublication: e09ccaa0-3a7f-4463-b6e7-db404361f097
relation.isAuthorOfPublication: 23354397-ec74-4cbb-93ac-f85352e9fbd8
relation.isAuthorOfPublication: 2fe2b113-791f-4229-a83a-311d0c8b5ce6
relation.isAuthorOfPublication.latestForDiscovery: 42f2c226-9868-4516-8efd-2cd3c6692034
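
Illustrative sketch (not part of the repository record): the abstract states that RDFCSA treats each triple as a short circular string, suffix-sorts those strings, and answers triple-pattern queries by prefix search. The Python below is a minimal sketch of that idea only, under several assumptions: subjects, predicates, and objects are already dictionary-encoded as integers, the index is a plain sorted list instead of a compressed suffix array, and the names build_index and match are hypothetical, not taken from the paper.

```python
from bisect import bisect_left

S, P, O = 0, 1, 2  # role at which a rotation starts


def build_index(triples):
    """Sort every rotation of every (s, p, o) triple.

    Each entry is (start_role, x, y, z): the triple read circularly
    from its subject, predicate, or object. A real CSA would store
    this order in compressed form; a sorted list is used here only
    to show the search logic.
    """
    rotations = []
    for s, p, o in set(triples):
        rotations.append((S, s, p, o))
        rotations.append((P, p, o, s))
        rotations.append((O, o, s, p))
    rotations.sort()
    return rotations


def match(index, s=None, p=None, o=None):
    """Answer a triple pattern; None marks an unbound component."""
    pattern = (s, p, o)
    for start in (S, P, O):
        rotated = pattern[start:] + pattern[:start]
        # Count the bound components at the front of this rotation.
        k = 0
        while k < 3 and rotated[k] is not None:
            k += 1
        # Usable only if everything after the bound prefix is unbound.
        if all(c is None for c in rotated[k:]):
            break
    prefix = (start,) + rotated[:k]
    # Entries sharing the prefix are contiguous in the sorted order.
    lo = bisect_left(index, prefix)
    hi = bisect_left(index, prefix + (float("inf"),))
    results = []
    for entry in index[lo:hi]:
        rot = entry[1:]
        # Undo the rotation to recover (s, p, o).
        results.append(tuple(rot[(j - start) % 3] for j in range(3)))
    return results


if __name__ == "__main__":
    idx = build_index([(1, 10, 2), (1, 11, 3), (2, 10, 3), (3, 10, 1)])
    print(match(idx, s=1))        # all triples with subject 1
    print(match(idx, p=10))       # all triples with predicate 10
    print(match(idx, s=1, o=3))   # subject and object bound
```

Every one of the eight bound/unbound combinations has some rotation in which the bound components come first, which is why a single sorted order of the circular strings can serve all triple patterns with one range search each.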

Files

Original bundle

Name: Brisaboa_Nieves_2023_Space_time_efficient_RDF_stores.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format
Description: Accepted version