Worst-Case-Optimal Similarity Joins on Graph Databases
Title
Worst-Case-Optimal Similarity Joins on Graph DatabasesAuthor(s)
Date
2024-03-06Citation
Diego Arroyuelo, Benjamin Bustos, Adrián Gómez-Brandón, Aidan Hogan, Gonzalo Navarro, and Juan Reutter. 2024. Worst-Case-Optimal Similarity Joins on Graph Databases. Proc. ACM Manag. Data 2, 1, Article 39 (February 2024), 26 pages. https://doi.org/10.1145/3639294
Abstract
[Absctract]: We extend the concept of worst-case optimal equijoins in graph databases to the case where some nodes are required to be within the k-nearest neighbors (kNN) of others under some similarity function. We model the problem by superimposing the database graph with the kNN graph and show that a variant of Leapfrog TrieJoin (LTJ) implemented over a compact data structure called the Ring can be seamlessly extended to integrate similarity clauses with the equijoins in the LTJ query process, retaining worst-case optimality in many relevant cases. Our experiments on a benchmark that combines Wikidata and IMGpedia show that our enhanced LTJ algorithm outperforms by a considerable margin a baseline that first applies classic LTJ and then completes the query by applying the similarity predicates. The difference is more pronounced on queries where the similarity clauses are more densely connected to the query, becoming of an order of magnitude in some cases.
Keywords
Worst-case optimal joins
Leapfrog Triejoin
Graph patterns
Graph databases
Graph indexing
Similarity joins
Nearest-neighbor graphs
Leapfrog Triejoin
Graph patterns
Graph databases
Graph indexing
Similarity joins
Nearest-neighbor graphs
Description
© ACM 2024. This is the author's version of the work (accepted
manuscript or postprint). It is posted here for your personal use. Not for
redistribution. The definitive Version of Record was published in
Proceedings of the ACM on Management of Data, https://
doi.org/10.1145/3639294
Editor version
ISSN
2836-6573
Related items
Showing items related by title, author, creator and subject.
-
The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra Space
Arroyuelo, Diego; Gómez-Brandón, Adrián; Hogan, Aidan; Navarro, Gonzalo; Reutter, Juan; Rojas-Ledesma, Javiel; Soto, Adrián (Association for Computing Machinery (ACM), 2024-03-23)[Absctract]: We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a ring, regards each triple as a cyclic string of ... -
Graphs with isolation number equal to one third of the order
Lemańska, Magdalena; Mora, Mercè; Souto Salorio, María José (Elsevier B.V., 2024-05)[Absctract]: A set D of vertices of a graph G is isolating if the set of vertices not in D and with no neighbor in D is independent. The isolation number of G, denoted by ι(G), is the minimum cardinality of an isolating ... -
Prediction of Nucleoitide Binding Peptides Using Star Graph Topological Índices
Liu, Yong; Munteanu, Cristian-Robert; Fernández-Blanco, Enrique; Tan, Zhiliang; Santos-del-Riego, Antonino; Pazos, A. (Elsevier, 2015-08-05)[Abstract] The nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this ...