Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing
Use este enlace para citar
http://hdl.handle.net/2183/36047Coleccións
- GI-LIDIA - Artigos [60]
Metadatos
Mostrar o rexistro completo do ítemTítulo
Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive HashingAutor(es)
Data
2020Cita bibliográfica
Carlos Eiras-Franco, David Martínez-Rego, Leslie Kanthan, César Piñeiro, Antonio Bahamonde, Bertha Guijarro-Berdiñas, and Amparo Alonso-Betanzos. 2020. Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing. ACM Trans. Intell. Syst. Technol. 11, 6, Article 71 (October 2020), https://doi.org/10.1145/3408889.
Resumo
[Abstract]: The k-nearest-neighbors (kNN) graph is a popular and powerful data structure that is used in various areas of Data Science, but the high computational cost of obtaining it hinders its use on large datasets. Approximate solutions have been described in the literature using diverse techniques, among which Locality-sensitive Hashing (LSH) is a promising alternative that still has unsolved problems. We present Variable Resolution Locality-sensitive Hashing, an algorithm that addresses these problems to obtain an approximate kNN graph at a significantly reduced computational cost. Its usability is greatly enhanced by its capacity to automatically find adequate hyperparameter values, a common hindrance to LSH-based methods. Moreover, we provide an implementation in the distributed computing framework Apache Spark that takes advantage of the structure of the algorithm to efficiently distribute the computational load across multiple machines, enabling practitioners to apply this solution to very large datasets. Experimental results show that our method offers significant improvements over the state-of-the-art in the field and shows very good scalability as more machines are added to the computation.
Palabras chave
Computing methodologies
Machine learning algorithms
MapReduce algorithms
Big data
Scalability
k nearest neighbors
Locality-sensitive hashing
Machine learning algorithms
MapReduce algorithms
Big data
Scalability
k nearest neighbors
Locality-sensitive hashing
Descrición
© 2020 Authors|ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Trans. Intell. Syst. Technol. 11, 6, Article 71 (October 2020). https://doi.org/10.1145/3408889.
Versión do editor
Dereitos
© 2020 Authors|ACM Todos os dereitos reservados. All rights reserved.
ISSN
2157-6904
2157-6912
2157-6912