Enhancing in-memory Efficiency for MapReduce-based Data Processing
Use this link to cite
http://hdl.handle.net/2183/21765
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain
Collections
- GI-GAC - Artigos [193]
Metadata
Title
Enhancing in-memory Efficiency for MapReduce-based Data Processing
Date
2018-10
Bibliographic citation
Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada, Juan Touriño, Enhancing in-memory efficiency for MapReduce-based data processing, Journal of Parallel and Distributed Computing, Volume 120, 2018, Pages 323-338, ISSN 0743-7315, https://doi.org/10.1016/j.jpdc.2018.04.001.
Abstract
As the memory capacity of computational systems increases, the in-memory data management of Big Data processing frameworks becomes more crucial for performance. This paper analyzes and improves the memory efficiency of Flame-MR, a framework that accelerates Hadoop applications, providing valuable insight into the impact of memory management on performance. By optimizing memory allocation, the garbage collection overheads and execution times have been reduced by up to 85% and 44%, respectively, on a multi-core cluster. Moreover, different data buffer implementations are evaluated, showing that off-heap buffers achieve better results overall. Memory resources are also leveraged by caching intermediate results, improving iterative applications by up to 26%. The memory-enhanced version of Flame-MR has been compared with Hadoop and Spark on the Amazon EC2 cloud platform. The experimental results have shown significant performance benefits, reducing Hadoop execution times by up to 65% while providing very competitive results compared to Spark.
Keywords
Big Data
MapReduce
In-memory computing
Garbage collector (GC)
Performance evaluation
Description
This is a post-peer-review, pre-copyedit version of an article published in Journal of Parallel and Distributed Computing. The final authenticated version is available online at: https://doi.org/10.1016/j.jpdc.2018.04.001
Publisher's version
Rights
Attribution-NonCommercial-NoDerivs 3.0 Spain
ISSN
0743-7315
1096-0848