RGen: Data Generator for Benchmarking Big Data Workloads

Use this link to cite
http://hdl.handle.net/2183/29447
Collections
- Research (FIC) [1705]
Metadata
Title
RGen: Data Generator for Benchmarking Big Data Workloads
Date
2021
Bibliographic citation
Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013
Abstract
This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation uses the LDA (Latent Dirichlet Allocation) model, which extracts the topics covered in a collection of documents, while graph generation is based on the Kronecker model. An experimental evaluation on a 16-node cluster shows that RGen provides very good weak and strong scalability results. RGen is publicly available for download at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
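As a rough illustration of what Kronecker-based graph generation can look like, the sketch below samples edges with the R-MAT-style recursive procedure, a common way of drawing edges from a stochastic Kronecker graph with a 2x2 initiator matrix. It is not RGen's implementation: the class name, method names, and initiator probabilities (0.57, 0.19, 0.19, 0.05) are assumptions chosen for the example, and the code runs sequentially rather than as a MapReduce job.

```java
import java.util.Random;

// Illustrative sketch only; names and parameters are hypothetical, not RGen's.
class KroneckerSketch {
    // Assumed 2x2 initiator probabilities; the fourth entry is 1 - A - B - C.
    static final double A = 0.57, B = 0.19, C = 0.19;

    /** Samples one directed edge {src, dst} in a graph with 2^levels vertices. */
    static long[] sampleEdge(int levels, Random rng) {
        long src = 0, dst = 0;
        for (int i = 0; i < levels; i++) {
            // Pick one quadrant of the adjacency matrix at this recursion level.
            double r = rng.nextDouble();
            int rowBit, colBit;
            if (r < A)              { rowBit = 0; colBit = 0; }
            else if (r < A + B)     { rowBit = 0; colBit = 1; }
            else if (r < A + B + C) { rowBit = 1; colBit = 0; }
            else                    { rowBit = 1; colBit = 1; }
            // Each level contributes one bit to the source and target vertex ids.
            src = (src << 1) | rowBit;
            dst = (dst << 1) | colBit;
        }
        return new long[] {src, dst};
    }

    /** Generates numEdges edges; duplicates and self-loops are not filtered. */
    static long[][] generate(int levels, int numEdges, long seed) {
        Random rng = new Random(seed);
        long[][] edges = new long[numEdges][];
        for (int e = 0; e < numEdges; e++) {
            edges[e] = sampleEdge(levels, rng);
        }
        return edges;
    }

    public static void main(String[] args) {
        // 2^4 = 16 vertices, 8 edges, printed as a tab-separated edge list.
        for (long[] edge : generate(4, 8, 42L)) {
            System.out.println(edge[0] + "\t" + edge[1]);
        }
    }
}
```

Because each edge is sampled independently, a parallel generator could in principle give every task its own seed and edge count, which is one reason this family of models suits distributed generation.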
Keywords
Data generator
MapReduce
HDFS
Apache Hadoop
Java
Big Data
Benchmarking
Description
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
Publisher's version
Rights
Attribution 3.0 Spain