RGen: Data Generator for Benchmarking Big Data Workloads

Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013

Abstract

This paper presents RGen, a parallel data generator for benchmarking Big Data workloads that integrates existing features and new functionalities into a standalone tool. The main functionalities developed in this work are the generation of text and of graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation relies on the LDA (Latent Dirichlet Allocation) model, which extracts the topics covered in a collection of documents. Graph generation is based on the Kronecker model. An experimental evaluation on a 16-node cluster shows that RGen scales well under both weak and strong scaling. RGen is publicly available for download at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
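
As a rough illustration of what Kronecker-based graph generation can look like, the sketch below samples edges by recursive quadrant descent, a scheme commonly used for stochastic Kronecker graphs. It is not taken from the RGen source code: the class name `KroneckerSketch`, the method names, and the 2x2 initiator probabilities are all assumptions chosen for the example, and the sketch runs sequentially rather than as a parallel job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; names and parameters are hypothetical, not RGen's.
final class KroneckerSketch {

    // Assumed 2x2 initiator matrix: four probabilities that sum to 1.
    static final double[][] INITIATOR = {
        {0.57, 0.19},
        {0.19, 0.05}
    };

    /** Samples one directed edge of a graph with 2^levels vertices. */
    static long[] sampleEdge(int levels, Random rng) {
        long src = 0;
        long dst = 0;
        for (int level = 0; level < levels; level++) {
            // Pick one quadrant of the adjacency matrix according to the initiator.
            double r = rng.nextDouble();
            int row;
            int col;
            if (r < INITIATOR[0][0]) {
                row = 0; col = 0;
            } else if (r < INITIATOR[0][0] + INITIATOR[0][1]) {
                row = 0; col = 1;
            } else if (r < INITIATOR[0][0] + INITIATOR[0][1] + INITIATOR[1][0]) {
                row = 1; col = 0;
            } else {
                row = 1; col = 1;
            }
            // Each level contributes one bit to the source and destination ids.
            src = (src << 1) | row;
            dst = (dst << 1) | col;
        }
        return new long[] {src, dst};
    }

    /** Generates edgeCount edges; duplicates and self-loops are not filtered. */
    static List<long[]> generateEdges(int levels, int edgeCount, long seed) {
        Random rng = new Random(seed);
        List<long[]> edges = new ArrayList<>(edgeCount);
        for (int i = 0; i < edgeCount; i++) {
            edges.add(sampleEdge(levels, rng));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Print a few edges of a 16-vertex graph as tab-separated pairs.
        for (long[] edge : generateEdges(4, 5, 42L)) {
            System.out.println(edge[0] + "\t" + edge[1]);
        }
    }
}
```

Because every edge is sampled independently, this kind of generator is straightforward to split across workers by giving each one its own seed and a share of the edge count, which is one plausible reason the model suits parallel data generation.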

Keywords

Data generator
MapReduce
HDFS
Apache Hadoop
Java
Big Data
Benchmarking

Description

Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.

Publisher's version

https://doi.org/10.3390/engproc2021007013

Rights

Attribution 3.0 Spain