Pérez-Jove, Rubén; Expósito, Roberto R.; Touriño, Juan
2022-01-19
2022-01-19
2021
Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013
http://hdl.handle.net/2183/29447
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
[Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work were the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model has been used for text generation, which extracts topics or themes covered in a series of documents. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021.
eng
Atribución 3.0 España
http://creativecommons.org/licenses/by/3.0/es/
Data generator; MapReduce; HDFS; Apache Hadoop; Java; Big Data; Benchmarking
RGen: Data Generator for Benchmarking Big Data Workloads
conference output
open access
10.3390/engproc2021007013