Use this link to cite: http://hdl.handle.net/2183/29447
RGen: Data Generator for Benchmarking Big Data Workloads
Bibliographic citation
Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013
Abstract
This paper presents RGen, a parallel data generator for benchmarking Big Data workloads that integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation uses the Latent Dirichlet Allocation (LDA) model, which extracts the topics covered in a collection of documents. Graph generation is based on the Kronecker model. An experimental evaluation on a 16-node cluster shows that RGen provides very good weak and strong scalability. RGen is publicly available at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
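To illustrate the Kronecker model mentioned in the abstract, the following is a minimal Python sketch of stochastic Kronecker edge sampling. It is not RGen's implementation: the function name `kronecker_edges`, the 2x2 initiator values, and the level-by-level cell-picking scheme are assumptions chosen for illustration only.

```python
import random


def kronecker_edges(initiator, k, num_edges, seed=0):
    """Sample distinct directed edges from a stochastic Kronecker graph.

    initiator: square matrix of non-negative weights (hypothetical values).
    k: number of Kronecker levels; the graph has len(initiator) ** k nodes.
    num_edges: number of distinct edges to return.
    """
    rng = random.Random(seed)
    n = len(initiator)
    # Flatten the initiator into (row, col) cells with their weights.
    cells = [(r, c) for r in range(n) for c in range(n)]
    weights = [initiator[r][c] for r, c in cells]

    # Guard against asking for more distinct edges than can ever be sampled.
    reachable = sum(1 for w in weights if w > 0) ** k
    if num_edges > reachable:
        raise ValueError("num_edges exceeds the number of reachable edges")

    edges = set()
    while len(edges) < num_edges:
        src = dst = 0
        # Each level picks one initiator cell and refines both node ids.
        for _ in range(k):
            r, c = rng.choices(cells, weights=weights)[0]
            src = src * n + r
            dst = dst * n + c
        edges.add((src, dst))
    return sorted(edges)


if __name__ == "__main__":
    # Assumed initiator: heavier weight in the top-left cell skews the degrees.
    for edge in kronecker_edges([[0.9, 0.5], [0.5, 0.1]], k=3, num_edges=10):
        print(edge)
```

With a 2x2 initiator and k = 3 the sketch draws edges over 8 nodes. Larger k grows the node count exponentially while the initiator stays fixed, which is why this family of models is a common choice for generating large synthetic graphs.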
Description
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
Rights
Attribution 3.0 Spain








