RGen: Data Generator for Benchmarking Big Data Workloads

Use this link to cite
http://hdl.handle.net/2183/29447
Collections
- Investigación (FIC)
Metadata
Title
RGen: Data Generator for Benchmarking Big Data Workloads
Date
2021
Citation
Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013
Abstract
This paper presents RGen, a parallel data generator for benchmarking Big Data workloads that integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation relies on the Latent Dirichlet Allocation (LDA) model, which extracts the topics covered in a collection of documents, while graph generation is based on the Kronecker model. An experimental evaluation on a 16-node cluster shows that RGen achieves very good weak and strong scalability. RGen is publicly available at https://github.com/rubenperez98/RGen (accessed on 30 September 2021).
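The two generative models named in the abstract can be illustrated with a minimal, single-threaded Java sketch. It is not RGen's implementation: the class GeneratorSketch and all its methods and parameters are hypothetical. The Kronecker part uses the common edge-by-edge sampling of a stochastic Kronecker graph (descending k levels of an n x n initiator matrix), and the LDA part assumes the topic-word distributions have already been learned from a corpus and draws one document from the standard LDA generative process with a symmetric Dirichlet prior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical single-threaded sketch of the two generative models named in the
 * abstract. Not RGen's code: class, method and parameter names are illustrative.
 */
public class GeneratorSketch {

    /**
     * Stochastic Kronecker graph via edge-by-edge sampling on n^k nodes, where n is
     * the side of the square initiator matrix. At each of the k levels one initiator
     * cell (i, j) is picked with probability proportional to its value; i and j become
     * the next base-n digits of the source and target node ids. Assumes n^k does not
     * overflow a long; duplicate edges and self-loops are not filtered.
     */
    public static List<long[]> kroneckerEdges(double[][] initiator, int k, long numEdges, Random rnd) {
        int n = initiator.length;
        double total = 0;
        for (double[] row : initiator) for (double p : row) total += p;
        // Flatten and normalize the initiator into a distribution over its n*n cells.
        double[] cells = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) cells[i * n + j] = initiator[i][j] / total;
        }
        List<long[]> edges = new ArrayList<>();
        for (long e = 0; e < numEdges; e++) {
            long src = 0, dst = 0;
            for (int level = 0; level < k; level++) {
                int c = sampleDiscrete(cells, rnd);
                src = src * n + c / n; // row of the chosen cell
                dst = dst * n + c % n; // column of the chosen cell
            }
            edges.add(new long[] {src, dst});
        }
        return edges;
    }

    /** Gamma(shape, 1) sample via Marsaglia-Tsang, boosted for shape < 1. */
    static double sampleGamma(double shape, Random rnd) {
        if (shape < 1) {
            // Gamma(a) = Gamma(a + 1) * U^(1/a), with U uniform in (0, 1].
            return sampleGamma(shape + 1, rnd) * Math.pow(1 - rnd.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3, c = 1 / Math.sqrt(9 * d);
        while (true) {
            double x = rnd.nextGaussian(), v = 1 + c * x;
            if (v <= 0) continue;
            v = v * v * v;
            double u = 1 - rnd.nextDouble(); // uniform in (0, 1]
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
        }
    }

    /** Draws an index from a discrete distribution whose probabilities sum to ~1. */
    static int sampleDiscrete(double[] probs, Random rnd) {
        double r = rnd.nextDouble();
        for (int i = 0; i < probs.length; i++) {
            r -= probs[i];
            if (r < 0) return i;
        }
        return probs.length - 1; // guards against floating-point rounding
    }

    /**
     * LDA generative process for one document, assuming phi[topic][word] was already
     * learned: theta ~ Dirichlet(alpha) over topics; for each word position draw a
     * topic z ~ theta, then a word w ~ phi[z].
     */
    public static List<String> ldaDocument(double[][] phi, String[] vocab, double alpha, int length, Random rnd) {
        int topics = phi.length;
        double[] theta = new double[topics];
        double sum = 0;
        // A symmetric Dirichlet sample is a vector of normalized Gamma(alpha) draws.
        for (int t = 0; t < topics; t++) { theta[t] = sampleGamma(alpha, rnd); sum += theta[t]; }
        for (int t = 0; t < topics; t++) theta[t] /= sum;
        List<String> words = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            int z = sampleDiscrete(theta, rnd);
            words.add(vocab[sampleDiscrete(phi[z], rnd)]);
        }
        return words;
    }
}
```

For example, a 2 x 2 initiator such as {{0.9, 0.5}, {0.5, 0.1}} with k = 10 places edges among 2^10 = 1024 nodes. Because every edge and every document is sampled independently, a parallel generator could split the work across tasks that each use their own seed; this is one plausible way such a tool could scale, and the paper's own MapReduce design may differ.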
Keywords
Data generator
MapReduce
HDFS
Apache Hadoop
Java
Big Data
Benchmarking
Description
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
Rights
Attribution 3.0 Spain