RGen: Data Generator for Benchmarking Big Data Workloads
dc.contributor.author | Pérez-Jove, Rubén | |
dc.contributor.author | Expósito, Roberto R. | |
dc.contributor.author | Touriño, Juan | |
dc.date.accessioned | 2022-01-19T19:36:15Z | |
dc.date.available | 2022-01-19T19:36:15Z | |
dc.date.issued | 2021 | |
dc.identifier.citation | Pérez-Jove, R.; Expósito, R.R.; Touriño, J. RGen: Data Generator for Benchmarking Big Data Workloads. Eng. Proc. 2021, 7, 13. https://doi.org/10.3390/engproc2021007013 | es_ES |
dc.identifier.uri | http://hdl.handle.net/2183/29447 | |
dc.description | Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021. | es_ES |
dc.description.abstract | [Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work were the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, the LDA model has been used for text generation, which extracts topics or themes covered in a series of documents. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation carried out on a 16-node cluster has shown that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021. | es_ES |
dc.description.sponsorship | CITIC, as a Research Center accredited by the Galician University System, is funded by the “Consellería de Cultura, Educación e Universidade” of the Xunta de Galicia, with 80% provided through ERDF funds (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by the “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This project was also supported by the “Consellería de Cultura, Educación e Ordenación Universitaria” via the Consolidation and Structuring of Competitive Research Units: Competitive Reference Groups (ED431C 2018/49 and 2021/30). | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431G 2019/01 | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431C 2018/49 | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431C 2021/30 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | MDPI | es_ES |
dc.relation.uri | https://doi.org/10.3390/engproc2021007013 | es_ES |
dc.rights | Attribution 3.0 Spain | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.subject | Data generator | es_ES |
dc.subject | MapReduce | es_ES |
dc.subject | HDFS | es_ES |
dc.subject | Apache Hadoop | es_ES |
dc.subject | Java | es_ES |
dc.subject | Big Data | es_ES |
dc.subject | Benchmarking | es_ES |
dc.title | RGen: Data Generator for Benchmarking Big Data Workloads | es_ES |
dc.type | info:eu-repo/semantics/conferenceObject | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.journalTitle | Engineering Proceedings | es_ES |
UDC.volume | 7 | es_ES |
UDC.issue | 1 | es_ES |
UDC.startPage | 13 | es_ES |
dc.identifier.doi | 10.3390/engproc2021007013 | |