Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
Use this link to cite: http://hdl.handle.net/2183/23359
J. Veiga, R. R. Expósito, X. C. Pardo, G. L. Taboada and J. Touriño, "Performance evaluation of big data frameworks for large-scale data analytics," 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, 2016, pp. 424-431.
[Abstract] The increasing adoption of Big Data analytics has led to a high demand for efficient technologies in order to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by emerging ones like Spark or Flink, which improve both the programming APIs and performance. However, few works have focused on comparing these frameworks. This paper addresses this issue by performing a comparative evaluation of Hadoop, Spark and Flink using representative Big Data workloads and considering factors like performance and scalability. Moreover, the behavior of these frameworks has been characterized by modifying some of the main parameters of the workloads such as HDFS block size, input data size, interconnect network or thread configuration. The analysis of the results has shown that replacing Hadoop with Spark or Flink can lead to a reduction in execution times by 77% and 70% on average, respectively, for non-sort benchmarks.
This is a post-peer-review, pre-copyedit version of an article published. The final authenticated version is available online at: http://dx.doi.org/10.1109/BigData.2016.7840633