Performance Optimization of a Parallel Error Correction Tool
Title
Performance Optimization of a Parallel Error Correction Tool
Date
2021
Citation
Martínez-Sánchez, M.; Expósito, R.R.; Touriño, J. Performance Optimization of a Parallel Error Correction Tool. Eng. Proc. 2021, 7, 34. https://doi.org/10.3390/engproc2021007034
Abstract
Continuous advances in Next Generation Sequencing (NGS) technologies allow researchers to obtain larger genetic samples in less time, which makes it increasingly important to improve the existing algorithms that enhance the quality of the generated reads. In this work, we present a Big Data tool implemented on top of the open-source Apache Spark framework that executes validated error-correction algorithms with improved performance. The experimental evaluation, conducted on a multi-core cluster, shows significant reductions in execution time, with a maximum speedup of 9.5 over existing error correction tools when processing an NGS dataset with 25 million reads.
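For readers unfamiliar with the metric, the speedup quoted in the abstract is conventionally the ratio between the execution time of the baseline tool and that of the optimized tool. The following is a minimal sketch of that ratio, not code from the paper. The function name `speedup` and the timings used are hypothetical and were chosen only for illustration; they are not measurements reported by the authors.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("execution times must be positive")
    return baseline_seconds / optimized_seconds


# Hypothetical timings (NOT taken from the paper): a baseline run of 950 s
# and an optimized run of 100 s on the same dataset and hardware.
print(speedup(950.0, 100.0))  # 9.5
```

A value above 1 means the optimized tool finished sooner. The comparison is only meaningful when both runs process the same dataset on the same hardware.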
Keywords
High performance computing
Big Data
Bioinformatics
Next Generation Sequencing
Description
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
Rights
Attribution 4.0 International