Cross Sequencing Integration of Compositional Microbiome Data in Cancer

Bibliographic citation

Fernández-Edreira, D., Liñares-Blanco, J., Fernandez-Lozano, C. (2025). Cross Sequencing Integration of Compositional Microbiome Data in Cancer. In: Cerulo, L., Napolitano, F., Bardozzo, F., Cheng, L., Occhipinti, A., Pagnotta, S.M. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2024. Lecture Notes in Computer Science, vol 15276. Springer, Cham. https://doi.org/10.1007/978-3-031-89704-7_6

Abstract

High-throughput sequencing has revolutionized our understanding of the human microbiome, providing detailed insights into microbial communities under various health and disease conditions. Two of the most common strategies for studying the microbiome are 16S rRNA amplicon sequencing and whole genome shotgun sequencing (WGS), each with its own advantages and limitations. However, integrating and comparing data obtained through these two sequencing techniques is challenging because of inherent differences between the methods and discrepancies among datasets and their sources. This work evaluates batch effect removal (BER) methods for integrating microbiome composition data from different sequencing platforms. Using data from ten different cohorts, we applied the BER methods ComBat, limma, FAbatch, MMUPHin, and percentile normalization. Our results demonstrate the effectiveness of these methods in reducing batch effects. However, it remains unclear whether the biological signal that remains after correction is reliable, which is a critical open question. Additionally, we compared the Greengenes2 (GG2) database with standard databases (SILVA for 16S and WoL for WGS), showing that GG2 enables a more unified analysis: the number of taxa shared among cohorts increased from 94 genera and 58 species to 215 genera and 210 species, respectively. In conclusion, our findings suggest that appropriate BER methods can harmonize microbiome data from diverse sequencing platforms, but further experiments are needed to reliably understand how the biological signal is modulated in the process.

Description

This version of the conference paper has been accepted for publication after peer review, but it is not the Version of Record and does not reflect post-acceptance improvements or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-89704-7_6.
Paper presented at: 19th International Meeting, CIBB 2024 - Computational Intelligence Methods for Bioinformatics and Biostatistics, Benevento, Italy, September 4–6, 2024.

Rights

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the conference paper is subject to Springer Nature’s AM terms of use - https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms.