LabChain: Enabling reproducible and modular scientific experiments in Python
| UDC.coleccion | Investigación | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.endPage | 10 | |
| UDC.grupoInv | Information Retrieval Lab (IRlab) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.issue | 102543 | |
| UDC.journalTitle | SoftwareX | |
| UDC.startPage | 1 | |
| UDC.volume | 33 | |
| dc.contributor.author | Couto Pintos, Manuel | |
| dc.contributor.author | Parapar, Javier | |
| dc.contributor.author | Losada, David E. | |
| dc.date.accessioned | 2026-02-16T10:55:32Z | |
| dc.date.available | 2026-02-16T10:55:32Z | |
| dc.date.issued | 2026-02 | |
| dc.description | Data availability: The LabChain framework is publicly available at https://github.com/manucouto1/LabChain. The reference implementation for this article is v1.2.1. The mental health detection case study uses publicly available datasets from the eRisk shared tasks: Depression (2017, 2018, 2022), Anorexia (2018, 2019), Self-harm (2020, 2021), and Gambling (2022, 2023). These datasets can be requested from the eRisk organizers at https://erisk.irlab.org/. The complete implementation of the case study, including all pipeline configurations and preprocessing code, is available at https://github.com/manucouto1/Temporal-Word-Embeddings-for-Early-Detection.... No new data were generated or analyzed in support of this research. | |
| dc.description.abstract | [Abstract]: Python’s flexibility accelerates research prototyping but frequently results in unmaintainable code and duplicated computational effort. The absence of software engineering practices in academic development leads to fragile experiments where even minor modifications require rerunning expensive computations from scratch. LabChain addresses this through a pipeline-and-filter architecture with hash-based caching that automatically identifies and reuses intermediate results. When evaluating multiple classifiers on the same embeddings, the framework computes embeddings once—regardless of how many classifiers are tested. This automatic reuse extends across research teams: if another researcher applies different models to the same preprocessed data, LabChain detects existing results and eliminates redundant computation. Beyond efficiency, the framework’s modular structure reduces technical debt that obscures experimental logic. Pipelines serialize to JSON for reproducibility and distributed execution across computational clusters. A mental health detection case study demonstrates dual impact: computational savings exceeding 12 hours per task with reduced CO2 emissions, alongside substantial scientific improvements—performance gains up to 192.3% in some tasks. These improvements emerged from clearer experimental organization that exposed a critical preprocessing bug hidden in the original monolithic implementation. LabChain proves that software engineering discipline amplifies scientific discovery. | |
| dc.description.sponsorship | MC and DEL thank the financial support provided by MICIU/AEI/10.13039/501100011033 (PID2022-137061OB-C22, supported by ERDF) and Xunta de Galicia-Consellería de Cultura, Educación, Formación Profesional e Universidades (ED431G 2023/04, ED431C 2022/19, supported by ERDF). JP has received support from project PID2022-137061OB-C21 (MCIU/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación). He also thanks the financial support provided by the Consellería de Educación, Universidade e Formación Profesional, Spain (grant number ED481A-2024–079 and GRC ED431C 2025/49); and the European Regional Development Fund, which supports the CITIC Research Center. | |
| dc.description.sponsorship | Xunta de Galicia; ED481A-2024–079 | |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2025/49 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/04 | |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2022/19 | |
| dc.description.uri | https://github.com/manucouto1/LabChain | |
| dc.description.uri | https://erisk.irlab.org/ | |
| dc.description.uri | https://github.com/manucouto1/Temporal-Word-Embeddings-for-Early-Detection-of-Psychological-Disorders-on-Social-Media | |
| dc.identifier.citation | Couto, M., Parapar, J., & Losada, D. E. (2026). LabChain: Enabling reproducible and modular scientific experiments in Python. SoftwareX, 33(102543). https://doi.org/10.1016/j.softx.2026.102543 | |
| dc.identifier.doi | 10.1016/j.softx.2026.102543 | |
| dc.identifier.issn | 2352-7110 | |
| dc.identifier.uri | https://hdl.handle.net/2183/47431 | |
| dc.language.iso | eng | |
| dc.publisher | Elsevier | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C22/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD: BUSQUEDA Y DETECCION DE DESINFORMACION | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION | |
| dc.relation.uri | https://doi.org/10.1016/j.softx.2026.102543 | |
| dc.rights | Attribution-NonCommercial 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject | Scientific workflows | |
| dc.subject | Pipeline architecture | |
| dc.subject | Hash-based caching | |
| dc.subject | Reproducible research | |
| dc.subject | Software engineering practices | |
| dc.title | LabChain: Enabling reproducible and modular scientific experiments in Python | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
| relation.isAuthorOfPublication.latestForDiscovery | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
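
The hash-based caching described in the abstract can be illustrated with a minimal sketch. This is not LabChain's API: `step_hash`, `run_step`, the in-memory `CACHE`, and the toy `embed` and `threshold_classifier` functions are all hypothetical names introduced here for illustration. The sketch assumes each step's cache key is derived from the step name, its parameters, and the key of its input, so that evaluating several classifiers on the same embeddings triggers the embedding computation only once.

```python
import hashlib
import json

# Hypothetical in-memory cache; a real framework would likely persist results.
CACHE = {}
CALLS = {"embed": 0}  # counts how often the expensive step actually runs


def step_hash(name, params, input_hash):
    """Derive a deterministic key from the step name, its parameters, and its input's key."""
    payload = json.dumps(
        {"name": name, "params": params, "input": input_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(name, func, params, data, input_hash):
    """Run func(data, **params) unless a result with the same key is already cached."""
    key = step_hash(name, params, input_hash)
    if key not in CACHE:
        CACHE[key] = func(data, **params)
    return CACHE[key], key


def embed(texts, dim):
    """Toy 'embedding' that stands in for an expensive computation."""
    CALLS["embed"] += 1
    return [[float(len(t) % (i + 2)) for i in range(dim)] for t in texts]


def threshold_classifier(vectors, cutoff):
    """Toy classifier: label 1 when the vector sum exceeds the cutoff."""
    return [int(sum(v) > cutoff) for v in vectors]


texts = ["first post", "a second, longer post"]
raw_hash = hashlib.sha256(json.dumps(texts).encode("utf-8")).hexdigest()

results = {}
for cutoff in (1.0, 3.0):
    # Same step name, parameters, and input key: the second pass is a cache hit.
    vectors, emb_hash = run_step("embed", embed, {"dim": 4}, texts, raw_hash)
    preds, _ = run_step(
        "classify", threshold_classifier, {"cutoff": cutoff}, vectors, emb_hash
    )
    results[cutoff] = preds

print(CALLS["embed"])  # prints 1: embeddings were computed once for two classifiers
```

Because the key chains through the input's key, changing an upstream parameter (for example `dim`) would invalidate every downstream result, while changing only the classifier's `cutoff` leaves the cached embeddings reusable.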
Files
Original bundle
- Name: Parapar_Javier_2026_LabChain.pdf
- Size: 2.52 MB
- Format: Adobe Portable Document Format

