Analysis of federated learning on non-independent and identically distributed sleep data
| UDC.coleccion | Investigación | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.issue | 3 | |
| UDC.journalTitle | Physiological Measurement | |
| UDC.volume | 47 | |
| dc.contributor.author | Anido Alonso, Adriana | |
| dc.contributor.author | Álvarez-Estévez, Diego | |
| dc.date.accessioned | 2026-04-07T16:35:28Z | |
| dc.date.available | 2026-04-07T16:35:28Z | |
| dc.date.issued | 2026-03-09 | |
| dc.description.abstract | [Abstract]: Objective. We investigate the application of federated learning (FL) across heterogeneous, non-independent and identically distributed (non-IID) sleep data. We evaluate three algorithms: federated stochastic gradient descent, federated averaging, and federated proximal (FedProx). The setting is realistic, with non-IID characteristics arising from distinct sensor configurations, varying acquisition protocols, and diverse patient populations across independent sleep cohort datasets. Approach. We employ a dual-layered evaluation framework. First, we systematically analyze the impact of local training epochs and aggregation schemes (weighted and unweighted) on model convergence. Second, we introduce and adapt a generalized sub-sampling strategy designed to mitigate model drift caused by heterogeneous data distribution and volume imbalances across participating clients. To ensure robust external generalization, our evaluation uses six independent databases in a leave-one-database-out cross-validation scheme. Main results. Our analysis shows that increasing the number of local training epochs adversely affects performance across all evaluated federated schemes. This confirms that extended local training exacerbates client drift, hindering global convergence. Furthermore, weighted aggregation consistently underperforms unweighted approaches, suggesting that disproportionate client contributions bias the global data representation. While the inclusion of a proximal term partially mitigates this instability by constraining local updates, the proposed sub-sampling strategy proves most effective. This approach yields consistent generalization results across all algorithms and minimizes performance degradation, while significantly reducing computational overhead. Significance. This work addresses critical privacy concerns in centralized automated sleep staging by validating FL in realistic, multi-center scenarios. We provide evidence that decentralized strategies can achieve performance comparable to centralized methods, effectively overcoming data silos. Ultimately, this approach enables robust collaborative training while strictly maintaining data privacy, a fundamental requirement for widespread clinical implementation. | |
| dc.description.sponsorship | This study has been supported by project RYC2022-038121-I, funded by MCIN/AEI/10.13039/501100011033 and European Social Fund Plus (ESF+), project PID2023-147422OB-I00 funded by MCIU/AEI/10.13039/501100011033 and by the European FEDER program, and project ED431F 2025/35 funded by Xunta de Galicia. Authors wish to acknowledge the support received from Universidade da Coruña and Centro de Investigación de Galicia ‘CITIC’, center accredited for excellence within the Galician University System and a member of the CIGUS Network. CITIC receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia and it is co-financed by the EU through the FEDER Galicia 2021-27 operational program ED431G 2023/01. Furthermore, this research project was made possible through the access granted by the Galician Supercomputing Center (CESGA) to its supercomputing infrastructure. The supercomputer FinisTerrae III and its permanent data storage system have been funded by the NextGeneration EU 2021 Recovery, Transformation and Resilience Plan, ICT2021-006904, and also from the Pluriregional Operational Programme of Spain 2014-2020 of the European Regional Development Fund (ERDF), ICTS-2019-02-CESGA-3, and from the State Programme for the Promotion of Scientific and Technical Research of Excellence of the State Plan for Scientific and Technical Research and Innovation 2013-2016 State subprogramme for scientific and technical infrastructures and equipment of ERDF, CESG15-DE-3114 | |
| dc.description.sponsorship | Xunta de Galicia; ED431F 2025/35 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/01 | |
| dc.description.sponsorship | Xunta de Galicia; ICTS-2019-02-CESGA-3 | |
| dc.description.sponsorship | Xunta de Galicia; CESG15-DE-3114 | |
| dc.identifier.citation | A. Anido-Alonso and D. Alvarez-Estevez, "Analysis of federated learning on non-independent and identically distributed sleep data", Physiol. Meas., vol. 47, no. 3, p. 035006, Mar. 2026, doi: 10.1088/1361-6579/ae4a82 | |
| dc.identifier.doi | 10.1088/1361-6579/ae4a82 | |
| dc.identifier.issn | 1361-6579 | |
| dc.identifier.uri | https://hdl.handle.net/2183/47887 | |
| dc.language.iso | eng | |
| dc.publisher | IOP Science | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/RYC2022-038121-I/ES/BIOMEDICAL SIGNAL PROCESSING AND ARTIFICIAL INTELLIGENCE FOR AIDING CLINICAL DIAGNOSIS IN SLEEP MEDICINE | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147422OB-I00/ES/ALGORITMOS DE APRENDIZAJE AUTOMATICO DE NUEVA GENERACION PARA EL ANALISIS DE REGISTROS MEDICOS DEL SUEÑO | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MICINN/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/ICT2021-006904/ES/ | |
| dc.relation.uri | https://doi.org/10.1088/1361-6579/ae4a82 | |
| dc.rights | © 2026 The Author(s). Published on behalf of Institute of Physics and Engineering in Medicine by IOP Publishing Ltd | |
| dc.rights | Attribution 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Deep-learning | |
| dc.subject | Data-privacy | |
| dc.subject | Federated learning | |
| dc.subject | Non-independent and identically distributed data | |
| dc.subject | Sleep staging | |
| dc.title | Analysis of federated learning on non-independent and identically distributed sleep data | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 2f33139f-83f9-4a21-9fb4-43f4322a8a87 | |
| relation.isAuthorOfPublication.latestForDiscovery | 2f33139f-83f9-4a21-9fb4-43f4322a8a87 |
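
The abstract contrasts weighted and unweighted aggregation of client updates. The following is a minimal Python sketch of that distinction, not the authors' implementation: the function name `aggregate`, the flat-list representation of model parameters, and the example client sizes are all assumptions made for illustration. It shows how a client with far more data dominates the weighted average, while the unweighted average treats every client equally.

```python
from typing import List, Sequence


def aggregate(client_params: Sequence[Sequence[float]],
              client_sizes: Sequence[int],
              weighted: bool = True) -> List[float]:
    """Average per-client parameter vectors into one global vector.

    Hypothetical helper for illustration only.
    weighted=True  -> coefficients proportional to each client's sample count.
    weighted=False -> every client contributes equally.
    """
    if not client_params:
        raise ValueError("need at least one client")
    if len(client_params) != len(client_sizes):
        raise ValueError("one size per client is required")

    n_clients = len(client_params)
    if weighted:
        total = sum(client_sizes)
        coeffs = [size / total for size in client_sizes]
    else:
        coeffs = [1.0 / n_clients] * n_clients

    dim = len(client_params[0])
    return [
        sum(c * params[i] for c, params in zip(coeffs, client_params))
        for i in range(dim)
    ]


# Two hypothetical clients: a large one (900 samples) and a small one (100).
params = [[0.0, 0.0], [1.0, 1.0]]
sizes = [900, 100]

global_weighted = aggregate(params, sizes, weighted=True)
global_unweighted = aggregate(params, sizes, weighted=False)
print(global_weighted, global_unweighted)
```

In this toy case the weighted result stays close to the large client's parameters, whereas the unweighted result sits midway between the two clients. This is one plausible reading of why disproportionate client contributions could bias the global model, as the abstract suggests.
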
Files
Original bundle
- Name: AnidoAlonso_Adriana_2026_analysis_FL_non_ind.pdf
- Size: 3.01 MB
- Format: Adobe Portable Document Format

