Use this link to cite:
https://hdl.handle.net/2183/47887
Analysis of federated learning on non-independent and identically distributed sleep data
Bibliographic citation
A. Anido-Alonso and D. Alvarez-Estevez, "Analysis of federated learning on non-independent and identically distributed sleep data", Physiol. Meas., vol. 47, no. 3, p. 035006, Mar. 2026, doi: 10.1088/1361-6579/ae4a82
Abstract
Objective. We investigate the application of federated learning (FL) to heterogeneous, non-independent and identically distributed (non-IID) sleep data. We evaluate three algorithms, namely federated stochastic gradient descent, federated averaging, and federated proximal (FedProx), in a realistic setting where non-IID characteristics arise from distinct sensor configurations, varying acquisition protocols, and diverse patient populations across independent sleep cohort datasets.
Approach. We employ a dual-layered evaluation framework. First, we systematically analyze the impact of local training epochs and aggregation schemes (weighted and unweighted) on model convergence. Second, we introduce and adapt a generalized sub-sampling strategy designed to mitigate model drift caused by heterogeneous data distributions and volume imbalances across participating clients. To ensure robust external generalization, our evaluation uses six independent databases in a leave-one-database-out cross-validation scheme.
Main results. Our analysis shows that increasing the number of local training epochs adversely affects performance across all evaluated federated schemes. This confirms that extended local training exacerbates client drift, hindering global convergence. Furthermore, weighted aggregation consistently underperforms unweighted approaches, suggesting that disproportionate client contributions bias the global data representation. While the inclusion of a proximal term partially mitigates this instability by constraining local updates, the proposed sub-sampling strategy proves most effective. This approach yields consistent generalization results across all algorithms and minimizes performance degradation, while significantly reducing computational overhead.
Significance. This work addresses critical privacy concerns in centralized automated sleep staging by validating FL in realistic, multi-center scenarios. We provide evidence that decentralized strategies can achieve performance comparable to centralized methods, effectively overcoming data silos. Ultimately, this approach enables robust collaborative training while strictly maintaining data privacy, a fundamental requirement for widespread clinical implementation.
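The difference between weighted and unweighted aggregation, and the effect of a proximal term, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `aggregate` and `fedprox_gradient`, the parameter `mu`, and the toy client values are all hypothetical, and model parameters are represented as plain lists of floats for simplicity. The sketch assumes the standard formulations, in which weighted averaging scales each client by its share of the total sample count and the proximal term is (mu/2)·||w − w_global||².

```python
from typing import List, Sequence

Vector = List[float]


def aggregate(client_weights: Sequence[Vector],
              client_sizes: Sequence[int],
              weighted: bool = True) -> Vector:
    """Average client parameter vectors into one global vector.

    weighted=True  -> each client counts in proportion to its sample count.
    weighted=False -> every client counts equally.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    if weighted:
        total = float(sum(client_sizes))
        coeffs = [n / total for n in client_sizes]
    else:
        coeffs = [1.0 / len(client_weights)] * len(client_weights)
    dim = len(client_weights[0])
    return [sum(c * w[i] for c, w in zip(coeffs, client_weights))
            for i in range(dim)]


def fedprox_gradient(local_grad: Vector, local_w: Vector,
                     global_w: Vector, mu: float) -> Vector:
    """Local gradient plus the gradient of (mu/2) * ||w - w_global||^2."""
    return [g + mu * (w - wg)
            for g, w, wg in zip(local_grad, local_w, global_w)]


# Toy example (hypothetical values): one large client and two small ones.
clients = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
sizes = [800, 100, 100]
weighted_avg = aggregate(clients, sizes, weighted=True)
unweighted_avg = aggregate(clients, sizes, weighted=False)
```

In this toy case the weighted average is pulled almost entirely toward the largest client, whereas the unweighted average treats all three clients equally. The proximal term adds a pull of strength `mu` back toward the global parameters at every local step, which is how it constrains local updates.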
Rights
© 2026 The Author(s). Published on behalf of Institute of Physics and Engineering in Medicine by IOP Publishing Ltd
Attribution 4.0 International