Authors: Anido Alonso, Adriana; Álvarez-Estévez, Diego
Dates: 2026-04-07; 2026-04-07; 2026-03-09
Citation: A. Anido-Alonso and D. Alvarez-Estevez, «Analysis of federated learning on non-independent and identically distributed sleep data», Physiol. Meas., vol. 47, no. 3, p. 035006, Mar. 2026, doi: 10.1088/1361-6579/ae4a82
ISSN: 1361-6579
Handle: https://hdl.handle.net/2183/47887

[Abstract]:
Objective. We investigate the application of federated learning (FL) across heterogeneous, non-independent and identically distributed (non-IID) sleep data. We evaluate three algorithms, federated stochastic gradient descent, federated averaging, and federated proximal (FedProx), in a realistic setting where non-IID characteristics arise from distinct sensor configurations, varying acquisition protocols, and diverse patient populations across independent sleep cohort datasets.
Approach. We employ a dual-layered evaluation framework. First, we systematically analyze the impact of local training epochs and aggregation schemes (weighted and unweighted) on model convergence. Second, we introduce and adapt a generalized sub-sampling strategy designed to mitigate model drift caused by heterogeneous data distributions and volume imbalances across participating clients. To ensure robust external generalization, our evaluation uses six independent databases in a leave-one-database-out cross-validation scheme.
Main results. Our analysis shows that increasing the number of local training epochs adversely affects performance across all evaluated federated schemes, confirming that extended local training exacerbates client drift and hinders global convergence. Furthermore, weighted aggregation consistently underperforms unweighted approaches, suggesting that disproportionate client contributions bias the global data representation. While the inclusion of a proximal term partially mitigates this instability by constraining local updates, the proposed sub-sampling strategy proves most effective.
This approach yields consistent generalization results across all algorithms and minimizes performance degradation, while significantly reducing computational overhead.
Significance. This work addresses critical privacy concerns in centralized automated sleep staging by validating FL in realistic, multi-center scenarios. We provide evidence that decentralized strategies can achieve performance comparable to centralized methods, effectively overcoming data silos. Ultimately, this approach enables robust collaborative training while strictly maintaining data privacy, a fundamental requirement for widespread clinical implementation.

Language: English
Rights: © 2026 The Author(s). Published on behalf of the Institute of Physics and Engineering in Medicine by IOP Publishing Ltd
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Keywords: Deep learning; Data privacy; Federated learning; Non-independent and identically distributed data; Sleep staging
Title: Analysis of federated learning on non-independent and identically distributed sleep data
Type: journal article
Access: open access
DOI: 10.1088/1361-6579/ae4a82
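The abstract contrasts weighted vs. unweighted aggregation, the FedProx proximal term, and a volume-balancing sub-sampling strategy. A minimal, illustrative sketch of these three ideas in plain Python follows. It is not the paper's implementation: the function names, the proximal coefficient `mu`, and in particular the `balanced_subsample` heuristic (capping every client at the smallest client's size) are assumptions, since the paper's generalized sub-sampling strategy is not specified in the abstract.

```python
import random


def aggregate(client_weights, client_sizes=None):
    """Average client parameter vectors into a global model.

    With client_sizes, each client's contribution is proportional to
    its sample count (classic weighted FedAvg); without it, all
    clients contribute equally (the unweighted scheme the study finds
    more robust under non-IID data).
    """
    n_clients = len(client_weights)
    if client_sizes is None:
        coeffs = [1.0 / n_clients] * n_clients
    else:
        total = sum(client_sizes)
        coeffs = [s / total for s in client_sizes]
    dim = len(client_weights[0])
    return [sum(c * w[i] for c, w in zip(coeffs, client_weights))
            for i in range(dim)]


def fedprox_grad(task_grad, w_local, w_global, mu):
    """FedProx local gradient: the proximal penalty (mu/2)*||w - w_g||^2
    adds mu * (w - w_g) to the task gradient, pulling local updates
    back toward the current global model to limit client drift."""
    return [g + mu * (wl - wg)
            for g, wl, wg in zip(task_grad, w_local, w_global)]


def balanced_subsample(client_datasets, seed=0):
    """Hypothetical per-round sub-sampling: draw the same number of
    examples from every client (the smallest client's size), so volume
    imbalances cannot dominate the aggregated update."""
    rng = random.Random(seed)
    cap = min(len(d) for d in client_datasets)
    return [rng.sample(d, cap) for d in client_datasets]
```

For example, `aggregate([[1.0, 2.0], [3.0, 4.0]])` treats both clients equally, while passing `client_sizes=[1, 3]` shifts the global model toward the larger client, which is the bias the study attributes to weighted aggregation.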