Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation

UDC.coleccion: Research
UDC.departamento: Mathematics
UDC.grupoInv: Modelización, Optimización e Inferencia Estatística (MODES)
UDC.instituto: Centro CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.journalTitle: Computational Statistics
UDC.startPage: 29
UDC.volume: 41
dc.contributor.author: Francisco-Fernández, Mario
dc.contributor.author: Barreiro-Ures, Daniel
dc.contributor.author: Cao, Ricardo
dc.date.accessioned: 2026-01-14T10:28:34Z
dc.date.available: 2026-01-14T10:28:34Z
dc.date.issued: 2026-01-13
dc.description: Funded for open-access publication: Universidade da Coruña/CISUG. Link to the electronic supplementary material: Supplementary file 1 (PDF, 277 KB) https://static-content.springer.com/esm/art%3A10.1007%2Fs00180-025-01712-4/MediaObjects/180_2025_1712_MOESM1_ESM.pdf
dc.description.abstract: [Abstract]: Bandwidth selection is a central issue in kernel density estimation. For large datasets, classical selectors such as cross-validation and bootstrap become computationally intensive and may yield bandwidths with high variability. This paper proposes subagging-based versions of several popular selectors, including cross-validation, direct plug-in, and bootstrap methods. These selectors are constructed by computing bandwidths over multiple subsamples (drawn without replacement), rescaling them, and averaging the results. We also introduce a novel regression-based approach, Regression Subagging (RSB), which extrapolates the optimal bandwidth via a log-log regression, avoiding the need to assume a known convergence rate. We assess statistical accuracy in terms of the mean squared error of the selectors and the corresponding MISE of the resulting kernel estimators, using the optimal bandwidth as a benchmark. Computational efficiency is evaluated via parallel implementations using the parallel and foreach packages in R, reporting speedups as a function of the number of CPU cores. The results confirm that subagging improves or preserves statistical performance while yielding substantial runtime reductions, especially for demanding selectors like cross-validation and bootstrap. The RSB variant, in particular, stands out as a scalable, flexible, and robust solution. The core methods are implemented in the R package baggingbwsel, available on CRAN.
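The abstract describes the subagged selectors only in outline. The following Python sketch illustrates the two ideas under assumptions of our own, not taken from this record: a Gaussian kernel; least-squares cross-validation (LSCV) minimized over a grid as the base selector; the rescaling factor (m/n)^{1/5} implied by the usual n^{-1/5} rate of the optimal bandwidth for second-order kernels; and, for the regression variant, bandwidths averaged at each subsample size before an ordinary least-squares fit of log h on log m. The function names (lscv_bandwidth, subagged_bandwidth, regression_subagged_bandwidth) are hypothetical and are not the interface of the baggingbwsel package.

```python
import math
import random
import statistics


def lscv_bandwidth(x, n_grid=30):
    """Gaussian-kernel least-squares cross-validation bandwidth (grid search)."""
    n = len(x)
    # Squared pairwise differences over unordered pairs i < j, computed once.
    d2 = [(x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    sd = statistics.stdev(x)
    # Hypothetical search grid: 0.05 to 2 times the sample standard deviation.
    grid = [sd * (0.05 + 1.95 * k / (n_grid - 1)) for k in range(n_grid)]

    def score(h):
        # Closed form of LSCV(h) = int f_h^2 - (2/n) sum_i f_{h,-i}(X_i) for K = N(0,1).
        s_a = sum(math.exp(-d / (4.0 * h * h)) for d in d2)  # phi_{sqrt(2) h} terms
        s_b = sum(math.exp(-d / (2.0 * h * h)) for d in d2)  # phi_h terms
        return (1.0 / (2.0 * n * h * math.sqrt(math.pi))
                + s_a / (n * n * h * math.sqrt(math.pi))
                - 4.0 * s_b / (n * (n - 1) * h * math.sqrt(2.0 * math.pi)))

    return min(grid, key=score)


def subagged_bandwidth(x, m, n_sub, selector, rng):
    """Average selector(subsample) over n_sub subsamples of size m, rescaled to n."""
    n = len(x)
    # Assumption: the optimal bandwidth decays like n^{-1/5} (second-order kernel).
    rescale = (m / n) ** 0.2
    bws = [selector(rng.sample(x, m)) for _ in range(n_sub)]  # without replacement
    return rescale * statistics.fmean(bws)


def regression_subagged_bandwidth(x, sizes, n_sub, selector, rng):
    """Fit log h_m = a + b log m across subsample sizes, then extrapolate to m = n."""
    log_m, log_h = [], []
    for m in sizes:
        bws = [selector(rng.sample(x, m)) for _ in range(n_sub)]
        log_m.append(math.log(m))
        log_h.append(math.log(statistics.fmean(bws)))
    # Ordinary least squares for slope b and intercept a; no rate is assumed.
    mx, my = statistics.fmean(log_m), statistics.fmean(log_h)
    slope = (sum((u - mx) * (v - my) for u, v in zip(log_m, log_h))
             / sum((u - mx) ** 2 for u in log_m))
    intercept = my - slope * mx
    return math.exp(intercept + slope * math.log(len(x))), slope


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    h_sub = subagged_bandwidth(data, 40, 8, lscv_bandwidth, rng)
    h_rsb, b = regression_subagged_bandwidth(data, [25, 35, 50], 8, lscv_bandwidth, rng)
    print(f"subagged LSCV: {h_sub:.3f}  RSB: {h_rsb:.3f}  fitted slope: {b:.3f}")
```

The subsample loops are independent of one another, which is what makes a parallel implementation like the one described in the abstract possible; this sketch runs them sequentially. As a sanity check, a selector that returns exactly m^{-1/5} makes both functions return n^{-1/5}, with a fitted log-log slope of -1/5.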
dc.description.sponsorship: Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: Universidade da Coruña/CISUG. This research was supported by the grants PID2020-113578RB-I00 and PID2023-147127OB-I00 “ERDF/EU”, funded by MCIN/AEI/10.13039/501100011033. It was also supported by the Xunta de Galicia through the program “Grupos de Referencia Competitiva” (grant ED431C-2024/14) and by CITIC, a center accredited for excellence within the Galician University System and a member of the CIGUS Network. CITIC receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia and is co-financed by the European Union through the FEDER Galicia 2021–27 operational program (Ref. ED431G 2023/01).
dc.description.sponsorship: Xunta de Galicia; ED431C-2024/14
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.identifier.citation: Francisco-Fernández, M., Barreiro-Ures, D. & Cao, R. Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation. Comput Stat 41, 29 (2026). https://doi.org/10.1007/s00180-025-01712-4
dc.identifier.doi: 10.1007/s00180-025-01712-4
dc.identifier.issn: 1613-9658
dc.identifier.uri: https://hdl.handle.net/2183/46845
dc.language.iso: eng
dc.publisher: Springer
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES/
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147127OB-I00/ES/INFERENCIA ESTADISTICA UTILIZANDO METODOS FLEXIBLES PARA DATOS COMPLEJOS: TEORIA Y APPLICACIONES
dc.relation.uri: https://doi.org/10.1007/s00180-025-01712-4
dc.rights: Attribution 4.0 International (en)
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Cross-validation
dc.subject: Bootstrap methods
dc.subject: Computational complexity
dc.subject: Subsampling
dc.subject: Regression extrapolation
dc.title: Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 9724fb7a-c0db-4b2f-aa1a-7f79bf9c2064
relation.isAuthorOfPublication: 5e21e4cc-372f-4718-8f5d-0024ba87a995
relation.isAuthorOfPublication: 3360aaca-39be-43b4-a458-974e79cdbc6b
relation.isAuthorOfPublication.latestForDiscovery: 9724fb7a-c0db-4b2f-aa1a-7f79bf9c2064

Files

Original bundle

Name: FranciscoFernandez_Mario_2026_Subagging_for_bandwidth_selection.pdf
Size: 2.85 MB
Format: Adobe Portable Document Format