Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation
| UDC.coleccion | Investigación | |
| UDC.departamento | Matemáticas | |
| UDC.grupoInv | Modelización, Optimización e Inferencia Estatística (MODES) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.journalTitle | Computational Statistics | |
| UDC.startPage | 29 | |
| UDC.volume | 41 | |
| dc.contributor.author | Francisco-Fernández, Mario | |
| dc.contributor.author | Barreiro-Ures, Daniel | |
| dc.contributor.author | Cao, Ricardo | |
| dc.date.accessioned | 2026-01-14T10:28:34Z | |
| dc.date.available | 2026-01-14T10:28:34Z | |
| dc.date.issued | 2026-01-13 | |
| dc.description | Funded for open access publication: Universidade da Coruña/CISUG. Link to the electronic supplementary material: Supplementary file 1 (PDF, 277 KB): https://static-content.springer.com/esm/art%3A10.1007%2Fs00180-025-01712-4/MediaObjects/180_2025_1712_MOESM1_ESM.pdf | |
| dc.description.abstract | [Abstract]: Bandwidth selection is a central issue in kernel density estimation. For large datasets, classical selectors such as cross-validation and bootstrap become computationally intensive and may yield bandwidths with high variability. This paper proposes subagging-based versions of several popular selectors, including cross-validation, direct plug-in, and bootstrap methods. These selectors are constructed by computing bandwidths over multiple subsamples (without replacement), rescaling them, and averaging the results. We also introduce a novel regression-based approach, Regression Subagging (RSB), which extrapolates the optimal bandwidth via a log-log regression, avoiding the need to assume a known convergence rate. We assess statistical accuracy in terms of the mean squared error of the selectors and the corresponding MISE of the resulting kernel estimators, using the optimal bandwidth as a benchmark. Computational efficiency is evaluated via parallel implementations using the parallel and foreach packages in R, reporting speedups as a function of the number of CPU cores. The results confirm that subagging improves or preserves statistical performance while yielding substantial runtime reductions, especially for demanding selectors like cross-validation and bootstrap. The RSB variant, in particular, stands out as a scalable, flexible, and robust solution. The core methods are implemented in the R package baggingbwsel, available on CRAN. | |
| dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: Universidade da Coruña/CISUG. This research was supported by the grants PID2020-113578RB-I00 and PID2023-147127OB-I00 “ERDF/EU”, funded by MCIN/AEI/10.13039/501100011033. It was also supported by the Xunta de Galicia through the program “Grupos de Referencia Competitiva” (grant ED431C-2024/14) and by CITIC, a center accredited for excellence within the Galician University System and a member of the CIGUS Network. CITIC receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia and is co-financed by the European Union through the FEDER Galicia 2021–27 operational program (Ref. ED431G 2023/01). | |
| dc.description.sponsorship | Xunta de Galicia; ED431C-2024/14 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/01 | |
| dc.identifier.citation | Francisco-Fernández, M., Barreiro-Ures, D. & Cao, R. Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation. Comput Stat 41, 29 (2026). https://doi.org/10.1007/s00180-025-01712-4 | |
| dc.identifier.doi | 10.1007/s00180-025-01712-4 | |
| dc.identifier.issn | 1613-9658 | |
| dc.identifier.uri | https://hdl.handle.net/2183/46845 | |
| dc.language.iso | eng | |
| dc.publisher | Springer | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES/ | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147127OB-I00/ES/INFERENCIA ESTADISTICA UTILIZANDO METODOS FLEXIBLES PARA DATOS COMPLEJOS: TEORIA Y APPLICACIONES | |
| dc.relation.uri | https://doi.org/10.1007/s00180-025-01712-4 | |
| dc.rights | Attribution 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Cross-validation | |
| dc.subject | Bootstrap methods | |
| dc.subject | Computational complexity | |
| dc.subject | Subsampling | |
| dc.subject | Regression extrapolation | |
| dc.title | Subagging for bandwidth selection: a computationally efficient approach to kernel density estimation | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 9724fb7a-c0db-4b2f-aa1a-7f79bf9c2064 | |
| relation.isAuthorOfPublication | 5e21e4cc-372f-4718-8f5d-0024ba87a995 | |
| relation.isAuthorOfPublication | 3360aaca-39be-43b4-a458-974e79cdbc6b | |
| relation.isAuthorOfPublication.latestForDiscovery | 9724fb7a-c0db-4b2f-aa1a-7f79bf9c2064 |
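
The abstract describes the subagged selectors as computing bandwidths over subsamples drawn without replacement, rescaling them, and averaging, and describes the regression variant as extrapolating the bandwidth through a log-log regression. A minimal Python sketch of both ideas follows. It is an illustration under stated assumptions, not the API of the baggingbwsel package: all function names are hypothetical, Silverman's normal-reference rule stands in for the selectors studied in the paper (cross-validation, direct plug-in, bootstrap), and the rescaling assumes the standard n^(-1/5) rate for a second-order kernel.

```python
import math
import random
import statistics


def rule_of_thumb(sample):
    """Stand-in selector (assumption): Silverman's normal-reference bandwidth."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)


def subagged_bandwidth(data, r, n_subsamples, selector=rule_of_thumb,
                       rate=1 / 5, rng=None):
    """Average of subsample bandwidths, each rescaled from size r to size n.

    Assumes the optimal bandwidth is proportional to n ** (-rate).
    """
    rng = rng or random.Random(0)
    n = len(data)
    scale = (r / n) ** rate  # maps a size-r bandwidth to the size-n scale
    # random.Random.sample draws without replacement.
    hs = [selector(rng.sample(data, r)) * scale for _ in range(n_subsamples)]
    return statistics.fmean(hs)


def regression_extrapolated_bandwidth(data, sizes, n_subsamples,
                                      selector=rule_of_thumb, rng=None):
    """Fit log(h) = a + b * log(r) over several subsample sizes, predict at n.

    The slope b is estimated from the subsamples, so no rate is assumed.
    """
    rng = rng or random.Random(0)
    xs, ys = [], []
    for r in sizes:
        hs = [selector(rng.sample(data, r)) for _ in range(n_subsamples)]
        xs.append(math.log(r))
        ys.append(math.log(statistics.fmean(hs)))
    # Ordinary least squares for a straight line, written out explicitly.
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept + slope * math.log(len(data)))
```

With `data` as a list of floats, `subagged_bandwidth(data, r=200, n_subsamples=50)` returns one bandwidth on the full-sample scale, and `regression_extrapolated_bandwidth(data, sizes=[100, 200, 400], n_subsamples=50)` returns the extrapolated one. The loop over subsamples is independent across iterations, which is what makes this kind of selector amenable to the parallel evaluation the abstract mentions.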
Files
Original bundle
- Name: FranciscoFernandez_Mario_2026_Subagging_for_bandwidth_selection.pdf
- Size: 2.85 MB
- Format: Adobe Portable Document Format

