Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples

UDC.coleccionInvestigación
UDC.departamentoMatemáticas
UDC.endPage18
UDC.grupoInvModelización, Optimización e Inferencia Estatística (MODES)
UDC.institutoCentroCITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.issue108257
UDC.journalTitleComputational Statistics & Data Analysis
UDC.startPage1
UDC.volume213
dc.contributor.authorBarreiro-Ures, Daniel
dc.contributor.authorCao, Ricardo
dc.contributor.authorFrancisco-Fernández, Mario
dc.contributor.authorFernández-Casal, Rubén
dc.date.accessioned2025-08-13T11:43:53Z
dc.date.available2025-08-13T11:43:53Z
dc.date.issued2026-01
dc.description.abstract[Abstract]: Cross-validation is a well-known and widely used bandwidth selection method in nonparametric regression estimation. However, this technique has two notable drawbacks: the large variability of the selected bandwidths, and the inability to provide results in a reasonable time for very large sample sizes. To address these issues, bagged cross-validation bandwidth selectors are investigated. This approach consists of computing the cross-validation bandwidths for a finite number of subsamples and then rescaling the averaged smoothing parameters to the original sample size. Under a random-design regression model, asymptotic expressions up to second order for the bias and variance of the leave-one-out cross-validation bandwidth for the Nadaraya–Watson estimator are obtained. Subsequently, the asymptotic bias and variance and the limiting distribution of the bagged cross-validation selector are derived. Suitable choices of the number of subsamples and the subsample size lead to a convergence rate proportional to the inverse square root of the sample size for the bagged cross-validation selector, outperforming the slower rate typically associated with leave-one-out cross-validation. Several simulations and an illustration on a real dataset related to the COVID-19 pandemic show the behavior of the proposal and its better performance, in terms of statistical efficiency and computing time, when compared to leave-one-out cross-validation.
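The procedure summarized in the abstract (cross-validate on subsamples, average, rescale) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian kernel, a simple grid search, and a rescaling factor of (r/n)^(1/5), which follows from the standard n^(-1/5) order of the optimal Nadaraya–Watson bandwidth for a second-order kernel and is not stated in the abstract itself. All function and parameter names (`loo_cv_score`, `cv_bandwidth`, `bagged_cv_bandwidth`, `r`, `n_sub`) are hypothetical.

```python
import numpy as np


def loo_cv_score(h, x, y):
    """Leave-one-out CV criterion for the Nadaraya-Watson estimator (Gaussian kernel)."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(w, 0.0)  # observation i does not contribute to its own fit
    denom = w.sum(axis=1)
    if np.any(denom <= 0.0):  # bandwidth too small: some point has no neighbours
        return np.inf
    fitted = (w @ y) / denom
    return float(np.mean((y - fitted) ** 2))


def cv_bandwidth(x, y, grid):
    """Bandwidth in `grid` that minimizes the leave-one-out CV criterion."""
    scores = [loo_cv_score(h, x, y) for h in grid]
    return float(grid[int(np.argmin(scores))])


def bagged_cv_bandwidth(x, y, r, n_sub, grid, rng):
    """Average the CV bandwidths of n_sub subsamples of size r, then rescale to size n.

    The factor (r / n) ** 0.2 is an assumption based on the n^(-1/5) bandwidth order.
    """
    n = len(x)
    bandwidths = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=r, replace=False)  # subsample without replacement
        bandwidths.append(cv_bandwidth(x[idx], y[idx], grid))
    return float(np.mean(bandwidths) * (r / n) ** 0.2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=n)
    grid = np.geomspace(0.005, 0.5, 40)
    print(bagged_cv_bandwidth(x, y, r=200, n_sub=10, grid=grid, rng=rng))
```

Each subsample costs on the order of r² kernel evaluations per candidate bandwidth, compared with n² for full-sample cross-validation, which is where the computational savings for large n come from in this sketch.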
dc.description.sponsorshipThis work is part of the grants PID2020-113578RB-I00 and PID2023-147127OB-I00 “ERDF/EU”, funded by MCIN/AEI/10.13039/501100011033/. It has also been supported by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2024/14) and by CITIC, which, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01). Funding for open access charge: Universidade da Coruña/CISUG.
dc.description.sponsorshipFunded for open access publication: Universidade da Coruña/CISUG
dc.description.sponsorshipXunta de Galicia; ED431C-2024/14
dc.description.sponsorshipXunta de Galicia; ED431G 2023/01
dc.identifier.citationBarreiro-Ures, D., Cao, R., Francisco-Fernández, M., & Fernández-Casal, R. (2025). Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples. Computational Statistics & Data Analysis, 108257.
dc.identifier.doi10.1016/j.csda.2025.108257
dc.identifier.issn0167-9473
dc.identifier.issn1872-7352
dc.identifier.urihttps://hdl.handle.net/2183/45604
dc.language.isoeng
dc.publisherElsevier
dc.relation.projectIDinfo:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES/
dc.relation.projectIDinfo:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147127OB-I00/ES/INFERENCIA ESTADISTICA UTILIZANDO METODOS FLEXIBLES PARA DATOS COMPLEJOS: TEORIA Y APPLICACIONES
dc.relation.urihttps://doi.org/10.1016/j.csda.2025.108257
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subjectBagging
dc.subjectBandwidth selection
dc.subjectCross-validation
dc.subjectKernel smoothing
dc.subjectNadaraya–Watson
dc.subjectSubsampling
dc.titleBagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples
dc.typejournal article
dc.type.hasVersionVoR
dspace.entity.typePublication
relation.isAuthorOfPublication5e21e4cc-372f-4718-8f5d-0024ba87a995
relation.isAuthorOfPublication3360aaca-39be-43b4-a458-974e79cdbc6b
relation.isAuthorOfPublication9724fb7a-c0db-4b2f-aa1a-7f79bf9c2064
relation.isAuthorOfPublication96b3567f-5599-4789-bdfe-e621516d18ef
relation.isAuthorOfPublication.latestForDiscovery5e21e4cc-372f-4718-8f5d-0024ba87a995

Files

Original bundle

Name: Barreiro_Ures_Daniel_2026_Bagging_cross_validated_bandwidth_selection.pdf
Size: 4.25 MB
Format: Adobe Portable Document Format