Improving the prediction accuracy of statistical models: A new hierarchical clustering approach

UDC.coleccion: Investigación
UDC.departamento: Matemáticas
UDC.grupoInv: Modelización, Optimización e Inferencia Estatística (MODES)
UDC.journalTitle: Statistics and Computing
UDC.startPage: 168
UDC.volume: 35
dc.contributor.author: López-Oriona, Ángel
dc.contributor.author: Sun, Ying
dc.contributor.author: Vilar, José
dc.date.accessioned: 2025-09-09T09:02:31Z
dc.date.available: 2025-09-09T09:02:31Z
dc.date.issued: 2025-08-11
dc.description: Open access publishing provided by King Abdullah University of Science and Technology (KAUST).
dc.description.abstract: [Abstract] Statisticians and machine learning practitioners frequently encounter datasets originating from multiple populations but containing the same type of measurements. In such cases, predictive analytics is typically carried out either by fitting a separate model to each dataset independently or by merging the datasets and fitting a single model to the combined data. These approaches overlook the potential existence of multiple groups of datasets associated with different underlying models and therefore fail to exploit the inherent similarity between datasets to improve predictions. A third alternative is to perform pairwise comparisons between the populations before fitting the models. However, this is not always feasible, can become a very challenging task with complex models, and often does not rely on predictive accuracy. To address these issues, we propose a clustering approach designed to improve predictions in general databases. The method is based on a novel type of objective function that represents the total by-group prediction error. The clustering problem is solved using an agglomerative hierarchical algorithm that obtains the clustering partition automatically, in a fully data-driven manner. An additional advantage of this procedure is that the number of clusters is treated as a variable in the minimization problem, allowing it to be determined naturally in a way that optimizes the predictive accuracy of the underlying models. Furthermore, the technique is versatile and can be used with any type of model for both regression and classification tasks. Several simulation experiments and two real-world applications involving housing prices demonstrate that the procedure outperforms benchmark approaches in terms of predictive accuracy.
dc.description.sponsorship: Ángel López-Oriona and Ying Sun thank King Abdullah University of Science and Technology (KAUST) for its support. The research by José A. Vilar is supported by the grants PID2020-113578RB-I00 and PID2023-147127OB-I00 "ERDF/EU", funded by MCIN/AEI/10.13039/501100011033/. It has also been supported by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2024/14) and by CITIC as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receiving subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01).
dc.description.sponsorship: Xunta de Galicia; ED431C-2024/14
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.identifier.citation: López-Oriona, Á., Sun, Y. & Vilar, J.A. Improving the prediction accuracy of statistical models: A new hierarchical clustering approach. Stat Comput 35, 168 (2025). https://doi.org/10.1007/s11222-025-10683-x
dc.identifier.issn: 1573-1375
dc.identifier.issn: 0960-3174
dc.identifier.uri: https://hdl.handle.net/2183/45733
dc.language.iso: eng
dc.publisher: Springer Nature
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES/
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147127OB-I00/ES/INFERENCIA ESTADISTICA UTILIZANDO METODOS FLEXIBLES PARA DATOS COMPLEJOS: TEORIA Y APPLICACIONES
dc.relation.uri: https://doi.org/10.1007/s11222-025-10683-x
dc.rights: © The Author(s) 2025
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Categorization
dc.subject: Data Mining
dc.subject: Functional clustering
dc.subject: Machine Learning
dc.subject: Predictive medicine
dc.subject: Statistical Learning
dc.title: Improving the prediction accuracy of statistical models: A new hierarchical clustering approach
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: c9381eef-6e06-41b8-a15c-a194bdff8d03
relation.isAuthorOfPublication.latestForDiscovery: c9381eef-6e06-41b8-a15c-a194bdff8d03

Files

Original bundle

Name: Vilar_Jose_2025_Improv_prediction_accuracy_stat_model.pdf
Size: 608.99 KB
Format: Adobe Portable Document Format