Improving the prediction accuracy of statistical models: A new hierarchical clustering approach

López-Oriona, Ángel; Sun, Ying; Vilar, José


UDC.coleccion	Investigación
UDC.departamento	Matemáticas
UDC.grupoInv	Modelización, Optimización e Inferencia Estatística (MODES)
UDC.journalTitle	Statistics and Computing
UDC.startPage	168
UDC.volume	35
dc.contributor.author	López-Oriona, Ángel
dc.contributor.author	Sun, Ying
dc.contributor.author	Vilar, José
dc.date.accessioned	2025-09-09T09:02:31Z
dc.date.available	2025-09-09T09:02:31Z
dc.date.issued	2025-08-11
dc.description	Open access publishing provided by King Abdullah University of Science and Technology (KAUST).
dc.description.abstract	[Abstract]: Statisticians and machine learning practitioners frequently encounter datasets originating from multiple populations but containing the same type of measurements. In such cases, predictive analytics is typically carried out either by fitting a separate model to each dataset independently or by merging the datasets and fitting a single model to the combined data. These approaches overlook the potential existence of multiple groups of datasets associated with different underlying models and, therefore, fail to exploit the inherent similarity between datasets to improve predictions. A third alternative is to perform pairwise comparisons between the populations before fitting the models. However, this is not always feasible, can become very challenging with complex models, and often does not rely on predictive accuracy. To address these issues, we propose a clustering approach designed to improve predictions in general databases. The method is based on a novel type of objective function that represents the total by-group prediction error. The clustering problem is solved using an agglomerative hierarchical-type algorithm that obtains the resulting clustering partition automatically, in a fully data-driven manner. An additional advantage of this procedure is that the number of clusters is treated as a variable in the minimization problem, so it is determined naturally in a way that optimizes the predictive accuracy of the underlying models. Furthermore, the technique is versatile and can be used with any type of model for both regression and classification tasks. Several simulation experiments and two real-world applications involving housing prices demonstrate that the procedure outperforms benchmark approaches in terms of predictive accuracy.
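The abstract describes the procedure only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each dataset comes already split into a hypothetical training part and hold-out part, that each cluster is modelled by ordinary least squares fitted on its pooled training data, and that the objective is the total hold-out squared error summed over clusters. Starting from one cluster per dataset, it greedily merges the pair of clusters that gives the lowest objective and keeps whichever level of the hierarchy scores best. All function names (`fit_ols`, `group_error`, `agglomerative_prediction_clustering`, and so on) are illustrative.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column (assumed model class)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]


def group_error(group, train, test):
    """Hold-out squared error of ONE model fitted to the pooled datasets in `group`.

    `train[k]` and `test[k]` are (X, y) pairs for dataset k (hypothetical layout).
    """
    X_tr = np.vstack([train[k][0] for k in group])
    y_tr = np.concatenate([train[k][1] for k in group])
    coef = fit_ols(X_tr, y_tr)
    return sum(
        float(np.sum((test[k][1] - predict_ols(coef, test[k][0])) ** 2))
        for k in group
    )


def total_error(partition, train, test):
    """Assumed objective: total by-group prediction error of a partition."""
    return sum(group_error(g, train, test) for g in partition)


def agglomerative_prediction_clustering(train, test):
    """Greedy agglomerative search; the number of clusters is chosen by the objective."""
    partition = [(k,) for k in range(len(train))]  # start: one cluster per dataset
    best_partition = list(partition)
    best_error = total_error(partition, train, test)
    while len(partition) > 1:
        candidates = []
        # Try every pairwise merge of the current clusters.
        for i, j in itertools.combinations(range(len(partition)), 2):
            merged = [g for m, g in enumerate(partition) if m not in (i, j)]
            merged.append(tuple(sorted(partition[i] + partition[j])))
            candidates.append((total_error(merged, train, test), merged))
        # Move one level up the hierarchy with the cheapest merge.
        err, partition = min(candidates, key=lambda c: c[0])
        # Remember the best level seen so far (this fixes the number of clusters).
        if err < best_error:
            best_error, best_partition = err, partition
    return sorted(best_partition), best_error
```

Because every level of the hierarchy is scored by the same objective, the number of clusters is simply the size of the best-scoring partition rather than a value fixed in advance. Replacing `fit_ols`/`predict_ols` with another learner, or squared error with a classification loss, leaves the merging loop unchanged. This naive version refits every cluster at each candidate merge, which is fine for a handful of datasets but would need caching of per-group errors at larger scale.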
dc.description.sponsorship	Ángel López-Oriona and Ying Sun thank King Abdullah University of Science and Technology (KAUST) for its support. The research by José A. Vilar is supported by the grants PID2020-113578RB-I00 and PID2023-147127OB-I00 "ERDF/EU", funded by MCIN/AEI/10.13039/501100011033/. It has also been supported by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2024/14) and by CITIC as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receiving subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01).
dc.description.sponsorship	Xunta de Galicia; ED431C-2024/14
dc.description.sponsorship	Xunta de Galicia; ED431G 2023/01
dc.identifier.citation	López-Oriona, Á., Sun, Y. & Vilar, J.A. Improving the prediction accuracy of statistical models: A new hierarchical clustering approach. Stat Comput 35, 168 (2025). https://doi.org/10.1007/s11222-025-10683-x
dc.identifier.issn	1573-1375
dc.identifier.issn	0960-3174
dc.identifier.uri	https://hdl.handle.net/2183/45733
dc.language.iso	eng
dc.publisher	Springer Nature
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES/
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2023-147127OB-I00/ES/INFERENCIA ESTADISTICA UTILIZANDO METODOS FLEXIBLES PARA DATOS COMPLEJOS: TEORIA Y APLICACIONES
dc.relation.uri	https://doi.org/10.1007/s11222-025-10683-x
dc.rights	© The Author(s) 2025
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Categorization
dc.subject	Data Mining
dc.subject	Functional clustering
dc.subject	Machine Learning
dc.subject	Predictive medicine
dc.subject	Statistical Learning
dc.title	Improving the prediction accuracy of statistical models: A new hierarchical clustering approach
dc.type	journal article
dc.type.hasVersion	VoR
dspace.entity.type	Publication
relation.isAuthorOfPublication	c9381eef-6e06-41b8-a15c-a194bdff8d03
relation.isAuthorOfPublication.latestForDiscovery	c9381eef-6e06-41b8-a15c-a194bdff8d03

Files

Original bundle


Name: Vilar_Jose_2025_Improv_prediction_accuracy_stat_model.pdf
Size: 608.99 KB
Format: Adobe Portable Document Format


Collections

Investigación (FIC)