Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming

UDC.coleccion: Investigación [es_ES]
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información [es_ES]
UDC.endPage: 271 [es_ES]
UDC.grupoInv: Redes de Neuronas Artificiais e Sistemas Adaptativos - Informática Médica e Diagnóstico Radiolóxico (RNASA - IMEDIR) [es_ES]
UDC.grupoInv: RNASA - IMEDIR (INIBIC) [es_ES]
UDC.institutoCentro: INIBIC - Instituto de Investigacións Biomédicas de A Coruña [es_ES]
UDC.issue: 2 [es_ES]
UDC.journalTitle: Journal of Experimental & Theoretical Artificial Intelligence [es_ES]
UDC.startPage: 243 [es_ES]
UDC.volume: 32 [es_ES]
dc.contributor.author: Rivero, Daniel
dc.contributor.author: Fernández-Blanco, Enrique
dc.contributor.author: Fernández-Lozano, Carlos
dc.contributor.author: Pazos, A.
dc.date.accessioned: 2020-09-16T09:39:48Z
dc.date.available: 2020-09-16T09:39:48Z
dc.date.issued: 2019-07-31
dc.description.abstract: [Abstract] Genetic Programming (GP) is a technique that can solve different problems through the evolution of mathematical expressions. However, its tendency to overfit the data is one of the main obstacles to its application. The use of a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. However, one key point differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. Therefore, the validation dataset can be used in several ways, because any of those evolved models could be evaluated on it. This work explores the possibility of using the validation dataset not only on the training-best individual but also on a subset containing the training-best individuals of the population. The study has been conducted with five well-known databases covering regression or classification tasks. In most cases, the results point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual, which also reduces the number of nodes and, consequently, the complexity of the expressions.
dc.description.sponsorship: Xunta de Galicia; ED431G/01 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431D 2017/16 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431C 2018/49 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431D 2017/23 [es_ES]
dc.description.sponsorship: Instituto de Salud Carlos III; PI17/01826 [es_ES]
dc.identifier.citation: Rivero D, Fernandez-Blanco E, Fernandez-Lozano C, Pazos A. Population subset selection for the use of a validation dataset for overfitting control in genetic programming. J Exp Theor Artif Intell. 2020; 32(2):243-271 [es_ES]
dc.identifier.issn: 0952-813X
dc.identifier.uri: http://hdl.handle.net/2183/26190
dc.language.iso: eng [es_ES]
dc.publisher: Taylor & Francis Group [es_ES]
dc.relation.uri: https://doi.org/10.1080/0952813X.2019.1647562 [es_ES]
dc.rights: This is an accepted manuscript of an article published by Taylor & Francis in "Journal of Experimental & Theoretical Artificial Intelligence", available at Taylor & Francis Online [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.subject: Genetic programming [es_ES]
dc.subject: Overfitting [es_ES]
dc.subject: Validation [es_ES]
dc.subject: Evolutionary computation [es_ES]
dc.title: Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming [es_ES]
dc.type: journal article [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: d8e10433-ea19-4a35-8cc6-0c7b9f143a6d
relation.isAuthorOfPublication: 244a6828-de1c-45f3-86b6-69bb81250814
relation.isAuthorOfPublication: e5ddd06a-3e7f-4bf4-9f37-5f1cf3d3430a
relation.isAuthorOfPublication: fa192a4c-bffd-4b23-87ae-e68c29350cdc
relation.isAuthorOfPublication.latestForDiscovery: d8e10433-ea19-4a35-8cc6-0c7b9f143a6d
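
The idea summarised in the abstract, evaluating the validation dataset on a subset of the training-best individuals rather than on the single training-best one, can be illustrated with a minimal Python sketch. This is only one plausible reading of the abstract, not the authors' implementation: the names `mse`, `select_by_validation` and `subset_size`, the use of mean squared error, and the toy models and data are all assumptions made for illustration, and the record does not say how or when the article applies the validation step during evolution.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]        # (input, target) pair
Model = Callable[[float], float]    # stand-in for an evolved GP expression


def mse(model: Model, data: Sequence[Sample]) -> float:
    """Mean squared error of a model on a dataset (assumed error measure)."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_by_validation(
    population: List[Model],
    train: Sequence[Sample],
    valid: Sequence[Sample],
    subset_size: int,
) -> Model:
    """Return the model with the lowest validation error among the
    `subset_size` individuals with the lowest training error.

    subset_size=1 reproduces the baseline of validating only the
    training-best individual.
    """
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    ranked = sorted(population, key=lambda m: mse(m, train))
    candidates = ranked[:subset_size]
    return min(candidates, key=lambda m: mse(m, valid))


# Hypothetical hand-written "individuals" (not evolved, for illustration only).
def identity(x: float) -> float:
    return x


def overfit(x: float) -> float:
    # Passes (almost) exactly through the noisy training points below.
    return -0.3 * x * x + 1.5 * x


def constant(x: float) -> float:
    return 1.0


train = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.8)]   # noisy samples of y = x
valid = [(3.0, 3.0), (4.0, 4.0)]
population = [constant, overfit, identity]

baseline = select_by_validation(population, train, valid, subset_size=1)
subset_pick = select_by_validation(population, train, valid, subset_size=2)
print(baseline.__name__, subset_pick.__name__)
```

In this toy setting the baseline returns the model that fits the training noise, while a subset of two lets the validation data prefer the simpler model that generalises.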

Files

Original bundle

Name: Pazos_2019_Population_subset_selection.pdf
Size: 843.23 KB
Format: Adobe Portable Document Format