Use this link to cite:
https://hdl.handle.net/2183/46474 Random Forest: descrición, funcionamento e aplicación práctica. Estimación do coñecemento financeiro obxectivo mediante datos do Flash Eurobarometer 525 (2023).
Authors
Rodríguez Sanjuán, Martín
Abstract
[Summary]: This work aims to describe and apply the Random Forest algorithm in the context of machine learning. Its operation and practical usefulness are illustrated through a real classification case centred on the level of financial literacy of European citizens.
The first part introduces the theoretical framework of machine learning, addressing its main types, applications, and methodological foundations. The second part then explores the Random Forest algorithm, highlighting its advantages over other supervised learning methods, such as its robustness against overfitting and its ability to handle categorical variables without prior transformation.
Finally, the third section presents its application. It uses a dataset from the Flash Eurobarometer 525 (2023), which gathers information on the financial knowledge, attitudes, and behaviours of a representative sample of the adult population of the European Union. The response variable is the objective level of financial literacy (objective knowledge), classified into three categories (low, medium, and high), and the explanatory variables include sociodemographic factors, perceptions, attitudes, and behaviours.
The model was fitted using cross-validation and hyperparameter optimization, reaching an accuracy of 53.8%. The results show that country of residence, income level, and occupation are the most relevant predictors. In addition, per-level probability plots were used to visualize the marginal influence of certain variables on the model's predictions.
This work demonstrates the potential of Random Forest models as a tool for analysis and classification in the economic field, and exemplifies their practical application to real problems of public interest such as financial literacy.
[Abstract]: The main objective of this paper is to describe and apply the Random Forest algorithm in the context of machine learning. Its functioning and practical utility are illustrated through a real-world classification case focused on the level of financial literacy among European citizens. The first part introduces the theoretical framework of machine learning, covering its main types, applications, and methodological foundations. The second part explores the Random Forest algorithm, highlighting its advantages over other supervised learning methods, such as its robustness against overfitting and its ability to handle categorical variables without prior transformation. Finally, the third, applied section presents the implementation of the model using data from the Flash Eurobarometer 525 (2023), which gathers information on the financial knowledge, attitudes, and behaviours of a representative sample of the adult population in the European Union. The response variable is the objective level of financial literacy, classified into three categories (low, medium, and high), and the explanatory variables include sociodemographic factors, personal perceptions, attitudes, and behaviours. The model was fitted using cross-validation and hyperparameter tuning, achieving an accuracy rate of 53.8%. The results show that country of residence, income level, and current occupation are the most relevant predictors. Additionally, class probability plots were used to visualize the marginal influence of selected variables on the model’s predictions. This work demonstrates the potential of Random Forest models as a tool for analysis and classification in the economic field and exemplifies their practical application to real-world public interest issues such as financial literacy.
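The workflow summarized in the abstract (a Random Forest classifier over categorical predictors, tuned by cross-validation) can be illustrated with a minimal standard-library sketch. This is not the thesis's implementation or data: the feature names, category values, toy labelling rule, and all function names below are hypothetical, and the trees use simple multiway splits on category values as one possible way of handling categorical variables without prior encoding.

```python
import random
from collections import Counter
from itertools import product

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, mtry, depth, rng):
    """Grow one tree with multiway splits on categorical features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: predict the majority class
    best = None
    # Only a random subset of mtry features is considered at each node.
    for f in rng.sample(range(len(rows[0])), mtry):
        groups = {}
        for i, row in enumerate(rows):
            groups.setdefault(row[f], []).append(i)
        if len(groups) < 2:
            continue  # feature is constant here, nothing to split on
        score = sum(len(g) / len(rows) * gini([labels[i] for i in g])
                    for g in groups.values())
        if best is None or score < best[0]:
            best = (score, f, groups)
    if best is None:
        return majority
    _, f, groups = best
    children = {
        value: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          mtry, depth - 1, rng)
        for value, idx in groups.items()
    }
    return (f, children, majority)

def predict_tree(tree, row):
    """Walk down the tree; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        f, children, majority = tree
        if row[f] not in children:
            return majority
        tree = children[row[f]]
    return tree

def fit_forest(rows, labels, n_trees=25, mtry=2, depth=3, seed=0):
    """Fit each tree on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], mtry, depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def cv_accuracy(rows, labels, k=5, **params):
    """k-fold cross-validated accuracy using interleaved folds."""
    correct = 0
    for fold in range(k):
        test_idx = list(range(fold, len(rows), k))
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        forest = fit_forest([rows[i] for i in train_idx],
                            [labels[i] for i in train_idx], **params)
        correct += sum(predict_forest(forest, rows[i]) == labels[i]
                       for i in test_idx)
    return correct / len(rows)

# Hypothetical toy data: (income, occupation, country) -> literacy level.
INCOME = ["low", "mid", "high"]
OCCUPATION = ["employed", "student", "retired"]
COUNTRY = ["A", "B", "C"]
LEVEL = {"low": "low", "mid": "medium", "high": "high"}  # invented rule
rows = list(product(INCOME, OCCUPATION, COUNTRY)) * 4
labels = [LEVEL[r[0]] for r in rows]

# Tune one hyperparameter (tree depth) by cross-validated accuracy.
best_depth = max((1, 2, 3), key=lambda d: cv_accuracy(rows, labels, k=5, depth=d))
forest = fit_forest(rows, labels, depth=best_depth)
print(best_depth, predict_forest(forest, ("high", "student", "B")))
```

The toy labels depend only on income, so the sketch says nothing about the 53.8% accuracy or the predictor ranking reported in the abstract; it only shows how bootstrap sampling, random feature subsets, majority voting, and cross-validated tuning fit together.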





