
dc.contributor.author | Enes, Jonatan
dc.contributor.author | Expósito, Roberto R.
dc.contributor.author | Fuentes Rodríguez, Jose
dc.contributor.author | López Cacheiro, Javier
dc.contributor.author | Touriño, Juan
dc.date.accessioned | 2023-03-27T15:25:31Z
dc.date.available | 2023-03-27T15:25:31Z
dc.date.issued | 2023-05
dc.identifier.citation | Enes, J., Expósito, R. R., Fuentes, J., Cacheiro, J. L., & Touriño, J. (2023). A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs. Information Fusion, 93, 1-20. 10.1016/j.inffus.2022.12.017 | es_ES
dc.identifier.issn | 1566-2535
dc.identifier.uri | http://hdl.handle.net/2183/32787
dc.description.abstract | [Abstract]: Time series are key across industrial and research areas for their ability to model behaviour across time, making them ideal for a wide range of use cases such as event monitoring, trend prediction or anomaly detection. This is even more so due to the increasing monitoring capabilities in many areas, with the subsequent massive data generation. It is also interesting to consider the potential of time series for Machine Learning processing, often fused with Big Data, to search for useful information and solve real-world problems. Time series can be studied individually, representing a single entity or variable to be analysed, or in a grouped fashion, to study and represent a more complex entity or scenario. In the latter case we are dealing with multivariate time series, which usually call for different approaches. In this paper, we present a pipeline architecture to process and cluster multiple groups of multivariate time series. To implement this, we apply a multi-process solution composed of a feature-based extraction stage, followed by dimension reduction and, finally, several clustering algorithms. The pipeline is also highly configurable in terms of the stage techniques to be used, allowing a search over several combinations for the most promising results. The pipeline has been experimentally applied to batches of HPC jobs from different users of a supercomputer, with the multivariate time series coming from the monitoring of several node resource metrics. The results show how it is possible to apply this multi-process information fusion to create different meaningful clusters from the batches, using only the time series, without any labelling information, thus being an unsupervised scenario. Optionally, the pipeline also supports an outlier detection stage to find and separate jobs that are radically different from the others in a dataset. These outliers can be removed for better clustering and later reviewed for anomalies or, if numerous, fed back to the pipeline to identify possible groupings. The results also include some outliers found in the experiments, as well as scenarios where they are clustered, or ignored and not removed at all. In addition, by leveraging Big Data technologies like Spark, the pipeline is shown to be scalable, working with up to hundreds of jobs and thousands of time series. | es_ES
dc.description.sponsorship | Xunta de Galicia; ED431G 2019/01 | es_ES
dc.description.sponsorship | Xunta de Galicia; ED431C 2021/30 | es_ES
dc.description.sponsorship | This research was funded by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00/AEI/10.13039/501100011033), and by Xunta de Galicia, Spain and FEDER funds of the European Union (Centro de Investigación de Galicia accreditation 2019–2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ref. ED431C 2021/30). Funding for open access charge: Universidade da Coruña/CISUG. | es_ES
dc.language.iso | eng | es_ES
dc.publisher | Elsevier B.V. | es_ES
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFIOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES | es_ES
dc.relation.uri | https://doi.org/10.1016/j.inffus.2022.12.017 | es_ES
dc.rights | Atribución-NoComercial-SinDerivadas 3.0 España | es_ES
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | *
dc.subject | Unsupervised clustering | es_ES
dc.subject | Feature extraction | es_ES
dc.subject | Multivariate time series | es_ES
dc.subject | Anomaly detection | es_ES
dc.subject | HPC jobs | es_ES
dc.title | A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs | es_ES
dc.type | info:eu-repo/semantics/article | es_ES
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES
UDC.journalTitle | Information Fusion | es_ES
UDC.volume | 93 | es_ES
UDC.issue | May | es_ES
UDC.startPage | 1 | es_ES
UDC.endPage | 20 | es_ES

