Skip navigation
  •  Home
  • UDC 
    • Getting started
    • RUC Policies
    • FAQ
    • FAQ on Copyright
    • More information at INFOguias UDC
  • Browse 
    • Communities
    • Browse by:
    • Issue Date
    • Author
    • Title
    • Subject
  • Help
    • español
    • Gallegan
    • English
  • Login
  •  English 
    • Español
    • Galego
    • English
  
View Item 
  •   DSpace Home
  • Facultade de Informática
  • Investigación (FIC)
  • View Item
  •   DSpace Home
  • Facultade de Informática
  • Investigación (FIC)
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs

Thumbnail
View/Open
Enes_Exposito_fuentes_Cacheiro_Touriño_2023_Pipeline_architecture_feature_based_unsupervised_clustering.pdf (5.654Mb)
Use this link to cite
http://hdl.handle.net/2183/32787
Atribución-NoComercial-SinDerivadas 3.0 España
Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 España
Collections
  • Investigación (FIC) [1678]
Metadata
Show full item record
Title
A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs
Author(s)
Enes, Jonatan
Expósito, Roberto R.
Fuentes Rodríguez, Jose
López Cacheiro, Javier
Touriño, Juan
Date
2023-05
Citation
Enes, J., Expósito, R. R., Fuentes, J., Cacheiro, J. L., & Touriño, J. (2023). A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs. Information Fusion, 93, 1-20. 10.1016/j.inffus.2022.12.017
Abstract
[Abstract]: Time series are key across industrial and research areas for their ability to model behaviour across time, making them ideal for a wide range of use cases such as event monitoring, trend prediction or anomaly detection. This is even more so due to the increasing monitoring capabilities in many areas, with the subsequent massive data generation. But it is also interesting to consider the potential of time series for Machine Learning processing, often fused with Big Data, to search for useful information and solve real-world problems. However, time series can be studied individually, representing a single entity or variable to be analysed, or in a grouped fashion, to study and represent a more complex entity or scenario. In this latter case we are dealing with multivariate time series, which usually imply different approaches when dealt with. In this paper, we present a pipeline architecture to process and cluster multiple groups of multivariate time series. To implement this, we apply a multi-process solution composed by a feature-based extraction stage, followed by a dimension reduction, and finally, several clustering algorithms. The pipeline is also highly configurable in terms of the stage techniques to be used, allowing to perform a search with several combinations for the most promising results. The pipeline has been experimentally applied to batches of HPC jobs from different users of a supercomputer, with the multivariate time series coming from the monitoring of several node resource metrics. The results show how it is possible to apply this multi-process information fusion to create different meaningful clusters from the batches, using only the time series, without any labelling information, thus being an unsupervised scenario. Optionally, the pipeline also supports an outlier detection stage to find and separate jobs that are radically different when compared to others on a dataset. These outliers can be removed for a better clustering, and later reviewed looking for anomalies, or if numerous, fed back to the pipeline to identify possible groupings. The results also include some outliers found in the experiments, as well as scenarios where they are clustered, or ignored and not removed at all. In addition, by leveraging Big Data technologies like Spark, the pipeline is proven to be scalable by working with up to hundreds of jobs and thousands of time series.
Keywords
Unsupervised clustering
Feature extraction
Multivariate time series
Anomaly detection
HPC jobs
 
Editor version
https://doi.org/10.1016/j.inffus.2022.12.017
Rights
Atribución-NoComercial-SinDerivadas 3.0 España
ISSN
1566-2535

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsResearch GroupAcademic DegreeThis CollectionBy Issue DateAuthorsTitlesSubjectsResearch GroupAcademic Degree

My Account

LoginRegister

Statistics

View Usage Statistics
Sherpa
OpenArchives
OAIster
Scholar Google
UNIVERSIDADE DA CORUÑA. Servizo de Biblioteca.    DSpace Software Copyright © 2002-2013 Duraspace - Send Feedback