Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences
Use this link to cite
http://hdl.handle.net/2183/32840
Unless otherwise indicated, the item's license is described as Attribution 4.0 International (CC BY 4.0)
Collections
- GI-MODES - Articles [143]
Metadata
Title
Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences
Date
2023
Bibliographic citation
Á. López-Oriona, J. A. Vilar, & P. D'Urso, "Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences", Information Sciences, vol. 624, pp. 467-492, 2023. doi:10.1016/j.ins.2022.12.065
Abstract
Two novel distances between categorical time series are introduced. Both measure discrepancies between extracted features that describe the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's v and Cohen's κ. The other relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures that make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both the hard and the soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.
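As an illustration of the two ingredients named in the abstract, the following is a minimal sketch of binarization and of a lagged Cramér's v. It is not the authors' implementation. The function names `binarize` and `cramers_v`, the use of NumPy, and the particular formula (the usual chi-square-type definition, normalized by the number of categories minus one) are assumptions made for this example; the paper's exact feature vectors and distances are not reproduced here.

```python
import numpy as np

def binarize(series, categories):
    """Map each observation to a canonical (one-hot) vector.

    Row t has a 1 in the column of the category observed at time t.
    (Hypothetical helper, written for illustration only.)
    """
    index = {c: k for k, c in enumerate(categories)}
    out = np.zeros((len(series), len(categories)), dtype=int)
    for t, x in enumerate(series):
        out[t, index[x]] = 1
    return out

def cramers_v(series, categories, lag):
    """Estimate a lagged Cramér's v from the binarized series.

    Assumed definition: sqrt( sum_ij (p_ij(lag) - p_i p_j)^2 / (p_i p_j) / (r - 1) ),
    where p_i are marginal proportions and p_ij(lag) are lagged joint proportions.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    y = binarize(series, categories)
    n, r = y.shape
    p = y.mean(axis=0)                        # marginal proportions
    joint = y[lag:].T @ y[:-lag] / (n - lag)  # joint[i, j] ~ P(X_t = i, X_{t-lag} = j)
    expected = np.outer(p, p)                 # proportions under serial independence
    mask = expected > 0                       # skip categories that never occur
    chi = ((joint - expected)[mask] ** 2 / expected[mask]).sum()
    return float(np.sqrt(chi / (r - 1)))

# Usage: a perfectly periodic DNA-like sequence is strongly serially dependent.
seq = "ACGT" * 25
v1 = cramers_v(seq, "ACGT", lag=1)
```

A value of `v1` close to 1 indicates strong dependence at lag 1, while values near 0 indicate that consecutive symbols are close to independent. Collecting such values over several lags gives one feature vector per series, which is the general kind of input a feature-based distance would compare.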
Keywords
Association measures
Biological sequences
Categorical time series
Fuzzy clustering
Hard clustering
Description
Funded for open access publication: Universidade da Coruña/CISUG.
Publisher's version
Rights
Attribution 4.0 International (CC BY 4.0)