Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences

Lopez-Oriona, Ángel; Vilar, José; D'Urso, Pierpaolo

dc.contributor.author	Lopez-Oriona, Ángel
dc.contributor.author	Vilar, José
dc.contributor.author	D'Urso, Pierpaolo
dc.date.accessioned	2023-04-10T12:55:51Z
dc.date.available	2023-04-10T12:55:51Z
dc.date.issued	2023
dc.identifier.citation	Á. López-Oriona, J. A. Vilar, & P. D'Urso, "Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences", Information Sciences, vol. 624, pp. 467-492, 2023. doi:10.1016/j.ins.2022.12.065	es_ES
dc.identifier.uri	http://hdl.handle.net/2183/32840
dc.description	Funded for open access publication: Universidade da Coruña/CISUG.	es_ES
dc.description.abstract	[Abstract]: Two novel distances between categorical time series are introduced. Both measure discrepancies between extracted features that describe the underlying serial dependence patterns. The first distance is based on two well-known association measures, namely Cramér's V and Cohen's κ. The second relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures that make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both the hard and the soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2019/01	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C-2020-14	es_ES
dc.description.sponsorship	The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/MTM2017-82724-R/ES/INFERENCIA ESTADISTICA FLEXIBLE PARA DATOS COMPLEJOS DE GRAN VOLUMEN Y DE ALTA DIMENSION	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-100/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES	es_ES
dc.relation.uri	https://doi.org/10.1016/j.ins.2022.12.065	es_ES
dc.rights	Attribution 4.0 International (CC BY 4.0)	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	Association measures	es_ES
dc.subject	Biological sequences	es_ES
dc.subject	Categorical time series	es_ES
dc.subject	Fuzzy clustering	es_ES
dc.subject	Hard clustering	es_ES
dc.title	Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Information Sciences	es_ES
UDC.volume	624	es_ES
UDC.startPage	467	es_ES
UDC.endPage	492	es_ES
dc.identifier.doi	10.1016/j.ins.2022.12.065
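
The abstract mentions binarization and two lagged association measures, Cramér's V and Cohen's κ. The following is a minimal Python sketch of those ideas, not the authors' implementation. It assumes the standard textbook estimators of the two measures computed from lagged pair frequencies. The function names (`binarize`, `lagged_measures`, `feature_distance`) and the use of a plain Euclidean distance between feature vectors are illustrative assumptions; the exact features and distance definitions are given in the paper itself.

```python
from collections import Counter
from math import sqrt

def binarize(series, categories):
    """Map each observation to a canonical (one-hot) vector."""
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for x in series:
        v = [0] * len(categories)
        v[index[x]] = 1
        vectors.append(v)
    return vectors

def lagged_measures(series, categories, lag):
    """Estimate Cramer's V and Cohen's kappa at a given lag (lag >= 1).

    Assumes every category in `categories` occurs in the series and that
    at least two categories are present, so no denominator is zero.
    """
    n_pairs = len(series) - lag
    counts = Counter(series)
    p = {c: counts[c] / len(series) for c in categories}
    # Relative frequencies of the pairs (X_t, X_{t-lag}).
    pairs = Counter(zip(series[lag:], series[:-lag]))
    pij = {(a, b): pairs[(a, b)] / n_pairs
           for a in categories for b in categories}
    r = len(categories)
    v_squared = sum((pij[a, b] - p[a] * p[b]) ** 2 / (p[a] * p[b])
                    for a in categories for b in categories) / (r - 1)
    sum_p2 = sum(p[c] ** 2 for c in categories)
    kappa = (sum(pij[c, c] for c in categories) - sum_p2) / (1 - sum_p2)
    return sqrt(v_squared), kappa

def feature_distance(s1, s2, categories, max_lag):
    """Euclidean distance between lagged (V, kappa) feature vectors.

    Illustrative assumption only; see the paper for the actual distances.
    """
    total = 0.0
    for lag in range(1, max_lag + 1):
        v1, k1 = lagged_measures(s1, categories, lag)
        v2, k2 = lagged_measures(s2, categories, lag)
        total += (v1 - v2) ** 2 + (k1 - k2) ** 2
    return sqrt(total)
```

For a strictly alternating two-category series, this sketch yields κ = -1 at lag 1 (a category is never followed by itself) and κ = 1 at lag 2 (each category always recurs two steps later), which illustrates how such features capture serial dependence.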

Files in this item

Name: license_rdf
Size: 1.337Kb
Format: application/rdf+xml

Name: LopezOriona_Angel_2023_Hard_an ...
Size: 1.648Mb
Format: PDF

This item appears in the following collection(s)

GI-MODES - Artigos [143]