
dc.contributor.author: López-Oriona, Ángel
dc.contributor.author: Vilar, José
dc.contributor.author: D'Urso, Pierpaolo
dc.date.accessioned: 2023-04-10T12:55:51Z
dc.date.available: 2023-04-10T12:55:51Z
dc.date.issued: 2023
dc.identifier.citation: Á. López-Oriona, J. A. Vilar, & P. D'Urso, "Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences", Information Sciences, vol. 624, pp. 467-492, 2023. doi:10.1016/j.ins.2022.12.065
dc.identifier.uri: http://hdl.handle.net/2183/32840
dc.description: Funded for open access publication: Universidade da Coruña/CISUG.
dc.description.abstract: Two novel distances between categorical time series are introduced. Both measure discrepancies between extracted features that describe the underlying serial dependence patterns. The first distance is based on two well-known association measures, Cramér's v and Cohen's κ. The second relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures that make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both the hard and the soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.
dc.description.sponsorship: Xunta de Galicia; ED431G 2019/01
dc.description.sponsorship: Xunta de Galicia; ED431C-2020-14
dc.description.sponsorship: The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/MTM2017-82724-R/ES/INFERENCIA ESTADISTICA FLEXIBLE PARA DATOS COMPLEJOS DE GRAN VOLUMEN Y DE ALTA DIMENSION
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113578RB-100/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES
dc.relation.uri: https://doi.org/10.1016/j.ins.2022.12.065
dc.rights: Attribution 4.0 International (CC BY 4.0)
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject: Association measures
dc.subject: Biological sequences
dc.subject: Categorical time series
dc.subject: Fuzzy clustering
dc.subject: Hard clustering
dc.title: Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences
dc.type: info:eu-repo/semantics/article
dc.rights.access: info:eu-repo/semantics/openAccess
UDC.journalTitle: Information Sciences
UDC.volume: 624
UDC.startPage: 467
UDC.endPage: 492
dc.identifier.doi: 10.1016/j.ins.2022.12.065
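
As a rough illustration of the first distance described in the abstract, the sketch below estimates lagged Cramér's v and Cohen's κ for a categorical series and compares two series through the Euclidean distance between the resulting feature vectors. The estimators follow the usual lagged definitions for categorical processes. The function names, the choice of lags, and the Euclidean norm are assumptions made for this example and are not taken from the paper, so this should be read as a sketch of the idea rather than the authors' implementation.

```python
import numpy as np

def lagged_features(series, categories, max_lag=2):
    """Return [v(1), kappa(1), ..., v(L), kappa(L)] for one categorical series.

    Hypothetical helper: names and the feature layout are illustrative only.
    """
    index = {c: k for k, c in enumerate(categories)}
    x = np.array([index[s] for s in series])
    r = len(categories)
    # Marginal probabilities estimated by relative frequencies.
    p = np.bincount(x, minlength=r) / len(x)
    feats = []
    for lag in range(1, max_lag + 1):
        # Joint relative frequencies of the pairs (X_t, X_{t-lag}).
        joint = np.zeros((r, r))
        np.add.at(joint, (x[lag:], x[:-lag]), 1.0)
        joint /= len(x) - lag
        indep = np.outer(p, p)  # joint probabilities under independence
        mask = indep > 0        # ignore categories that never occur
        # Lagged Cramér's v: normalised chi-square-type discrepancy.
        v = np.sqrt(np.sum((joint[mask] - indep[mask]) ** 2 / indep[mask]) / (r - 1))
        # Lagged Cohen's kappa: agreement beyond chance
        # (undefined for a constant series, where the denominator is zero).
        kappa = (np.trace(joint) - np.sum(p ** 2)) / (1.0 - np.sum(p ** 2))
        feats.extend([v, kappa])
    return np.array(feats)

def feature_distance(s1, s2, categories, max_lag=2):
    """Euclidean distance between feature vectors (an assumed choice of norm)."""
    f1 = lagged_features(s1, categories, max_lag)
    f2 = lagged_features(s2, categories, max_lag)
    return float(np.linalg.norm(f1 - f2))

# Two short DNA-like sequences over the alphabet {A, C, G, T}.
a = list("ACGTACGTACGTACGTACGT")
b = list("AACCGGTTAACCGGTTAACC")
d = feature_distance(a, b, categories="ACGT")
```

A matrix of such pairwise distances could then be passed to any distance-based hard or fuzzy clustering routine. Which algorithms the paper actually uses is not stated in this record.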

