On the performance of phonetic algorithms in microtext normalization

Doval, Yerai; Vilares Ferro, Manuel; Vilares, Jesús

dc.contributor.author	Doval, Yerai
dc.contributor.author	Vilares Ferro, Manuel
dc.contributor.author	Vilares, Jesús
dc.date.accessioned	2024-01-17T15:21:37Z
dc.date.available	2024-01-17T15:21:37Z
dc.date.issued	2018-12-15
dc.identifier.citation	Doval, Y., Vilares, M. and Vilares, J. (2018) ‘On the performance of phonetic algorithms in microtext normalization’, Expert Systems with Applications, 113, pp. 213–222. doi:10.1016/j.eswa.2018.07.016.	es_ES
dc.identifier.issn	0957-4174
dc.identifier.uri	http://hdl.handle.net/2183/34951
dc.description	© 2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article: Doval, Y., Vilares, M. and Vilares, J. (2018) ‘On the performance of phonetic algorithms in microtext normalization’ has been accepted for publication in: Expert Systems with Applications, 113, pp. 213–222. The Version of Record is available online at: https://doi.org/10.1016/j.eswa.2018.07.016	es_ES
dc.description.abstract	[Abstract]: User-generated content published on microblogging social networks constitutes a priceless source of information. However, microtexts usually deviate from the standard lexical and grammatical rules of the language, which makes them very difficult for traditional intelligent systems to process. In response, microtext normalization transforms those non-standard microtexts into standard, well-written texts as a preprocessing step, allowing traditional approaches to continue with their usual processing. Given the importance of phonetic phenomena in non-standard text formation, an essential element of the knowledge base of a normalizer is the set of phonetic rules that encode these phenomena, which can be found in the so-called phonetic algorithms. In this work we experiment with a wide range of phonetic algorithms for the English language. The aim of this study is to determine the best phonetic algorithms within the context of candidate generation for microtext normalization. In other words, we intend to find those algorithms that, taking as input the non-standard terms to be normalized, output the smallest possible sets of normalization candidates that still contain the corresponding target standard words. As will be shown, the choice of phonetic algorithm depends heavily on the capabilities of the candidate selection mechanism usually found at the end of a microtext normalization pipeline. The faster it can make the right choices among big enough sets of candidates, the more we can sacrifice on the precision of the phonetic algorithms in favour of coverage in order to increase the overall performance of the normalization system.	es_ES
dc.description.sponsorship	This research has been partially funded by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) through projects TIN2017-85160-C2-1-R, TIN2017-85160-C2-2-R, FFI2014-51978-C2-1-R and FFI2014-51978-C2-2-R, and by the Autonomous Government of Galicia through projects ED431D-2017/12, ED431B-2017/01 and ED431D R2016/046. Moreover, Yerai Doval is funded by the Spanish State Secretariat for Research, Development and Innovation (which belongs to MINECO) and by the European Social Fund (ESF) under an FPI fellowship (BES-2015-073768) associated with project FFI2014-51978-C2-1-R.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431D-2017/12	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431B-2017/01	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431D R2016/046	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85160-C2-1-R/ES/AVANCES EN NUEVOS SISTEMAS DE EXTRACCION DE RESPUESTAS CON ANALISIS SEMANTICO Y APRENDIZAJE PROFUNDO/	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85160-C2-2-R/ES/AVANCES EN NUEVOS SISTEMAS DE EXTRACCION DE RESPUESTAS CON ANALISIS SEMANTICO Y APRENDIZAJE PROFUNDO/	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/FFI2014-51978-C2-1-R/ES/TECNOLOGIAS DE LA LENGUA PARA ANALISIS DE OPINIONES EN REDES SOCIALES	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/FFI2014-51978-C2-2-R/ES/TECNOLOGIAS DE LA LENGUA PARA ANALISIS DE OPINIONES EN REDES SOCIALES: DEL TEXTO AL MICROTEXTO	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/BES-2015-073768/ES/	es_ES
dc.relation.isversionof	10.1016/j.eswa.2018.07.016
dc.relation.uri	https://doi.org/10.1016/j.eswa.2018.07.016	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	Microtext normalization	es_ES
dc.subject	Phonetic algorithm	es_ES
dc.subject	Fuzzy matching	es_ES
dc.subject	Twitter	es_ES
dc.subject	Texting	es_ES
dc.title	On the performance of phonetic algorithms in microtext normalization	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Expert Systems with Applications	es_ES
UDC.volume	113	es_ES
UDC.startPage	213	es_ES
UDC.endPage	222	es_ES
dc.identifier.doi	10.1016/j.eswa.2018.07.016

Files in this item

Name: license_rdf
Size: 1.203Kb
Format: application/rdf+xml

Name: Doval_Yerai_2018_On_the_perfor ...
Size: 453.4Kb
Format: PDF

This item appears in the following collection(s)

GI-LYS - Articles [51]