A linguistic approach for determining the topics of Spanish Twitter messages

Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos

dc.contributor.author	Vilares, David
dc.contributor.author	Alonso, Miguel A.
dc.contributor.author	Gómez-Rodríguez, Carlos
dc.date.accessioned	2024-01-18T15:53:17Z
dc.date.available	2024-01-18T15:53:17Z
dc.date.issued	2015
dc.identifier.citation	Vilares, D., Alonso, M. A., & Gómez-Rodríguez, C. (2015). A linguistic approach for determining the topics of Spanish Twitter messages. Journal of Information Science, 41(2), 127-145. https://doi.org/10.1177/0165551514561652	es_ES
dc.identifier.issn	0165-5515
dc.identifier.issn	1741-6485
dc.identifier.uri	http://hdl.handle.net/2183/34987
dc.description	This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article: Vilares, D., Alonso, M. A., & Gómez-Rodríguez, C. (2015). ‘A linguistic approach for determining the topics of Spanish Twitter messages’ has been accepted for publication in Journal of Information Science, 41(2), 127-145. Copyright © 2014 The Authors. DOI: https://doi.org/10.1177/0165551514561652.	es_ES
dc.description.abstract	[Abstract]: The vast number of opinions and reviews provided in Twitter is helpful in order to make interesting findings about a given industry, but given the huge number of messages published every day, it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets that has been specifically annotated to perform text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest impact on this task and how they should be combined in order to obtain the best-performing system. The results lead us to conclude that relating features by means of contextual information adds complementary knowledge over pure lexical models, making it possible to outperform them on standard metrics for multilabel classification tasks.	es_ES
dc.description.sponsorship	The research reported in this article was partially funded by Ministerio de Economía y Competitividad and FEDER (grant TIN2010-18552-C03-02), Ministerio de Educación, Cultura y Deporte (FPU13/01180) and by Xunta de Galicia (Grants CN2012/008, CN2012/319).	es_ES
dc.description.sponsorship	Xunta de Galicia; CN2012/008	es_ES
dc.description.sponsorship	Xunta de Galicia; CN2012/319	es_ES
dc.language.iso	eng	es_ES
dc.publisher	SAGE Publications & CILIP	es_ES
dc.relation	info:eu-repo/grantAgreement/MICINN/Plan Nacional de I+D+i 2008-2011/TIN2010-18552-C03-02/ES/ANALISIS DE TEXTOS Y RECUPERACION DE INFORMACION PARA LA MINERIA DE OPINIONES: ANALISIS DE ENUNCIADOS Y EXTRACCION DE RELACIONES	es_ES
dc.relation	info:eu-repo/grantAgreement/MECD/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/FPU13%2F01180/ES/	es_ES
dc.relation.isversionof	https://doi.org/10.1177/0165551514561652
dc.relation.uri	https://doi.org/10.1177/0165551514561652	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	Twitter	es_ES
dc.subject	Natural language processing	es_ES
dc.subject	Multi-label topic classification	es_ES
dc.title	A linguistic approach for determining the topics of Spanish Twitter messages	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Journal of Information Science	es_ES
UDC.volume	41	es_ES
UDC.issue	2	es_ES
UDC.startPage	127	es_ES
UDC.endPage	145	es_ES
dc.identifier.doi	10.1177/0165551514561652

Files in this item

Name: license_rdf
Size: 1.203Kb
Format: application/rdf+xml

Name: Vilares_David_2015_A_linguisti ...
Size: 660.3Kb
Format: PDF

This item appears in the following collection(s)

GI-LYS - Artigos [51]