EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

Bibliographic citation

David Vilares, Miguel A. Alonso, and Carlos Gómez-Rodríguez. 2016. EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4149–4153, Portorož, Slovenia. European Language Resources Association (ELRA).

Abstract

Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques in this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource has already been done, providing a set of baselines for the research community.

Rights

Attribution-NonCommercial 4.0 International

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International