Use this link to cite:
https://hdl.handle.net/2183/47585
Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan
Authors
Valverde, Pilar
Bibliographic citation
Valverde, Pilar (2018). Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan. In Tono, Y. & Isahara, H. (Eds.), Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), pp. 485-491.
Abstract
This paper describes the first steps in the creation of CELEN, a new written corpus of Spanish as a Foreign Language in Japan. First, we introduce the situation of Spanish in higher education in Japan and the design principles of the corpus. Second, we describe the workflow, which consists of the collection of background and permission forms, the collection of texts, transcription, and the registration of task metadata. Third, we present results for the data collected at Kansai Gaidai University during the first semester of the 2018 academic year (April to August 2018): the resulting corpus contains 963 texts from 449 learners, totalling 74,631 words and covering three of the six CEFR levels (A1, A2 and B1). Work on CELEN is ongoing: texts from the second semester and from other institutions are waiting to be added to the corpus. In the future, we plan to annotate the corpus with morpho-syntactic information and make it available to the research community under a CC-BY-NC license, so that researchers can not only view the data but also manipulate and further annotate it.

