Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan

Valverde, Pilar

Use this link to cite:

https://hdl.handle.net/2183/47585

Files

Valverde_Pilar_2018_Design_principles_and_data_collection_for_CELEN.pdf (231.47 KB)

Identifiers

URI: https://hdl.handle.net/2183/47585

Publication date

2018

Authors

Valverde, Pilar

Bibliographic citation

Valverde, Pilar (2018). Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan. In Tono, Y. & Isahara, H. (Eds.), Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), pp. 485-491.

Abstract

This paper describes the first steps in the creation of a new resource for Spanish: CELEN, a written corpus of Spanish as a Foreign Language in Japan. First, we introduce the situation of Spanish in higher education in Japan and the design principles of the corpus. Second, we describe the workflow, which consists of collecting background and permission forms, collecting texts, transcribing them, and registering task metadata. Third, we present results for the data collected at Kansai Gaidai University during the first semester of the 2018 academic year (April to August 2018): the resulting corpus contains 963 texts from 449 learners, totalling 74,631 words, and represents three of the six CEFR levels: A1, A2 and B1. Work on CELEN is ongoing, with texts from the second semester and from other institutions waiting to be added to the corpus. In the future, we plan to annotate the corpus with morpho-syntactic information and make it accessible to the research community under a CC-BY-NC license, so that users can not only view the data but also manipulate and further annotate it.

Keywords

Learner corpora; Learner Spanish; Spanish as a Foreign Language; Corpus de aprendices; Español como lengua extranjera; Aprendices de español

Collections

Investigación (FFIL)
