Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan

UDC.coleccion: Research
UDC.conferenceTitle: Asia Pacific Corpus Linguistics Conference (4th, 2018, Takamatsu)
UDC.departamento: Letras
UDC.endPage: 491
UDC.grupoInv: HISPANIA (Grupo de Investigación en Lingua, Literatura e Cultura Hispánica)
UDC.startPage: 485
dc.contributor.author: Valverde, Pilar
dc.date.accessioned: 2026-03-05T07:38:36Z
dc.date.available: 2026-03-05T07:38:36Z
dc.date.issued: 2018
dc.description.abstract: [Abstract] This paper describes the first steps in the creation of CELEN, a new written corpus of Spanish as a Foreign Language in Japan. First, we introduce the situation of Spanish in higher education in Japan and the design principles of the corpus. Second, we describe the workflow, which consists of collecting background and permission forms, collecting texts, transcribing them, and registering task metadata. Third, we present some results on the data collected at Kansai Gaidai University during the first semester of the 2018 academic year (April to August 2018): the resulting corpus contains 963 texts from 449 learners, totalling 74,631 words, and represents three of the six CEFR levels: A1, A2 and B1. Work on CELEN is ongoing, with texts from the second semester and from other institutions waiting to be added to the corpus. In the future, we plan to annotate the corpus with morpho-syntactic information and make it accessible to the research community under a CC-BY-NC license, so that users can not only view the data but also manipulate and further annotate it.
dc.identifier.citation: Valverde, Pilar (2018). Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan. In Tono, Y. & Isahara, H. (Eds.), Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), pp. 485-491.
dc.identifier.uri: https://hdl.handle.net/2183/47585
dc.language.iso: eng
dc.publisher: The Asia Pacific Corpus Linguistics Association
dc.relation.projectID: Japan. Society for the Promotion of Science/KAKENHI (17H07270)/Grant-in-Aid for Scientific Research (Start-up)
dc.rights.accessRights: open access
dc.subject: Learner corpora
dc.subject: Learner Spanish
dc.subject: Spanish as a Foreign Language
dc.subject: Corpus de aprendices
dc.subject: Español como lengua extranjera
dc.subject: Aprendices de español
dc.title: Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan
dc.type: conference output
dspace.entity.type: Publication

Files

Original bundle

Name: Valverde_Pilar_2018_Design_principles_and_data_collection_for_CELEN.pdf
Size: 231.47 KB
Format: Adobe Portable Document Format