Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan
| UDC.coleccion | Investigación | |
| UDC.conferenceTitle | Asia Pacific Corpus Linguistics Conference (4th, 2018, Takamatsu) | |
| UDC.departamento | Letras | |
| UDC.endPage | 491 | |
| UDC.grupoInv | HISPANIA (Grupo de Investigación en Lingua, Literatura e Cultura Hispánica) | |
| UDC.startPage | 485 | |
| dc.contributor.author | Valverde, Pilar | |
| dc.date.accessioned | 2026-03-05T07:38:36Z | |
| dc.date.available | 2026-03-05T07:38:36Z | |
| dc.date.issued | 2018 | |
| dc.description.abstract | [Abstract] This paper describes the first steps in the creation of a new resource for Spanish, CELEN, a written corpus of Spanish as a Foreign Language in Japan. First, we introduce the situation of Spanish in higher education in Japan and the design principles of the corpus. Second, we describe the workflow, consisting of collection of background and permit forms, collection of texts, transcription, and task metadata registration. Third, we present some results about the data collected at Kansai Gaidai University during the first semester of the current academic year (from April to August 2018): the resulting corpus contains 963 texts from 449 learners totalling 74,631 words, which represent three of the six CEFR levels: A1, A2 and B1. The work on CELEN is still ongoing with texts of the second semester as well as texts from other institutions waiting to be added to the corpus. In the future, we plan to annotate the corpus with morpho-syntactic information and make it accessible to the research community under a CC-BY-NC license, so that one can not only see the data but also manipulate it and further annotate it. | |
| dc.identifier.citation | Valverde, Pilar (2018). Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan. In Tono, Y. & Isahara, H. (Eds.), Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), pp. 485-491. | |
| dc.identifier.uri | https://hdl.handle.net/2183/47585 | |
| dc.language.iso | eng | |
| dc.publisher | The Asia Pacific Corpus Linguistics Association | |
| dc.relation.projectID | Japan. Society for the Promotion of Science/kakenhi (17H07270)/Grant-in-Aid for Scientific Research (Start-up) | |
| dc.rights.accessRights | open access | |
| dc.subject | Learner corpora | |
| dc.subject | Learner Spanish | |
| dc.subject | Spanish as a Foreign Language | |
| dc.subject | Corpus de aprendices | |
| dc.subject | Español como lengua extranjera | |
| dc.subject | Aprendices de español | |
| dc.title | Design Principles and Data Collection for CELEN: A Corpus of Learner Spanish in Japan | |
| dc.type | conference output | |
| dspace.entity.type | Publication |
Files
Original bundle
- Name: Valverde_Pilar_2018_Design_principles_and_data_collection_for_CELEN.pdf
- Size: 231.47 KB
- Format: Adobe Portable Document Format

