The making of the Corpus of English Life Sciences Texts (CELiST), a bunch of disciplines

Bibliographic citation

Moskowich, Isabel. 2021. “The making of the Corpus of English Life Sciences Texts (CELiST), a bunch of disciplines”. In Moskowich, Isabel; Lareo, Inés and Camiña Rioboó, Gonzalo (eds.), “All families and genera”: Exploring the Corpus of English Life Sciences Texts. Amsterdam: John Benjamins. 2–19.

Abstract

[Abstract] Contrary to what happens with huge corpora automatically taken from the Internet by crawlers, the compilation of a smaller specialised corpus is a time-consuming, carefully planned task that must follow a protocol. The different subcorpora in the Coruña Corpus family have been built in a similar way, following what Kennedy (1998: 70–85) describes as the five steps in corpus creation: design, planning a storage system and keeping records, obtaining permissions, text capture, and encoding. In the case of CELiST, the Corpus of English Life Sciences Texts, design has certainly been difficult. This chapter explores and explains the reasons behind text selection for this particular corpus and also addresses the decisions that had to be made regarding disciplines. As we intended to compile a corpus of texts dealing with biology, we found that the field, as such, did not exist in the eighteenth and nineteenth centuries, which led us to look for extracts and works in different sources and to extend our original selection to many more disciplines in the UNESCO classification of the fields of Science and Technology (1988). Therefore, the sampling frame was determined, first and foremost, by the field in question. Consequently, we had to move to something different and more inclusive as we learned more about the taxonomies of scientific fields across history. The chapter provides the final portrait of CELiST in its making.
