The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 learners from South Tyrol (Italy). It was developed as a by-product of the project “South-Tyrolean pupils and the second language: a linguistic and socio-psychological investigation” (KOLIPSI II), a follow-up study to the KOLIPSI project.
The KOLIPSI II project had two aims. First, it analysed the second-language competences of South-Tyrolean pupils and placed the results in context by discussing the key sociolinguistic and psychosocial factors that influence them. Second, it compared these results with those collected in the earlier KOLIPSI project (school year 2007/2008) in order to measure possible changes and developments.
The learner corpus consists of German and Italian L2 texts that have been reliably assigned to CEFR levels. Each text is accompanied by metadata on the learner, such as school type, gender, origin, socioeconomic status and L1.
The data were collected during the school year 2014/2015 using two standardized written production tasks. In the first, students wrote an e-mail to a friend about an incident at a supermarket (narrative genre). In the second, they wrote an e-mail about problems with chatting in response to a letter to the editor of a youth magazine (argumentative/narrative genre). The supermarket task had been used in identical form in the earlier KOLIPSI project.
The corpus contains manual transcription annotations. These capture surface features of the texts, such as their graphical layout, and include error annotation at the orthographic level. In addition, the corpus has been automatically annotated for tokenisation, sentence splitting, part-of-speech (POS) tagging and lemmatisation.
Corpus Information
size: | ca. 600,000 tokens |
texts: | 1,619 |
writers: | [number] upper secondary school students, aged 16 to 18 |
text type: | picture description (narrative), opinion text (argumentative) |
languages: | Italian, German |
year of data collection: | 2014/15 |
reference paper: | in preparation |
Corpus Access
The corpus will be made accessible in 2021.