The KoKo Corpus is an error-annotated learner corpus of German consisting of texts collected in public schools and produced mainly by L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms during the KoKo project. The essays were collected in South Tyrol (Italy), Tyrol (Austria) and Thuringia (Germany).
The KoKo Corpus consists of 1,503 argumentative essays with manually created transcription annotations and linguistic error annotations.
Corpus Information
size: | 950,000 tokens |
texts: | 1,503 (1,319 by L1 learners) |
writers: | 1,503 students from upper secondary schools, aged 17 to 19 |
text type: | argumentative essay |
language: | German |
year of data collection: | 2011 |
reference paper: | Abel, A., Glaznieks, A., Nicolas, L. & Stemle, E.W. (2014): KoKo: an L1 Learner Corpus for German. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik 26-31 May 2014, pp. 2414-2421. (pdf) |
Documents
- Description of the annotation tags used for orthographic errors
- Description of the annotation tags used for punctuation errors
- Description of the annotation tags used for grammatical errors
- Description of the person-related metadata
- Transcription guidelines
- Writing Task
Corpus Access
The corpus can be queried via the ANNIS interface or downloaded from the Eurac Research CLARIN Repository.