The KoKo Corpus is an error-annotated learner corpus of German consisting of texts collected in public schools and produced mainly by L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms during the KoKo project. The essays were collected in South Tyrol (Italy), Tyrol (Austria) and Thuringia (Germany).
The KoKo Corpus consists of 1,503 argumentative essays with manual transcription annotations and linguistic error annotations.
New corpus version available!
Version 4 of the KoKo Corpus was released in 2024! The new version contains two subcorpora with grammatical and lexical annotations.
Corpus Information
size: | 950,000 tokens |
texts: | 1,503 (1,319 by L1 learners) |
writers: | 1,503 students from upper secondary schools, aged 17 to 19 |
text type: | argumentative essay |
language: | German |
year of data collection: | 2011 |
reference paper: | Abel, A., Glaznieks, A., Nicolas, L. & Stemle, E.W. (2014): KoKo: an L1 Learner Corpus for German. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik 26-31 May 2014, pp. 2414-2421. (pdf) |
Documents
- Writing Task
- Transcription guidelines
- Description of the person-related metadata
- Description of the annotation tags used for grammatical errors
KoKo Version 3
- Description of the annotation tags used for orthographic errors
- Description of the annotation tags used for punctuation errors
KoKo Version 4
- Description of the annotation tags used for orthographic errors
- Description of the annotation tags used for punctuation errors
- Description of the tags used for lexical annotations
Corpus Access
The corpus can be queried via the ANNIS interface or downloaded from the Eurac Research CLARIN Repository.