The KoKo Corpus

The KoKo Corpus is an error-annotated learner corpus of German. It consists of texts collected in public schools and written mainly by L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career, based on authentic texts produced in classrooms during the KoKo project. The essays were collected in South Tyrol (Italy), Tyrol (Austria) and Thuringia (Germany).

The KoKo Corpus consists of 1,503 argumentative essays with manual transcription annotations and linguistic error annotations.

New corpus version available!
Version 4 of the KoKo Corpus was released in 2024! The new version contains two subcorpora with grammatical and lexical annotations.

Corpus Information

Size: 950,000 tokens
Texts: 1,503 (1,319 by L1 learners)
Writers: 1,503 upper-secondary-school students aged 17 to 19
Text type: argumentative essay
Language: German
Year of data collection: 2011
Reference paper: Abel, A., Glaznieks, A., Nicolas, L. & Stemle, E.W. (2014): KoKo: an L1 Learner Corpus for German. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, 26-31 May 2014, pp. 2414-2421.

Documents

KoKo Version 3
KoKo Version 4

Corpus Access

The corpus can be queried via the ANNIS interface or downloaded from the Eurac Research CLARIN Repository.

Corpus Download (Version 3)
Corpus Download (Version 4 – under construction)