Welcome to the Learner Corpus Portal of Eurac Research!

This is the entrance page to the collection of learner corpora at the Institute for Applied Linguistics at Eurac Research. The Learner Corpus Portal provides access to the transcribed and annotated learner texts of the institute's learner corpora, both for download and for online search.

Learner Corpora

All learner corpora available at our institute originated from handwritten learner texts. They vary with respect to the age, L1 and L2 of the writers, text type, time of data collection, corpus size, and annotations. Click on the name of a corpus in the list below to see more details about it.

  • KoKo Corpus: An L1 learner corpus with essays of German speakers from upper secondary schools
  • Kolipsi Corpus Family: A collection of Italian and German L2 learner texts from upper secondary schools:
    • Kolipsi-1 Corpus: collected in 2007 with additional L1 texts
    • Kolipsi-2 Corpus: collected in 2017
    • Kolipsi-Matura Corpus: collected in 2009
  • LEKO Corpus: A subset of the Italian texts from the Kolipsi-1 and Merlin corpora, with lexical annotations for phraseological elements in the student texts
  • LEONIDE: A longitudinal trilingual learner corpus (Italian, German, English) of young learners (lower secondary school)
  • Merlin Corpus: A trilingual learner corpus (Italian, German, Czech) with texts produced by adults

Corpus Query

The learner corpora are available for corpus queries via the ANNIS interface. The interface enables simple queries on single and multi-word expressions as well as more sophisticated queries considering annotations and metadata.

You can access all learner corpora directly on ANNIS. Each corpus provides a set of searchable annotations and metadata that might be useful for your research.

Corpus Download

All learner corpora are documented, stored in a repository, and made available for download via the Eurac Research Clarin Centre.

In the repository, you will find all of the learner corpora listed above, available at the Institute for Applied Linguistics, ready for download. In addition, the repository provides documentation files, metadata files, and annotation files.