This is the entrance page to the collection of Learner Corpora at the Institute for Applied Linguistics at Eurac Research. The Learner Corpus portal provides access to the transcribed and annotated learner texts of various learner corpora available at the institute for download and online search.
All learner corpora available at our institute originated from handwritten learner texts and vary with respect to the age, L1 and L2 of the writers, the text type, the time of data collection, the size of the corpus and the annotations. Click on the name of a corpus in the list below to see more details about it.
- KoKo Corpus: An L1 corpus with German argumentative essays written by students from German upper secondary schools in Germany, Austria and South Tyrol (12th grade).
- ITACA Corpus: An L1 corpus with Italian argumentative essays collected from Italian upper secondary schools in South Tyrol (12th grade).
- Kolipsi Corpus Family: A collection of Italian and German L2 learner texts from upper secondary schools (12th grade) in South Tyrol:
  - Kolipsi-1 Corpus: German and Italian L2 corpus collected in 2007
  - Kolipsi-1 (L1): reference corpus with texts from L1 writers
  - Kolipsi-2 Corpus: German and Italian L2 corpus collected in 2017
  - Kolipsi-Matura Corpus: official Matura exam texts in L1 from a subset of the participants in Kolipsi-1 collected in 2009
- LEONIDE: A longitudinal trilingual learner corpus (Italian, German, English) of young learners (lower secondary school)
- Merlin Corpus: A trilingual learner corpus (Italian, German, Czech) with texts produced by adults
- LEKO Corpus: A subset of the Italian texts from the Kolipsi-1 and Merlin corpora, containing lexical annotations for phraseological elements in the student texts.
The learner corpora are available for corpus queries via the ANNIS interface. The interface enables simple queries on single and multi-word expressions as well as more sophisticated queries considering annotations and metadata.
You can access all learner corpora directly on ANNIS. Each corpus provides a set of searchable annotations and metadata that might be useful for your research.
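To illustrate the kinds of queries mentioned above, the following is a minimal sketch of how query strings in the ANNIS Query Language (AQL) could be assembled in Python before pasting them into the ANNIS search box. The helper functions are hypothetical and not part of ANNIS, and the annotation name `pos` with the value `NN` is an assumption for illustration only; the annotation layers actually available differ from corpus to corpus.

```python
# Hypothetical helpers that build AQL query strings; they are not part of ANNIS.
# The sketch does not escape quotation marks inside search words.


def token_query(word: str) -> str:
    """AQL for a single token with the exact surface form `word`."""
    return f'"{word}"'


def sequence_query(words: list[str]) -> str:
    """AQL for a multi-word expression: tokens that directly follow each other."""
    if not words:
        raise ValueError("at least one word is required")
    terms = [f'"{w}"' for w in words]
    # "#1 . #2" states that the first search term directly precedes the second.
    links = [f"#{i} . #{i + 1}" for i in range(1, len(words))]
    return " & ".join(terms + links)


def annotation_query(name: str, value: str) -> str:
    """AQL for an annotation with the given name and value."""
    return f'{name}="{value}"'


if __name__ == "__main__":
    print(token_query("Beispiel"))
    print(sequence_query(["zum", "Beispiel"]))
    # "pos" and "NN" are assumed names; check each corpus for its real layers.
    print(annotation_query("pos", "NN"))
```

The resulting strings can be copied into the ANNIS query field. Which annotation names and metadata fields a query may refer to depends on the selected corpus.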
All learner corpora are documented and stored on a repository and made available for download via the Eurac Research Clarin Centre.
In the repository, you will find all learner corpora of the Institute for Applied Linguistics listed above available for download. In addition, the repository provides documentation files, metadata files, and annotation files.