The LEKO Corpora – PORTA Eurac Research Learner Corpus Portal

The LEKO corpora, LEKO_Kolipsi and LEKO_Merlin, are based on subsets of the texts of the Kolipsi-1 corpus and the Merlin corpus, respectively. They provide lexical annotations of phraseological elements in Italian L2 writing. The annotations were created jointly by the University of Innsbruck (Austria) and Eurac Research Bolzano (Italy) within the project LEKO, whose aim was to describe the use of phrasemes in these texts. The manual annotations cover phraseme category, lexical errors, morpho-syntactic features and error explanations.

LEKO_Kolipsi contains about 55 000 tokens in 282 texts written by 141 pupils in their final year of upper secondary school. The texts represent two text types, an argumentative opinion text written as an email and a narrative picture story written as a letter, as described in the Kolipsi-1 documentation.

LEKO_Merlin contains about 9 000 tokens in 50 texts written by 50 adult examinees who took an official Italian language test (TELC).

The documents were transcribed according to the Kolipsi-1 and Merlin transcription guidelines. The annotation guidelines for the lexical annotations are listed under Documentation below.

Note: The LEKO corpora do not contain the manual annotations for non-lexical errors, foreign word insertions, target language transcriptions, ambiguous writings or other phenomena that are available in the base corpora Kolipsi-1 and Merlin. To retrieve any of these annotations, or the full target versions of the learner texts, please consult the base corpora directly.

Corpus Information

	LEKO_Kolipsi	LEKO_Merlin
size:	~55 000 tokens	~9 000 tokens
texts:	282	50
writers:	141 pupils	50 adults
text type:	opinion text (email), picture story (letter)	various (formal and informal emails/letters for different purposes, opinion texts on different topics), based on standardised language tests
languages:	IT	IT
year of data collection:	2007	2012

Corpus Access

The corpora can be queried via the ANNIS interface or downloaded from the Eurac Research CLARIN repository.

Corpus Query

Corpus Download

Reference Paper

Konecny, C., Abel, A., Autelli, E., & Zanasi, L. (2016). Identification and Classification of Phrasemes in an L2 Learner Corpus of Italian. In Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives = Fraseología computacional y basada en corpus: perspectivas monolingües y multilingües (pp. 533–542). Tradulex. Available online: http://hdl.handle.net/10863/7683

Documentation

Annotation guidelines for lexical annotations in the LEKO corpora

Related Publications

Schmiderer, K., Zanasi, L., Konecny, C., & Autelli, E. (2020). Sviluppare la competenza lessicale e fraseologica tramite i task. Un contributo allo sviluppo di materiale didattico per l’italiano L2. Italiano LinguaDue, 12(2), 238–256. Available online: https://riviste.unimi.it/index.php/promoitals/article/view/15069/13950

Schmiderer, K., Zanasi, L., Konecny, C., & Autelli, E. (2021). Facciamo bella figura! 8 task fraseodidattici per studenti di italiano L2/LS: con una prefazione e con la consulenza scientifica di Barbara Hinger. Innsbruck: innsbruck university press (iup). Available online: https://www.uibk.ac.at/iup/buecher/9783991060451.html

If you have used the LEKO corpora in your work and would like your publications listed here, please email porta@eurac.edu.