The LEKO corpora LEKO_Kolipsi and LEKO_Merlin provide lexical annotations of phraseological elements in Italian L2 writing. They are based on a subcorpus of the Kolipsi-1 corpus and on the Merlin corpus, respectively. The annotations were created jointly by the University of Innsbruck (Austria) and Eurac Research Bolzano (Italy) within the project LEKO, which aimed to describe the use of phrasemes in these texts.
In addition to the annotations available in the base corpora KOLIPSI-1 and MERLIN, the LEKO corpora contain manual annotations for phraseme category, lexical errors, morpho-syntactic features and error explanations.
- LEKO_Kolipsi contains about 55 000 tokens in 282 texts written by 141 pupils in the final year of upper secondary school. The texts represent two text types, as described in the Kolipsi-1 documentation: an opinion text written as an e-mail (argumentative genre) and a picture story written as a letter (narrative genre).
- LEKO_Merlin contains about 9 000 tokens in 50 texts written by 50 examinees who took an official Italian language test (TELC).
The annotation guidelines for the lexical annotations are listed below. For information on the other annotations and the available metadata, please refer to the documentation of the base corpora Kolipsi-1 and Merlin.
| | LEKO_Kolipsi | LEKO_Merlin |
|---|---|---|
| size | ~55 000 tokens | ~9 000 tokens |
| writers | 141 pupils | 50 adults |
| text type | opinion text (e-mail), picture story (letter) | various (informal and formal e-mail/letter for different purposes, opinion text on different topics), based on standardised language tests |
| year of data collection | 2007 | 2012 |
Annotation guidelines for lexical annotations in the LEKO corpora
Konecny, C., Abel, A., Autelli, E., & Zanasi, L. (2016). Identification and Classification of Phrasemes in an L2 Learner Corpus of Italian. In Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives = Fraseología computacional y basada en corpus: perspectivas monolingües y multilingües (pp. 533–542). Tradulex. Available online: http://hdl.handle.net/10863/7683
The corpora will be made accessible in 2021.