Multilingual corpus of literal occurrences of multiword expressions

Please use the following text to cite this item:
Savary, Agata; et al., 2019, Multilingual corpus of literal occurrences of multiword expressions, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-2966.
Date issued
2019-04-01
Size
26754 sentences
Description
The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual corpus of VMWEs v1.1 (cf. http://hdl.handle.net/11372/LRT-2842). The sentences with VMWEs were extracted from the source corpus, and potential co-occurrences of the same lexemes were then automatically extracted from the same corpus. Native-speaker experts manually classified these candidates into 6 classes, including literal and coincidental occurrences, as well as various annotation errors. The construction of the corpus is described in the following publication: Agata Savary, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta, Voula Giouli (forthcoming) "Literal occurrences of multiword expressions: Rare birds that cause a stir", to appear in Prague Bulletin of Mathematical Linguistics.
Files in this item

Name    Size       Format              Description   MD5
PT.tgz  533.87 KB  application/x-gzip  gzip Archive  7513dad43afc0343365e15471ef1a085
DE.tgz  368.23 KB  application/x-gzip  gzip Archive  6512c4688561caa99bee516332873ad8
EL.tgz  350.63 KB  application/x-gzip  gzip Archive  6bdb435486338ab3898daf0154e9c0bd
EU.tgz  383.8 KB   application/x-gzip  gzip Archive  525686bdbc7b3da0f9e1331a8b3046eb
PL.tgz  403.88 KB  application/x-gzip  gzip Archive  1451c2c4d5be016ba3fdcad4851e7f67
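
The MD5 values listed for each archive can be used to check that a download completed intact. The following is a minimal Python sketch, assuming the five archives were saved under their listed names in one local directory. The names EXPECTED_MD5, md5_of and verify_archives are illustrative and not part of the repository or the corpus.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this item.
EXPECTED_MD5 = {
    "PT.tgz": "7513dad43afc0343365e15471ef1a085",
    "DE.tgz": "6512c4688561caa99bee516332873ad8",
    "EL.tgz": "6bdb435486338ab3898daf0154e9c0bd",
    "EU.tgz": "525686bdbc7b3da0f9e1331a8b3046eb",
    "PL.tgz": "1451c2c4d5be016ba3fdcad4851e7f67",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archives(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (not found).

    Assumption: the archives sit directly in `directory` under their listed names.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_archives().items():
        print(f"{name}: {labels[status]}")
```

Run from the download directory, the script prints one status line per archive. A mismatch most likely indicates an incomplete or corrupted download, in which case the file should be fetched again.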