This is not the latest version of this item.
PARSEME corpora annotated for verbal multiword expressions (version 1.3)
Please use the following text to cite this item:
Savary, Agata; et al., 2023,
PARSEME corpora annotated for verbal multiword expressions (version 1.3), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-5124.
Authors
Savary, Agata; et al.
Date issued
2023-05-10
Size
455,629 sentences, 9,264,811 tokens, 127,498 multiword units
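The size figures above are aggregate counts over the .cupt files. The Python sketch below shows one way such counts could be derived; it is an illustration, not the script used to compute the official numbers. Its counting conventions (skipping multiword-token ranges such as `3-4` and empty nodes such as `5.1`, and treating `*` and `_` in the MWE column as "no VMWE") are assumptions, the helpers `row` and `cupt_stats` are hypothetical, and the two-sentence sample is hand-made rather than taken from the corpus.

```python
def row(*cols):
    """Join the 11 .cupt columns of one token line with tabs."""
    return "\t".join(cols)


def cupt_stats(text):
    """Count sentences, tokens and annotated MWEs in .cupt-formatted text.

    Assumed conventions (a sketch, not the official counting script):
    blank lines separate sentences, '#' lines are comments, range lines
    ('3-4') and empty nodes ('5.1') are not counted as tokens, and the
    11th column holds the MWE codes ('*' = no MWE, '_' = unannotated).
    """
    sentences = tokens = mwes = 0
    mwe_ids = set()            # distinct MWE ids in the current sentence
    in_sentence = False
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if in_sentence:
                sentences += 1
                mwes += len(mwe_ids)
            mwe_ids, in_sentence = set(), False
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        in_sentence = True
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        tokens += 1
        if cols[10] not in ("*", "_"):
            for code in cols[10].split(";"):
                mwe_ids.add(code.split(":")[0])  # '1:LVC.full' -> '1'
    return {"sentences": sentences, "tokens": tokens, "mwes": mwes}


# Hand-made two-sentence sample (illustrative, not taken from the corpus).
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "# text = He made a decision",
    row("1", "He", "he", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"),
    row("2", "made", "make", "VERB", "_", "_", "0", "root", "_", "_", "1:LVC.full"),
    row("3", "a", "a", "DET", "_", "_", "4", "det", "_", "_", "*"),
    row("4", "decision", "decision", "NOUN", "_", "_", "2", "obj", "_", "_", "1"),
    "",
    "# text = She gave up",
    row("1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"),
    row("2", "gave", "give", "VERB", "_", "_", "0", "root", "_", "_", "1:VPC.full"),
    row("3", "up", "up", "ADP", "_", "_", "2", "compound:prt", "_", "_", "1"),
])

print(cupt_stats(SAMPLE))  # {'sentences': 2, 'tokens': 7, 'mwes': 2}
```

Applied to the real training, development and test files, a script like this may differ slightly from the published figures depending on how ranges, empty nodes and unannotated tokens are counted.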
Description
This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do).

This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). The data cover 26 languages, combining the corpora from all three previous editions (1.0, 1.1 and 1.2).

VMWEs were annotated according to the universal guidelines. The corpora are provided in the .cupt format, which is inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2.

The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3
The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format/
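In the .cupt format, the 10 CoNLL-U columns are followed by an 11th column, PARSEME:MWE. As we understand the format page linked above, this column holds `*` for tokens outside any VMWE and otherwise one or more `;`-separated codes: the first token of a VMWE carries its number and category (e.g. `1:VID`), and its remaining tokens carry only the number. The following minimal Python sketch, assuming that layout, groups the tokens of one sentence by VMWE. `extract_vmwes` is a hypothetical helper, and the sample sentence and its annotation are hand-made for illustration rather than taken from the corpus.

```python
from collections import defaultdict


def extract_vmwes(sentence):
    """Group the tokens of one .cupt sentence by MWE id.

    Returns {mwe_id: (category, [forms in token order])}.
    Assumes the 11th column holds ';'-separated codes where the first token
    of an MWE carries 'id:CATEGORY' and later tokens carry just 'id'.
    """
    categories = {}
    forms = defaultdict(list)
    for line in sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        tok_id, form, mwe_col = cols[0], cols[1], cols[10]
        # Skip multiword-token ranges, empty nodes and tokens outside any MWE.
        if "-" in tok_id or "." in tok_id or mwe_col in ("*", "_"):
            continue
        for code in mwe_col.split(";"):
            mwe_id, _, category = code.partition(":")
            if category:  # only the first token of an MWE names its category
                categories[mwe_id] = category
            forms[mwe_id].append(form)
    return {i: (categories.get(i), f) for i, f in forms.items()}


# Illustrative hand-made sentence (not from the corpus).
words = ["They", "let", "the", "cat", "out", "of", "the", "bag"]
codes = ["*", "1:VID", "1", "1", "1", "1", "1", "1"]
sentence = "\n".join(
    "\t".join([str(i), w, "_", "_", "_", "_", "_", "_", "_", "_", c])
    for i, (w, c) in enumerate(zip(words, codes), start=1)
)
print(extract_vmwes(sentence))
# {'1': ('VID', ['let', 'the', 'cat', 'out', 'of', 'the', 'bag'])}
```

Grouping tokens by VMWE number rather than by adjacency means the sketch also handles discontinuous VMWEs and tokens that belong to several VMWEs at once.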
Files in this item
- Name: README.md
- Size: 7.08 KB
- Format: application/octet-stream
- Description: General README file
- MD5: 5902de46b35f82c79183b20d67ab13de
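The MD5 checksum listed above can be used to check that a downloaded copy of README.md is intact. Below is a minimal Python sketch using the standard `hashlib` module; the function names `md5_of_file` and `verify` are hypothetical, and the expected value is copied from the file listing.

```python
import hashlib

# Checksum copied from the file listing for README.md.
EXPECTED_README_MD5 = "5902de46b35f82c79183b20d67ab13de"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_README_MD5):
    """True if the file's MD5 matches the expected (case-insensitive) hex digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("README.md")` returns `True` for an intact download saved under that name in the current directory. Reading in chunks keeps memory use flat, which matters more for the large .cupt files than for the README itself.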