PARSEME corpora annotated for verbal multiword expressions (version 1.3)

Please use the following text to cite this item:
Savary, Agata; et al., 2023, PARSEME corpora annotated for verbal multiword expressions (version 1.3), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-5124.
Authors
Savary, Agata; et al.
Date issued
2023-05-10
Size
455,629 sentences, 9,264,811 tokens, 127,498 multiword units
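The following Python sketch shows roughly how counts like these can be derived. It tallies sentences, tokens and VMWEs in the text of a .cupt file. It assumes the layout given in the cupt format description: one token per line with 11 tab-separated columns, blank lines between sentences, comment lines starting with #, and a final PARSEME:MWE column in which the first token of each VMWE carries an id:category code (e.g. 1:LVC.full) and later tokens carry only the id. The function name count_cupt is hypothetical. Its counting conventions, such as skipping multiword-token ranges and empty nodes, may not match the ones used to produce the official figures.

```python
def count_cupt(text):
    """Count sentences, tokens and VMWEs in the text of a .cupt file (illustrative)."""
    sentences = tokens = vmwes = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines, e.g. "# global.columns = ..." or "# text = ...".
            continue
        in_sentence = True
        cols = line.split("\t")
        # Skip multiword-token ranges ("3-4") and empty nodes ("5.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens += 1
        # Assumed 11th column, PARSEME:MWE: "*" = no VMWE, "_" = not annotated.
        mwe = cols[10] if len(cols) > 10 else "*"
        if mwe in ("*", "_"):
            continue
        # Count each VMWE once, at the code that carries its category ("1:VID").
        vmwes += sum(1 for code in mwe.split(";") if ":" in code)
    if in_sentence:  # the text may not end with a blank line
        sentences += 1
    return {"sentences": sentences, "tokens": tokens, "vmwes": vmwes}
```

Because each VMWE is counted only at the token bearing its category, discontinuous expressions are not double-counted. Summing the result over all files and languages should give totals comparable to the figures above, but not necessarily identical to them.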
Description
This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself) and multi-verb constructions (make do).

This is the first release of the corpora that is not associated with a shared task. The previous version (1.2) was associated with the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). The data cover 26 languages, combining the corpora from all three previous editions (1.0, 1.1 and 1.2).

VMWEs were annotated according to the universal guidelines. The corpora are provided in the .cupt format, which is inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes either from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2.

The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3
The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format/
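The sketch below shows one way to read the PARSEME:MWE column of a .cupt file in Python and group the annotated tokens into expressions. It assumes the conventions of the cupt format description: * marks a token outside any VMWE, _ marks a token with no annotation, the first token of a VMWE carries id:category, and a token that belongs to several (overlapping) VMWEs lists one code per VMWE, separated by semicolons. The helpers read_sentences and extract_vmwes are illustrative names, not part of any PARSEME tool.

```python
from collections import defaultdict


def read_sentences(text):
    """Yield each sentence of a .cupt text as a list of column lists."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            sentence.append(line.split("\t"))
    if sentence:  # last sentence without a trailing blank line
        yield sentence


def extract_vmwes(sentence):
    """Return [(category, [token forms]), ...] for the VMWEs in one sentence."""
    categories = {}
    forms = defaultdict(list)  # VMWE id -> forms, in order of first appearance
    for cols in sentence:
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        mwe = cols[10] if len(cols) > 10 else "*"
        if mwe in ("*", "_"):
            continue
        for code in mwe.split(";"):  # one code per VMWE the token belongs to
            mwe_id, _, category = code.partition(":")
            if category:  # only the first token of a VMWE carries its category
                categories[mwe_id] = category
            forms[mwe_id].append(cols[1])  # column 2 is FORM
    return [(categories.get(i, "?"), words) for i, words in forms.items()]
```

Because the tokens are grouped by VMWE id rather than by adjacency, discontinuous expressions such as "made a ... decision" come out as one entry.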
Version History

Version  Date
4*       2023-05-10
3        2020-07-09
2        2018-04-30
1        2017-01-20
* Selected version
This item is publicly available and licensed under:
Files in this item

Name       Size       Format                    MD5
HI.tgz     469.3 KB   application/x-gzip        1d8dbf79b80326f797d517f3f993d04d
ES.tgz     2.09 MB    application/x-gzip        588b050f3cd655d1dd6df000b0d702da
EL.tgz     10.6 MB    application/x-gzip        125789048de3a0ee764cc3d9f34bc854
AR.tgz     10.78 MB   application/x-gzip        73fe213c348928f5eb49a635a6f02a01
GA.tgz     494.12 KB  application/x-gzip        cb2b193f7ce5bd60a77ba55efbd8232f
EU.tgz     2.02 MB    application/x-gzip        5b9d3da6fcdce7e800b1c1ea07eb6ef1
FA.tgz     703.09 KB  application/x-gzip        d0459becd9d685241b241384ec79ad57
DE.tgz     2.25 MB    application/x-gzip        eaee4a615ce4abd74aab58ea72d5c12e
BG.tgz     6.48 MB    application/x-gzip        7ccee1056d5621a9b509cf727a678525
EN.tgz     1.59 MB    application/x-gzip        b8c356eefeb174e0984f6c7b1188dba9
CS.tgz     12.86 MB   application/x-gzip        9fe9764dc970e2c646049533a81ccda6
HU.tgz     1.88 MB    application/x-gzip        a1153a044795ee7a9151e0ad2f9e25c1
FR.tgz     6.12 MB    application/x-gzip        755009c7e5ba96e74cedc14ec802eb2b
HE.tgz     5.26 MB    application/x-gzip        f2e883e1a108a3888fb2628d769b9c3c
HR.tgz     1.98 MB    application/x-gzip        951cd6b5948ee8e1aa6a9a4a8bf41336
IT.tgz     4.67 MB    application/x-gzip        565fb5c73667b4ac55e8aacf20680501
LT.tgz     2.98 MB    application/x-gzip        8f94517eebae1216e80ea6effc97a91a
MT.tgz     2.78 MB    application/x-gzip        12ee7b2105eeac324386c859a7ef7816
PL.tgz     6.99 MB    application/x-gzip        bdae0922e513f36c000b47360980ffc9
PT.tgz     7.59 MB    application/x-gzip        2c96f436546787f976e20a2022abf516
RO.tgz     12.33 MB   application/x-gzip        7efcbd0b9902d925c11f014b6ccd3c18
TR.tgz     4.55 MB    application/x-gzip        1c36bfd64fba1d93f9deca35e3272ed1
SV.tgz     1.44 MB    application/x-gzip        5c71eb09a2bb773b21141a13e8e40a88
ZH.tgz     9.61 MB    application/x-gzip        362b4150e0fda49a0915130bc85a6712
SR.tgz     1.11 MB    application/x-gzip        0ad8cad8ca462ea837445d2166bc722a
SL.tgz     8.35 MB    application/x-gzip        6933ab467e6bef5e52d0656075e42618
README.md  7.08 KB    application/octet-stream  5902de46b35f82c79183b20d67ab13de
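The MD5 values listed for each file can be used to check a download before unpacking it. The following Python sketch uses only the standard hashlib and tarfile modules. It computes an archive's MD5 in chunks, compares the result with the listed value, and then lists the .cupt files inside. The EXPECTED_MD5 dictionary holds only two illustrative entries copied from the listing. The internal layout of the archives is not described on this page, so the sketch simply filters member names by the .cupt extension. The helper names md5sum and verify_and_list are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path

# Illustrative subset of the MD5 checksums from the file listing.
EXPECTED_MD5 = {
    "EN.tgz": "b8c356eefeb174e0984f6c7b1188dba9",
    "FR.tgz": "755009c7e5ba96e74cedc14ec802eb2b",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_list(path):
    """Check an archive against EXPECTED_MD5 (if listed) and return its .cupt members."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    actual = md5sum(path)
    if expected is not None and actual != expected:
        raise ValueError(f"{path.name}: MD5 mismatch ({actual} != {expected})")
    with tarfile.open(path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.name.endswith(".cupt")]
```

For example, verify_and_list("EN.tgz") would raise ValueError if a local copy of EN.tgz does not match the listed checksum. Otherwise it returns the names of the .cupt files in the archive. Files missing from EXPECTED_MD5 are listed without being checked.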