Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

Please use the following text to cite this item:
Guillaume, Bruno; et al., 2020, Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-3416.
Date issued
2020-07-09
Size
450 GB
Description
This multilingual resource contains corpora for 14 languages, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora, to help discover unseen verbal MWEs (VMWEs). The corpora are provided in CoNLL-U format (https://universaldependencies.org/format.html). They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
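As an illustration of the CoNLL-U layout mentioned in the description, here is a minimal Python sketch of a reader. It assumes only what the CoNLL-U specification states: ten tab-separated columns per token, comment lines starting with `#`, and a blank line between sentences. The function name `read_conllu` and the two-token sample are hypothetical and are not taken from the corpora or from the shared task tools.

```python
from typing import Dict, Iterable, Iterator, List

# Column names as defined by the CoNLL-U specification.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata; skipped in this sketch.
            continue
        columns = line.split("\t")
        # zip() stops at the shorter input, so extra columns are ignored.
        sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        # Last sentence when the input does not end with a blank line.
        yield sentence


# Hypothetical sample, written by hand for illustration only.
SAMPLE = (
    "# text = Give up\n"
    "1\tGive\tgive\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tup\tup\tADP\t_\t_\t1\tcompound:prt\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
lemmas = [token["lemma"] for token in sentences[0]]
print(lemmas)
```

Because the reader is a generator, it can be fed an open file handle and will hold only one sentence in memory at a time, which matters for archives of this size.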
Publisher
This item is Publicly Available
and licensed under:
 Files in this item
| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| DE.tgz | 1.34 GB | application/x-gzip | gzip Archive | 3aed5aee260875f8903cc0a1543d890a |
| EL.tgz | 187.26 MB | application/x-gzip | gzip Archive | c81f1052d4ff0f48a36deca267ccaf1f |
| GA.tgz | 199.6 MB | application/x-gzip | gzip Archive | a06d893fdb1c69d7286a95cb601b3a2e |
| FR-0.tgz | 1.43 GB | application/x-gzip | gzip Archive | c40111581b2f4d45c8d284087609254a |
| HI.tgz | 542.97 MB | application/x-gzip | gzip Archive | 547b0eaa6eb0fce60af47473c72276a1 |
| EU.tgz | 141.77 MB | application/x-gzip | gzip Archive | a7617a30028a4189558c018612368c77 |
| HE.tgz | 88.2 MB | application/x-gzip | gzip Archive | 0a11b23381d024a3ebdca84e18245a68 |
| SV-01.tgz | 1.01 GB | application/x-gzip | gzip Archive | 9f71294e5bbe3503964b9dd77e5851f3 |
| FR-2.tgz | 1.39 GB | application/x-gzip | gzip Archive | ed9ae99da315a5c4e2adcf226d3d0f3e |
| FR-1.tgz | 1.43 GB | application/x-gzip | gzip Archive | 984b76faccffef4cdd50d357ee033206 |
| PL-12.tgz | 1.09 GB | application/x-gzip | gzip Archive | e6713b3961cf9bfb1322461d7b5ad53d |
| PL-10.tgz | 1.12 GB | application/x-gzip | gzip Archive | bbf111bf808e84e46cb04b28b2b537f9 |
| SV-11.tgz | 1.05 GB | application/x-gzip | gzip Archive | 247d36e4389774ae59af8aeac055091f |
| SV-18.tgz | 800.99 MB | application/x-gzip | gzip Archive | 89243ae8435558ccac8b5ddfb74677f2 |
| SV-17.tgz | 1 GB | application/x-gzip | gzip Archive | a0c58d61c9670c8c5f954d469aa74a62 |
| SV-16.tgz | 1.04 GB | application/x-gzip | gzip Archive | a3ce9f178594be70a30835d4984f3f63 |
| IT.tgz | 1.59 GB | application/x-gzip | gzip Archive | ce3e3baf237c011731bcbf42c16f161b |
| TR.tgz | 162.02 MB | application/x-gzip | gzip Archive | 6323664589945450aaf1853b73ca99cf |
| ZH.tgz | 399.11 MB | application/x-gzip | gzip Archive | d7f39b560a8f037d76a8e421657e235a |
| PL-07.tgz | 1.13 GB | application/x-gzip | gzip Archive | eb952fd1b29b1097ae0166ac43176df0 |
| PL-00.tgz | 1.03 GB | application/x-gzip | gzip Archive | 7751e321a902c6435b32689538ea4df3 |
| SV-03.tgz | 970.8 MB | application/x-gzip | gzip Archive | 6da6df394d30d975e44daae2a8bfdc2f |
| PL-01.tgz | 1.15 GB | application/x-gzip | gzip Archive | 30bfd4572fa0558a40de07d54a48bf00 |
| SV-13.tgz | 1017.32 MB | application/x-gzip | gzip Archive | 279fe30a8afda651c73d5744ceea6ee2 |
| PL-02.tgz | 1.15 GB | application/x-gzip | gzip Archive | f789d03f71464da5f3b653d40fbef351 |
| SV-06.tgz | 999.95 MB | application/x-gzip | gzip Archive | 8d3994e43c6442e96d5a19951017cef0 |
| PL-05.tgz | 1.14 GB | application/x-gzip | gzip Archive | 7650964d51f43de10ebcb11ee636db06 |
| PT.tgz | 1.73 GB | application/x-gzip | gzip Archive | dcc41bbd107be5902b8795c267dace5d |
| SV-14.tgz | 946.75 MB | application/x-gzip | gzip Archive | d42b12cc53288f077c69ae821fffa16d |
| SV-15.tgz | 972 MB | application/x-gzip | gzip Archive | 7696cc8931abb44cadbefe6ad402a0c8 |
| PL-14.tgz | 107.29 MB | application/x-gzip | gzip Archive | 676dc9cb0109ab5160d5d1d7ab15c19f |
| PL-04.tgz | 1.14 GB | application/x-gzip | gzip Archive | bfbde7fb2b55bbc117f0298a0a9a77fb |
| SV-08.tgz | 931.66 MB | application/x-gzip | gzip Archive | 5694bd3086b7bf67ca0ea69ecb32f491 |
| PL-03.tgz | 1.15 GB | application/x-gzip | gzip Archive | fe721405c64c7437a9ad45175984b985 |
| SV-02.tgz | 1008.39 MB | application/x-gzip | gzip Archive | 06584c4521f93c4924747fa28d24bac5 |
| PL-09.tgz | 1016.43 MB | application/x-gzip | gzip Archive | e7d5e40e9daf8301a83f0ba0f57f0e46 |
| SV-04.tgz | 849.36 MB | application/x-gzip | gzip Archive | f76653d2632ef2a4d458c6fbd8e16d10 |
| PL-06.tgz | 1.14 GB | application/x-gzip | gzip Archive | 81de90fef4982b69adb5dac7a392f23c |
| SV-10.tgz | 987.17 MB | application/x-gzip | gzip Archive | 1c974dd27afd4026ce26ba95ff520171 |
| SV-09.tgz | 886.31 MB | application/x-gzip | gzip Archive | bc0450f9b8ab9fe0db06def2750e55c5 |
| SV-12.tgz | 1.02 GB | application/x-gzip | gzip Archive | 11de25c2d239438832522e9c2c3d86e3 |
| PL-08.tgz | 1.12 GB | application/x-gzip | gzip Archive | 59ec3bbd04f68321fefc1ee23b566b8b |
| PL-11.tgz | 1.13 GB | application/x-gzip | gzip Archive | 63c285f88dfc9e8c180033929067cbbe |
| PL-13.tgz | 1.1 GB | application/x-gzip | gzip Archive | f0ebe5779d28738918e1fcb9a29fc96e |
| RO.tgz | 88.05 MB | application/x-gzip | gzip Archive | 55a0cddc185b3c2f7faa9b7c12d0bf85 |
| SV-00.tgz | 953.36 MB | application/x-gzip | gzip Archive | ad5542e3129988bfb6e4af0db1068b35 |
| SV-07.tgz | 966.7 MB | application/x-gzip | gzip Archive | 11f477f6b558f39c7c20031d2783892c |
| SV-05.tgz | 1019.38 MB | application/x-gzip | gzip Archive | 04e7917c7cf5dc620467171b10639a73 |
| README_raw.md | 10.97 KB | application/octet-stream | Unknown | f93e74d775c864bd27446d9244cf19ec |
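
The MD5 value listed for each file can be used to check that a download completed intact. The following is a minimal Python sketch using the standard `hashlib` module. The helper names `md5_of_file` and `matches_md5` are hypothetical, and the commented-out call at the end assumes the archive was saved under its listed name in the current directory.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return md5_of_file(path) == expected.strip().lower()


# Self-check on a small temporary file, so the sketch runs without a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_ok = matches_md5(demo_path, hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print(demo_ok)

# For a real archive (path is an assumption about where the file was saved):
# matches_md5("EL.tgz", "c81f1052d4ff0f48a36deca267ccaf1f")
```

Reading in chunks keeps memory use constant, which is relevant here because many of the archives are larger than 1 GB.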