Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora

Please use the following text to cite this item:
Estève, Louis Clément; Savary, Agata and Lavergne, Thomas, 2024, Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5528.
Date issued
2024-06-07
Size
44412316 entries, 17267 multiWordUnits
Description
This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs), one per language (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese). They were trained with the skip-gram variant of the Word2Vec algorithm on the PARSEME raw corpora, which are automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367). VMWEs in these corpora were identified by Seen2Seen, a rule-based VMWE identifier and one of the leading tools of the PARSEME shared task version 1.2, and the tokens of each VMWE were merged into a single token. The vector space files use the binary format of the original Word2Vec implementation by Mikolov et al. (2013). The files are distributed as xz-compressed archives.
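As a rough illustration of the binary layout mentioned in the description, the following Python sketch reads a word2vec-style binary stream: a text header giving vocabulary size and dimensionality, then each token followed by a space and its float32 components. It is not the `load_vectors.py` script shipped with this item (its contents are not shown on this page). The function names `read_word2vec_binary` and `load_xz_vectors` are hypothetical, little-endian byte order is an assumption, and the use of `lzma` assumes the `.xz` archives listed among the files.

```python
import lzma
import struct


def read_word2vec_binary(stream, limit=None):
    """Read vectors from a binary stream in the word2vec binary layout.

    Assumed layout: b"<vocab_size> <dim>\n", then for each entry the
    token bytes, a single space, and <dim> little-endian float32 values.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    count = vocab_size if limit is None else min(limit, vocab_size)

    vectors = {}
    for _ in range(count):
        token = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("stream ended inside a token")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline that may follow the previous vector
                token.extend(ch)
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("stream ended inside a vector")
        word = token.decode("utf-8", errors="replace")
        vectors[word] = struct.unpack(f"<{dim}f", raw)
    return vectors


def load_xz_vectors(path, limit=None):
    """Decompress an .xz archive on the fly and read up to `limit` vectors."""
    with lzma.open(path, "rb") as stream:
        return read_word2vec_binary(stream, limit)
```

Passing a small `limit` keeps memory use low when only a quick look at a multi-gigabyte file is needed. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=True)` on a decompressed file is likely a more convenient route.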
This item is Publicly Available
and licensed under:
 Files in this item
Name
INSTALL.md
Size
1.21 KB
Format
application/octet-stream
Description
Unknown
MD5
23fbf46cd30ccdae44893d3906946e9f
Name
MWE_S2S_DE_typed_100d_skip-gram.bin.xz
Size
822.69 MB
Format
application/x-xz
Description
xz Archive
MD5
c5968c97c89a332d08d4ec100cc794b1
Name
MWE_S2S_PL_typed_100d_skip-gram.bin.xz
Size
3.89 GB
Format
application/x-xz
Description
xz Archive
MD5
e2c4349c12f2da5fe4c7a918f67c0ec9
Name
MWE_S2S_PT_typed_100d_skip-gram.bin.xz
Size
1.51 GB
Format
application/x-xz
Description
xz Archive
MD5
a40e2effa510c076c46e0eebe8c31bef
Name
MWE_S2S_RO_typed_100d_skip-gram.bin.xz
Size
100.33 MB
Format
application/x-xz
Description
xz Archive
MD5
335a84cd858ad1854696cbb7cc41dd90
Name
MWE_S2S_SV_typed_100d_skip-gram.bin.xz
Size
4.62 GB
Format
application/x-xz
Description
xz Archive
MD5
1c80c77c6eb64d9842fb3cae87d8dce8
Name
MWE_S2S_TR_typed_100d_skip-gram.bin.xz
Size
237.35 MB
Format
application/x-xz
Description
xz Archive
MD5
4f193bcbc0c4d8dee5cbac6b753afe93
Name
MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
Size
679.84 MB
Format
application/x-xz
Description
xz Archive
MD5
c60eee0a4faf6ed54705e6d080335881
Name
load_vectors.py
Size
1.29 KB
Format
application/octet-stream
Description
Unknown
MD5
83c3839d234ba066126f9fb77bd22c71
Name
load_vectors.sh
Size
153 B
Format
application/octet-stream
Description
Unknown
MD5
8219c98d3d1999d86660a9d8459395ca
Name
md5_checksums.txt
Size
1022 B
Format
text/plain
Description
Text
MD5
0a2210bab3bd4160578317b8f9bd443a
Preview
  File Preview
    c5968c97c89a332d08d4ec100cc794b1 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
    1408874115eb749721791284e4c0ee1e *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
    0d2536382ddda7a92c68d7e6ffde23d4 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
    e4ffbd8874d2ca4593036884c4f7fb0b *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
    12804a1e814d9fd4f03608f821c73f08 *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
    c382b1f98e4d18c1652ef65d14f0a06b *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
    b4fb6af09acf2fee1fa4c426f9b60b53 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
    07a883f11b4442f2b0611e1fa29101f4 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
    e2c4349c12f2da5fe4c7a918f67c0ec9 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
    a40e2effa510c076c46e0eebe8c31bef *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
    335a84cd858ad1854696cbb7cc41dd90 *MWE_S2S_RO_typed_100d_skip-gram.bin.xz
    1c80c77c6eb64d9842fb3cae87d8dce8 *MWE_S2S_SV_typed_100d_skip-gram.bin.xz
    4f193bcbc0c4d8dee5cbac6b753afe93 *MWE_S2S_TR_typed_100d_skip-gram.bin.xz
    c60eee0a4faf6ed54705e6d080335881 *MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
    
Name
sha3_checksums.txt
Size
1.33 KB
Format
text/plain
Description
Text
MD5
62c052404a4361b91df2464f21971885
Preview
  File Preview
    e12d5f4d7539b161098d922ffb8935e9e9d350aec9a0f8aea110aac5 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
    59a224848baee4c956565374935ccb8c53faa1151650ceb9af14f999 *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
    6a4f6b423d597db6b6b06942d452eac8aeffa6ef0d6e97d7e88d6c65 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
    297d6cc656d909d62b063af92aada8b222b694e50e54a3e9ac984736 *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
    107bff85292d176ef0ac975e9f0625fd8376cba223e7b2f33bb03e7b *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
    d4951bd3322a635ab0971be28b91b630109ad9fd4878d8d8700b3984 *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
    6be3a825011a0213f33d5bce32e58e6aae26232d4a0dcfd4c266d477 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
    dd436cd35d95395417eff750690b957aee6f906590b3ea238110cf96 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
    c979d05a0d5c2658fc112d7555191b8f0601903a4eed47e83c561cd6 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
    a64110568c0987dd152c80d2833e592523580da4991d763129dfb4f8 *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
    903c1f3e3d792625a064310623d3177187b0d41ae3bef930c9f5e07e *MWE_S2S_RO_typed_100d_skip-gram.bin.xz
    08d7ad6447b11e528be5ae59183dbb49980d67da0788018fd3e07900 *MWE_S2S_SV_typed_100d_skip-gram.bin.xz
    a937371df4890c8711e3e90f4ba85fb4c80f5525ad1baa55554320ee *MWE_S2S_TR_typed_100d_skip-gram.bin.xz
    cafa538eaa5ae12c9b3e11dbe43c7676730e92b9c2652053ba93575a *MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
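The two checksum listings above can be checked against downloaded files with a few lines of Python. This is a minimal sketch, not the `verify_checksums.sh` script shipped with this item (its contents are not shown on this page). It assumes each line has the form `DIGEST *NAME`, as in the previews, and that the 56-character digests in `sha3_checksums.txt` are SHA3-224 (56 hex digits correspond to 224 bits); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_checksum_file(text):
    """Map file name -> expected hex digest from 'DIGEST *NAME' lines."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        # A leading '*' is the binary-mode marker used by md5sum-style tools.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte archives fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(directory, checksum_text, algorithm):
    """Return {name: True, False, or None}; None means the file is absent."""
    results = {}
    for name, digest in parse_checksum_file(checksum_text).items():
        path = Path(directory) / name
        if path.exists():
            results[name] = file_digest(path, algorithm) == digest
        else:
            results[name] = None
    return results
```

A call such as `verify(".", Path("md5_checksums.txt").read_text(), "md5")` would cover the MD5 listing; under the assumption stated above, the SHA3 listing would use `"sha3_224"` as the algorithm name.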
    
Name
MWE_S2S_EL_typed_100d_skip-gram.bin.xz
Size
472.67 MB
Format
application/x-xz
Description
xz Archive
MD5
1408874115eb749721791284e4c0ee1e
Name
MWE_S2S_EU_typed_100d_skip-gram.bin.xz
Size
145.61 MB
Format
application/x-xz
Description
xz Archive
MD5
0d2536382ddda7a92c68d7e6ffde23d4
Name
MWE_S2S_FR_typed_100d_skip-gram.bin.xz
Size
1.96 GB
Format
application/x-xz
Description
xz Archive
MD5
e4ffbd8874d2ca4593036884c4f7fb0b
Name
MWE_S2S_GA_typed_100d_skip-gram.bin.xz
Size
197.87 MB
Format
application/x-xz
Description
xz Archive
MD5
12804a1e814d9fd4f03608f821c73f08
Name
MWE_S2S_HE_typed_100d_skip-gram.bin.xz
Size
117.52 MB
Format
application/x-xz
Description
xz Archive
MD5
c382b1f98e4d18c1652ef65d14f0a06b
Name
MWE_S2S_HI_typed_100d_skip-gram.bin.xz
Size
319.86 MB
Format
application/x-xz
Description
xz Archive
MD5
b4fb6af09acf2fee1fa4c426f9b60b53
Name
unzip.sh
Size
149 B
Format
application/octet-stream
Description
Unknown
MD5
85764c8a353ad3462b948e407cbdff5a
Name
verify_checksums.sh
Size
351 B
Format
application/octet-stream
Description
Unknown
MD5
570a61090da4e672e2d7309e0ab7086a
Name
MWE_S2S_IT_typed_100d_skip-gram.bin.xz
Size
611.89 MB
Format
application/x-xz
Description
xz Archive
MD5
07a883f11b4442f2b0611e1fa29101f4
Name
zip.sh
Size
158 B
Format
application/octet-stream
Description
Unknown
MD5
44d9b3b265827ace9f9b49c724b64083