Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora
Please use the following text to cite this item:
Estève, Louis Clément; Savary, Agata and Lavergne, Thomas, 2024,
Multilingual static embeddings for Verbal Multiword Expressions trained on PARSEME raw corpora, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5528.
Authors
Item identifier
Project URL
Date issued
2024-06-07
Size
44412316 entries,
17267 multiWordUnits
Description
This resource is a set of 14 vector spaces for single words and Verbal Multiword Expressions (VMWEs) in different languages (German, Greek, Basque, French, Irish, Hebrew, Hindi, Italian, Polish, Brazilian Portuguese, Romanian, Swedish, Turkish, Chinese).
They were trained with the Word2Vec algorithm, in its skip-gram version, on PARSEME raw corpora automatically annotated for morpho-syntax (http://hdl.handle.net/11234/1-3367).
These corpora were annotated by Seen2Seen, a rule-based VMWE identifier, one of the leading tools of the PARSEME shared task version 1.2.
The component tokens of each annotated VMWE were merged into a single token.
The format of the vector space files is that of the original Word2Vec implementation by Mikolov et al. (2013), i.e. a binary format.
The files are compressed with xz (file extension .xz).
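The format described above can be read with the Python standard library alone. The following is a minimal sketch, not the archive's own load_vectors.py (whose content is not shown here). It assumes the usual layout written by the original Word2Vec tool: a text header line with the vocabulary size and the dimension, then for each entry the token, a space, and the vector as 32-bit floats (assumed little-endian), optionally followed by a newline. The token `take_place` is a made-up example of a merged VMWE; the actual naming scheme of merged tokens is not specified here.

```python
import io
import lzma
import struct


def read_word2vec_binary(stream):
    """Yield (token, vector) pairs from a binary Word2Vec stream."""
    # Header line: "<vocabulary size> <dimension>\n"
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vector_bytes = 4 * dim  # each component is a 32-bit float

    for _ in range(vocab_size):
        token = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("unexpected end of data while reading a token")
            if ch == b" ":
                break  # a space terminates the token
            if ch != b"\n":  # skip the newline that may trail the previous vector
                token.extend(ch)
        data = stream.read(vector_bytes)
        if len(data) != vector_bytes:
            raise EOFError("unexpected end of data while reading a vector")
        # Assumption: little-endian floats, as written on common hardware
        yield token.decode("utf-8"), struct.unpack(f"<{dim}f", data)


def demo():
    """Round-trip a tiny two-token model through xz compression in memory."""
    payload = b"2 3\n"
    payload += b"walk " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n"
    # "take_place" is a hypothetical merged VMWE token, for illustration only
    payload += b"take_place " + struct.pack("<3f", 1.0, 0.0, -0.25) + b"\n"
    compressed = lzma.compress(payload)  # default container is .xz
    with lzma.open(io.BytesIO(compressed), "rb") as stream:
        return dict(read_word2vec_binary(stream))


if __name__ == "__main__":
    for token, vector in demo().items():
        print(token, vector)
```

For a downloaded file, the same reader could be applied to `lzma.open("MWE_S2S_FR_typed_100d_skip-gram.bin.xz", "rb")`, which avoids writing a decompressed copy to disk. Reading byte by byte is slow for files of this size; libraries such as gensim (`KeyedVectors.load_word2vec_format(path, binary=True)` on the decompressed file) are likely a better fit for real use.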
Acknowledgement
Université Paris Saclay
Project code: Plan blanc
Project name: PhD grant
Subject(s)
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: INSTALL.md
- Size: 1.21 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: 23fbf46cd30ccdae44893d3906946e9f

- Name: MWE_S2S_DE_typed_100d_skip-gram.bin.xz
- Size: 822.69 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: c5968c97c89a332d08d4ec100cc794b1

- Name: MWE_S2S_PL_typed_100d_skip-gram.bin.xz
- Size: 3.89 GB
- Format: application/x-xz
- Description: xz Archive
- MD5: e2c4349c12f2da5fe4c7a918f67c0ec9

- Name: MWE_S2S_PT_typed_100d_skip-gram.bin.xz
- Size: 1.51 GB
- Format: application/x-xz
- Description: xz Archive
- MD5: a40e2effa510c076c46e0eebe8c31bef

- Name: MWE_S2S_RO_typed_100d_skip-gram.bin.xz
- Size: 100.33 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 335a84cd858ad1854696cbb7cc41dd90

- Name: MWE_S2S_SV_typed_100d_skip-gram.bin.xz
- Size: 4.62 GB
- Format: application/x-xz
- Description: xz Archive
- MD5: 1c80c77c6eb64d9842fb3cae87d8dce8

- Name: MWE_S2S_TR_typed_100d_skip-gram.bin.xz
- Size: 237.35 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 4f193bcbc0c4d8dee5cbac6b753afe93

- Name: MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
- Size: 679.84 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: c60eee0a4faf6ed54705e6d080335881

- Name: load_vectors.py
- Size: 1.29 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: 83c3839d234ba066126f9fb77bd22c71

- Name: load_vectors.sh
- Size: 153 B
- Format: application/octet-stream
- Description: Unknown
- MD5: 8219c98d3d1999d86660a9d8459395ca

- Name: md5_checksums.txt
- Size: 1022 B
- Format: text/plain
- Description: Text
- MD5: 0a2210bab3bd4160578317b8f9bd443a

c5968c97c89a332d08d4ec100cc794b1 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
1408874115eb749721791284e4c0ee1e *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
0d2536382ddda7a92c68d7e6ffde23d4 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
e4ffbd8874d2ca4593036884c4f7fb0b *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
12804a1e814d9fd4f03608f821c73f08 *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
c382b1f98e4d18c1652ef65d14f0a06b *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
b4fb6af09acf2fee1fa4c426f9b60b53 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
07a883f11b4442f2b0611e1fa29101f4 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
e2c4349c12f2da5fe4c7a918f67c0ec9 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
a40e2effa510c076c46e0eebe8c31bef *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
335a84cd858ad1854696cbb7cc41dd90 *MWE_S2S_RO_typed_100d_skip-gram.bin.xz
1c80c77c6eb64d9842fb3cae87d8dce8 *MWE_S2S_SV_typed_100d_skip-gram.bin.xz
4f193bcbc0c4d8dee5cbac6b753afe93 *MWE_S2S_TR_typed_100d_skip-gram.bin.xz
c60eee0a4faf6ed54705e6d080335881 *MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
- Name: sha3_checksums.txt
- Size: 1.33 KB
- Format: text/plain
- Description: Text
- MD5: 62c052404a4361b91df2464f21971885

e12d5f4d7539b161098d922ffb8935e9e9d350aec9a0f8aea110aac5 *MWE_S2S_DE_typed_100d_skip-gram.bin.xz
59a224848baee4c956565374935ccb8c53faa1151650ceb9af14f999 *MWE_S2S_EL_typed_100d_skip-gram.bin.xz
6a4f6b423d597db6b6b06942d452eac8aeffa6ef0d6e97d7e88d6c65 *MWE_S2S_EU_typed_100d_skip-gram.bin.xz
297d6cc656d909d62b063af92aada8b222b694e50e54a3e9ac984736 *MWE_S2S_FR_typed_100d_skip-gram.bin.xz
107bff85292d176ef0ac975e9f0625fd8376cba223e7b2f33bb03e7b *MWE_S2S_GA_typed_100d_skip-gram.bin.xz
d4951bd3322a635ab0971be28b91b630109ad9fd4878d8d8700b3984 *MWE_S2S_HE_typed_100d_skip-gram.bin.xz
6be3a825011a0213f33d5bce32e58e6aae26232d4a0dcfd4c266d477 *MWE_S2S_HI_typed_100d_skip-gram.bin.xz
dd436cd35d95395417eff750690b957aee6f906590b3ea238110cf96 *MWE_S2S_IT_typed_100d_skip-gram.bin.xz
c979d05a0d5c2658fc112d7555191b8f0601903a4eed47e83c561cd6 *MWE_S2S_PL_typed_100d_skip-gram.bin.xz
a64110568c0987dd152c80d2833e592523580da4991d763129dfb4f8 *MWE_S2S_PT_typed_100d_skip-gram.bin.xz
903c1f3e3d792625a064310623d3177187b0d41ae3bef930c9f5e07e *MWE_S2S_RO_typed_100d_skip-gram.bin.xz
08d7ad6447b11e528be5ae59183dbb49980d67da0788018fd3e07900 *MWE_S2S_SV_typed_100d_skip-gram.bin.xz
a937371df4890c8711e3e90f4ba85fb4c80f5525ad1baa55554320ee *MWE_S2S_TR_typed_100d_skip-gram.bin.xz
cafa538eaa5ae12c9b3e11dbe43c7676730e92b9c2652053ba93575a *MWE_S2S_ZH_typed_100d_skip-gram.bin.xz
- Name: MWE_S2S_EL_typed_100d_skip-gram.bin.xz
- Size: 472.67 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 1408874115eb749721791284e4c0ee1e

- Name: MWE_S2S_EU_typed_100d_skip-gram.bin.xz
- Size: 145.61 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 0d2536382ddda7a92c68d7e6ffde23d4

- Name: MWE_S2S_FR_typed_100d_skip-gram.bin.xz
- Size: 1.96 GB
- Format: application/x-xz
- Description: xz Archive
- MD5: e4ffbd8874d2ca4593036884c4f7fb0b

- Name: MWE_S2S_GA_typed_100d_skip-gram.bin.xz
- Size: 197.87 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 12804a1e814d9fd4f03608f821c73f08

- Name: MWE_S2S_HE_typed_100d_skip-gram.bin.xz
- Size: 117.52 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: c382b1f98e4d18c1652ef65d14f0a06b

- Name: MWE_S2S_HI_typed_100d_skip-gram.bin.xz
- Size: 319.86 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: b4fb6af09acf2fee1fa4c426f9b60b53

- Name: unzip.sh
- Size: 149 B
- Format: application/octet-stream
- Description: Unknown
- MD5: 85764c8a353ad3462b948e407cbdff5a

- Name: verify_checksums.sh
- Size: 351 B
- Format: application/octet-stream
- Description: Unknown
- MD5: 570a61090da4e672e2d7309e0ab7086a

- Name: MWE_S2S_IT_typed_100d_skip-gram.bin.xz
- Size: 611.89 MB
- Format: application/x-xz
- Description: xz Archive
- MD5: 07a883f11b4442f2b0611e1fa29101f4

- Name: zip.sh
- Size: 158 B
- Format: application/octet-stream
- Description: Unknown
- MD5: 44d9b3b265827ace9f9b49c724b64083
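
The checksum listings shown for md5_checksums.txt and sha3_checksums.txt pair each digest with a file name in the form `<digest> *<name>`. The sketch below shows one way to check downloaded files against such a listing with Python's `hashlib`; it is not the archive's verify_checksums.sh, whose content is not shown here. The SHA-3 digests are 56 hexadecimal digits long, which matches SHA3-224, so `"sha3_224"` is used as an assumption. The function names and the demo file names are illustrative only.

```python
import hashlib
import os
import tempfile


def parse_checksum_lines(text):
    """Map file name -> hex digest for lines shaped like '<digest> *<name>'."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()
    return expected


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte archives need little memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(directory, checksum_text, algorithm):
    """Return {name: True | False | None}; None means the file is absent."""
    results = {}
    for name, digest in parse_checksum_lines(checksum_text).items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            results[name] = None
        else:
            results[name] = file_digest(path, algorithm) == digest
    return results


def demo():
    """Check one matching, one corrupted and one missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "good.bin"), "wb") as handle:
            handle.write(b"abc")
        with open(os.path.join(tmp, "bad.bin"), "wb") as handle:
            handle.write(b"abd")  # differs from the content the digest was made for
        digest = hashlib.sha3_224(b"abc").hexdigest()
        listing = "\n".join(
            f"{digest} *{name}" for name in ("good.bin", "bad.bin", "missing.bin")
        )
        return verify(tmp, listing, "sha3_224")


if __name__ == "__main__":
    print(demo())
```

With the real files, `verify(".", open("md5_checksums.txt").read(), "md5")` would run the same check against the MD5 listing.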


