
Word embeddings based on a large corpus of written Czech

Please use the following text to cite this item:
Jelínek, Tomáš, 2025, Word embeddings based on a large corpus of written Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6017.
Date issued
2025-10-24
Size
6 files
Language(s)
Czech
Description
This package comprises six models of Czech word embeddings in two sets, one for lemmas and one for word forms, each set containing models with vector dimensions 100, 200 and 300. They were trained with fastText (P. Bojanowski, E. Grave, A. Joulin, T. Mikolov (2016): Enriching Word Vectors with Subword Information, https://fasttext.cc/) on SYN v13, a corpus of contemporary written Czech (Křen et al. 2024, https://wiki.korpus.cz/doku.php/en:cnk:syn:verze13), based on the corpus's lemmatisation and tagging. Training used the skipgram algorithm, with character n-grams of length 2 to 5 (-minn 2, -maxn 5) as subwords.
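The -minn and -maxn options set the length range of the character n-grams that fastText uses as subwords. As an illustration, the sketch below lists the n-grams extracted for a token with -minn 2 and -maxn 5: the token is wrapped in the boundary markers < and >, and every substring of 2 to 5 characters is taken. This is a simplified re-implementation for explanation only (char_ngrams is a hypothetical helper, not part of fastText); fastText itself additionally hashes the n-grams into a fixed number of buckets, which is omitted here.

```python
from typing import List


def char_ngrams(word: str, minn: int = 2, maxn: int = 5) -> List[str]:
    """List the character n-grams of `word` in the style of fastText subwords.

    The word is wrapped in the boundary symbols "<" and ">", and every
    substring whose length lies in [minn, maxn] is collected, ordered by
    start position and then by length. Python strings index Unicode code
    points, so a Czech letter such as "č" counts as one character, as it
    does in fastText.
    """
    wrapped = "<" + word + ">"
    grams: List[str] = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if start + n > len(wrapped):
                break  # longer n-grams would run past the end marker
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("pes"))
```

For the lemma pes this yields 10 n-grams, from <p to s>, including the whole wrapped token <pes>. Only .vec files are listed for this item, and fastText's .vec output holds whole-token vectors rather than the n-gram vectors, so vectors for out-of-vocabulary tokens cannot be rebuilt from these files alone.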
 Files in this item
Name
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim100.vec.gz
Size
455.26 MB
Format
application/x-gzip
MD5
77210b54852e14fbee2909c0496d71be
Contents (uncompressed)
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim100.vec (1 GB)
Name
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim200.vec.gz
Size
888.9 MB
Format
application/x-gzip
MD5
c481bbabd34df68f620044154f6448ec
Contents (uncompressed)
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim200.vec (2 GB)
Name
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim300.vec.gz
Size
1.29 GB
Format
application/x-gzip
MD5
11dfabdcc90cdf5a214db81d99d38105
Contents (uncompressed)
SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim300.vec (3 GB)
Name
SYNv13_words_fastText_skipgram_minn2_maxn5_dim100.vec.gz
Size
1.02 GB
Format
application/x-gzip
MD5
78cba99bf772b24134df4952f5c44ad6
Contents (uncompressed)
SYNv13_words_fastText_skipgram_minn2_maxn5_dim100.vec (2 GB)
Name
SYNv13_words_fastText_skipgram_minn2_maxn5_dim200.vec.gz
Size
1.98 GB
Format
application/x-gzip
MD5
f584b415447a4e5c71e1ad7f19dce288
Contents (uncompressed)
SYNv13_words_fastText_skipgram_minn2_maxn5_dim200.vec (5 GB)
Name
SYNv13_words_fastText_skipgram_minn2_maxn5_dim300.vec.gz
Size
2.94 GB
Format
application/x-gzip
MD5
9486a36ed9844306536c736e50a2183a
Contents (uncompressed)
SYNv13_words_fastText_skipgram_minn2_maxn5_dim300.vec (7 GB)
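A downloaded file can be checked against its MD5 value listed above and then read without first decompressing it to disk. The following is a minimal sketch using only the Python standard library. It assumes the files are UTF-8 text in fastText's .vec layout (a header line with the entry count and dimension, then one token per line followed by its values) and that they sit in the current directory; the helper names (md5_of_chunks, file_md5, parse_vec, load_vec_gz, cosine, most_similar) are hypothetical, not part of any distributed tool.

```python
import gzip
import hashlib
import math
import os
from typing import Dict, Iterable, List, Optional, Tuple

# MD5 checksums as listed for the six downloadable .vec.gz files.
EXPECTED_MD5 = {
    "SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim100.vec.gz": "77210b54852e14fbee2909c0496d71be",
    "SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim200.vec.gz": "c481bbabd34df68f620044154f6448ec",
    "SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim300.vec.gz": "11dfabdcc90cdf5a214db81d99d38105",
    "SYNv13_words_fastText_skipgram_minn2_maxn5_dim100.vec.gz": "78cba99bf772b24134df4952f5c44ad6",
    "SYNv13_words_fastText_skipgram_minn2_maxn5_dim200.vec.gz": "f584b415447a4e5c71e1ad7f19dce288",
    "SYNv13_words_fastText_skipgram_minn2_maxn5_dim300.vec.gz": "9486a36ed9844306536c736e50a2183a",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a sequence of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so multi-GB files need little memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def parse_vec(lines: Iterable[str], limit: Optional[int] = None) -> Tuple[int, Dict[str, List[float]]]:
    """Parse .vec text: a "<count> <dim>" header, then "token v1 ... vdim" lines."""
    it = iter(lines)
    _count, dim = (int(x) for x in next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break
        # fastText writes a trailing space after the last value; strip it and the newline.
        parts = line.rstrip("\r\n ").split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not carry exactly `dim` values
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return dim, vectors


def load_vec_gz(path: str, limit: Optional[int] = None) -> Tuple[int, Dict[str, List[float]]]:
    """Decompress a .vec.gz file on the fly and parse at most `limit` entries."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return parse_vec(f, limit)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: str, vectors: Dict[str, List[float]], topn: int = 5) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours of `query` by cosine similarity."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    # Hypothetical usage: runs only if this file was downloaded to the current directory.
    name = "SYNv13_lemmas_fastText_skipgram_minn2_maxn5_dim100.vec.gz"
    if os.path.exists(name):
        print("MD5 OK" if file_md5(name) == EXPECTED_MD5[name] else "MD5 MISMATCH")
        dim, vecs = load_vec_gz(name, limit=50_000)  # first 50,000 entries only
        print(dim, len(vecs))
        if "pes" in vecs:
            print(most_similar("pes", vecs, topn=10))
```

Because the word-form files expand to several GB, the `limit` argument keeps memory bounded; if the entries are frequency-sorted, as fastText normally writes them, the first entries are the most frequent tokens. For work with the full vocabularies, an array library would be far more practical than Python lists of floats.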