POS Tagging and Lemmatization (Czech model)

Please use the following text to cite this item:
Vysušilová, Petra and Straka, Milan, 2021, POS Tagging and Lemmatization (Czech model), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-4613.
Date issued
2021
Language(s)
Czech
Description
Model trained for Czech POS tagging and lemmatization using RobeCzech, a Czech version of the BERT model. The model is trained on data from the Prague Dependency Treebank 3.5. It is part of the master's thesis Czech NLP with Contextualized Embeddings and achieved state-of-the-art performance at the time the thesis was submitted. A demo Jupyter notebook is available on the project's GitHub.
 Files in this item
Name
forms.vectors-w5-d300-ns5.16b.npz
Size
850.3 MB
Format
application/octet-stream
Description
Pretrained embeddings needed for the model construction
MD5
1691478ca44620a734dff58c8bd6b7fd
Name
ch18.index
Size
16.79 KB
Format
application/octet-stream
Description
TensorFlow checkpoint data index
MD5
99a09ee9ba3531fdba323db57a4554c8
Name
mappings.pickle
Size
40.81 MB
Format
application/octet-stream
Description
Mappings
MD5
363f9a3b8d82610fcb99773c2eb5e856
Name
ch18.data-00000-of-00001
Size
847.49 MB
Format
application/octet-stream
Description
TensorFlow checkpoint data
MD5
273672b0bb2f180a6ad6e223f696d58d
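Since every file above is listed with an MD5 checksum, a download can be checked against those values before the model is built. The following is a minimal sketch, not part of the original item: the function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes all four files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests copied from the file list on this page.
EXPECTED_MD5 = {
    "forms.vectors-w5-d300-ns5.16b.npz": "1691478ca44620a734dff58c8bd6b7fd",
    "ch18.index": "99a09ee9ba3531fdba323db57a4554c8",
    "mappings.pickle": "363f9a3b8d82610fcb99773c2eb5e856",
    "ch18.data-00000-of-00001": "273672b0bb2f180a6ad6e223f696d58d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "." is a placeholder for the directory where the files were saved.
    for name, status in verify_downloads(".").items():
        print(f"{status:8s} {name}")
```

Reading in chunks matters here because two of the files are over 800 MB. A "mismatch" usually points to an interrupted or corrupted download, in which case the file should be fetched again.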