dc.contributor.author | Straka, Milan |
dc.contributor.author | Náplava, Jakub |
dc.contributor.author | Straková, Jana |
dc.contributor.author | Samuel, David |
dc.date.accessioned | 2021-05-25T08:15:54Z |
dc.date.available | 2021-05-25T08:15:54Z |
dc.date.issued | 2021-05-25 |
dc.identifier.uri | http://hdl.handle.net/11234/1-3691 |
dc.description | RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, for both PyTorch and TensorFlow (a brief loading sketch follows the metadata listing below). |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.subject | Czech |
dc.subject | BERT |
dc.subject | RoBERTa |
dc.title | RobeCzech Base |
dc.type | languageDescription |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#ContentInfo.detailedType | mlmodel |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Milan Straka straka@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | Grantová agentura České republiky GX20-16819X LUSyD – Language Understanding: from Syntax to Discourse nationalFunds |
sponsor | Grantová agentura Univerzity Karlovy v Praze GAUK 578218 Automatic language correction using neural networks nationalFunds |
sponsor | Mellon Foundation, Brandeis University G-1901-06505 Transatlantic Collaboration between LAPPS Grid and CLARIN: Implementation of NLP-enabled Tools using Text Archives as a Use Case Other |
size.info | 1035 MB |
files.size | 1085900989 |
files.count | 2 |
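As a usage illustration, not part of the official record, the following minimal Python sketch loads the released checkpoint from the Hugging Face Hub with the transformers library. The model identifier ufal/robeczech-base comes from the description above; the example sentence and the choice of the masked-language-modeling head are illustrative assumptions.

# Minimal sketch, assuming the `transformers` and `torch` packages are installed.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModelForMaskedLM.from_pretrained("ufal/robeczech-base")

# Tokenize a Czech sentence (illustrative) and run the model.
inputs = tokenizer("RobeCzech je monolingvální český jazykový model.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch size, sequence length, vocabulary size)

# The TensorFlow checkpoint can be loaded analogously via
# TFAutoModelForMaskedLM.from_pretrained("ufal/robeczech-base").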
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Name | robeczech-base-tf.zip
Size | 589.1 MB
Format | application/zip
Description | TensorFlow Hugging Face checkpoint of the RobeCzech base model.
MD5 | 621eafb6c5d8784089b2e5d091874d30

Name | robeczech-base-pytorch.zip
Size | 446.5 MB
Format | application/zip
Description | PyTorch Hugging Face checkpoint of the RobeCzech base model.
MD5 | 2b3cfe1dd91f31f0963220e784b4a304
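The MD5 values above allow checking the downloaded archives for corruption. A minimal sketch using only the Python standard library, with the file names and checksums copied from the listing above (it assumes the archives sit in the current working directory):

import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected checksums copied from the file listing above.
expected = {
    "robeczech-base-tf.zip": "621eafb6c5d8784089b2e5d091874d30",
    "robeczech-base-pytorch.zip": "2b3cfe1dd91f31f0963220e784b4a304",
}
for name, md5 in expected.items():
    print(name, "OK" if md5_of(name) == md5 else "MISMATCH")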