
 
dc.contributor.author Straka, Milan
dc.contributor.author Náplava, Jakub
dc.contributor.author Straková, Jana
dc.contributor.author Samuel, David
dc.date.accessioned 2021-05-25T08:15:54Z
dc.date.available 2021-05-25T08:15:54Z
dc.date.issued 2021-05-25
dc.identifier.uri http://hdl.handle.net/11234/1-3691
dc.description RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
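The Hugging Face release can be loaded directly with the transformers library. The sketch below is illustrative only, assuming transformers and PyTorch are installed; the model identifier ufal/robeczech-base comes from the record above, and the example sentence is arbitrary.

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # Load the tokenizer and the masked-language-model head from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
    model = AutoModelForMaskedLM.from_pretrained("ufal/robeczech-base")

    # Encode a Czech sentence with one masked token
    # ("The capital of the Czech Republic is <mask>.").
    text = f"Hlavním městem České republiky je {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Print the model's top prediction for the masked position.
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    print(tokenizer.decode(logits[0, mask_index].argmax(dim=-1)))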
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject Czech
dc.subject BERT
dc.subject RoBERTa
dc.title RobeCzech Base
dc.type languageDescription
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#ContentInfo.detailedType mlmodel
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Milan Straka straka@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor Grantová agentura České republiky GX20-16819X LUSyD – Language Understanding: from Syntax to Discourse nationalFunds
sponsor Grantová agentura Univerzity Karlovy v Praze GAUK 578218 Automatická korekce jazyka pomocí neuronových sítí (Automatic language correction using neural networks) nationalFunds
sponsor Mellon Foundation, Brandeis University G-1901-06505 Transatlantic Collaboration between LAPPS Grid and CLARIN: Implementation of NLP-enabled Tools using Text Archives as a Use Case Other
size.info 1035 MB
files.size 1085900989
files.count 2


 Files in this item

Name robeczech-base-tf.zip
Size 589.1 MB
Format application/zip
Description TensorFlow Hugging Face checkpoint of the RobeCzech base model.
MD5 621eafb6c5d8784089b2e5d091874d30
File preview:
    • config.json 516 B
    • vocab.json 1 MB
    • LICENSE 20 kB
    • README.md 544 B
    • tokenizer_config.json 184 B
    • tf_model.h5 636 MB
    • merges.txt 586 kB
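The TensorFlow archive above can also be used offline after extraction. A minimal sketch, assuming TensorFlow and transformers are installed and the zip has been unpacked into a directory named robeczech-base-tf (the directory name is illustrative):

    from transformers import AutoTokenizer, TFAutoModel

    # Point transformers at the extracted checkpoint directory instead of the Hub identifier.
    local_dir = "robeczech-base-tf"  # hypothetical extraction path
    tokenizer = AutoTokenizer.from_pretrained(local_dir)
    model = TFAutoModel.from_pretrained(local_dir)

    # Produce contextual embeddings for a Czech sentence ("RobeCzech is a Czech language model.").
    inputs = tokenizer("RobeCzech je český jazykový model.", return_tensors="tf")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)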
Name robeczech-base-pytorch.zip
Size 446.5 MB
Format application/zip
Description PyTorch Hugging Face checkpoint of the RobeCzech base model.
MD5 2b3cfe1dd91f31f0963220e784b4a304
File preview:
    • config.json 516 B
    • vocab.json 1 MB
    • LICENSE 20 kB
    • README.md 544 B
    • tokenizer_config.json 184 B
    • pytorch_model.bin 483 MB
    • merges.txt 586 kB
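Analogously, the PyTorch archive can be loaded from a local directory once extracted. A minimal sketch, assuming the zip has been unpacked into a directory named robeczech-base-pytorch (again an illustrative name):

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Point transformers at the extracted checkpoint directory instead of the Hub identifier.
    local_dir = "robeczech-base-pytorch"  # hypothetical extraction path
    tokenizer = AutoTokenizer.from_pretrained(local_dir)
    model = AutoModel.from_pretrained(local_dir)

    # Contextual embeddings for a Czech sentence, without gradient tracking.
    inputs = tokenizer("RobeCzech je český jazykový model.", return_tensors="pt")
    with torch.no_grad():
        embeddings = model(**inputs).last_hidden_state
    print(embeddings.shape)  # (batch, sequence length, hidden size)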
