
Synthetic part of CzEng 2.0

Please use the following text to cite this item:
Popel, Martin, 2020, Synthetic part of CzEng 2.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-4774.
Date issued
2020-07-06
Size
131537252 sentences
Language(s)
Czech, English
Description
CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). While the full CzEng 2.0 is freely available for non-commercial research purposes from the project website (https://ufal.mff.cuni.cz/czeng), this release contains only the original monolingual parts of news text (csmono 53M and enmono 79M sentences) with automatic (synthetic) translations by CUBBITT. See the attached README for additional details such as the file format.
Acknowledgement
 Files in this item
Name
README
Size
2.99 KB
Format
application/octet-stream
Description
Unknown
MD5
ab2d71950b2e51acdc461bec8674b164
Name
czeng20-csmono.gz
Size
4.31 GB
Format
application/x-gzip
Description
gzip Archive
MD5
b80333bef7cc9db8610daaae0e2186ea
Name
czeng20-enmono.gz
Size
7.61 GB
Format
application/x-gzip
Description
gzip Archive
MD5
bf5941d6de35af9cbd7f0f0efd190e1f
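
Because the two archives are several gigabytes each, it may be worth checking a download against the MD5 values listed above before using it. The following is a minimal sketch, not part of the original record: the function names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "README": "ab2d71950b2e51acdc461bec8674b164",
    "czeng20-csmono.gz": "b80333bef7cc9db8610daaae0e2186ea",
    "czeng20-enmono.gz": "bf5941d6de35af9cbd7f0f0efd190e1f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (not found).

    Assumption: files keep the names shown in the listing.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify_downloads().items():
        print(f"{name}: {labels[outcome]}")
```

Run the script from the download directory; a MISMATCH usually indicates an incomplete or corrupted transfer, in which case the file should be downloaded again.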
Preview
  File Preview