dc.contributor.author | Popel, Martin |
dc.date.accessioned | 2022-06-14T10:10:38Z |
dc.date.available | 2022-06-14T10:10:38Z |
dc.date.issued | 2020-07-06 |
dc.identifier.uri | http://hdl.handle.net/11234/1-4774 |
dc.description | CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). The full CzEng 2.0 is freely available for non-commercial research purposes from the project website (https://ufal.mff.cuni.cz/czeng). This release contains only the original monolingual news texts (csmono: 53M Czech sentences; enmono: 79M English sentences), each paired with its automatic (synthetic) translation by CUBBITT. See the attached README for further details, such as the file format. |
dc.language.iso | ces |
dc.language.iso | eng |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation.isreferencedby | https://arxiv.org/abs/2007.03006 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ |
dc.source.uri | https://ufal.mff.cuni.cz/czeng |
dc.subject | parallel corpus |
dc.title | Synthetic part of CzEng 2.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Martin Popel popel@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2018101 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds |
sponsor | Czech Science Foundation GX20-16819X LUSyD – Language Understanding: from Syntax to Discourse nationalFunds |
size.info | 131537252 sentences |
files.size | 12798377982 |
files.count | 3 |
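Because the data files total about 12.8 GB compressed (see files.size), it can be convenient to stream them instead of decompressing them to disk. The following is a minimal Python sketch under two assumptions that this record does not confirm: the files are UTF-8 text, and each line holds one sentence (pair). The actual file format is described in the README. Under the one-sentence-per-line assumption, the line counts can be compared with the csmono (53M) and enmono (79M) figures and with the 131,537,252 sentences in size.info. The helper names `iter_lines` and `count_lines` are hypothetical.

```python
import gzip
from typing import Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Stream decompressed lines from a .gz file without unpacking it to disk.

    Assumption: the file is UTF-8 text (check the README).
    newline="\n" makes only LF end a line, so stray CR characters
    inside a sentence do not split it.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="\n") as handle:
        for line in handle:
            # Drop the trailing line terminator, keep everything else as-is.
            yield line.rstrip("\n")


def count_lines(path: str) -> int:
    """Count lines in a gzipped text file.

    This equals the number of sentences only if each line holds exactly
    one sentence (pair), which is an assumption, not a documented fact.
    """
    return sum(1 for _ in iter_lines(path))
```

For example, `count_lines("czeng20-csmono.gz")` reads the whole file once in constant memory. Expect it to take a while on the multi-gigabyte files.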
Files in this record
Licence category:
Licence: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Publicly Available
Name | Size | Format | Description | MD5 |
README | 2.99 KB | unknown | readme.txt | ab2d71950b2e51acdc461bec8674b164 |
czeng20-csmono.gz | 4.31 GB | application/x-gzip | filtered Czech news crawl from 2013-2018, translated to English by CUBBITT | b80333bef7cc9db8610daaae0e2186ea |
czeng20-enmono.gz | 7.61 GB | application/x-gzip | filtered English news crawl from 2016-2018, translated to Czech by CUBBITT | bf5941d6de35af9cbd7f0f0efd190e1f |
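The MD5 checksums listed for the three files can be used to check that a download is complete and uncorrupted. Below is a minimal standard-library Python sketch. The local file names are assumptions: in particular, the README is assumed to be saved as `readme.txt`, and all files are assumed to sit in one directory. The helpers `md5_of_file` and `verify` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Checksums copied from the file listing of this record.
# Assumption: local file names match these keys (README saved as readme.txt).
EXPECTED_MD5 = {
    "readme.txt": "ab2d71950b2e51acdc461bec8674b164",
    "czeng20-csmono.gz": "b80333bef7cc9db8610daaae0e2186ea",
    "czeng20-enmono.gz": "bf5941d6de35af9cbd7f0f0efd190e1f",
}


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Union[str, Path] = ".") -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results: Dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify("/path/to/downloads")` returns a dictionary such as `{"readme.txt": True, ...}`. A missing file and a checksum mismatch both report `False`. Chunked reading keeps memory use constant even for the 7.61 GB file.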