Czech-Slovak Parallel Corpus
Please use the following text to cite this item:
Galuščáková, Petra; Garabík, Radovan and Bojar, Ondřej, 2012,
Czech-Slovak Parallel Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0006-AADF-0.
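Authors
Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej
Item identifier
http://hdl.handle.net/11858/00-097C-0000-0006-AADF-0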
Date issued
2012-05-15
Size
5700000 sentences
Description
Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of the OPUS corpus [4]: EMEA, EUConst, KDE4 and PHP) and a downloaded copy of the European Commission website [5]. The corpus is published both in plaintext format and with automatic morphological annotation.
References:
[1] http://langtech.jrc.it/JRC-Acquis.html/
[2] http://www.statmt.org/europarl/
[3] http://apertium.eu/data
[4] http://opus.lingfil.uu.se/
[5] http://ec.europa.eu/
Acknowledgement
European Union
Project code: FP7-ICT-2007-3-231720
Project name: EuroMatrix Plus
Ministerstvo školství, mládeže a tělovýchovy České republiky (Ministry of Education, Youth and Sports of the Czech Republic)
Project code: 7E09003
Project name: EuroMatrixPlus – Bringing Machine Translation for European Languages to the User
This item is Publicly Available
and licensed under:
Files in this item
- Name: corpora-cs-sk-export-format.tar.gz
- Size: 794.99 MB
- Format: application/x-gzip
- Description: Corpus with morphological information
- MD5: 0efa8e221b25cf89e27660a421f4dbbf

- corpora-cs-sk-export-format
  - OPUS
    - PHP.cs-sk-tagged.cs.gz (130 kB)
    - EMEA.cs-sk-tagged.sk.gz (49 MB)
    - KDE4.cs-sk-tagged.sk.gz (1 MB)
    - KDE4.cs-sk-tagged.cs.gz (1 MB)
    - EMEA.cs-sk-tagged.cs.gz (57 MB)
    - EUconst.cs-sk-tagged.sk.gz (494 kB)
    - EUconst.cs-sk-tagged.cs.gz (536 kB)
    - PHP.cs-sk-tagged.sk.gz (84 kB)
  - acquis
    - acquis-train-tagged.sk.gz (79 MB)
    - acquis-train-tagged.cs.gz (89 MB)
  - eu-journal
    - eu-journal-tagged.sk.gz (190 MB)
    - eu-journal-tagged.cs.gz (223 MB)
  - ec-europa
    - corpora-ec-europa-tagged.sk.gz (1 MB)
    - corpora-ec-europa-tagged.cs.gz (2 MB)
  - europarl
    - europarl-v6.sk-cs-tagged.sk.gz (45 MB)
    - europarl-v6.sk-cs-tagged.cs.gz (52 MB)
- Name: corpora-cs-sk-plaintext.tar.gz
- Size: 342 MB
- Format: application/x-gzip
- Description: Corpus in plaintext format
- MD5: f68136ee855aec4f18f29af557591819
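The MD5 values listed for the two archives can be used to check a download for corruption. The following is a minimal Python sketch, not part of the original distribution; the function names `md5_of_file` and `matches_listed_md5` are hypothetical helpers introduced here for illustration, and the archives are assumed to have been saved under their listed names.

```python
import hashlib

# MD5 values as listed on this page for the two archives.
EXPECTED_MD5 = {
    "corpora-cs-sk-export-format.tar.gz": "0efa8e221b25cf89e27660a421f4dbbf",
    "corpora-cs-sk-plaintext.tar.gz": "f68136ee855aec4f18f29af557591819",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected):
    """Compare a file's MD5 with an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_listed_md5("corpora-cs-sk-plaintext.tar.gz", EXPECTED_MD5["corpora-cs-sk-plaintext.tar.gz"])` should return `True` for an intact download. MD5 is suitable here only as an integrity check against transfer errors, not as protection against deliberate tampering.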

- corpora-cs-sk-plaintext
  - OPUS
    - EMEA.cs-sk.cs.gz (18 MB)
    - KDE4.cs-sk.sk.gz (889 kB)
    - PHP.cs-sk.cs.gz (56 kB)
    - README (52 B)
    - KDE4.cs-sk.cs.gz (923 kB)
    - EUconst.cs-sk.sk.gz (247 kB)
    - EMEA.cs-sk.sk.gz (18 MB)
    - EUconst.cs-sk.cs.gz (231 kB)
    - PHP.cs-sk.sk.gz (44 kB)
  - acquis
    - acquis-train_cs.txt.gz (37 MB)
    - README (271 B)
    - acquis-train_sk.txt.gz (38 MB)
  - ec-europa
    - corpora-ec-europa.sk.gz (936 kB)
    - corpora-ec-europa.cs.gz (939 kB)
    - README (37 B)
  - journal
    - eu-journal.sk.gz (91 MB)
    - eu-journal.cs.gz (90 MB)
    - README (215 B)
  - europarl
    - europarl-v6.sk-cs.sk.gz (21 MB)
    - europarl-v6.sk-cs.cs.gz (21 MB)
    - README (45 B)
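The plaintext files come in Czech/Slovak pairs (`*.cs.gz` and `*.sk.gz`). The sketch below shows one way such a pair might be read in Python. It rests on assumptions this page does not confirm: that each pair is line-aligned with one segment per line, and that the files are UTF-8 encoded; the README files in the archive should be consulted for the actual format. The function name `read_parallel` is a hypothetical helper.

```python
import gzip
from itertools import zip_longest


def read_parallel(cs_path, sk_path, encoding="utf-8"):
    """Yield (czech, slovak) pairs from two gzip-compressed text files.

    Assumption (not stated on this page): line N of the Czech file
    corresponds to line N of the Slovak file.
    """
    with gzip.open(cs_path, "rt", encoding=encoding) as cs_file, \
         gzip.open(sk_path, "rt", encoding=encoding) as sk_file:
        pairs = zip_longest(cs_file, sk_file)
        for line_no, (cs_line, sk_line) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two files are not the same length.
            if cs_line is None or sk_line is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield cs_line.rstrip("\n"), sk_line.rstrip("\n")
```

A call such as `read_parallel("OPUS/EUconst.cs-sk.cs.gz", "OPUS/EUconst.cs-sk.sk.gz")` would then iterate over the pairs lazily; the paths are illustrative and depend on where the archive was extracted. Raising on a length mismatch is a deliberate choice so that a misaligned pair is noticed rather than silently truncated.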

