
Czech-Slovak Parallel Corpus

Please use the following text to cite this item or export to a predefined format:
Galuščáková, Petra; Garabík, Radovan and Bojar, Ondřej, 2012, Czech-Slovak Parallel Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0006-AADF-0.
Date issued
2012-05-15
Size
5700000 sentences
Language(s)
Czech, Slovak
Description
A Czech-Slovak parallel corpus compiled from several freely available corpora (Acquis [1], Europarl [2], the Official Journal of the European Union [3], and parts of the OPUS corpus [4]: EMEA, EUConst, KDE4 and PHP) and from a downloaded copy of the European Commission website [5]. The corpus is published both in plaintext format and with automatic morphological annotation.
References:
[1] http://langtech.jrc.it/JRC-Acquis.html/
[2] http://www.statmt.org/europarl/
[3] http://apertium.eu/data
[4] http://opus.lingfil.uu.se/
[5] http://ec.europa.eu/
This item is Publicly Available
and licensed under:
Files in this item
Name
corpora-cs-sk-export-format.tar.gz
Size
794.99 MB
Format
application/x-gzip
Description
Corpus with morphological information
MD5
0efa8e221b25cf89e27660a421f4dbbf
Preview
  • corpora-cs-sk-export-format
    • OPUS
      • PHP.cs-sk-tagged.cs.gz (130 kB)
      • EMEA.cs-sk-tagged.sk.gz (49 MB)
      • KDE4.cs-sk-tagged.sk.gz (1 MB)
      • KDE4.cs-sk-tagged.cs.gz (1 MB)
      • EMEA.cs-sk-tagged.cs.gz (57 MB)
      • EUconst.cs-sk-tagged.sk.gz (494 kB)
      • EUconst.cs-sk-tagged.cs.gz (536 kB)
      • PHP.cs-sk-tagged.sk.gz (84 kB)
    • acquis
      • acquis-train-tagged.sk.gz (79 MB)
      • acquis-train-tagged.cs.gz (89 MB)
    • eu-journal
      • eu-journal-tagged.sk.gz (190 MB)
      • eu-journal-tagged.cs.gz (223 MB)
    • ec-europa
      • corpora-ec-europa-tagged.sk.gz (1 MB)
      • corpora-ec-europa-tagged.cs.gz (2 MB)
    • europarl
      • europarl-v6.sk-cs-tagged.sk.gz (45 MB)
      • europarl-v6.sk-cs-tagged.cs.gz (52 MB)
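
Since the record lists an MD5 checksum for the archive, a downloaded copy can be checked against it before unpacking. The following is a minimal sketch, not part of the repository's own tooling: the function names `md5_of_file` and `verify_archive` are hypothetical, and the local file path passed in is an assumption about where the archive was saved.

```python
import hashlib

# MD5 value as listed above for corpora-cs-sk-export-format.tar.gz.
EXPECTED_MD5 = "0efa8e221b25cf89e27660a421f4dbbf"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("corpora-cs-sk-export-format.tar.gz")` would return `True` for an intact download saved under that name in the current directory; other archives can be checked by passing their listed MD5 as `expected`.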
Name
corpora-cs-sk-plaintext.tar.gz
Size
342 MB
Format
application/x-gzip
Description
Corpus in plaintext format
MD5
f68136ee855aec4f18f29af557591819
Preview
  • corpora-cs-sk-plaintext
    • OPUS
      • EMEA.cs-sk.cs.gz (18 MB)
      • KDE4.cs-sk.sk.gz (889 kB)
      • PHP.cs-sk.cs.gz (56 kB)
      • README (52 B)
      • KDE4.cs-sk.cs.gz (923 kB)
      • EUconst.cs-sk.sk.gz (247 kB)
      • EMEA.cs-sk.sk.gz (18 MB)
      • EUconst.cs-sk.cs.gz (231 kB)
      • PHP.cs-sk.sk.gz (44 kB)
    • acquis
      • acquis-train_cs.txt.gz (37 MB)
      • README (271 B)
      • acquis-train_sk.txt.gz (38 MB)
    • ec-europa
      • corpora-ec-europa.sk.gz (936 kB)
      • corpora-ec-europa.cs.gz (939 kB)
      • README (37 B)
    • journal
      • eu-journal.sk.gz (91 MB)
      • eu-journal.cs.gz (90 MB)
      • README (215 B)
    • europarl
      • europarl-v6.sk-cs.sk.gz (21 MB)
      • europarl-v6.sk-cs.cs.gz (21 MB)
      • README (45 B)
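
Each plaintext sub-corpus ships as a pair of gzip files, one per language. The sketch below shows one way such a pair might be read side by side. It rests on assumptions this record does not confirm: that the files are UTF-8 text with one sentence per line, and that line N of the `.cs` file is aligned with line N of the `.sk` file (the README files in the archive should be checked for the actual format). The function name `read_parallel` is hypothetical.

```python
import gzip
from itertools import zip_longest


def read_parallel(cs_path, sk_path, encoding="utf-8"):
    """Yield (czech, slovak) sentence pairs from two gzip text files.

    Assumes line-by-line alignment between the two files; raises
    ValueError if one file has more lines than the other.
    """
    missing = object()  # sentinel marking the end of the shorter file
    with gzip.open(cs_path, "rt", encoding=encoding) as cs_file, \
            gzip.open(sk_path, "rt", encoding=encoding) as sk_file:
        pairs = zip_longest(cs_file, sk_file, fillvalue=missing)
        for number, (cs_line, sk_line) in enumerate(pairs, start=1):
            if cs_line is missing or sk_line is missing:
                raise ValueError(f"files differ in length at line {number}")
            yield cs_line.rstrip("\n"), sk_line.rstrip("\n")
```

With the archive unpacked, a call such as `read_parallel("corpora-cs-sk-plaintext/europarl/europarl-v6.sk-cs.cs.gz", "corpora-cs-sk-plaintext/europarl/europarl-v6.sk-cs.sk.gz")` would stream the Europarl pairs without loading either file fully into memory. The length check is a deliberate choice: a silent truncation at the shorter file would hide a misaligned pair.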