Diakorp v6: diachronic corpus of Czech

Please use the following text to cite this item:
Kučera, Karel; Řehořková, Anna and Stluka, Martin, 2015, Diakorp v6: diachronic corpus of Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5413.
Date issued
2015-12-18
Size
3450000 words
Language(s)
Czech
Description
Diakorp v6 is a diachronic corpus of Czech comprising 3.45 million words (4.1 million tokens). It contains 116 texts from the 14th to the 20th century. The texts are transcribed, not transliterated. The corpus is distributed in a CoNLL-U-like vertical format that serves as input to the Manatee query engine. The data therefore correspond to the corpus that registered users of the Czech National Corpus (CNC) can query through the KonText interface at http://www.korpus.cz.
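As an illustration of working with the vertical format, the sketch below streams the gzipped file and yields one list of attributes per token line. It is a minimal sketch under stated assumptions, not a description of the actual Diakorp layout: it assumes UTF-8 encoding, one token per line with tab-separated attributes, and structural markup on separate lines enclosed in angle brackets, as is usual for Manatee input. The function names `iter_tokens` and `count_tokens` are hypothetical, and the number and meaning of the attribute columns should be checked against the data itself.

```python
import gzip
from typing import Iterable, Iterator, List


def iter_tokens(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated attributes of each token line.

    Assumption: lines that start with '<' and end with '>' are structural
    tags (e.g. document or sentence boundaries) and are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        if line.startswith("<") and line.endswith(">"):
            continue  # structural markup, not a token
        yield line.split("\t")


def count_tokens(path: str) -> int:
    """Count token lines in a gzipped vertical file (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return sum(1 for _ in iter_tokens(fh))
```

Calling `count_tokens("diakorp_v6.gz")` on the downloaded file would give a rough check against the 4.1 million tokens stated above, provided the assumptions about the layout hold.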
 Files in this item
Name
diakorp_v6.gz
Size
9.21 MB
Format
application/x-gzip
Description
Unknown
MD5
cd67d27a33a55d56548505cf73857b00
Preview
  File Preview
    • diakorp_v6 (27 MB)
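The MD5 value listed for diakorp_v6.gz can be used to check that a download is complete. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` and the local file path are assumptions for illustration only.

```python
import hashlib

# MD5 checksum listed for diakorp_v6.gz in this item.
EXPECTED_MD5 = "cd67d27a33a55d56548505cf73857b00"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("diakorp_v6.gz")` should return `True` for an intact copy of the archive saved under that (assumed) local name.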