Kacenka : parallel corpus of English and Czech texts

Please use the following text to cite this item:
Masaryk University, Brno, 1997, Kacenka : parallel corpus of English and Czech texts, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-891.
Date issued
1997
Type
Language(s)
Description
Parallel corpus, 3,297,283 words. The idea was to create a small parallel corpus that would make it possible to work with entire texts in translation analysis rather than short extracts. At the same time, it aimed at acquiring experience that could be used in creating a larger parallel corpus of English and Czech in the future. Although the main part of the work has been completed -- and the aims of the KACENKA grant met -- we keep improving and enlarging KACENKA gradually. Currently, it contains 3,297,283 words (of which 1,689,513 were acquired by scanning). Most of the English texts for KACENKA were retrieved from Internet resources. The rest -- and nearly all the Czech texts -- had to be scanned using an OCR programme. KACENKA is stored on a single CD-ROM; its use is limited by copyright restrictions.