Croatian National Corpus

Please use the following text to cite this item:
University of Zagreb, Faculty of Humanities and Social Sciences, 2014, Croatian National Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-233.
Date issued
2014-07-30
Type
Language(s)
Description
This is the reference corpus of standard Croatian. Version 3.0, which is accessible via NoSketch Engine, contains 216.8 million tokens. The corpus is tokenised, lemmatised and tagged with MSDs (morphosyntactic descriptions).