
Corpora

Below is the list of corpora in the TEITOK/KonText hybrid set-up hosted at ÚFAL. For a larger list of TEITOK projects, see the TEITOK project page. A larger list of KonText corpora at ÚFAL can be found in the KonText corpus list or in the repository. For corpora that have multiple versions in TEITOK, only the most recent version is displayed; click on the version number to see all versions of the corpus. The corpora are listed by corpus type; see the description of the corpus types for what each type covers.


Acronym | Latest Version | Token Size | Corpus Type | Corpus Status | Corpus Content | Corpus Language(s)
CzechVerse | | 13M | Specialized Corpus | live | Poetry | Czech
HaCzech | | 18k | Facsimile Corpus | stable | Handwritten texts | Czech
Makoň | 2020-11-16 | 4.2M | Spoken Corpus | stable | Transcribed talks | Czech
OCRCZ | | 27M | Facsimile Corpus | stable | Printed material | Czech
ParCzech | 3.0 | 27M | Spoken Corpus | stable | Parliamentary sessions | Czech
PDT-C | 1.0 | 3.9M | Treebank | stable | | Czech
SIR | 1.0 | 250k | Specialized Corpus | stable | Newspaper articles | Czech
Skript 2015 | | 400k | Learner Corpus | live | | Czech
Mazon | | 7.9k | Facsimile Corpus | live | Letters | Czech, German, English, French, Russian
Migrant Stories | | 400k | Specialized Corpus | live | Migrant stories | English
EHRI | | 40k | Specialized Corpus | live | Letters | German, Czech, English
MaPCorp | | | Specialized Corpus | live | Poetry | Macedonian
DeltaCorpus | 1.1 | 94M | LRL Corpus | stable | | Many
MuNeCo | | 840M | LRL Corpus | live | Newspaper articles | Many
ParlaMint | 4.0 | 1.4G | Specialized Corpus | stable | Parliamentary sessions | Many
Universal Dependencies | 2.12 | 29M | Treebank | stable | | Many
