Available corpora

Name | Size (positions) | Labels
ACL RD-TEC 2.0 | 53k | manual, monolingual, Web corpora, English, morphology, current version, Dictionary links
Air Traffic Control Communication | 193k | speech, current version, English, words only
Companions - Czech | 37k | speech, current version, Czech, words only
Czech Legal Text Treebank 2.0 | 34k | manual, monolingual, current version, PML-TQ, Czech, morphology, syntax, PDT style
Czech Parliamentary Meetings (2012-03-28) | 642k | speech, current version, Czech, words only
Czech-English Parallel Corpus 1.0 | 206M | parallel, CzEng 1.0, Czech, morphology, syntax, current version, PDT style
Czech-English Parallel Corpus 1.0 - English - with POS tags and AFUN | 233M | parallel, CzEng 1.0, English, morphology, syntax, current version, PDT style
Czech-Slovak - Slovak - POS tags only | 110M | parallel, Czech-Slovak, Slovak, morphology, current version
Czech-Slovak parallel corpus | 110M | parallel, Czech-Slovak, Czech, morphology, current version
English TenTen Corpus (2011-12-16) | 4G | monolingual, Web corpora, English, words only, current version
English TTS speech corpus of air traffic messages - Serbian accent | 18k | speech, current version, English, words only
EU DGT-UD: Bulgarian | 73M | parallel, JRC EU DG Translation Memory, Bulgarian, morphology, syntax, current version
EU DGT-UD: Croatian | 28M | parallel, JRC EU DG Translation Memory, Croatian, morphology, syntax, current version
EU DGT-UD: Czech | 100M | parallel, JRC EU DG Translation Memory, Czech, morphology, syntax, current version
EU DGT-UD: Danish | 88M | parallel, JRC EU DG Translation Memory, Danish, morphology, syntax, current version
EU DGT-UD: Dutch | 97M | parallel, JRC EU DG Translation Memory, Dutch, morphology, syntax, current version
EU DGT-UD: English | 151M | parallel, JRC EU DG Translation Memory, English, morphology, syntax, current version
EU DGT-UD: Estonian | 78M | parallel, JRC EU DG Translation Memory, Estonian, morphology, syntax, current version
EU DGT-UD: Finnish | 72M | parallel, JRC EU DG Translation Memory, Finnish, morphology, syntax, current version
EU DGT-UD: French | 132M | parallel, JRC EU DG Translation Memory, French, morphology, syntax, current version
EU DGT-UD: Gaelic | 3M | parallel, JRC EU DG Translation Memory, Irish, morphology, syntax, current version
EU DGT-UD: German | 93M | parallel, JRC EU DG Translation Memory, German, morphology, syntax, current version
EU DGT-UD: Greek | 97M | parallel, JRC EU DG Translation Memory, Greek, morphology, syntax, current version
EU DGT-UD: Hungarian | 93M | parallel, JRC EU DG Translation Memory, Hungarian, morphology, syntax, current version
EU DGT-UD: Italian | 116M | parallel, JRC EU DG Translation Memory, Italian, morphology, syntax, current version
EU DGT-UD: Latvian | 93M | parallel, JRC EU DG Translation Memory, Latvian, morphology, syntax, current version
EU DGT-UD: Lithuanian | 96M | parallel, JRC EU DG Translation Memory, Lithuanian, morphology, syntax, current version
EU DGT-UD: Polish | 99M | parallel, JRC EU DG Translation Memory, Polish, morphology, syntax, current version
EU DGT-UD: Portuguese | 124M | parallel, JRC EU DG Translation Memory, Portuguese, morphology, syntax, current version
EU DGT-UD: Romanian | 73M | parallel, JRC EU DG Translation Memory, Romanian, morphology, syntax, current version