Deltacorpus

Please use the following text to cite this item:
Mareček, David; Yu, Zhiwei; Zeman, Daniel and Žabokrtský, Zdeněk, 2016, Deltacorpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1662.
Date issued
2016-03-17
Size
94,686,526 tokens
Description
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), up to the first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
 Files in this item
Name
Deltacorpus.tar
Size
431.84 MB
Format
application/x-tar
Description
Deltacorpus
MD5
08c6025cc7aa60e0383792e7e3f441a4
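
The MD5 value above can be used to check that a download of Deltacorpus.tar arrived intact. The following is a minimal sketch in Python, not an official tool of the repository; the helper names `md5_of_file` and `matches_expected` and the local path in the usage note are assumptions made for illustration.

```python
import hashlib

# MD5 checksum as listed in this record for Deltacorpus.tar.
EXPECTED_MD5 = "08c6025cc7aa60e0383792e7e3f441a4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that the archive (about 432 MB)
    does not have to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `matches_expected("Deltacorpus.tar")` should return `True` for an undamaged download.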
Preview
  • data
    • tgk.txt.gz (4 MB)
    • mal.txt.gz (5 MB)
    • pam.txt.gz (4 MB)
    • bos.txt.gz (4 MB)
    • jav.txt.gz (4 MB)
    • bel.txt.gz (4 MB)
    • hrv.txt.gz (4 MB)
    • ben.txt.gz (5 MB)
    • slv.txt.gz (4 MB)
    • aze.txt.gz (4 MB)
    • spa.txt.gz (4 MB)
    • fra.txt.gz (3 MB)
    • ron.txt.gz (4 MB)
    • hin.txt.gz (4 MB)
    • hat.txt.gz (3 MB)
    • war.txt.gz (2 MB)
    • dan.txt.gz (4 MB)
    • hbs.txt.gz (4 MB)
    • kur.txt.gz (4 MB)
    • pol.txt.gz (4 MB)
    • hsb.txt.gz (216 kB)
    • epo.txt.gz (4 MB)
    • lat.txt.gz (4 MB)
    • lav.txt.gz (4 MB)
    • arz.txt.gz (4 MB)
    • tam.txt.gz (5 MB)
    • nds.txt.gz (3 MB)
    • vie.txt.gz (3 MB)
    • rus.txt.gz (4 MB)
    • sqi.txt.gz (4 MB)
    • ind.txt.gz (4 MB)
    • swe.txt.gz (4 MB)
    • nep.txt.gz (5 MB)
    • vol.txt.gz (931 kB)
    • arg.txt.gz (4 MB)
    • bpy.txt.gz (5 MB)
    • guj.txt.gz (4 MB)
    • deu.txt.gz (4 MB)
    • hye.txt.gz (4 MB)
    • hif.txt.gz (4 MB)
    • msa.txt.gz (4 MB)
    • uzb.txt.gz (4 MB)
    • wln.txt.gz (671 kB)
    • fry.txt.gz (4 MB)
    • yid.txt.gz (4 MB)
    • sah.txt.gz (5 MB)
    • kor.txt.gz (5 MB)
    • diq.txt.gz (1 MB)
    • isl.txt.gz (4 MB)
    • swa.txt.gz (4 MB)
    • eus.txt.gz (4 MB)
    • cym.txt.gz (3 MB)
    • vec.txt.gz (4 MB)
    • cat.txt.gz (3 MB)
    • amh.txt.gz (37 kB)
    • urd.txt.gz (4 MB)
    • nap.txt.gz (1 MB)
    • tat.txt.gz (5 MB)
    • kaz.txt.gz (5 MB)
    • lmo.txt.gz (3 MB)
    • gsw.txt.gz (4 MB)
    • glk.txt.gz (2 MB)
    • ara.txt.gz (4 MB)
    • new.txt.gz (296 kB)
    • mon.txt.gz (4 MB)
    • eng.txt.gz (4 MB)
    • sun.txt.gz (2 MB)
    • pms.txt.gz (1 MB)
    • sco.txt.gz (4 MB)
    • tgl.txt.gz (4 MB)
    • heb.txt.gz (4 MB)
    • bul.txt.gz (4 MB)
    • tel.txt.gz (5 MB)
    • ita.txt.gz (4 MB)
    • mri.txt.gz (4 MB)
    • fas.txt.gz (4 MB)
    • kat.txt.gz (5 MB)
    • gle.txt.gz (4 MB)
    • glg.txt.gz (4 MB)
    • chv.txt.gz (67 kB)
    • ukr.txt.gz (4 MB)
    • hun.txt.gz (4 MB)
    • fao.txt.gz (4 MB)
    • lim.txt.gz (4 MB)
    • ido.txt.gz (1 MB)
    • ast.txt.gz (4 MB)
    • afr.txt.gz (3 MB)
    • gla.txt.gz (3 MB)
    • mlg.txt.gz (3 MB)
    • ina.txt.gz (3 MB)
    • mar.txt.gz (5 MB)
    • slk.txt.gz (4 MB)
    • tur.txt.gz (4 MB)
    • ltz.txt.gz (4 MB)
    • kan.txt.gz (5 MB)
    • ell.txt.gz (4 MB)
    • ces.txt.gz (4 MB)
    • bre.txt.gz (3 MB)
    • nor.txt.gz (4 MB)
    • fin.txt.gz (4 MB)
    • por.txt.gz (3 MB)
    • srp.txt.gz (4 MB)
    • lit.txt.gz (4 MB)
    • est.txt.gz (4 MB)
    • nno.txt.gz (4 MB)
    • mkd.txt.gz (4 MB)
    • nld.txt.gz (4 MB)
    • LANGUAGES.txt (5 kB)
    • README.txt (467 B)
    • POS_TAGSET.txt (567 B)