Deltacorpus

Please use the following text to cite this item:
Mareček, David; Yu, Zhiwei; Zeman, Daniel and Žabokrtský, Zdeněk, 2016, Deltacorpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1662.
Date issued
2016-03-17
Size
94,686,526 tokens
Description
Texts in 107 languages taken from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), at most the first 1,000,000 tokens per language, tagged with the delexicalized part-of-speech tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).

Version History

Version  Date        Summary
         2016-06-20
1*       2016-03-17
* Selected version
 Files in this item
Name
Deltacorpus.tar
Size
431.84 MB
Format
application/x-tar
Description
tar Archive
MD5
08c6025cc7aa60e0383792e7e3f441a4
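The MD5 value above can be used to check that a downloaded copy of Deltacorpus.tar is complete and uncorrupted. The following is a minimal Python sketch using only the standard library; the local file name `Deltacorpus.tar` and the helper names `md5_of_file` and `verify` are illustrative assumptions, not part of the distribution.

```python
import hashlib
from pathlib import Path

# MD5 digest listed for Deltacorpus.tar on this item page.
EXPECTED_MD5 = "08c6025cc7aa60e0383792e7e3f441a4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes object returned at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected one (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive was saved under its original name in the working directory.
    archive = Path("Deltacorpus.tar")
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in 1 MB chunks keeps memory use flat even for the roughly 432 MB archive.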
Preview
  • data
    • tgk.txt.gz (4 MB)
    • mal.txt.gz (5 MB)
    • pam.txt.gz (4 MB)
    • bos.txt.gz (4 MB)
    • jav.txt.gz (4 MB)
    • bel.txt.gz (4 MB)
    • hrv.txt.gz (4 MB)
    • ben.txt.gz (5 MB)
    • slv.txt.gz (4 MB)
    • aze.txt.gz (4 MB)
    • spa.txt.gz (4 MB)
    • fra.txt.gz (3 MB)
    • ron.txt.gz (4 MB)
    • hin.txt.gz (4 MB)
    • hat.txt.gz (3 MB)
    • war.txt.gz (2 MB)
    • dan.txt.gz (4 MB)
    • hbs.txt.gz (4 MB)
    • kur.txt.gz (4 MB)
    • pol.txt.gz (4 MB)
    • hsb.txt.gz (216 kB)
    • epo.txt.gz (4 MB)
    • lat.txt.gz (4 MB)
    • lav.txt.gz (4 MB)
    • arz.txt.gz (4 MB)
    • tam.txt.gz (5 MB)
    • nds.txt.gz (3 MB)
    • vie.txt.gz (3 MB)
    • rus.txt.gz (4 MB)
    • sqi.txt.gz (4 MB)
    • ind.txt.gz (4 MB)
    • swe.txt.gz (4 MB)
    • nep.txt.gz (5 MB)
    • vol.txt.gz (931 kB)
    • arg.txt.gz (4 MB)
    • bpy.txt.gz (5 MB)
    • guj.txt.gz (4 MB)
    • deu.txt.gz (4 MB)
    • hye.txt.gz (4 MB)
    • hif.txt.gz (4 MB)
    • msa.txt.gz (4 MB)
    • uzb.txt.gz (4 MB)
    • wln.txt.gz (671 kB)
    • fry.txt.gz (4 MB)
    • yid.txt.gz (4 MB)
    • sah.txt.gz (5 MB)
    • kor.txt.gz (5 MB)
    • diq.txt.gz (1 MB)
    • isl.txt.gz (4 MB)
    • swa.txt.gz (4 MB)
    • eus.txt.gz (4 MB)
    • cym.txt.gz (3 MB)
    • vec.txt.gz (4 MB)
    • cat.txt.gz (3 MB)
    • amh.txt.gz (37 kB)
    • urd.txt.gz (4 MB)
    • nap.txt.gz (1 MB)
    • tat.txt.gz (5 MB)
    • kaz.txt.gz (5 MB)
    • lmo.txt.gz (3 MB)
    • gsw.txt.gz (4 MB)
    • glk.txt.gz (2 MB)
    • ara.txt.gz (4 MB)
    • new.txt.gz (296 kB)
    • mon.txt.gz (4 MB)
    • eng.txt.gz (4 MB)
    • sun.txt.gz (2 MB)
    • pms.txt.gz (1 MB)
    • sco.txt.gz (4 MB)
    • tgl.txt.gz (4 MB)
    • heb.txt.gz (4 MB)
    • bul.txt.gz (4 MB)
    • tel.txt.gz (5 MB)
    • ita.txt.gz (4 MB)
    • mri.txt.gz (4 MB)
    • fas.txt.gz (4 MB)
    • kat.txt.gz (5 MB)
    • gle.txt.gz (4 MB)
    • glg.txt.gz (4 MB)
    • chv.txt.gz (67 kB)
    • ukr.txt.gz (4 MB)
    • hun.txt.gz (4 MB)
    • fao.txt.gz (4 MB)
    • lim.txt.gz (4 MB)
    • ido.txt.gz (1 MB)
    • ast.txt.gz (4 MB)
    • afr.txt.gz (3 MB)
    • gla.txt.gz (3 MB)
    • mlg.txt.gz (3 MB)
    • ina.txt.gz (3 MB)
    • mar.txt.gz (5 MB)
    • slk.txt.gz (4 MB)
    • tur.txt.gz (4 MB)
    • ltz.txt.gz (4 MB)
    • kan.txt.gz (5 MB)
    • ell.txt.gz (4 MB)
    • ces.txt.gz (4 MB)
    • bre.txt.gz (3 MB)
    • nor.txt.gz (4 MB)
    • fin.txt.gz (4 MB)
    • por.txt.gz (3 MB)
    • srp.txt.gz (4 MB)
    • lit.txt.gz (4 MB)
    • est.txt.gz (4 MB)
    • nno.txt.gz (4 MB)
    • mkd.txt.gz (4 MB)
    • nld.txt.gz (4 MB)
    • LANGUAGES.txt (5 kB)
    • README.txt (467 B)
    • POS_TAGSET.txt (567 B)
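Each language is stored as a separate gzip-compressed text file named by a three-letter language code; LANGUAGES.txt presumably lists which language each code stands for, and README.txt and POS_TAGSET.txt presumably document the line format and the tags. The minimal Python sketch below, assuming a local copy of Deltacorpus.tar, lists the codes and reads the first lines of one file straight from the archive without unpacking it. It matches members by base name so it does not depend on the exact directory prefix inside the tar, and it returns raw lines rather than parsing tokens and tags, since that format is not described on this page. The helper names `language_codes` and `head` are hypothetical.

```python
import gzip
import tarfile
from itertools import islice
from pathlib import Path, PurePosixPath

SUFFIX = ".txt.gz"


def language_codes(archive):
    """Return the sorted language codes (file stems) of all *.txt.gz members."""
    codes = []
    for member in archive.getmembers():
        name = PurePosixPath(member.name).name
        if member.isfile() and name.endswith(SUFFIX):
            codes.append(name[: -len(SUFFIX)])
    return sorted(codes)


def head(archive, code, n=5, encoding="utf-8"):
    """Return the first n lines of <code>.txt.gz, decompressed in memory."""
    for member in archive.getmembers():
        if member.isfile() and PurePosixPath(member.name).name == code + SUFFIX:
            raw = archive.extractfile(member)  # binary file object inside the tar
            # gzip.open accepts an existing file object; "rt" decodes to text.
            with gzip.open(raw, "rt", encoding=encoding) as text:
                return [line.rstrip("\n") for line in islice(text, n)]
    raise KeyError(f"no file for language code {code!r}")


if __name__ == "__main__":
    # Assumption: the archive was saved under its original name in the working directory.
    tar_path = Path("Deltacorpus.tar")
    if tar_path.exists():
        with tarfile.open(str(tar_path)) as archive:
            print(len(language_codes(archive)), "language files")
            for line in head(archive, "eng", n=5):
                print(line)
    else:
        print(f"{tar_path} not found")
```

Because `tarfile` reads members on demand, only the requested file is decompressed; checking the printed lines against README.txt is a sensible first step before writing a real parser.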