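dc.contributor.author Mareček, David
dc.contributor.author Yu, Zhiwei
dc.contributor.author Zeman, Daniel
dc.contributor.author Žabokrtský, Zdeněk
dc.date.accessioned 2016-03-22T16:44:19Z
dc.date.available 2016-03-22T16:44:19Z
dc.date.issued 2016-03-17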
dc.description Texts in 107 languages from the W2C corpus, first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
dc.language.iso bel
dc.language.iso bos
dc.language.iso bul
dc.language.iso ces
dc.language.iso hbs
dc.language.iso hrv
dc.language.iso hsb
dc.language.iso mkd
dc.language.iso pol
dc.language.iso rus
dc.language.iso slk
dc.language.iso slv
dc.language.iso srp
dc.language.iso ukr
dc.language.iso lav
dc.language.iso lit
dc.language.iso afr
dc.language.iso dan
dc.language.iso deu
dc.language.iso eng
dc.language.iso fao
dc.language.iso fry
dc.language.iso gsw
dc.language.iso isl
dc.language.iso lim
dc.language.iso ltz
dc.language.iso nds
dc.language.iso nld
dc.language.iso nno
dc.language.iso nor
dc.language.iso sco
dc.language.iso swe
dc.language.iso yid
dc.language.iso arg
dc.language.iso ast
dc.language.iso cat
dc.language.iso fra
dc.language.iso glg
dc.language.iso hat
dc.language.iso ita
dc.language.iso lat
dc.language.iso lmo
dc.language.iso nap
dc.language.iso pms
dc.language.iso por
dc.language.iso ron
dc.language.iso spa
dc.language.iso vec
dc.language.iso wln
dc.language.iso bre
dc.language.iso cym
dc.language.iso gla
dc.language.iso gle
dc.language.iso ell
dc.language.iso hye
dc.language.iso sqi
dc.language.iso diq
dc.language.iso fas
dc.language.iso glk
dc.language.iso kur
dc.language.iso tgk
dc.language.iso ben
dc.language.iso bpy
dc.language.iso guj
dc.language.iso hif
dc.language.iso hin
dc.language.iso mar
dc.language.iso nep
dc.language.iso urd
dc.language.iso amh
dc.language.iso ara
dc.language.iso arz
dc.language.iso heb
dc.language.iso est
dc.language.iso fin
dc.language.iso hun
dc.language.iso eus
dc.language.iso kat
dc.language.iso chv
dc.language.iso aze
dc.language.iso tur
dc.language.iso uzb
dc.language.iso kaz
dc.language.iso tat
dc.language.iso sah
dc.language.iso kor
dc.language.iso mon
dc.language.iso tel
dc.language.iso kan
dc.language.iso mal
dc.language.iso tam
dc.language.iso new
dc.language.iso vie
dc.language.iso ind
dc.language.iso jav
dc.language.iso mlg
dc.language.iso mri
dc.language.iso msa
dc.language.iso pam
dc.language.iso sun
dc.language.iso tgl
dc.language.iso war
dc.language.iso swa
dc.language.iso epo
dc.language.iso ido
dc.language.iso ina
dc.language.iso vol
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.subject part of speech
dc.subject tagging
dc.subject semi-supervised
dc.subject cross-language
dc.title Deltacorpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
contact.person Daniel Zeman Charles University in Prague, ÚFAL
sponsor Grantová agentura České republiky GA15-10472S Morphologically and Syntactically Annotated Corpora of Many Languages nationalFunds
94686526 tokens
files.size 452815360
files.count 1

 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
431.84 MB
  • data
    • tgk.txt.gz (4 MB)
    • mal.txt.gz (5 MB)
    • pam.txt.gz (4 MB)
    • bos.txt.gz (4 MB)
    • jav.txt.gz (4 MB)
    • bel.txt.gz (4 MB)
    • hrv.txt.gz (4 MB)
    • ben.txt.gz (5 MB)
    • aze.txt.gz (4 MB)
    • slv.txt.gz (4 MB)
    • spa.txt.gz (4 MB)
    • fra.txt.gz (3 MB)
    • ron.txt.gz (4 MB)
    • hin.txt.gz (4 MB)
    • hat.txt.gz (3 MB)
    • war.txt.gz (2 MB)
    • dan.txt.gz (4 MB)
    • hbs.txt.gz (4 MB)
    • pol.txt.gz (4 MB)
    • kur.txt.gz (4 MB)
    • epo.txt.gz (4 MB)
    • hsb.txt.gz (216 kB)
    • lat.txt.gz (4 MB)
    • lav.txt.gz (4 MB)
    • arz.txt.gz (4 MB)
    • tam.txt.gz (5 MB)
    • nds.txt.gz (3 MB)
    • rus.txt.gz (4 MB)
    • vie.txt.gz (3 MB)
    • sqi.txt.gz (4 MB)
    • ind.txt.gz (4 MB)
    • nep.txt.gz (5 MB)
    • swe.txt.gz (4 MB)
    • vol.txt.gz (931 kB)
    • arg.txt.gz (4 MB)
    • bpy.txt.gz (5 MB)
    • guj.txt.gz (4 MB)
    • hye.txt.gz (4 MB)
    • deu.txt.gz (4 MB)
    • hif.txt.gz (4 MB)
    • msa.txt.gz (4 MB)
    • uzb.txt.gz (4 MB)
    • wln.txt.gz (671 kB)
    • fry.txt.gz (4 MB)
    • yid.txt.gz (4 MB)
    • sah.txt.gz (5 MB)
    • kor.txt.gz (5 MB)
    • diq.txt.gz (1 MB)
    • isl.txt.gz (4 MB)
    • swa.txt.gz (4 MB)
    • eus.txt.gz (4 MB)
    • cym.txt.gz (3 MB)
    • vec.txt.gz (4 MB)
    • cat.txt.gz (3 MB)
    • amh.txt.gz (37 kB)
    • urd.txt.gz (4 MB)
    • nap.txt.gz (1 MB)
    • tat.txt.gz (5 MB)
    • kaz.txt.gz (5 MB)
    • lmo.txt.gz (3 MB)
    • gsw.txt.gz (4 MB)
    • glk.txt.gz (2 MB)
    • ara.txt.gz (4 MB)
    • new.txt.gz (296 kB)
    • mon.txt.gz (4 MB)
    • eng.txt.gz (4 MB)
    • sun.txt.gz (2 MB)
    • pms.txt.gz (1 MB)
    • sco.txt.gz (4 MB)
    • tgl.txt.gz (4 MB)
    • heb.txt.gz (4 MB)
    • bul.txt.gz (4 MB)
    • tel.txt.gz (5 MB)
    • ita.txt.gz (4 MB)
    • mri.txt.gz (4 MB)
    • fas.txt.gz (4 MB)
    • kat.txt.gz (5 MB)
    • gle.txt.gz (4 MB)
    • glg.txt.gz (4 MB)
    • chv.txt.gz (67 kB)
    • ukr.txt.gz (4 MB)
    • hun.txt.gz (4 MB)
    • fao.txt.gz (4 MB)
    • lim.txt.gz (4 MB)
    • ido.txt.gz (1 MB)
    • ast.txt.gz (4 MB)
    • afr.txt.gz (3 MB)
    • gla.txt.gz (3 MB)
    • mlg.txt.gz (3 MB)
    • ina.txt.gz (3 MB)
    • mar.txt.gz (5 MB)
    • slk.txt.gz (4 MB)
    • tur.txt.gz (4 MB)
    • ltz.txt.gz (4 MB)
    • kan.txt.gz (5 MB)
    • ell.txt.gz (4 MB)
    • ces.txt.gz (4 MB)
    • bre.txt.gz (3 MB)
    • nor.txt.gz (4 MB)
    • por.txt.gz (3 MB)
    • fin.txt.gz (4 MB)
    • lit.txt.gz (4 MB)
    • srp.txt.gz (4 MB)
    • est.txt.gz (4 MB)
    • nno.txt.gz (4 MB)
    • mkd.txt.gz (4 MB)
    • nld.txt.gz (4 MB)
    • LANGUAGES.txt (5 kB)
    • README.txt (467 B)
    • POS_TAGSET.txt (567 B)
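
The per-language files in the listing are gzip-compressed text, so they can be inspected without unpacking them to disk. The following is a minimal sketch, not part of the distribution: the function name `head_lines`, the path `data/ces.txt.gz`, and the UTF-8 encoding are assumptions made for illustration, and the internal line format is not described in this record (README.txt and POS_TAGSET.txt in the archive should be consulted for that).

```python
import gzip
import os
from itertools import islice


def head_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file.

    The encoding default is an assumption; this record does not state it.
    """
    # mode="rt" makes gzip decode bytes to text while streaming,
    # so only the requested lines are decompressed.
    with gzip.open(path, mode="rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Hypothetical location of the Czech file after extracting the archive.
example_path = os.path.join("data", "ces.txt.gz")
if os.path.exists(example_path):
    for line in head_lines(example_path):
        print(line)
```

Streaming with `islice` keeps memory use low, which matters little for a single 4 MB file but helps when looping over all 107 languages.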
