dc.contributor.author Medveď, Marek
dc.contributor.author Suchomel, Vít
dc.date.accessioned 2018-01-09T15:57:37Z
dc.date.available 2018-01-09T15:57:37Z
dc.date.issued 2017-12-18
dc.identifier.uri http://hdl.handle.net/11234/1-2586
dc.description Indonesian text corpus crawled from the web by SpiderLing in 2017. Filtered by jusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd (http://septinalarasati.com/morphind/).
dc.language.iso ind
dc.publisher Natural Language Processing Centre, Faculty of Informatics, Masaryk University
dc.rights NLP Centre Web Corpus License
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.subject corpus
dc.subject lemmatization
dc.subject PoS tagging
dc.title Indonesian web corpus (idWac)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Marek Medveď xmedved1@fi.muni.cz Natural Language Processing Centre, Faculty of Informatics, Masaryk University
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
size.info 89622005 words
size.info 112017651 tokens
size.info 27050 pages
size.info 6947784 sentences
files.size 271074052
files.count 1
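
The file name idWac.vert.xz suggests an xz-compressed file in the one-token-per-line "vertical" format used by the corpus.tools pipeline. The following is a minimal sketch of how the token and sentence counts could be recomputed from such a file. It rests on assumptions this record does not confirm: UTF-8 encoding, structural markup on lines of its own in angle brackets, and sentences opened by an `<s>` tag. The function names are hypothetical, and the result is not guaranteed to match the size.info figures above.

```python
import lzma


def count_vertical(lines):
    """Count token lines and opening <s> tags in vertical-format lines.

    Assumption: a line that starts with "<", ends with ">" and contains
    no tab is structural markup; every other non-empty line is one token.
    """
    tokens = 0
    sentences = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            # Structural markup; only an opening <s> tag counts as a sentence.
            if line == "<s>" or line.startswith("<s "):
                sentences += 1
            continue
        tokens += 1
    return tokens, sentences


def count_corpus(path):
    """Stream an xz-compressed vertical file without unpacking it to disk."""
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        return count_vertical(handle)
```

Calling `count_corpus("idWac.vert.xz")` would stream the whole file, so memory use stays small even though the decompressed corpus is much larger than the download.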


 Files in this item

This item is for Academic Use and licensed under the NLP Centre Web Corpus License.
Name: idWac.vert.xz
Size: 258.52 MB
Format: application/x-xz
Description: text file
MD5: b65f7f0b82adeb81a739b577164c029f
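
The MD5 value and the files.size figure listed in this record can be used to check a downloaded copy. The following is a minimal sketch using only the Python standard library. The function names and the local path are hypothetical, and the expected values are copied from this record.

```python
import hashlib
import os

# Values copied from this record.
EXPECTED_MD5 = "b65f7f0b82adeb81a739b577164c029f"
EXPECTED_SIZE = 271074052  # bytes, from files.size


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check the cheap size test first, then the checksum."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("idWac.vert.xz")` returns True only when both the byte count and the checksum match the values above. MD5 is adequate here for detecting a corrupted or truncated download, not as protection against deliberate tampering.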