
Indonesian web corpus (idWac)

Please use the following text to cite this item:
Medveď, Marek and Suchomel, Vít, 2017, Indonesian web corpus (idWac), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2586.
Date issued
2017-12-18
Size
89622005 words,
112017651 tokens,
27050 pages,
6947784 sentences
Language(s)
Indonesian
Description
Indonesian text corpus built from the web. The pages were crawled with SpiderLing in 2017 and filtered with JusText and Onion (see http://corpus.tools/ for details). The text was tagged and lemmatized with MorphInd (http://septinalarasati.com/morphind/).
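The corpus is distributed as an xz-compressed vertical file, so a first sanity check is to stream it and compare the counts with the sizes listed above. The following is a minimal sketch, not part of the record itself: it assumes the usual vertical-format conventions (one token per line, structures as XML-like tags on their own lines), that documents and sentences are marked with <doc> and <s> tags, and that the file is UTF-8. None of these details are stated in this record, and the function name count_vertical is hypothetical.

```python
import lzma


def count_vertical(path):
    """Stream an xz-compressed vertical file and count basic units.

    Assumptions (not confirmed by the record): UTF-8 text, one token per
    line, structure tags such as <doc ...> and <s> on their own lines.
    """
    tokens = 0
    sentences = 0
    documents = 0
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.rstrip("\n")
            if not line:
                continue
            # Heuristic: a line that starts with "<" and ends with ">"
            # is treated as a structure tag rather than a token.
            if line.startswith("<") and line.endswith(">"):
                if line.startswith("<s>") or line.startswith("<s "):
                    sentences += 1
                elif line.startswith("<doc"):
                    documents += 1
                continue
            tokens += 1
    return {"tokens": tokens, "sentences": sentences, "documents": documents}
```

Streaming keeps memory use flat even though the uncompressed file is much larger than the download. If the tag names in the actual file differ, only the two startswith checks need adjusting.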
Acknowledgement
This item is Academic Use
and licensed under:
 Files in this item
Name
idWac.vert.xz
Size
258.52 MB
Format
application/x-xz
Description
text file
MD5
b65f7f0b82adeb81a739b577164c029f
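The MD5 value above can be used to confirm that a download is complete and uncorrupted. A minimal sketch follows, assuming the file was saved under its listed name; the helper names md5_of_file and verify_download are hypothetical and introduced only for illustration.

```python
import hashlib

# Checksum copied from the MD5 field of this record.
EXPECTED_MD5 = "b65f7f0b82adeb81a739b577164c029f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, verify_download("idWac.vert.xz") should return True for an intact copy. MD5 is adequate here for detecting transfer errors, though it is not a safeguard against deliberate tampering.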