Indonesian web corpus (idWac)
Please use the following text to cite this item:
Medveď, Marek and Suchomel, Vít, 2017,
Indonesian web corpus (idWac), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-2586.
Authors
Medveď, Marek
Suchomel, Vít
Item identifier
http://hdl.handle.net/11234/1-2586
Date issued
2017-12-18
Size
89622005 words,
112017651 tokens,
27050 pages,
6947784 sentences
Language(s)
Indonesian
Description
An Indonesian text corpus built from the web. The texts were crawled with SpiderLing in 2017 and filtered with JusText and Onion (see http://corpus.tools/ for details). The corpus was tagged and lemmatized with MorphInd (http://septinalarasati.com/morphind/).
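The corpus is distributed as idWac.vert.xz. Judging by the `.vert` extension, it likely uses the "vertical" format common to corpus tools: one token per line with tab-separated annotation columns (here presumably including the MorphInd tag and lemma), and structural markup such as `<doc>` and `<s>` on lines of their own. The column order is not documented on this page, so the minimal Python sketch below treats each token as a generic tuple of columns. The format details and all function names are assumptions for illustration.

```python
import lzma
from typing import Iterable, Iterator, List, Tuple

# One token = the tab-separated columns of one line (column order is assumed unknown).
Token = Tuple[str, ...]


def sentences_from_lines(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Group vertical-format lines into sentences (hypothetical helper).

    Assumes one token per line and structural tags (<doc>, <s>, </s>, ...)
    on lines of their own; a sentence ends at a closing </s> tag.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("</s>"):
            # End of sentence: emit what was collected, start a fresh list.
            if sentence:
                yield sentence
            sentence = []
        elif line.startswith("<"):
            # Any other structural tag (<doc ...>, <s>, </doc>, ...) is skipped.
            continue
        elif line:
            sentence.append(tuple(line.split("\t")))
    if sentence:
        # Trailing tokens not closed by </s>.
        yield sentence


def read_vertical(path: str) -> Iterator[List[Token]]:
    """Stream sentences straight from the xz-compressed file without unpacking it to disk."""
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        yield from sentences_from_lines(fh)


def count_sentences_and_tokens(sentences: Iterable[List[Token]]) -> Tuple[int, int]:
    """Count sentences and token lines in a stream of sentences."""
    n_sent = n_tok = 0
    for sentence in sentences:
        n_sent += 1
        n_tok += len(sentence)
    return n_sent, n_tok
```

For example, `count_sentences_and_tokens(read_vertical("idWac.vert.xz"))` streams the whole file once. If the format assumptions hold, the result should be roughly comparable to the sentence and token counts listed under Size, although the corpus tools may define tokens and sentences differently.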
Acknowledgement
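Ministerstvo školství, mládeže a tělovýchovy České republiky (Ministry of Education, Youth and Sports of the Czech Republic)
Project code: LM2015071
Project name: LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat (Institute for the Analysis, Processing and Distribution of Linguistic Data)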
Files in this item
- Name: idWac.vert.xz
- Size: 258.52 MB
- Format: application/x-xz
- Description: text file
- MD5: b65f7f0b82adeb81a739b577164c029f
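The MD5 checksum above can be used to check that a download of idWac.vert.xz is complete and uncorrupted. The minimal Python sketch below hashes the file in chunks with the standard `hashlib` module, so the roughly 260 MB archive never has to fit in memory at once. The function names and the default chunk size are illustrative assumptions.

```python
import hashlib
import io

# Checksum published for idWac.vert.xz on this item page.
EXPECTED_MD5 = "b65f7f0b82adeb81a739b577164c029f"


def md5_of_stream(stream: io.BufferedIOBase, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks (1 MiB by default) and return the hex digest."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps reading until read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 checksum."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected.lower()
```

For example, `verify_download("idWac.vert.xz")` should return `True` for an intact copy of the file.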


