Indonesian web corpus
LINDAT / CLARIAH-CZ
Authors
Medveď, Marek and Suchomel, Vít
Item identifier
http://hdl.handle.net/11234/1-2970
Date issued
2019-04-02
Type
corpus, text
Size
109232712 tokens
Language(s)
Indonesian
Description
Indonesian web corpus crawled in 2010. Encoded in UTF-8, cleaned, deduplicated, tagged by Morphind.
Publisher
Masaryk University, NLP Centre
Subject(s)
Web corpus
Collection(s)
LINDAT / CLARIAH-CZ Data & Tools
Files in this item
This item is Academic Use and licensed under: NLP Centre Web Corpus License
Name
indonesianwac3_morphind_lempos.vert.7z
Size
207.88 MB
Format
Unknown
Description
vertical text
MD5
f6553682cf576b5868fa8a118d6cbd68
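The MD5 value listed above can be used to check that the archive arrived intact after download. The following is a minimal sketch, not a tool provided by the repository: the helper names `md5_of_file` and `verify_download` are illustrative, and it assumes the archive was saved locally under its listed file name.

```python
import hashlib

# MD5 checksum as listed in the item record for
# indonesianwac3_morphind_lempos.vert.7z
EXPECTED_MD5 = "f6553682cf576b5868fa8a118d6cbd68"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for an
    archive of roughly 200 MB.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("indonesianwac3_morphind_lempos.vert.7z")` returns `True` only when the local copy matches the published checksum. A mismatch usually means the download was truncated or corrupted and should be repeated.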