W2C – Web to Corpus – tool
Please use the following text to cite this item:
Majliš, Martin, 2011,
W2C – Web to Corpus – tool, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1.
Authors
Majliš, Martin
Item identifier
http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
Date issued
2011-12-20
Description
A tool for building multilingual corpora from Wikipedia. It downloads web pages, converts them to plain text, identifies their language, etc.
A set of 120 corpora collected using this tool is available at https://ufal-point.mff.cuni.cz/xmlui/handle/11858/00-097C-0000-0022-6133-9
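The steps named in the description (download, convert to plain text, identify language) can be illustrated with a minimal Python sketch. This is not the W2C implementation: the names `TextExtractor`, `download`, `html_to_text`, `guess_language` and the tiny `STOPWORDS` table are hypothetical, and the stopword-overlap heuristic is a stand-in for whatever language identification the tool actually uses.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def download(url):
    """Fetch a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Strip markup and return the text fragments joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical toy stopword lists; a real system would need far richer models.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to"},
    "cs": {"a", "je", "se", "na", "že"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A page would pass through the three functions in order, for example `guess_language(html_to_text(download(url)))`, with the resulting text filed under the detected language.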
Files in this item
- Name: tr46.pdf
- Size: 567.11 KB
- Format: application/pdf
- Description: Adobe PDF
- MD5: 824ef862d75b40fc324d54b13a592ee1

- Name: w2c.tar.gz
- Size: 165.85 KB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 747d9fabca38d085e976950193029ca3
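The MD5 values listed above can be used to check that a downloaded file is intact. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `md5_of_file` and `verify` and the `EXPECTED` table are illustrative, and the local file paths are assumed to be wherever the files were saved.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file listing of this item.
EXPECTED = {
    "tr46.pdf": "824ef862d75b40fc324d54b13a592ee1",
    "w2c.tar.gz": "747d9fabca38d085e976950193029ca3",
}


def verify(path, name):
    """Return True if the file at `path` matches the listed MD5 for `name`."""
    return md5_of_file(path) == EXPECTED[name]
```

For example, `verify("w2c.tar.gz", "w2c.tar.gz")` should return `True` for an uncorrupted download saved under that name in the current directory.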


