Amharic Web Corpus
Please use the following text to cite this item:
Suchomel, Vít and Rychlý, Pavel, 2016,
Amharic Web Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-2587.
Authors
Suchomel, Vít
Rychlý, Pavel
Item identifier
http://hdl.handle.net/11234/1-2587
Date issued
2016
Size
20,287,250 tokens
17,320,000 words
1,208,926 sentences
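The token and sentence counts above could, in principle, be rechecked against the distributed file. The following is a minimal sketch, not the procedure used to produce these figures. It assumes that am131516.vert.gz follows the common vertical format used by corpus tools such as Sketch Engine: one token per line with tab-separated attributes, and structure tags such as <doc> and <s> on lines of their own. The function name count_vertical is hypothetical. The "words" figure is not reproduced because the criterion that separates words from tokens is not stated here.

```python
import gzip
from collections import Counter


def count_vertical(path):
    """Count tokens, sentences and documents in a gzipped vertical file.

    Assumption: Sketch Engine-style vertical format, i.e. one token per
    line (tab-separated attributes) and structure tags such as <doc ...>,
    <s>, </s> on their own lines. The exact attribute columns are not
    documented on this page and are not needed for counting.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            if line.startswith("<"):
                # Structure tag: count only opening <s> and <doc> elements;
                # closing tags and glue tags such as <g/> are skipped.
                if line.startswith(("<s>", "<s ")):
                    counts["sentences"] += 1
                elif line.startswith(("<doc>", "<doc ")):
                    counts["documents"] += 1
            else:
                # Any other non-empty line is treated as one token.
                counts["tokens"] += 1
    return counts
```

A call such as count_vertical("am131516.vert.gz") streams the file line by line, so the whole corpus never has to fit in memory. If the file escapes literal "<" characters in tokens (for example as &lt;), the tag check above will not misclassify them.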
Language(s)
Amharic
Description
Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015 and January 2016. Encoded in UTF-8, cleaned and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
Acknowledgement
Norway Grants
Project code: 7F14047
Project name: Harvesting big text data for under-resourced languages (HaBiT)
Files in this item
- Name: am131516.vert.gz
- Size: 128.12 MB
- Format: application/x-gzip
- Description: Amharic web corpus
- MD5: 16c9490a9eab931e4b6b5eb6b11eb71e
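Before using the file, the download can be checked against the listed MD5 checksum. The following is a minimal Python sketch. It assumes that the checksum refers to the compressed file exactly as distributed and that the file has been saved as am131516.vert.gz. The helper names md5_of_file and verify_download are hypothetical.

```python
import hashlib

# Checksum as listed for am131516.vert.gz.
# Assumption: it is the MD5 of the .gz file as distributed, not of the
# decompressed contents.
EXPECTED_MD5 = "16c9490a9eab931e4b6b5eb6b11eb71e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Reading in 1 MiB chunks keeps memory use constant regardless of file size. Under the assumptions above, verify_download("am131516.vert.gz") should return True for an intact download.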


