Amharic Web Corpus

Please use the following text to cite this item or export to a predefined format:
Suchomel, Vít and Rychlý, Pavel, 2016, Amharic Web Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2587.
Date issued
2016
Size
20287250 tokens,
17320000 words,
1208926 sentences
Language(s)
Amharic
Description
Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
Acknowledgement
This item is Academic Use
and licensed under:
Files in this item
Name
am131516.vert.gz
Size
128.12 MB
Format
application/x-gzip
Description
Amharic web corpus
MD5
16c9490a9eab931e4b6b5eb6b11eb71e
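
The MD5 value above can be used to check that a download of am131516.vert.gz is complete. The following is a minimal sketch in Python, assuming the archive has been saved locally under its original name; the function names `md5_of_file` and `verify_download` are illustrative and not part of any repository tooling.

```python
import hashlib

# MD5 checksum as listed in the item record for am131516.vert.gz.
EXPECTED_MD5 = "16c9490a9eab931e4b6b5eb6b11eb71e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so the whole archive never sits in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("am131516.vert.gz")` (the local path is an assumption) returns `True` only when the digest matches the listed value; a mismatch usually points to a truncated or corrupted download.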