dc.contributor.author |
Suchomel, Vít |
dc.contributor.author |
Rychlý, Pavel |
dc.date.accessioned |
2018-01-11T15:29:19Z |
dc.date.available |
2018-01-11T15:29:19Z |
dc.date.issued |
2016 |
dc.identifier.uri |
http://hdl.handle.net/11234/1-2587 |
dc.description |
Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus. |
dc.language.iso |
amh |
dc.publisher |
Masaryk University, NLP Centre |
dc.relation.isreferencedby |
https://link.springer.com/chapter/10.1007/978-3-319-45510-5_34 |
dc.relation.isreferencedby |
https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf |
dc.relation.isreferencedby |
http://habit-project.eu/wiki/AmharicCorpus |
dc.rights |
NLP Centre Web Corpus License |
dc.rights.uri |
https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC |
dc.source.uri |
http://habit-project.eu/wiki/HabitSystemFinal |
dc.subject |
Amharic |
dc.subject |
text corpus |
dc.subject |
Web corpus |
dc.subject |
under-resourced language |
dc.subject |
corpus annotation |
dc.subject |
morphological tagger |
dc.title |
Amharic Web Corpus |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
dc.rights.label |
ACA |
has.files |
yes |
branding |
LINDAT / CLARIAH-CZ |
demo.uri |
https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16;align= |
contact.person |
Marie Stará, nlpassist@aurora.fi.muni.cz, Masaryk University, NLP Centre |
sponsor |
Norway Grants; 7F14047; Harvesting big text data for under-resourced languages (HaBiT); Other |
size.info |
20287250 tokens |
size.info |
17320000 words |
size.info |
1208926 sentences |
files.size |
134348635 |
files.count |
1 |