dc.contributor.author Suchomel, Vít
dc.contributor.author Rychlý, Pavel
dc.date.accessioned 2018-01-11T15:29:19Z
dc.date.available 2018-01-11T15:29:19Z
dc.date.issued 2016
dc.identifier.uri http://hdl.handle.net/11234/1-2587
dc.description Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
dc.language.iso amh
dc.publisher Masaryk University, NLP Centre
dc.relation.isreferencedby https://link.springer.com/chapter/10.1007/978-3-319-45510-5_34
dc.relation.isreferencedby https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby http://habit-project.eu/wiki/AmharicCorpus
dc.rights NLP Centre Web Corpus License
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri http://habit-project.eu/wiki/HabitSystemFinal
dc.subject Amharic
dc.subject text corpus
dc.subject Web corpus
dc.subject under-resourced language
dc.subject corpus annotation
dc.subject morphological tagger
dc.title Amharic Web Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIAH-CZ
demo.uri https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16;align=
contact.person Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
size.info 20287250 tokens
size.info 17320000 words
size.info 1208926 sentences
files.size 134348635
files.count 1


 Files in this item

This item is Academic Use and licensed under: NLP Centre Web Corpus License
Name am131516.vert.gz
Size 128.12 MB
Format application/x-gzip
Description Amharic web corpus
MD5 16c9490a9eab931e4b6b5eb6b11eb71e
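
The MD5 value above can be used to check that a downloaded copy of the file is intact. The following is a minimal sketch, assuming the file has been saved locally under its listed name; the helper names `md5_of_file` and `matches_record` are hypothetical and are not part of the repository or of any corpus tool.

```python
import hashlib

# Checksum as listed in this record (MD5 field).
EXPECTED_MD5 = "16c9490a9eab931e4b6b5eb6b11eb71e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

Calling `matches_record("am131516.vert.gz")` on the downloaded file should return True if the download is complete and unmodified. The local path is an assumption and should be adjusted to wherever the file was saved.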
