Amharic Web Corpus

License: https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC

Suchomel, Vít; Rychlý, Pavel

dc.contributor.author	Suchomel, Vít
dc.contributor.author	Rychlý, Pavel
dc.date.accessioned	2018-01-11T15:29:19Z
dc.date.available	2018-01-11T15:29:19Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/11234/1-2587
dc.description	Amharic web corpus, crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
dc.language.iso	amh
dc.publisher	Masaryk University, NLP Centre
dc.relation.isreferencedby	https://link.springer.com/chapter/10.1007/978-3-319-45510-5_34
dc.relation.isreferencedby	https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby	http://habit-project.eu/wiki/AmharicCorpus
dc.rights	NLP Centre Web Corpus License
dc.rights.uri	https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri	http://habit-project.eu/wiki/HabitSystemFinal
dc.subject	Amharic
dc.subject	text corpus
dc.subject	Web corpus
dc.subject	under-resourced language
dc.subject	corpus annotation
dc.subject	morphological tagger
dc.title	Amharic Web Corpus
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	ACA
has.files	yes
branding	LINDAT / CLARIAH-CZ
demo.uri	https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16;align=
contact.person	Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor	Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
size.info	20287250 tokens
size.info	17320000 words
size.info	1208926 sentences
files.size	134348635
files.count	1
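
The size.info fields above report tokens and sentences separately. As a rough illustration of how such counts could be recomputed, the sketch below assumes the file uses the conventional "vertical" layout suggested by its .vert.gz name: one token per line with tab-separated attributes, and structures marked by XML-like tags such as <doc> and <s>. This layout is an assumption, not something the record states, and the function names are hypothetical.

```python
import gzip
from typing import Iterable, Tuple


def count_vertical(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (token_count, sentence_count) for vertical-format lines.

    Assumption: every non-empty line that is not an XML-like tag is one
    token, and each opening <s> tag starts one sentence.
    """
    tokens = 0
    sentences = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Structure tag: count sentence openings, skip everything else.
            if line == "<s>" or line.startswith("<s "):
                sentences += 1
            continue
        tokens += 1
    return tokens, sentences


def count_vertical_file(path: str) -> Tuple[int, int]:
    """Stream a gzip-compressed vertical file without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_vertical(handle)
```

If the assumption holds, count_vertical_file("am131516.vert.gz") should return values close to the token and sentence figures listed in size.info. A token that itself begins with "<" and ends with ">" would be miscounted as a tag, so treat the result as approximate.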

Files in this item

This item is for Academic Use and licensed under the NLP Centre Web Corpus License.

Name: am131516.vert.gz
Size: 128.12 MB
Format: application/x-gzip
Description: Amharic web corpus
MD5: 16c9490a9eab931e4b6b5eb6b11eb71e
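
The MD5 value and the files.size field make it possible to check a downloaded copy for corruption. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and verify_download are hypothetical, and the expected values are copied from this record.

```python
import hashlib
import os

EXPECTED_MD5 = "16c9490a9eab931e4b6b5eb6b11eb71e"  # MD5 listed in this record
EXPECTED_SIZE = 134348635  # files.size from this record, in bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte size and the MD5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; avoids hashing a truncated file
    return md5_of_file(path) == expected_md5.lower()
```

Calling verify_download("am131516.vert.gz") on the downloaded file should return True for an intact copy. MD5 is adequate for detecting accidental corruption but does not protect against deliberate tampering.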
