Oromo web corpus

Name: Oromo web corpus
License: https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC

dc.contributor.author	Suchomel, Vít
dc.contributor.author	Rychlý, Pavel
dc.date.accessioned	2018-01-11T15:31:22Z
dc.date.available	2018-01-11T15:31:22Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/11234/1-2588
dc.description	Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
dc.language.iso	orm
dc.publisher	Masaryk University, NLP Centre
dc.relation.isreferencedby	https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby	http://habit-project.eu/wiki/OromoCorpus
dc.rights	NLP Centre Web Corpus License
dc.rights.uri	https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri	http://habit-project.eu/wiki/HabitSystemFinal
dc.subject	text corpora
dc.subject	Ethiopian languages
dc.subject	Oromo
dc.subject	Web corpus
dc.subject	under-resourced language
dc.title	Oromo web corpus
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	ACA
has.files	yes
branding	LINDAT / CLARIAH-CZ
demo.uri	https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16;align=
contact.person	Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor	Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
size.info	5091696 tokens
size.info	4249953 words
size.info	250432 sentences
files.size	14649688
files.count	1

This item is licensed for Academic Use.