dc.contributor.author | Suchomel, Vít
dc.contributor.author | Rychlý, Pavel
dc.date.accessioned | 2018-01-11T15:31:22Z
dc.date.available | 2018-01-11T15:31:22Z
dc.date.issued | 2016
dc.identifier.uri | http://hdl.handle.net/11234/1-2588
dc.description | Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
dc.language.iso | orm
dc.publisher | Masaryk University, NLP Centre
dc.relation.isreferencedby | https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby | http://habit-project.eu/wiki/OromoCorpus
dc.rights | NLP Centre Web Corpus License
dc.rights.uri | https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri | http://habit-project.eu/wiki/HabitSystemFinal
dc.subject | text corpora
dc.subject | Ethiopian languages
dc.subject | Oromo
dc.subject | Web corpus
dc.subject | under-resourced language
dc.title | Oromo web corpus
dc.type | corpus
metashare.ResourceInfo#ContentInfo.mediaType | text
dc.rights.label | ACA
has.files | yes
branding | LINDAT / CLARIAH-CZ
demo.uri | https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16;align=
contact.person | Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor | Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
size.info | 5091696 tokens
size.info | 4249953 words
size.info | 250432 sentences
files.size | 14649688
files.count | 1