
 
dc.contributor.author Suchomel, Vít
dc.contributor.author Rychlý, Pavel
dc.date.accessioned 2018-01-11T15:31:22Z
dc.date.available 2018-01-11T15:31:22Z
dc.date.issued 2016
dc.identifier.uri http://hdl.handle.net/11234/1-2588
dc.description Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
dc.language.iso orm
dc.publisher Masaryk University, NLP Centre
dc.relation.isreferencedby https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby http://habit-project.eu/wiki/OromoCorpus
dc.rights NLP Centre Web Corpus License
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri http://habit-project.eu/wiki/HabitSystemFinal
dc.subject text corpora
dc.subject Ethiopian languages
dc.subject Oromo
dc.subject Web corpus
dc.subject under-resourced language
dc.title Oromo web corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIAH-CZ
demo.uri https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16;align=
contact.person Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
size.info 5091696 tokens
size.info 4249953 words
size.info 250432 sentences
files.size 14649688
files.count 1


 Files in this item

This item is for Academic Use and licensed under: NLP Centre Web Corpus License
Name or16.tag.vert.gz
Size 13.97 MB
Format application/x-gzip
Description Oromo web corpus
MD5 acb176438874cde24cd8d403141e6977
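The MD5 checksum and byte size listed in this record can be used to check a downloaded copy of the file. The following is a minimal sketch, not part of the record itself: the function names `md5_of_file` and `matches_record` and the local path are hypothetical, and it assumes that `files.size` is given in bytes (14649688 bytes is about 13.97 MB, which agrees with the displayed size).

```python
import hashlib
import os

# Values copied from this record.
EXPECTED_MD5 = "acb176438874cde24cd8d403141e6977"
EXPECTED_SIZE = 14649688  # files.size, assumed to be in bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if both the file size and the MD5 digest match."""
    return (os.path.getsize(path) == expected_size
            and md5_of_file(path) == expected_md5.lower())


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was saved.
    local_path = "or16.tag.vert.gz"
    if os.path.exists(local_path):
        print("OK" if matches_record(local_path) else "MISMATCH")
```

The size is compared first because it is cheap; the digest is only computed when the size already matches. MD5 is used here only because it is the checksum the record publishes; it detects corrupted downloads but is not a defence against deliberate tampering.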
