dc.contributor.author | Suchomel, Vít
dc.contributor.author | Rychlý, Pavel
dc.date.accessioned | 2018-01-11T15:32:25Z
dc.date.available | 2018-01-11T15:32:25Z
dc.date.issued | 2016
dc.identifier.uri | http://hdl.handle.net/11234/1-2591
dc.description | Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
dc.language.iso | som
dc.publisher | Masaryk University, NLP Centre
dc.relation.isreferencedby | https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby | http://habit-project.eu/wiki/SomaliCorpus
dc.rights | NLP Centre Web Corpus License
dc.rights.uri | https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri | http://habit-project.eu/wiki/HabitSystemFinal
dc.subject | text corpora
dc.subject | Ethiopian languages
dc.subject | web corpora
dc.subject | under-resourced languages
dc.subject | Somali
dc.title | Somali Web Corpus
dc.type | corpus
metashare.ResourceInfo#ContentInfo.mediaType | text
dc.rights.label | ACA
has.files | yes
branding | LINDAT / CLARIAH-CZ
demo.uri | https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16
contact.person | Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor | Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
sponsor | Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
size.info | 79741231 tokens
size.info | 71871585 words
size.info | 2643336 sentences
files.size | 251037311
files.count | 1