Show simple item record

 
dc.contributor.author Suchomel, Vít
dc.contributor.author Rychlý, Pavel
dc.date.accessioned 2018-01-11T15:32:25Z
dc.date.available 2018-01-11T15:32:25Z
dc.date.issued 2016
dc.identifier.uri http://hdl.handle.net/11234/1-2591
dc.description Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
dc.language.iso som
dc.publisher Masaryk University, NLP Centre
dc.relation.isreferencedby https://www.sketchengine.co.uk/wp-content/uploads/2015/05/Corpus_Factory_2010.pdf
dc.relation.isreferencedby http://habit-project.eu/wiki/SomaliCorpus
dc.rights NLP Centre Web Corpus License
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC
dc.source.uri http://habit-project.eu/wiki/HabitSystemFinal
dc.subject text corpora
dc.subject Ethiopian languages
dc.subject web corpora
dc.subject under-resourced languages
dc.subject Somali
dc.title Somali Web Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIAH-CZ
demo.uri https://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16
contact.person Marie Stará nlpassist@aurora.fi.muni.cz Masaryk University, NLP Centre
sponsor Norway Grants 7F14047 Harvesting big text data for under-resourced languages (HaBiT) Other
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
size.info 79741231 tokens
size.info 71871585 words
size.info 2643336 sentences
files.size 251037311
files.count 1


 Files in this item

This item is Academic Use and licensed under: NLP Centre Web Corpus License
Name so16.tag.vert.gz
Size 239.41 MB
Format application/x-gzip
Description Somali web corpus
MD5 fcb2c39080461a1ae2d5b57cf1905928
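The MD5 checksum and file size listed in this record can be used to check a downloaded copy of so16.tag.vert.gz. The following is a minimal Python sketch, not part of the repository's own tooling. It assumes that files.size (251037311) is the byte size of the single file, which is consistent with the displayed 239.41 MB; the function names md5_of_file and verify_download are hypothetical helpers introduced only for illustration.

```python
import hashlib
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "fcb2c39080461a1ae2d5b57cf1905928"
# Assumption: files.size is the size in bytes of the single file in this item.
EXPECTED_SIZE = 251037311


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte size and the MD5 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

For example, verify_download("so16.tag.vert.gz") should return True for an intact download placed in the current directory. The size is compared first because it is cheap and catches truncated downloads before the whole file is hashed.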
