dc.contributor.author Abdi Khojasteh, Hadi
dc.contributor.author Ansari, Ebrahim
dc.contributor.author Bohlouli, Mahdi
dc.date.accessioned 2020-03-18T10:44:43Z
dc.date.available 2020-03-18T10:44:43Z
dc.date.issued 2020-02-02
dc.identifier.uri http://hdl.handle.net/11234/1-3195
dc.description The "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that treats multi-task informal Persian language understanding as a single comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with syntactic dependency relations, part-of-speech tags, and sentiment polarity, together with automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
dc.language.iso fas
dc.language.iso eng
dc.language.iso deu
dc.language.iso ces
dc.language.iso ita
dc.language.iso hin
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.publisher Institute for Advanced Studies in Basic Sciences (IASBS)
dc.relation.isreferencedby https://arxiv.org/abs/2003.06499
dc.rights Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source.uri https://iasbs.ac.ir/~ansari/lscp/
dc.subject PoS tagging
dc.subject corpus
dc.subject annotated corpus
dc.subject multilingual
dc.subject derivation
dc.subject dependency parser
dc.subject machine translation
dc.subject informal language
dc.subject spoken language
dc.subject monolingual corpus
dc.subject bilingual corpus annotation
dc.title Large-Scale Colloquial Persian 0.5
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Ebrahim Ansari ansari@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
contact.person Ebrahim Ansari ansari@iasbs.ac.ir Institute for Advanced Studies in Basic Sciences (IASBS)
contact.person Hadi Abdi Khojasteh hadiabdikhojasteh@gmail.com Institute for Advanced Studies in Basic Sciences (IASBS)
sponsor Czech Science Foundation 19-26934X Neural Representations in Multi-modal and Multi-lingual Modelling nationalFunds
size.info 120000000 sentences
size.info 19.6 gb
files.size 2669457416
files.count 9


Files in this item

Name: README.md
Size: 6.04 KB
Format: Unknown
Description: readme
MD5: 7cdb63dc4bf1038fbe132fd3234b0efd

Name: lscp-0.5-fa.7z
Size: 378.04 MB
Format: Unknown
Description: Persian - Monolingual Corpus
MD5: 5eba07bcf2b644a41f2c52e00d1fd61c

Name: lscp-0.5-fa-normalized.7z
Size: 328.79 MB
Format: Unknown
Description: Persian - Normalized Monolingual Corpus
MD5: 77efc113f976acd561fa363b2ec676c8

Name: lscp-0.5-fa-derivation-tree.7z
Size: 502.63 MB
Format: Unknown
Description: Persian - Derivation Tree
MD5: 641ce73575dd03c361e9be61ba909a29

Name: lscp-0.5-fa-cs.7z
Size: 256.74 MB
Format: Unknown
Description: Persian-Czech - Bilingual Corpus
MD5: aa17ea609dd9f953632ab12baca868a7

Name: lscp-0.5-fa-en.7z
Size: 229.74 MB
Format: Unknown
Description: Persian-English - Bilingual Corpus
MD5: ecdd2400df014b3f7cc6671567fdb93a

Name: lscp-0.5-fa-de.7z
Size: 277.34 MB
Format: Unknown
Description: Persian-German - Bilingual Corpus
MD5: 6d773a7b71420715d9d22d49d1e9b671

Name: lscp-0.5-fa-it.7z
Size: 269.64 MB
Format: Unknown
Description: Persian-Italian - Bilingual Corpus
MD5: 9d0b506edd6bc03d0b2f95c5142145e3

Name: lscp-0.5-fa-hi.7z
Size: 302.86 MB
Format: Unknown
Description: Persian-Hindi - Bilingual Corpus
MD5: 2c522644a47f2b1c6eaff1edc8730ec4
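
The MD5 values listed for each file above can be used to check that a download completed without corruption. The following is a minimal sketch of such a check, not part of the record itself. The function names md5_of_file and verify_downloads and the directory name lscp-0.5 are hypothetical, chosen for illustration, and the sketch assumes the files were saved under their original names in one local directory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file list of this record.
EXPECTED_MD5 = {
    "README.md": "7cdb63dc4bf1038fbe132fd3234b0efd",
    "lscp-0.5-fa.7z": "5eba07bcf2b644a41f2c52e00d1fd61c",
    "lscp-0.5-fa-normalized.7z": "77efc113f976acd561fa363b2ec676c8",
    "lscp-0.5-fa-derivation-tree.7z": "641ce73575dd03c361e9be61ba909a29",
    "lscp-0.5-fa-cs.7z": "aa17ea609dd9f953632ab12baca868a7",
    "lscp-0.5-fa-en.7z": "ecdd2400df014b3f7cc6671567fdb93a",
    "lscp-0.5-fa-de.7z": "6d773a7b71420715d9d22d49d1e9b671",
    "lscp-0.5-fa-it.7z": "9d0b506edd6bc03d0b2f95c5142145e3",
    "lscp-0.5-fa-hi.7z": "2c522644a47f2b1c6eaff1edc8730ec4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "lscp-0.5" is a hypothetical download directory; adjust as needed.
    for name, status in verify_downloads("lscp-0.5").items():
        print(f"{status:9} {name}")
```

Reading in fixed-size chunks keeps memory use flat even for the larger archives. MD5 is adequate here for detecting accidental corruption, though it does not protect against deliberate tampering.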
