Large-Scale Colloquial Persian 0.5

Please use the following text to cite this item:
Abdi Khojasteh, Hadi; Ansari, Ebrahim and Bohlouli, Mahdi, 2020, Large-Scale Colloquial Persian 0.5, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-3195.
Date issued
2020-02-02
Size
120,000,000 sentences, 19.6 GB
Description
The "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that treats multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with syntactic dependency relations, part-of-speech tags, and sentiment polarity, along with automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
 Files in this item
| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| lscp-0.5-fa-cs.7z | 256.74 MB | application/octet-stream | Persian-Czech - Bilingual Corpus | aa17ea609dd9f953632ab12baca868a7 |
| lscp-0.5-fa-de.7z | 277.34 MB | application/octet-stream | Persian-German - Bilingual Corpus | 6d773a7b71420715d9d22d49d1e9b671 |
| lscp-0.5-fa-en.7z | 229.74 MB | application/octet-stream | Persian-English - Bilingual Corpus | ecdd2400df014b3f7cc6671567fdb93a |
| lscp-0.5-fa-hi.7z | 302.86 MB | application/octet-stream | Persian-Hindi - Bilingual Corpus | 2c522644a47f2b1c6eaff1edc8730ec4 |
| lscp-0.5-fa.7z | 378.04 MB | application/octet-stream | Persian - Monolingual Corpus | 5eba07bcf2b644a41f2c52e00d1fd61c |
| lscp-0.5-fa-it.7z | 269.64 MB | application/octet-stream | Persian-Italian - Bilingual Corpus | 9d0b506edd6bc03d0b2f95c5142145e3 |
| lscp-0.5-fa-derivation-tree.7z | 502.63 MB | application/octet-stream | Persian - Derivation Tree | 641ce73575dd03c361e9be61ba909a29 |
| lscp-0.5-fa-normalized.7z | 328.79 MB | application/octet-stream | Persian - Normalized Monolingual Corpus | 77efc113f976acd561fa363b2ec676c8 |
| README.md | 6.04 KB | application/octet-stream | readme | 7cdb63dc4bf1038fbe132fd3234b0efd |
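The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch in Python, not part of the dataset's own tooling: the function names `md5_of_file` and `verify_download` are hypothetical, and the sketch assumes the archive was saved locally under the same file name shown in the listing.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing on this page.
EXPECTED_MD5 = {
    "lscp-0.5-fa-cs.7z": "aa17ea609dd9f953632ab12baca868a7",
    "lscp-0.5-fa-de.7z": "6d773a7b71420715d9d22d49d1e9b671",
    "lscp-0.5-fa-en.7z": "ecdd2400df014b3f7cc6671567fdb93a",
    "lscp-0.5-fa-hi.7z": "2c522644a47f2b1c6eaff1edc8730ec4",
    "lscp-0.5-fa.7z": "5eba07bcf2b644a41f2c52e00d1fd61c",
    "lscp-0.5-fa-it.7z": "9d0b506edd6bc03d0b2f95c5142145e3",
    "lscp-0.5-fa-derivation-tree.7z": "641ce73575dd03c361e9be61ba909a29",
    "lscp-0.5-fa-normalized.7z": "77efc113f976acd561fa363b2ec676c8",
    "README.md": "7cdb63dc4bf1038fbe132fd3234b0efd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a local file against the MD5 listed for its file name.

    Returns True on a match and False on a mismatch; raises KeyError
    if the file name does not appear in the listing.
    """
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of_file(path) == expected
```

Calling `verify_download("lscp-0.5-fa-en.7z")` on a downloaded archive should return `True` when the file matches the listed checksum; a `False` result suggests the download is incomplete or corrupted and should be repeated.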