Large-Scale Colloquial Persian 0.5
Please use the following text to cite this item:
Abdi Khojasteh, Hadi; Ansari, Ebrahim and Bohlouli, Mahdi, 2020,
Large-Scale Colloquial Persian 0.5, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-3195.
Date issued
2020-02-02
Size
120,000,000 sentences, 19.6 GB
Description
The "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that treats multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, annotated with dependency relations (syntactic annotation), part-of-speech tags, and sentiment polarity, together with automatic translations of the original Persian sentences into five languages (EN, CS, DE, IT, HI).
Acknowledgement
Czech Science Foundation
Project code: 19-26934X
Project name: Neural Representations in Multi-modal and Multi-lingual Modelling
This item is Publicly Available
and licensed under:
Files in this item
| Name | Size | Format | Description | MD5 |
| --- | --- | --- | --- | --- |
| lscp-0.5-fa-cs.7z | 256.74 MB | application/octet-stream | Persian-Czech - Bilingual Corpus | aa17ea609dd9f953632ab12baca868a7 |
| lscp-0.5-fa-de.7z | 277.34 MB | application/octet-stream | Persian-German - Bilingual Corpus | 6d773a7b71420715d9d22d49d1e9b671 |
| lscp-0.5-fa-en.7z | 229.74 MB | application/octet-stream | Persian-English - Bilingual Corpus | ecdd2400df014b3f7cc6671567fdb93a |
| lscp-0.5-fa-hi.7z | 302.86 MB | application/octet-stream | Persian-Hindi - Bilingual Corpus | 2c522644a47f2b1c6eaff1edc8730ec4 |
| lscp-0.5-fa.7z | 378.04 MB | application/octet-stream | Persian - Monolingual Corpus | 5eba07bcf2b644a41f2c52e00d1fd61c |
| lscp-0.5-fa-it.7z | 269.64 MB | application/octet-stream | Persian-Italian - Bilingual Corpus | 9d0b506edd6bc03d0b2f95c5142145e3 |
| lscp-0.5-fa-derivation-tree.7z | 502.63 MB | application/octet-stream | Persian - Derivation Tree | 641ce73575dd03c361e9be61ba909a29 |
| lscp-0.5-fa-normalized.7z | 328.79 MB | application/octet-stream | Persian - Normalized Monolingual Corpus | 77efc113f976acd561fa363b2ec676c8 |
| README.md | 6.04 KB | application/octet-stream | readme | 7cdb63dc4bf1038fbe132fd3234b0efd |
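The MD5 checksums listed for each file can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, not part of the dataset's own tooling: the function names `md5_of_file` and `verify_download` and the `EXPECTED_MD5` table are illustrative assumptions, and the sketch assumes each file is saved locally under the name shown in the listing.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "lscp-0.5-fa-cs.7z": "aa17ea609dd9f953632ab12baca868a7",
    "lscp-0.5-fa-de.7z": "6d773a7b71420715d9d22d49d1e9b671",
    "lscp-0.5-fa-en.7z": "ecdd2400df014b3f7cc6671567fdb93a",
    "lscp-0.5-fa-hi.7z": "2c522644a47f2b1c6eaff1edc8730ec4",
    "lscp-0.5-fa.7z": "5eba07bcf2b644a41f2c52e00d1fd61c",
    "lscp-0.5-fa-it.7z": "9d0b506edd6bc03d0b2f95c5142145e3",
    "lscp-0.5-fa-derivation-tree.7z": "641ce73575dd03c361e9be61ba909a29",
    "lscp-0.5-fa-normalized.7z": "77efc113f976acd561fa363b2ec676c8",
    "README.md": "7cdb63dc4bf1038fbe132fd3234b0efd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-hundred-megabyte archives are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the listed checksum.

    Raises KeyError if the file name is not in the listing.
    """
    path = Path(path)
    expected = EXPECTED_MD5[path.name]
    return md5_of_file(path) == expected
```

For example, `verify_download("lscp-0.5-fa-en.7z")` would return `True` only if the local archive hashes to the value listed for the Persian-English corpus. A `False` result suggests a truncated or corrupted download that should be fetched again.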

