Urdu Monolingual Corpus
Please use the following text to cite this item:
Jawaid, Bushra; Kamran, Amir and Bojar, Ondřej, 2014,
Urdu Monolingual Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5.
Authors
Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej
Item identifier
http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Date issued
2014-03-22
Size
5464575 sentences
Language(s)
Urdu
Description
We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to choose among the tags suggested by each tagger. We run this ensemble on a large monolingual corpus and release both the plain and the tagged corpora.
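The voting step can be illustrated with a minimal Python sketch. It assumes plain per-token majority voting over the taggers' outputs, with ties (for three taggers: all three disagree) resolved in favour of one designated tagger. The actual scheme of Jawaid and Bojar (2012) may weight taggers or use other rules; the function name `vote_tags`, the `priority` parameter and the tag labels below are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(candidates: Sequence[Sequence[str]], priority: int = 0) -> List[str]:
    """Combine the tag sequences of several taggers by per-token majority vote.

    candidates[i][j] is the tag that tagger i assigned to token j. When the
    top vote count is shared by several tags, the tag of tagger `priority`
    wins if it is among them (an assumed tie-break, not necessarily the
    published one).
    """
    if not candidates:
        raise ValueError("need at least one tagger output")
    n_tokens = len(candidates[0])
    if any(len(c) != n_tokens for c in candidates):
        raise ValueError("all taggers must tag the same token sequence")

    voted: List[str] = []
    for j in range(n_tokens):
        # Count how many taggers proposed each tag for token j.
        votes = Counter(c[j] for c in candidates)
        top = max(votes.values())
        tied = [tag for tag, n in votes.items() if n == top]
        preferred = candidates[priority][j]
        # Prefer the designated tagger among the top-voted tags; otherwise take the first one.
        voted.append(preferred if preferred in tied else tied[0])
    return voted


# Hypothetical outputs of three taggers for one three-token sentence.
tagger_a = ["NN", "PSP", "VB"]
tagger_b = ["NN", "PSP", "AUX"]
tagger_c = ["PN", "CC", "VB"]
print(vote_tags([tagger_a, tagger_b, tagger_c]))  # ['NN', 'PSP', 'VB']
```

With three taggers, any tag proposed by two of them wins outright; the tie-break only matters when all three disagree, which is where a real system would likely need a more informed rule.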
Acknowledgement
European Union
Project code: FP7-ICT-2011-7-288487
Project name: MosesCore
This item is Publicly Available and licensed under:
Files in this item
- Name: urdu-tagged-corpus.gz
- Size: 253.82 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 63d61d9ebae592598c41a6746ec9938b

- Name: urdu-plain-text-corpus.gz
- Size: 213.46 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 100b1db9efd403ee677683b3268084d9

- Name: urmono-lrec-2014.pdf
- Size: 152.86 KB
- Format: application/pdf
- Description: Adobe PDF
- MD5: 528b61b0dd860aff9e3fe8d9b3c31b80

- Name: cleaning-tools.tar.gz
- Size: 748.74 KB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 469377de9bbb6f900a2322547d2566d8

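The MD5 checksums listed for the files in this item can be used to check that a download is intact. Below is a minimal sketch using only the Python standard library. The helper names (`md5_of`, `verify`, `count_lines`) are hypothetical, and `count_lines` assumes the corpora store one sentence per line, which this page does not state.

```python
import gzip
import hashlib
from pathlib import Path

# Checksums as listed on this item page.
EXPECTED_MD5 = {
    "urdu-tagged-corpus.gz": "63d61d9ebae592598c41a6746ec9938b",
    "urdu-plain-text-corpus.gz": "100b1db9efd403ee677683b3268084d9",
    "urmono-lrec-2014.pdf": "528b61b0dd860aff9e3fe8d9b3c31b80",
    "cleaning-tools.tar.gz": "469377de9bbb6f900a2322547d2566d8",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against the published checksum for its file name."""
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return md5_of(path) == expected


def count_lines(path: Path) -> int:
    """Count lines in a gzip file (assumption: one sentence per line)."""
    # Binary mode avoids failing on any malformed UTF-8 in the corpus.
    with gzip.open(path, "rb") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        p = Path(name)
        print(p.name, "OK" if verify(p) else "MISMATCH")
```

Saved as a script, running it with `urdu-plain-text-corpus.gz` as the argument should print `urdu-plain-text-corpus.gz OK` for an intact download. If the one-sentence-per-line assumption holds, `count_lines` on the plain-text corpus can then be compared with the 5464575 sentences reported above.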

