Files in this item
Download all files in item (468.16 MB)
This item is Publicly Available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
- Name: urdu-tagged-corpus.gz
- Size: 253.82 MB
- Format: application/x-gzip
- Description: Urdu Monolingual Tagged Corpus
- MD5: 63d61d9ebae592598c41a6746ec9938b

- Name: urdu-plain-text-corpus.gz
- Size: 213.46 MB
- Format: application/x-gzip
- Description: Urdu Monolingual Plain Text Corpus
- MD5: 100b1db9efd403ee677683b3268084d9

- Name: urmono-lrec-2014.pdf
- Size: 152.86 KB
- Format: (not given)
- Description: Urdu data description
- MD5: 528b61b0dd860aff9e3fe8d9b3c31b80

- Name: cleaning-tools.tar.gz
- Size: 748.74 KB
- Format: application/x-gzip
- Description: Cleaning tools
- MD5: 469377de9bbb6f900a2322547d2566d8

- cleaning-tools
- del_sentences_with_missing_spaces.pl (879 B)
- detectLanguage.pl (1 kB)
- filter_arabic_sentences.pl (619 B)
- del_invalid_utf8.pl (417 B)
- README (796 B)
- remove_repeated_chars.pl (1 kB)
- tok-dan.pl (1 kB)
- remove_sindhi_sentences.pl (857 B)
- detect_en_sentence.pl (440 B)
- langfeatures.dat (3 MB)
- convert-urNum-to-enNum.pl (754 B)
- clean-corpus.sh (4 kB)