Files in this item

Total size of all files in this item: 468.16 MB
This item is publicly available and licensed under Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).
Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike.

Name: urdu-tagged-corpus.gz
Size: 253.82 MB
Format: application/x-gzip
Description: Urdu Monolingual Tagged Corpus
MD5: 63d61d9ebae592598c41a6746ec9938b

Name: urdu-plain-text-corpus.gz
Size: 213.46 MB
Format: application/x-gzip
Description: Urdu Monolingual Plain Text Corpus
MD5: 100b1db9efd403ee677683b3268084d9
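
Both corpus files are gzip-compressed (application/x-gzip) and run to a few hundred MB each, so it can be convenient to stream them rather than unpack them to disk first. The sketch below assumes the files are UTF-8 text; the line layout (for example, whether each line holds one sentence) and the tag notation of the tagged corpus are not stated on this page, so check the data description PDF. The helper names `iter_corpus_lines` and `count_lines_and_tokens` are hypothetical, and "token" here only means a whitespace-separated string, not necessarily the corpus's own tokenization.

```python
import gzip
from typing import Iterator, Tuple


def iter_corpus_lines(path: str) -> Iterator[str]:
    """Yield lines from a gzip-compressed text file, assumed UTF-8, without the newline."""
    # "rt" decompresses and decodes on the fly, so the whole file is never held in memory
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_lines_and_tokens(path: str) -> Tuple[int, int]:
    """Count lines and whitespace-separated strings in a gzip-compressed text file."""
    n_lines = 0
    n_tokens = 0
    for line in iter_corpus_lines(path):
        n_lines += 1
        # str.split() with no argument splits on runs of whitespace
        n_tokens += len(line.split())
    return n_lines, n_tokens
```

After downloading, a call such as `count_lines_and_tokens("urdu-plain-text-corpus.gz")` walks the whole file once while keeping memory use flat.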

Name: urmono-lrec-2014.pdf
Size: 152.86 KB
Format: PDF
Description: Urdu data description
MD5: 528b61b0dd860aff9e3fe8d9b3c31b80

Name: cleaning-tools.tar.gz
Size: 748.74 KB
Format: application/x-gzip
Description: Cleaning tools
MD5: 469377de9bbb6f900a2322547d2566d8

Contents of cleaning-tools.tar.gz:
  • cleaning-tools
    • del_sentences_with_missing_spaces.pl (879 B)
    • detectLanguage.pl (1 kB)
    • filter_arabic_sentences.pl (619 B)
    • del_invalid_utf8.pl (417 B)
    • README (796 B)
    • remove_repeated_chars.pl (1 kB)
    • tok-dan.pl (1 kB)
    • remove_sindhi_sentences.pl (857 B)
    • detect_en_sentence.pl (440 B)
    • langfeatures.dat (3 MB)
    • convert-urNum-to-enNum.pl (754 B)
    • clean-corpus.sh (4 kB)
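
Each file in this item is listed with an MD5 checksum, which can be used to confirm that a download arrived intact. A minimal sketch using Python's standard `hashlib` module follows; the names `EXPECTED_MD5`, `md5_of_file` and `verify_download` are hypothetical, and the sketch assumes the downloaded files keep the names shown above.

```python
import hashlib

# MD5 values as listed for each file in this item
EXPECTED_MD5 = {
    "urdu-tagged-corpus.gz": "63d61d9ebae592598c41a6746ec9938b",
    "urdu-plain-text-corpus.gz": "100b1db9efd403ee677683b3268084d9",
    "urmono-lrec-2014.pdf": "528b61b0dd860aff9e3fe8d9b3c31b80",
    "cleaning-tools.tar.gz": "469377de9bbb6f900a2322547d2566d8",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading 1 MiB at a time avoids loading multi-hundred-MB archives into memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Return True if the file's MD5 digest matches the expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("urdu-tagged-corpus.gz", EXPECTED_MD5["urdu-tagged-corpus.gz"])` should return `True` for an undamaged copy. MD5 is adequate for catching transfer errors but is not a defense against deliberate tampering.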