EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)

Please use the following text to cite this item:
Ramasamy, Loganathan; Bojar, Ondřej and Žabokrtský, Zdeněk, 2014, EnTam: An English-Tamil Parallel Corpus (EnTam v2.0), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1454.
Date issued
2014-10-31
Size
169871 sentences
Language(s)
English, Tamil
Description
EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from publicly available websites for NLP research involving Tamil. A standard set of processing steps was applied to the raw web data before it was released as a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus includes texts from the Bible, cinema, and news domains.
This item is Publicly Available
and licensed under:
 Files in this item
Name
en-ta-parallel-v2.tar.gz
Size
23.71 MB
Format
application/x-gzip
Description
EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
MD5
48c5aaf2f603ddb05b77ddd4468eab8c
Preview
  File Preview
  • en-ta-parallel-v2
    • corpus.bcn.train.ta (70 MB)
    • corpus.bcn.train.en (22 MB)
    • corpus.bcn.dev.ta (427 kB)
    • corpus.bcn.dev.en (137 kB)
    • corpus.bcn.test.ta (863 kB)
    • corpus.bcn.test.en (274 kB)
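
The `.en` and `.ta` files come in pairs for each split, so a sentence-aligned corpus like this can be read by walking both files in step. The sketch below assumes, without confirmation from this record, that the files are UTF-8 plain text with one sentence per line and that line i of the `.en` file aligns with line i of the `.ta` file. The function name `read_parallel` is illustrative.

```python
from itertools import zip_longest


def read_parallel(en_path, ta_path, encoding="utf-8"):
    """Yield (english, tamil) sentence pairs from two line-aligned files.

    Assumes one sentence per line and UTF-8 encoding (not confirmed by
    the record). Raises ValueError if the files differ in line count.
    """
    with open(en_path, encoding=encoding) as en_file, \
         open(ta_path, encoding=encoding) as ta_file:
        pairs = zip_longest(en_file, ta_file)
        for line_no, (en, ta) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None.
            if en is None or ta is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), ta.rstrip("\n")
```

After extracting the archive, a call such as `read_parallel("en-ta-parallel-v2/corpus.bcn.dev.en", "en-ta-parallel-v2/corpus.bcn.dev.ta")` would iterate over the development split, assuming the extracted directory matches the preview listing. Raising on a length mismatch is a deliberate choice here: silently truncating to the shorter file would hide a misaligned pair.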