dc.contributor.author | Ramasamy, Loganathan |
dc.contributor.author | Bojar, Ondřej |
dc.contributor.author | Žabokrtský, Zdeněk |
dc.date.accessioned | 2014-10-31T23:07:27Z |
dc.date.available | 2014-10-31T23:07:27Z |
dc.date.issued | 2014-10-31 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1454 |
dc.description | EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from publicly available websites for NLP research involving Tamil. Standard preprocessing was applied to the raw web data before it was released as a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus includes texts from the Bible, cinema, and news domains. |
dc.language.iso | eng |
dc.language.iso | tam |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
dc.source.uri | http://ufal.mff.cuni.cz/~ramasamy/parallel/html/ |
dc.subject | parallel corpus |
dc.title | EnTam: An English-Tamil Parallel Corpus (EnTam v2.0) |
dc.type | corpus |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Ramasamy |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Loganathan |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | Charles University in Prague, UFAL |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 169871 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | sentences |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | ramasamy@ufal.mff.cuni.cz |
dc.rights.label | PUB |
hidden | false |
hasMetadata | false |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
size.info | 169871 sentences |
files.size | 24856696 |
files.count | 1 |
Files in this item
This item is Publicly Available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
- Name: en-ta-parallel-v2.tar.gz
- Size: 23.71 MB
- Format: application/x-gzip
- Description: EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
- MD5: 48c5aaf2f603ddb05b77ddd4468eab8c
- en-ta-parallel-v2
  - corpus.bcn.train.ta (70 MB)
  - corpus.bcn.train.en (22 MB)
  - corpus.bcn.dev.ta (427 kB)
  - corpus.bcn.dev.en (137 kB)
  - corpus.bcn.test.ta (863 kB)
  - corpus.bcn.test.en (274 kB)
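
The listing above gives an MD5 checksum for the archive and paired `.en`/`.ta` files for each split. The following is a minimal Python sketch of how the download could be verified and a split read as sentence pairs. It rests on assumptions this record does not state: that the files are UTF-8 plain text with one sentence per line, and that line i of an `.en` file aligns with line i of the matching `.ta` file. The function names `md5_of_file` and `read_parallel` are hypothetical helpers, not part of any released tooling, and the local paths are placeholders.

```python
import hashlib
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple

# MD5 value as published in the file listing of this record.
EXPECTED_MD5 = "48c5aaf2f603ddb05b77ddd4468eab8c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_parallel(en_path: str, ta_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (english, tamil) pairs.

    Assumption: both files hold one sentence per line and are aligned
    line by line. A ValueError is raised if the line counts differ.
    """
    with open(en_path, encoding="utf-8") as en, \
         open(ta_path, encoding="utf-8") as ta:
        for en_line, ta_line in zip_longest(en, ta):
            if en_line is None or ta_line is None:
                raise ValueError("files have different numbers of lines")
            yield en_line.rstrip("\n"), ta_line.rstrip("\n")


if __name__ == "__main__":
    # Placeholder local paths; adjust to wherever the archive was saved
    # and extracted.
    archive = Path("en-ta-parallel-v2.tar.gz")
    if archive.exists():
        ok = md5_of_file(str(archive)) == EXPECTED_MD5
        print("checksum matches" if ok else "checksum MISMATCH")

    dev_en = Path("en-ta-parallel-v2/corpus.bcn.dev.en")
    dev_ta = Path("en-ta-parallel-v2/corpus.bcn.dev.ta")
    if dev_en.exists() and dev_ta.exists():
        for i, (en_sent, ta_sent) in enumerate(
                read_parallel(str(dev_en), str(dev_ta))):
            print(en_sent, "|||", ta_sent)
            if i == 2:  # show only the first three pairs
                break
```

Raising an error on a line-count mismatch, rather than silently truncating, is a deliberate choice in this sketch: a misaligned pair of files would otherwise produce wrong sentence pairs without any warning.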