dc.contributor.author | Ramasamy, Loganathan |
dc.contributor.author | Bojar, Ondřej |
dc.contributor.author | Žabokrtský, Zdeněk |
dc.date.accessioned | 2014-10-31T23:07:27Z |
dc.date.available | 2014-10-31T23:07:27Z |
dc.date.issued | 2014-10-31 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1454 |
dc.description | EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from publicly available websites for NLP research involving Tamil. Standard preprocessing was applied to the raw web data before it was released as a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus includes texts from the Bible, cinema, and news domains. |
dc.language.iso | eng |
dc.language.iso | tam |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
dc.source.uri | http://ufal.mff.cuni.cz/~ramasamy/parallel/html/ |
dc.subject | parallel corpus |
dc.title | EnTam: An English-Tamil Parallel Corpus (EnTam v2.0) |
dc.type | corpus |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Ramasamy |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Loganathan |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | Charles University in Prague, UFAL |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 169871 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | sentences |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | ramasamy@ufal.mff.cuni.cz |
dc.rights.label | PUB |
hidden | false |
hasMetadata | false |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
size.info | 169871 sentences |
files.size | 24856696 |
files.count | 1 |
Files in this item
License category: Publicly Available
License: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
- Name: en-ta-parallel-v2.tar.gz
- Size: 23.71 MB
- Format: application/x-gzip
- Description: EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
- MD5: 48c5aaf2f603ddb05b77ddd4468eab8c
- en-ta-parallel-v2
- corpus.bcn.train.ta (70 MB)
- corpus.bcn.train.en (22 MB)
- corpus.bcn.dev.ta (427 kB)
- corpus.bcn.dev.en (137 kB)
- corpus.bcn.test.ta (863 kB)
- corpus.bcn.test.en (274 kB)
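
The listing above gives an MD5 checksum for the archive and paired `.en`/`.ta` files for each split. The following is a minimal Python sketch of how a downloaded copy might be checked and read. It rests on assumptions this record does not state: that each file is UTF-8 plain text with one sentence per line, and that line i of the `.en` file aligns with line i of the matching `.ta` file. The function names `md5_of_file` and `read_parallel` and the local paths in the usage comments are illustrative only.

```python
import hashlib
from itertools import zip_longest

# MD5 published in this record for en-ta-parallel-v2.tar.gz
EXPECTED_MD5 = "48c5aaf2f603ddb05b77ddd4468eab8c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_parallel(en_path, ta_path, encoding="utf-8"):
    """Yield (english, tamil) sentence pairs.

    Assumes one sentence per line and line-by-line alignment
    (an assumption, not something the record states).
    """
    with open(en_path, encoding=encoding) as en, \
            open(ta_path, encoding=encoding) as ta:
        for en_line, ta_line in zip_longest(en, ta):
            if en_line is None or ta_line is None:
                raise ValueError("files have different numbers of lines")
            yield en_line.rstrip("\n"), ta_line.rstrip("\n")


# Hypothetical usage, assuming the archive was downloaded and unpacked
# into the current directory:
#
#   assert md5_of_file("en-ta-parallel-v2.tar.gz") == EXPECTED_MD5
#   pairs = read_parallel("en-ta-parallel-v2/corpus.bcn.dev.en",
#                         "en-ta-parallel-v2/corpus.bcn.dev.ta")
#   for english, tamil in pairs:
#       ...
```

Raising an error on a line-count mismatch, rather than silently truncating to the shorter file, makes a damaged or partially extracted copy visible before any misaligned pairs reach downstream tools.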