dc.contributor.author Ramasamy, Loganathan
dc.contributor.author Bojar, Ondřej
dc.contributor.author Žabokrtský, Zdeněk
dc.date.accessioned 2014-10-31T23:07:27Z
dc.date.available 2014-10-31T23:07:27Z
dc.date.issued 2014-10-31
dc.identifier.uri http://hdl.handle.net/11234/1-1454
dc.description EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from publicly available websites for NLP research involving Tamil. Standard preprocessing was applied to the raw web data before it was released as a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus includes texts from the Bible, cinema, and news domains.
dc.language.iso eng
dc.language.iso tam
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source.uri http://ufal.mff.cuni.cz/~ramasamy/parallel/html/
dc.subject parallel corpus
dc.title EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
dc.type corpus
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Ramasamy
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Loganathan
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName Charles University in Prague, UFAL
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#SizeInfo.size 169871
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit sentences
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email ramasamy@ufal.mff.cuni.cz
dc.rights.label PUB
hidden false
hasMetadata false
has.files yes
branding LINDAT / CLARIAH-CZ
size.info 169871 sentences
files.size 24856696
files.count 1


 Files in this item

This item is publicly available and licensed under Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).
Name en-ta-parallel-v2.tar.gz
Size 23.71 MB
Format application/x-gzip
Description EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
MD5 48c5aaf2f603ddb05b77ddd4468eab8c
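The MD5 value listed for the archive can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_file` and the local file path in the usage comment are illustrative assumptions, not part of this record.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory. `md5_of_file` is a hypothetical helper name.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive was saved under its listed name
# in the current directory):
#
#   expected = "48c5aaf2f603ddb05b77ddd4468eab8c"  # MD5 from this record
#   if md5_of_file("en-ta-parallel-v2.tar.gz") != expected:
#       raise SystemExit("checksum mismatch: download may be corrupted")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.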
 File Preview  
  • en-ta-parallel-v2
    • corpus.bcn.train.ta (70 MB)
    • corpus.bcn.train.en (22 MB)
    • corpus.bcn.dev.ta (427 kB)
    • corpus.bcn.dev.en (137 kB)
    • corpus.bcn.test.ta (863 kB)
    • corpus.bcn.test.en (274 kB)
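
The listing above pairs each `.en` file with a `.ta` file for the train, dev, and test splits. A minimal sketch of iterating over sentence pairs follows. It assumes the files are UTF-8 plain text with one sentence per line and that line N of the `.en` file corresponds to line N of the `.ta` file; the record says only that the corpus is sentence aligned, so the exact on-disk layout is an assumption. The function name `read_parallel` is hypothetical.

```python
from itertools import zip_longest


def read_parallel(en_path, ta_path):
    """Yield (english, tamil) sentence pairs from two line-aligned files.

    Assumes one sentence per line and UTF-8 encoding (not confirmed
    by the record). Raises ValueError if the files differ in length,
    since that would mean the alignment assumption does not hold.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(ta_path, encoding="utf-8") as ta_file:
        pairs = zip_longest(en_file, ta_file)
        for line_no, (en_line, ta_line) in enumerate(pairs, start=1):
            if en_line is None or ta_line is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en_line.rstrip("\n"), ta_line.rstrip("\n")


# Usage (paths assume the archive was extracted in the current directory):
#
#   for en, ta in read_parallel("en-ta-parallel-v2/corpus.bcn.dev.en",
#                               "en-ta-parallel-v2/corpus.bcn.dev.ta"):
#       print(en, "|||", ta)
```

Reading both files lazily keeps memory use low for the larger training split, and the length check surfaces a broken alignment early instead of silently truncating to the shorter file.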
