dc.contributor.author | Bojar, Ondřej |
dc.contributor.author | Straňák, Pavel |
dc.contributor.author | Zeman, Daniel |
dc.contributor.author | Jain, Gaurav |
dc.contributor.author | Damani, Om Prakesh |
dc.date.accessioned | 2011-11-07T16:18:29Z |
dc.date.available | 2011-11-07T16:18:29Z |
dc.date.issued | 2010-05-11 |
dc.identifier | UMC002 |
dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-0001-BD17-1 |
dc.description | English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A part of the data is our patch for the Emille parallel corpus. |
dc.description.sponsorship | FP7-ICT-2007-3-231720 (EuroMatrix Plus) 7E09003 (Czech part of EM+) |
dc.language.iso | hin |
dc.language.iso | eng |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/231720 |
dc.relation.isreplacedby | http://hdl.handle.net/11858/00-097C-0000-0023-625F-0 |
dc.rights | Creative Commons - Attribution 3.0 Unported (CC BY 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/ |
dc.subject | English-Hindi parallel corpus |
dc.subject | parallel corpus |
dc.title | English-Hindi Parallel Corpus |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Pavel Straňák stranak@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | European Union FP7-ICT-2007-3-231720 EuroMatrix Plus euFunds info:eu-repo/grantAgreement/EC/FP7/231720 |
sponsor | Ministerstvo školství, mládeže a tělovýchovy České republiky 7E09003 EuroMatrixPlus – Bringing Machine Translation for European Languages to the User nationalFunds |
files.size | 12749739 |
files.count | 1 |
Files in this item
License category: Publicly Available
License: Creative Commons - Attribution 3.0 Unported (CC BY 3.0)
- Name: English-Hindi-without-Emille.tgz
- Size: 12.16 MB
- Format: application/x-gzip
- Description: The complete parallel data, including the patch for the Emille corpus
- MD5: fbe1e19c0e80fd7792e900656ce4c1a9
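The MD5 value above can be used to check a downloaded copy of the archive. The following is a minimal sketch, not part of the record itself: the helper names `md5_of_file` and `matches_record` and the local path in the usage note are hypothetical, and only the expected digest is taken from this record.

```python
import hashlib

# Digest as listed in this record for English-Hindi-without-Emille.tgz.
EXPECTED_MD5 = "fbe1e19c0e80fd7792e900656ce4c1a9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("English-Hindi-without-Emille.tgz")` should return `True` for an intact download, assuming the file was saved under that name in the current directory.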
- UMC002-English-Hindi
  - wikipedia-named-entities-2008
    - en.tok.gz (5 kB)
    - hi.tok.gz (6 kB)
  - agrocorpus
    - README (693 B)
    - en.tok.gz (17 kB)
    - hi.tok.gz (15 kB)
  - shabdanjali-dictionary
    - en.tok.gz (76 kB)
    - README (1 kB)
    - hi.tok.gz (213 kB)
    - hi.filtered.tok.gz (6 kB)
    - en.filtered.tok.gz (4 kB)
  - tides-cleaned-by-ufal
    - hi.test.tok.gz (3 MB)
    - hi.train.tok.gz (3 MB)
    - en.test.tok.gz (2 MB)
    - hi.dev.tok.gz (66 kB)
    - en.dev.tok.gz (47 kB)
    - en.train.tok.gz (2 MB)
    - README (680 B)
  - wikipedia-named-entities-2009
    - en.tok.gz (4 kB)
    - hi.tok.gz (5 kB)
  - danielpipes
    - README (51 B)
    - en.tok.gz (354 kB)
    - hi.tok.gz (319 kB)
  - acl-2005-shared-task
    - en.tok.gz (94 kB)
    - hi.tok.gz (148 kB)
  - wikipedia-named-entities-2008
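Each source directory pairs an `en.*.tok.gz` file with a matching `hi.*.tok.gz` file. Since the record describes the data as tokenized and sentence-aligned, one plausible way to read a pair is sketched below. It rests on assumptions not stated in this record: one sentence per line, UTF-8 encoding, whitespace-separated tokens, and line *i* of the English file aligned with line *i* of the Hindi file. The function name `read_parallel` is hypothetical, and the per-directory README files should be consulted for the actual format.

```python
import gzip
from itertools import zip_longest


def read_parallel(en_path, hi_path):
    """Yield (english_tokens, hindi_tokens) for each aligned line pair.

    Assumes one sentence per line, UTF-8 text, whitespace-separated tokens,
    and that line i of one file corresponds to line i of the other.
    """
    with gzip.open(en_path, "rt", encoding="utf-8") as en_fh, \
            gzip.open(hi_path, "rt", encoding="utf-8") as hi_fh:
        for line_no, (en, hi) in enumerate(zip_longest(en_fh, hi_fh), start=1):
            # zip_longest pads the shorter file with None, exposing a length mismatch.
            if en is None or hi is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.split(), hi.split()
```

A caller might iterate over `read_parallel("agrocorpus/en.tok.gz", "agrocorpus/hi.tok.gz")` after unpacking the archive. The paths are relative to the extracted `UMC002-English-Hindi` directory.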