dc.description IDENTIC is an Indonesian-English parallel corpus for research purposes. The aim of this work is to build a proper Indonesian-English textual data set, make it available to researchers, and promote research on this language pair. The corpus contains texts from different sources and of different genres.
dc.description.sponsorship The research leading to these results has received funding from the European Commission’s 7th Framework Program under grant agreement no. 238405 (CLARA) and from the grant LC536 Centrum Komputacni Lingvistiky of the Czech Ministry of Education.
dc.language.iso ind
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/FP7/238405
dc.rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.subject Indonesian-English parallel corpus
dc.subject parallel corpus
dc.title IDENTICv1.0
dc.type corpus
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIN
sponsor European Union FP7-238405 CLARA (Common Language Resources and their Applications) euFunds info:eu-repo/grantAgreement/EC/FP7/238405
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LC536 Centrum komputační lingvistiky nationalFunds
files.size 16615187
files.count 1

  • IDENTICv1.0
    • en.npp.conll (23 MB)
    • identic.noclitic.npp.txt (7 MB)
    • id.npp.conll (34 MB)
    • identic.tokenized.npp.txt (7 MB)
    • identic.raw.npp.txt (7 MB)

