dc.contributor.author Rosa, Rudolf
dc.date.accessioned 2016-06-13T11:53:27Z
dc.date.available 2016-06-13T11:53:27Z
dc.date.issued 2016-06-12
dc.identifier.uri http://hdl.handle.net/11234/1-1731
dc.description This is a document-aligned parallel corpus of English and Czech abstracts of scientific papers published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague, as reported in the institute's Biblio system. For each publication, the authors are obliged to provide both the original abstract, in Czech or English, and its translation into the other language. No filtering was performed, except for removing entries that lack the Czech or the English abstract, and replacing newline and tab characters with spaces.
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.source.uri http://ufal.mff.cuni.cz/biblio/
dc.subject parallel corpus
dc.subject scientific texts
dc.subject abstracts
dc.title Czech and English abstracts of ÚFAL papers
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Rudolf Rosa rosa@ufal.mff.cuni.cz Charles University in Prague, UFAL
sponsor Grant Agency of Charles University in Prague (Grantová agentura Univerzity Karlovy v Praze) GAUK 15723/2014 Modeling dependency syntax across languages (Modelování závislostní syntaxe napříč jazyky) nationalFunds
size.info 1556 entries
size.info 12000 sentences
size.info 200000 words
files.size 1454989
files.count 2
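
The record above describes one pair of abstracts per entry, with newline and tab characters inside the abstracts already replaced by spaces, so each line of the TSV file should split cleanly on a single tab. A minimal sketch of reading such pairs follows. The function name `read_abstract_pairs` is hypothetical, and the record does not state the column order, the encoding, or whether there is a header line, so those are assumptions to check against the actual file.

```python
import io


def read_abstract_pairs(stream):
    """Return a list of (first_column, second_column) abstract pairs.

    Assumption: every valid line holds exactly two tab-separated fields.
    Which column is English and which is Czech is not stated in the record.
    """
    pairs = []
    for line in stream:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
        # Lines with any other number of fields are skipped (an assumption).
    return pairs


# In-memory sample standing in for the real file; for the downloaded file one
# would instead pass open("publications.tsv", encoding="utf-8"), assuming UTF-8.
sample = io.StringIO(
    "First abstract.\tPrvní abstrakt.\n"
    "Second abstract.\tDruhý abstrakt.\n"
)
pairs = read_abstract_pairs(sample)
print(len(pairs))
```

If the file has no header line, the number of pairs read from it would be expected to match the 1556 entries reported in size.info.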


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0); attribution is required.
Name publications.tsv
Size 1.39 MB
Format Unknown
Description English and Czech abstracts, one pair of abstracts per line, separated by a tab character
MD5 7b46974782ea692d80f2b7b4a78306fe
Name xml2tsv.pl
Size 1.08 KB
Format Unknown
Description Perl script for generating the TSV file from the Biblio XML dump
MD5 4dade72664c05943b583ba7df279b9f6
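
The MD5 values listed for the two files can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` module is given below. The helper names `md5_of_chunks` and `md5_of_file` are hypothetical, and the file paths assume the files were saved under their listed names in the current directory.

```python
import hashlib

# Checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "publications.tsv": "7b46974782ea692d80f2b7b4a78306fe",
    "xml2tsv.pl": "4dade72664c05943b583ba7df279b9f6",
}


def md5_of_chunks(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it need not fit in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_record(path, name):
    """Compare a local file against the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_record("publications.tsv", "publications.tsv")` would return `True` for an intact copy of the corpus file. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.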
