dc.contributor.author Rosa, Rudolf
dc.date.accessioned 2016-06-13T11:53:27Z
dc.date.available 2016-06-13T11:53:27Z
dc.date.issued 2016-06-12
dc.identifier.uri http://hdl.handle.net/11234/1-1731
dc.description This is a document-aligned parallel corpus of English and Czech abstracts of scientific papers published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague, as reported in the institute's system Biblio. For each publication, the authors are obliged to provide both the original abstract, in Czech or English, and its translation into English or Czech, respectively. No filtering was performed, except for removing entries that lack the Czech or the English abstract, and replacing newline and tab characters with spaces.
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreplacedby http://hdl.handle.net/11234/1-4922
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.source.uri http://ufal.mff.cuni.cz/biblio/
dc.subject parallel corpus
dc.subject scientific texts
dc.subject abstracts
dc.title Czech and English abstracts of ÚFAL papers
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Rudolf Rosa rosa@ufal.mff.cuni.cz Charles University in Prague, UFAL
sponsor Grantová agentura Univerzity Karlovy v Praze GAUK 15723/2014 Modelování závislostní syntaxe napříč jazyky nationalFunds
size.info 1556 entries
size.info 12000 sentences
size.info 200000 words
files.size 1454989
files.count 2
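
The cleaning step named in dc.description (drop entries missing either abstract, replace newline and tab characters with spaces) can be sketched as follows. This is a minimal illustration, not the distributed xml2tsv.pl script: the entry layout (dictionaries with hypothetical "en" and "cs" keys), the English-then-Czech column order, the treatment of whitespace-only abstracts as missing, and the handling of carriage returns are all assumptions.

```python
# Hypothetical sketch of the cleaning described in dc.description.
# The "en"/"cs" keys and the column order are assumptions, not taken
# from the Biblio XML dump or from xml2tsv.pl.

# Map newline and tab to a space; carriage return is added as an assumption.
_WHITESPACE_TO_SPACE = str.maketrans({"\n": " ", "\t": " ", "\r": " "})


def clean_abstract(text):
    """Replace each newline, tab, and carriage return with a single space."""
    return text.translate(_WHITESPACE_TO_SPACE)


def build_rows(entries):
    """Return one tab-separated line per entry that has both abstracts."""
    rows = []
    for entry in entries:
        english = entry.get("en") or ""
        czech = entry.get("cs") or ""
        # Assumption: an empty or whitespace-only abstract counts as missing.
        if not english.strip() or not czech.strip():
            continue
        rows.append(clean_abstract(english) + "\t" + clean_abstract(czech))
    return rows
```

Because every tab and newline inside an abstract becomes a space, each output line holds exactly one tab, which is what makes the one-pair-per-line layout unambiguous.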


Files in this item

Download all files in this item (1.39 MB)
License category:
Publicly Available

License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name: publications.tsv
Size: 1.39 MB
Format: Unknown
Description: English and Czech abstracts, one pair of abstracts per line, separated by a tab
MD5: 7b46974782ea692d80f2b7b4a78306fe
Name: xml2tsv.pl
Size: 1.08 KB
Format: Unknown
Description: Perl script for generating the TSV file from the Biblio XML dump
MD5: 4dade72664c05943b583ba7df279b9f6
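
A downloaded copy of publications.tsv can be checked against the MD5 listed above and then loaded as abstract pairs. The sketch below is one possible way to do this, under stated assumptions: the file is UTF-8 encoded, each line holds exactly two tab-separated fields, and the function names are illustrative only. Which column is English and which is Czech is not stated in this record, so the pairs are returned in file order.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_pairs(path):
    """Read (first_field, second_field) tuples, one per non-empty line.

    Assumes UTF-8 and exactly one tab per line; raises ValueError otherwise.
    """
    pairs = []
    # newline="\n" keeps Python from splitting lines on a bare carriage return.
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 fields, got {len(fields)}"
                )
            pairs.append((fields[0], fields[1]))
    return pairs
```

For example, comparing md5_of_file("publications.tsv") with the string "7b46974782ea692d80f2b7b4a78306fe" confirms the download before read_pairs is called. The lines are split by hand rather than with the csv module so that quotation marks inside abstracts are kept as literal text.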
