dc.contributor.author | Rosa, Rudolf |
dc.date.accessioned | 2016-06-13T11:53:27Z |
dc.date.available | 2016-06-13T11:53:27Z |
dc.date.issued | 2016-06-12 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1731 |
dc.description | This is a document-aligned parallel corpus of English and Czech abstracts of scientific papers published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague, as reported in the institute's system Biblio. For each publication, the authors are obliged to provide both the original abstract in Czech or English and its translation into English or Czech, respectively. No filtering was performed, except for removing entries missing the Czech or English abstract and replacing newline and tab characters with spaces. |
dc.language.iso | ces |
dc.language.iso | eng |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation.isreplacedby | http://hdl.handle.net/11234/1-4922 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
dc.source.uri | http://ufal.mff.cuni.cz/biblio/ |
dc.subject | parallel corpus |
dc.subject | scientific texts |
dc.subject | abstracts |
dc.title | Czech and English abstracts of ÚFAL papers |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Rudolf Rosa; rosa@ufal.mff.cuni.cz; Charles University in Prague, UFAL |
sponsor | Grantová agentura Univerzity Karlovy v Praze; GAUK 15723/2014; Modelování závislostní syntaxe napříč jazyky (Modeling Dependency Syntax Across Languages); nationalFunds |
size.info | 1556 entries |
size.info | 12000 sentences |
size.info | 200000 words |
files.size | 1454989 |
files.count | 2 |
Files in this item
Name | Size | Format | Description | MD5 |
publications.tsv | 1.39 MB | Unknown | English and Czech abstracts, one pair of abstracts per line, separated by a tab character | 7b46974782ea692d80f2b7b4a78306fe |
xml2tsv.pl | 1.08 KB | Unknown | Perl script for generating the TSV file from the Biblio XML dump | 4dade72664c05943b583ba7df279b9f6 |
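
The file listing above gives an MD5 checksum for publications.tsv and describes its layout as one tab-separated pair of abstracts per line. A minimal Python sketch of how a downloaded copy might be verified and read follows. The function names `md5_of` and `read_pairs` are hypothetical, and the UTF-8 encoding and the assumption of exactly two fields per line are guesses that are not stated in this record. The record also does not say which language comes first on each line, so the sketch returns the two fields without labeling them.

```python
import hashlib

# MD5 value listed for publications.tsv in this record.
EXPECTED_MD5 = "7b46974782ea692d80f2b7b4a78306fe"


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_pairs(path, encoding="utf-8"):
    """Yield (first_field, second_field) for each line of the TSV file.

    Assumptions (not confirmed by the record): the file is UTF-8 and
    every line holds exactly two tab-separated fields.
    """
    # newline="\n" makes only "\n" end a line, so a stray "\r" inside
    # an abstract does not split an entry in two.
    with open(path, encoding=encoding, newline="\n") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 fields, got {len(fields)}"
                )
            yield fields[0], fields[1]
```

With a local copy of the file, `md5_of("publications.tsv") == EXPECTED_MD5` checks the download, and `list(read_pairs("publications.tsv"))` loads the abstract pairs. The lines are split on the tab character directly rather than through a CSV parser, since abstracts may contain quotation marks that a CSV reader could interpret as field quoting.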