Czech and English abstracts of ÚFAL papers (2022-11-11)
Please use the following text to cite this item:
Rosa, Rudolf and Zouhar, Vilém, 2022,
Czech and English abstracts of ÚFAL papers (2022-11-11), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-4922.
Authors
Rosa, Rudolf; Zouhar, Vilém
Item identifier
http://hdl.handle.net/11234/1-4922
Date issued
2022-11-11
Size
2659 entries,
11000 sentences,
255000 words
Description
This is a parallel corpus of Czech and mostly English abstracts of scientific papers and presentations published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague. For each publication record, the authors are obliged to provide both the original abstract (in Czech or English) and its translation (into English or Czech) in the internal Biblio system. The data was filtered to remove duplicates and records with missing entries, ensuring that every record is bilingual. Additionally, records of published papers that are indexed by Semantic Scholar contain the respective link. The dataset was created from a September 2022 image of the Biblio database and is stored in JSONL format, with each line corresponding to one record.
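Since the corpus is stored in JSONL with one record per line, it can be read line by line with a standard JSON parser. The following is a minimal sketch, not an official loader: the helper name `read_jsonl` is hypothetical, the file name `corpus.jsonl` is taken from the file listing of this item, and the sketch assumes each line is a JSON object. The field names are not documented here, so the sketch only inspects whichever keys are present.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file.

    Hypothetical helper; assumes UTF-8 text with one JSON value per line.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number}"
                ) from error


if __name__ == "__main__":
    corpus_path = Path("corpus.jsonl")  # assumed to be in the working directory
    if corpus_path.exists():
        records = list(read_jsonl(corpus_path))
        print(len(records), "records loaded")
        # Field names are not documented on this page, so list them
        # from the first record instead of assuming any.
        if records and isinstance(records[0], dict):
            print("fields:", sorted(records[0].keys()))
```

The number of records loaded can be compared with the entry count given under Size as a quick sanity check.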
Acknowledgement
Grantová agentura Univerzity Karlovy v Praze
Project code: GAUK 15723/2014
Project name: Modelování závislostní syntaxe napříč jazyky
Ministerstvo školství, mládeže a tělovýchovy České republiky
Project code: LM2018101
Project name: LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy
This item is Publicly Available
and licensed under:
Files in this item
- Name: corpus.jsonl
- Size: 3.64 MB
- Format: application/octet-stream
- Description: The corpus
- MD5: 666b8f01db3671c4db8a298ff3b8eee7
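
The MD5 checksum listed above can be used to check that a downloaded copy of the file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` and the local path `corpus.jsonl` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for corpus.jsonl on this page.
EXPECTED_MD5 = "666b8f01db3671c4db8a298ff3b8eee7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks.

    Hypothetical helper; chunked reading keeps memory use constant.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    corpus_path = Path("corpus.jsonl")  # assumed download location
    if corpus_path.exists():
        actual = md5_of_file(corpus_path)
        print("checksum matches" if actual == EXPECTED_MD5 else "checksum differs")
```

MD5 is suitable here only for detecting accidental corruption during download, not as a security guarantee.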


