AKCES-GEC Grammatical Error Correction Dataset for Czech
Please use the following text to cite this item:
Šebesta, Karel; et al., 2019, AKCES-GEC Grammatical Error Correction Dataset for Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-3057.
Authors
Šebesta, Karel; et al.
Date issued
2019-09-27
Size
47371 sentences, 11 files, 505275 words
Language(s)
Czech
Description
AKCES-GEC is a grammatical error correction corpus for Czech, generated from a subset of AKCES. It contains train, dev, and test files annotated in M2 format.
Compared with the CZESL-GEC dataset, this dataset provides the edits separately, each with its error-type annotation in M2 format, and contains twice as many sentences.
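Because the files are distributed in M2 format, a short reader can help when inspecting them. The following is a minimal sketch, assuming the usual M2 layout: an `S` line holding the tokenized source sentence, followed by `A` lines of the form `start end|||type|||correction|||...|||annotator`, with sentences separated by blank lines. The sample text, the type labels `TYPE-A` and `TYPE-B`, and the helper names `parse_m2` and `apply_edits` are illustrative assumptions and are not taken from the dataset.

```python
from typing import List, NamedTuple, Tuple


class Edit(NamedTuple):
    start: int        # token offset where the edit begins
    end: int          # token offset where the edit ends (exclusive)
    error_type: str   # error-type label attached to the edit
    correction: str   # replacement text, empty for a deletion
    annotator: int    # annotator id (last field of the A line)


def parse_m2(text: str) -> List[Tuple[List[str], List[Edit]]]:
    """Parse M2-formatted text into a list of (tokens, edits) pairs."""
    sentences = []
    for block in text.strip().split("\n\n"):
        tokens, edits = [], []
        for line in block.splitlines():
            if line.startswith("S "):
                tokens = line[2:].split()
            elif line.startswith("A "):
                fields = line[2:].split("|||")
                start, end = map(int, fields[0].split())
                edits.append(
                    Edit(start, end, fields[1], fields[2], int(fields[-1]))
                )
        if tokens:
            sentences.append((tokens, edits))
    return sentences


def apply_edits(tokens: List[str], edits: List[Edit], annotator: int = 0) -> List[str]:
    """Apply one annotator's edits and return the corrected tokens."""
    corrected = list(tokens)
    # Work from right to left so earlier token offsets stay valid.
    for edit in sorted(edits, key=lambda e: (e.start, e.end), reverse=True):
        if edit.annotator != annotator or edit.start < 0:
            continue  # skip other annotators and "no change" edits (-1 -1)
        replacement = "" if edit.correction == "-NONE-" else edit.correction
        corrected[edit.start:edit.end] = replacement.split()
    return corrected


# Invented sample in M2 layout; the type labels are placeholders.
SAMPLE = """S This are a sentence .
A 1 2|||TYPE-A|||is|||REQUIRED|||-NONE-|||0

S I like cats
A 3 3|||TYPE-B|||.|||REQUIRED|||-NONE-|||0
"""

parsed = parse_m2(SAMPLE)
corrected_sentences = [" ".join(apply_edits(toks, eds)) for toks, eds in parsed]
```

To read one of the distributed files, the same function can be called on the file's contents, for example `parse_m2(open(path, encoding="utf-8").read())`, where `path` points to an extracted M2 file of your choice.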
If you use this dataset, please cite it as follows:
@article{naplava2019wnut,
title={Grammatical Error Correction in Low-Resource Scenarios},
author={N{\'a}plava, Jakub and Straka, Milan},
journal={arXiv preprint arXiv:1910.00353},
year={2019}
}
Acknowledgement
Ministry of Education, Youth and Sports of the Czech Republic (Ministerstvo školství, mládeže a tělovýchovy České republiky)
Project code: LM2015071
Project name: LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data (Institut pro analýzu, zpracování a distribuci lingvistických dat)
Czech Science Foundation (Grantová agentura České republiky)
Project code: GAČR 16-10185S
Project name: Čeština nerodilých mluvčích z pohledu teoretického a komputačního / Non-native Czech from the Theoretical and Computational Perspective
This item is Publicly Available.
Files in this item
Name: AKCES-GEC.zip
Size: 3.37 MB
Format: application/zip
Description: Zip
MD5: 84eb88aa9e0ec2de7626c3336d2fe005
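The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_md5` are illustrative, and the location of the downloaded file on disk is an assumption.

```python
import hashlib

# MD5 checksum listed for AKCES-GEC.zip on this page.
EXPECTED_MD5 = "84eb88aa9e0ec2de7626c3336d2fe005"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()
```

Assuming the archive was saved as `AKCES-GEC.zip` in the current directory, `verify_md5("AKCES-GEC.zip")` returns `True` when the file matches the listed checksum. MD5 is suitable here for detecting corrupted downloads, not as a security guarantee.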


