AKCES-GEC Grammatical Error Correction Dataset for Czech

Please use the following text to cite this item:
Šebesta, Karel; et al., 2019, AKCES-GEC Grammatical Error Correction Dataset for Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-3057.
Date issued
2019-09-27
Size
47371 sentences,
11 files,
505275 words
Language(s)
Czech
Description
AKCES-GEC is a grammatical error correction corpus for Czech, generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format. In contrast to the CZESL-GEC dataset, this dataset contains separated edits together with their type annotations in M2 format, and it has twice as many sentences. If you use this dataset, please use the following citation:

@article{naplava2019wnut,
  title={Grammatical Error Correction in Low-Resource Scenarios},
  author={N{\'a}plava, Jakub and Straka, Milan},
  journal={arXiv preprint arXiv:1910.00353},
  year={2019}
}
Version History

Version  Date                 Summary
2*       2019-09-27 00:00:00
         2017-04-30 00:00:00
* Selected version
 Files in this item
Name
AKCES-GEC.zip
Size
3.37 MB
Format
application/zip
Description
Zip
MD5
84eb88aa9e0ec2de7626c3336d2fe005
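
The MD5 value above can be used to check that a download of AKCES-GEC.zip is intact. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify_download` and the local file path are illustrative assumptions, not part of the dataset or the repository.

```python
import hashlib

# MD5 checksum as published on this page for AKCES-GEC.zip.
EXPECTED_MD5 = "84eb88aa9e0ec2de7626c3336d2fe005"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `verify_download("AKCES-GEC.zip")` should return `True` for an uncorrupted copy.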
Preview
  • dev
    • dev.cn.m2 (204 kB)
    • dev.all.m2 (641 kB)
    • dev.r.m2 (114 kB)
    • dev.cs.m2 (321 kB)
    • README.md (3 kB)
    • license.txt (20 kB)
  • train
    • train.c.m2 (4 MB)
    • train.all.m2 (7 MB)
    • train.r.m2 (2 MB)
  • test
    • test.r.m2 (114 kB)
    • test.cs.m2 (305 kB)
    • test.cn.m2 (215 kB)
    • test.all.m2 (635 kB)
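
The .m2 files listed above can be read with a few lines of Python. The sketch below assumes the M2 layout commonly used for grammatical error correction data: an `S` line holding the tokenized sentence, followed by `A` lines of the form `start end|||type|||correction|||...|||annotator`. The class and function names are illustrative, and the sample sentence and its `Verb` label are invented for the example; they are not taken from AKCES-GEC, whose README.md should be consulted for the actual error type labels.

```python
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Edit:
    start: int          # index of the first token to replace
    end: int            # index one past the last token to replace
    error_type: str
    correction: str
    annotator: int


@dataclass
class Sentence:
    tokens: List[str]
    edits: List[Edit] = field(default_factory=list)


def parse_m2(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per 'S' line, with its 'A' lines attached."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("S "):
            if current is not None:
                yield current
            current = Sentence(tokens=line[2:].split())
        elif line.startswith("A ") and current is not None:
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            current.edits.append(
                Edit(start, end, fields[1], fields[2], int(fields[-1]))
            )
    if current is not None:
        yield current


def apply_edits(sentence: Sentence, annotator: int = 0) -> str:
    """Apply one annotator's edits and return the corrected text."""
    tokens = list(sentence.tokens)
    # Skip no-op annotations, which use negative offsets.
    edits = [e for e in sentence.edits
             if e.annotator == annotator and e.start >= 0]
    # Work right to left so earlier token offsets stay valid.
    for e in sorted(edits, key=lambda e: (e.start, e.end), reverse=True):
        tokens[e.start:e.end] = e.correction.split()
    return " ".join(tokens)


# Invented sample in M2 layout (not taken from the dataset).
SAMPLE = (
    "S This are a sentence .\n"
    "A 1 2|||Verb|||is|||REQUIRED|||-NONE-|||0\n"
    "\n"
)

sentences = list(parse_m2(io.StringIO(SAMPLE)))
print(apply_edits(sentences[0]))  # prints: This is a sentence .
```

To read a real file, pass an open text handle instead of the sample, for example `parse_m2(open("dev/dev.all.m2", encoding="utf-8"))`, assuming the archive was extracted into the current directory.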