CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

Please use the following text to cite this item or export to a predefined format:
Šebesta, Karel; et al., 2017, CzeSL Grammatical Error Correction Dataset (CzeSL-GEC), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2143.
Date issued
2017-04-30
Size
108067 sentences, 48 files
Language(s)
Czech
Description
CzeSL-GEC is a corpus of sentence pairs, each consisting of an original Czech sentence and its corrected version. The sentences were collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was created from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.
This item is Publicly Available
and licensed under:
 Files in this item
Name
2017-czesl-gec.zip
Size
5.08 MB
Format
application/zip
Description
corpus data and metadata, zipped
MD5
49dba121e7bf8deb180e673693410cc9
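The MD5 value above can be used to check a downloaded copy of the archive. The following is a minimal sketch, assuming the file was saved locally under the name listed above; the helper names `md5_of_file` and `archive_is_intact` are illustrative and not part of the dataset.

```python
import hashlib

# MD5 checksum as listed for 2017-czesl-gec.zip in this record.
EXPECTED_MD5 = "49dba121e7bf8deb180e673693410cc9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("2017-czesl-gec.zip")` should return `True` for an undamaged download, assuming the local file name matches the one in this record.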
File preview
  • word2simword
    • a1_targets_train.txt (524 kB)
    • a1_targets_test.txt (35 kB)
    • a2_targets_train.txt (326 kB)
    • a1_targets_dev.txt (33 kB)
    • a2_inputs_train.txt (324 kB)
    • a1_inputs_test.txt (34 kB)
    • a2_targets_dev.txt (33 kB)
    • a1_inputs_dev.txt (32 kB)
    • a2_inputs_test.txt (34 kB)
    • a2_inputs_dev.txt (32 kB)
    • a1_inputs_train.txt (521 kB)
    • a2_targets_test.txt (35 kB)
  • word2words
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (73 kB)
    • a2_targets_train.txt (667 kB)
    • a1_targets_dev.txt (70 kB)
    • a2_inputs_train.txt (647 kB)
    • a1_inputs_test.txt (71 kB)
    • a2_targets_dev.txt (70 kB)
    • a1_inputs_dev.txt (69 kB)
    • a2_inputs_test.txt (71 kB)
    • a2_inputs_dev.txt (69 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (73 kB)
  • word2word
    • a1_targets_train.txt (598 kB)
    • a1_targets_test.txt (37 kB)
    • a2_targets_train.txt (368 kB)
    • a1_targets_dev.txt (38 kB)
    • a2_inputs_train.txt (366 kB)
    • a1_inputs_test.txt (37 kB)
    • a2_targets_dev.txt (38 kB)
    • a1_inputs_dev.txt (38 kB)
    • a2_inputs_test.txt (37 kB)
    • a2_inputs_dev.txt (38 kB)
    • a1_inputs_train.txt (593 kB)
    • a2_targets_test.txt (37 kB)
  • sent2sent
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (79 kB)
    • a2_targets_train.txt (653 kB)
    • a1_targets_dev.txt (71 kB)
    • a2_inputs_train.txt (638 kB)
    • a1_inputs_test.txt (78 kB)
    • a2_targets_dev.txt (71 kB)
    • a1_inputs_dev.txt (70 kB)
    • a2_inputs_test.txt (78 kB)
    • a2_inputs_dev.txt (70 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (79 kB)
    • README.md (2 kB)
    • LICENSE.txt (21 kB)
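
The paired `inputs`/`targets` files listed above suggest how the original and corrected sentences might be read together. The sketch below is only an illustration under several assumptions that this record does not confirm: that each file holds one sentence per line, that line i of an `inputs` file corresponds to line i of the matching `targets` file, that tokens are separated by spaces, and that the files are UTF-8 encoded. The function name `read_pairs` is hypothetical; the README.md in the archive should be consulted for the actual format.

```python
def read_lines(path, encoding="utf-8"):
    """Read a text file and return its lines without trailing newlines."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def read_pairs(inputs_path, targets_path, encoding="utf-8"):
    """Return a list of (original_tokens, corrected_tokens) pairs.

    Assumption: both files are line-aligned and whitespace-tokenized.
    """
    inputs = read_lines(inputs_path, encoding)
    targets = read_lines(targets_path, encoding)
    if len(inputs) != len(targets):
        raise ValueError(
            f"line count mismatch: {len(inputs)} inputs vs {len(targets)} targets"
        )
    return [(src.split(), tgt.split()) for src, tgt in zip(inputs, targets)]
```

A call such as `read_pairs("sent2sent/a1_inputs_dev.txt", "sent2sent/a1_targets_dev.txt")` would then yield one token-list pair per sentence, assuming the archive unpacks into the directories shown in the listing.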