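dc.contributor.author Šebesta, Karel
dc.contributor.author Bedřichová, Zuzanna
dc.contributor.author Šormová, Kateřina
dc.contributor.author Štindlová, Barbora
dc.contributor.author Hrdlička, Milan
dc.contributor.author Hrdličková, Tereza
dc.contributor.author Hana, Jiří
dc.contributor.author Petkevič, Vladimír
dc.contributor.author Jelínek, Tomáš
dc.contributor.author Škodová, Svatava
dc.contributor.author Janeš, Petr
dc.contributor.author Lundáková, Kateřina
dc.contributor.author Skoumalová, Hana
dc.contributor.author Sládek, Šimon
dc.contributor.author Pierscieniak, Piotr
dc.contributor.author Toufarová, Dagmar
dc.contributor.author Straka, Milan
dc.contributor.author Rosen, Alexandr
dc.contributor.author Náplava, Jakub
dc.contributor.author Poláčková, Marie
dc.date.accessioned 2017-05-03T08:08:33Z
dc.date.available 2017-05-03T08:08:33Z
dc.date.issued 2017-04-30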
dc.description CzeSL-GEC is a corpus of sentence pairs, each consisting of the original and the corrected version of a Czech sentence, collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was created from the unreleased CzeSL-man corpus. All sentences in the corpus are word-tokenized.
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.subject natural language correction
dc.subject grammatical error correction
dc.title CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
contact.person Milan Straka Charles University, UFAL
contact.person Jakub Náplava Charles University, UFAL
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
sponsor Grantová agentura České republiky GAČR 16-10185S Čeština nerodilých mluvčích z pohledu teoretického a komputačního / Non-native Czech from the Theoretical and Computational Perspective nationalFunds
108067 sentences, 48 files
files.size 5326473
files.count 1

 Files in this item

This item is publicly available and licensed under Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).
corpus data and metadata, zipped (5.08 MB)
 File Preview  
  • word2simword
    • a1_targets_train.txt (524 kB)
    • a1_targets_test.txt (35 kB)
    • a2_targets_train.txt (326 kB)
    • a1_targets_dev.txt (33 kB)
    • a2_inputs_train.txt (324 kB)
    • a1_inputs_test.txt (34 kB)
    • a2_targets_dev.txt (33 kB)
    • a1_inputs_dev.txt (32 kB)
    • a2_inputs_test.txt (34 kB)
    • a2_inputs_dev.txt (32 kB)
    • a1_inputs_train.txt (521 kB)
    • a2_targets_test.txt (35 kB)
  • word2words
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (73 kB)
    • a2_targets_train.txt (667 kB)
    • a1_targets_dev.txt (70 kB)
    • a2_inputs_train.txt (647 kB)
    • a1_inputs_test.txt (71 kB)
    • a2_targets_dev.txt (70 kB)
    • a1_inputs_dev.txt (69 kB)
    • a2_inputs_test.txt (71 kB)
    • a2_inputs_dev.txt (69 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (73 kB)
  • word2word
    • a1_targets_train.txt (598 kB)
    • a1_targets_test.txt (37 kB)
    • a2_targets_train.txt (368 kB)
    • a1_targets_dev.txt (38 kB)
    • a2_inputs_train.txt (366 kB)
    • a1_inputs_test.txt (37 kB)
    • a2_targets_dev.txt (38 kB)
    • a1_inputs_dev.txt (38 kB)
    • a2_inputs_test.txt (37 kB)
    • a2_inputs_dev.txt (38 kB)
    • a1_inputs_train.txt (593 kB)
    • a2_targets_test.txt (37 kB)
  • sent2sent
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (79 kB)
    • a2_targets_train.txt (653 kB)
    • a1_targets_dev.txt (71 kB)
    • a2_inputs_train.txt (638 kB)
    • a1_inputs_test.txt (78 kB)
    • a2_targets_dev.txt (71 kB)
    • a1_inputs_dev.txt (70 kB)
    • a2_inputs_test.txt (78 kB)
    • a2_inputs_dev.txt (70 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (79 kB)
    • README.md (2 kB)
    • LICENSE.txt (21 kB)
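
The file names above suggest that each `*_inputs_*.txt` file has a matching `*_targets_*.txt` file holding the corrected versions. A minimal Python sketch for loading such a pair follows. It assumes, without confirmation from this record, that the two files are line-aligned (line i of the inputs file corresponds to line i of the targets file), that they are UTF-8 encoded, and that tokens are separated by whitespace. The function name `read_pairs` and the extraction directory `sent2sent` used in the demo are illustrative only; the README.md in the archive should be treated as the authoritative description of the format.

```python
from pathlib import Path
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]


def read_pairs(inputs_path: Path, targets_path: Path) -> List[Pair]:
    """Read (original, corrected) pairs from two files.

    Assumption: the files are line-aligned and whitespace-tokenized.
    """
    with open(inputs_path, encoding="utf-8") as f_in:
        inputs = [line.split() for line in f_in]
    with open(targets_path, encoding="utf-8") as f_tgt:
        targets = [line.split() for line in f_tgt]
    if len(inputs) != len(targets):
        # A length mismatch would mean the alignment assumption is wrong.
        raise ValueError(
            f"line counts differ: {len(inputs)} inputs vs {len(targets)} targets"
        )
    return list(zip(inputs, targets))


if __name__ == "__main__":
    # Hypothetical location of the unzipped archive.
    root = Path("sent2sent")
    inputs_file = root / "a1_inputs_train.txt"
    targets_file = root / "a1_targets_train.txt"
    if inputs_file.exists() and targets_file.exists():
        pairs = read_pairs(inputs_file, targets_file)
        changed = sum(1 for original, corrected in pairs if original != corrected)
        print(f"{len(pairs)} pairs, {changed} with at least one correction")
```

The explicit length check is there because a silent `zip` over files of different lengths would truncate the longer one and hide a misalignment.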
