dc.contributor.author Šebesta, Karel
dc.contributor.author Bedřichová, Zuzanna
dc.contributor.author Šormová, Kateřina
dc.contributor.author Štindlová, Barbora
dc.contributor.author Hrdlička, Milan
dc.contributor.author Hrdličková, Tereza
dc.contributor.author Hana, Jiří
dc.contributor.author Petkevič, Vladimír
dc.contributor.author Jelínek, Tomáš
dc.contributor.author Škodová, Svatava
dc.contributor.author Janeš, Petr
dc.contributor.author Lundáková, Kateřina
dc.contributor.author Skoumalová, Hana
dc.contributor.author Sládek, Šimon
dc.contributor.author Pierscieniak, Piotr
dc.contributor.author Toufarová, Dagmar
dc.contributor.author Straka, Milan
dc.contributor.author Rosen, Alexandr
dc.contributor.author Náplava, Jakub
dc.contributor.author Poláčková, Marie
dc.date.accessioned 2017-05-03T08:08:33Z
dc.date.available 2017-05-03T08:08:33Z
dc.date.issued 2017-04-30
dc.identifier.uri http://hdl.handle.net/11234/1-2143
dc.description CzeSL-GEC is a corpus of sentence pairs, each consisting of an original and a corrected version of a Czech sentence, collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was derived from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreplacedby http://hdl.handle.net/11234/1-3057
dc.rights Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/3.0/
dc.subject natural language correction
dc.subject grammatical error correction
dc.title CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Milan Straka straka@ufal.mff.cuni.cz Charles University, UFAL
contact.person Jakub Náplava naplava@ufal.mff.cuni.cz Charles University, UFAL
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat / LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
sponsor Grantová agentura České republiky GAČR 16-10185S Čeština nerodilých mluvčích z pohledu teoretického a komputačního / Non-native Czech from the Theoretical and Computational Perspective nationalFunds
size.info 108067 sentences
size.info 48 files
files.size 5326473
files.count 1
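The record describes the corpus as pairs of original and corrected sentences, all word-tokenized. A minimal Python sketch of how one such pair can be compared token by token, using the standard `difflib.SequenceMatcher`, is given here. It assumes tokens are separated by whitespace; the function name `token_edits` and the example sentence pair are invented for illustration and are not taken from the corpus or any official tooling.

```python
import difflib


def token_edits(original: str, corrected: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-matching token spans between two word-tokenized sentences.

    Assumption: tokens are separated by whitespace (not documented in the record).
    Each edit is (opcode, original_tokens, corrected_tokens), where opcode is
    one of difflib's "replace", "delete" or "insert".
    """
    src = original.split()
    tgt = corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans that differ
            edits.append((tag, src[i1:i2], tgt[j1:j2]))
    return edits


if __name__ == "__main__":
    # Invented example pair ("v škole" corrected to "ve škole"), not from CzeSL-GEC.
    print(token_edits("Včera jsem byl v škole .", "Včera jsem byl ve škole ."))
    # → [('replace', ['v'], ['ve'])]
```

Working on token lists rather than raw strings keeps each reported edit aligned with the corpus tokenization.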


 Files in this item

This item is publicly available and licensed under Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), a Creative Commons license that requires attribution and share-alike distribution.
Name: 2017-czesl-gec.zip
Size: 5.08 MB
Format: application/zip
Description: corpus data and metadata, zipped
MD5: 49dba121e7bf8deb180e673693410cc9
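The published MD5 checksum can be used to check a downloaded copy of the archive. A minimal Python sketch using the standard `hashlib` module follows; the local file name `2017-czesl-gec.zip` is assumed to match the name in this record, and the helper names `md5_of_file` and `verify_download` are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksum published in this record for 2017-czesl-gec.zip.
EXPECTED_MD5 = "49dba121e7bf8deb180e673693410cc9"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("2017-czesl-gec.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum mismatch")
```

MD5 is adequate for detecting a corrupted or truncated download, but it does not protect against deliberate tampering.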
File preview (contents of 2017-czesl-gec.zip)
  • word2simword
    • a1_targets_train.txt (524 kB)
    • a1_targets_test.txt (35 kB)
    • a2_targets_train.txt (326 kB)
    • a1_targets_dev.txt (33 kB)
    • a2_inputs_train.txt (324 kB)
    • a1_inputs_test.txt (34 kB)
    • a2_targets_dev.txt (33 kB)
    • a1_inputs_dev.txt (32 kB)
    • a2_inputs_test.txt (34 kB)
    • a2_inputs_dev.txt (32 kB)
    • a1_inputs_train.txt (521 kB)
    • a2_targets_test.txt (35 kB)
  • word2words
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (73 kB)
    • a2_targets_train.txt (667 kB)
    • a1_targets_dev.txt (70 kB)
    • a2_inputs_train.txt (647 kB)
    • a1_inputs_test.txt (71 kB)
    • a2_targets_dev.txt (70 kB)
    • a1_inputs_dev.txt (69 kB)
    • a2_inputs_test.txt (71 kB)
    • a2_inputs_dev.txt (69 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (73 kB)
  • word2word
    • a1_targets_train.txt (598 kB)
    • a1_targets_test.txt (37 kB)
    • a2_targets_train.txt (368 kB)
    • a1_targets_dev.txt (38 kB)
    • a2_inputs_train.txt (366 kB)
    • a1_inputs_test.txt (37 kB)
    • a2_targets_dev.txt (38 kB)
    • a1_inputs_dev.txt (38 kB)
    • a2_inputs_test.txt (37 kB)
    • a2_inputs_dev.txt (38 kB)
    • a1_inputs_train.txt (593 kB)
    • a2_targets_test.txt (37 kB)
  • sent2sent
    • a1_targets_train.txt (1 MB)
    • a1_targets_test.txt (79 kB)
    • a2_targets_train.txt (653 kB)
    • a1_targets_dev.txt (71 kB)
    • a2_inputs_train.txt (638 kB)
    • a1_inputs_test.txt (78 kB)
    • a2_targets_dev.txt (71 kB)
    • a1_inputs_dev.txt (70 kB)
    • a2_inputs_test.txt (78 kB)
    • a2_inputs_dev.txt (70 kB)
    • a1_inputs_train.txt (1 MB)
    • a2_targets_test.txt (79 kB)
    • README.md (2 kB)
    • LICENSE.txt (21 kB)
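The file listing shows four directories (word2simword, word2words, word2word, sent2sent), each holding a1 and a2 files split into train, dev and test portions, with matching `inputs` and `targets` files. A minimal Python sketch for reading one such pair of files into (input, target) sentence pairs is given here. It assumes one sentence per line, line-by-line alignment between inputs and targets, and UTF-8 encoding; none of this is stated in the record, so check the bundled README.md, which may also document what the a1/a2 prefixes and the four directory variants mean. The function `read_pairs` and the extraction directory `2017-czesl-gec` are hypothetical.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator

# Names as they appear in the file preview of 2017-czesl-gec.zip.
VARIANTS = ("word2simword", "word2words", "word2word", "sent2sent")
PREFIXES = ("a1", "a2")
SPLITS = ("train", "dev", "test")


def read_pairs(root: Path, variant: str, prefix: str, split: str) -> Iterator[tuple[str, str]]:
    """Yield (input, target) sentence pairs for one directory, prefix and split.

    Assumptions (not stated in the record): the archive is extracted to `root`,
    each *_inputs_*.txt file is line-aligned with the matching *_targets_*.txt
    file (one sentence per line), and both files are UTF-8 encoded.
    """
    if variant not in VARIANTS or prefix not in PREFIXES or split not in SPLITS:
        raise ValueError(f"unknown combination: {variant}/{prefix}_*_{split}")
    inputs_path = root / variant / f"{prefix}_inputs_{split}.txt"
    targets_path = root / variant / f"{prefix}_targets_{split}.txt"
    with inputs_path.open(encoding="utf-8") as fin, targets_path.open(encoding="utf-8") as ftgt:
        for src, tgt in zip_longest(fin, ftgt):
            # A missing line on either side means the two files are not aligned.
            if src is None or tgt is None:
                raise ValueError(f"{inputs_path} and {targets_path} differ in line count")
            yield src.rstrip("\r\n"), tgt.rstrip("\r\n")


if __name__ == "__main__":
    root = Path("2017-czesl-gec")  # hypothetical extraction directory
    if (root / "sent2sent").is_dir():
        pairs = list(read_pairs(root, "sent2sent", "a1", "dev"))
        changed = sum(src != tgt for src, tgt in pairs)
        print(f"{len(pairs)} pairs, {changed} with corrections")
```

Using `zip_longest` instead of `zip` makes a misaligned pair of files fail loudly rather than being silently truncated to the shorter file.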
