This is not the latest version of this item; a newer version is available.
CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
Please use the following text to cite this item:
Šebesta, Karel; et al., 2017,
CzeSL Grammatical Error Correction Dataset (CzeSL-GEC), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-2143.
Authors
Šebesta, Karel; et al.
Item identifier
http://hdl.handle.net/11234/1-2143
Date issued
2017-04-30
Size
108,067 sentences, 48 files
Language(s)
Czech
Description
CzeSL-GEC is a corpus of sentence pairs, each consisting of an original Czech sentence and its corrected version. The sentences were collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was built from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.
Acknowledgement
Ministerstvo školství, mládeže a tělovýchovy České republiky
Project code: LM2015071
Project name: LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat
Grantová agentura České republiky
Project code: GAČR 16-10185S
Project name: Čeština nerodilých mluvčích z pohledu teoretického a komputačního / Non-native Czech from the Theoretical and Computational Perspective
Files in this item
- Name: 2017-czesl-gec.zip
- Size: 5.08 MB
- Format: application/zip
- Description: corpus data and metadata, zipped
- MD5: 49dba121e7bf8deb180e673693410cc9
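
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, assuming the archive has been saved in the current directory under the name shown in the listing; the helper names `md5_of_file` and `verify_archive` are hypothetical and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksum as published in the file listing for 2017-czesl-gec.zip.
EXPECTED_MD5 = "49dba121e7bf8deb180e673693410cc9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("2017-czesl-gec.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, or to a different version of the item.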

- word2simword
  - a1_targets_train.txt (524 kB)
  - a1_targets_test.txt (35 kB)
  - a2_targets_train.txt (326 kB)
  - a1_targets_dev.txt (33 kB)
  - a2_inputs_train.txt (324 kB)
  - a1_inputs_test.txt (34 kB)
  - a2_targets_dev.txt (33 kB)
  - a1_inputs_dev.txt (32 kB)
  - a2_inputs_test.txt (34 kB)
  - a2_inputs_dev.txt (32 kB)
  - a1_inputs_train.txt (521 kB)
  - a2_targets_test.txt (35 kB)
- word2words
  - a1_targets_train.txt (1 MB)
  - a1_targets_test.txt (73 kB)
  - a2_targets_train.txt (667 kB)
  - a1_targets_dev.txt (70 kB)
  - a2_inputs_train.txt (647 kB)
  - a1_inputs_test.txt (71 kB)
  - a2_targets_dev.txt (70 kB)
  - a1_inputs_dev.txt (69 kB)
  - a2_inputs_test.txt (71 kB)
  - a2_inputs_dev.txt (69 kB)
  - a1_inputs_train.txt (1 MB)
  - a2_targets_test.txt (73 kB)
- word2word
  - a1_targets_train.txt (598 kB)
  - a1_targets_test.txt (37 kB)
  - a2_targets_train.txt (368 kB)
  - a1_targets_dev.txt (38 kB)
  - a2_inputs_train.txt (366 kB)
  - a1_inputs_test.txt (37 kB)
  - a2_targets_dev.txt (38 kB)
  - a1_inputs_dev.txt (38 kB)
  - a2_inputs_test.txt (37 kB)
  - a2_inputs_dev.txt (38 kB)
  - a1_inputs_train.txt (593 kB)
  - a2_targets_test.txt (37 kB)
- sent2sent
  - a1_targets_train.txt (1 MB)
  - a1_targets_test.txt (79 kB)
  - a2_targets_train.txt (653 kB)
  - a1_targets_dev.txt (71 kB)
  - a2_inputs_train.txt (638 kB)
  - a1_inputs_test.txt (78 kB)
  - a2_targets_dev.txt (71 kB)
  - a1_inputs_dev.txt (70 kB)
  - a2_inputs_test.txt (78 kB)
  - a2_inputs_dev.txt (70 kB)
  - a1_inputs_train.txt (1 MB)
  - a2_targets_test.txt (79 kB)
- README.md (2 kB)
- LICENSE.txt (21 kB)
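
The listing pairs each `*_inputs_*.txt` file with a `*_targets_*.txt` file, which suggests how the original and corrected sentences might be loaded together. The sketch below rests on several assumptions not confirmed by this page: that each file holds one sentence per line, that line N of an inputs file corresponds to line N of the matching targets file, that the files are UTF-8 encoded, and that tokens are separated by whitespace. The function names `split_paths` and `read_pairs` and the extraction directory `2017-czesl-gec` are hypothetical; the `a1`/`a2` prefix is passed through unchanged because the listing does not say what it denotes. The bundled README.md is the place to check the actual format.

```python
from pathlib import Path


def split_paths(root, variant, prefix, split):
    """Build the (inputs, targets) file paths following the listed naming scheme.

    variant: a directory name such as "sent2sent" or "word2word"
    prefix:  "a1" or "a2" (meaning not documented on this page)
    split:   "train", "dev" or "test"
    """
    base = Path(root) / variant
    return (
        base / f"{prefix}_inputs_{split}.txt",
        base / f"{prefix}_targets_{split}.txt",
    )


def read_pairs(inputs_path, targets_path, encoding="utf-8"):
    """Read two files and pair them line by line as token lists.

    Assumes one sentence per line and line-aligned files; raises
    ValueError if the line counts differ, since the pairing would
    then be unreliable.
    """
    with open(inputs_path, encoding=encoding) as src:
        inputs = [line.rstrip("\n") for line in src]
    with open(targets_path, encoding=encoding) as tgt:
        targets = [line.rstrip("\n") for line in tgt]
    if len(inputs) != len(targets):
        raise ValueError(
            f"line count mismatch: {len(inputs)} inputs vs {len(targets)} targets"
        )
    # Assumption: tokens are whitespace-separated.
    return [(s.split(), t.split()) for s, t in zip(inputs, targets)]


if __name__ == "__main__":
    # Assumed extraction directory; adjust to the actual location.
    inputs_path, targets_path = split_paths("2017-czesl-gec", "sent2sent", "a1", "dev")
    if inputs_path.exists() and targets_path.exists():
        pairs = read_pairs(inputs_path, targets_path)
        print(f"loaded {len(pairs)} sentence pairs")
    else:
        print("corpus files not found; extract the archive first")
```

Raising an error on a line-count mismatch, rather than silently truncating with `zip`, makes a wrong alignment assumption visible immediately.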

