CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

LINDAT / CLARIAH-CZ

Authors: Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina; Štindlová, Barbora; Hrdlička, Milan; Hrdličková, Tereza; Hana, Jiří; Petkevič, Vladimír; Jelínek, Tomáš; Škodová, Svatava; Janeš, Petr; Lundáková, Kateřina; Skoumalová, Hana; Sládek, Šimon; Pierscieniak, Piotr; Toufarová, Dagmar; Straka, Milan; Rosen, Alexandr; Náplava, Jakub; Poláčková, Marie

Identifier: http://hdl.handle.net/11234/1-2143

Date issued: 2017-04-30

Type: corpus, text

Size: 108067 sentences, 48 files

Languages: Czech

Description: CzeSL-GEC is a corpus of sentence pairs, each consisting of an original Czech sentence and its corrected version, collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was created from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.

Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

Keywords: natural language correction, grammatical error correction

Collection: LINDAT / CLARIAH-CZ Data & Tools

This record has been superseded by a newer version:

http://hdl.handle.net/11234/1-3057

Files in this record

License category:

Publicly Available

License: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

Name: 2017-czesl-gec.zip
Size: 5.08 MB
Format: application/zip
Description: corpus data and metadata, zipped
MD5: 49dba121e7bf8deb180e673693410cc9
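
The MD5 value listed for the archive can be used to check that a download is intact. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `verify_archive` are illustrative, and it assumes the archive has already been saved locally.

```python
import hashlib

# MD5 checksum as listed in this record for 2017-czesl-gec.zip.
EXPECTED_MD5 = "49dba121e7bf8deb180e673693410cc9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_archive("2017-czesl-gec.zip")` on the downloaded file returns `True` only when the digest matches the value in this record. MD5 is suitable here as a transfer-integrity check, not as a security guarantee.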

File preview

word2simword
- a1_targets_train.txt (524 kB)
- a1_targets_test.txt (35 kB)
- a2_targets_train.txt (326 kB)
- a1_targets_dev.txt (33 kB)
- a2_inputs_train.txt (324 kB)
- a1_inputs_test.txt (34 kB)
- a2_targets_dev.txt (33 kB)
- a1_inputs_dev.txt (32 kB)
- a2_inputs_test.txt (34 kB)
- a2_inputs_dev.txt (32 kB)
- a1_inputs_train.txt (521 kB)
- a2_targets_test.txt (35 kB)
word2words
- a1_targets_train.txt (1 MB)
- a1_targets_test.txt (73 kB)
- a2_targets_train.txt (667 kB)
- a1_targets_dev.txt (70 kB)
- a2_inputs_train.txt (647 kB)
- a1_inputs_test.txt (71 kB)
- a2_targets_dev.txt (70 kB)
- a1_inputs_dev.txt (69 kB)
- a2_inputs_test.txt (71 kB)
- a2_inputs_dev.txt (69 kB)
- a1_inputs_train.txt (1 MB)
- a2_targets_test.txt (73 kB)
word2word
- a1_targets_train.txt (598 kB)
- a1_targets_test.txt (37 kB)
- a2_targets_train.txt (368 kB)
- a1_targets_dev.txt (38 kB)
- a2_inputs_train.txt (366 kB)
- a1_inputs_test.txt (37 kB)
- a2_targets_dev.txt (38 kB)
- a1_inputs_dev.txt (38 kB)
- a2_inputs_test.txt (37 kB)
- a2_inputs_dev.txt (38 kB)
- a1_inputs_train.txt (593 kB)
- a2_targets_test.txt (37 kB)
sent2sent
- a1_targets_train.txt (1 MB)
- a1_targets_test.txt (79 kB)
- a2_targets_train.txt (653 kB)
- a1_targets_dev.txt (71 kB)
- a2_inputs_train.txt (638 kB)
- a1_inputs_test.txt (78 kB)
- a2_targets_dev.txt (71 kB)
- a1_inputs_dev.txt (70 kB)
- a2_inputs_test.txt (78 kB)
- a2_inputs_dev.txt (70 kB)
- a1_inputs_train.txt (1 MB)
- a2_targets_test.txt (79 kB)
- README.md (2 kB)
- LICENSE.txt (21 kB)
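
The file names in the preview suggest that each `*_inputs_*` file has a matching `*_targets_*` file. The sketch below shows one way such a pair might be loaded in Python. It rests on several assumptions that this record does not confirm and that should be checked against the README.md in the archive: that each file holds one sentence per line, that line N of an inputs file corresponds to line N of the matching targets file, that the files are UTF-8 encoded, and that tokens are separated by whitespace. The helper names `pair_paths` and `read_pairs` are illustrative.

```python
from pathlib import Path


def pair_paths(root, variant, prefix, split):
    """Build the (inputs, targets) paths for one file pair.

    variant: a directory name from the preview, e.g. "sent2sent".
    prefix:  "a1" or "a2", as seen in the file names.
    split:   "train", "dev", or "test".
    """
    base = Path(root) / variant
    return (
        base / f"{prefix}_inputs_{split}.txt",
        base / f"{prefix}_targets_{split}.txt",
    )


def read_pairs(inputs_path, targets_path, encoding="utf-8"):
    """Read line-aligned (original, corrected) token lists.

    Assumption: one sentence per line, aligned by line number,
    with whitespace-separated tokens.
    """
    with open(inputs_path, encoding=encoding) as f_in:
        inputs = [line.rstrip("\r\n") for line in f_in]
    with open(targets_path, encoding=encoding) as f_tgt:
        targets = [line.rstrip("\r\n") for line in f_tgt]
    if len(inputs) != len(targets):
        # A length mismatch means the line-alignment assumption does not hold.
        raise ValueError(
            f"line count mismatch: {len(inputs)} inputs vs {len(targets)} targets"
        )
    return [(src.split(), tgt.split()) for src, tgt in zip(inputs, targets)]
```

For example, `read_pairs(*pair_paths("2017-czesl-gec", "sent2sent", "a1", "dev"))` would return a list of token-list pairs if the archive was extracted into a directory named `2017-czesl-gec` (a hypothetical location) and the assumptions above hold. The explicit length check is there so that a wrong alignment assumption fails loudly instead of silently pairing unrelated sentences.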