CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

Name: CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
License: http://creativecommons.org/licenses/by-sa/3.0/

Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina; Štindlová, Barbora; Hrdlička, Milan; Hrdličková, Tereza; Hana, Jiří; Petkevič, Vladimír; Jelínek, Tomáš; Škodová, Svatava; Janeš, Petr; Lundáková, Kateřina; Skoumalová, Hana; Sládek, Šimon; Pierscieniak, Piotr; Toufarová, Dagmar; Straka, Milan; Rosen, Alexandr; Náplava, Jakub; Poláčková, Marie


dc.contributor.author	Šebesta, Karel
dc.contributor.author	Bedřichová, Zuzanna
dc.contributor.author	Šormová, Kateřina
dc.contributor.author	Štindlová, Barbora
dc.contributor.author	Hrdlička, Milan
dc.contributor.author	Hrdličková, Tereza
dc.contributor.author	Hana, Jiří
dc.contributor.author	Petkevič, Vladimír
dc.contributor.author	Jelínek, Tomáš
dc.contributor.author	Škodová, Svatava
dc.contributor.author	Janeš, Petr
dc.contributor.author	Lundáková, Kateřina
dc.contributor.author	Skoumalová, Hana
dc.contributor.author	Sládek, Šimon
dc.contributor.author	Pierscieniak, Piotr
dc.contributor.author	Toufarová, Dagmar
dc.contributor.author	Straka, Milan
dc.contributor.author	Rosen, Alexandr
dc.contributor.author	Náplava, Jakub
dc.contributor.author	Poláčková, Marie
dc.date.accessioned	2017-05-03T08:08:33Z
dc.date.available	2017-05-03T08:08:33Z
dc.date.issued	2017-04-30
dc.identifier.uri	http://hdl.handle.net/11234/1-2143
dc.description	CzeSL-GEC is a corpus of sentence pairs, each consisting of an original Czech sentence and its corrected version, collected from essays written by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was created from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.
dc.language.iso	ces
dc.publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreplacedby	http://hdl.handle.net/11234/1-3057
dc.rights	Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.rights.uri	http://creativecommons.org/licenses/by-sa/3.0/
dc.subject	natural language correction
dc.subject	grammatical error correction
dc.title	CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	PUB
has.files	yes
branding	LINDAT / CLARIAH-CZ
contact.person	Milan Straka straka@ufal.mff.cuni.cz Charles University, UFAL
contact.person	Jakub Náplava naplava@ufal.mff.cuni.cz Charles University, UFAL
sponsor	Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
sponsor	Grantová agentura České republiky GAČR 16-10185S Čeština nerodilých mluvčích z pohledu teoretického a komputačního / Non-native Czech from the Theoretical and Computational Perspective nationalFunds
size.info	108067 sentences
size.info	48 files
files.size	5326473
files.count	1

Files in this item

This item is publicly available and licensed under:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

Name: 2017-czesl-gec.zip
Size: 5.08 MB
Format: application/zip
Description: corpus data and metadata, zipped
MD5: 49dba121e7bf8deb180e673693410cc9
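
The MD5 value above can be used to check a downloaded copy of the archive. The following is a minimal sketch, not part of the record itself: the function names `md5_of_file` and `matches_record` are hypothetical, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib

# MD5 checksum as listed in this record for 2017-czesl-gec.zip.
EXPECTED_MD5 = "49dba121e7bf8deb180e673693410cc9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the checksum from the record."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("2017-czesl-gec.zip")` would return `True` for an intact download saved under that name in the current directory.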


File Preview

word2simword
- a1_targets_train.txt (524 kB)
- a1_targets_test.txt (35 kB)
- a2_targets_train.txt (326 kB)
- a1_targets_dev.txt (33 kB)
- a2_inputs_train.txt (324 kB)
- a1_inputs_test.txt (34 kB)
- a2_targets_dev.txt (33 kB)
- a1_inputs_dev.txt (32 kB)
- a2_inputs_test.txt (34 kB)
- a2_inputs_dev.txt (32 kB)
- a1_inputs_train.txt (521 kB)
- a2_targets_test.txt (35 kB)
word2words
- a1_targets_train.txt (1 MB)
- a1_targets_test.txt (73 kB)
- a2_targets_train.txt (667 kB)
- a1_targets_dev.txt (70 kB)
- a2_inputs_train.txt (647 kB)
- a1_inputs_test.txt (71 kB)
- a2_targets_dev.txt (70 kB)
- a1_inputs_dev.txt (69 kB)
- a2_inputs_test.txt (71 kB)
- a2_inputs_dev.txt (69 kB)
- a1_inputs_train.txt (1 MB)
- a2_targets_test.txt (73 kB)
word2word
- a1_targets_train.txt (598 kB)
- a1_targets_test.txt (37 kB)
- a2_targets_train.txt (368 kB)
- a1_targets_dev.txt (38 kB)
- a2_inputs_train.txt (366 kB)
- a1_inputs_test.txt (37 kB)
- a2_targets_dev.txt (38 kB)
- a1_inputs_dev.txt (38 kB)
- a2_inputs_test.txt (37 kB)
- a2_inputs_dev.txt (38 kB)
- a1_inputs_train.txt (593 kB)
- a2_targets_test.txt (37 kB)
sent2sent
- a1_targets_train.txt (1 MB)
- a1_targets_test.txt (79 kB)
- a2_targets_train.txt (653 kB)
- a1_targets_dev.txt (71 kB)
- a2_inputs_train.txt (638 kB)
- a1_inputs_test.txt (78 kB)
- a2_targets_dev.txt (71 kB)
- a1_inputs_dev.txt (70 kB)
- a2_inputs_test.txt (78 kB)
- a2_inputs_dev.txt (70 kB)
- a1_inputs_train.txt (1 MB)
- a2_targets_test.txt (79 kB)
- README.md (2 kB)
- LICENSE.txt (21 kB)
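
The listing pairs each `*_inputs_*.txt` file with a `*_targets_*.txt` file of the same prefix and split. A minimal sketch of reading one such pair follows. It rests on assumptions not confirmed by this record: that the files are UTF-8, hold one sentence per line, and are line-aligned (line i of the inputs file is the original of line i in the targets file). The helper names `pair_paths` and `read_pairs` are hypothetical, and the `a1`/`a2` prefixes and directory names are treated as opaque labels because their meaning is not stated here.

```python
from pathlib import Path


def pair_paths(root, variant, prefix, split):
    """Build (inputs, targets) paths following the naming pattern in the listing.

    variant: directory name, e.g. "sent2sent"
    prefix:  "a1" or "a2"
    split:   "train", "dev" or "test"
    """
    base = Path(root) / variant
    return (
        base / f"{prefix}_inputs_{split}.txt",
        base / f"{prefix}_targets_{split}.txt",
    )


def read_pairs(inputs_path, targets_path):
    """Return a list of (original_tokens, corrected_tokens) pairs.

    Assumption: both files are line-aligned, one sentence per line.
    Sentences are word-tokenized, so splitting on whitespace yields tokens.
    """
    with open(inputs_path, encoding="utf-8") as f:
        inputs = [line.rstrip("\n") for line in f]
    with open(targets_path, encoding="utf-8") as f:
        targets = [line.rstrip("\n") for line in f]
    if len(inputs) != len(targets):
        # A length mismatch would mean the alignment assumption does not hold.
        raise ValueError(
            f"line count mismatch: {len(inputs)} inputs vs {len(targets)} targets"
        )
    return [(src.split(), tgt.split()) for src, tgt in zip(inputs, targets)]
```

With the archive unpacked into a directory of your choosing, `read_pairs(*pair_paths(root, "sent2sent", "a1", "dev"))` would load one development split. The README.md shipped in the archive should be consulted to confirm the alignment assumption before relying on it.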
