
Czech Proofreading Rules

Please use the following text to cite this item:
Hlaváčková, Dana; Machura, Jakub; Žižková, Hana; Kovář, Vojtěch and Nevěřilová, Zuzana, 2025, Czech Proofreading Rules, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6001.
Date issued
2025-10-19
Size
6649 entries, 175 classes
Description
The collection describes proofreading errors in Czech covered by Opravidlo 1.0. It consists of:
- the grammar rules applicable via the SET Czech syntactic parser
- a description of the grammar rules in relation to ERRANT codes
- an extended ERRANT ontology, created from the original ERRANT [Bryant et al., 2017] and its Czech extension [Náplava et al., 2022]
- a Python script that demonstrates how to apply the SET rules to proofreading

The dataset contains 6649 SET rules in the following main categories: agreement, capitals, commas, dependent clauses, non-grammatical structures, pronouns, spelling complex, and others. The error categories form a taxonomy of 175 classes in total, with Czech and English descriptions, examples, and links to ERRANT codes.
 Files in this item
Name
opravidlo_rules.zip.zip
Size
513.65 KB
Format
application/zip
Description
All data in one bundle.
MD5
1ecd9391616485d9795b2e9e59e088ec
Preview
    • other2.set.csv (567 B)
    • capitals.set.csv (1 kB)
    • README.md (7 kB)
    • errant_extended_vocabulary.rdf (72 kB)
    • pronouns.set.csv (2 kB)
    • other.set (19 kB)
    • commas.set.csv (490 B)
    • commas_morphodita.set (412 kB)
    • kdybysem.png (21 kB)
    • agreement.set (168 kB)
    • pronouns.set (668 kB)
    • spelling_complex.set (44 kB)
    • agreement.set.csv (2 kB)
    • nongramatical_structures.set.csv (424 B)
    • apply_rules.py (2 kB)
    • spelling_complex.set.csv (609 B)
    • capitals.set (488 kB)
    • other2.set (3 kB)
    • nongramatical_structures.set (17 kB)
    • commas.set (205 kB)
    • dependent_clauses.set (23 kB)
    • other.set.csv (3 kB)
    • dependent_clauses.set.csv (1 kB)
    • commas_morphodita.set.csv (307 B)
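
In the listing above, every `.set` rule file appears alongside a `.set.csv` file of the same base name. The following is a minimal sketch of how that pairing could be checked after downloading the bundle. The function names are hypothetical and are not part of the dataset, and the sketch assumes only that a description file is named by appending `.csv` to the rule file name, as the listing suggests.

```python
import zipfile


def pair_rules_with_descriptions(names):
    """Map each .set rule file to its .set.csv description, or None if absent.

    Assumption: the description file name is the rule file name plus ".csv".
    """
    available = set(names)
    pairs = {}
    for name in sorted(available):
        if name.endswith(".set"):
            csv_name = name + ".csv"
            pairs[name] = csv_name if csv_name in available else None
    return pairs


def pairs_in_bundle(zip_path):
    """Apply the pairing to the member names of a downloaded zip archive."""
    with zipfile.ZipFile(zip_path) as bundle:
        return pair_rules_with_descriptions(bundle.namelist())
```

Calling `pairs_in_bundle` with the local path of the downloaded archive returns a dictionary; any rule file mapped to `None` has no description file under this naming assumption.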
Name
errant_extended_vocabulary.rdf
Size
72.52 KB
Format
application/rdf+xml; charset=utf-8
Description
Ontology of errors that extends ERRANT [Bryant et al., 2017] and Czech ERRANT [Náplava et al., 2022].
MD5
4d9c1208b59e9e65b1a9e2be2a04c94a
Name
apply_rules.py
Size
2.22 KB
Format
application/octet-stream
Description
Example script showing how to use the SET syntactic parser for Czech together with the rules.
MD5
4e2d1a1df603c0e660d19c0b57ba9463
Name
agreement.set
Size
168.32 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
3f32d20766c5256364c730e4a73d3063
Name
capitals.set
Size
488.52 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
6e68843e857b6c8bb5f8e3ee515cbf1b
Name
commas.set
Size
205.12 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
852b3ea3e85817684e9f592d072d1811
Name
commas_morphodita.set
Size
412.48 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
024b5e50d0079f955f9f3c6fb281be1f
Name
dependent_clauses.set
Size
23.17 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
e8dd6770f7bf72032dd71713a4d77abb
Name
nongramatical_structures.set
Size
17.17 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
34db75388caf5e040f197522542deb16
Name
other.set
Size
19.23 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
d8b95e73218866fe28590b23ca231a48
Name
other2.set
Size
3.21 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
cd444aaacd9cc1a324fcd9192d36fd47
Name
pronouns.set
Size
668.7 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
28139db82376e1043154e7a038f9692b
Name
spelling_complex.set
Size
44.36 KB
Format
application/octet-stream
Description
Grammar rules for SET - syntactic parser for Czech.
MD5
31dcf401b221ab429c0ae4676be0a6e9
Name
agreement.set.csv
Size
2.81 KB
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
0a6c4c4f4cb8a682ba63b8c102eb1fbc
Name
capitals.set.csv
Size
1.67 KB
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
f96c06ed292be7d21943c955c3ceab1d
Name
commas.set.csv
Size
490 B
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
79d67d6f48e4376b8e947b879ff8418e
Name
commas_morphodita.set.csv
Size
307 B
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
255c971ce2453644566468a1d7c0e82a
Name
dependent_clauses.set.csv
Size
1.54 KB
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
f82163edc47a89a243da6b0b557d6cb9
Name
nongramatical_structures.set.csv
Size
424 B
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
6ee9ec01a4d5e235a1b8ab8f18b596e1
Name
other.set.csv
Size
3.13 KB
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
98504a7a53c665538adaec9392ffd7cd
Name
other2.set.csv
Size
567 B
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
7d8fb38a15466defb1c703fb7f52b4af
Name
pronouns.set.csv
Size
2.99 KB
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
15a13098d642b6aade728b9034cb6fc2
Name
spelling_complex.set.csv
Size
609 B
Format
text/csv
Description
Description of the rule categories, with links to the ERRANT ontology.
MD5
7bb26ab7c469e31f1c0d004ff5765ee6
Name
kdybysem.png
Size
21.54 KB
Format
image/png
Description
MD5
f9d6756f3759bcc60875223462d10df3
Name
README.md
Size
7.04 KB
Format
application/octet-stream
Description
How to use the rules.
MD5
35a896103233a62bf09e6e7b53ff908d
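
The MD5 values listed for each file can be used to check that a download arrived intact. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of_file` and `matches_listed_md5` are illustrative assumptions and are not taken from `apply_rules.py` or the README.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Compare a file's digest with a value copied from this page."""
    return md5_of_file(path) == listed_md5.strip().lower()
```

For example, `matches_listed_md5("README.md", "35a896103233a62bf09e6e7b53ff908d")` should return `True` for an unmodified copy of the README, assuming the file was saved under that name in the current directory.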