EVALD 4.0 for Foreigners

 

EVALD 4.0 for Foreigners is software for the automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech. Two related versions exist: EVALD 4.0, designed for native speakers of Czech, and EVALD 4.0 for Beginners, designed for beginning non-native speakers of Czech. For more information, visit the project web pages.

The software is designed to assess the surface coherence of authentic writing samples (essays) written by non-native speakers of Czech. In other words, it is trained to evaluate prose texts whose content and form correspond to a typical essay, that is, a comprehensive piece of writing on a given topic, such as one written during a Czech language exam. For other types of text (poems, journalistic texts, etc.), the software may not work reliably. The minimum length of the submitted text is 300 words: shorter texts do not provide enough linguistic material to reveal the actual level of the text, so their evaluation may be inaccurate.
The evaluated texts may be used for scientific purposes and may be freely distributed and published.
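
The 300-word minimum mentioned above can be checked before a text is submitted. The following is only a minimal sketch: the function names are hypothetical, and counting whitespace-separated tokens is an assumption, since this page does not say how EVALD itself counts words.

```python
MIN_WORDS = 300  # minimum length stated in the description above


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simplistic notion of a word)."""
    return len(text.split())


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text reaches the minimum word count."""
    return count_words(text) >= min_words
```

For example, `count_words("Dobrý den, jak se máte?")` gives 5, so such a text would be far too short for a reliable evaluation.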



Explanatory notes

Evaluation scale for surface text coherence:

A1 | basic user of Czech – lower level
A2 | basic user of Czech – higher level
B1 | independent user of Czech – lower level
B2 | independent user of Czech – higher level
C1 | proficient user of Czech – lower level
C2 | proficient user of Czech – higher level

Readability measures:

Flesch-Kincaid Grade Level Formula

The higher the value of this metric, the harder the text is to read. For example, a value of 8 means that the text should be comprehensible to students aged 13–14. The value also indicates comprehensibility for the general public: a text with a value of 8 should be understood by approximately 80% of people.
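
As an illustration, the grade level can be computed from three counts. This sketch uses the standard formula published for English; whether EVALD uses the same coefficients for Czech, and how it counts syllables, is not stated on this page, so both are assumptions. The function name is hypothetical.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level, standard English-language coefficients (assumed)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

A 300-word text with 20 sentences and 450 syllables scores about 7.96, close to the value of 8 used in the example above.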

Flesch Reading Ease¹
Score | School level | Notes
90 or more | 5th grade | Very easy to read. Easily understood by an average 11-year-old student.
90–80 | 6th grade | Easy to read. Conversational language for consumers.
80–70 | 7th grade | Fairly easy to read.
70–60 | 8th & 9th grade | Plain language. Easily understood by 13- to 15-year-old students.
60–50 | 10th to 12th grade | Fairly difficult to read.
50–30 | College | Difficult to read.
30 or less | College graduate | Very difficult to read. Best understood by university graduates.
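
The score and its band in the table above can be sketched as follows. The formula is the standard one published for English; its use for Czech without adapted coefficients is an assumption, as are the function names. The table does not say which band a boundary value (for example exactly 90 or exactly 30) belongs to, so this sketch assigns it to the easier band.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease, standard English-language coefficients (assumed)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Lower bounds of the bands from the table, easiest first.
# Assigning boundary values to the easier band is an assumption.
BANDS = [
    (90, "5th grade"),
    (80, "6th grade"),
    (70, "7th grade"),
    (60, "8th & 9th grade"),
    (50, "10th to 12th grade"),
    (30, "College"),
]


def school_level(score: float) -> str:
    """Map a Flesch Reading Ease score to the school level in the table."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "College graduate"
```

A 300-word text with 20 sentences and 450 syllables scores about 64.71, which falls in the "8th & 9th grade" band.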

SMOG index (Simple Measure of Gobbledygook)

The higher the value of this metric, the harder the text is to read.
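
A minimal sketch of the SMOG computation, using the standard formula published for English. A polysyllable is a word of three or more syllables; how EVALD counts syllables in Czech, and whether it keeps the English coefficients, is not stated here, so both are assumptions. The function name is hypothetical.

```python
import math


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the number of words with 3+ syllables and the sentence count."""
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    # The polysyllable count is normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
```

For a text with 48 polysyllabic words in 40 sentences, the normalised count is 36 and the index is about 9.39.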

Coleman-Liau index

The higher the value of this metric, the harder the text is to read.
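
Unlike the Flesch measures, the Coleman-Liau index needs no syllable counts, only letters, words and sentences. The sketch below uses the standard published coefficients; whether EVALD applies them unchanged to Czech is an assumption, and the function name is hypothetical.

```python
def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau index from letter, word and sentence counts."""
    if words <= 0:
        raise ValueError("words must be positive")
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8
```

A 300-word text with 1500 letters and 15 sentences scores about 12.12.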

Automated Readability Index²
Score | Age | Grade level
1 or less | 5–6 | Kindergarten
2 | 6–7 | First/Second Grade
3 | 7–9 | Third Grade
4 | 9–10 | Fourth Grade
5 | 10–11 | Fifth Grade
6 | 11–12 | Sixth Grade
7 | 12–13 | Seventh Grade
8 | 13–14 | Eighth Grade
9 | 14–15 | Ninth Grade
10 | 15–16 | Tenth Grade
11 | 16–17 | Eleventh Grade
12 | 17–18 | Twelfth Grade
13 | 18–24 | College student
14 or more | 24+ | Professor
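
The index and the lookup in the table above can be sketched as follows. The formula is the standard one from the cited report; rounding non-integer scores up to the next whole number is the commonly described convention and is an assumption about EVALD's behaviour. The function names are hypothetical.

```python
import math


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from counts of characters (letters and digits), words and sentences."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Score -> (age, grade level), copied from the table above.
ARI_TABLE = {
    1: ("5–6", "Kindergarten"),
    2: ("6–7", "First/Second Grade"),
    3: ("7–9", "Third Grade"),
    4: ("9–10", "Fourth Grade"),
    5: ("10–11", "Fifth Grade"),
    6: ("11–12", "Sixth Grade"),
    7: ("12–13", "Seventh Grade"),
    8: ("13–14", "Eighth Grade"),
    9: ("14–15", "Ninth Grade"),
    10: ("15–16", "Tenth Grade"),
    11: ("16–17", "Eleventh Grade"),
    12: ("17–18", "Twelfth Grade"),
    13: ("18–24", "College student"),
    14: ("24+", "Professor"),
}


def ari_level(score: float) -> tuple[str, str]:
    """Round the score up (assumed convention) and clamp it to the table's range."""
    row = min(max(math.ceil(score), 1), 14)
    return ARI_TABLE[row]
```

A 300-word text with 1350 characters and 20 sentences scores about 7.27, which rounds up to row 8 (age 13–14, Eighth Grade).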

Variety of vocabulary:

Yule’s characteristic K

A measure of vocabulary richness: the larger Yule’s K, the less diverse the vocabulary.
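
A minimal sketch of Yule's K using its usual definition, K = 10^4 * (sum of m^2 * V(m) - N) / N^2, where N is the number of tokens and V(m) is the number of distinct words occurring exactly m times. How EVALD tokenizes or lemmatizes the text before counting is not stated here; this sketch simply takes a list of tokens, and the function name is hypothetical.

```python
from collections import Counter


def yules_k(tokens: list[str]) -> float:
    """Yule's characteristic K for a list of tokens."""
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    word_freqs = Counter(tokens)            # word -> number of occurrences
    spectrum = Counter(word_freqs.values()) # m -> V(m), words occurring m times
    sum_m2 = sum(m * m * v_m for m, v_m in spectrum.items())
    return 1e4 * (sum_m2 - n) / (n * n)
```

A text in which no word repeats gives K = 0, while `yules_k(["a", "a", "b", "c"])` gives 1250.0: repetition raises K, matching the statement that a larger K means a less diverse vocabulary.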

Simpson's Diversity Index (D)

A measure of vocabulary richness: the larger Simpson’s D, the less diverse the vocabulary.
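
Simpson's D can be read as the probability that two tokens drawn at random from the text, without replacement, are the same word. The sketch below uses that usual definition, D = sum of n_i * (n_i - 1) / (N * (N - 1)), where n_i is the frequency of word i and N the number of tokens. The tokenization EVALD applies is not stated here; the function name is hypothetical.

```python
from collections import Counter


def simpsons_d(tokens: list[str]) -> float:
    """Simpson's Diversity Index D for a list of tokens."""
    n = len(tokens)
    if n < 2:
        raise ValueError("at least two tokens are required")
    word_freqs = Counter(tokens)  # word -> number of occurrences
    same_word_pairs = sum(c * (c - 1) for c in word_freqs.values())
    return same_word_pairs / (n * (n - 1))
```

D is 0 when every word is different and 1 when the text repeats a single word, which is why a larger D indicates a less diverse vocabulary.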


Footnotes

1 Flesch, Rudolf. How to Write Plain English. University of Canterbury.

2 Senter, R.J.; Smith, E.A. (November 1967). Automated Readability Index. Wright-Patterson Air Force Base: iii. AMRL-TR-6620.

Acknowledgment

Developed (© 2016–2019) at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, with the financial support of the Ministry of Culture of the Czech Republic, project Automatic Evaluation of Text Coherence in Czech (DG16P02B016).