KUKY is a curated selection of 224 Czech administrative and legal documents for readability research, stored in two JSON files.
The documents come partly from public databases (Office of the Ombudsman, courts) and partly from private sources (letters, public local administration announcements). Some documents come as documented draft-revision pairs.
The documents are manually enriched with a two-level annotation: "Relevance Stoplight" and "Speech Acts". This annotation mimics the way a plain-language expert scrutinizes a document before redesigning it for better readability. In the first step ("Relevance Stoplight"), the expert closely reads the entire document and marks problematic passages as either incomprehensible or superfluous, approving the remaining passages as relevant.
In the second step ("Speech Acts"), the expert works with the relevant text according to a genre-specific template.
At the metadata level, the documents are graded for readability as perceived by experienced teachers of plain legal writing.
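
The two-level annotation described above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the `Span` class, its field names, the label strings, the speech-act names, and the English example sentences are all hypothetical placeholders, not the actual keys or values in the KUKY JSON files. It shows how spans approved as relevant in the first step might be passed on to the second step and grouped by speech act.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label names for the "Relevance Stoplight" level.
# The real KUKY files may encode these categories differently.
STOPLIGHT_LABELS = {"relevant", "incomprehensible", "superfluous"}


@dataclass
class Span:
    """One annotated passage (hypothetical schema, for illustration only)."""
    text: str
    stoplight: str                    # first level: "Relevance Stoplight"
    speech_act: Optional[str] = None  # second level: only for relevant text

    def __post_init__(self) -> None:
        if self.stoplight not in STOPLIGHT_LABELS:
            raise ValueError(f"unknown stoplight label: {self.stoplight!r}")


def relevant_spans(spans: List[Span]) -> List[Span]:
    """Step 1: keep only the passages approved as relevant."""
    return [s for s in spans if s.stoplight == "relevant"]


def group_by_speech_act(spans: List[Span]) -> Dict[str, List[str]]:
    """Step 2: group the relevant passages by their speech-act label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in relevant_spans(spans):
        groups[s.speech_act or "unlabelled"].append(s.text)
    return dict(groups)


# Invented example passages; speech-act names are placeholders.
example = [
    Span("You must pay the fee by 31 May.", "relevant", "instruction"),
    Span("Pursuant to the aforementioned provisions hereinafter.", "incomprehensible"),
    Span("We wish you a pleasant day.", "superfluous"),
    Span("The office granted your request.", "relevant", "decision"),
]

grouped = group_by_speech_act(example)
```

Keeping the stoplight label mandatory and the speech-act label optional mirrors the order of the two steps: every passage receives a first-level judgement, while the second level applies only to text that survived the first.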