This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
This is an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
These are supplementary materials for an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification, and is available at http://hdl.handle.net/11234/1-4615. The supplementary materials contain OCR texts produced by different OCR engines for those book pages that have both high-resolution scanned images and annotations for OCR evaluation.
Contents: Preface, Prof. PhDr. Jaroslav Ludvíkovský, Jaroslav Ludvíkovský and his „Řecký román dobrodružný“ (The Greek Adventure Novel), Dobrovského klasická humanita (Dobrovský's Classical Humanism), List of works, Jubilee volumes and articles, Obituaries, List of journals, collections, and books by other authors with contributions by Jaroslav Ludvíkovský, Index of ancient authors, Index of modern authors and collaborators,
Bibliography, Selection from the works of Jaroslav Ludvíkovský