Search Results
2. A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents (2023-01-05)
- Creator:
- Novotný, Vít, Luger, Kristýna, Štefánik, Michal, Vrabcová, Tereza, and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- text and corpus
- Subject:
- NER, named entity recognition, and Medieval
- Language:
- Czech, English, German, and Latin
- Description:
- This is an open dataset of sentences from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains a corpus for language modeling and human annotations for named entity recognition (NER).
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
3. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents
- Creator:
- Novotný, Vít, Seidlová, Kristýna, Vrabcová, Tereza, and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- image and corpus
- Subject:
- ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
- Language:
- German, Czech, Latin, and English
- Description:
- This is an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification.
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
4. A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Documents: Supplementary Materials
- Creator:
- Novotný, Vít and Horák, Aleš
- Publisher:
- Masaryk University, Brno
- Type:
- text and corpus
- Subject:
- ocr, optical character recognition, language identification, image super-resolution, sr, and Medieval
- Language:
- Czech, English, German, and Latin
- Description:
- These are supplementary materials for an open dataset of scanned images and OCR texts from 19th- and 20th-century letterpress reprints of documents from the Hussite era. The dataset contains human annotations for layout analysis, OCR evaluation, and language identification and is available at http://hdl.handle.net/11234/1-4615. These supplementary materials contain OCR texts from different OCR engines for book pages for which we have both high-resolution scanned images and annotations for OCR evaluation.
- Rights:
- Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB
5. Bosworth-Toller’s Anglo-Saxon Dictionary online
- Creator:
- Tichý, Ondřej, Roček, Martin, Bočková, Renata, Čermák, Matěj, Dragounová, Jolana, Filipová, Helena, Gilová, Lucie, Hejná, Michaela, Hladíková, Lenka, Hladká, Alena, Hubinová, Veronika, Krajcsovicsová, Vlaďena, Kupková, Tatiana, Lebedeva, Tatiana, Malečková, Nikola, Novotná, Alena, Pazderová, Tereza, Popelíková, Jiřina, Rumlová, Jana, Tyčová Ocelík, Dana, Volná, Veronika, and Zahradníková, Tereza
- Publisher:
- Charles University, Faculty of Arts, Department of English Language and ELT Methodology
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- English, Old English, Anglo-Saxon, dictionary, Bosworth, Toller, lexicography, digitalization, English history, Mediaeval, and Medieval
- Language:
- English, Old English (ca. 450-1100), Latin, Ancient Greek (to 1453), and Ancient Hebrew
- Description:
- This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used between ca. 700-1100 AD by the Anglo-Saxon inhabitants of the British Isles. This project is based on a digital edition of An Anglo-Saxon dictionary, based on the manuscript collections of the late Joseph Bosworth (the so-called Main Volume, first edition 1898) and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller, today the largest complete dictionary of Old English (hopefully to be supplanted one day by the DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not public domain and are therefore not part of the online dictionary. Please see the front & back matter of the paper dictionary for further information, prefaces and lists of references & contractions. The digitization project was initiated by Sean Crist in 2001 as a part of his Germanic Lexicon Project, and many individuals and institutions have contributed to this project. Check out the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). Currently the project is hosted by the Faculty of Arts, Charles University. In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary, but also to provide a modern, user-friendly interface for contemporary users. In 2013, the entries were structurally re-tagged and the original typography was abandoned, though the immediate access to the scans of the paper dictionary was preserved. Our aim is to reach beyond a simple digital edition and create an online environment dedicated to all those interested in Old English and Anglo-Saxon culture. Feel free to join in the editing of the Dictionary, commenting on its numerous entries or participating in the discussions at our forums.
We hope that by drawing the attention of the community of Anglo-Saxonists to our site and joining our resources, we may create a more useful tool for everybody. The most immediate project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development). We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts at Charles University. The site is currently maintained and developed by Ondrej Tichy et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
6. On-line Dictionary of medieval latin in the Czech lands
- Creator:
- Ctibor, Jan and Nývlt, Pavel
- Publisher:
- Institute of Philosophy of the Czech Academy of Sciences
- Type:
- text, lexicon, and lexicalConceptualResource
- Subject:
- dictionary, latin, Medieval, digital humanities, lexicography, and Medieval Latin
- Language:
- Latin and Czech
- Description:
- The Dictionary of Medieval Latin in the Czech Lands registers and explains the vocabulary of Medieval Latin as used in the Czech lands from the beginnings of Latin writing in this area (about 1000 CE) to 1500 CE; it so far covers the letters A-M. For more information about the Dictionary, see the webpage of the Department of Medieval Lexicography of the Institute of Philosophy of the Czech Academy of Sciences. The uploaded data comprise the on-line version of the dictionary (API and XML data), which makes it possible to run the application on a localhost.
- Rights:
- Dictionary of Medieval Latin in the Czech Lands - digital version 2.2 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/license-lb, and ACA