
Medieval Charter Sections Corpus

Please use the following text to cite this item:
Galuščáková, Petra and Neužilová, Lucie, 2018, Medieval Charter Sections Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1952.
Date issued
2018-02-28
Size
171 KB, 57 files
Language(s)
Description
This package provides an evaluation framework, together with training and test data, for semi-automatic recognition of sections of historical diplomatic manuscripts. The data collection consists of 57 Latin charters of 7 different types issued by the Royal Chancellery. The documents were created in the era of John the Blind, King of Bohemia (1310–1346) and Count of Luxembourg. The manuscripts were digitized and transcribed, and the typical sections of medieval charters ('corroboratio', 'datatio', 'dispositio', 'inscriptio', 'intitulatio', 'narratio', and 'publicatio') were manually tagged. The manuscripts also carry additional metadata, such as manually marked named entities and short Czech abstracts. Recognition models are first trained on the manually marked sections of the training documents; the trained model can then be used to recognize the sections in the test data. The parsing script supports methods based on cosine distance, TF-IDF weighting, and an adapted Viterbi algorithm.
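To make the description above more concrete, here is a minimal Python sketch of how section recognition with TF-IDF weighting and cosine similarity can work in principle (cosine distance is one minus this similarity). It is not the parsing script shipped in the package: the function names (`build_model`, `classify`), the smoothed IDF formula, and the toy training snippets are all assumptions made for illustration, and the snippets are invented rather than taken from the corpus. The adapted Viterbi step is not covered.

```python
import math
from collections import Counter

# Invented toy snippets for illustration only; NOT taken from the corpus.
TRAIN = {
    "intitulatio": ["nos johannes dei gratia boemie rex ac lucemburgensis comes"],
    "publicatio": ["notum facimus tenore presentium universis"],
    "datatio": ["datum prage anno domini millesimo trecentesimo"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_model(train):
    """Build one TF-IDF vector per section label (hypothetical helper)."""
    # Merge all training snippets of a section into one pseudo-document.
    docs = {
        label: Counter(tok for snippet in snippets for tok in tokenize(snippet))
        for label, snippets in train.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many sections each token occurs.
    df = Counter(tok for counts in docs.values() for tok in counts)
    # Smoothed IDF (an assumption; other variants are equally plausible).
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1.0 for tok, d in df.items()}
    vectors = {
        label: {tok: count * idf[tok] for tok, count in counts.items()}
        for label, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, idf, vectors):
    """Return the section label whose vector is most similar to the text."""
    counts = Counter(tokenize(text))
    # Tokens never seen in training carry no weight.
    vec = {tok: c * idf[tok] for tok, c in counts.items() if tok in idf}
    return max(vectors, key=lambda label: cosine(vec, vectors[label]))


idf, vectors = build_model(TRAIN)
print(classify("datum anno domini", idf, vectors))  # prints "datatio"
```

In this sketch each section type is represented by a single weighted bag of words, so a test segment is assigned to the section whose vocabulary it overlaps with most. A sequence model such as Viterbi decoding would additionally exploit the typical order of sections within a charter.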
 Files in this item
Name
Historical_Manuscript_Sections_Detection_v2.zip
Size
167.25 KB
Format
application/zip
Description
Zip
MD5
53d7e68d98402166539b0a9aecdc49c8