dc.contributor.author Galuščáková, Petra
dc.contributor.author Neužilová, Lucie
dc.date.accessioned 2018-03-02T07:07:04Z
dc.date.available 2018-03-02T07:07:04Z
dc.date.issued 2018-02-28
dc.identifier.uri http://hdl.handle.net/11234/1-1952
dc.description This package provides an evaluation framework together with training and test data for semi-automatic recognition of sections in historical diplomatic manuscripts. The collection consists of 57 Latin charters of 7 different types, issued by the Royal Chancellery in the era of John the Blind, King of Bohemia (1310–1346) and Count of Luxembourg. The manuscripts were digitized and transcribed, and the typical sections of medieval charters ('corroboratio', 'datatio', 'dispositio', 'inscriptio', 'intitulatio', 'narratio', and 'publicatio') were manually tagged. The documents also carry additional metadata, such as manually marked named entities and short Czech abstracts. Recognition models are first trained on the manually marked sections of the training documents; a trained model can then be used to recognize sections in the test data. The parsing script supports methods based on cosine distance, TF-IDF weighting, and an adapted Viterbi algorithm.
dc.language.iso lat
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri http://ufal.mff.cuni.cz/Medieval-Charter-Sections-Corpus
dc.subject section detection
dc.subject segmentation
dc.subject information retrieval
dc.title Medieval Charter Sections Corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Petra Galuščáková galuscakova@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor Grantová agentura České republiky GAP103/12/G084 Centrum pro multi-modální interpretaci dat velkého rozsahu nationalFunds
sponsor NSF 1618695 Safely Searching Among Sensitive Content Other
size.info 171 kb
size.info 57 files
files.size 171266
files.count 1


Files in this item

Name
Historical_Manuscript_Sections_Detection_v2.zip
Size
167.25 KB
Format
application/zip
Description
Medieval Charter Sections Corpus
MD5
53d7e68d98402166539b0a9aecdc49c8
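
The MD5 value and byte count listed in this record can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, not part of the package itself: the function names `md5_of_file` and `verify_download` are hypothetical, and it assumes that `files.size` (171266) is the archive size in bytes, which is consistent with the listed 167.25 KB.

```python
import hashlib
from pathlib import Path

# Values copied from this record; treating files.size as bytes is an assumption.
EXPECTED_MD5 = "53d7e68d98402166539b0a9aecdc49c8"
EXPECTED_SIZE = 171266


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file has the expected size and MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Calling `verify_download("Historical_Manuscript_Sections_Detection_v2.zip")` on a local copy should return `True` only if both the size and the checksum match the values above. MD5 is adequate for detecting accidental corruption but is not a safeguard against deliberate tampering.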
