Rights: http://creativecommons.org/licenses/by/4.0/ - LINDAT/CLARIAH-CZ Catalog Search Results

11. Bosworth-Toller’s Anglo-Saxon Dictionary online

Creator:: Tichý, Ondřej, Roček, Martin, Bočková, Renata, Čermák, Matěj, Dragounová, Jolana, Filipová, Helena, Gilová, Lucie, Hejná, Michaela, Hladíková, Lenka, Hladká, Alena, Hubinová, Veronika, Krajcsovicsová, Vlaďena, Kupková, Tatiana, Lebedeva, Tatiana, Malečková, Nikola, Novotná, Alena, Pazderová, Tereza, Popelíková, Jiřina, Rumlová, Jana, Tyčová Ocelík, Dana, Volná, Veronika, and Zahradníková, Tereza
Publisher:: Charles University, Faculty of Arts, Department of English Language and ELT Methodology
Type:: text, lexicon, and lexicalConceptualResource
Subject:: English, Old English, Anglo-Saxon, dictionary, Bosworth, Toller, lexicography, digitalization, English history, Mediaeval, and Medieval
Language:: English, Old English (ca. 450-1100), Latin, Ancient Greek (to 1453), and Ancient Hebrew
Description:: Description : This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used between ca. 700-1100 AD by the Anglo-Saxon inhabitants of the British Isles. This project is based on a digital edition of An Anglo-Saxon dictionary, based on the manuscript collections of the late Joseph Bosworth (the so called Main Volume, first edition 1898) and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller, today the largest complete dictionary of Old English (one day to be hopefully supplanted by the DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not public domain and are therefore not part of the online dictionary. Please see the front & back matter of the paper dictionary for further information, prefaces and lists of references & contractions. The digitization project was initiated by Sean Crist in 2001 as a part of his Germanic Lexicon Project and many individuals and institutions have contributed to this project. Check out the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). Currently the project is hosted by the Faculty of Arts, Charles University. In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary, but also provide a modern, user friendly interface for contemporary users. In 2013, the entries were structurally re-tagged and the original typography was abandoned, though the immediate access to the scans of the paper dictionary was preserved. Our aim is to reach beyond a simple digital edition and create an online environment dedicated to all interested in Old English and Anglo-Saxon culture. Feel free to join in the editing of the Dictionary, commenting on its numerous entries or participating in the discussions at our forums. We hope that by drawing the attention of the community of Anglo-Saxonists to our site and joining our resources, we may create a more useful tool for everybody. The most immediate project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development). We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts at Charles University. The site is currently maintained and developed by Ondrej Tichy et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

12. C4Corpus (CC-BY part)

Creator:: Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
Publisher:: Technische Universität Darmstadt
Type:: text and corpus
Subject:: CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
Language:: Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
Description:: A large web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

13. Continuous Rating; Supplementary materials

Creator:: Javorský, Dávid, Macháček, Dominik, and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: manual evaluation, simultaneous speech subtitling, Continuous Rating, and questionnaire evaluation
Language:: German and Czech
Description:: Collected data from Continuous Rating evaluation study; collected Continuous Rating scores and Questionnaires.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

14. COSTRA 1.0: A Dataset of Complex Sentence Transformations

Creator:: Barančíková, Petra and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: sentences, sentence embeddings, paraphrases, and semantic relations
Language:: Czech
Description:: COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The dataset consist of 4,262 unique sentences with average length of 10 words, illustrating 15 types of modifications such as simplification, generalization, or formal and informal language variation. The hope is that with this dataset, we should be able to test semantic properties of sentence embeddings and perhaps even to find some topologically interesting “skeleton” in the sentence embedding space.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

15. COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons

Creator:: Barančíková, Petra and Bojar, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: paraphrases, sentence embeddings, evaluation, and sentence
Language:: Czech
Description:: Costra 1.1 is a new dataset for testing geometric properties of sentence embeddings spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such paraphrases, tense or generalization. The dataset is a direct expansion of Costra 1.0, which was extended with more sentences and sentence comparisons.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

16. Covid-19 Thesaurus

Creator:: Fener, Patricia
Publisher:: Institute for scientific and technical information (Inist) - CNRS/UAR76
Type:: thesaurus, text, and lexicalConceptualResource
Subject:: COVID-19, SARS coronavirus, Middle-East coronavirus, SARS-CoV, and MERS-CoV
Language:: French and English
Description:: This bilingual thesaurus (French-English), developed at Inist-CNRS, covers the concepts from the emerging COVID-19 outbreak which reminds the past SARS coronavirus outbreak and Middle East coronavirus outbreak. This thesaurus is based on the vocabulary used in scientific publications for SARS-CoV-2 and other coronaviruses, like SARS-CoV and MERS-CoV. It provides a support to explore the coronavirus infectious diseases. The thesaurus can be browsed and queried by humans and machines on the Loterre portal (https://www.loterre.fr), via an API and an rdf triplestore. It is also downloadable in PDF, SKOS, csv and json-ld formats. The thesaurus is made available under a CC-by 4.0 license.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), PUB, and http://creativecommons.org/licenses/by/4.0/

17. Czech and English abstracts of ÚFAL papers

Creator:: Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: parallel corpus, scientific texts, and abstracts
Language:: Czech and English
Description:: This is a document-aligned parallel corpus of English and Czech abstracts of scientific papers published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague, as reported in the institute's system Biblio. For each publication, the authors are obliged to provide both the original abstract in Czech or English, and its translation into English or Czech, respectively. No filtering was performed, except for removing entries missing the Czech or English abstract, and replacing newline and tabulator characters by spaces.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

18. Czech and English abstracts of ÚFAL papers (2022-11-11)

Creator:: Rosa, Rudolf and Zouhar, Vilém
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: parallel corpus, scientific texts, and abstracts
Language:: Czech and English
Description:: This is a parallel corpus of Czech and mostly English abstracts of scientific papers and presentations published by authors from the Institute of Formal and Applied Linguistics, Charles University in Prague. For each publication record, the authors are obliged to provide both the original abstract (in Czech or English), and its translation (English or Czech) in the internal Biblio system. The data was filtered for duplicates and missing entries, ensuring that every record is bilingual. Additionally, records of published papers which are indexed by SemanticScholar contain the respective link. The dataset was created from September 2022 image of the Biblio database and is stored in JSONL format, with each line corresponding to one record.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

19. Czech and English Reflective Dataset (CEReD)

Creator:: Štefánik, Michal and Nehyba, Jan
Publisher:: Masaryk University, Brno
Type:: text and corpus
Subject:: reflective writing, reflective categories, pre-service teachers, and hand annotation
Language:: English and Czech
Description:: The database contains annotated reflective sentences, which fall into the categories of reflective writing according to Ullmann's (2019) model. The dataset is ready to replicate these categories' prediction using machine learning. Available from: https://anonymous.4open.science/repository/c856595c-dfc2-48d7-aa3d-0ccc2648c4dc/data
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

20. Digital humanities: Introduction. A 10-week course with practical sessions.

Creator:: Nevěřilová, Zuzana
Publisher:: Masaryk University, Brno
Type:: VIDEO and onlineCourse
Subject:: data-driven research, digital content processing, text processing, image processing, metadata, word embeddings, evaluation, and research infrastructures
Language:: English
Description:: The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing. The course consists of 10 lessons with video material and a PowerPoint presentation with the same content. Every lesson contains a practical session – either a Jupyter Notebook to work in Python or a text file with a short description of the task. Most of the practical tasks consist of running the programme and analyse the results. Although the course does not focus on programming, the code can be reused easily in individual projects. Some experience in running Python code is desirable but not required.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

11. Bosworth-Toller’s Anglo-Saxon Dictionary online

12. C4Corpus (CC-BY part)

13. Continuous Rating; Supplementary materials

14. COSTRA 1.0: A Dataset of Complex Sentence Transformations

15. COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons

16. Covid-19 Thesaurus

17. Czech and English abstracts of ÚFAL papers

18. Czech and English abstracts of ÚFAL papers (2022-11-11)

19. Czech and English Reflective Dataset (CEReD)

20. Digital humanities: Introduction. A 10-week course with practical sessions.

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Show values starting with

Rights

Subject

Show values starting with

Type

Show values starting with

Date

Original context has metadata only

Harvested from