This is a new version of the repository. Do let us know (lindat-help at ufal.mff.cuni.cz) if you encounter any issues.
What's New
Description:
This software package includes three tools: a web frontend (charles-translator-web-frontend) for machine translation featuring phonetic transcription of Ukrainian suitable for Czech speakers, an API server (lindat-translation), and a tool for translating documents with markup, including HTML, DOCX, ODT, PPTX, and ODP (document-translations). These tools are used in the Charles Translator service (https://translator.cuni.cz). This software was developed within the EdUKate project, which aims to help mitigate language barriers between non-Czech-speaking children in the Czech Republic and education in the Czech school system. The project focuses on the development and dissemination of multilingual digital learning materials for students in primary and secondary schools.
This item contains 1 file (7.71 MB).
Publicly Available
Description:
This package includes three models adapted for sentence-level machine translation in the educational domain: Czech-to-Ukrainian, Czech-to-English, and Czech-to-German. The models are provided as LoRA adapters on top of the EuroLLM-9B-Instruct LLM and can be used in the Charles Translator service (https://translator.cuni.cz) and in the web portal Škola s nadhledem (https://skolasnadhledem.cz/). The models were developed within the EdUKate project, which aims to help mitigate language barriers between non-Czech-speaking children in the Czech Republic and education in the Czech school system. The project focuses on the development and dissemination of multilingual digital learning materials for students in primary and secondary schools.
This item contains 3 files (1.77 GB).
Publicly Available
Description:
This repository contains the official data release for the CRAC 2025 Shared Task on Multilingual Coreference Resolution associated with the CODI-CRAC 2025 Workshop held at EMNLP 2025.
This item contains 1 file (345.16 MB).
Publicly Available
Most Viewed Items - Last Month
Description:
Tokenizer, POS tagger, lemmatizer, and parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation, including performance, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need a UDPipe binary of version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
This item contains 96 files (2.61 GB).
Publicly Available
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This item contains 3 files (598.92 MB).
Publicly Available
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This item contains 3 files (740.61 MB).
Publicly Available