
CRAC 2026 Empty Nodes Baseline Model

Please use the following text to cite this item:
Straka, Milan, 2026, CRAC 2026 Empty Nodes Baseline Model, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6081.
Date issued
2026-01-30
Description
The crac2026_empty_nodes_baseline is a multilingual model based on XLM-RoBERTa-large for the CRAC 2026 Empty Nodes Baseline system https://github.com/ufal/crac2026_empty_nodes_baseline, which predicts empty nodes in input CoNLL-U files. It was trained on CorefUD 1.4 data and was used to generate the baseline empty node predictions in the CRAC 2026 Shared Task on Multilingual Coreference Resolution https://ufal.mff.cuni.cz/corefud/crac26. The model is language agnostic, so in theory it can be used to predict empty nodes in any language covered by XLM-RoBERTa. Compared to last year's CRAC 2025 Empty Nodes Baseline https://github.com/ufal/crac2025_empty_nodes_baseline, this year's baseline predicts all available information for the empty nodes: the FORM, LEMMA, UPOS, XPOS, and FEATS columns, in addition to the previously predicted word order and dependency relations. Instructions for running prediction, training, and intrinsic evaluation are available in the repository https://github.com/ufal/crac2026_empty_nodes_baseline.
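For readers unfamiliar with how empty nodes appear in CoNLL-U, the following minimal sketch may help. In the CoNLL-U format, an empty node has a decimal ID such as 5.1, leaves the HEAD and DEPREL columns as "_", and carries its relations in the DEPS column. The names EmptyNode and extract_empty_nodes are hypothetical helpers written for this illustration and are not part of the baseline repository; the sample sentence is hand-written and is not output of the model.

```python
from typing import List, NamedTuple


class EmptyNode(NamedTuple):
    """Columns of an empty node that the description above mentions."""
    node_id: str  # decimal ID, e.g. "5.1"; encodes the word-order position
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    deps: str     # enhanced dependency relations, e.g. "2:conj"


def extract_empty_nodes(conllu_text: str) -> List[EmptyNode]:
    """Return all empty nodes found in a CoNLL-U string (hypothetical helper)."""
    nodes = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        # Empty nodes are the lines whose ID contains a dot.
        if "." in cols[0]:
            nodes.append(EmptyNode(cols[0], cols[1], cols[2], cols[3],
                                   cols[4], cols[5], cols[8]))
    return nodes


# Hand-written illustrative sentence with one empty node (5.1).
_ROWS = [
    ("1", "Sue", "Sue", "PROPN", "_", "_", "2", "nsubj", "2:nsubj", "_"),
    ("2", "likes", "like", "VERB", "_", "_", "0", "root", "0:root", "_"),
    ("3", "coffee", "coffee", "NOUN", "_", "_", "2", "obj", "2:obj", "_"),
    ("4", "and", "and", "CCONJ", "_", "_", "5", "cc", "5.1:cc", "_"),
    ("5", "Bill", "Bill", "PROPN", "_", "_", "2", "conj", "5.1:nsubj", "_"),
    ("5.1", "likes", "like", "VERB", "_", "_", "_", "_", "2:conj", "_"),
    ("6", "tea", "tea", "NOUN", "_", "_", "5", "orphan", "5.1:obj", "_"),
]
SAMPLE = "# text = Sue likes coffee and Bill tea\n" + "\n".join(
    "\t".join(row) for row in _ROWS
) + "\n"

if __name__ == "__main__":
    for node in extract_empty_nodes(SAMPLE):
        print(node)
```

A helper like this could be used to inspect which empty nodes a predicted file contains; for actual prediction and evaluation, the scripts in the repository linked above are the reference.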
Files in this item
Name
crac2026_empty_nodes_baseline.zip
Size
2.17 GB
Format
application/zip
Description
CRAC 2026 Empty Nodes Baseline Model
MD5
bd978678d6864dccd146d8f7ff361446