Slavic UD Treebanks with Periphrastic Verb Forms
Please use the following text to cite this item:
Krippnerová, Lenka and Zeman, Daniel, 2025,
Slavic UD Treebanks with Periphrastic Verb Forms, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5936.
Authors
Lenka Krippnerová, Daniel Zeman
Item identifier
http://hdl.handle.net/11234/1-5936
Date issued
2025-06-16
Size
672,813 sentences, 10,341,444 tokens, 10,354,008 words
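The word count is slightly higher than the token count. Assuming these figures follow the usual UD convention, tokens are surface tokens and words are syntactic words: a multiword token such as Czech "abych" counts as one token but two words ("aby" + "bych"). The following is a minimal sketch of how both counts can be derived from the ID column of a CoNLL-U file, using a hand-made sample sentence. It illustrates the convention only and is not the script that produced the figures above.

```python
def _row(token_id: str, form: str) -> str:
    """Build a 10-column CoNLL-U line with only ID and FORM filled in."""
    return "\t".join([token_id, form] + ["_"] * 8)


def count_units(conllu_text: str) -> dict:
    """Count sentences, surface tokens and syntactic words in CoNLL-U text.

    CoNLL-U conventions: a range ID such as "3-4" is one multiword (surface)
    token covering words 3 and 4, integer IDs are words, and decimal IDs
    such as "5.1" are empty nodes, counted here as neither.
    """
    sentences = tokens = words = 0
    covered_until = 0      # last word ID covered by the current multiword token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():               # a blank line ends a sentence
            if in_sentence:
                sentences += 1
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):           # sentence-level comments
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:                # multiword token, e.g. "3-4"
            tokens += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:              # empty node
            continue
        else:                              # ordinary syntactic word
            words += 1
            if int(token_id) > covered_until:
                tokens += 1                # word that is also its own token
    if in_sentence:                        # input may lack a final blank line
        sentences += 1
    return {"sentences": sentences, "tokens": tokens, "words": words}


# Hand-made Czech example: "abych" is one token but two words (aby + bych).
SAMPLE = "\n".join([
    "# text = Přišel, abych pomohl.",
    _row("1", "Přišel"),
    _row("2", ","),
    _row("3-4", "abych"),
    _row("3", "aby"),
    _row("4", "bych"),
    _row("5", "pomohl"),
    _row("6", "."),
    "",
])

print(count_units(SAMPLE))
```

On the sample it prints `{'sentences': 1, 'tokens': 5, 'words': 6}`.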
Description
This dataset is based on Universal Dependencies v2.16 (http://hdl.handle.net/11234/1-5901). It contains treebanks for 15 Slavic languages, enriched with annotation of periphrastic verb forms. While UD encodes morphological features at the level of individual tokens, our annotation extends this by marking periphrastic verb phrases that span multiple tokens, possibly discontinuously, in order to capture more complex verbal constructions. The added annotation is stored in the last (tenth) column of the CoNLL-U format, MISC, as attributes whose names begin with Phrase (Phrase*). In some cases, the FEATS and DEPREL columns were also modified to make the annotation more uniform across languages.
For more details, see the paper:
Lenka Krippnerová and Daniel Zeman. 2025. Periphrastic Verb Forms in Universal Dependencies.
In: Proceedings of SyntaxFest / Depling 2025, Ljubljana, Slovenia.
Acknowledgement
Czech Science Foundation (Grantová agentura České republiky)
Project code: GX20-16819X
Project name: LUSyD – Language Understanding: from Syntax to Discourse
Ministry of Education, Youth and Sports of the Czech Republic (Ministerstvo školství, mládeže a tělovýchovy České republiky)
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy)
COST (European Cooperation in Science and Technology)
Project code: COST CA21167
Project name: UniDive
Files in this item
- Name: periphrastic-ud-treebanks-v2.16-slavic.zip
- Size: 217 MB
- Format: application/zip
- Description: Zip
- MD5: ac9ece7bc6c19471ffa22b6037703f55
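The published MD5 checksum can be used to verify the download. A minimal sketch using Python's standard hashlib module; the local file name is an assumption about where the archive was saved. The file is hashed in chunks so the 217 MB archive never has to be held in memory at once.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "ac9ece7bc6c19471ffa22b6037703f55"  # checksum published for the zip


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("periphrastic-ud-treebanks-v2.16-slavic.zip")  # assumed location
    if archive.exists():
        print("OK" if verify_download(archive) else "MD5 mismatch, download again")
    else:
        print(f"{archive} not found in the current directory")
```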

Archive contents
- UD_Russian-PUD
  - stats.xml (12 kB)
  - README.md (5 kB)
  - LICENSE.txt (19 kB)
  - ru_pud-ud-test.txt (209 kB)
  - ru_pud-ud-test.conllu (2 MB)
- UD_Serbian-SET
  - sr_set-ud-test.txt (67 kB)
  - README.md (1 kB)
  - sr_set-ud-train.conllu (6 MB)
  - sr_set-ud-dev.txt (68 kB)
  - stats.xml (11 kB)
  - sr_set-ud-test.conllu (997 kB)
  - sr_set-ud-train.txt (432 kB)
  - LICENSE.txt (230 B)
  - sr_set-ud-dev.conllu (1 MB)
- UD_Slovenian-SSJ
  - sl_ssj-ud-dev.conllu (2 MB)
  - README.md (7 kB)
  - sl_ssj-ud-train.txt (1 MB)
  - sl_ssj-ud-test.conllu (2 MB)
  - sl_ssj-ud-train.conllu (19 MB)
  - sl_ssj-ud-dev.txt (147 kB)
  - stats.xml (12 kB)
  - LICENSE.txt (222 B)
  - sl_ssj-ud-test.txt (141 kB)
- UD_Bulgarian-BTB
  - bg_btb-ud-dev.txt (155 kB)
  - bg_btb-ud-train.txt (1 MB)
  - bg_btb-ud-test.conllu (1 MB)
  - bg_btb-ud-train.conllu (13 MB)
  - stats.xml (13 kB)
  - bg_btb-ud-test.txt (152 kB)
  - LICENSE.txt (319 B)
  - README.txt (7 kB)
  - bg_btb-ud-dev.conllu (1 MB)
- UD_Old_East_Slavic-Ruthenian
  - README.md (1 kB)
  - orv_ruthenian-ud-test.txt (91 kB)
  - orv_ruthenian-ud-test.conllu (1018 kB)
  - orv_ruthenian-ud-train.txt (834 kB)
  - orv_ruthenian-ud-train.conllu (9 MB)
  - stats.xml (18 kB)
  - orv_ruthenian-ud-dev.txt (95 kB)
  - LICENSE.txt (202 B)
  - orv_ruthenian-ud-dev.conllu (1 MB)
- UD_Belarusian-HSE
  - be_hse-ud-test.txt (180 kB)
  - README.md (7 kB)
  - be_hse-ud-train.txt (2 MB)
  - be_hse-ud-dev.conllu (1 MB)
  - be_hse-ud-train.conllu (30 MB)
  - be_hse-ud-test.conllu (1 MB)
  - stats.xml (17 kB)
  - LICENSE.txt (1021 B)
  - be_hse-ud-dev.txt (164 kB)
- UD_Pomak-Philotis
  - README.md (2 kB)
  - qpm_philotis-ud-train.conllu (1 MB)
  - qpm_philotis-ud-test.txt (45 kB)
  - qpm_philotis-ud-train.txt (86 kB)
  - qpm_philotis-ud-dev.txt (45 kB)
  - qpm_philotis-ud-dev.conllu (891 kB)
  - qpm_philotis-ud-test.conllu (895 kB)
  - stats.xml (12 kB)
  - LICENSE.txt (408 B)
- UD_Czech-PUD
  - stats.xml (16 kB)
  - README.md (3 kB)
  - LICENSE.txt (202 B)
  - cs_pud-ud-test.txt (114 kB)
  - cs_pud-ud-test.conllu (2 MB)
- UD_Polish-PUD
  - stats.xml (15 kB)
  - pl_pud-ud-test.txt (117 kB)
  - pl_pud-ud-test.conllu (2 MB)
  - README.md (2 kB)
  - LICENSE.txt (217 B)
- UD_Czech-CAC
  - cs_cac-ud-test.conllu (1 MB)
  - cs_cac-ud-test.txt (71 kB)
  - README.md (7 kB)
  - cs_cac-ud-train.conllu (54 MB)
  - cs_cac-ud-dev.txt (72 kB)
  - cs_cac-ud-train.txt (2 MB)
  - stats.xml (19 kB)
  - LICENSE.txt (265 B)
  - cs_cac-ud-dev.conllu (1 MB)
- UD_Old_East_Slavic-TOROT
  - README.md (4 kB)
  - orv_torot-ud-test.txt (275 kB)
  - orv_torot-ud-train.txt (1 MB)
  - orv_torot-ud-train.conllu (21 MB)
  - orv_torot-ud-dev.conllu (3 MB)
  - orv_torot-ud-dev.txt (270 kB)
  - orv_torot-ud-test.conllu (3 MB)
  - stats.xml (12 kB)
  - LICENSE.txt (197 B)
- UD_Old_East_Slavic-RNC
  - README.md (20 kB)
  - orv_rnc-ud-dev.txt (350 kB)
  - orv_rnc-ud-dev.conllu (3 MB)
  - orv_rnc-ud-test.conllu (2 MB)
  - stats.xml (19 kB)
  - orv_rnc-ud-train.txt (930 kB)
  - orv_rnc-ud-test.txt (251 kB)
  - LICENSE.txt (202 B)
  - orv_rnc-ud-train.conllu (10 MB)
- UD_Russian-GSD
  - ru_gsd-ud-dev.conllu (1 MB)
  - README.md (1 kB)
  - ru_gsd-ud-test.conllu (1 MB)
  - ru_gsd-ud-dev.txt (123 kB)
  - ru_gsd-ud-train.txt (794 kB)
  - ru_gsd-ud-test.txt (120 kB)
  - stats.xml (13 kB)
  - LICENSE.txt (202 B)
  - ru_gsd-ud-train.conllu (7 MB)
- UD_Slovak-SNK
  - sk_snk-ud-dev.conllu (1 MB)
  - README.md (4 kB)
  - sk_snk-ud-train.conllu (8 MB)
  - sk_snk-ud-dev.txt (80 kB)
  - sk_snk-ud-train.txt (447 kB)
  - sk_snk-ud-test.conllu (1 MB)
  - sk_snk-ud-test.txt (80 kB)
  - stats.xml (13 kB)
  - LICENSE.txt (202 B)
- UD_Czech-Poetry
  - stats.xml (13 kB)
  - cs_poetry-ud-test.txt (32 kB)
  - README.md (2 kB)
  - LICENSE.txt (202 B)
  - cs_poetry-ud-test.conllu (662 kB)
- UD_Polish-PDB
  - README.md (5 kB)
  - pl_pdb-ud-train.txt (1 MB)
  - pl_pdb-ud-dev.conllu (3 MB)
  - pl_pdb-ud-test.txt (203 kB)
  - pl_pdb-ud-train.conllu (32 MB)
  - pl_pdb-ud-dev.txt (210 kB)
  - stats.xml (16 kB)
  - LICENSE.txt (384 B)
  - pl_pdb-ud-test.conllu (3 MB)
- UD_Upper_Sorbian-UFAL
  - stats.xml (11 kB)
  - README.md (1 kB)
  - hsb_ufal-ud-test.conllu (849 kB)
  - hsb_ufal-ud-train.conllu (37 kB)
  - hsb_ufal-ud-test.txt (64 kB)
  - LICENSE.txt (202 B)
  - hsb_ufal-ud-train.txt (2 kB)
- UD_Czech-PDTC
  - README.md (16 kB)
  - cs_pdtc-ud-test.conllu (41 MB)
  - cs_pdtc-ud-dev.txt (2 MB)
  - cs_pdtc-ud-train.conllu (378 MB)
  - cs_pdtc-ud-test.txt (1 MB)
  - stats.xml (19 kB)
  - cs_pdtc-ud-dev.conllu (52 MB)
  - LICENSE.txt (311 B)
  - cs_pdtc-ud-train.txt (16 MB)
- UD_Russian-Taiga
  - ru_taiga-ud-train.conllu (177 MB)
  - README.md (5 kB)
  - ru_taiga-ud-test.conllu (1 MB)
  - ru_taiga-ud-test.txt (143 kB)
  - ru_taiga-ud-dev.conllu (1 MB)
  - ru_taiga-ud-dev.txt (157 kB)
  - ru_taiga-ud-train.txt (16 MB)
  - stats.xml (20 kB)
  - LICENSE.txt (202 B)
- UD_Croatian-SET
  - hr_set-ud-dev.conllu (1 MB)
  - README.md (5 kB)
  - hr_set-ud-dev.txt (130 kB)
  - hr_set-ud-train.conllu (13 MB)
  - hr_set-ud-train.txt (901 kB)
  - stats.xml (11 kB)
  - hr_set-ud-test.txt (143 kB)
  - LICENSE.txt (233 B)
  - hr_set-ud-test.conllu (2 MB)
- UD_Russian-Poetry
  - ru_poetry-ud-dev.txt (85 kB)
  - ru_poetry-ud-dev.conllu (1 MB)
  - README.md (2 kB)
  - ru_poetry-ud-train.conllu (4 MB)
  - ru_poetry-ud-train.txt (386 kB)
  - ru_poetry-ud-test.txt (86 kB)
  - stats.xml (16 kB)
  - LICENSE.txt (202 B)
  - ru_poetry-ud-test.conllu (1 MB)
- UD_Czech-CLTT
  - cs_cltt-ud-dev.conllu (1 MB)
  - README.md (5 kB)
  - cs_cltt-ud-test.txt (81 kB)
  - cs_cltt-ud-train.txt (96 kB)
  - cs_cltt-ud-dev.txt (76 kB)
  - cs_cltt-ud-train.conllu (1 MB)
  - stats.xml (13 kB)
  - LICENSE.txt (265 B)
  - cs_cltt-ud-test.conllu (1 MB)
- UD_Ukrainian-ParlaMint
  - uk_parlamint-ud-dev.conllu (1 MB)
  - uk_parlamint-ud-test.txt (113 kB)
  - README.md (3 kB)
  - uk_parlamint-ud-train.txt (667 kB)
  - uk_parlamint-ud-dev.txt (112 kB)
  - stats.xml (19 kB)
  - uk_parlamint-ud-train.conllu (6 MB)
  - LICENSE.txt (202 B)
  - uk_parlamint-ud-test.conllu (1 MB)
- UD_Old_Church_Slavonic-PROIEL
  - README.md (3 kB)
  - cu_proiel-ud-train.txt (1 MB)
  - cu_proiel-ud-test.txt (224 kB)
  - cu_proiel-ud-dev.conllu (2 MB)
  - cu_proiel-ud-test.conllu (2 MB)
  - cu_proiel-ud-train.conllu (20 MB)
  - stats.xml (13 kB)
  - LICENSE.txt (197 B)
  - cu_proiel-ud-dev.txt (205 kB)
- UD_Ukrainian-IU
  - README.md (9 kB)
  - uk_iu-ud-train.txt (900 kB)
  - uk_iu-ud-test.txt (177 kB)
  - uk_iu-ud-test.conllu (2 MB)
  - uk_iu-ud-dev.conllu (1 MB)
  - uk_iu-ud-dev.txt (128 kB)
  - stats.xml (17 kB)
  - LICENSE.txt (172 B)
  - uk_iu-ud-train.conllu (13 MB)
- UD_Macedonian-MTB
  - stats.xml (11 kB)
  - README.md (2 kB)
  - LICENSE.txt (217 B)
  - mk_mtb-ud-test.conllu (193 kB)
  - mk_mtb-ud-test.txt (11 kB)
- UD_Slovenian-SST
  - README.md (7 kB)
  - sl_sst-ud-train.txt (350 kB)
  - sl_sst-ud-test.txt (55 kB)
  - sl_sst-ud-dev.conllu (1 MB)
  - sl_sst-ud-dev.txt (46 kB)
  - sl_sst-ud-train.conllu (11 MB)
  - stats.xml (11 kB)
  - LICENSE.txt (417 B)
  - sl_sst-ud-test.conllu (1 MB)
- UD_Russian-SynTagRus
  - README.md (4 kB)
  - ru_syntagrus-ud-dev.txt (1 MB)
  - ru_syntagrus-ud-test.txt (1 MB)
  - ru_syntagrus-ud-dev.conllu (16 MB)
  - ru_syntagrus-ud-train.txt (12 MB)
  - ru_syntagrus-ud-train.conllu (127 MB)
  - ru_syntagrus-ud-test.conllu (16 MB)
  - stats.xml (20 kB)
  - LICENSE.txt (188 B)
- UD_Polish-LFG
  - pl_lfg-ud-dev.txt (74 kB)
  - pl_lfg-ud-train.txt (596 kB)
  - README.md (6 kB)
  - pl_lfg-ud-test.txt (74 kB)
  - pl_lfg-ud-dev.conllu (1 MB)
  - pl_lfg-ud-test.conllu (1 MB)
  - stats.xml (14 kB)
  - LICENSE.txt (34 kB)
  - pl_lfg-ud-train.conllu (13 MB)
- UD_Old_East_Slavic-Birchbark
  - README.md (4 kB)
  - orv_birchbark-ud-dev.txt (90 kB)
  - orv_birchbark-ud-dev.conllu (1 MB)
  - orv_birchbark-ud-train.txt (66 kB)
  - orv_birchbark-ud-test.conllu (1 MB)
  - orv_birchbark-ud-test.txt (90 kB)
  - stats.xml (15 kB)
  - LICENSE.txt (202 B)
  - orv_birchbark-ud-train.conllu (1 MB)
- UD_Czech-FicTree
  - README.md (5 kB)
  - cs_fictree-ud-test.conllu (1 MB)
  - cs_fictree-ud-test.txt (86 kB)
  - cs_fictree-ud-train.conllu (15 MB)
  - cs_fictree-ud-dev.txt (86 kB)
  - cs_fictree-ud-dev.conllu (1 MB)
  - stats.xml (15 kB)
  - LICENSE.txt (219 B)
  - cs_fictree-ud-train.txt (696 kB)
- README.md (1 kB)
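Each treebank folder follows the usual UD naming scheme `<lang>_<treebank>-ud-<split>.conllu`, with a matching `.txt` file holding the underlying raw text. The following is a minimal sketch, using only Python's standard library, that indexes the `.conllu` members of the zip by treebank and split without extracting it. It assumes the member names end in the file names listed above; the exact folder prefix inside the archive is not stated here, so the code matches on the base name only, and the local archive path is an assumption.

```python
import os
import re
import zipfile
from collections import defaultdict

# UD naming scheme: <lang>_<treebank>-ud-<split>.conllu, e.g. cs_pdtc-ud-train.conllu
UD_CONLLU = re.compile(r"(?P<code>[a-z]+_[a-z]+)-ud-(?P<split>train|dev|test)\.conllu")


def index_conllu_files(member_names):
    """Map each treebank code (e.g. 'cs_pdtc') to {split: archive member name}."""
    index = defaultdict(dict)
    for name in member_names:
        basename = name.rstrip("/").rsplit("/", 1)[-1]  # drop any folder prefix
        match = UD_CONLLU.fullmatch(basename)
        if match:
            index[match["code"]][match["split"]] = name
    return dict(index)


def index_archive(zip_path):
    """Index the .conllu members of the zip without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return index_conllu_files(archive.namelist())


if __name__ == "__main__":
    path = "periphrastic-ud-treebanks-v2.16-slavic.zip"  # assumed download location
    if os.path.exists(path):
        for code, splits in sorted(index_archive(path).items()):
            print(code, sorted(splits))
    else:
        print(f"{path} not found")
```

Returning a per-treebank dictionary of splits matters because not every treebank ships all three: the PUD treebanks, UD_Czech-Poetry and UD_Macedonian-MTB contain only test data, and UD_Upper_Sorbian-UFAL has train and test but no dev, so downstream code should not assume a split is present.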

