This is not the latest version of this item. Newer releases are listed under Version History below.
Universal Dependencies 2.0
Please use the following text to cite this item:
Nivre, Joakim; et al., 2017, Universal Dependencies 2.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1983.
Authors
Nivre, Joakim; et al.
Date issued
2017-03-13
Size
11814230 tokens, 12102983 words, 630518 sentences
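The token and word counts above differ because UD treebanks distinguish surface tokens from syntactic words: in the CoNLL-U format, a multiword token is listed under an ID range such as `1-2`, followed by the words it contains. The following is a minimal sketch of how such counts could be computed from CoNLL-U text. The function name `conllu_counts` and the exact counting convention (a multiword token counts as one token, empty nodes with decimal IDs are ignored) are assumptions for illustration, not the procedure used to produce the figures above.

```python
def conllu_counts(text):
    """Return (tokens, words, sentences) for a string of CoNLL-U data.

    Assumed convention: a multiword token (ID range such as "1-2") is one
    token, each integer-ID line is one word, empty nodes (IDs such as
    "1.1") are skipped, and blank lines separate sentences.
    """
    tokens = words = sentences = 0
    covered_until = 0    # last word ID covered by the current multiword token
    in_sentence = False  # True once a data line has been seen in this block

    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if in_sentence:
                sentences += 1
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comment

        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            # Multiword token: one surface token spanning several words.
            covered_until = int(token_id.split("-")[1])
            tokens += 1
        elif "." in token_id:
            continue  # empty node
        else:
            words += 1
            # A word outside any multiword token is also a token.
            if int(token_id) > covered_until:
                tokens += 1

    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return tokens, words, sentences


sample = "\n".join([
    "# text = vamonos al mar",
    "1-2\tvamonos", "1\tvamos", "2\tnos",
    "3-4\tal", "3\ta", "4\tel",
    "5\tmar",
])
print(conllu_counts(sample))
```

For the sample sentence, the three surface tokens expand to five words, so the function returns `(3, 5, 1)`.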
Description
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This release is special in that the treebanks will be used as training/development data in the CoNLL 2017 shared task (http://universaldependencies.org/conll17/). Test data are not released, except for the few treebanks that do not take part in the shared task. The shared task will include 64 treebanks covering the following 45 languages: Ancient Greek, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur and Vietnamese.
This release fixes a bug in http://hdl.handle.net/11234/1-1976. Changed files: ud-tools-v2.0.tgz (conllu_to_text.pl, conllu_to_conllx.pl; added text_without_spaces.pl), ud-treebanks-conll2017.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt) and ud-treebanks-v2.0.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt, ar_nyuad-ud-dev.txt, ar_nyuad-ud-test.txt, ar_nyuad-ud-train.txt, cop-ud-dev.txt, cop-ud-test.txt, cop-ud-train.txt, sa-ud-dev.txt, sa-ud-test.txt, sa-ud-train.txt).
Acknowledgement
Grantová agentura České republiky
Project code: 15-10472S
Project name: Morphologically and Syntactically Annotated Corpora of Many Languages
Version History
You are currently viewing version 6 of the item.
Showing 10 of 24 versions.
| Date | Summary |
|---|---|
| 2025-11-13 11:27:52 | New release of Universal Dependencies: 2.17 |
| 2025-05-15 00:00:00 | |
| 2024-11-15 00:00:00 | |
| 2024-05-15 00:00:00 | |
| 2023-11-15 00:00:00 | |
| 2023-05-15 00:00:00 | |
| 2022-11-15 00:00:00 | |
| 2022-05-15 00:00:00 | |
| 2021-11-15 00:00:00 | |
| 2021-05-16 00:00:00 | |
Files in this item
| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| ud-documentation-v2.0.tgz | 43.5 MB | application/x-gzip | gzip Archive | fbe08dd83675da3ac1e54a5ee67d1a69 |
| ud-treebanks-v2.0.tgz | 180.67 MB | application/x-gzip | gzip Archive | 0fc5576ebade87a0733cc323d529d784 |
| ud-treebanks-conll2017.tgz | 174.86 MB | application/x-gzip | gzip Archive | f4869f28c376c360c740ef8caafdfffd |
| ud-tools-v2.0.tgz | 192.19 KB | application/x-gzip | gzip Archive | 68c0c53a740a87eaeae18cd528a915c2 |
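The MD5 values listed for each file can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming the archives have been saved locally under their original names; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical and not part of the UD tools.

```python
import hashlib

# MD5 checksums as listed for the files in this item.
LISTED_MD5 = {
    "ud-documentation-v2.0.tgz": "fbe08dd83675da3ac1e54a5ee67d1a69",
    "ud-treebanks-v2.0.tgz": "0fc5576ebade87a0733cc323d529d784",
    "ud-treebanks-conll2017.tgz": "f4869f28c376c360c740ef8caafdfffd",
    "ud-tools-v2.0.tgz": "68c0c53a740a87eaeae18cd528a915c2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, name):
    """Compare a local file against the checksum listed under `name`."""
    return md5_of_file(path) == LISTED_MD5[name]
```

For example, `matches_listed_md5("ud-tools-v2.0.tgz", "ud-tools-v2.0.tgz")` should return `True` for an intact download of the tools archive in the current directory.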

