Nivre, Joakim; Agić, Željko; Ahrenberg, Lars; Aranzabe, Maria Jesus; Asahara, Masayuki; Atutxa, Aitziber; Ballesteros, Miguel; Bauer, John; Bengoetxea, Kepa; Bhat, Riyaz Ahmad; Bick, Eckhard; Bosco, Cristina; Bouma, Gosse; Bowman, Sam; Candito, Marie; Cebiroğlu Eryiğit, Gülşen; Celano, Giuseppe G. A.; Chalub, Fabricio; Choi, Jinho; Çöltekin, Çağrı; Connor, Miriam; Davidson, Elizabeth; de Marneffe, Marie-Catherine; de Paiva, Valeria; Diaz de Ilarraza, Arantza; Dobrovoljc, Kaja; Dozat, Timothy; Droganova, Kira; Dwivedi, Puneet; Eli, Marhaba; Erjavec, Tomaž; Farkas, Richárd; Foster, Jennifer; Freitas, Cláudia; Gajdošová, Katarína; Galbraith, Daniel; Garcia, Marcos; Ginter, Filip; Goenaga, Iakes; Gojenola, Koldo; Gökırmak, Memduh; Goldberg, Yoav; Gómez Guinovart, Xavier; Gonzáles Saavedra, Berta; Grioni, Matias; Grūzītis, Normunds; Guillaume, Bruno; Habash, Nizar; Hajič, Jan; Hà Mỹ, Linh; Haug, Dag; Hladká, Barbora; Hohle, Petter; Ion, Radu; Irimia, Elena; Johannsen, Anders; Jørgensen, Fredrik; Kaşıkara, Hüner; Kanayama, Hiroshi; Kanerva, Jenna; Kotsyba, Natalia; Krek, Simon; Laippala, Veronika; Lê Hồng, Phương; Lenci, Alessandro; Ljubešić, Nikola; Lyashevskaya, Olga; Lynn, Teresa; Makazhanov, Aibek; Manning, Christopher; Mărănduc, Cătălina; Mareček, David; Martínez Alonso, Héctor; Martins, André; Mašek, Jan; Matsumoto, Yuji; McDonald, Ryan; Missilä, Anna; Mititelu, Verginica; Miyao, Yusuke; Montemagni, Simonetta; More, Amir; Mori, Shunsuke; Moskalevskyi, Bohdan; Muischnek, Kadri; Mustafina, Nina; Müürisep, Kaili; Nguyễn Thị, Lương; Nguyễn Thị Minh, Huyền; Nikolaev, Vitaly; Nurmi, Hanna; Ojala, Stina; Osenova, Petya; Øvrelid, Lilja; Pascual, Elena; Passarotti, Marco; Perez, Cenel-Augusto; Perrier, Guy; Petrov, Slav; Piitulainen, Jussi; Plank, Barbara; Popel, Martin; Pretkalniņa, Lauma; Prokopidis, Prokopis; Puolakainen, Tiina; Pyysalo, Sampo; Rademaker, Alexandre; Ramasamy, Loganathan; Real, Livy; Rituma, Laura; Rosa, Rudolf; Saleh, Shadi; Sanguinetti, Manuela; Saulīte, Baiba; Schuster, Sebastian; Seddah, Djamé; Seeker, Wolfgang; Seraji, Mojgan; Shakurova, Lena; Shen, Mo; Sichinava, Dmitry; Silveira, Natalia; Simi, Maria; Simionescu, Radu; Simkó, Katalin; Šimková, Mária; Simov, Kiril; Smith, Aaron; Suhr, Alane; Sulubacak, Umut; Szántó, Zsolt; Taji, Dima; Tanaka, Takaaki; Tsarfaty, Reut; Tyers, Francis; Uematsu, Sumire; Uria, Larraitz; van Noord, Gertjan; Varga, Viktor; Vincze, Veronika; Washington, Jonathan North; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Zhu, Hanzhi
2017-03-14
dc.description Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). This release is special in that the treebanks will be used as training/development data in the CoNLL 2017 shared task. Test data are not released, except for the few treebanks that do not take part in the shared task. Sixty-four treebanks will be used in the shared task, covering the following 45 languages: Ancient Greek, Arabic, Basque, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Latin, Latvian, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur and Vietnamese. This release fixes a bug in […]. Changed files: ud-tools-v2.0.tgz (…; added …), ud-treebanks-conll2017.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt) and ud-treebanks-v2.0.tgz (fi_ftb-ud-train.txt, he-ud-train.txt, it-ud-train.txt, pt_br-ud-train.txt, es-ud-train.txt, ar_nyuad-ud-dev.txt, ar_nyuad-ud-test.txt, ar_nyuad-ud-train.txt, cop-ud-dev.txt, cop-ud-test.txt, cop-ud-train.txt, sa-ud-dev.txt, sa-ud-test.txt, sa-ud-train.txt).
dc.language.iso grc
dc.language.iso ara
dc.language.iso eus
dc.language.iso bul
dc.language.iso hrv
dc.language.iso ces
dc.language.iso dan
dc.language.iso nld
dc.language.iso eng
dc.language.iso est
dc.language.iso fin
dc.language.iso fra
dc.language.iso deu
dc.language.iso got
dc.language.iso ell
dc.language.iso heb
dc.language.iso hin
dc.language.iso hun
dc.language.iso ind
dc.language.iso gle
dc.language.iso ita
dc.language.iso jpn
dc.language.iso lat
dc.language.iso nor
dc.language.iso chu
dc.language.iso fas
dc.language.iso pol
dc.language.iso por
dc.language.iso ron
dc.language.iso slv
dc.language.iso spa
dc.language.iso swe
dc.language.iso tam
dc.language.iso cat
dc.language.iso zho
dc.language.iso glg
dc.language.iso kaz
dc.language.iso lav
dc.language.iso rus
dc.language.iso tur
dc.language.iso cop
dc.language.iso san
dc.language.iso slk
dc.language.iso ukr
dc.language.iso uig
dc.language.iso vie
dc.language.iso bel
dc.language.iso kor
dc.language.iso lit
dc.language.iso urd
dc.publisher Universal Dependencies Consortium
dc.rights Licence Universal Dependencies v2.0
dc.subject treebank
dc.subject dependency
dc.subject syntax
dc.subject morphology
dc.subject harmonized annotation
dc.subject interset
dc.subject universal tagset
dc.subject stanford dependencies
dc.title Universal Dependencies 2.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIN
contact.person Joakim Nivre Uppsala University
contact.person Daniel Zeman Charles University in Prague, UFAL
sponsor Grantová agentura České republiky (Czech Science Foundation); 15-10472S; Morphologically and Syntactically Annotated Corpora of Many Languages; nationalFunds
11814230 tokens, 12102983 words, 630518 sentences
files.size 418607328
files.count 4
GNU General Public License, version 3.0
Distributed under Creative Commons
Treebank data
Training and development data for the CoNLL 2017 shared task
