PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement


License Terms

PARSEME Shared Task Raw Corpus Data (edition 1.2) is a collection of linguistic data. Each of the corpora has its own license terms and you (the “User”) are responsible for complying with the license terms applicable to those parts which you use. If you do not agree with the license terms, you must stop using the corpora and destroy all copies of the data that you have obtained.

The license for every corpus included in the release is specified in the appropriate language directory.

Overview of the corpora and their license terms

Language     Language code   Licenses
German       DE              CC-BY-NC-SA 4.0
Greek        EL              CC-BY-NC-SA 4.0 (CoNLL 2017 part)
                             CC-BY (Leipzig corpus collection part)
Basque       EU              CC-BY-NC-SA 4.0
French       FR              CC-BY-NC-SA 4.0
Hebrew       HE              CC-BY-NC-SA 4.0
Hindi        HI              CC-BY-NC-SA 4.0
Irish        GA              CC-BY 4.0 (Citizen's Information part)
                             CC0 (ParaCrawl part)
                             CC-BY 2.0 (Tatoeba part)
                             CC-BY-SA 3.0 (Vicipéid)
                             Public Domain* (EU bookshop part)
Italian      IT              CC-BY-NC-SA 4.0
Polish       PL              CC-BY-NC-SA 4.0
Portuguese   PT              CC-BY-NC-SA 4.0
Romanian     RO              CC-BY-NC-SA 4.0
Swedish      SV              CC-BY-NC-SA 4.0
Turkish      TR              CC-BY-NC-SA 4.0
Chinese      ZH              CC-BY-NC-SA 4.0


License URL
CC BY 4.0
CC BY 2.0
CC BY-SA 3.0
Public Domain* The source is public administration data from the EU; see the Publications Office of the European Union copyright notice.
The particular set of documents was gathered from EUbookshop
(see J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)).