PARSEME Shared Task Raw Corpus Data (v. 1.2) Agreement

(2020/11/03)

License Terms

PARSEME Shared Task Raw Corpus Data (edition 1.2) is a collection of linguistic data. Each of the corpora has its own license terms and you (the “User”) are responsible for complying with the license terms applicable to those parts which you use. If you do not agree with the license terms, you must stop using the corpora and destroy all copies of the data that you have obtained.

The license for every corpus included in the release is specified in the appropriate language directory.

Overview of the corpora and their license terms

Language	Language code	Licenses
German	DE	CC BY-NC-SA 4.0
Greek	EL	CC BY-NC-SA 4.0 (CoNLL 2017 part); CC BY (Leipzig corpus collection part)
Basque	EU	CC BY-NC-SA 4.0
French	FR	CC BY-NC-SA 4.0
Hebrew	HE	CC BY-NC-SA 4.0
Hindi	HI	CC BY-NC-SA 4.0
Irish	GA	CC BY 4.0 (Citizen's Information part); CC0 (ParaCrawl part); CC BY 2.0 (Tatoeba part); CC BY-SA 3.0 (Vicipéid part); Public Domain* (EU bookshop part)
Italian	IT	CC BY-NC-SA 4.0
Polish	PL	CC BY-NC-SA 4.0
Portuguese	PT	CC BY-NC-SA 4.0
Romanian	RO	CC BY-NC-SA 4.0
Swedish	SV	CC BY-NC-SA 4.0
Turkish	TR	CC BY-NC-SA 4.0
Chinese	ZH	CC BY-NC-SA 4.0

Licenses

License	URL
CC BY-NC-SA 4.0	http://creativecommons.org/licenses/by-nc-sa/4.0/
CC BY 4.0	https://creativecommons.org/licenses/by/4.0/
CC0	https://creativecommons.org/share-your-work/public-domain/cc0/
CC BY 2.0	https://creativecommons.org/licenses/by/2.0/
CC BY-SA 3.0	https://creativecommons.org/licenses/by-sa/3.0/
Public Domain*	The source is public administration data from the EU; see the copyright notice of the Publications Office of the European Union. The particular set of documents was gathered from EUbookshop (see J. Tiedemann, 2012, "Parallel Data, Tools and Interfaces in OPUS", in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)).