Contributor: ANR (France), grant ANR-14-CERA-0001, project PARSEME-FR, national funds

Creator:: Barque, Lucie; Candito, Marie; Constant, Matthieu; Cordeiro, Silvio Ricardo; Crabbé, Benoît; Fort, Karën; Guillaume, Bruno; Haas, Pauline; Huyghe, Richard; Perrier, Guy; Ramisch, Carlos; Ribeyre, Corentin; Savary, Agata; Seddah, Djamé; Segonne, Vincent; Tribout, Delphine; Villemonte de la Clergerie, Eric; Parmentier, Yannick; Pasquer, Caroline; and Antoine, Jean-Yves
Publisher:: ANR
Type:: text and corpus
Subject:: morpho-syntactic annotations, treebank, dependency syntax, semantic tagging, multiword expressions, and named entities
Language:: French
Description:: The Sequoia corpus is a set of 3,099 linguistically annotated French sentences drawn from four sources (Europarl, European Agency Reports, the French regional newspaper L'Est Républicain, and French Wikipedia). Several types of annotations have been added over the years. The current release comprises: parts of speech (SEQUOIA ANR-08-EMER-013 project); syntactic dependency trees; deep syntactic dependency graphs (Deep Sequoia project); multiword expressions and named entities (PARSEME COST project and PARSEME-FR ANR-14-CERA-0001 project); and coarse semantic tags for nouns (FrSemCor project). See the Deep Sequoia page for a detailed description: https://deep-sequoia.inria.fr/
Rights:: Deep Sequoia Licence, https://lindat.mff.cuni.cz/repository/xmlui/page/deep-sequoia-licence, and PUB

Creator:: Savary, Agata; Cordeiro, Silvio Ricardo; Lichte, Timm; Ramisch, Carlos; Iñurrieta, Uxoa; and Giouli, Voula
Publisher:: PARSEME
Type:: text and corpus
Subject:: verbal multiword expressions, literal occurrence, and idiomaticity rate
Language:: Basque, German, Modern Greek (1453-), Polish, and Portuguese
Description:: The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual corpus of VMWEs v1.1 (cf. http://hdl.handle.net/11372/LRT-2842). Sentences containing VMWEs were taken from the source corpus, and potential co-occurrences of the same lexemes were automatically extracted from that same corpus. These candidates were then manually classified by native-speaker experts into 6 classes, including literal and coincidental occurrences, as well as various annotation errors. The construction of the corpus is described in the following publication: Agata Savary, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta, Voula Giouli (forthcoming) "Literal occurrences of multiword expressions: Rare birds that cause a stir", to appear in Prague Bulletin of Mathematical Linguistics.
Rights:: License agreement for The Multilingual corpus of literal occurrences of multiword expressions, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-literal, and PUB