Czech Verbal MWEs

Name: Czech Verbal MWEs
License: http://creativecommons.org/licenses/by-nc-sa/4.0/

Bejček, Eduard

dc.contributor.author	Bejček, Eduard
dc.date.accessioned	2018-01-23T09:31:44Z
dc.date.available	2018-01-23T09:31:44Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/11234/1-2603
dc.description	Lexicon of Czech verbal multiword expressions (VMWEs) used in the Parseme Shared Task 2017 (https://typo.uni-konstanz.de/parseme/index.php/2-general/142-parseme-shared-task-on-automatic-detection-of-verbal-mwes).
	The lexicon consists of 4785 VMWEs, classified into four categories according to the Parseme Shared Task (PST) typology: IReflV (inherently reflexive verbs), LVC (light verb constructions), ID (idiomatic expressions) and OTH (other VMWEs whose syntactic head is not verbal).
	Verbal multiword expressions, as well as deverbative variants of VMWEs, were annotated during the preparation phase of PST. These data were published as http://hdl.handle.net/11372/LRT-2282. The Czech part includes 14,536 VMWE occurrences:
	  1611 ID
	  10000 IReflV
	  2923 LVC
	  2 OTH
	This lexicon was created from the Czech data.
	Each lexicon entry is represented by one line in the form:
	  type lemmas frequency PoS [used form 1; used form 2; ... ]
	(columns are separated by tabs), where:
	  type ... is the type of the VMWE in the PST typology
	  lemmas ... are the space-separated lemmatized forms of all words that constitute the VMWE
	  frequency ... is the absolute frequency of this item in the PST data
	  PoS ... is a space-separated list of the parts of speech of the individual words (in the same order as in "lemmas")
	  the final field contains a list of all (1 to 18) used forms found in the data (since Czech is an inflectional language).
dc.language.iso	ces
dc.publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreferencedby	http://ceur-ws.org/Vol-1779/02bejeck.pdf
dc.rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject	lexicon
dc.subject	verbs
dc.subject	multiword expressions
dc.subject	forms
dc.subject	lemmatization
dc.title	Czech Verbal MWEs
dc.type	lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.mediaType	text
metashare.ResourceInfo#ContentInfo.detailedType	lexicon
dc.rights.label	PUB
has.files	yes
branding	LINDAT / CLARIAH-CZ
contact.person	Eduard Bejček <bejcek@ufal.mff.cuni.cz>, Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor	Institute of Formal and Applied Linguistics; LD14117; Parseme CZ; nationalFunds
size.info	4785 items
files.size	320985
files.count	1
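
The entry format given in dc.description (five tab-separated columns, the last one a semicolon-separated list of used forms) can be read with a few lines of Python. The following is a minimal sketch, not part of the published resource: the names VMWEEntry and parse_entry are hypothetical, the sample line is invented for illustration (its frequency and PoS tags are not taken from the lexicon), and it is an assumption that the square brackets around the final field appear literally in the file, so the code strips them only if they are present.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VMWEEntry:
    """One lexicon line; the class name and field names are hypothetical."""
    vmwe_type: str       # PST category: IReflV, LVC, ID or OTH
    lemmas: List[str]    # lemmatized words of the VMWE
    frequency: int       # absolute frequency in the PST data
    pos: List[str]       # parts of speech, same order as lemmas
    forms: List[str]     # used forms found in the data


def parse_entry(line: str) -> VMWEEntry:
    """Parse one tab-separated lexicon line into a VMWEEntry."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}")
    vmwe_type, lemmas, frequency, pos, forms = fields

    # Assumption: the brackets shown in the description may or may not be
    # literal in the file, so they are removed only when both are present.
    forms = forms.strip()
    if forms.startswith("[") and forms.endswith("]"):
        forms = forms[1:-1]

    return VMWEEntry(
        vmwe_type=vmwe_type,
        lemmas=lemmas.split(),
        frequency=int(frequency),
        pos=pos.split(),
        forms=[form.strip() for form in forms.split(";") if form.strip()],
    )


# Invented sample line for illustration only (values are not from the lexicon).
sample = "IReflV\tbát se\t12\tVERB PRON\t[bál se; bojí se]"
entry = parse_entry(sample)
print(entry.vmwe_type, entry.lemmas, entry.frequency, entry.forms)
```

Applied to every line of the downloaded file, the same function would allow simple checks against the record, for example that the number of parsed entries matches the 4785 items stated in size.info.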

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Name: lexicon_czech_VMWEs.tsv
Size: 313.46 KB
Format: Unknown
Description: lexicon
MD5: 65c5fa7b9391f8e33fbb81c8f42c1d15
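
A downloaded copy can be checked against the size and MD5 checksum listed in this record. The sketch below uses only the Python standard library; the function names md5_of_file and matches_record are hypothetical, and it assumes that files.size (320985) is given in bytes, which is consistent with the 313.46 KB shown above.

```python
import hashlib
from pathlib import Path

# Values copied from this record; the size is assumed to be in bytes.
EXPECTED_MD5 = "65c5fa7b9391f8e33fbb81c8f42c1d15"
EXPECTED_SIZE = 320985


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check that a local file has the expected size and MD5 checksum."""
    path = Path(path)
    return path.stat().st_size == expected_size and md5_of_file(path) == expected_md5
```

For example, matches_record("lexicon_czech_VMWEs.tsv") would return True only if the local file has both the listed size and the listed checksum.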
