PDT-Vallex: Czech Valency lexicon linked to treebanks 4.5 (PDT-Vallex 4.5)
Please use the following text to cite this item:
Urešová, Zdeňka; et al., 2024,
PDT-Vallex: Czech Valency lexicon linked to treebanks 4.5 (PDT-Vallex 4.5), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5814.
Authors
Urešová, Zdeňka ; et al.
Item identifier
http://hdl.handle.net/11234/1-5814
Date issued
2024-12-30
Size
12998 words, 20320 entries
Language(s)
Czech
Description
The valency lexicon PDT-Vallex 4.5 is part of the PDT-C 2.0 release (https://hdl.handle.net/11234/1-5813). It is a slightly modified version of PDT-Vallex 4.0 from 2020 (released as part of the PDT-C 1.0 corpus), updated for full compatibility with the PDT-C 2.0 annotation, including completely reworked reference IDs for the word and frame entries.

PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank (PDT) project and its successors, mainly the Prague Czech-English Dependency Treebank (PCEDT), the spoken language corpus (PDTSC), and the corpus of user-generated texts from the Faust project. It contains over 14500 valency frames for almost 8500 verbs that occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, it covers nouns, adjectives and adverbs, linked from the PDT part only, which brings the total to over 20000 valency frames for almost 13000 words.

All the corpora were published in 2024 as the PDT-C 2.0 corpus with the PDT-Vallex 4.5 dictionary included; this item is a copy of the dictionary published separately for those not interested in the corpora themselves. It is available in an electronically processable format (XML) and in a more human-readable form that includes corpus examples (see the project and web browser links below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame, with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
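Because the lexicon is distributed as XML, a first look at its structure can be taken without knowing the schema in advance. The following is a minimal sketch, not part of the release: it streams a document and counts how often each element name occurs. The function name count_elements and the sample element names (lexicon, word, frame) are invented for illustration and are not claimed to match the actual PDT-Vallex markup.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Stream an XML document and count how often each element name occurs.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop a namespace prefix of the form "{uri}name", if there is one.
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
        # Free the element's content so large files do not fill memory.
        elem.clear()
    return counts


# Tiny stand-in document; these element names are hypothetical.
SAMPLE = b"<lexicon><word><frame/><frame/></word><word><frame/></word></lexicon>"
sample_counts = count_elements(io.BytesIO(SAMPLE))
```

Passing the path of the downloaded pdtvallex-4.5.xml instead of the in-memory sample should yield a tally of the element names actually used in the file, which can then be compared against the word and frame totals stated above.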
Acknowledgement
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2015071
Project name: LINDAT/CLARIN: Institute for the Analysis, Processing and Distribution of Linguistic Data
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2018101
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
Ministry of Education, Youth and Sports of the Czech Republic
Project code: CZ.02.1.01/0.0/0.0/16_013/0001781
Project name: LINDAT/CLARIN - Research Infrastructure for Language Technologies - Extension of the Repository and Computing Capacity
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
Ministry of Education, Youth and Sports of the Czech Republic
Project code: CZ.02.01.01/00/23_015/0008176
Project name: LINDAT/CLARIAH-CZ Instrumentation
Czech Science Foundation (Grantová agentura České republiky)
Project code: GX20-16819X
Project name: LUSyD – Language Understanding: from Syntax to Discourse
This item is Publicly Available
and licensed under:
Files in this item
- Name: pdtvallex-4.5.xml
- Size: 18.36 MB
- Format: text/xml
- Description: The valency lexicon in XML
- MD5: 521cd74ba8ab0ba5cbb2b68437832e7d

- Name: vallex.changes
- Size: 440.61 KB
- Format: application/octet-stream
- Description: Mapping of PDT-Vallex 4.0 frame IDs to PDT-Vallex 4.5 frame IDs
- MD5: d55ea206bd3de59e0b821dbe1ed20f2e
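
The MD5 values listed for the two files can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and verify are illustrative and not part of the release, and the local paths passed to them are whatever the files were saved as.

```python
import hashlib

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "pdtvallex-4.5.xml": "521cd74ba8ab0ba5cbb2b68437832e7d",
    "vallex.changes": "d55ea206bd3de59e0b821dbe1ed20f2e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that the 18 MB lexicon is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Check a downloaded file against the checksum listed under `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, verify("pdtvallex-4.5.xml", "pdtvallex-4.5.xml") should return True for an undamaged copy saved in the current directory; a False result suggests the download should be repeated.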


