Creator: Baisa, Vít - LINDAT/CLARIAH-CZ Catalog Search Results

1. Czech Grammar Agreement Dataset for Evaluation of Language Models

Creator:: Baisa, Vít
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: agreement, past tense verb suffix, language model, and training data
Language:: Czech
Description:: AGREE is a dataset and task for evaluating language models on grammatical agreement in Czech. The dataset consists of sentences with marked suffixes of past tense verbs. The task is to choose the right verb suffix, which depends on the gender, number and animacy of the subject. It is challenging for language models because 1) Czech is morphologically rich, 2) it has relatively free word order, 3) it has a high out-of-vocabulary (OOV) ratio, 4) the predicate and subject can be far from each other, 5) subjects can be unexpressed and 6) various semantic rules may apply. The task provides a straightforward and easily reproducible way of evaluating language models on a morphologically rich language.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
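
The task described in this record (choosing the right past tense suffix for a marked verb) can be illustrated with a minimal sketch. Everything about the data layout here is an assumption made for illustration: the `*` placeholder, the `(template, gold suffix)` item shape, the candidate list `SUFFIXES` and the `toy_score` function are hypothetical and are not taken from the AGREE distribution. In practice `score` would be replaced by a real language model's sentence score.

```python
from typing import Callable, Iterable, Tuple

# Czech past tense endings; the correct one depends on the subject's
# gender, number and animacy. The candidate set is an assumption.
SUFFIXES = ("l", "la", "lo", "li", "ly")

# Hypothetical item layout: (sentence with "*" marking the suffix slot, gold suffix)
Item = Tuple[str, str]


def choose_suffix(template: str, score: Callable[[str], float]) -> str:
    """Fill the slot with each candidate and return the best-scoring suffix."""
    return max(SUFFIXES, key=lambda s: score(template.replace("*", s)))


def accuracy(items: Iterable[Item], score: Callable[[str], float]) -> float:
    """Fraction of items for which the chosen suffix equals the gold suffix."""
    items = list(items)
    correct = sum(choose_suffix(t, score) == gold for t, gold in items)
    return correct / len(items)


def toy_score(sentence: str) -> float:
    """Stand-in for a language model: prefers two memorised sentences."""
    known = {"Žena dělala oběd.": 1.0, "Muži dělali oběd.": 1.0}
    return known.get(sentence, 0.0)


items = [("Žena děla* oběd.", "la"), ("Muži děla* oběd.", "li")]
print(accuracy(items, toy_score))
```

Scoring whole filled-in sentences keeps the sketch independent of any particular model; a model that only sees left context would need a different interface.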

2. English-Czech Corpus from Wikipedia

Creator:: Štromajerová, Adéla, Baisa, Vít, and Blahuš, Marek
Publisher:: Masaryk University, NLP Centre
Type:: text and corpus
Subject:: Wikipedia
Language:: English and Czech
Description:: Sentence-parallel corpus built from the English and Czech Wikipedias, based on articles translated from English into Czech. The work is described in the paper: ŠTROMAJEROVÁ, Adéla, Vít BAISA and Marek BLAHUŠ. Between Comparable and Parallel: English-Czech Corpus from Wikipedia. In RASLAN 2016 Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016. pp. 3-8, 6 pp. ISBN 978-80-263-1095-2.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

3. VPS-GradeUp (2016-10-10)

Creator:: Baisa, Vít, Cinková, Silvie, Krejčová, Ema, and Vernerová, Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, other, and lexicalConceptualResource
Subject:: Pattern Dictionary of English Verbs, usage patterns, lexical semantics, dictionaries, clustering, Corpus Pattern Analysis, verbs, graded decisions, Likert scale, and Word Sense Disambiguation
Language:: English
Description:: VPS-GradeUp is a collection of triple manual annotations of 29 English verbs based on the Pattern Dictionary of English Verbs (PDEV), comprising the following lemmas: abolish, act, adjust, advance, answer, approve, bid, cancel, conceive, cultivate, cure, distinguish, embrace, execute, hire, last, manage, murder, need, pack, plan, point, praise, prescribe, sail, seal, see, talk, urge. It contains results from two different tasks: 1. graded decisions and 2. best-fit pattern (WSD). In both tasks, the annotators matched verb senses defined by the PDEV patterns with 50 actual uses of each verb (using concordances from the BNC [2]). The verbs were randomly selected from a list of completed PDEV lemmas with at least 3 patterns and at least 100 BNC concordances not previously annotated by PDEV’s own annotators. The selection also excluded verbs contained in VPS-30-En [3], a data set we developed earlier. This data set was built within the project Reviving Zellig S. Harris: more linguistic information for distributional lexical analysis of English and Czech and in connection with the SemEval-2015 CPA-related task.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
