« Previous |
1 - 10 of 12
|
Next »
Number of results to display per page
Search Results
2. MEd
- Creator:
- Pajas, Petr and Mareček, David
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService
- Subject:
- annotation tool
- Description:
- MEd is an annotation tool in which linearly-structured annotations of text or audio data can be created and edited. The tool supports multiple stacked layers of annotations that can be interconnected by links. MEd can also be used for other purposes, such as word-to-word alignment of parallel corpora.
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
3. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)
- Creator:
- Urešová, Zdeňka, Bémová, Alevtina, Fučíková, Eva, Hajič, Jan, Kolářová, Veronika, Mikulová, Marie, Pajas, Petr, Panevová, Jarmila, and Štěpánek, Jan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, computationalLexicon, and lexicalConceptualResource
- Subject:
- verbal valency, valency, annotation, linguistic data, lexicon, lexical semantics, and PDT
- Language:
- Czech
- Description:
- The valency lexicon PDT-Vallex 4.0 has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, the spoken language corpus (PDTSC) and corpus of user-generated texts in the project Faust). It contains over 14500 valency frames for almost 8500 verbs which occurred in the PDT, PCEDT, PDTSC and Faust corpora. In addition, there are nouns, adjectives and adverbs, linked from the PDT part only, increasing the total to over 17000 valency frames for 13000 words. All the corpora have been published in 2020 as the PDT-C 1.0 corpus with the PDT-Vallex 4.0 dictionary included; this is a copy of the dictionary published as a separate item for those not interested in the corpora themselves. It is available in electronically processable format (XML), and also in more human readable form including corpus examples (see the WEBSITE link below, and the links to its main publications elsewhere in this metadata). The main feature of the lexicon is its linking to the annotated corpora - each occurrence of each verb is linked to the appropriate valency frame with additional (generalized) information about its usage and surface morphosyntactic form alternatives. It replaces the previously published unversioned edition of PDT-Vallex from 2014.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
4. PML Tree Query
- Creator:
- Pajas, Petr, Štěpánek, Jan, and Sedlák, Michal
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- toolService and tool
- Subject:
- treebank, query, and search
- Language:
- English
- Description:
- System for querying annotated treebanks in PML format. The querying uses it own query language with graphical representation. It has two different implementations (SQL and Perl) and several clients (TrEd, browser-based, command line interface).
- Rights:
- GNU General Public License, version 2, http://www.gnu.org/licenses/gpl-2.0.html, and PUB
5. Prague Arabic Dependency Treebank 1.0
- Creator:
- Hajič, Jan, Smrž, Otakar, Zemánek, Petr, Pajas, Petr, Šnaidauf, Jan, Beška, Emanuel, Kracmar, Jakub, and Hassanová, Kamila
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus and Arabic
- Language:
- Arabic
- Description:
- The PADT project might be summarized as an open-ended activity of the Center for Computational Linguistics, the Institute of Formal and Applied Linguistics, and the Institute of Comparative Linguistics, Charles University in Prague, resting in multi-level annotation of Arabic language resources in the light of the theory of Functional Generative Description (Sgall et al., 1986; Hajičová and Sgall, 2003).
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
6. Prague DaTabase of Spoken Czech 1.0
- Creator:
- Hajič, Jan, Pajas, Petr, Ircing, Pavel, Romportl, Jan, Peterek, Nino, Spousta, Miroslav, Mikulová, Marie, Grůber, Martin, and Legát, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) and University of West Bohemia
- Type:
- audio and corpus
- Subject:
- spoken corpus, speech recognition, and speech reconstruction
- Language:
- Czech
- Description:
- PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic and manual transcription and manually reconstructed text. PDTSC 1.0 is a delayed release of data annotated in 2012. It is an update of Prague Dependency Treebank of Spoken Language (PDTSL) 0.5 (published in 2009). In 2017, Prague Dependency Treebank of Spoken Czech (PDTSC) 2.0 was published as an update of PDTSC 1.0.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
7. Prague Dependency Treebank 2.0 (PDT 2.0)
- Creator:
- Hajič, Jan, Panevová, Jarmila, Hajičová, Eva, Sgall, Petr, Pajas, Petr, Štěpánek, Jan, Havelka, Jiří, Mikulová, Marie, Žabokrtský, Zdeněk, Ševčíková-Razímová, Magda, and Urešová, Zdeňka
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus, Czech, treebank, and PDT
- Language:
- Czech
- Description:
- The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech texts with complex and interlinked morphological (two million words), syntactic (1.5 MW) and complex semantic annotation (0.8 MW); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current Computational Linguistics research needs. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well. and 1ET101120413 (Data a nástroje pro informační systémy) MSM 0021620838 (Moderní metody, struktury a systémy informatiky) 1ET101120503 (Integrace jazykových zdrojů za účelem extrakce informací z přirozených textů) 1P05ME752 (Vícejazyčný valenční a predikátový slovník přirozeného jazyka) LC536 (Centrum komputační lingvistiky)
- Rights:
- PDT 2.0 License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-pdt2, and ACA
8. Prague Dependency Treebank 2.0 - sample data
- Creator:
- Hajič, Jan, Panevová, Jarmila, Sgall, Petr, Pajas, Petr, Štěpánek, Jan, Havelka, Jiří, Mikulová, Marie, Žabokrtský, Zdeněk, and Ševčíková-Razímová, Magda
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, dependency, and PDT
- Language:
- Czech
- Description:
- A small subset of PDT 2.0 made available under a permissive license. Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech texts with complex and interlinked morphological (2 million words), syntactic (1.5 MW) and complex semantic annotation (0.8 MW); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current Computational Linguistics research needs. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well. and * Ministry of Education of the Czech Republic projects No. VS96151, LN00A063, 1P05ME752, MSM0021620838 and LC536, * Grant Agency of the Czech Republic grants Nos. 405/96/0198, 405/96/K214 and 405/03/0913, * research funds of the Faculty of Mathematics and Physics, * Charles University, Prague, Czech Republic, * Grant Agency of the Czech Academy of Science, Prague, Czech Republic projects No. 1ET101120503, 1ET101120413, and 1ET201120505 * Grant Agency of the Charles University No. 489/04, 350/05, 352/05 and 375/05 * the U.S. NSF Grant #IIS9732388.
- Rights:
- Creative Commons - Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/, and PUB
9. Prague Dependency Treebank 3.5
- Creator:
- Hajič, Jan, Bejček, Eduard, Bémová, Alevtina, Buráňová, Eva, Hajičová, Eva, Havelka, Jiří, Homola, Petr, Kárník, Jiří, Kettnerová, Václava, Klyueva, Natalia, Kolářová, Veronika, Kučová, Lucie, Lopatková, Markéta, Mikulová, Marie, Mírovský, Jiří, Nedoluzhko, Anna, Pajas, Petr, Panevová, Jarmila, Poláková, Lucie, Rysová, Magdaléna, Sgall, Petr, Spoustová, Johanka, Straňák, Pavel, Synková, Pavlína, Ševčíková, Magda, Štěpánek, Jan, Urešová, Zdeňka, Vidová Hladká, Barbora, Zeman, Daniel, Zikánová, Šárka, and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- treebank, dependency, tectogrammatics, topic-focus articulation, multiword expressions, coreference, bridging relations, discourse, morphology, syntax, tokenization, lemmatization, clauses, semantics, semantic relations, lexical semantics, and lexicon
- Language:
- Czech
- Description:
- The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018 on the original texts, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, new structure of basic documentation and new list of authors covering all previous editions. The Prague Dependency Treebank 3.5 (PDT 3.5) contains the same texts as the previous versions since 2.0; there are 49,431 annotated sentences (832,823 words) on all layers, from tectogrammatical annotation to syntax to morphology. There are additional annotated sentences for syntax and morphology; the totals for the lower layers of annotation are: 87,913 sentences with 1,502,976 words at the analytical layer (surface dependency syntax) and 115,844 sentences with 1,956,693 words at the morphological layer of annotation (these totals include the annotation with the higher layers annotated as well). Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
10. Prague Dependency Treebank of Spoken Language (PDTSL) 0.5
- Creator:
- Hajič, Jan, Pajas, Petr, Mareček, David, Mikulová, Marie, Urešová, Zdeňka, and Podveský, Petr
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- audio and corpus
- Subject:
- corpus and spoken language
- Language:
- Czech and English
- Description:
- The first edition of a speech corpus with a speech reconstruction layer (edited transcript). The project of speech reconstruction of Czech and English has been started at UFAL together with the PIRE project in 2005, and has gradually grown from ideas to (first) annotation specification, annotation software and actual annotation. It is part of the Prague Dependency Treebank family of annotated corpus resources and tools, to which it adds the spoken language layer(s). and LC536; MSM0021620838; IST-034344; ME838
- Rights:
- PDTSL, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-pdtsl, and ACA