Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results


111. Cashinahua corpus

Publisher:: Max Planck Institute for Evolutionary Anthropology and Université Paris X Nanterre
Type:: corpus
Description:: Documentation of the Cashinahua project (DoBeS project)
Rights:: Code of conduct

112. Catalan Annotated Corpora CQP

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: This RESTful service lets users define a sub-corpus from different annotated corpora. The service includes a POS tag harmonisation step in which the original tags are converted to the EAGLES/Parole format. The resulting sub-corpus is indexed using the IMS CWB tool. The user receives an ID that the CQP service can use to query the sub-corpus.
Rights:: Not specified

113. Catalan Digital Press

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: This RESTful service accesses part of the Hemeroteca Digital de l’Arxiu Municipal de Girona (digital press archive from the Girona city council), specifically Catalan press from 2003. The service uses the SRU protocol.
Rights:: Not specified

114. catdoc

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Format conversion service: Word .doc to .txt converter
Rights:: Not specified

115. CEHugeWebCorpus

Creator:: Rüdiger, Jan Oliver
Publisher:: Rüdiger, Jan Oliver
Type:: text and corpus
Subject:: corpus, German, Germanistik, Web corpus, web corpora, and CorpusExplorer
Language:: German
Description:: This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see: diskurslinguistik.net / diskursmonitor.de). It contains the CommonCrawl database (as of March 2018), filtered to German texts only. First, the URLs were filtered by top-level domain (de, at, ch). Then the texts were classified using NTextCat, and only texts identified unambiguously as German were included in the corpus. Finally, the texts were annotated using TreeTagger (token, lemma, part-of-speech). Size: 2.58 million documents, 232.87 million sentences, 3.021 billion tokens. You can use CorpusExplorer (http://hdl.handle.net/11234/1-2634) to convert this data into various other corpus formats (XML, JSON, WebLicht, TXM and many more).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/4.0/

116. CELEX (web version)

Publisher:: Max Planck Institute for Psycholinguistics
Type:: lexicalConceptualResource
Language:: Dutch, English, and German
Rights:: Not specified

117. CELT Corpus of Electronic Texts

Publisher:: University College, Cork
Format:: application/tei+xml
Type:: corpus
Language:: English, Irish, and Latin
Description:: Searchable online corpus of multilingual texts of Irish literature and history
Rights:: Not specified

118. Cercador NEOROM

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Language:: Catalan and Spanish
Description:: Search engine for the neologisms database of the NEOROM network. The network collects neologisms used in the press written in Romance languages from 2005 onwards.
Rights:: Not specified

119. Cercador OBNEO

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: toolService
Description:: Search engine for the BOBNEO data bank, a database of neologisms found in written and spoken mass media in Spanish and Catalan, from 1992.
Rights:: Not specified

120. Česílko

Creator:: Hajič, Jan, Kuboň, Vladislav, and Homola, Petr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService
Subject:: machine translation and Czech-Slovak translation
Language:: Czech
Description:: Česílko is a tool for fast and efficient translation from one source language into multiple target languages that are closely related to one another.
Rights:: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/, and PUB




Original context has metadata only
