What's New
OBRAZ
Description:
The key idea of our project is to convey to the widest possible readership detailed abstracts of the testimonies of Roma and Sinti and thus their personal and irreplaceable experience of the Second World War. We hope that ...
This item contains 1 file (6.53 MB).
Publicly Available
toolService
Description:
Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Dependencies 2.15 Treebanks, created solely using UD 2.15 data (https://hdl.handle.net/11234/1-5787). The model documentation ...
This item contains 1 file (8.53 GB).
Publicly Available
corpus
Description:
*** German version: see below ***
The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the German-language (visible) internet over time - ...
This item contains 272 files (53.6 GB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
This item contains 3 files (650.18 MB).
Publicly Available
corpus
Description:
The ParCzech 3.0 corpus is the third version of ParCzech consisting of stenographic protocols that record the Chamber of Deputies’ meetings held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The ...
This item contains 40 files (1064.79 GB).
Publicly Available
corpus
Description:
A set of corpora for 120 languages automatically collected from Wikipedia and the web.
Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
This item contains 122 files (18.91 GB).
Publicly Available