What's New

toolService
Description:
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging ...
 This item contains no files.
corpus
Description:
The corpus consists of job ads in Spanish for Engineering positions in Peru. The documents were preprocessed and annotated for POS tagging, NER, and topic modeling tasks. The corpus is divided into two ...
 This item contains 1 file (10.99 MB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required
corpus
Description:
Corpus of texts in 12 languages. For each language, we provide one training, one development and one testing set acquired from Wikipedia articles. Moreover, each language dataset contains (substantially larger) training ...
 This item contains 13 files (17.37 GB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike

Most Viewed Items

Top Last Week
languageDescription
Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
 This item contains 46 files (629.66 GB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike
corpus
Description:
A slightly modified version of the Czech Wordnet. This is the version used to annotate "The Lexico-Semantic Annotation of PDT using Czech WordNet": http://hdl.handle.net/11858/00-097C-0000-0001-487A-4 The Czech WordNet ...
 This item contains 1 file (440.85 KB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike
corpus
Description:
Three additional Czech reference translations of the entire WMT 2011 data set (http://www.statmt.org/wmt11/test.tgz), translated from the German originals. The original segmentation of the WMT 2011 data is preserved.
 This item contains 1 file (527.44 KB).
 
Publicly Available; Distributed under Creative Commons; Attribution Required; Noncommercial; Share Alike