UDMorph - Statistics
Below is an overview of the amount of data in UDMorph. These figures are updated periodically.
Taggers
The tagger count includes all languages for which a tagger model is accessible via the UDMorph interface. This covers locally trained models as well as models that are run through the API of a remote tool, most notably the models served by the UDPipe 2 REST interface.
Current count: 146 languages
Corpora
The corpus count includes all datasets that are accessible as a TEITOK corpus via the UDMorph interface. A downloadable tagger model is available for every corpus.
Current count: 54 corpora
Git Repositories
The repository count includes all datasets that are accessible as a Git repository in CoNLL-U format via the UDMorph Git group. This includes repositories that are maintained elsewhere and cloned under UDMorph, datasets that were converted to CoNLL-U from elsewhere (most notably from the UDMorph corpora), and datasets that are maintained directly in the UDMorph group.
Current count: 4 repositories
The number of repositories is still very limited because we aim to publish data only with the consent of its creators, even when the original data are open source. Over time, all UDMorph corpora should in principle be released as CoNLL-U repositories.
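For readers who want to measure the size of a CoNLL-U dataset themselves, the following is a minimal sketch of counting sentences and syntactic words in CoNLL-U text. The function name `count_conllu` and the sample data are illustrative assumptions, not part of UDMorph, and the sketch relies only on the general CoNLL-U layout: blank lines separate sentences, lines starting with `#` are comments, and the first tab-separated column holds the token ID.

```python
def count_conllu(text):
    """Count sentences and syntactic words in a CoNLL-U string.

    Hypothetical helper for illustration; not part of UDMorph.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# sent_id = 1".
            continue
        token_id = line.split("\t", 1)[0]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


# Made-up sample: two sentences, the second with a multiword token range.
sample = (
    "# sent_id = 1\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1-2\tdel\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tde\tde\tADP\t_\t_\t0\troot\t_\t_\n"
    "2\tel\tel\tDET\t_\t_\t1\tdet\t_\t_\n"
    "\n"
)

sentence_count, word_count = count_conllu(sample)
print(sentence_count, word_count)
```

To apply this to a file from a cloned repository, read its contents as UTF-8 text and pass the string to the function. The counts on this page may be computed differently.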
UDMorph Release
The release count includes all corpora that meet every standard of a full UDMorph dataset, and it always reflects the most recent periodic release.
No releases have yet been made