Universal Segmentations 1.5 (UniSegments 1.5)

Please use the following text to cite this item:
John, Vojtěch; et al., 2026, Universal Segmentations 1.5 (UniSegments 1.5), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6130.
Date issued
2026-03-25
Size
5041832 entries
Description
Universal Segmentations (UniSegments) is a collection of lexical resources that captures morphological segmentations, harmonized into a cross-linguistically consistent annotation scheme. The file format consists of simple tab-separated columns, where each entry represents a word and its morphological segmentations. Additionally, the entries include information such as part-of-speech categories and morph types. The second publicly available version of this collection, UniSegments v1.5, includes 62 harmonized segmentation datasets covering 46 languages from various language families.
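Since the description mentions only "simple tab-separated columns", reading such a file could be sketched as below. This is a minimal sketch, not the official loader: the column order (word form, lemma, part of speech, segmentation), the " + " morph separator, and the sample line are all assumptions made for illustration, so check the documentation shipped in the archive before relying on them.

```python
import csv
import io

# Hypothetical sample line. The column order (word, lemma, POS, segmentation)
# and the " + " morph separator are assumptions, not taken from this page.
SAMPLE = "unhappiness\tunhappiness\tNOUN\tun + happi + ness\n"


def read_entries(handle, morph_sep=" + "):
    """Yield one dict per tab-separated line with at least four columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            # Skip blank or incomplete lines rather than guessing their meaning.
            continue
        word, lemma, pos, segmentation = row[:4]
        yield {
            "word": word,
            "lemma": lemma,
            "pos": pos,
            "morphs": segmentation.split(morph_sep),
        }


entries = list(read_entries(io.StringIO(SAMPLE)))
```

For a real file, the same function can be given an open text handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points to one of the extracted dataset files.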
This item is Publicly Available
and licensed under:
 Files in this item
Name
UniSegments-1.5-public.tar.gz
Size
70.52 MB
Format
application/x-gzip
Description
MD5
331b1d65ea95ae76d847dae71694e486
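The MD5 value listed above can be used to check that the download arrived intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the archive is assumed to have been saved locally under the file name shown on this page.

```python
import hashlib

# MD5 checksum as listed on this page for UniSegments-1.5-public.tar.gz.
EXPECTED_MD5 = "331b1d65ea95ae76d847dae71694e486"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("UniSegments-1.5-public.tar.gz")` returns `True` only if the local file has the listed checksum. MD5 is enough to detect a corrupted download, but it does not protect against deliberate tampering.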