Universal Segmentations 1.5 (UniSegments 1.5)
Please use the following text to cite this item:
John, Vojtěch; et al., 2026, Universal Segmentations 1.5 (UniSegments 1.5), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6130.
Authors
John, Vojtěch; et al.
Item identifier
http://hdl.handle.net/11234/1-6130
Date issued
2026-03-25
Size
5041832 entries
Language(s)
Pedi,
Description
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations, harmonized into a cross-linguistically consistent annotation scheme. The file format consists of simple tab-separated columns: each entry represents a word and its morphological segmentation. Entries also include information such as part-of-speech categories and morph types.
The second publicly available version of this collection, UniSegments v1.5, includes 62 harmonized segmentation datasets covering 46 languages from various language families.
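Because each dataset is a tab-separated text file, it can be read with standard tooling. The sketch below is only an illustration, not the project's own loader: the function name `read_useg_rows` is hypothetical, the sample rows are made-up placeholders rather than real entries, and the number and meaning of the columns are deliberately left uninterpreted because this page does not spell them out.

```python
import csv
import io


def read_useg_rows(stream):
    """Yield each non-empty line of a tab-separated stream as a list of strings.

    Quoting is disabled so that quote characters inside a field are kept
    verbatim instead of being interpreted by the csv module.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # csv.reader returns an empty list for a blank line
            yield row


# Placeholder data only; real .useg rows will look different.
sample = io.StringIO("field_a1\tfield_a2\tfield_a3\n\nfield_b1\tfield_b2\tfield_b3\n")
for columns in read_useg_rows(sample):
    print(len(columns), columns)
```

For a file on disk, the stream would be opened with `open(path, encoding="utf-8", newline="")`, where UTF-8 is an assumption about the encoding. The column layout should be checked against the documentation shipped inside the archive before any field is given a name.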
Acknowledgement
Charles University
Project code: 24/SSH/009
Project name: Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives
The Charles University Grant Agency
Project code: GAUK 101924
Project name: Modeling Morpheme Flow
Czech Science Foundation
Project code: 26-21822S
Project name: Complexity of inflection and word-formation: An intra- and cross-linguistic perspective
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
Files in this item
- Name: UniSegments-1.5-public.tar.gz
- Size: 70.52 MB
- Format: application/x-gzip
- MD5: 331b1d65ea95ae76d847dae71694e486
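The MD5 value listed for the archive can be used to check a download before unpacking it. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_and_extract` are hypothetical, as are any local paths passed to them. Only the checksum itself is taken from this page.

```python
import hashlib
import tarfile

# Checksum as listed for UniSegments-1.5-public.tar.gz on this page.
EXPECTED_MD5 = "331b1d65ea95ae76d847dae71694e486"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive_path, target_dir, expected_md5=EXPECTED_MD5):
    """Extract a gzip-compressed tar archive only if its MD5 digest matches."""
    actual = md5_of_file(archive_path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)
```

A call might look like `verify_and_extract("UniSegments-1.5-public.tar.gz", "unisegments")`, with both paths chosen by the user. MD5 guards against corrupted downloads, not against deliberate tampering.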

- universal-segmentations-1.5
  - data
    - tso-Sadilar
      - README.txt (391 B)
      - tso-Sadilar.useg (1 MB)
      - LICENSE.txt (146 B)
    - swe-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - swe-Morphynet.useg (21 MB)
    - eng-MorphoLex
      - README.txt (790 B)
      - LICENSE.txt (158 B)
      - eng-MorphoLex.useg (16 MB)
    - fas-PerSegLex
      - README.txt (441 B)
      - LICENSE.txt (158 B)
      - fas-PerSegLex.useg (9 MB)
    - kan-kcis
      - README.txt (545 B)
      - kan-kcis.useg (10 MB)
      - LICENSE.txt (152 B)
    - tel-Metamorphosis
      - LICENSE.txt (152 B)
      - tel-Metamorphosis.useg (45 kB)
    - ita-Metamorphosis
      - LICENSE.txt (152 B)
      - ita-Metamorphosis.useg (43 kB)
    - spa-Metamorphosis
      - LICENSE.txt (152 B)
      - spa-Metamorphosis.useg (43 kB)
    - hbs-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - hbs-Morphynet.useg (1 MB)
    - myv-Uniparser
      - myv-Uniparser.useg (50 MB)
      - README.txt (590 B)
      - LICENSE.txt (119 B)
    - rus-Metamorphosis
      - LICENSE.txt (152 B)
      - rus-Metamorphosis.useg (51 kB)
    - hun-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - hun-Morphynet.useg (6 MB)
    - mal-kcis
      - README.txt (545 B)
      - LICENSE.txt (152 B)
      - mal-kcis.useg (18 MB)
    - hye-Uniparser
      - README.txt (590 B)
      - LICENSE.txt (119 B)
      - hye-Uniparser.useg (189 MB)
    - mhr-Uniparser
      - mhr-Uniparser.useg (77 MB)
      - README.txt (590 B)
      - LICENSE.txt (119 B)
    - hrv-Metamorphosis
      - LICENSE.txt (152 B)
      - hrv-Metamorphosis.useg (46 kB)
    - eng-Morphynet
      - README.txt (1 kB)
      - eng-Morphynet.useg (92 MB)
      - LICENSE.txt (152 B)
    - ces-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - ces-Morphynet.useg (82 MB)
    - uig-thuuymorph
      - README.txt (2 kB)
      - LICENSE.txt (117 B)
      - uig-thuuymorph.useg (3 MB)
    - ces-SlavickovaDict
      - ces-SlavickovaDict.useg (5 MB)
      - README.txt (589 B)
      - LICENSE.txt (158 B)
    - fra-MorphoLex
      - fra-MorphoLex.useg (2 MB)
      - README.txt (689 B)
      - LICENSE.txt (158 B)
    - mar-kcis
      - README.txt (545 B)
      - LICENSE.txt (152 B)
      - mar-kcis.useg (13 MB)
    - por-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - por-Morphynet.useg (2 MB)
    - lat-WFL
      - lat-WFL.useg (10 MB)
      - README.txt (429 B)
      - LICENSE.txt (158 B)
    - ukr-Metamorphosis
      - ukr-Metamorphosis.useg (49 kB)
      - LICENSE.txt (152 B)
    - fin-Morphynet
      - README.txt (1 kB)
      - fin-Morphynet.useg (186 MB)
      - LICENSE.txt (152 B)
    - ben-kcis
      - ben-kcis.useg (951 kB)
      - README.txt (545 B)
      - LICENSE.txt (152 B)
    - nso-Sadilar
      - nso-Sadilar.useg (1 MB)
      - README.txt (391 B)
      - LICENSE.txt (146 B)
    - rus-Morphynet
      - rus-Morphynet.useg (24 MB)
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
    - xho-Sadilar
      - xho-Sadilar.useg (4 MB)
      - README.txt (391 B)
      - LICENSE.txt (146 B)
    - mon-Morphynet
      - mon-Morphynet.useg (2 MB)
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
    - fra-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - fra-Morphynet.useg (29 MB)
    - udm-Uniparser
      - udm-Uniparser.useg (138 MB)
      - README.txt (590 B)
      - LICENSE.txt (119 B)
    - spa-Morphynet
      - spa-Morphynet.useg (49 MB)
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
    - ven-Sadilar
      - README.txt (391 B)
      - LICENSE.txt (146 B)
      - ven-Sadilar.useg (1017 kB)
    - cat-Morphynet
      - README.txt (1 kB)
      - cat-Morphynet.useg (4 MB)
      - LICENSE.txt (152 B)
    - tgk-Uniparser
      - tgk-Uniparser.useg (76 MB)
      - README.txt (590 B)
      - LICENSE.txt (119 B)
    - zho-Metamorphosis
      - zho-Metamorphosis.useg (34 kB)
      - LICENSE.txt (152 B)
    - tsn-Sadilar
      - README.txt (391 B)
      - tsn-Sadilar.useg (1 MB)
      - LICENSE.txt (146 B)
    - pol-Metamorphosis
      - pol-Metamorphosis.useg (48 kB)
      - LICENSE.txt (152 B)
    - nbl-Sadilar
      - README.txt (391 B)
      - LICENSE.txt (146 B)
      - nbl-Sadilar.useg (4 MB)
    - ita-DerIvaTario
      - README.txt (1 kB)
      - ita-DerIvaTario.useg (4 MB)
      - LICENSE.txt (152 B)
    - epo-Metamorphosis
      - epo-Metamorphosis.useg (42 kB)
      - LICENSE.txt (152 B)
    - ita-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - ita-Morphynet.useg (17 MB)
    - pol-Morphynet
      - README.txt (1 kB)
      - pol-Morphynet.useg (12 MB)
      - LICENSE.txt (152 B)
    - fra-demonette
      - fra-demonette.useg (16 MB)
      - README.txt (846 B)
      - LICENSE.txt (158 B)
    - zul-Sadilar
      - zul-Sadilar.useg (4 MB)
      - README.txt (391 B)
      - LICENSE.txt (146 B)
    - ell-GreekAnnotatedDictionary
      - README.txt (805 B)
      - LICENSE.txt (152 B)
      - ell-GreekAnnotatedDictionary.useg (2 MB)
    - bel-Slounik
      - README.txt (1 kB)
      - bel-Slounik.useg (10 MB)
      - LICENSE.txt (492 B)
    - hrv-CroDeriV
      - README.txt (459 B)
      - hrv-CroDeriV.useg (3 MB)
      - LICENSE.txt (152 B)
    - deu-Morphynet
      - README.txt (1 kB)
      - LICENSE.txt (152 B)
      - deu-Morphynet.useg (55 MB)
    - fra-Metamorphosis
      - fra-Metamorphosis.useg (42 kB)
      - LICENSE.txt (152 B)
    - lat-Metamorphosis
      - lat-Metamorphosis.useg (46 kB)
      - LICENSE.txt (152 B)
    - mdf-Uniparser
      - README.txt (590 B)
      - LICENSE.txt (119 B)
      - mdf-Uniparser.useg (31 MB)
    - eng-Metamorphosis
      - eng-Metamorphosis.useg (38 kB)
      - LICENSE.txt (152 B)
    - hin-kcis
      - README.txt (545 B)
      - LICENSE.txt (152 B)
      - hin-kcis.useg (2 MB)
    - kpv-Uniparser
      - kpv-Uniparser.useg (66 MB)
      - README.txt (590 B)
      - LICENSE.txt (119 B)
    - sot-Sadilar
      - README.txt (391 B)
      - sot-Sadilar.useg (1 MB)
      - LICENSE.txt (146 B)
    - ssw-Sadilar
      - ssw-Sadilar.useg (4 MB)
      - README.txt (391 B)
      - LICENSE.txt (146 B)
    - ell-Metamorphosis
      - LICENSE.txt (152 B)
      - ell-Metamorphosis.useg (51 kB)
    - ces-Metamorphosis
      - ces-Metamorphosis.useg (49 kB)
      - LICENSE.txt (152 B)
    - deu-Metamorphosis
      - deu-Metamorphosis.useg (44 kB)
      - LICENSE.txt (152 B)
  - doc
    - README.md (7 kB)
  - LICENSE.txt (419 B)
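The listing suggests that every dataset lives in its own directory under `data`, named with a three-letter prefix, a hyphen, and a resource name, and that the directory holds one `.useg` file. Under that assumption, the datasets in an unpacked archive could be enumerated as sketched below. The helper names `split_dataset_name` and `find_useg_files` are hypothetical, and reading the prefix as a language code is an inference from the names, not something this page states.

```python
from pathlib import Path


def split_dataset_name(name):
    """Split a directory name such as 'ces-SlavickovaDict' at its first hyphen.

    Returns (prefix, resource). Treating the prefix as a language code is an
    assumption based on the directory names in the listing.
    """
    prefix, _, resource = name.partition("-")
    return prefix, resource


def find_useg_files(root):
    """Map each dataset directory name under <root>/data to its .useg file.

    If a directory held several .useg files, the alphabetically last one would
    win; the listing shows one per directory.
    """
    data_dir = Path(root) / "data"
    found = {}
    for path in sorted(data_dir.glob("*/*.useg")):
        found[path.parent.name] = path
    return found


def group_by_prefix(dataset_names):
    """Group dataset names by their prefix, e.g. all 'fra-*' resources together."""
    groups = {}
    for name in dataset_names:
        prefix, resource = split_dataset_name(name)
        groups.setdefault(prefix, []).append(resource)
    return groups
```

Here `root` would be the unpacked `universal-segmentations-1.5` directory. Grouping by prefix is one way to see which languages are covered by more than one resource.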

