Žabokrtský, Zdeněk; Bafna, Nyati; Bodnár, Jan; Kyjánek, Lukáš; Svoboda, Emil; Ševčíková, Magda; Vidra, Jonáš; Angle, Sachi; Ansari, Ebrahim; Arkhangelskiy, Timofey; Batsuren, Khuyagbaatar; Bella, Gábor; Bertinetto, Pier Marco; Bonami, Olivier; Celata, Chiara; Daniel, Michael; Fedorenko, Alexei; Filko, Matea; Giunchiglia, Fausto; Haghdoost, Hamid; Hathout, Nabil; Khomchenkova, Irina; Khurshudyan, Victoria; Levonian, Dmitri; Litta, Eleonora; Medvedeva, Maria; Muralikrishna, S. N.; Namer, Fiammetta; Nikravesh, Mahshid; Padó, Sebastian; Passarotti, Marco; Plungian, Vladimir; Polyakov, Alexey; Potapov, Mihail; Pruthwik, Mishra; Rao B, Ashwath; Rubakov, Sergei; Samar, Husain; Sharma, Dipti Misra; Šnajder, Jan; Šojat, Krešimir; Štefanec, Vanja; Talamo, Luigi; Tribout, Delphine; Vodolazsky, Daniil; Vydrin, Arseniy; Zakirova, Aigul; Zeller, Britta
2022-01-24T15:25:57Z 2022-01-24T15:25:57Z 2022-01-17
dc.description Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, together with information about the word and the segmented units, e.g., part-of-speech categories and types of morphs/morphemes. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
dc.language.iso ces
dc.language.iso cat
dc.language.iso deu
dc.language.iso eng
dc.language.iso fas
dc.language.iso fin
dc.language.iso fra
dc.language.iso hbs
dc.language.iso hrv
dc.language.iso hun
dc.language.iso ita
dc.language.iso kpv
dc.language.iso lat
dc.language.iso mdf
dc.language.iso chm
dc.language.iso mon
dc.language.iso myv
dc.language.iso pol
dc.language.iso por
dc.language.iso rus
dc.language.iso spa
dc.language.iso swe
dc.language.iso tgk
dc.language.iso udm
dc.language.iso hye
dc.language.iso ben
dc.language.iso hin
dc.language.iso mal
dc.language.iso mar
dc.language.iso kan
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Universal Segmentations 1.0 License Terms
dc.subject universal segmentations
dc.subject morphological segmentation
dc.subject word segmentation
dc.subject segmentation
dc.subject morphology
dc.subject morphemes
dc.subject morphological dictionary
dc.subject unisegments
dc.subject morph
dc.subject multilingual
dc.title Universal Segmentations 1.0 (UniSegments 1.0)
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#ContentInfo.detailedType lexicon
dc.rights.label PUB
has.files yes
contact.person Jonáš Vidra Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
contact.person Zdeněk Žabokrtský Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor Grantová agentura České Republiky 19-14534S Popis slovotvorné struktury českých slov na základě jazykových dat nationalFunds
sponsor Charles University START/HUM/010 A data-based approach to competition in word-formation: selected semantic categories across seven languages nationalFunds
sponsor Univerzita Karlova (mimo GAUK) SVV 260 453 Specifický vysokoškolský výzkum nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2018101 LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy nationalFunds 38 files
files.size 136889577
files.count 1
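
The dc.description field above characterises the annotation scheme as simple tab-separated columns holding a word, its segmentations, and further information. A minimal Python sketch of reading such a file follows. It assumes only that each line is one record and that columns are separated by tabs; the number and meaning of the columns are not stated in this record, so the sketch returns them as plain strings without interpreting them. The function name `read_useg_rows` and the sample content are hypothetical and invented for illustration.

```python
import io
from typing import Iterable, Iterator, List


def read_useg_rows(stream: Iterable[str]) -> Iterator[List[str]]:
    """Yield each non-empty line of a tab-separated stream as a list of columns.

    Assumption: one record per line, columns separated by tab characters.
    No meaning is assigned to individual columns here.
    """
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        yield line.split("\t")


# Hypothetical sample, NOT taken from the UniSegments data.
sample = io.StringIO("word_a\tcol2\tcol3\nword_b\tcol2\tcol3\n")
rows = list(read_useg_rows(sample))
```

For a real file, the same function could be called on a handle opened with `open(path, encoding="utf-8")`; the column layout should be checked against the README.md shipped in the `doc` directory before interpreting any field.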

 Files in this item

This item is Publicly Available and licensed under: Universal Segmentations 1.0 License Terms
The MIT License Distributed under Creative Commons
130.55 MB
  • UniSegments-1.0-public
    • data
      • por-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-por-MorphyNet.useg (75 MB)
      • fin-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-fin-MorphyNet.useg (67 MB)
      • fra-Demonette
        • README.TXT (399 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-fra-Demonette.useg (16 MB)
      • eng-MorphoLex
        • README.TXT (1004 B)
        • UniSegments-1.0-eng-MorphoLex.useg (17 MB)
        • LICENSE.TXT (15 kB)
      • rus-DerivBaseRU
        • README.TXT (372 B)
        • LICENSE.TXT (10 kB)
        • UniSegments-1.0-rus-DerivBaseRU.useg (33 MB)
      • fas-PerSegLex
        • README.TXT (498 B)
        • UniSegments-1.0-fas-PerSegLex.useg (8 MB)
        • LICENSE.TXT (15 kB)
      • rus-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-rus-MorphyNet.useg (119 MB)
      • fra-Echantinom
        • README.TXT (203 B)
        • LICENSE.TXT (13 kB)
        • UniSegments-1.0-fra-Echantinom.useg (1 MB)
      • myv-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-myv-Uniparser.useg (52 MB)
      • mon-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-mon-MorphyNet.useg (5 MB)
      • fra-MorphyNet
        • README.TXT (633 B)
        • UniSegments-1.0-fra-MorphyNet.useg (61 MB)
        • LICENSE.TXT (19 kB)
      • hin-KCIS
        • README.TXT (1 kB)
        • LICENSE.TXT (13 kB)
        • UniSegments-1.0-hin-KCIS.useg (2 MB)
      • hye-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-hye-Uniparser.useg (189 MB)
      • spa-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-spa-MorphyNet.useg (90 MB)
      • mhr-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-mhr-Uniparser.useg (83 MB)
      • cat-MorphyNet
        • README.TXT (633 B)
        • UniSegments-1.0-cat-MorphyNet.useg (86 MB)
        • LICENSE.TXT (19 kB)
      • fra-MorphoLex
        • README.TXT (1004 B)
        • LICENSE.TXT (15 kB)
        • UniSegments-1.0-fra-MorphoLex.useg (2 MB)
      • ita-MorphyNet
        • README.TXT (633 B)
        • UniSegments-1.0-ita-MorphyNet.useg (100 MB)
        • LICENSE.TXT (19 kB)
      • pol-MorphyNet
        • README.TXT (633 B)
        • UniSegments-1.0-pol-MorphyNet.useg (85 MB)
        • LICENSE.TXT (19 kB)
      • kan-KCIS
        • README.TXT (1 kB)
        • LICENSE.TXT (13 kB)
        • UniSegments-1.0-kan-KCIS.useg (10 MB)
      • udm-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-udm-Uniparser.useg (134 MB)
      • deu-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-deu-MorphyNet.useg (4 MB)
      • tgk-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-tgk-Uniparser.useg (78 MB)
      • mal-KCIS
        • README.TXT (1 kB)
        • UniSegments-1.0-mal-KCIS.useg (18 MB)
        • LICENSE.TXT (13 kB)
      • ita-DerIvaTario
        • README.TXT (438 B)
        • LICENSE.TXT (14 kB)
        • UniSegments-1.0-ita-DerIvaTario.useg (4 MB)
      • mar-KCIS
        • README.TXT (1 kB)
        • LICENSE.TXT (13 kB)
        • UniSegments-1.0-mar-KCIS.useg (13 MB)
      • lat-WordFormationLatin
        • README.TXT (336 B)
        • LICENSE.TXT (15 kB)
        • UniSegments-1.0-lat-WordFormationLatin.useg (8 MB)
      • ben-KCIS
        • README.TXT (1 kB)
        • LICENSE.TXT (13 kB)
        • UniSegments-1.0-ben-KCIS.useg (950 kB)
      • ces-DeriNet
        • README.TXT (643 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-ces-DeriNet.useg (291 MB)
      • swe-MorphyNet
        • README.TXT (633 B)
        • UniSegments-1.0-swe-MorphyNet.useg (73 MB)
        • LICENSE.TXT (19 kB)
      • hrv-CroDeriV
        • README.TXT (506 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-hrv-CroDeriV.useg (3 MB)
      • mdf-Uniparser
        • UniSegments-1.0-mdf-Uniparser.useg (32 MB)
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
      • hbs-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-hbs-MorphyNet.useg (5 MB)
      • kpv-Uniparser
        • README.TXT (522 B)
        • LICENSE.TXT (1 kB)
        • UniSegments-1.0-kpv-Uniparser.useg (63 MB)
      • hun-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-hun-MorphyNet.useg (72 MB)
      • deu-DerivBaseDE
        • README.TXT (496 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-deu-DerivBaseDE.useg (10 MB)
      • ces-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-ces-MorphyNet.useg (11 MB)
      • eng-MorphyNet
        • README.TXT (633 B)
        • LICENSE.TXT (19 kB)
        • UniSegments-1.0-eng-MorphyNet.useg (49 MB)
    • doc
      • LICENSE (2 kB)
      • README.md (3 kB)
      • Towards-Universal-Segmentations-Survey-of-Existing-Morphosegmentation-Resources.pdf (441 kB)
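
The data files listed above appear to follow the naming pattern `UniSegments-<version>-<language code>-<resource>.useg`, with a three-letter language code. The following Python sketch groups file names by language under that assumption. The pattern is inferred from the listing only, and the helper `parse_useg_name` is hypothetical rather than part of any UniSegments tooling.

```python
import re
from collections import defaultdict
from typing import Optional, Tuple

# Pattern inferred from the file listing; treat it as an assumption.
USEG_NAME = re.compile(
    r"^UniSegments-(?P<version>[\d.]+)-(?P<lang>[a-z]{3})-(?P<resource>.+)\.useg$"
)


def parse_useg_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (version, language code, resource) or None if the name does not match."""
    match = USEG_NAME.match(name)
    if match is None:
        return None
    return match.group("version"), match.group("lang"), match.group("resource")


# A few names copied from the listing, plus one non-data file.
names = [
    "UniSegments-1.0-ces-DeriNet.useg",
    "UniSegments-1.0-ces-MorphyNet.useg",
    "UniSegments-1.0-fra-Demonette.useg",
    "README.TXT",
]

by_lang = defaultdict(list)
for name in names:
    parsed = parse_useg_name(name)
    if parsed is not None:
        _version, lang, resource = parsed
        by_lang[lang].append(resource)
```

Grouping by language code makes it easy to see which languages are covered by more than one source resource, as is the case for `ces` in this sample.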
