dc.contributor.author | Müller, Thomas |
dc.contributor.author | Schütze, Hinrich |
dc.date.accessioned | 2015-06-08T09:25:01Z |
dc.date.available | 2015-06-08T09:25:01Z |
dc.date.issued | 2015-03-01 |
dc.identifier.uri | http://hdl.handle.net/11234/LRT-1483 |
dc.description | Dictionaries with different representations for various languages. Representations include Brown clusters of different sizes and morphological dictionaries extracted using different morphological analyzers. All representations cover the 250,000 most frequent word types in the Wikipedia of the respective language. Analyzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), an MA from Charles University (Czech, Hajic (2001)) and LATMOR (Latin, Springmann et al. (2014)). |
dc.language.iso | eng |
dc.language.iso | deu |
dc.language.iso | lat |
dc.language.iso | hun |
dc.language.iso | spa |
dc.language.iso | ces |
dc.publisher | Center for Information and Language Processing, University of Munich |
dc.rights | Creative Commons - Attribution 3.0 Unported (CC BY 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/ |
dc.source.uri | http://cistern.cis.lmu.de/marmot/naacl2015/ |
dc.subject | morphological dictionary |
dc.subject | morphological analysis |
dc.subject | PoS tagging |
dc.title | Word representations for multiple languages |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LRT + Open Submissions |
contact.person | Thomas Müller muellets@cis.lmu.de Center for Information and Language Processing, University of Munich |
sponsor | Deutsche Forschungsgemeinschaft DFG 2246/2 Wordgraph nationalFunds |
sponsor | Google Europe NLP / 2012 Google Europe Doctoral Fellowship Other |
size.info | 250000 words |
files.size | 318632335 |
files.count | 36 |
Files in this item
Name | Size | Format | Description | MD5 |
cs_marlin_cluster_200 | 3.18 MB | Unknown | Unknown | 6bb83d779be68d4777baf844131ffd32 |
cs_marlin_cluster_500 | 3.26 MB | Unknown | Unknown | 8ba54e2f8a2eae938b69b63cfc0f87e4 |
cs_marlin_cluster_100 | 3.03 MB | Unknown | Unknown | 94665a3018a70ebadb3887471f8978a3 |
cs_marlin_cluster_1000 | 3.28 MB | Unknown | Unknown | 35514bf59d33a484e29ee8b8a2f52fcd |
cs_brown_1000 | 5.26 MB | Unknown | Unknown | 363414e183383ea518f03720d2fac1a3 |
de_marlin_cluster_100 | 3.34 MB | Unknown | Unknown | 86c65b7de7f5d2da8d823e546919f5f7 |
de_marlin_cluster_1000 | 3.58 MB | Unknown | Unknown | 4ef8eca2846a0cce6bb0e87d5d6f2978 |
de_marlin_cluster_500 | 3.57 MB | Unknown | Unknown | e80564f0d4c230fb1c8ab4955b7245d4 |
de_marlin_cluster_200 | 3.52 MB | Unknown | Unknown | 7dd15f770cc46fa7fcda2221b60f42a9 |
de_brown_1000 | 5.31 MB | Unknown | Unknown | 2fe8836db5a61149b712f15080432fd5 |
en_marlin_cluster_200 | 2.94 MB | Unknown | Unknown | e439323d02b7f028aefa43b241c81435 |
en_marlin_cluster_100 | 2.74 MB | Unknown | Unknown | a0caf5c9ede8e70fda33e8aa764e8d43 |
en_marlin_cluster_500 | 2.97 MB | Unknown | Unknown | 76dfbc92ff3b8efaa7c3ffd26862049f |
en_marlin_cluster_1000 | 2.98 MB | Unknown | Unknown | aeec947ac699bec6fa5ca6fc18751ceb |
en_brown_1000 | 5 MB | Unknown | Unknown | 7dd120107a0694b7eb4a4cf6cf14998e |
es_marlin_cluster_100 | 3.39 MB | Unknown | Unknown | ab6b37a6e13a57c728ba8b4a9156a6ab |
es_marlin_cluster_200 | 3.54 MB | Unknown | Unknown | 18c7bb59acbc0131f3748183f6d6605d |
es_marlin_cluster_500 | 3.61 MB | Unknown | Unknown | 499c83b38d03bfd54ea50485cb91cc29 |
es_marlin_cluster_1000 | 3.63 MB | Unknown | Unknown | 60ac3e3cb7912cc07c8ef92a83b56d95 |
es_brown_1000 | 5.41 MB | Unknown | Unknown | 67436df6c1006d19d26df9ac222d9246 |
en_mdict | 7.57 MB | Unknown | Unknown | adfc204da2c34ab1841a0c19a09d9291 |
hu_marlin_cluster_100 | 3.35 MB | Unknown | Unknown | 3241e381224c2706447571718d39808b |
hu_marlin_cluster_1000 | 3.6 MB | Unknown | Unknown | 482ab4996f6f7448dcd78d848f298b5f |
hu_brown_1000 | 5.71 MB | Unknown | Unknown | 29cbd50c751bbee450b64052b38f979d |
hu_marlin_cluster_200 | 3.5 MB | Unknown | Unknown | b175b85372f3fda3f2cba4c62d8bbe93 |
hu_marlin_cluster_500 | 3.58 MB | Unknown | Unknown | bd5cf76d1873ac239e618ad4204edcfa |
la_marlin_cluster_200 | 3.23 MB | Unknown | Unknown | 7abee31c8acf9faf35485901437bc958 |
la_marlin_cluster_100 | 3.09 MB | Unknown | Unknown | 409ad9985953c98dfeab51048cbf7c66 |
la_marlin_cluster_500 | 3.3 MB | Unknown | Unknown | 95d9d2d22ba7d9e8a9c51343155d2e36 |
la_brown_1000 | 5.35 MB | Unknown | Unknown | f8cb043ee009cc900c392e2ca8a5141b |
la_marlin_cluster_1000 | 3.34 MB | Unknown | Unknown | d1b3f7edcb4a533f6f2d1fc040b64b98 |
hu_mdict | 17.37 MB | Unknown | Unknown | 6e375370757672dd00fc9e89da88d5bf |
la_mdict | 20.39 MB | Unknown | Unknown | bbf91414242ddb5522eb2d811661d3b2 |
es_mdict | 31.3 MB | Unknown | Unknown | d6e55d58af424141508db15afce9db3d |
de_mdict | 46.61 MB | Unknown | Unknown | 8472805a6cd8effbb1fd98c06708b75b |
cs_mdict | 69.05 MB | Unknown | Unknown | 3fe875bb5fcbfa03fb09ae46c2d20dbc |
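
The MD5 values listed for each file can be used to check a download for corruption. The following is a minimal sketch, not part of the original record: the function names `md5_of_file` and `verify_download` are hypothetical, the `EXPECTED_MD5` table holds only two entries copied from the listing as an illustration, and it assumes the files are saved locally under the names shown in the listing.

```python
import hashlib

# Illustrative subset of the checksums in the file listing (assumption:
# local file names match the "Name" values of the listing exactly).
EXPECTED_MD5 = {
    "en_mdict": "adfc204da2c34ab1841a0c19a09d9291",
    "cs_mdict": "3fe875bb5fcbfa03fb09ae46c2d20dbc",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large dictionaries need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("en_mdict", EXPECTED_MD5["en_mdict"])` would return `True` for an intact copy of that file. MD5 is adequate for detecting transfer errors, but it is not a safeguard against deliberate tampering.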