dc.contributor.author Straka, Milan
dc.contributor.author Straková, Jana
dc.date.accessioned 2014-04-03T15:58:02Z
dc.date.available 2014-04-03T15:58:02Z
dc.date.issued 2014-03-04
dc.identifier.uri http://hdl.handle.net/11858/00-097C-0000-0023-68D9-0
dc.description English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and SCOWL (Spell Checker Oriented Word Lists); the PoS tagger is trained on WSJ (Wall Street Journal).
dc.description.sponsorship This work has used language resources developed, stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013). Development of the morphological POS analyzer was supported by grant No. LC536 "Center for Computational Linguistics" of the Ministry of Education, Youth and Sports of the Czech Republic. The research on the morphological POS analyzer was performed by Johanka Spoustová (Spoustová 2008; the Treex::Tool::EnglishMorpho::Analysis Perl module). The lemmatizer was implemented by Martin Popel (Popel 2009; the Treex::Tool::EnglishMorpho::Lemmatizer Perl module). The lemmatizer is based on morpha, which was released under the LGPL licence as part of the RASP system (http://ilexir.co.uk/applications/rasp). Research on the tagger algorithm and feature set was supported by projects MSM0021620838 and LC536 of the Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic, and 1ET101120503 of the Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source.uri http://ufal.mff.cuni.cz/morphodita/users-manual#english-morphium-wsj
dc.subject MorphoDiTa
dc.subject English
dc.subject morphological analysis
dc.subject morphological generation
dc.subject PoS tagging
dc.title English Models (Morphium + WSJ) for MorphoDiTa
dc.type languageDescription
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Straka
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Milan
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName Charles University in Prague, UFAL
metashare.ResourceInfo#DistributionInfo.availability unrestrictedUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse academic-nonCommercialUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse attribution
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse shareAlike
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#SizeInfo.size 14
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit mb
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email straka@ufal.mff.cuni.cz
metashare.ResourceInfo#ContentInfo.detailedType mlmodel
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2010013 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LC536 Center for Computational Linguistics nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic MSM 0021620838 Modern Methods, Structures and Systems of Computer Science nationalFunds
sponsor Grant Agency of the Czech Republic GA405/09/0278 Internet as a Language Corpus nationalFunds
sponsor Grant Agency of the Academy of Sciences of the Czech Republic 1ET101120503 Integration of Language Resources for Information Extraction from Natural Texts nationalFunds
size.info 14 mb
files.size 27517406
files.count 2
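The record states the total download size twice: `files.size` gives 27517406 bytes, while the download link shows 26.24 MB. The two agree if "MB" is read as binary megabytes (2^20 bytes); this is an inference from the arithmetic, not something the record states. A quick Python check under that assumption:

```python
# Assumption (not stated in the record): the repository's "MB" means 2**20 bytes.
BYTES_PER_MIB = 2 ** 20


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to binary megabytes (MiB, 2**20 bytes)."""
    return n_bytes / BYTES_PER_MIB


def to_mb(n_bytes: int) -> float:
    """Convert a byte count to decimal megabytes (10**6 bytes)."""
    return n_bytes / 10 ** 6


total_bytes = 27517406  # value of files.size in this record

print(f"{to_mib(total_bytes):.2f} MiB")           # binary reading, as on the download link
print(f"{to_mb(total_bytes):.2f} MB (decimal)")   # decimal reading, for comparison
```

Under the decimal reading the same byte count would be shown as about 27.52 MB, so the listed 26.24 MB points to binary units.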


Files in this item

All files in this item (26.24 MB)
License category:
Publicly Available

License: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Distributed under Creative Commons Attribution Required Noncommercial Share Alike
Name: english-morphium-wsj-140407.zip
Size: 13.12 MB
Format: application/zip
Description: English Models (Morphium + WSJ) for MorphoDiTa. A minor fix to english-morphium-wsj-140304 that recognizes "non-" as a negative prefix in addition to "non".
MD5: 525dd6e87fe6420b46681204111ac403
Contents:
  • english-morphium-wsj-140407
    • LICENSE (21 kB)
    • english-morphium-wsj-140407-no_negation.tagger (5 MB)
    • README (6 kB)
    • README.html (8 kB)
    • english-morphium-140407.dict (1 MB)
    • english-morphium-140407-no_negation.dict (1 MB)
    • english-morphium-wsj-140407.tagger (5 MB)
Name: english-morphium-wsj-140304.zip
Size: 13.12 MB
Format: application/zip
Description: English Models (Morphium + WSJ) for MorphoDiTa
MD5: 80c9a0a5f771d62d5a6f0b5f217532a7
Contents:
  • english-morphium-wsj-140304
    • english-morphium-wsj-140304-no_negation.tagger (5 MB)
    • LICENSE (21 kB)
    • english-morphium-140304.dict (1 MB)
    • english-morphium-140304-no_negation.dict (1 MB)
    • README (6 kB)
    • README.html (8 kB)
    • english-morphium-wsj-140304.tagger (5 MB)
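Both archives are listed with an MD5 checksum. The following is a minimal sketch, using only Python's standard `hashlib`, of how a downloaded archive could be checked against those values. The helper names `md5_of_file` and `verify_archive` are hypothetical, and the sketch assumes the downloaded file keeps its original name.

```python
import hashlib
import os
import sys

# MD5 checksums as listed in this record, keyed by archive file name.
EXPECTED_MD5 = {
    "english-morphium-wsj-140407.zip": "525dd6e87fe6420b46681204111ac403",
    "english-morphium-wsj-140304.zip": "80c9a0a5f771d62d5a6f0b5f217532a7",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str) -> bool:
    """Hypothetical helper: True if the file's MD5 matches the listed value.

    Raises KeyError if the file name is not one of the archives listed above.
    """
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    for archive in sys.argv[1:]:
        status = "OK" if verify_archive(archive) else "MISMATCH"
        print(f"{archive}: {status}")
```

For example, saving the sketch under a hypothetical name such as `check_md5.py` and running `python check_md5.py english-morphium-wsj-140407.zip` should print `OK` for an intact download. Reading in chunks keeps memory use flat regardless of archive size. An MD5 match guards against a corrupted or truncated download; it is not a defense against deliberate tampering.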
