Please use the following text to cite this item:
Straka, Milan and Straková, Jana, 2014, English Models (Morphium + WSJ) for MorphoDiTa, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0023-68D9-0.
dc.contributor.author: Straka, Milan
dc.contributor.author: Straková, Jana
dc.date.accessioned: 2014-04-03T15:58:02Z
dc.date.available: 2014-04-03T15:58:02Z
dc.date.issued: 2014-03-04
dc.description: English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and SCOWL (Spell Checker Oriented Word Lists), the PoS tagger is trained on WSJ (Wall Street Journal).
dc.description.sponsorship: This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013). The morphological POS analyzer development was supported by grant of the Ministry of Education, Youth and Sports of the Czech Republic No. LC536 "Center for Computational Linguistics". The morphological POS analyzer research was performed by Johanka Spoustová (Spoustová 2008 the Treex::Tool::EnglishMorpho::Analysis Perl module). The lemmatizer was implemented by Martin Popel (Popel 2009 the Treex::Tool::EnglishMorpho::Lemmatizer Perl module). The lemmatizer is based on morpha, which was released under LGPL licence as a part of RASP system (http://ilexir.co.uk/applications/rasp). The tagger algorithm and feature set research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic and 1ET101120503 of Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
dc.identifier.uri: http://hdl.handle.net/11858/00-097C-0000-0023-68D9-0
dc.language.iso: eng
dc.publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.source.uri: http://ufal.mff.cuni.cz/morphodita/users-manual#english-morphium-wsj
dc.subject: MorphoDiTa
dc.subject: English
dc.subject: morphological analysis
dc.subject: morphological generation
dc.subject: PoS tagging
dc.title: English Models (Morphium + WSJ) for MorphoDiTa
dc.type: languageDescription
local.branding: LINDAT / CLARIAH-CZ
local.files.count: 2
local.files.size: 13758599
local.has.files: yes
local.language.name: English
local.size.info: 14 mb
local.sponsor: nationalFunds LM2010013 Ministerstvo školství, mládeže a tělovýchovy České republiky LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat
local.sponsor: nationalFunds LC536 Ministerstvo školství, mládeže a tělovýchovy České republiky Centrum komputační lingvistiky
local.sponsor: nationalFunds MSM 0021620838 Ministerstvo školství, mládeže a tělovýchovy České republiky Moderní metody, struktury a systémy informatiky
local.sponsor: nationalFunds GA405/09/0278 Grantová agentura České republiky Internet jako jazykový korpus
local.sponsor: nationalFunds 1ET101120503 Grantová agentura Akademie věd České republiky Integrace jazykových zdrojů za účelem extrakce informací z přirozených textů
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email: straka@ufal.mff.cuni.cz
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName: Charles University in Prague, UFAL
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName: Milan
metashare.ResourceInfo#ContactInfo#PersonInfo.surname: Straka
metashare.ResourceInfo#ContentInfo.detailedType: mlmodel
metashare.ResourceInfo#ContentInfo.mediaType: text
metashare.ResourceInfo#ContentInfo.resourceType: languageDescription
metashare.ResourceInfo#DistributionInfo#LicenseInfo.license: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse: academic-nonCommercialUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse: attribution
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse: shareAlike
metashare.ResourceInfo#DistributionInfo.availability: unrestrictedUse
metashare.ResourceInfo#IdentificationInfo.resourceName: English Models (Morphium + WSJ) for MorphoDiTa
metashare.ResourceInfo#TextInfo#SizeInfo.size: 14
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit: mb
This item is Publicly Available and licensed under: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
 Files in this item
Name: english-morphium-wsj-140304.zip
Size: 13.12 MB
Format: application/zip
Description: English Models (Morphium + WSJ) for MorphoDiTa
MD5: 80c9a0a5f771d62d5a6f0b5f217532a7
File Preview:
  • english-morphium-wsj-140304
    • english-morphium-wsj-140304-no_negation.tagger (5 MB)
    • LICENSE (21 kB)
    • english-morphium-140304-no_negation.dict (1 MB)
    • english-morphium-140304.dict (1 MB)
    • README (6 kB)
    • README.html (8 kB)
    • english-morphium-wsj-140304.tagger (5 MB)
Name: english-morphium-wsj-140407.zip
Size: 13.12 MB
Format: application/zip
Description: English Models (Morphium + WSJ) for MorphoDiTa. Minor fix to english-morphium-wsj-140304 that allows "non-" to be recognized as a negative prefix in addition to "non".
MD5: 525dd6e87fe6420b46681204111ac403
File Preview:
  • english-morphium-wsj-140407
    • LICENSE (21 kB)
    • english-morphium-wsj-140407-no_negation.tagger (5 MB)
    • README (6 kB)
    • README.html (8 kB)
    • english-morphium-140407.dict (1 MB)
    • english-morphium-140407-no_negation.dict (1 MB)
    • english-morphium-wsj-140407.tagger (5 MB)
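
The MD5 values listed for the two archives above can be used to check a download before unpacking it. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `matches_listed_md5` and the `EXPECTED_MD5` table are illustrative assumptions, not part of MorphoDiTa or of the repository. It assumes the archive was saved under the file name shown in this record.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record for the two archives.
EXPECTED_MD5 = {
    "english-morphium-wsj-140304.zip": "80c9a0a5f771d62d5a6f0b5f217532a7",
    "english-morphium-wsj-140407.zip": "525dd6e87fe6420b46681204111ac403",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path):
    """Compare a downloaded archive with the checksum listed for its file name.

    Raises KeyError if the file name is not one of the two archives in this record.
    """
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_listed_md5("english-morphium-wsj-140407.zip")` returns `True` only if the local file's digest equals the value listed above. A mismatch usually indicates an incomplete or corrupted download. MD5 is suitable for this kind of integrity check but not for protection against deliberate tampering.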