MSTperl parser (2015-05-19)
Please use the following text to cite this item:
Rosa, Rudolf, 2015,
MSTperl parser (2015-05-19), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-1480.
Authors
Rosa, Rudolf
Date issued
2015-05-19
Description
MSTperl is a Perl reimplementation of the MST parser by Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html).
The MST parser (Maximum Spanning Tree parser) is a state-of-the-art dependency parser for natural language: a tool that takes a sentence and returns its dependency tree.
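In the maximum spanning tree formulation, every possible head -> dependent arc gets a score, and the parse is the highest-scoring tree rooted at an artificial root node; for non-projective trees this is usually solved with the Chu-Liu/Edmonds algorithm. The following is a minimal Python sketch of that algorithm, not MSTperl's Perl code; the function names and the dense `scores[h][d]` matrix layout are assumptions made for illustration.

```python
from typing import List, Optional


def _find_cycle(heads: List[int]) -> Optional[List[int]]:
    """Return the nodes of one cycle in the head assignment, or None if it is a tree."""
    state = [0] * len(heads)  # 0 = unvisited, 1 = on the current walk, 2 = finished
    for start in range(len(heads)):
        path = []
        v = start
        while v != -1 and state[v] == 0:
            state[v] = 1
            path.append(v)
            v = heads[v]
        if v != -1 and state[v] == 1:
            return path[path.index(v):]  # walked back into the current path
        for u in path:
            state[u] = 2
    return None


def chu_liu_edmonds(scores: List[List[float]]) -> List[int]:
    """Maximum spanning arborescence rooted at node 0.

    scores[h][d] is the score of arc h -> d (assumed finite for h != d).
    Returns heads, where heads[d] is the head of token d and heads[0] == -1.
    """
    n = len(scores)
    # 1. Greedily choose the best incoming arc for every non-root node.
    heads = [-1] * n
    for d in range(1, n):
        heads[d] = max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads  # the greedy choice is already a tree, hence optimal

    # 2. Contract the cycle into a single new node with index c.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # root 0 stays at index 0
    c = len(rest)
    new_scores = [[float("-inf")] * (c + 1) for _ in range(c + 1)]
    enter = {}  # outside node u -> cycle node whose cycle arc u would replace
    leave = {}  # outside node v -> cycle node that would become v's head
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            if i != j:
                new_scores[i][j] = scores[u][v]
        # Arc into the cycle: gain of replacing the cycle arc that enters best_v.
        best_v = max(cycle, key=lambda v: scores[u][v] - scores[heads[v]][v])
        new_scores[i][c] = scores[u][best_v] - scores[heads[best_v]][best_v]
        enter[u] = best_v
        # Arc out of the cycle: best cycle node to act as head of u.
        best_h = max(cycle, key=lambda h: scores[h][u])
        new_scores[c][i] = scores[best_h][u]
        leave[u] = best_h

    # 3. Solve the smaller problem recursively, then expand the contracted node.
    sub_heads = chu_liu_edmonds(new_scores)
    for j in range(1, c):
        v = rest[j]
        heads[v] = leave[v] if sub_heads[j] == c else rest[sub_heads[j]]
    u = rest[sub_heads[c]]
    heads[enter[u]] = u  # break the cycle where the outside arc enters it
    return heads
```

For example, with `scores = [[0, 5, 1], [0, 0, 10], [0, 8, 0]]` the greedy step picks the cycle 1 -> 2 -> 1, and contracting it yields heads `[-1, 0, 1]` (root -> 1 -> 2, total score 15). This naive recursive version runs in roughly cubic time in the sentence length; faster variants exist.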
MSTperl implements only part of the original functionality; its limitations include the following:
the parser is non-projective, and there is currently no way to enforce projectivity of the parse trees;
only first-order features are supported, i.e. second-order and third-order features are not available;
MIRA is implemented as single-best MIRA, with a closed-form update instead of quadratic programming.
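With only the single best parse, the MIRA constraint set has one element and the update has a closed form: τ = max(0, min(C, (L − (w·f(gold) − w·f(pred))) / ‖f(gold) − f(pred)‖²)), then w += τ · (f(gold) − f(pred)). The Python sketch below illustrates this together with first-order (single-arc) features. It assumes Hamming loss L (the number of words with a wrong head) and uses hypothetical feature templates in `arc_features`; MSTperl's actual feature templates are not reproduced here, and the optional cap `C` is likewise an assumption (with the default of infinity the step is the pure closed-form update).

```python
from collections import Counter
from typing import Dict, List


def arc_features(words: List[str], tags: List[str], h: int, d: int) -> List[str]:
    """Hypothetical first-order templates: each one looks at a single arc h -> d."""
    direction = "R" if h < d else "L"
    dist = min(abs(h - d), 5)  # bucketed distance
    return [
        f"hw={words[h]}|dw={words[d]}",
        f"ht={tags[h]}|dt={tags[d]}",
        f"ht={tags[h]}|dt={tags[d]}|{direction}{dist}",
    ]


def tree_features(words: List[str], tags: List[str], heads: List[int]) -> Counter:
    """Sum of arc features over all arcs of a tree (first-order factorisation)."""
    feats: Counter = Counter()
    for d in range(1, len(heads)):
        feats.update(arc_features(words, tags, heads[d], d))
    return feats


def score(weights: Dict[str, float], feats: Counter) -> float:
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def mira_update(weights, words, tags, gold, pred, C=float("inf")) -> float:
    """One single-best MIRA step with a closed-form step size; returns tau."""
    loss = sum(1 for d in range(1, len(gold)) if gold[d] != pred[d])  # wrong heads
    fg = tree_features(words, tags, gold)
    fp = tree_features(words, tags, pred)
    # Counter returns 0 for missing features, so this is f(gold) - f(pred).
    diff = {f: fg[f] - fp[f] for f in set(fg) | set(fp) if fg[f] != fp[f]}
    norm_sq = sum(v * v for v in diff.values())
    if loss == 0 or norm_sq == 0:
        return 0.0
    margin = score(weights, fg) - score(weights, fp)
    # Smallest weight change that makes gold outscore pred by at least `loss`.
    tau = max(0.0, min(C, (loss - margin) / norm_sq))
    for f, v in diff.items():
        weights[f] = weights.get(f, 0.0) + tau * v
    return tau
```

After an uncapped update with τ > 0, the gold tree outscores the predicted one by exactly the loss, which is the MIRA constraint for that single parse.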
On the other hand, the parser supports several advanced features:
parallel features, i.e. enriching the parser input with a word-aligned sentence in another language;
large-scale information, i.e. enriching the feature set with features based on the pointwise mutual information of word pairs in a large corpus (CzEng);
weighted/unweighted parser model interpolation;
combination of several instances of the MSTperl parser (through the MST algorithm);
combination of several existing parses from any parsers (through the MST algorithm).
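Both combination modes come down to building one arc-score matrix and decoding it with the maximum spanning tree algorithm. In the Python sketch below, the hypothetical helper `interpolate_scores` forms a weighted sum of per-model arc scores (unweighted interpolation is the special case of equal weights), and the hypothetical helper `vote_scores` turns existing parses, each given as a list of head indices, into weighted arc votes. Either matrix would then be passed to an MST decoder such as Chu-Liu/Edmonds; how MSTperl itself chooses or normalises the weights is not shown here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def interpolate_scores(models: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of arc-score matrices, where models[k][h][d] scores arc h -> d."""
    n = len(models[0])
    return [
        [sum(w * m[h][d] for m, w in zip(models, weights)) for d in range(n)]
        for h in range(n)
    ]


def vote_scores(parses: Sequence[Sequence[int]], weights: Sequence[float]) -> Matrix:
    """Each parse (heads[d] = head of token d, heads[0] = -1) votes for its arcs."""
    n = len(parses[0])
    votes = [[0.0] * n for _ in range(n)]
    for heads, w in zip(parses, weights):
        for d in range(1, n):
            votes[heads[d]][d] += w
    return votes
```

For instance, with parses `[-1, 2, 0, 2]`, `[-1, 2, 0, 2]` and `[-1, 0, 1, 2]` weighted 1.0, 1.0 and 1.5, the arc 2 -> 3 collects 3.5 votes, while for token 1 the arc 2 -> 1 (2.0) beats 0 -> 1 (1.5).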
The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself; the guidelines are part of the documentation.
The parser, together with detailed documentation, is available on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/).
Acknowledgement
European Union
Project code: FP7-ICT-2009-4-247762
Project name: Faust
Grant Agency of Charles University in Prague (Grantová agentura Univerzity Karlovy v Praze)
Project code: GAUK 116310/2010
Project name: English-Czech machine translation using deep syntax (Anglicko-český strojový překlad s využitím hloubkové syntaxe)
Czech Science Foundation (Grantová agentura České republiky)
Project code: GD201/09/H057
Project name: Res Informatica
Grant Agency of Charles University in Prague (Grantová agentura Univerzity Karlovy v Praze)
Project code: GAUK 15723/2014
Project name: Modelling dependency syntax across languages (Modelování závislostní syntaxe napříč jazyky)
European Union
Project code: FP7-ICT-2013-10-610516
Project name: Quality Translation by Deep Language Engineering Approaches (QTLeap)
Charles University in Prague, excluding GAUK (Univerzita Karlova v Praze, mimo GAUK)
Project code: SVV 260 224
Project name: Specific University Research (Specifický vysokoškolský výzkum)
Files in this item
- Name: Treex-Parser-MSTperl-150519.tar.gz
- Size: 81.32 KB
- Format: application/x-gzip
- MD5: c1547a7a762db0d713a22d5507ea06e5
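The downloaded archive can be checked against the MD5 sum listed above before unpacking. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are illustrative, and the usage assumes the file was saved under its listed name.

```python
import hashlib

EXPECTED_MD5 = "c1547a7a762db0d713a22d5507ea06e5"  # value from the file listing


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks so the whole file is never in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For an intact download, `verify("Treex-Parser-MSTperl-150519.tar.gz")` returns `True`.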

- Treex
- Tool
- Parser
- MSTperl
- MultiModelParser.pm (10 kB)
- TrainerLabelling.pm (25 kB)
- scripts
- test_labeller_tsv.pl (2 kB)
- unlabelled_test_rur.sh (303 B)
- test_conll_multiplefiles_printout.pl (1 kB)
- test_conll.pl (1 kB)
- TagMorceEnglishCoNLL.pl (1 kB)
- pcedt2conll.sh (564 B)
- test_conll_multimodel_weighted_f_printout.pl (1 kB)
- test_conll_treecomb_weighted_f_printout.pl (2 kB)
- test_conll_multimodel_weighted.pl (1 kB)
- labelled_parse_test.sh (493 B)
- test_rur_conll.pl (1 kB)
- test_conll_treecomb_weighted.pl (2 kB)
- unlabelled_train_and_test.sh (538 B)
- labeller_test.sh (428 B)
- unlabelled_test.sh (295 B)
- test_conll_multimodel_weighted_norm.pl (1 kB)
- pdtT2conll.sh (377 B)
- labeller_train_and_test.sh (589 B)
- pcedt2conll_tag_and_parse_en_worsen_cs.sh (898 B)
- worsen_pcedt.sh (787 B)
- conll2inline.pl (449 B)
- train_conll.pl (855 B)
- test_conll_multimodel_weighted_f.pl (1 kB)
- compare_lines.pl (1 kB)
- test_conll_treecomb_weighted_f.pl (2 kB)
- inline2conll.pl (398 B)
- test_conll_multimodel.pl (1 kB)
- pcedt2conll_td.sh (545 B)
- train_labeller_tsv.pl (1 kB)
- test_conll_multimodel_weighted_f_norm.pl (1 kB)
- split_afun_ismember.sh (332 B)
- inline_sentences_reorder.pl (630 B)
- test_parse_and_label.pl (2 kB)
- test_conll_parsecomb.pl (1 kB)
- test_conll_multimodel_weighted_f_norm_printout.pl (1 kB)
- simple_lemmas.pl (708 B)
- test_conll_parsecomb_weighted.pl (1 kB)
- pcedt2conll_tag_and_parse_en.sh (668 B)
- test_conll_multiplefiles.pl (1 kB)
- test_conll_treecomb_weighted_f_multiconf.pl (2 kB)
- make_czech_tags.pl (681 B)
- test_conll_multimodel_weighted_f_multiconf.pl (1 kB)
- ModelUnlabelled.pm (8 kB)
- Parser.pm (6 kB)
- ModelLabelling.pm (43 kB)
- Reader.pm (2 kB)
- ModelBase.pm (6 kB)
- Labeller.pm (28 kB)
- t
- train_and_test.t (6 kB)
- sample_test.tsv (2 kB)
- sample_train.tsv (4 kB)
- sample.config (7 kB)
- samples
- treex_input.txt (813 B)
- sample_train.sh (73 B)
- sample.config (5 kB)
- train_tsv.pl (850 B)
- treex_parse.scen (721 B)
- train_labeller_tsv.pl (1021 B)
- sample_test.sh (68 B)
- labeller_test.sh (82 B)
- test_tsv.pl (1 kB)
- labeller_train.sh (86 B)
- sample_test.tsv (2 kB)
- sample_train.tsv (4 kB)
- test_labeller_tsv.pl (2 kB)
- FeaturesControl.pm (56 kB)
- ParsedSentencesCombiner.pm (3 kB)
- Node.pm (5 kB)
- Edge.pm (4 kB)
- TrainerBase.pm (8 kB)
- Sentence.pm (18 kB)
- TrainerUnlabelled.pm (11 kB)
- ModelAdditional.pm (7 kB)
- Writer.pm (2 kB)
- RootNode.pm (1 kB)
- MultiHeteroModelParser.pm (3 kB)
- ParserCombiner.pm (4 kB)
- Config.pm (31 kB)
- MSTperl.pm (10 kB)

