Slavic Forest, Norwegian Wood (models)
Please use the following text to cite this item:
Rosa, Rudolf; Zeman, Daniel; Mareček, David and Žabokrtský, Zdeněk, 2017,
Slavic Forest, Norwegian Wood (models), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-1971.
Date issued: 2017-01-28
Description
Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP (cross-lingual parsing) shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK (Slovak) model was trained on CS (Czech) data, the HR (Croatian) model on SL (Slovenian) data, and the NO (Norwegian) model on a concatenation of DA (Danish) and SV (Swedish) data. The scripts and commands used to create the models are part of a separate submission (http://hdl.handle.net/11234/1-1970).
The models were trained with UDPipe commit 3e65d69 (3 January 2017), obtained from https://github.com/ufal/udpipe. They are not guaranteed to work with newer or older versions of UDPipe.
Below are the Bash command sequences that can be used to reproduce the results we submitted to VarDial 2017. The input files must be in CoNLL-U format. The models only use the FORM, UPOS, and universal features (FEATS) fields; the SK model only uses FORM. UDPipe must be installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.
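The input requirements above can be checked before running UDPipe. The following is a minimal sketch that is not part of the original submission; the function names check_conllu_columns and show_used_columns are hypothetical, and a POSIX awk is assumed. It flags token lines that lack the ten tab-separated CoNLL-U fields and prints only the three columns (FORM, UPOS, FEATS) that the models read.

```shell
# Hypothetical helper (not part of the submission): report token lines that do
# not have exactly 10 tab-separated CoNLL-U fields. Comment lines ("#...") and
# blank sentence separators are skipped. Exit status is 1 if any line is bad.
check_conllu_columns() {
    awk -F'\t' '
        /^#/ || /^$/ { next }
        NF != 10 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Hypothetical helper: print only FORM (col 2), UPOS (col 4) and FEATS (col 6),
# i.e. the fields the models use; comments and blank lines pass through.
show_used_columns() {
    awk -F'\t' '
        /^#/ || /^$/ { print; next }
        { print $2 "\t" $4 "\t" $6 }
    ' "$1"
}
```

For example, `check_conllu_columns sk-ud-predPoS-test.conllu` prints nothing and succeeds when every token line has ten fields.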
SK -- tag and parse with the model:
udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu
A slightly better after-deadline model, sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe, which we mention in the accompanying paper, is also included. It is applied in the same way:
udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu
HR -- prune the features to keep only Case, then parse with the model:
python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe
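To show what Case-only pruning amounts to, here is a hypothetical awk stand-in for `python3 feats2FEAT.py Case`. The bundled Python script remains the reference; this sketch, including the function name prune_feats, is only an illustration and assumes the script keeps just the named feature in the FEATS column (column 6) and writes "_" when that feature is absent.

```shell
# Hypothetical stand-in for "feats2FEAT.py <Feature>" (NOT the bundled script):
# keep only the named feature in FEATS (column 6), e.g. "Case=Nom";
# if the token has no such feature, FEATS becomes "_".
prune_feats() {
    awk -F'\t' -v OFS='\t' -v keep="$1" '
        /^#/ || NF < 10 { print; next }   # comments and blank lines unchanged
        {
            out = "_"
            m = split($6, f, "|")          # FEATS is a "|"-separated list
            for (i = 1; i <= m; i++)
                if (index(f[i], keep "=") == 1) out = f[i]
            $6 = out                       # rebuilds the line with tabs (OFS)
            print
        }
    '
}
```

Under that assumption it would slot into the HR pipeline as `prune_feats Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe`.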
NO -- set the UPOS annotation aside, tag the features with the model, merge the result with the set-aside UPOS annotation, and parse with the model (this workaround is needed because UDPipe cannot be told to keep the existing UPOS and change only the features):
cut -f1-4 no-ud-predPoS-test.conllu > tmp
udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
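The NO sequence writes to a fixed file named tmp in the current directory. A minimal variant, assuming mktemp is available, wraps the same commands in a function (the name tag_feats_keep_upos_and_parse is hypothetical) that uses a private temporary file and removes it afterwards. The comments spell out which columns come from where.

```shell
# Same commands as the NO sequence above, wrapped in a hypothetical function
# that uses a private temporary file instead of ./tmp.
tag_feats_keep_upos_and_parse() {
    model="$1"   # e.g. no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
    input="$2"   # CoNLL-U file with predicted UPOS
    tmp=$(mktemp) || return 1
    # Columns 1-4 (ID, FORM, LEMMA, UPOS) are kept from the input.
    cut -f1-4 "$input" > "$tmp"
    # Columns 5-10 (XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) come from the tagger.
    # paste turns blank sentence separators into a lone tab, which sed empties.
    udpipe --tag "$model" "$input" | cut -f5- | paste "$tmp" - |
        sed 's/^\t$//' | udpipe --parse "$model"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage: `tag_feats_keep_upos_and_parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu > no-parsed.conllu` (the output file name is only an example). Like the original command, the sed expression relies on GNU sed interpreting `\t` as a tab.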
Acknowledgement
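European Union
Project code: EC/H2020/644402
Project name: HimL - Health in my Language
Univerzita Karlova (Charles University; funding other than GAUK)
Project code: SVV 260 333
Project name: Specifický vysokoškolský výzkum (Specific University Research)
Grantová agentura České republiky (Czech Science Foundation)
Project code: 15-10472S
Project name: Morphologically and Syntactically Annotated Corpora of Many Languages
Ministerstvo školství, mládeže a tělovýchovy České republiky (Ministry of Education, Youth and Sports of the Czech Republic)
Project code: LM2015071
Project name: LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat (Institute for Analysis, Processing and Distribution of Linguistic Data)
Grantová agentura Univerzity Karlovy v Praze (Grant Agency of Charles University in Prague)
Project code: GAUK 15723/2014
Project name: Modelování závislostní syntaxe napříč jazyky (Modelling Dependency Syntax Across Languages)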
This item is publicly available and licensed under:
Files in this item
- Name: hr-translex.v2.norm.Case.w2v.trainonpred.udpipe
  Size: 50.82 MB
  Format: application/octet-stream
  Description: Model for parsing Croatian
  MD5: 9281c6a9cf0cf1df0e7466bc1d8ba2fa

- Name: feats2FEAT.py
  Size: 412 B
  Format: application/octet-stream
  Description: Features pruning script
  MD5: 5089de1e63c1aa36cf284bb85600365c

- Name: no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
  Size: 28.54 MB
  Format: application/octet-stream
  Description: Model for parsing Norwegian (and tagging Norwegian Case)
  MD5: af624d0dcde21068f51da7c2a4511780

- Name: sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe
  Size: 58.83 MB
  Format: application/octet-stream
  Description: Model for tagging and parsing Slovak
  MD5: 1d3793c42d2a75e14074dbbef8fdc5bf

- Name: sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe
  Size: 63.83 MB
  Format: application/octet-stream
  Description: Better after-deadline model for tagging and parsing Slovak
  MD5: e3b6101b345e6ffe361ac0c83ccc41fd

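The MD5 checksums listed for the files in this item can be verified after download. A minimal sketch, assuming GNU coreutils md5sum (macOS ships md5 with a different interface); the function name verify_item_checksums is hypothetical and takes the directory holding the five files, defaulting to the current directory.

```shell
# Hypothetical helper: check the five downloaded files against the MD5 sums
# listed in this item. Requires GNU md5sum; fails if any file is missing or
# does not match.
verify_item_checksums() {
    ( cd "${1:-.}" && md5sum -c ) <<'EOF'
9281c6a9cf0cf1df0e7466bc1d8ba2fa  hr-translex.v2.norm.Case.w2v.trainonpred.udpipe
5089de1e63c1aa36cf284bb85600365c  feats2FEAT.py
af624d0dcde21068f51da7c2a4511780  no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
1d3793c42d2a75e14074dbbef8fdc5bf  sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe
e3b6101b345e6ffe361ac0c83ccc41fd  sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe
EOF
}
```

Running `verify_item_checksums` in the download directory prints one `OK` line per file when everything matches.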

