MorphoDiTa

The MorphoDiTa web service is available at http(s)://lindat.mff.cuni.cz/services/morphodita/api/.

The web service is freely available for testing. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.

API Reference

The MorphoDiTa REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request	Description	HTTP Method
models	return list of models and supported methods	GET/POST
tag	tag supplied text	GET/POST
analyze	perform morphological analysis of supplied text	GET/POST
generate	perform morphological generation	GET/POST
tokenize	tokenize supplied text	GET/POST

Method models

Return the list of models available in the MorphoDiTa REST API and, for each model, enumerate the methods it supports. The default model (used when the user supplies no model to a method call) is also returned. It is guaranteed to be the latest Czech model.

Browser Example

http://lindat.mff.cuni.cz/services/morphodita/api/models

Example JSON Response

{
 "models": {
  "czech-160310": [
   "tag"
  ,"analyze"
  ,"generate"
  ,"tokenize"
  ]
 ,"czech-160310-morpho_only": [
   "analyze"
  ,"generate"
  ,"tokenize"
  ]
 }
,"default_model": "czech-160310"
}
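
The response above can be processed with a few lines of Python. The following is a minimal sketch that assumes the response text has already been fetched; the helper name `models_supporting` is hypothetical and not part of the service.

```python
import json

# Response text equivalent to the example above.
MODELS_RESPONSE = """
{
 "models": {
  "czech-160310": ["tag", "analyze", "generate", "tokenize"],
  "czech-160310-morpho_only": ["analyze", "generate", "tokenize"]
 },
 "default_model": "czech-160310"
}
"""


def models_supporting(response, method):
    """Return the sorted names of models whose method list contains `method`."""
    return sorted(
        name for name, methods in response["models"].items() if method in methods
    )


response = json.loads(MODELS_RESPONSE)
taggers = models_supporting(response, "tag")
```

With the example response, only `czech-160310` supports `tag`, while both models support `analyze`.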

Method tag

Tag given text as described in the User's Manual. The response format is described later.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
guesser	no	string (`yes` / `no`)	Use morphological guesser for unknown words; default `yes`.
input	no	string (`untokenized` / `vertical`)	Input format to use; default is `untokenized`.
convert_tagset	no	string (`pdt_to_conll2009` / `strip_lemma_comment` / `strip_lemma_id`)	Apply specified tag set converter.
derivation	no	string (`none` / `root` / `path` / `tree`)	Apply specified morphological derivation to lemmas; default `none`.
output	no	string (`json` / `xml` / `vertical`)	Output format; default is `xml`. With `json`, the result is a JSON array of sentences; each sentence is an array of tokens, and each token is an object containing `token`, `lemma` and `tag` string fields and, optionally, a non-empty `space` string field holding the spaces that follow this token in the input (spaces at the beginning of the input are discarded because they follow no token). With `xml` or `vertical`, the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/tag?data=Děti pojedou k babičce. Už se těší.

http://lindat.mff.cuni.cz/services/morphodita/api/tag?data=Děti pojedou k babičce. Už se těší.&output=json
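
A browser percent-encodes the spaces and non-ASCII characters in the URLs above automatically; a program has to do this itself. The following is a minimal sketch using only the Python standard library. The helper name `build_url` and the constant `API_ROOT` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode

API_ROOT = "http://lindat.mff.cuni.cz/services/morphodita/api/"


def build_url(method, **params):
    """Return a GET URL for `method` with percent-encoded query parameters."""
    # urlencode encodes values as UTF-8 and writes spaces as '+'.
    return API_ROOT + method + "?" + urlencode(params)


url = build_url("tag", data="Děti pojedou k babičce. Už se těší.", output="json")
```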

Method analyze

Perform morphological analysis of supplied text as described in the User's Manual. The response format is described later.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
guesser	no	string (`yes` / `no`)	Use morphological guesser for unknown words; default `yes`.
input	no	string (`untokenized` / `vertical`)	Input format to use; default is `untokenized`.
convert_tagset	no	string (`pdt_to_conll2009` / `strip_lemma_comment` / `strip_lemma_id`)	Apply specified tag set converter.
derivation	no	string (`none` / `root` / `path` / `tree`)	Apply specified morphological derivation to lemmas; default `none`.
output	no	string (`json` / `xml` / `vertical`)	Output format; default is `xml`. With `json`, the result is a JSON array of sentences; each sentence is an array of tokens, and each token is an object containing a `token` string field, an `analyses` field holding an array of objects with `lemma` and `tag` string fields, and, optionally, a non-empty `space` string field holding the spaces that follow this token in the input (spaces at the beginning of the input are discarded because they follow no token). With `xml` or `vertical`, the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/analyze?data=Děti pojedou k babičce. Už se těší.

http://lindat.mff.cuni.cz/services/morphodita/api/analyze?data=Děti pojedou k babičce. Už se těší.&convert_tagset=pdt_to_conll2009&output=json
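
The nested `json` result of `analyze` can be flattened into one row per analysis. The sketch below follows the field names given in the parameter table; the helper name `flatten_analyses` is hypothetical, and the sample data is hand-written with placeholder lemma and tag values, not real service output.

```python
def flatten_analyses(result):
    """Yield (token, lemma, tag) triples from an `analyze` JSON result."""
    for sentence in result:
        for token in sentence:
            for analysis in token["analyses"]:
                yield token["token"], analysis["lemma"], analysis["tag"]


# Hand-written sample in the documented shape. LEMMA_n and TAG_n are
# placeholders, not values produced by the service.
sample_result = [[
    {"token": "k",
     "analyses": [{"lemma": "LEMMA_1", "tag": "TAG_1"},
                  {"lemma": "LEMMA_2", "tag": "TAG_2"}],
     "space": " "},
    {"token": "babičce",
     "analyses": [{"lemma": "LEMMA_3", "tag": "TAG_3"}]},
]]

triples = list(flatten_analyses(sample_result))
```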

Method generate

Perform morphological generation as described in the User's Manual. The response format is described later.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
guesser	no	string (`yes` / `no`)	Use morphological guesser for unknown words; default `yes`.
convert_tagset	no	string (`pdt_to_conll2009` / `strip_lemma_comment` / `strip_lemma_id`)	Apply specified tag set converter.
output	no	string (`json` / `vertical`)	Output format; default is `vertical`. With `json`, the result is a JSON array of lemma results; each lemma result is an array of objects containing `form`, `lemma` and `tag` string fields. With `vertical`, the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/generate?data=dítě%0Ajet%0Ak-1%0Ababička

http://lindat.mff.cuni.cz/services/morphodita/api/generate?data=dítě%0Ajet%0Ak-1%0Ababička&convert_tagset=pdt_to_conll2009&output=json
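
In the example URLs above, the input lemmas are separated by `%0A`, the percent-encoded newline. A minimal sketch of producing such a `data` value in Python, assuming one lemma per line as in the examples:

```python
from urllib.parse import quote

lemmas = ["dítě", "jet", "k-1", "babička"]

# One lemma per line, then percent-encode everything except unreserved characters.
data = "\n".join(lemmas)
encoded = quote(data, safe="")
```

Here `encoded` contains `%0A` between the lemmas, and the non-ASCII letters are encoded as UTF-8 byte sequences.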

Method tokenize

Tokenize the supplied text as described in the User's Manual. The response format is described later.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see model selection for model matching rules.
output	no	string (`json` / `xml` / `vertical`)	Output format; default is `xml`. With `json`, the result is a JSON array of sentences; each sentence is an array of tokens, and each token is an object containing a `token` string field and, optionally, a non-empty `space` string field holding the spaces that follow this token in the input (spaces at the beginning of the input are discarded because they follow no token). With `xml` or `vertical`, the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/tokenize?data=Děti pojedou k babičce. Už se těší.

http://lindat.mff.cuni.cz/services/morphodita/api/tokenize?data=Děti pojedou k babičce. Už se těší.&output=json
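
The optional `space` field makes it possible to rebuild the input text from the `json` result, apart from any leading whitespace, which is discarded. The sketch below illustrates this; the helper name `reconstruct_text` is hypothetical, and the sample is hand-written in the documented shape rather than copied from the service.

```python
def reconstruct_text(result):
    """Concatenate tokens and their trailing `space` fields back into text."""
    parts = []
    for sentence in result:
        for token in sentence:
            parts.append(token["token"])
            # `space` is present only when it is non-empty.
            parts.append(token.get("space", ""))
    return "".join(parts)


# Hand-written sample in the documented shape, not real service output.
sample_result = [
    [{"token": "Děti", "space": " "}, {"token": "pojedou", "space": " "},
     {"token": "k", "space": " "}, {"token": "babičce"},
     {"token": ".", "space": " "}],
    [{"token": "Už", "space": " "}, {"token": "se", "space": " "},
     {"token": "těší"}, {"token": "."}],
]

text = reconstruct_text(sample_result)
```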

Common Response Format

The response format of all methods is JSON. Except for the models method, the output JSON has the following structure, where result_object is usually a string or an array:

{
 "model": "Model used"
,"acknowledgements": ["URL with acknowledgements", ...]
,"result": result_object
}
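
A client can unpack this structure as follows. This is a minimal sketch; the helper name `unwrap` is hypothetical, and the sample body uses placeholder values for the acknowledgements URL and the result.

```python
import json

EXPECTED_KEYS = {"model", "acknowledgements", "result"}


def unwrap(body):
    """Parse a response body and return (model, acknowledgements, result)."""
    payload = json.loads(body)
    missing = EXPECTED_KEYS - set(payload)
    if missing:
        # For example, the models method returns a different structure.
        raise ValueError("missing fields: " + ", ".join(sorted(missing)))
    return payload["model"], payload["acknowledgements"], payload["result"]


# Hand-written body with placeholder values, not real service output.
sample_body = (
    '{"model": "czech-160310",'
    ' "acknowledgements": ["ACKNOWLEDGEMENTS_URL_PLACEHOLDER"],'
    ' "result": "RESULT_PLACEHOLDER"}'
)
model, acknowledgements, result = unwrap(sample_body)
```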

Model Selection

There are several ways to select the required model using the model option:

If the model option is not specified, the default model (returned by the models method) is used – this is guaranteed to be the latest Czech model.
The model option can specify one of the models returned by the models method.
The version suffix in the -YYMMDD format can be omitted from the model option; the latest available model is then used.
The model option may consist of only the first several words of the model name. In this case, the latest most suitable model is used.

Note that the last possibility allows using czech or english as models.

Accessing API using Curl

The described API can be conveniently used with curl. Several examples follow:

Passing Input on Command Line (if a UTF-8 locale is used)

curl --data-urlencode 'data=Děti jedou k babičce. Už se těší.' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Using Files as Input (files must be in UTF-8 encoding)

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Specifying Additional Parameters

curl -F 'data=@input_file' -F 'output=vertical' -F 'convert_tagset=strip_lemma_id' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Converting JSON Result to Plain Text

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/morphodita/api/tag | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
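
The same kind of request can be made from Python without curl. The following is a minimal sketch using only the standard library; it sends a form-encoded POST, mirroring the `--data-urlencode` example above. The helper names `build_request` and `tag` are hypothetical, and error handling is omitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TAG_URL = "http://lindat.mff.cuni.cz/services/morphodita/api/tag"


def build_request(text, **params):
    """Build a form-encoded POST request, similar to `curl --data-urlencode`."""
    body = urlencode(dict(params, data=text)).encode("ascii")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(TAG_URL, data=body)


def tag(text, **params):
    """Send the request and return the `result` field of the JSON response."""
    with urlopen(build_request(text, **params)) as response:
        return json.load(response)["result"]


request = build_request("Děti jedou k babičce. Už se těší.", output="vertical")
```

Calling `tag("Děti jedou k babičce. Už se těší.", output="vertical")` would then contact the service and return the result string.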