MorphoDiTa

The MorphoDiTa web service is available at http(s)://lindat.mff.cuni.cz/services/morphodita/api/.

The web service is freely available for testing. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.

API Reference

The MorphoDiTa REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request | Description | HTTP Method
models | return list of models and supported methods | GET/POST
tag | tag supplied text | GET/POST
analyze | perform morphological analysis of supplied text | GET/POST
generate | perform morphological generation | GET/POST
tokenize | tokenize supplied text | GET/POST

Method models

Return the list of models available in the MorphoDiTa REST API, and for each model enumerate the methods supported by that model. The default model (used when the user supplies no model to a method call) is also returned – this is guaranteed to be the latest Czech model.

Browser Example

http://lindat.mff.cuni.cz/services/morphodita/api/models

Example JSON Response

{
 "models": {
  "czech-160310": [
   "tag"
  ,"analyze"
  ,"generate"
  ,"tokenize"
  ]
 ,"czech-160310-morpho_only": [
   "analyze"
  ,"generate"
  ,"tokenize"
  ]
 }
,"default_model": "czech-160310"
}
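The structure above maps each model name to the list of methods it supports, so a client can pick a suitable model before calling a method. The following is a minimal sketch in Python that works on the example response shown above; the helper name models_supporting is made up for illustration and is not part of the service.

```python
import json

# Example response copied from the documentation above.
SAMPLE = """
{
 "models": {
  "czech-160310": ["tag", "analyze", "generate", "tokenize"],
  "czech-160310-morpho_only": ["analyze", "generate", "tokenize"]
 },
 "default_model": "czech-160310"
}
"""


def models_supporting(response, method):
    """Return the sorted names of models whose method list contains `method`."""
    return sorted(
        name for name, methods in response["models"].items() if method in methods
    )


response = json.loads(SAMPLE)
taggers = models_supporting(response, "tag")
analyzers = models_supporting(response, "analyze")
```

With the example response, only czech-160310 supports tag, while both models support analyze.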

Method tag

Tag given text as described in the User's Manual. The response format is described later.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see model selection for model matching rules.
guesser | no | string (yes / no) | Use morphological guesser for unknown words; default yes.
input | no | string (untokenized / vertical) | Input format to use; default is untokenized.
convert_tagset | no | string (pdt_to_conll2009 / strip_lemma_comment / strip_lemma_id) | Apply specified tag set converter.
derivation | no | string (none / root / path / tree) | Apply specified morphological derivation to lemmas; default none.
output | no | string (json / xml / vertical) | Output format, default is xml:
  • json: the result is a JSON array of sentences. Each sentence is an array of tokens, and each token is an object containing token, lemma and tag string fields and optionally a non-empty space string field containing the spaces following this token in the input (spaces at the beginning of the input are discarded, as they follow no token).
  • xml, vertical: the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/tag?data=Děti pojedou k babičce. Už se těší.
http://lindat.mff.cuni.cz/services/morphodita/api/tag?data=Děti pojedou k babičce. Už se těší.&output=json
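To make the json output of tag more concrete, here is a minimal Python sketch that flattens a result into rows. The SAMPLE_RESULT value is hand-written to match the field layout described above; its lemma and tag values are placeholders, not real output of the service, and the function name rows is made up for illustration.

```python
# Hand-written sample in the documented shape: an array of sentences,
# each an array of token objects. Lemma and tag values are placeholders.
SAMPLE_RESULT = [
    [
        {"token": "Děti", "lemma": "LEMMA-1", "tag": "TAG-1", "space": " "},
        {"token": "pojedou", "lemma": "LEMMA-2", "tag": "TAG-2"},
    ],
    [
        {"token": "Už", "lemma": "LEMMA-3", "tag": "TAG-3", "space": " "},
    ],
]


def rows(result):
    """Yield (sentence_index, token, lemma, tag) for every token in the result."""
    for index, sentence in enumerate(result):
        for tok in sentence:
            yield index, tok["token"], tok["lemma"], tok["tag"]


table = list(rows(SAMPLE_RESULT))
```

The optional space field is ignored here because it only matters when the original text has to be reconstructed.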

Method analyze

Perform morphological analysis of supplied text as described in the User's Manual. The response format is described later.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see model selection for model matching rules.
guesser | no | string (yes / no) | Use morphological guesser for unknown words; default yes.
input | no | string (untokenized / vertical) | Input format to use; default is untokenized.
convert_tagset | no | string (pdt_to_conll2009 / strip_lemma_comment / strip_lemma_id) | Apply specified tag set converter.
derivation | no | string (none / root / path / tree) | Apply specified morphological derivation to lemmas; default none.
output | no | string (json / xml / vertical) | Output format, default is xml:
  • json: the result is a JSON array of sentences. Each sentence is an array of tokens, and each token is an object containing a token string field, an analyses field containing an array of objects with lemma and tag string fields, and optionally a non-empty space string field containing the spaces following this token in the input (spaces at the beginning of the input are discarded, as they follow no token).
  • xml, vertical: the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/analyze?data=Děti pojedou k babičce. Už se těší.
http://lindat.mff.cuni.cz/services/morphodita/api/analyze?data=Děti pojedou k babičce. Už se těší.&convert_tagset=pdt_to_conll2009&output=json
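Unlike tag, the json output of analyze carries a list of candidate analyses per token. The sketch below, in Python, picks out the tokens that have more than one analysis. The sample data is hand-written in the documented shape with placeholder lemma and tag values, and ambiguous_tokens is a hypothetical helper name.

```python
# Hand-written sample in the documented shape; lemma and tag values are
# placeholders, not real output of the service.
SAMPLE_RESULT = [
    [
        {
            "token": "se",
            "analyses": [
                {"lemma": "LEMMA-A", "tag": "TAG-A"},
                {"lemma": "LEMMA-B", "tag": "TAG-B"},
            ],
            "space": " ",
        },
        {"token": "těší", "analyses": [{"lemma": "LEMMA-C", "tag": "TAG-C"}]},
    ]
]


def ambiguous_tokens(result):
    """Return (token, number_of_analyses) for tokens with several analyses."""
    return [
        (tok["token"], len(tok["analyses"]))
        for sentence in result
        for tok in sentence
        if len(tok["analyses"]) > 1
    ]


ambiguous = ambiguous_tokens(SAMPLE_RESULT)
```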

Method generate

Perform morphological generation as described in the User's Manual. The response format is described later.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see model selection for model matching rules.
guesser | no | string (yes / no) | Use morphological guesser for unknown words; default yes.
convert_tagset | no | string (pdt_to_conll2009 / strip_lemma_comment / strip_lemma_id) | Apply specified tag set converter.
output | no | string (json / vertical) | Output format, default is vertical:
  • json: the result is a JSON array of lemma results; each lemma result is an array of objects containing form, lemma and tag string fields.
  • vertical: the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/generate?data=dítě%0Ajet%0Ak-1%0Ababička
http://lindat.mff.cuni.cz/services/morphodita/api/generate?data=dítě%0Ajet%0Ak-1%0Ababička&convert_tagset=pdt_to_conll2009&output=json
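In the browser examples above, the lemmas are separated by %0A, the percent-encoded newline. A request URL of that shape can be built in Python roughly as follows; generate_url is a hypothetical helper, and the sketch only constructs the URL without contacting the service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://lindat.mff.cuni.cz/services/morphodita/api/"


def generate_url(lemmas, **params):
    """Build a generate URL with one lemma per line in the data parameter."""
    query = {"data": "\n".join(lemmas)}
    query.update(params)
    # urlencode percent-encodes the newlines and the UTF-8 bytes of each lemma.
    return BASE + "generate?" + urlencode(query)


url = generate_url(["dítě", "jet", "k-1", "babička"], output="json")

# Decoding the query string recovers the original newline-separated lemmas.
decoded = parse_qs(urlsplit(url).query)["data"][0]
```

Percent-encoding the non-ASCII characters explicitly avoids relying on the browser or HTTP client to do it.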

Method tokenize

Tokenize the supplied text as described in the User's Manual. The response format is described later.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see model selection for model matching rules.
output | no | string (json / xml / vertical) | Output format, default is xml:
  • json: the result is a JSON array of sentences. Each sentence is an array of tokens, and each token is an object containing a token string field and optionally a non-empty space string field containing the spaces following this token in the input (spaces at the beginning of the input are discarded, as they follow no token).
  • xml, vertical: the result is a string formatted according to the MorphoDiTa manual.

Browser Examples

http://lindat.mff.cuni.cz/services/morphodita/api/tokenize?data=Děti pojedou k babičce. Už se těší.
http://lindat.mff.cuni.cz/services/morphodita/api/tokenize?data=Děti pojedou k babičce. Už se těší.&output=json
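Because each token in the json output may carry the spaces that followed it, the input text (minus any leading spaces) can be rebuilt from the result. The Python sketch below shows the idea; the sample is hand-written to match the documented shape and assumes a particular token and sentence split, so it is not actual output of the service. The name detokenize is made up for illustration.

```python
# Hand-written sample in the documented shape; the token and sentence
# boundaries are an assumption for illustration.
SAMPLE_RESULT = [
    [
        {"token": "Děti", "space": " "},
        {"token": "pojedou", "space": " "},
        {"token": "k", "space": " "},
        {"token": "babičce"},
        {"token": ".", "space": " "},
    ],
    [
        {"token": "Už", "space": " "},
        {"token": "se", "space": " "},
        {"token": "těší"},
        {"token": "."},
    ],
]


def detokenize(result):
    """Rebuild the text by joining each token with its optional space field."""
    parts = []
    for sentence in result:
        for tok in sentence:
            parts.append(tok["token"])
            parts.append(tok.get("space", ""))  # field is absent when no space follows
    return "".join(parts)


text = detokenize(SAMPLE_RESULT)
```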

Common Response Format

The response format of all methods is JSON. Except for the models method, the output JSON has the following structure, where result_object is usually a string or an array:

{
 "model": "Model used"
,"acknowledgements": ["URL with acknowledgements", ...]
,"result": result_object
}
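A client will typically want to check this envelope before using the result. The following Python sketch does that for a response body; the unwrap helper and the sample body (including its acknowledgements URL) are made up for illustration.

```python
import json

EXPECTED_KEYS = {"model", "acknowledgements", "result"}


def unwrap(body):
    """Parse a response body and return (model, result), checking the envelope."""
    response = json.loads(body)
    missing = EXPECTED_KEYS - response.keys()
    if missing:
        raise ValueError("response is missing fields: " + ", ".join(sorted(missing)))
    return response["model"], response["result"]


# Hypothetical body in the documented shape, not real output of the service.
SAMPLE_BODY = json.dumps(
    {
        "model": "czech-160310",
        "acknowledgements": ["https://example.org/acknowledgements"],
        "result": [[{"token": "Děti"}]],
    }
)

model, result = unwrap(SAMPLE_BODY)
```

Note that the models method does not use this envelope, so its response should be parsed separately.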

Model Selection

There are several ways to select the required model using the model option:

Note that the last possibility allows using czech or english as models.


Accessing API using Curl

The API can be conveniently used with curl. Several examples follow:

Passing Input on Command Line (if UTF-8 locale is being used)

curl --data-urlencode 'data=Děti jedou k babičce. Už se těší.' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Using Files as Input (files must be in UTF-8 encoding)

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Specifying Additional Parameters

curl -F 'data=@input_file' -F 'output=vertical' -F 'convert_tagset=strip_lemma_id' http://lindat.mff.cuni.cz/services/morphodita/api/tag

Converting JSON Result to Plain Text

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/morphodita/api/tag | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
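The same kind of request can be made from a script. Below is a minimal Python sketch using only the standard library; it sends the parameters as a URL-encoded POST body, which is the equivalent of the --data-urlencode example above. The helper names build_request and call are made up for illustration, and error handling is omitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "http://lindat.mff.cuni.cz/services/morphodita/api/"


def build_request(method, **params):
    """Prepare a POST request with URL-encoded parameters for an API method."""
    body = urlencode(params).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return Request(API + method, data=body)


def call(method, **params):
    """Send the request and return the result field of the JSON response."""
    with urlopen(build_request(method, **params)) as response:
        return json.load(response)["result"]


# Building the request does not contact the service.
request = build_request("tag", data="Děti jedou k babičce. Už se těší.", output="json")
```

Calling call("tag", data="...", output="json") performs a real network request, so it is left out of the sketch itself.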