UDPipe

The UDPipe web service is available at http(s)://lindat.mff.cuni.cz/services/udpipe/api/.

The web service is freely available. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.

API Reference

The UDPipe REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request | Description | HTTP Method
models | return list of models and supported methods | GET/POST
process | process supplied data | GET/POST

Method models

Return the list of models available in the UDPipe REST API, and for each model enumerate the components it supports (a model can contain a tokenizer, a tagger and a parser). The default model (used when the user supplies no model to a method call) is also returned; it is guaranteed to be the latest Czech model.

Browser Example

https://lindat.mff.cuni.cz/services/udpipe/api/models

Example JSON Response

{
 "models": {
  "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"],
  "english-ud-1.2-160523": ["tokenizer", "tagger", "parser"]
 },
 "default_model": "czech-ud-1.2-160523"
}
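
The response above can be read with any JSON library. The following is a minimal Python sketch, not part of the service itself: the helper names models_with and default_model are hypothetical, and SAMPLE_RESPONSE is simply the example response copied from above (a live response may list different models).

```python
import json

# Example response copied from the documentation above; a live
# response from the service may list other models.
SAMPLE_RESPONSE = """{
 "models": {
  "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"],
  "english-ud-1.2-160523": ["tokenizer", "tagger", "parser"]
 },
 "default_model": "czech-ud-1.2-160523"
}"""


def models_with(response_text, component):
    """Return the sorted names of models that list the given component."""
    payload = json.loads(response_text)
    return sorted(
        name
        for name, components in payload["models"].items()
        if component in components
    )


def default_model(response_text):
    """Return the name of the model used when no model is supplied."""
    return json.loads(response_text)["default_model"]
```

For instance, models_with(SAMPLE_RESPONSE, "parser") returns both model names from the sample, which is one way to check that a model can parse before calling process with the parser option.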

Method process

Process given data as described in the User's Manual.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see model selection for model matching rules.
tokenizer | no | string | If the option is present, the input is assumed to be in plain text and is tokenized. If the parameter has a value, it is passed to the tokenizer as tokenizer options.
input | no | string (conllu / generic_tokenizer / horizontal / vertical) | If the tokenizer is not used, the input is assumed to be in the specified input format (optionally with options); default conllu.
tagger | no | string | If the option is present, the input is POS tagged and lemmatized. If the parameter has a value, it is passed to the tagger.
parser | no | string | If the option is present, the input is dependency parsed. If the parameter has a value, it is passed to the parser.
output | no | string (conllu / horizontal / matxin / plaintext / vertical) | The output format (optionally with options) to use; default conllu.

The response is in JSON format of the following structure:

{
 "model": "Model used",
 "acknowledgements": ["URL with acknowledgements", ...],
 "result": "processed_output"
}
The processed_output is the output of UDPipe in the requested output format.
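
As an illustration of the parameters and the response structure described above, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the sketch assumes that an option sent with an empty value (for example tokenizer=) counts as "present", as in the curl examples in this document.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_process_request(data, model=None, **options):
    """Build a POST request for the process method.

    Options such as tokenizer="", tagger="", parser="" or
    output="conllu" are sent as form fields. An empty string is
    assumed to mean "option present, no value".
    """
    fields = {"data": data}
    if model is not None:
        fields["model"] = model
    fields.update(options)
    # urlencode percent-encodes non-ASCII text as UTF-8 by default.
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(API_URL, data=body)


def parse_process_response(response_text):
    """Return (model, result) from the JSON response."""
    payload = json.loads(response_text)
    return payload["model"], payload["result"]


def call_process(data, model=None, **options):
    """Send the request and parse the reply; needs network access."""
    request = build_process_request(data, model=model, **options)
    with urllib.request.urlopen(request) as response:
        return parse_process_response(response.read().decode("utf-8"))
```

A call such as call_process("Hello world.", model="english", tokenizer="", tagger="", parser="") would then return the name of the model used together with the processed output.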

Browser Examples

https://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer&tagger&parser&data=Děti pojedou k babičce. Už se těší.
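
A browser percent-encodes the spaces and accented characters in such a URL automatically; a program issuing the same GET request has to do this itself. The sketch below shows one way to build the URL in Python. The helper name process_url is hypothetical, and the options are sent with empty values (tokenizer= rather than a bare tokenizer), on the assumption that the service treats both forms alike, as the curl examples in this document suggest.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def process_url(data, **options):
    """Return a GET URL for the process method with an encoded query."""
    params = dict(options)
    params["data"] = data
    # quote_via=quote encodes spaces as %20 instead of "+";
    # non-ASCII characters are encoded as UTF-8 bytes.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = process_url(
    "Děti pojedou k babičce. Už se těší.",
    tokenizer="",
    tagger="",
    parser="",
)
print(url)
```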

Model Selection

There are several possibilities for selecting the required model using the model option:

Note that the last two possibilities allow using czech, cs, ces, cze, english, en or eng as models.


Accessing API using Curl

The API described above can be conveniently used with curl. Several examples follow:

Passing Input on Command Line (if UTF-8 locale is being used)

curl --data 'tokenizer=&tagger=&parser=&data=Děti pojedou k babičce. Už se těší.' https://lindat.mff.cuni.cz/services/udpipe/api/process

Using Files as Input (files must be in UTF-8 encoding)

curl -F data=@input_file.txt -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process

Specifying Model Parameters

curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process

Converting JSON Result to Plain Text

curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
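
Once the result has been extracted as plain text, the default conllu output can be processed further. The following Python sketch splits CoNLL-U text into sentences and tokens. It is an illustration rather than part of the service: the function name parse_conllu is hypothetical, and the field names follow the column labels of the CoNLL-U format (ten tab-separated columns per token, comment lines starting with #, a blank line after each sentence).

```python
# Column labels of the CoNLL-U format, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            # Token line: map the tab-separated columns to field names.
            current.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences
```

Each token is returned as a dictionary of strings, so token["form"] and token["lemma"] give the word form and its lemma, and token["head"] gives the id of the governing token (0 for the root).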