UDPipe

The UDPipe web service is available at http(s)://lindat.mff.cuni.cz/services/udpipe/api/.

The web service is freely available. Please respect the CC BY-NC-SA licence of the models: explicit written permission of the authors is required for any commercial exploitation of the system. By using the service, you agree that data we obtain during such use may be used to further improve the systems at UFAL. All comments and reactions are welcome.

API Reference

The UDPipe REST API can be accessed directly (for example from a web browser) or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request	Description	HTTP Method
models	return the list of models and the components each model supports	GET/POST
process	process the supplied data	GET/POST

Method models

Return the list of models available in the UDPipe REST API and, for each model, enumerate the components it supports (a model can contain a tokenizer, a tagger and a parser). The default model (used when the user supplies no model to a method call) is also returned; this is guaranteed to be the latest Czech model.

Browser Example

https://lindat.mff.cuni.cz/services/udpipe/api/models

Example JSON Response

{
 "models": {
  "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"],
  "english-ud-1.2-160523": ["tokenizer", "tagger", "parser"]
 },
 "default_model": "czech-ud-1.2-160523"
}
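The models response is plain JSON, so a client can inspect it with any JSON library. The following is a minimal sketch in Python that parses the example response above and answers two common questions: which model is the default, and which models include a given component. The helper names `models_with_component` and `default_model` are hypothetical and not part of any UDPipe client library; the response text is copied from the example rather than fetched from the service.

```python
import json

# Example response of the models method, copied from the documentation above
# (not fetched live from the service).
MODELS_RESPONSE = """
{
 "models": {
  "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"],
  "english-ud-1.2-160523": ["tokenizer", "tagger", "parser"]
 },
 "default_model": "czech-ud-1.2-160523"
}
"""


def models_with_component(response_text: str, component: str) -> list[str]:
    """Return the sorted names of models that include the given component."""
    response = json.loads(response_text)
    return sorted(name for name, components in response["models"].items()
                  if component in components)


def default_model(response_text: str) -> str:
    """Return the model the service uses when no model is specified."""
    return json.loads(response_text)["default_model"]


if __name__ == "__main__":
    print(default_model(MODELS_RESPONSE))
    print(models_with_component(MODELS_RESPONSE, "parser"))
```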

Method process

Process the supplied data as described in the User's Manual.

Parameter	Mandatory	Data type	Description
data	yes	string	Input text in UTF-8.
model	no	string	Model to use; see Model Selection for model matching rules.
tokenizer	no	string	If the option is present, the input is assumed to be plain text and is tokenized. If the option has a value, it is passed to the tokenizer as tokenizer options.
input	no	string (`conllu` / `generic_tokenizer` / `horizontal` / `vertical`)	If the tokenizer is not used, the input is assumed to be in the specified input format (possibly with format options); default `conllu`.
tagger	no	string	If the option is present, the input is POS tagged and lemmatized. If the option has a value, it is passed to the tagger as tagger options.
parser	no	string	If the option is present, the input is dependency parsed. If the option has a value, it is passed to the parser as parser options.
output	no	string (`conllu` / `horizontal` / `matxin` / `plaintext` / `vertical`)	The output format to use (possibly with format options); default `conllu`.
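Because tokenizer, tagger and parser are switched on merely by being present, a client has to distinguish "omit the option" from "send it with an empty value". A minimal sketch of that logic in Python, assuming (as the curl examples in this document do with `tokenizer=`) that an empty value enables a component with its default options. The function `build_process_params` is hypothetical, and its argument `input_format` stands in for the `input` parameter only to avoid shadowing Python's built-in `input`.

```python
from urllib.parse import urlencode


def build_process_params(data, model=None, tokenizer=None, tagger=None,
                         parser=None, input_format=None, output=None):
    """Assemble form fields for the process method (hypothetical helper).

    None means "omit the option". For tokenizer, tagger and parser an empty
    string enables the component with default options, and a non-empty
    string is passed on as component options.
    """
    params = {"data": data}
    optional = {
        "model": model,
        "tokenizer": tokenizer,
        "tagger": tagger,
        "parser": parser,
        "input": input_format,  # sent under the API's own name "input"
        "output": output,
    }
    for name, value in optional.items():
        if value is not None:
            params[name] = value
    return params


if __name__ == "__main__":
    fields = build_process_params("Hello.", model="english",
                                  tokenizer="", tagger="", parser="")
    # Form-encoded body suitable for a POST request.
    print(urlencode(fields))
```

Keeping `None` and `""` distinct mirrors the table above: an absent option leaves the component out of the pipeline, while an empty one turns it on.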

The response is a JSON object with the following structure:

{
 "model": "Model used",
 "acknowledgements": ["URL with acknowledgements", ...],
 "result": "processed_output"
}

Here processed_output is the output of UDPipe in the requested output format.
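With the default `conllu` output, the result field holds CoNLL-U text: ten tab-separated columns per word (ID, FORM, LEMMA, UPOS, ...), comment lines starting with `#`, and blank lines between sentences. A minimal Python sketch that extracts the result and lists (form, lemma, UPOS) triples follows. The sample response is hand-written for illustration only (its acknowledgements URL is a placeholder and its annotation is not real service output), and the helper names are hypothetical.

```python
import json

# Hand-written, illustrative response in the documented shape;
# NOT actual output of the service.
SAMPLE_RESPONSE = json.dumps({
    "model": "english-ud-1.2-160523",
    "acknowledgements": ["http://example.org/acknowledgements"],
    "result": (
        "# text = Hi there.\n"
        "1\tHi\thi\tINTJ\t_\t_\t0\troot\t_\t_\n"
        "2\tthere\tthere\tADV\t_\t_\t1\tadvmod\t_\t_\n"
        "3\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
        "\n"
    ),
})


def extract_result(response_text: str) -> str:
    """Return the UDPipe output stored in the "result" field."""
    return json.loads(response_text)["result"]


def conllu_words(conllu: str) -> list[tuple[str, str, str]]:
    """Return (form, lemma, upos) for every syntactic word in CoNLL-U text.

    Comment lines and blank sentence separators are skipped, as are
    multiword-token ranges ("1-2") and empty nodes ("1.1").
    """
    words = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if "-" in columns[0] or "." in columns[0]:
            continue
        words.append((columns[1], columns[2], columns[3]))
    return words


if __name__ == "__main__":
    for form, lemma, upos in conllu_words(extract_result(SAMPLE_RESPONSE)):
        print(form, lemma, upos)
```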

Browser Example

https://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer&tagger&parser&data=Děti pojedou k babičce. Už se těší.
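A browser percent-encodes the spaces and non-ASCII characters in such a URL automatically; a program has to do it itself. A minimal Python sketch that builds an equivalent GET URL with the standard library is given below. It sends each component as `tokenizer=` rather than a bare `tokenizer`, on the assumption (consistent with the curl examples in this document) that an empty value is accepted as "present". The function name `process_get_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def process_get_url(text: str,
                    components=("tokenizer", "tagger", "parser")) -> str:
    """Build a GET URL for the process method (hypothetical helper).

    Each component is sent with an empty value to enable it; the text is
    UTF-8 percent-encoded, with spaces as %20 rather than "+".
    """
    params = [(name, "") for name in components] + [("data", text)]
    return API_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(process_get_url("Děti pojedou k babičce. Už se těší."))
```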

Model Selection

There are several ways to select the required model using the model option:

If the model option is not specified, the default model (returned by the models method) is used; this is guaranteed to be the latest Czech model.
The model option can specify one of the models returned by the models method.
The model option may consist of only the first few words of a model name. In this case, the latest most suitable model is used.
The model option can be an ISO 639-1 or ISO 639-2 language code. If available, the newest model for the requested language is used.

Note that the last two possibilities allow using czech, cs, ces, cze, english, en or eng as models.
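The selection itself happens on the server. To make the rules concrete, here is a client-side Python approximation of them, useful for example to predict which model a request will use. It rests on several assumptions not stated by the service: model names are hyphen-separated words whose last word is a YYMMDD date (so "latest" means the largest last word), "first few words" means a prefix of those hyphen-separated words, and the language-code table covers only the two example languages. The function `resolve_model`, the table `LANGUAGE_CODES`, and the `english-ud-2.0-170801` model name used in the example are all hypothetical.

```python
# Hypothetical mapping from ISO 639-1 / ISO 639-2 codes to the language word
# used in model names; only the two example languages are covered.
LANGUAGE_CODES = {"cs": "czech", "ces": "czech", "cze": "czech",
                  "en": "english", "eng": "english"}


def resolve_model(requested, models, default_model):
    """Approximate the documented model-selection rules client-side.

    Assumes model names are hyphen-separated words ending in a YYMMDD date,
    so the "latest" candidate is the one with the largest final word.
    """
    if not requested:
        return default_model            # rule 1: no model option given
    if requested in models:
        return requested                # rule 2: exact model name
    requested = LANGUAGE_CODES.get(requested, requested)  # rule 4: language code
    prefix = requested.split("-")       # rule 3: first few words of a name
    candidates = [name for name in models
                  if name.split("-")[:len(prefix)] == prefix]
    if not candidates:
        raise ValueError(f"no model matches {requested!r}")
    return max(candidates, key=lambda name: name.split("-")[-1])


if __name__ == "__main__":
    available = ["czech-ud-1.2-160523", "english-ud-1.2-160523",
                 "english-ud-2.0-170801"]   # last name is hypothetical
    print(resolve_model("en", available, "czech-ud-1.2-160523"))
```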

Accessing API using Curl

The API can be conveniently used from the command line with curl. Several examples follow:

Passing Input on the Command Line (if a UTF-8 locale is used)

curl --data 'tokenizer=&tagger=&parser=&data=Děti pojedou k babičce. Už se těší.' https://lindat.mff.cuni.cz/services/udpipe/api/process

Using Files as Input (files must be in UTF-8 encoding)

curl -F data=@input_file.txt -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process

Specifying the Model

curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process

Converting JSON Result to Plain Text

curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
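The same pipeline can be written in Python alone, without curl. The sketch below reads a UTF-8 file, sends it to the process method and prints the result field. Unlike the `-F` examples, which upload multipart form data, it sends an `application/x-www-form-urlencoded` body, the same encoding as the `--data` example above; that both encodings are accepted is assumed from those two examples. The helper names `make_request` and `process_file` are hypothetical.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def make_request(text: str, model: str = "english") -> Request:
    """Build a form-encoded POST request that tokenizes, tags and parses text."""
    fields = {"data": text, "model": model,
              "tokenizer": "", "tagger": "", "parser": ""}
    body = urlencode(fields).encode("utf-8")
    return Request(API_URL, data=body, method="POST")


def process_file(path: str, model: str = "english") -> str:
    """Send a UTF-8 file to the service and return the "result" field."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    with urlopen(make_request(text, model)) as response:
        return json.load(response)["result"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python udpipe_client.py input_file.txt
    sys.stdout.write(process_file(sys.argv[1]))
```

Writing to `sys.stdout` mirrors the original one-liner; if the terminal is not UTF-8, setting `PYTHONIOENCODING=utf-8` still applies.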