The UDPipe web service is available at http(s)://lindat.mff.cuni.cz/services/udpipe/api/.
The web service is freely available. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.
The UDPipe REST API can be accessed directly or through any web programming tool that supports standard HTTP request methods and JSON for output handling.
Service Request | Description | HTTP Method |
---|---|---|
models | return list of models and supported methods | GET/POST |
process | process supplied data | GET/POST |
Return the list of models available in the UDPipe REST API, and for each model enumerate the components it supports (a model can contain a tokenizer, a tagger and a parser). The default model (used when the user supplies no model to a method call) is also returned – this is guaranteed to be the latest Czech model.
https://lindat.mff.cuni.cz/services/udpipe/api/models
{ "models": { "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"], "english-ud-1.2-160523": ["tokenizer", "tagger", "parser"] }, "default_model": "czech-ud-1.2-160523" }
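As a minimal sketch of how a client might consume this response, the following Python snippet parses the JSON and lists the models that offer every requested component. The function name `models_with_components` is hypothetical and not part of UDPipe; the sample text is the example response shown above.

```python
import json


def models_with_components(response_text, required=("tokenizer", "tagger", "parser")):
    """Return (default_model, sorted names of models offering all required components).

    Hypothetical helper: it only relies on the "models" and "default_model"
    keys shown in the example response.
    """
    payload = json.loads(response_text)
    matching = sorted(
        name
        for name, components in payload["models"].items()
        if set(required) <= set(components)
    )
    return payload["default_model"], matching


# Sample response copied from the example above.
sample = (
    '{ "models": { "czech-ud-1.2-160523": ["tokenizer", "tagger", "parser"], '
    '"english-ud-1.2-160523": ["tokenizer", "tagger", "parser"] }, '
    '"default_model": "czech-ud-1.2-160523" }'
)

default_model, full_pipeline = models_with_components(sample)
print(default_model)
print(full_pipeline)
```

In a real client, `response_text` would be the body returned by the models request rather than a hard-coded string.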
Process given data as described in the User's Manual.
Parameter | Mandatory | Data type | Description |
---|---|---|---|
data | yes | string | Input text in UTF-8. |
model | no | string | Model to use; see model selection for model matching rules. |
tokenizer | no | string | If the option is present, the input is assumed to be in plain text and is tokenized. If the parameter has a value, it is passed to the tokenizer as tokenizer options. |
input | no | string (conllu / generic_tokenizer / horizontal / vertical ) | If the tokenizer is not used, the input is assumed to be in the specified input format (optionally followed by options); default conllu . |
tagger | no | string | If the option is present, the input is POS tagged and lemmatized. If the parameter has a value, it is passed to the tagger. |
parser | no | string | If the option is present, the input is dependency parsed. If the parameter has a value, it is passed to the parser. |
output | no | string (conllu / horizontal / matxin / plaintext / vertical ) | The output format (optionally followed by options) to use; default conllu . |
The response is in JSON format of the following structure:
{ "model": "Model used", "acknowledgements": ["URL with acknowledgements", ...], "result": "processed_output" }
The processed_output is the output of UDPipe in the requested output format.
https://lindat.mff.cuni.cz/services/udpipe/api/process?tokenizer&tagger&parser&data=Děti pojedou k babičce. Už se těší.
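The request URL has to be percent-encoded before it is sent, since the text contains spaces and non-ASCII characters. The sketch below shows one way to assemble such a URL from the parameters in the table above and to pull the `result` field out of the JSON response. The helper names `build_process_url` and `extract_result` are hypothetical. The sketch also assumes that an empty value (`tokenizer=`) is treated the same as a bare `tokenizer` flag, which mirrors the form used by the curl examples in this document.

```python
import json
from urllib.parse import urlencode

API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_process_url(data, model=None, tokenizer=True, tagger=True, parser=True, output=None):
    """Build a GET URL for the process method (hypothetical helper).

    For tokenizer/tagger/parser: True sends the parameter with an empty value,
    a non-empty string sends that string as component options, and a falsy
    value omits the parameter.
    """
    params = [("data", data)]
    if model is not None:
        params.append(("model", model))
    for name, value in (("tokenizer", tokenizer), ("tagger", tagger), ("parser", parser)):
        if value is True:
            params.append((name, ""))
        elif value:
            params.append((name, value))
    if output is not None:
        params.append(("output", output))
    # urlencode percent-encodes values as UTF-8 and turns spaces into "+".
    return API_URL + "?" + urlencode(params)


def extract_result(response_text):
    """Return the processed output from a JSON response of the process method."""
    return json.loads(response_text)["result"]


print(build_process_url("Děti pojedou k babičce. Už se těší."))
```

Long inputs are better sent with POST, since servers and proxies commonly limit URL length.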
There are several ways to select the required model using the model option:
- If the model option is not specified, the default model (returned by the models method) is used – this is guaranteed to be the latest Czech model.
- The model option can specify one of the models returned by the models method.
- The model option may consist of only the first several words of a model name. In this case, the latest most suitable model is used.
- The model option can be an ISO 639-1 or ISO 639-2 code of a language. If available, the newest model for the requested language is used.

Note that the last two possibilities allow using czech, cs, ces, cze, english, en or eng as models.
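The matching rules above can be approximated on the client side, for example to predict which model a given value would select. The following is only an illustrative sketch and not the server's actual logic. It assumes that "words" in a model name are separated by hyphens and that the trailing date stamp identifies the latest model. The `ISO_CODES` table is a hypothetical subset covering only the codes named above, and the second Czech model name in the usage example is made up for illustration.

```python
# Hypothetical subset of ISO 639-1/639-2 codes, limited to the codes named in the text.
ISO_CODES = {
    "cs": "czech", "ces": "czech", "cze": "czech",
    "en": "english", "eng": "english",
}


def resolve_model(requested, available, default_model):
    """Approximate the model selection rules (illustrative, not the server's code)."""
    # Rule 1: no model given, so the default model is used.
    if not requested:
        return default_model
    # Rule 2: an exact model name.
    if requested in available:
        return requested
    # Rules 3 and 4: leading words of a model name, or a language code
    # translated to a language name first.
    prefix = ISO_CODES.get(requested, requested).split("-")
    candidates = [
        name for name in available
        if name.split("-")[: len(prefix)] == prefix
    ]
    if not candidates:
        return None
    # Assumption: the trailing stamp (e.g. 160523) orders models by release date.
    return max(candidates, key=lambda name: name.rsplit("-", 1)[-1])


available = [
    "czech-ud-1.2-160523",
    "english-ud-1.2-160523",
    "czech-ud-2.0-170801",  # hypothetical newer model, added for illustration only
]
print(resolve_model("cs", available, "czech-ud-2.0-170801"))
print(resolve_model("english", available, "czech-ud-2.0-170801"))
```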
The API can be accessed using curl. Several examples follow:
curl --data 'tokenizer=&tagger=&parser=&data=Děti pojedou k babičce. Už se těší.' https://lindat.mff.cuni.cz/services/udpipe/api/process
curl -F data=@input_file.txt -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process
curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process
curl -F data=@input_file.txt -F model=english -F tokenizer= -F tagger= -F parser= https://lindat.mff.cuni.cz/services/udpipe/api/process | PYTHONIOENCODING=utf-8 python -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
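For callers who prefer to stay in Python, the first curl example can be approximated with the standard library alone. This is a sketch under assumptions rather than an official client: the function names `make_request` and `process` are hypothetical, and the request is sent as a URL-encoded form, like `curl --data`, rather than as the multipart form produced by `curl -F`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def make_request(text, model=None):
    """Build a POST request that tokenizes, tags and parses the given text."""
    fields = {"data": text, "tokenizer": "", "tagger": "", "parser": ""}
    if model:
        fields["model"] = model
    # urlencode percent-encodes the UTF-8 text, so the body is pure ASCII.
    body = urlencode(fields).encode("ascii")
    return Request(API_URL, data=body, method="POST")


def process(text, model=None, timeout=30):
    """Send the request and return the "result" field of the JSON response."""
    with urlopen(make_request(text, model), timeout=timeout) as response:
        return json.load(response)["result"]
```

Calling `process("Děti pojedou k babičce. Už se těší.")` would then return the output in the default conllu format, provided the service is reachable.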