Korektor

The Korektor web service is available at http(s)://lindat.mff.cuni.cz/services/korektor/api/.

The web service is freely available for testing. Respect the CC BY-NC-SA licence of the models – explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.

API Reference

The Korektor REST API can be accessed directly or via any web programming tool that supports standard HTTP request methods and can handle JSON output.

Service Request | Description | HTTP Method
models | return list of models | GET/POST
correct | correct given text according to chosen model | GET/POST
suggestions | generate spelling suggestions of the given text according to chosen model | GET/POST

Method models

Return the list of available models. The default model (used when the user supplies no model to a method call) is also returned; it is guaranteed to be the latest Czech spellchecking model.

Browser Example

http://lindat.mff.cuni.cz/services/korektor/api/models

JSON Response

The response object contains two fields: models (an array of existing model names) and default_model (the model used when no model is specified).

Example JSON Response

{
 "models": [
  "czech-spellchecker-130202",
  "czech-diacritics_generator-130202",
  "strip_diacritics-130202"
 ],
 "default_model": "czech-spellchecker-130202"
}
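
As a minimal sketch of how a client might consume this response in Python, the snippet below parses the example JSON shown above and checks that the default model is one of the listed models. The helper name parse_models_response is hypothetical, and the JSON is embedded as a string rather than fetched from the service.

```python
import json

# Example response copied from the documentation above (not fetched live).
EXAMPLE_RESPONSE = """
{
 "models": [
  "czech-spellchecker-130202",
  "czech-diacritics_generator-130202",
  "strip_diacritics-130202"
 ],
 "default_model": "czech-spellchecker-130202"
}
"""


def parse_models_response(text):
    """Return (models, default_model) from a `models` response body.

    Hypothetical helper: raises ValueError if the default model is not
    among the listed models.
    """
    payload = json.loads(text)
    models = payload["models"]
    default_model = payload["default_model"]
    if default_model not in models:
        raise ValueError("default_model is not in the models list")
    return models, default_model


models, default_model = parse_models_response(EXAMPLE_RESPONSE)
print(default_model)
```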

Method correct

Auto-correct the given text according to the chosen model and return the corrected text as a string. The response format is described in Common Response Format.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see Model Selection for model matching rules.
input | no | string (untokenized / untokenized_lines / segmented / vertical / horizontal) | Input format to use; default is untokenized.

Browser Examples

http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.
http://lindat.mff.cuni.cz/services/korektor/api/correct?data=Příliš žluťoučký kůň úpěl ďábelské ódy .&input=horizontal&model=strip_diacritics
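
Browsers percent-encode such URLs automatically; a program has to do it itself. The following is a sketch only, assuming Python's standard urllib.parse module; the helper name build_correct_url is hypothetical, while the parameter names data, model and input come from the table above.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the documentation above.
API_BASE = "http://lindat.mff.cuni.cz/services/korektor/api"


def build_correct_url(data, model=None, input_format=None):
    """Build a GET URL for the `correct` method (hypothetical helper)."""
    params = {"data": data}
    if model is not None:
        params["model"] = model
    if input_format is not None:
        params["input"] = input_format
    # quote_via=quote percent-encodes values as UTF-8 and writes spaces as %20.
    return API_BASE + "/correct?" + urlencode(params, quote_via=quote)


url = build_correct_url(
    "Příliš žluťoučký kůň úpěl ďábelské ódy .",
    model="strip_diacritics",
    input_format="horizontal",
)
print(url)
```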

Method suggestions

Generate spelling suggestions for the given text. For every located error, a list of suggestions is returned, ordered from the most probable to the least probable. The user can limit the number of suggestions returned. The response format is described in Common Response Format.

Parameter | Mandatory | Data type | Description
data | yes | string | Input text in UTF-8.
model | no | string | Model to use; see Model Selection for model matching rules.
input | no | string (untokenized / untokenized_lines / segmented / vertical / horizontal) | Input format to use; default is untokenized.
suggestions | no | positive integer | The maximum number of suggestions to return for a single token. If unspecified, value 5 is used.

Browser Examples

http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.
http://lindat.mff.cuni.cz/services/korektor/api/suggestions?data=Prilis zlutoucky kun upel dabelske ody.&model=czech-diacritics_generator&suggestions=3

Result Object

The result field in the response is an array of suggestions. Each suggestion is an array of strings whose first element is the original piece of text; the remaining elements (which may be absent) are the suggestions, ordered from the most probable to the least probable. Concatenating the first elements of all suggestions yields the original text.

Example JSON Response

{
 "model": "czech-spellchecker-130202",
 "acknowledgements": [
  "http://ufal.mff.cuni.cz/korektor#korektor_acknowledgements",
  "https://ufal.mff.cuni.cz/korektor/users-manual#korektor-czech_acknowledgements"
 ],
 "result": [["Přílyš","Příliš"],[" "],["žluťoučky","žluťoučký","žluťoučké"],[" kůň "],["ůpěl","úpěl","pěl"],[" ďábelské "],["ódi","ódy","zdi"],["."]]
}
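
To illustrate how the result array can be processed, the sketch below rebuilds the original text from the first elements and, separately, applies the most probable suggestion to every flagged piece. Both function names are hypothetical, and the data is the result array from the example response above, not a live reply.

```python
# The "result" array from the example response above (not fetched live).
EXAMPLE_RESULT = [
    ["Přílyš", "Příliš"], [" "], ["žluťoučky", "žluťoučký", "žluťoučké"],
    [" kůň "], ["ůpěl", "úpěl", "pěl"], [" ďábelské "],
    ["ódi", "ódy", "zdi"], ["."],
]


def original_text(result):
    """Concatenate the first element of every suggestion entry."""
    return "".join(entry[0] for entry in result)


def apply_top_suggestions(result):
    """Replace each flagged piece with its most probable suggestion.

    Entries without suggestions (length 1) are kept unchanged.
    """
    return "".join(entry[1] if len(entry) > 1 else entry[0] for entry in result)


print(original_text(EXAMPLE_RESULT))
print(apply_top_suggestions(EXAMPLE_RESULT))
```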

Common Response Format

The response format of all methods is JSON. Except for the models method, the output JSON has the following structure (result_object is usually a string or an array):

{
 "model": "Model used",
 "acknowledgements": ["URL with acknowledgements", ...],
 "result": result_object
}
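
One possible way to represent this structure in client code is a small typed container. The sketch below is only an illustration: the KorektorResponse class and parse_response function are hypothetical names, and the sample body is abbreviated from the suggestions example in this document.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class KorektorResponse:
    """Hypothetical container mirroring the common response structure."""
    model: str
    acknowledgements: List[str]
    result: Union[str, list]  # string for correct, array for suggestions


def parse_response(text):
    """Parse a JSON response body of `correct` or `suggestions`."""
    payload = json.loads(text)
    return KorektorResponse(
        model=payload["model"],
        acknowledgements=list(payload["acknowledgements"]),
        result=payload["result"],
    )


# Sample body abbreviated from the suggestions example (not fetched live).
SAMPLE_BODY = """
{
 "model": "czech-spellchecker-130202",
 "acknowledgements": ["http://ufal.mff.cuni.cz/korektor#korektor_acknowledgements"],
 "result": [["ódi", "ódy", "zdi"], ["."]]
}
"""

parsed = parse_response(SAMPLE_BODY)
print(parsed.model, len(parsed.acknowledgements))
```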

Model Selection

There are several ways to select the required model using the model option:


Using Curl to Access the API

The API described above can be conveniently accessed with curl. Several examples follow:

Passing Input on Command Line (if a UTF-8 locale is in use)

curl --data-urlencode 'data=Přílyš žluťoučky kůň ůpěl ďábelské ódi.' http://lindat.mff.cuni.cz/services/korektor/api/correct

Using Files as Input (files must be in UTF-8 encoding)

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/suggestions

Specifying Additional Parameters

curl -F 'data=@input_file' -F 'model=czech-diacritics_generator' -F 'suggestions=3' http://lindat.mff.cuni.cz/services/korektor/api/suggestions

Converting JSON Result to Plain Text

curl -F 'data=@input_file' http://lindat.mff.cuni.cz/services/korektor/api/correct | PYTHONIOENCODING=utf-8 python3 -c "import sys,json; sys.stdout.write(json.load(sys.stdin)['result'])"
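
The same calls can be made from Python without curl. The following is a sketch under stated assumptions, using only the standard library: the helper names build_request, extract_result and correct are hypothetical, and the request body is form-encoded in the manner of curl --data-urlencode rather than the multipart upload produced by curl -F. Only the request construction is exercised here; calling correct would contact the service.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL taken from the documentation above.
API_BASE = "http://lindat.mff.cuni.cz/services/korektor/api"


def build_request(method, data, **options):
    """Build a form-encoded POST request for `correct` or `suggestions`."""
    fields = {"data": data, **options}
    body = urlencode(fields).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return Request(API_BASE + "/" + method, data=body)


def extract_result(response_body):
    """Return the "result" field of a JSON response body (bytes or str)."""
    return json.loads(response_body)["result"]


def correct(text, **options):
    """Send `text` to the `correct` method and return the corrected string.

    Performs a network call; error handling is omitted in this sketch.
    """
    with urlopen(build_request("correct", text, **options)) as response:
        return extract_result(response.read())


request = build_request(
    "suggestions",
    "Prilis zlutoucky kun upel dabelske ody.",
    model="czech-diacritics_generator",
    suggestions=3,
)
print(request.get_method(), request.full_url)
```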