iula_preprocess
- Identifier
- http://hdl.handle.net/11372/LRT-1413
- Date issued
- 2014-07-30
- Type
- toolService
- Description
- Text preprocessing service. The input must be plain text (a .txt file) encoded in UTF-8. The service performs three tasks: (i) it segments the text into smaller structural units (titles, paragraphs, sentences, etc.); (ii) it detects entities not found in dictionaries (numbers, abbreviations, URLs, emails, proper nouns, etc.); and (iii) it keeps sequences of two or more words together as a single block (dates, phrases, proper nouns, etc.).
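
The three tasks in the description can be illustrated with a minimal Python sketch. This is not the service's own code: the function names (`split_paragraphs`, `split_sentences`, `tokenize`) and all regular expressions are hypothetical, chosen only to show segmentation, detection of out-of-dictionary entities, and the grouping of a multiword date into a single block. The rules the service actually applies are not documented in this record.

```python
import re

# All patterns below are illustrative assumptions, not the service's rules.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Order matters: multiword dates are tried first so they stay in one block,
# then URLs and emails, then plain numbers, words, and punctuation.
TOKEN_RE = re.compile(
    "|".join(
        [
            rf"(?P<date>\d{{1,2}} (?:{MONTHS}) \d{{4}})",
            r"(?P<url>https?://\S*[^\s.,;:!?])",
            r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)",
            r"(?P<number>\d+(?:[.,]\d+)*)",
            r"(?P<word>\w+)",
            r"(?P<punct>[^\w\s])",
        ]
    )
)


def split_paragraphs(text):
    """(i) Segment text into paragraphs at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """(i) Naive sentence split: end punctuation, whitespace, capital letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph)


def tokenize(sentence):
    """(ii) and (iii): return (token, kind) pairs, keeping dates as one block."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(sentence)]


if __name__ == "__main__":
    sample = (
        "Sample title\n\n"
        "Contact us at info@example.org. The workshop is on 30 July 2014."
    )
    for paragraph in split_paragraphs(sample):
        for sentence in split_sentences(paragraph):
            print(tokenize(sentence))
```

The single alternation with named groups keeps the sketch short: the first alternative that matches at a position wins, which is why the date pattern precedes the number pattern. A real preprocessor would also need abbreviation lists and language-specific rules, which this sketch leaves out.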