Feature-based tagger

Please use the following text to cite this item:
Hajič, Jan, 2009, Feature-based tagger, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0001-4904-2.
Date issued
2009-11-02T09:22:59Z
Description
The Feature-based (exponential model) Tagger is a fast implementation of the Czech tagger developed at UFAL and described in the PDT 1.0 documentation (Czech Language Tagging page). For best results, the tagger requires its input to be preprocessed by a Czech morphological module with very high coverage. This module covers a superset of the Czech "FM" morphology. Both the morphological module and the tagger are supplied as binary executables, together with all necessary precompiled Czech data. Input must be encoded in ISO Latin 2 (ISO-8859-2) and conform to csts.dtd; output is produced in the same encoding and format. As with many of the tools provided with PDT 1.0, both executables also accept, and then produce, a "simplified SGML". This is not real, valid SGML; it simply contains at least the tags for words, punctuation, and sentence breaks, one item per line.
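Since the executables expect ISO Latin 2 input, text stored in another encoding has to be converted first. The following is a minimal sketch of such a conversion in Python, not part of the distributed package: the helper names `to_latin2` and `convert_file` are hypothetical, and UTF-8 is only an assumed source encoding. It handles the character encoding only and does not produce csts.dtd markup.

```python
def to_latin2(text: str) -> bytes:
    """Encode text as ISO-8859-2, failing loudly on unsupported characters."""
    try:
        return text.encode("iso-8859-2")
    except UnicodeEncodeError as err:
        # Report the offending character instead of silently replacing it,
        # so that the tagger never sees corrupted input.
        bad = text[err.start:err.end]
        raise ValueError(f"not representable in ISO-8859-2: {bad!r}") from err


def convert_file(src_path: str, dst_path: str, src_encoding: str = "utf-8") -> None:
    """Re-encode a whole file to ISO-8859-2 (source encoding is an assumption)."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=src_encoding, newline="") as src:
        data = to_latin2(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

Raising an error on characters outside ISO-8859-2 is a deliberate choice in this sketch; whether to drop or substitute such characters instead depends on the corpus. The tagger's output can be read back with `bytes.decode("iso-8859-2")`.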
This item is Academic Use
and licensed under:
 Files in this item
Name
CZ010619x.tgz
Size
2.04 MB
Format
application/x-gzip
Description
PDT 1.0 version - Linux
MD5
0c0b1bafc4080ea9edc15873dd7791db
Name
CZ130120ax.tgz
Size
2.43 MB
Format
application/x-gzip
Description
Newer version, Linux
MD5
94f0873a87129e64be41c14356cd0948
Name
CZ010619xs.tgz
Size
2.06 MB
Format
application/x-gzip
Description
PDT 1.0 version - Solaris
MD5
c2b38cc100a6981ea08bfd830c90e319
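The MD5 values listed above can be used to check that a download arrived intact. The following is a minimal sketch, assuming the archives were saved under their listed names; the helper names `md5_of_file` and `verify_download` are hypothetical. MD5 is adequate for detecting a corrupted transfer, but it is not a safeguard against deliberate tampering.

```python
import hashlib
import os

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "CZ010619x.tgz": "0c0b1bafc4080ea9edc15873dd7791db",
    "CZ130120ax.tgz": "94f0873a87129e64be41c14356cd0948",
    "CZ010619xs.tgz": "c2b38cc100a6981ea08bfd830c90e319",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: dict = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 with the listed value, looked up by file name."""
    name = os.path.basename(path)
    if name not in expected:
        raise KeyError(f"no checksum listed for {name!r}")
    return md5_of_file(path) == expected[name]
```

For example, `verify_download("CZ130120ax.tgz")` returns `True` only if the local file matches the checksum published for the newer Linux archive.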