dc.description |
Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks. NLTK includes the following software modules (~120k lines of Python code):
Corpus readers: interfaces to many corpora
Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
Chunkers: regexp, n-gram, named-entity
Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, ...
Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
Clusterers: expectation maximization, agglomerative, k-means
Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: unification, chatbots, many utilities
NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
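As a minimal sketch of how two of the module families listed above are used (tokenizers and stemmers), assuming only that the NLTK package itself is installed; no downloaded corpus data is required for these particular classes:

```python
from nltk.tokenize import RegexpTokenizer
from nltk.stem import PorterStemmer, LancasterStemmer

text = "The strippers stripped and the runners were running."

# Regexp tokenizer: keep runs of word characters, dropping punctuation.
tokenizer = RegexpTokenizer(r"\w+")
tokens = tokenizer.tokenize(text)

# Porter and Lancaster stemmers, two of the stemmers listed above.
porter = PorterStemmer()
lancaster = LancasterStemmer()
porter_stems = [porter.stem(t) for t in tokens]
lancaster_stems = [lancaster.stem(t) for t in tokens]

print(tokens)
print(porter_stems)
print(lancaster_stems)
```

The Lancaster stemmer is more aggressive than the Porter stemmer, so the two stem lists will generally differ on the same input.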