The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650–1800. The complete GerManC corpus will contain 2,000-word samples from nine genres.
Web-based information system for the research community (news, events, people, job market, mailing list, database of research projects and corpora, bibliography, glossary and links) and on recording equipment and software; disciplinary scope: conversation analysis, discourse analysis and research on spoken language.
Glossa is a web-based system for corpus search and results management. It has built-in support for CLARIN Federated Content Search and for corpora encoded with the IMS Corpus Workbench. Its plugin architecture also allows other search engines to be used once a wrapper has been written for them. Glossa can be downloaded free of charge and installed on the user's own server. It currently supports only monolingual written corpora; support for multilingual corpora, and for spoken corpora with audio, video and maps, is under development.
70K words; non-validated sentence segmentation; non-validated POS tagging; manual annotation of syntactic dependencies and dependency labels; manual annotation of semantic roles; manual annotation of events based on a shallow, domain-specific ontology (only for a 31K-word subset of GDT).
Collection of orthographically transcribed audio recordings of speech, mainly from East Anglia and the South-West of England, with a smaller collection from Lancashire. The recordings were made in the 1970s and 1980s by Finnish postgraduate students.