This reference corpus of written Slovenian is a precursor to the Gigafida corpora (see http://hdl.handle.net/11356/1320 for version 2.0).
It contains 600 million words and 738.5 million tokens. The corpus is lemmatised and annotated with morphosyntactic descriptors (MSD tags).