SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine

Please use the following text to cite this item:
Marek Hába, 2024, SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5969.
Date issued
2024
Size
51,200,000 tokens, 43,336,661 words, 2,600,000 sentences
Description
The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains opinion articles and comments. The corpus was tagged with TreeTagger and prepared for the NoSketch Engine corpus manager. The 7z archive contains the prepared registry file ("sfu_opinion_and_comments"), subcdef files, scripts, and the vertical file, which is itself compressed in 7z format. To complete the setup, configure the paths in the registry to match your installation and compile the corpus.
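The path configuration step can be sketched as follows. This is a minimal illustration, not the procedure shipped with the archive: it assumes the registry uses the usual Manatee top-level attributes PATH (compiled data directory) and VERTICAL (source vertical file), and the sample registry text and target directories are hypothetical. Other path-valued attributes (for example SUBCDEF) could be passed in the same dictionary if the registry uses them. Compilation itself is typically done afterwards with Manatee's compilecorp tool.

```python
import re


def set_registry_attrs(registry_text, new_values):
    """Return registry_text with selected top-level 'NAME value' lines replaced.

    Only lines that start at column 0 with an upper-case attribute name are
    considered, so indented attributes nested inside blocks are left alone.
    """
    out = []
    for line in registry_text.splitlines():
        match = re.match(r"^([A-Z]+)\s+", line)
        if match and match.group(1) in new_values:
            name = match.group(1)
            line = f'{name} "{new_values[name]}"'
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical registry fragment; the real file in the archive will differ.
sample = (
    'NAME "SOCC"\n'
    'PATH "/old/data/"\n'
    'VERTICAL "/old/socc.vert"\n'
    'ENCODING "UTF-8"\n'
)

# Hypothetical target locations; adjust to the local installation.
updated = set_registry_attrs(
    sample,
    {
        "PATH": "/corpora/data/sfu_opinion_and_comments/",
        "VERTICAL": "/corpora/vert/socc.vert",
    },
)
print(updated)
```

In practice the function would be applied to the contents of the extracted registry file and the result written back before compiling.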
 Files in this item
Name
sfu_opinion_and_comments.7z
Size
64.37 MB
Format
application/octet-stream
MD5
52937e750eaca4dbd454c5f585b090e4
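
The MD5 value above can be used to check that the archive downloaded intact. A minimal sketch using Python's standard hashlib module follows; the helper names are illustrative, and the local file path in the usage comment is an assumption.

```python
import hashlib

# MD5 checksum listed for sfu_opinion_and_comments.7z in this item.
EXPECTED_MD5 = "52937e750eaca4dbd454c5f585b090e4"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (assumes the archive sits in the current directory):
# print(verify_download("sfu_opinion_and_comments.7z"))
```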