SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine
Please use the following text to cite this item:
Marek Hába, 2024,
SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5969.
Authors
Marek Hába
Item identifier
http://hdl.handle.net/11234/1-5969
Project URL
Date issued
2024
Size
51200000 tokens,
43336661 words,
2600000 sentences
Language(s)
Description
The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains opinionated articles and comments. It was tagged using TreeTagger and prepared for the NoSketch Engine corpus manager.
The 7z archive contains the prepared registry file ("sfu_opinion_and_comments"), subcdef files, scripts, and the vertical file, which is itself compressed in 7z format. To complete the setup, configure the paths in the registry and compile the corpus.
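The path configuration step can be sketched in Python. This is a minimal illustration only: it assumes the bundled registry file uses the usual Manatee top-level keys `PATH` (directory for the compiled corpus data) and `VERTICAL` (location of the vertical file), and the function name and all directory names below are hypothetical, not taken from the archive.

```python
import re


def set_registry_paths(registry_text: str, data_dir: str, vertical_file: str) -> str:
    """Return registry_text with the top-level PATH and VERTICAL values replaced.

    Assumption: PATH and VERTICAL appear at the start of a line, each followed
    by a value, as in a typical Manatee registry file.
    """
    replacements = {"PATH": data_dir, "VERTICAL": vertical_file}

    def rewrite(match: re.Match) -> str:
        key = match.group(1)
        return f'{key} "{replacements[key]}"'

    # Only unindented lines are matched, so nested attribute settings are untouched.
    return re.sub(r"^(PATH|VERTICAL)[ \t]+.*$", rewrite, registry_text, flags=re.MULTILINE)


if __name__ == "__main__":
    # Hypothetical registry fragment and target locations, for illustration only.
    sample = 'NAME "SOCC"\nPATH "/old/data"\nVERTICAL "/old/socc.vert"\n'
    print(set_registry_paths(sample, "/corpora/socc/data", "/corpora/socc/socc.vert"))
```

Once the registry points at the right locations, the corpus is typically compiled with the `compilecorp` tool that ships with NoSketch Engine; the scripts bundled in the archive should be checked for the exact invocation.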
Publisher
Acknowledgement
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: sfu_opinion_and_comments.7z
- Size: 64.37 MB
- Format: application/octet-stream
- Description:
- MD5: 52937e750eaca4dbd454c5f585b090e4
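Before extracting the archive, the download can be checked against the MD5 value listed above. The following is a small sketch using Python's standard `hashlib` module; the function names are illustrative, and the file is assumed to be in the current working directory under the name shown in the file listing.

```python
import hashlib

# MD5 checksum published for sfu_opinion_and_comments.7z on this page.
EXPECTED_MD5 = "52937e750eaca4dbd454c5f585b090e4"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Example call (assumes the archive has been downloaded to this location):
# matches_expected("sfu_opinion_and_comments.7z")
```

A result of `False` suggests an incomplete or corrupted download, in which case the archive should be fetched again before setup.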


