Nottinghamer Korpus Deutscher YouTube-Sprache (The NottDeuYTSch Corpus)

Please use the following text to cite this item:
Cotgrove, Louis Alexander, 2018, Nottinghamer Korpus Deutscher YouTube-Sprache (The NottDeuYTSch Corpus), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-4779.
Date issued
2018
Size
33760494 tokens, 32549462 words
Description
The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic. It represents an authentic snapshot of the language of young German speakers. For representativeness and balance, the corpus was proportionally sampled by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region (Germany, Austria, and Switzerland). Each comment carries a considerable amount of associated metadata, which enables further longitudinal and cross-sectional analyses.

Version History

Version  Date  Summary
2018-01-01 00:00:00
1*
2018-01-01 00:00:00
* Selected version
 Files in this item
Name
NottDeuYTSch_Corpus.rda
Size
280.29 MB
Format
application/octet-stream
Description
Unknown
MD5
e66260b11688917660e5ca511de4d066
Name
ndy296.i5.zip
Size
423.81 MB
Format
application/zip
Description
Zip
MD5
d96a1a7f5a95b866dbc2bbbc7164900d
Preview
    • ndy296.i5.xml (11 GB)
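
Because both files are large, it may be worth checking a download against the MD5 values listed above before working with it. The following is a minimal sketch in Python, assuming the two files have been saved under their listed names in the current directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# MD5 values as listed for the two files in this item.
EXPECTED_MD5 = {
    "NottDeuYTSch_Corpus.rda": "e66260b11688917660e5ca511de4d066",
    "ndy296.i5.zip": "d96a1a7f5a95b866dbc2bbbc7164900d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read chunk by chunk so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.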