The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic; it represents an authentic language snapshot of young German speakers. The corpus was proportionally sampled by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance, and it contains a considerable amount of associated metadata for each comment, which enables further longitudinal and cross-sectional analyses.
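Proportional sampling by video category and year can be illustrated with a minimal sketch. This is not the corpus authors' actual procedure. The function name `proportional_sample` and the record fields `category` and `year` are assumptions introduced here for illustration only.

```python
import random
from collections import defaultdict


def proportional_sample(comments, total, seed=0):
    """Draw roughly `total` comments so that each (category, year) stratum
    keeps about the same share it has in the full collection.

    `comments` is a list of dicts with hypothetical keys "category" and "year".
    """
    rng = random.Random(seed)

    # Group the comments into strata keyed by (category, year).
    strata = defaultdict(list)
    for comment in comments:
        strata[(comment["category"], comment["year"])].append(comment)

    n = len(comments)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Quota proportional to the stratum's share of the collection,
        # capped at the number of comments actually available.
        quota = min(len(members), round(total * len(members) / n))
        sample.extend(rng.sample(members, quota))
    return sample
```

Because each quota is rounded independently, the returned sample can differ from `total` by a few items; a production pipeline would likely redistribute the remainder.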
The aim of this work is to analyze the suitability of existing internet multimedia storage services for use as a covert (steganographic) transmission channel. After a general overview, we focus specifically on the YouTube service. In particular, we study the feasibility of a recently proposed steganographic technique \cite{wseas} that hides information directly in the structure of the mp4-encoded video file. Our statistical analysis of a set of 1000 video files stored by this service shows the practical limitations of this type of information hiding.