
DigiDiaDem Speech-Cognitive Dataset (DSCD-CZ-2)

Please use the following text to cite this item:
Šmídl, Luboš; et al., 2025, DigiDiaDem Speech-Cognitive Dataset (DSCD-CZ-2), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-6043.
Date issued
2025-11-21
Size
371 entries
Language(s)
Description
This updated and expanded version of the dataset was created to investigate the speech and cognitive performance of people with varying degrees of cognitive impairment, primarily dementia. It contains the results of standardized neuropsychological tests (RBANS, ALBA, POBAV, MASTCZ), speech tasks focused on comprehension, memory, naming, and repetition, and demographic data (age, gender, education). Participants were divided into four groups based on clinical assessment: healthy individuals, healthy individuals with possible mild cognitive impairment, patients with mild cognitive impairment, and patients with dementia. All recordings and examinations were carried out as part of routine clinical practice at the Memory Clinic, a neurological outpatient clinic of the Department of Neurology at the Faculty Hospital Královské Vinohrady.

The dataset comprises 371 examinations, split into a training part and a test part. The split was stratified by clinical group, age, gender, and level of education so that these key characteristics are distributed evenly across both parts. Compared with the previous version, this release adds manually engineered features and scores.

The aim of the dataset is to support the development of methods for automated detection of cognitive disorders based on speech analysis and cognitive performance. The data are suitable for research in clinical neuropsychology, computational linguistics, and machine learning, and are intended for non-commercial research purposes.
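To illustrate what a split stratified by clinical group, age, gender, and education can look like, here is a minimal Python sketch. It is not the procedure used to produce the published train and test parts: the field names (`group`, `age`, `gender`, `education`), the decade-wide age bins, the 20% test fraction, and the random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, key_fn, test_fraction=0.2, seed=0):
    """Split records into (train, test), sampling each stratum separately."""
    rng = random.Random(seed)

    # Group records that share the same stratification key.
    strata = defaultdict(list)
    for rec in records:
        strata[key_fn(rec)].append(rec)

    train, test = [], []
    for key in sorted(strata):  # fixed order keeps the split reproducible
        members = strata[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def stratum_key(rec):
    # Hypothetical field names; age is binned into decades (an assumption).
    return (rec["group"], rec["age"] // 10, rec["gender"], rec["education"])
```

Very small strata may contribute no test items under this rounding rule, so in practice coarser bins or fewer stratification variables may be needed.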
Acknowledgement
 Files in this item
Name
DigiDiademSpeechCognitiveDataset.zip
Size
1.01 MB
Format
application/zip
Description
MD5
5008ebcc471d889bdc7c0540454d221d
Preview
  File Preview
    • transcriptions_zipformer_lm-extra06_20251031.json (1 MB)
    • transcriptions_annotation_20251031.json (366 kB)
    • test_20251031.json (2 kB)
    • metadata_20251031.json (1 MB)
    • transcriptions_zipformer_20251031.json (1 MB)
    • expert_features_zipformer_lm-extra06.json (1 MB)
    • expert_scores.json (71 kB)
    • ddd.yaml (468 B)
    • sessions_20251031.json (45 kB)
    • train_20251031.json (9 kB)
    • recordings_20251031.json (2 MB)
Name
DigiDiaDemSpeechCognitiveDataset.md
Size
28.25 KB
Format
application/octet-stream
Description
MD5
bd1da4f3e92cda5de94c94e22b14aeae
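The MD5 values listed for the two files can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are illustrative, and the sketch assumes the files were saved locally under the names shown on this page.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on this page.
EXPECTED_MD5 = {
    "DigiDiademSpeechCognitiveDataset.zip": "5008ebcc471d889bdc7c0540454d221d",
    "DigiDiaDemSpeechCognitiveDataset.md": "bd1da4f3e92cda5de94c94e22b14aeae",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's MD5 matches the value listed for its name."""
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

For example, `verify("DigiDiademSpeechCognitiveDataset.zip")` returns `True` when the local copy matches the listed checksum and `False` otherwise.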