DigiDiaDem Speech-Cognitive Dataset (DSCD-CZ-2)
Please use the following text to cite this item:
Šmídl, Luboš; et al., 2025,
DigiDiaDem Speech-Cognitive Dataset (DSCD-CZ-2), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-6043.
Authors
Šmídl, Luboš ; et al.
Item identifier
http://hdl.handle.net/11234/1-6043
Date issued
2025-11-21
Size
371 entries
Description
This updated and expanded version of the dataset was created to investigate speech and cognitive performance in people with varying degrees of cognitive impairment, primarily dementia. It contains the results of standardized neuropsychological tests (RBANS, ALBA, POBAV, MASTCZ), speech tasks focused on comprehension, memory, naming, and repetition, and demographic data (age, gender, education).
Participants were divided into four groups based on clinical assessment: healthy individuals, healthy individuals with possible mild cognitive impairment, patients with mild cognitive impairment, and patients with dementia. All recordings and examinations were conducted as part of routine clinical practice at the Memory Clinic, a neurological outpatient clinic at the Department of Neurology at the Faculty Hospital Královské Vinohrady. The dataset of 371 examinations was divided into training and test parts, stratified by clinical group, age, gender, and level of education so that these key characteristics are evenly distributed across both parts.
Compared with the previous version of the dataset, this release adds manually engineered features and scores.
The aim of the dataset is to support the development of methods for automated detection of cognitive disorders based on speech analysis and cognitive performance. The data are suitable for research in the areas of clinical neuropsychology, computational linguistics, and machine learning. The dataset is intended for non-commercial research purposes.
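The stratified division into training and test parts described above can be illustrated with a minimal sketch. It is not the procedure the authors used: the field names (`group`, `age`, `gender`, `education`), the 10-year age bands, and the test fraction are all assumptions chosen for illustration, and the published split is the one stored in the train and test files of the archive.

```python
import random
from collections import defaultdict


def age_band(age, width=10):
    """Bucket a numeric age into bands of `width` years (an assumed binning)."""
    return int(age) // width * width


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records into (train, test), stratifying on a composite key.

    The keys "group", "age", "gender" and "education" are hypothetical
    field names, not the dataset's documented schema.
    """
    strata = defaultdict(list)
    for rec in records:
        key = (rec["group"], age_band(rec["age"]), rec["gender"], rec["education"])
        strata[key].append(rec)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for key in sorted(strata):  # fixed stratum order, independent of input order
        members = strata[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

With four stratification variables and a few hundred examinations, many strata will be very small; in this sketch a stratum with one or two members contributes nothing to the test part, so a real split would likely need coarser bins or a different allocation rule.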
Acknowledgement
Technology Agency of the Czech Republic
Project code: TQ01000332
Project name: Telemedicine self-examination of speech and memory for rapid detection of cognitive impairments using machine learning methods
This item is publicly available.
Files in this item
- Name: DigiDiademSpeechCognitiveDataset.zip
- Size: 1.01 MB
- Format: application/zip
- MD5: 5008ebcc471d889bdc7c0540454d221d

- transcriptions_zipformer_lm-extra06_20251031.json (1 MB)
- transcriptions_annotation_20251031.json (366 kB)
- test_20251031.json (2 kB)
- metadata_20251031.json (1 MB)
- transcriptions_zipformer_20251031.json (1 MB)
- expert_features_zipformer_lm-extra06.json (1 MB)
- expert_scores.json (71 kB)
- ddd.yaml (468 B)
- sessions_20251031.json (45 kB)
- train_20251031.json (9 kB)
- recordings_20251031.json (2 MB)
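A first look at the files listed above can be taken without unpacking the archive. The sketch below makes no assumption about the internal schema of the JSON files; it only reports the top-level type and length of each one. The function name `summarize_archive` is hypothetical, and the files are assumed to be UTF-8 encoded JSON.

```python
import json
import zipfile


def summarize_archive(zip_path):
    """Return {member_name: (top_level_type, length)} for each JSON member."""
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".json"):
                continue  # non-JSON members such as ddd.yaml are skipped
            with archive.open(name) as handle:
                data = json.load(handle)
            # Length is only meaningful for containers
            size = len(data) if isinstance(data, (list, dict)) else None
            summary[name] = (type(data).__name__, size)
    return summary
```

For example, `summarize_archive("DigiDiademSpeechCognitiveDataset.zip")` would return one entry per JSON file; the accompanying DigiDiaDemSpeechCognitiveDataset.md file is the place to look for the actual field definitions.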
- Name: DigiDiaDemSpeechCognitiveDataset.md
- Size: 28.25 KB
- Format: application/octet-stream
- MD5: bd1da4f3e92cda5de94c94e22b14aeae
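The MD5 values listed for the two files can be used to check that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `verify_download` are hypothetical, and the files are assumed to have been saved under their listed names.

```python
import hashlib

# Checksums as listed for the two files in this item
EXPECTED_MD5 = {
    "DigiDiademSpeechCognitiveDataset.zip": "5008ebcc471d889bdc7c0540454d221d",
    "DigiDiaDemSpeechCognitiveDataset.md": "bd1da4f3e92cda5de94c94e22b14aeae",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected_hex.lower()
```

A call such as `verify_download("DigiDiademSpeechCognitiveDataset.zip", EXPECTED_MD5["DigiDiademSpeechCognitiveDataset.zip"])` returns `False` if the file was truncated or altered in transit. MD5 is adequate for detecting accidental corruption, not for security guarantees.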


