Vystadial 2013 – Czech data

Please use the following text to cite this item:
Korvas, Matěj; Plátek, Ondřej; Dušek, Ondřej; Žilka, Lukáš and Jurčíček, Filip, 2014, Vystadial 2013 – Czech data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0023-4670-6.
Date issued
2014-02-21
Size
18 hours
Language(s)
Czech
Description
Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of English speech and over 15 hours of Czech speech, together with orthographic transcriptions. The scripts implement data pre-processing and acoustic model building with the HTK and Kaldi toolkits. This is the Czech data part of the dataset.
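Since the recordings come with orthographic transcriptions, a common first step is pairing each audio file with its text. The sketch below does this directly from the archive using Python's standard `tarfile` module. It assumes, without verifying it here, that each transcription sits next to its recording as `<name>.wav.trn` and is UTF-8 encoded, so check the documentation inside the archive before relying on it. `pair_recordings` and `TRN_SUFFIX` are hypothetical names, not part of the dataset or its scripts.

```python
import tarfile

# ASSUMPTION: each recording "<name>.wav" has its transcription in a sibling
# file "<name>.wav.trn"; verify against the archive's own documentation.
TRN_SUFFIX = ".trn"


def pair_recordings(archive_path, trn_suffix=TRN_SUFFIX):
    """Map each .wav member of a .tgz archive to its transcription text.

    Recordings without a matching transcription file are skipped.
    Transcriptions are ASSUMED to be UTF-8 encoded.
    """
    pairs = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        # Index regular files by their path inside the archive.
        members = {m.name: m for m in tar.getmembers() if m.isfile()}
        for name in members:
            if not name.endswith(".wav"):
                continue
            trn = members.get(name + trn_suffix)
            if trn is None:
                continue
            text = tar.extractfile(trn).read().decode("utf-8").strip()
            pairs[name] = text
    return pairs
```

Reading straight from the compressed archive avoids unpacking all 1.47 GB first, at the cost of slower random access; for repeated use, extracting once is usually faster.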
Acknowledgement
This item is Publicly Available
and licensed under:
Files in this item
Name
data_voip_cs.tgz
Size
1.47 GB
Format
application/x-gzip
Description
Vystadial 2013 Czech data, tgz archive
MD5
514b38e657bdd52309e80f22a773d6cc
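The listing gives an MD5 checksum for the archive. A minimal sketch for checking a downloaded copy against it, using only Python's standard `hashlib` module, follows; `md5_of_file` and `verify_archive` are illustrative helper names, not part of the dataset or its scripts.

```python
import hashlib

# Published MD5 checksum of data_voip_cs.tgz, copied from the file listing.
EXPECTED_MD5 = "514b38e657bdd52309e80f22a773d6cc"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum
    (comparison is case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("data_voip_cs.tgz")` should return `True` for an intact download. On most Unix systems, `md5sum data_voip_cs.tgz` gives the same digest without any code.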