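dc.contributor.author Macháček, Dominik
dc.contributor.author Kratochvíl, Jonáš
dc.contributor.author Vojtěchová, Tereza
dc.contributor.author Bojar, Ondřej
dc.date.accessioned 2019-07-15T14:53:51Z
dc.date.available 2019-07-15T14:53:51Z
dc.date.issued 2019-07-13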
dc.description We present a test corpus of audio recordings and transcriptions of presentations of student enterprises, together with the presentations' slides and web pages. The corpus is intended for evaluating automatic speech recognition (ASR) systems, especially in settings where prior access to in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 seconds long, accompanied by slides and web pages in Czech, Slovak, English, German, Romanian, Italian, or Spanish. The speakers are high school students from European countries who speak English as a second language. We benchmark three baseline ASR systems on the corpus and show that their performance leaves considerable room for improvement.
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/H2020/825460
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.subject ASR
dc.subject ASR evaluation
dc.subject speech corpus
dc.subject non-native English
dc.subject speech recognition
dc.subject speech recognition evaluation
dc.subject speech and relevant texts
dc.subject European non-native English
dc.title A Speech Test Set of Practice Business Presentations with Additional Relevant Texts
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
dc.rights.label PUB
has.files yes
contact.person Macháček Dominik Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor European Union H2020-ICT-2018-2-825460 ELITR - European Live Translator euFunds info:eu-repo/grantAgreement/EC/H2020/825460
sponsor Czech Science Foundation 19-26934X Neural Representations in Multi-modal and Multi-lingual Modelling nationalFunds
size.info 59 minutes; 39 entries
files.size 929830594
files.count 1

Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
zipped corpus (886.76 MB)