The corpus contains speech data from two native Czech speakers, one male and one female. The speech is very precisely articulated, in places hyper-articulated, and the speech rate is low. Speech data with such emphasized articulation is suitable for teaching Czech to foreigners, and it can also be used for people with hearing or speech impairments. The recorded sentences can be used either directly, e.g., as part of educational material, or as source data for building complex educational systems that incorporate speech synthesis technology. All recorded sentences were precisely orthographically annotated and phonetically segmented, i.e., split into phones, using modern neural network-based methods.
Each national language is described by its own grammatical rules. However, rule-based knowledge representations alone are not sufficient to produce a natural flow of speech.
This paper seeks to optimise the naturalness of speech, i.e., to find the optimal choice of phonetic and phonological parameters for prosody modelling. We try to identify the relevant features (speech parameters) that have the principal influence on the fundamental frequency and duration of speech units. If the prosody of the synthesizer is controlled by an artificial neural network (ANN), the ANN topology must be optimised as well.
The ANN topology also depends on the number of input neurons, which represent the most important speech parameters. Pruning the ANN, using one of several approaches (the GUHA method, sensitivities of the synaptic weights, etc.), is a suitable tool for reducing the ANN structure.
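
To make the idea of pruning input neurons concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a simple magnitude-based proxy for sensitivity: the saliency of an input neuron is the sum of the absolute values of its outgoing synaptic weights, and inputs whose saliency falls below a fraction of the largest saliency are removed. The function names, the `keep_ratio` threshold, the feature names and the weight values are all hypothetical and chosen only for illustration; the GUHA method is not shown.

```python
from typing import List, Sequence, Tuple


def input_saliency(weights: Sequence[Sequence[float]]) -> List[float]:
    """Return one saliency score per input neuron.

    weights[i][j] is the weight from input neuron i to hidden neuron j.
    Assumption: saliency is approximated by the sum of absolute weights.
    """
    return [sum(abs(w) for w in row) for row in weights]


def prune_inputs(
    weights: Sequence[Sequence[float]],
    names: Sequence[str],
    keep_ratio: float = 0.1,
) -> Tuple[List[str], List[List[float]]]:
    """Drop input neurons whose saliency is below keep_ratio * max saliency."""
    saliency = input_saliency(weights)
    if not saliency:
        return [], []
    threshold = keep_ratio * max(saliency)
    kept = [i for i, s in enumerate(saliency) if s >= threshold]
    return [names[i] for i in kept], [list(weights[i]) for i in kept]


# Hypothetical speech parameters and input-to-hidden weights (illustrative only).
names = ["phone_identity", "position_in_word", "stress", "noise_feature"]
weights = [
    [0.8, -1.2, 0.5],
    [0.3, 0.4, -0.6],
    [-0.9, 0.7, 1.1],
    [0.01, -0.02, 0.03],
]

kept_names, pruned_weights = prune_inputs(weights, names, keep_ratio=0.1)
print(kept_names)
```

In this toy setting the near-zero weights of the last input place it below the threshold, so it is dropped and the input layer shrinks from four neurons to three. In practice the reduced network would normally be retrained and re-evaluated after each pruning step, since weight magnitude alone is only a rough indicator of a parameter's relevance.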