This study is supported by a joint project of the Department of Circuit Theory at FEE CTU in Prague and the Department of Paediatric Neurology at the 2nd Faculty of Medicine of Charles University in Prague. One research interest in paediatric neurology is the area of electroclinical syndromes combined with speech disorders. One aim of our project is to find a connection between the children's neurological disorder called developmental dysphasia [2], [3] and the assessment of the degree of speech perception and impairment. From the point of view of language characterisation, it is very difficult to separate relevant from irrelevant information in speech and to relate it to the target of the search. That is why part of the project is addressed with artificial neural networks (ANNs) that make use of phonetic knowledge.
First, the analysis of vowels was investigated using ANNs. The initial hypothesis is that developmental dysphasia can shift the formant frequencies in the spectral characteristics relative to the formant frequencies of healthy children.
A comparative voice analysis of healthy children is needed to evaluate the degree of these shifts. Our team created speech databases of healthy and ill children based on a common comparative corpus. The healthy children's speech was recorded at kindergartens and in the first stage of elementary school. The ill children's speech was recorded in hospital. The children were 4 to 10 years old. The comparative corpus, which includes isolated vowels, monosyllables and polysyllables, was compiled by neurological specialists with regard to medical therapy. The same corpus was used for the comparative analysis of healthy children. Our aim is vowel recognition and visualisation by Supervised Self-Organizing Maps (Supervised SOMs), a type of ANN based on the Kohonen map that offers better cluster separation, see [1]. Better cluster separation is useful for visual analysis, which makes the results easy for an ordinary user to read. The Recognition Rate (RR) also depends on knowledge of how children's voices regularly develop with age and gender. Our main objective is not the highest RR but to observe its trend. We assume that wrongly mapped vowels could be one of the indicators of developmental dysphasia.
The application of the Supervised SOM should demonstrate the ability not only to discriminate between healthy and ill children, but also to describe the trend of the neurological disorder, with the help of recordings repeated at three-month intervals during medical therapy.
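As an illustration of the Supervised SOM idea described above, the following minimal Python sketch trains a small map in which each input vector is extended with a one-hot class code during training, while recognition uses the feature part only. This is one common formulation of a supervised Kohonen map and is an assumption here, not the implementation used in the project; the function names, grid size, learning schedule and the synthetic two-dimensional data are all hypothetical.

```python
import numpy as np


def train_supervised_som(features, labels, n_classes, grid=(6, 6),
                         epochs=30, label_weight=1.0, seed=0):
    """Train a supervised SOM; return the codebook (feature part + label part)."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Supervision: append a scaled one-hot class code to every input vector.
    one_hot = np.eye(n_classes)[labels] * label_weight
    data = np.hstack([features, one_hot])

    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    # Initialise the codebook with randomly drawn training vectors.
    weights = data[rng.integers(0, len(data), size=rows * cols)].copy()

    total_steps = epochs * len(data)
    sigma_start = max(rows, cols) / 2.0
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            progress = step / total_steps
            learning_rate = 0.5 * (1.0 - progress) + 0.01
            sigma = sigma_start * (1.0 - progress) + 0.5
            x = data[idx]
            # Best-matching unit (BMU) on the augmented vector.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood on the map grid around the BMU.
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += learning_rate * influence[:, None] * (x - weights)
            step += 1
    return weights


def recognise(weights, features):
    """Map vectors to BMUs using features only; read classes from the label part."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    n_features = features.shape[1]
    diff = features[:, None, :] - weights[None, :, :n_features]
    bmu = (diff ** 2).sum(axis=2).argmin(axis=1)
    return weights[bmu, n_features:].argmax(axis=1), bmu


def recognition_rate(predicted, labels):
    """Fraction of vectors whose predicted class equals the true class."""
    return float(np.mean(np.asarray(predicted) == np.asarray(labels)))


# Synthetic stand-in for vowel features (NOT real formant data).
rng = np.random.default_rng(1)
centres = np.array([[0.2, 0.8], [0.8, 0.2], [0.5, 0.5]])
labels = np.repeat(np.arange(3), 40)
features = centres[labels] + 0.05 * rng.standard_normal((120, 2))

som = train_supervised_som(features, labels, n_classes=3)
predicted, units = recognise(som, features)
print("recognition rate:", recognition_rate(predicted, labels))
```

On real data the feature vectors would come from the vowel recordings, and the RR could be computed separately for each recording session to observe its trend over the course of therapy.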
Our Laboratory of Artificial Neural Network Applications (LANNA) at the Czech Technical University in Prague (head of the laboratory: Professor Jana Tučková) collaborates on a project with the Department of Paediatric Neurology, 2nd Faculty of Medicine of Charles University in Prague, and with the Motol University Hospital (head of clinic: Professor Vladimír Komárek). The project focuses on the study of children with specific language impairment (SLI).
The speech database contains two subgroups of recordings of children’s speech from different types of speakers. The first subgroup (healthy) consists of recordings of children without speech disorders; the second subgroup (patients) consists of recordings of children with SLI. The patients have different degrees of severity (1 – mild, 2 – moderate, 3 – severe); this classification was made by the speech therapists and specialists from Motol Hospital. The children’s speech was recorded in the period 2003–2013. The recordings were usually made in a schoolroom or a speech therapist’s consulting room, with background noise present. This setting reflects the natural environment in which the children live and is important for capturing their normal behavior.
The database of healthy children’s speech was created as a reference database for the computer processing of children’s speech. It was recorded on a SONY digital dictaphone (sampling frequency fs = 16 kHz, 16-bit resolution, stereo, standard WAV format) and on a SONY MZ-N710 MiniDisc (MD) recorder (sampling frequency fs = 44.1 kHz, 16-bit resolution, stereo, standard WAV format). The corpus was recorded in the natural environment of a schoolroom and in a clinic. This subgroup contains a total of 44 native Czech participants (15 boys, 29 girls) aged 4 to 12 years and was recorded during the period 2003–2005.
The database of children with SLI was recorded in a private speech therapist’s office. The speech was captured with a SHURE lapel microphone and an AVID recording setup (MBox USB AD/DA converter and ProTools LE software) on an Apple laptop (iBook G4). The recordings are saved in the standard WAV format at a sampling frequency of 44.1 kHz with 16-bit resolution in mono. This subgroup contains a total of 54 native Czech participants (35 boys, 19 girls) aged 6 to 12 years and was recorded during the period 2009–2013.
This package contains WAV data sets for developing and testing methods for detecting children with SLI.
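Because the subgroups were recorded with different sampling frequencies and channel layouts, a processing pipeline would probably need to check the format of each file before further analysis. The sketch below uses only the Python standard library `wave` module; it is a hypothetical helper, not part of the package, and the function names are illustrative assumptions.

```python
import wave
from collections import Counter


def wav_format(path):
    """Return (sampling rate in Hz, channel count, bits per sample) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), 8 * wav.getsampwidth()


def summarise_formats(paths):
    """Count how many files share each (rate, channels, bits) combination."""
    return Counter(wav_format(path) for path in paths)
```

Calling `summarise_formats` on a list of file paths gives a quick overview of which recordings would need downmixing or resampling before they can be compared.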
Software package:
FORANA - original software developed for formant analysis. It is based on the MATLAB programming environment. Its development was driven mainly by the need to perform formant analysis correctly and to fully automate the extraction of formants from recorded speech signals. Development of the application is ongoing. The software was developed at LANNA, CTU FEE in Prague.
LABELING - a program used for segmentation of the speech signal. It is part of the SOMLab program system. The software was developed at LANNA, CTU FEE in Prague.
PRAAT - acoustic analysis software. The Praat program was created by Paul Boersma and David Weenink of the Institute of Phonetic Sciences of the University of Amsterdam. Home page: http://www.praat.org or http://www.fon.hum.uva.nl/praat/.
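The kind of formant extraction that tools such as FORANA automate can be illustrated with a minimal Python sketch. The source does not state which algorithm FORANA uses; the example below assumes linear predictive coding (LPC) by the autocorrelation method, a common textbook approach, and all function names, thresholds and the synthetic test signal are hypothetical.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., a_order]."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    zero_lag = len(windowed) - 1
    r = full[zero_lag:zero_lag + order + 1]          # lags 0..order
    # Normal equations R a = -r[1:], with R the Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def estimate_formants(frame, fs, order=10, min_hz=90.0, max_bandwidth_hz=400.0):
    """Return candidate formant frequencies (Hz) from the LPC pole angles."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]                # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    # Keep only narrow-band poles above a minimum frequency (assumed thresholds).
    keep = (freqs > min_hz) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])


# Synthetic check signal: two tones standing in for two resonances.
fs = 8000
n = np.arange(1024)
signal = np.sin(2 * np.pi * 1000 * n / fs) + np.sin(2 * np.pi * 3000 * n / fs)
print(estimate_formants(signal, fs, order=4))
```

For real vowel segments the frame would be cut from a labelled recording, and the LPC order and thresholds would have to be tuned for children's voices.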