Combining classifiers into so-called Multiple Classifier Systems (MCSs) has gained a lot of interest in recent years. Researchers have developed a large variety of methods to exploit the strengths of individual classifiers. In this paper, we address the problem of how to implement a multi-class classifier as an ensemble of one-class classifiers. To improve a compound classifier, different individual classifiers (which may differ, e.g., in complexity, type, or training algorithm) can be combined, which may increase both its performance and its robustness. A one-class classifier can recognize only one of the classes, so it is quite difficult to build MCSs on the basis of one-class classifiers. We therefore introduce a new scheme for decision-making in MCSs through a fuzzy inference system. Specifically, we address two important open problems in this context: model selection and combiner training. The classifiers' outputs, treated as supports for the given classes, are combined by means of a fuzzy engine. We are thus interested in individual classifiers that can return a support for the given classes; there are no other restrictions on the classifiers used. The proposed model has been evaluated in computer experiments on several benchmark datasets in the Matlab environment. The results show that a fuzzy combination of binary classifiers may be a valuable classifier in its own right. Additionally, we indicate some application areas of the models and new research frontiers to be examined.
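To make the idea of combining class supports more concrete, the following is a minimal Python sketch. It is not the rule base or the fuzzy engine of the paper, which the abstract does not specify. It assumes that each one-class classifier returns a support in [0, 1] for its own class, and it applies one hypothetical fuzzy rule ("the object belongs to class c if the classifier for c supports it and no other classifier does"), with AND taken as the minimum and NOT as the complement. The names `combine_supports` and `decide` are illustrative only.

```python
from typing import Dict


def combine_supports(supports: Dict[str, float]) -> Dict[str, float]:
    """Fuse per-class supports returned by one-class classifiers.

    Hypothetical rule (an assumption, not taken from the paper):
    membership(c) = min(support(c), 1 - strongest rival support),
    i.e. fuzzy AND as min and fuzzy NOT as the complement.
    """
    for label, s in supports.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"support for {label!r} must lie in [0, 1]")

    combined = {}
    for label, s in supports.items():
        rivals = [v for k, v in supports.items() if k != label]
        strongest_rival = max(rivals, default=0.0)
        combined[label] = min(s, 1.0 - strongest_rival)
    return combined


def decide(supports: Dict[str, float]) -> str:
    """Return the class with the highest fused membership."""
    combined = combine_supports(supports)
    return max(combined, key=combined.get)


# Invented supports from three one-class classifiers for one object.
example = {"A": 0.8, "B": 0.3, "C": 0.1}
fused = combine_supports(example)
winner = decide(example)
```

Under this rule a class is penalized when a rival classifier also claims the object, which is one simple way to resolve the overlap between independently trained one-class models. A trained combiner, as the abstract proposes, would replace this fixed rule.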
LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German as used in informal speech (private environment, spontaneity, unpreparedness, etc.) by Czech-German bilingual speakers who were born in Czechoslovakia around 1955 and left for Germany after the age of 12. The corpus is composed of interviews on language biographies, conducted in 2018–2020 with 20 speakers and narrated in Czech and in German respectively. Ten interviews were recorded with late (German) repatriates and ten with Czech migrants. The corpus includes transcripts of ca. 14 hours of Czech recordings and ca. 13.5 hours of German recordings. It contains 217,650 orthographic words (i.e. a total of 286,533 tokens including punctuation). The metadata of LANGUAGES IN MIGRATION include basic sociolinguistically relevant speaker categories (gender, year of birth and of migration, level of education, and region of childhood and of present residence).
The transcription of LANGUAGES IN MIGRATION is linked to the corresponding audio track. The transcription was carried out on an orthographic tier and supplemented by an additional metalanguage tier. The corpus is lemmatized and morphologically tagged in different formats for Czech and for German (the latter using the Stuttgart-Tübingen-Tagset). Deviations from the norm of the spoken Czech and German of the homeland, which are understood as the result of language contact and language isolation, are tagged on a further tier in both the Czech and the German subcorpora of LANGUAGES IN MIGRATION. The (anonymized) corpus is provided in the form of transcripts in EAF format, which can be viewed in the freely available ELAN program, and in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC (http://www.korpus.cz).
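As a rough illustration of how the vertical format can be read, the sketch below parses a small invented sample in Python. It assumes the usual vertical layout of one token per line with tab-separated attributes and XML-like structure tags on lines of their own. The column order (word, lemma, tag), the `<doc>` and `<s>` structures, and the tag values are assumptions for this example and are not taken from the corpus documentation.

```python
from typing import Iterator, List, Tuple

# Invented sample; columns are assumed to be word, lemma, tag.
SAMPLE = "\n".join([
    '<doc id="demo">',
    "<s>",
    "To\tten\tPRON",
    "je\tbýt\tVERB",
    ".\t.\tPUNCT",
    "</s>",
    "</doc>",
])

Token = Tuple[str, ...]


def read_vertical(text: str) -> Iterator[List[Token]]:
    """Yield sentences as lists of attribute tuples.

    Lines that start with '<' and end with '>' are treated as
    structure tags (a heuristic); all other non-empty lines are
    tokens whose attributes are separated by tabs.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("<") and line.endswith(">"):
            # A closing </s> ends the current sentence.
            if line.startswith("</s") and sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(tuple(line.split("\t")))
    if sentence:  # tokens not enclosed in a closed <s> structure
        yield sentence


sentences = list(read_vertical(SAMPLE))
token_count = sum(len(s) for s in sentences)
```

The same loop could be pointed at a file read with `open(path, encoding="utf-8")`. The actual attribute columns and structure names should be checked against the files distributed with the corpus.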