Search Results
2. In memoriam Stanislav Segert
- Creator:
- Oliverius, Jaroslav and Zemánek, Petr
- Format:
- unmediated and volume
- Type:
- model:article and TEXT
- Language:
- English
- Rights:
- http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
3. Prague Arabic Dependency Treebank 1.0
- Creator:
- Hajič, Jan, Smrž, Otakar, Zemánek, Petr, Pajas, Petr, Šnaidauf, Jan, Beška, Emanuel, Kracmar, Jakub, and Hassanová, Kamila
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- corpus and Arabic
- Language:
- Arabic
- Description:
- The PADT project can be summarized as an open-ended activity of the Center for Computational Linguistics, the Institute of Formal and Applied Linguistics, and the Institute of Comparative Linguistics, Charles University in Prague, consisting of the multi-level annotation of Arabic language resources in the light of the theory of Functional Generative Description (Sgall et al., 1986; Hajičová and Sgall, 2003).
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
4. UFAL Parallel Corpus of North Levantine 1.0
- Creator:
- Sellat, Hashem, Saleh, Shadi, Krubiński, Mateusz, Pospíšil, Adam, Zemánek, Petr, and Pecina, Pavel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- multilingual, machine translation, parallel corpus, north levantine, and corpus
- Language:
- North Levantine Arabic, English, French, Spanish, Standard Arabic, Modern Greek (1453-), and German
- Description:
- This is the first release of the UFAL Parallel Corpus of North Levantine, compiled by the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University within the Welcome project (https://welcome-h2020.eu/). The corpus consists of 120,600 multiparallel sentences in English, French, German, Greek, Spanish, and Standard Arabic, selected from the OpenSubtitles2018 corpus [1] and manually translated into North Levantine Arabic. The corpus was created to train machine translation systems for North Levantine Arabic and the other languages.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
5. UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1
- Creator:
- Zemánek, Petr, Pospíšil, Adam, Sellat, Hashem, Krubiński, Mateusz, and Pecina, Pavel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- audio and corpus
- Subject:
- speech corpus, speech recognition, speech-to-text translation, machine translation, multilingual, Arabic, Arabic Corpus, and north levantine
- Language:
- North Levantine Arabic and English
- Description:
- The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg. Altogether, there were 13 speakers (9 male and 4 female; by age group: 1 aged 15-20, 7 aged 20-30, 4 aged 30-40, and 1 aged 40-50). The recordings contain both monologues and dialogues on topics of everyday life (health, education, family life, sports, culture) as well as information on both the host countries (living abroad) and the country of origin (Syrian traditions, education system, etc.). Both types are spontaneous: the participants were given only a general subject and talked about it or discussed it freely. The transcription and translation team consisted of students of Arabic at Charles University, with an additional quality check provided by native speakers of the dialect. The textual data is split between (parallel) transcriptions (.apc) and translations (.eng), with one segment per line. An additional .yaml file provides the mapping to the corresponding audio file (with the duration and offset in the "%S.%03d" format, i.e., seconds and milliseconds) and a unique speaker ID. The audio data is shared in 48 kHz .wav format, with dialogues and monologues in separate folders. All recordings are mono (single channel). For dialogues, there is a separate file for each speaker, e.g., "Tar_13052022_Czechia-01.wav" and "Tar_13052022_Czechia-02.wav". The data provided in this repository corresponds to the validation split of the dialectal Arabic to English shared task hosted at the 21st edition of the International Conference on Spoken Language Translation, i.e., IWSLT 2024.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
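
The offset and duration values described above are given in seconds and milliseconds, and the audio is sampled at 48 kHz, so a segment can be located in a .wav file by converting both values to sample indices. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and it does not assume any particular field names in the .yaml file, which the description does not specify.

```python
from decimal import Decimal

# Sampling rate stated in the corpus description (48 kHz, mono).
SAMPLE_RATE_HZ = 48_000


def seconds_to_samples(value: str, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Convert a "%S.%03d" string (seconds.milliseconds) to a sample count.

    Decimal is used instead of float so that millisecond values are
    converted exactly, without binary rounding error.
    """
    return int(Decimal(value) * sample_rate)


def segment_bounds(offset: str, duration: str) -> tuple[int, int]:
    """Return (start, end) sample indices for one segment.

    `offset` and `duration` are hypothetical argument names for the two
    timing values the description says the .yaml file provides.
    """
    start = seconds_to_samples(offset)
    end = seconds_to_samples(str(Decimal(offset) + Decimal(duration)))
    return start, end
```

For example, `segment_bounds("12.345", "2.500")` gives the half-open sample range to slice from the audio array once the file has been loaded with a reader of your choice.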
6. UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 2
- Creator:
- Zemánek, Petr, Pospíšil, Adam, Sellat, Hashem, Krubiński, Mateusz, and Pecina, Pavel
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- audio and corpus
- Subject:
- speech corpus, speech recognition, speech-to-text translation, machine translation, multilingual, Arabic, Arabic Corpus, and north levantine
- Language:
- North Levantine Arabic and English
- Description:
- The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg. Altogether, there were 13 speakers (9 male and 4 female; by age group: 1 aged 15-20, 7 aged 20-30, 4 aged 30-40, and 1 aged 40-50). The recordings contain both monologues and dialogues on topics of everyday life (health, education, family life, sports, culture) as well as information on both the host countries (living abroad) and the country of origin (Syrian traditions, education system, etc.). Both types are spontaneous: the participants were given only a general subject and talked about it or discussed it freely. The transcription and translation team consisted of students of Arabic at Charles University, with an additional quality check provided by native speakers of the dialect. The textual data is split between (parallel) transcriptions (.apc) and translations (.eng), with one segment per line. An additional .yaml file provides the mapping to the corresponding audio file (with the duration and offset in the "%S.%03d" format, i.e., seconds and milliseconds) and a unique speaker ID. The audio data is shared in 48 kHz .wav format, with dialogues and monologues in separate folders. All recordings are mono (single channel). For dialogues, there is a separate file for each speaker, e.g., "16072022_Family-01.wav" and "16072022_Family-02.wav". The data provided in this repository corresponds to the test split of the dialectal Arabic to English shared task hosted at the 21st edition of the International Conference on Spoken Language Translation, i.e., IWSLT 2024.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
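
Because the transcriptions (.apc) and translations (.eng) are parallel with one segment per line, line N of one file should correspond to line N of the other. The sketch below pairs the two texts and fails loudly if the line counts differ. The function name is hypothetical, and the example assumes the file contents have already been read into strings (UTF-8 is an assumption, not something the record states).

```python
def pair_segments(apc_text: str, eng_text: str) -> list[tuple[str, str]]:
    """Pair transcription lines with translation lines by position.

    Raises ValueError when the two texts do not contain the same number
    of lines, since a mismatch would silently misalign every later pair.
    """
    apc_lines = apc_text.splitlines()
    eng_lines = eng_text.splitlines()
    if len(apc_lines) != len(eng_lines):
        raise ValueError(
            f"line count mismatch: {len(apc_lines)} transcription lines "
            f"vs {len(eng_lines)} translation lines"
        )
    return list(zip(apc_lines, eng_lines))


# Illustrative in-memory data, not taken from the corpus.
example_pairs = pair_segments("مرحبا\nكيفك\n", "hello\nhow are you\n")
```

Checking the counts before zipping is a deliberate choice: `zip` alone would truncate to the shorter input and hide a damaged or incomplete file.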