
InCroMin 1.0: Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings

Please use the following text to cite this item:
Marko Čechovič, Natália Komorníková, Dominik Macháček, Ondřej Bojar, 2025, InCroMin 1.0: Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5956.
Date issued: 2025-07-08
Size: 5 hours, 14 entries
Description
This data package contains the published parts of InCroMin, a corpus of cross-lingual dialogues with minutes and detection of misunderstandings. InCroMin is described in the paper **Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings** by Marko Čechovič, Natália Komorníková, Dominik Macháček, and Ondřej Bojar, to be published at TSD 2025.

The data were created by volunteer participants, 2 to 5 people per meeting. Participants were matched so that each meeting included at least two groups of people who did not understand each other's language. Their communication was facilitated by a simultaneous speech translation tool integrated in Minuteman. The meetings were held on a teleconferencing platform that recorded each speaker in a separate audio track, and the participants consented to the processing and release of their data.

Each speaker's speech was automatically transcribed in the original language and automatically translated into English. Human annotators then manually corrected the transcripts and translations, de-identified the audio and texts by removing confidential information such as person names, and wrote the meeting minutes.

The InCroMin corpus is intended primarily for evaluating automatic systems that facilitate cross-lingual dialogue end-to-end under realistic conditions. It can be used to evaluate automatic speech processing, speech translation, simultaneous speech translation, quality estimation, and automatic minuting.
This item is publicly available and licensed under:
Files in this item
Name: incromin-1.0.zip
Size: 177.13 MB
Format: application/zip
MD5: 604e9031015bde065f65b2de5e5efef3
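Before unpacking, the download can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard-library `hashlib`; the helper names `md5_of_file` and `verify` are illustrative, and the archive is assumed to have been saved locally as `incromin-1.0.zip`.

```python
import hashlib

# Checksum published for incromin-1.0.zip on this page.
EXPECTED_MD5 = "604e9031015bde065f65b2de5e5efef3"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected (case-insensitive) value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("incromin-1.0.zip")` should return `True` for an intact download. Reading in chunks keeps memory use flat for the 177 MB archive. MD5 detects accidental corruption but is not a safeguard against deliberate tampering.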
Archive contents:
  • meetings
    • ru_zh_1
      • A_ru-B_zh_corrected.txt (11 kB)
      • A_ru-B_zh.mp3 (5 MB)
      • minutes.txt (718 B)
    • fr_sk_1
      • B_sk.tt.txt (6 kB)
      • B_sk.mp3 (9 MB)
      • B_en_corrected.tt.txt (6 kB)
      • A_en_corrected.tt.txt (6 kB)
      • minutes.txt (716 B)
      • B_sk_audiodeident.tt.txt (28 B)
      • A_fr_corrected.tt.txt (7 kB)
      • A_fr.tt.txt (6 kB)
      • A_fr.mp3 (9 MB)
      • B_sk_corrected.tt.txt (6 kB)
    • cs_ru_3
      • A_ru.mp3 (5 MB)
      • A_ru.tt.txt (13 kB)
      • B_en_corrected.tt.txt (12 kB)
      • A_ru_corrected.tt.txt (13 kB)
      • A_en_corrected.tt.txt (8 kB)
      • minutes.txt (760 B)
      • B_en.tt.txt (12 kB)
      • B_cs.mp3 (5 MB)
      • B_cs_corrected.tt.txt (13 kB)
      • B_cs.tt.txt (13 kB)
    • cs_ru_2
      • B_cs.tt.txt (22 kB)
      • B_en_corrected.tt.txt (20 kB)
      • A_ru.mp3 (4 MB)
      • B_cs.mp3 (4 MB)
      • B_cs_corrected.tt.txt (21 kB)
      • minutes.txt (1 kB)
      • A_en.tt.txt (3 kB)
      • A_ru.tt.txt (5 kB)
    • cs_ru_1
      • A_ru.mp3 (7 MB)
      • A_ru.tt.txt (15 kB)
      • B_en_corrected.tt.txt (17 kB)
      • A_en_corrected.tt.txt (13 kB)
      • minutes.txt (5 kB)
      • B_cs_audiodeident.tt.txt (138 B)
      • B_cs.mp3 (7 MB)
      • A_ru_audiodeident.tt.txt (439 B)
      • B_cs_corrected.tt.txt (17 kB)
      • B_cs.tt.txt (14 kB)
    • cs_zh_1
      • A_zh.tt.txt (4 kB)
      • B_en_corrected.tt.txt (15 kB)
      • minutes.txt (2 kB)
      • B_en.tt.txt (15 kB)
      • B_cs_audiodeident.tt.txt (253 B)
      • B_cs.mp3 (5 MB)
      • B_cs_corrected.tt.txt (16 kB)
      • A_zh.mp3 (5 MB)
      • B_cs.tt.txt (16 kB)
      • A_en.tt.txt (4 kB)
      • A_zh_audiodeident.tt.txt (85 B)
    • cs_pt-BR_1
      • A_cs.tt.txt (10 kB)
      • B_en_corrected.tt.txt (10 kB)
      • B_pt-BR.mp3 (5 MB)
      • A_en_corrected.tt.txt (9 kB)
      • minutes.txt (4 kB)
      • B_pt-BR2pt-BR.tt.txt (12 kB)
      • B_pt-BR_corrected.tt.txt (12 kB)
      • A_en.tt.txt (10 kB)
      • A_cs.mp3 (5 MB)
      • A_cs_corrected.tt.txt (10 kB)
    • cs_it_1
      • B_en_corrected.tt.txt (9 kB)
      • A_en_corrected.tt.txt (7 kB)
      • A_it.mp3 (5 MB)
      • A_it.tt.txt (7 kB)
      • B_cs_audiodeident.tt.txt (163 B)
      • B_cs.mp3 (5 MB)
      • B_cs_corrected.tt.txt (9 kB)
      • A_it_corrected.tt.txt (7 kB)
      • B_cs.tt.txt (9 kB)
      • A_it_audiodeident.tt.txt (569 B)
    • cs_cs_es_pt_sk_1
      • C_cs.mp3 (3 MB)
      • E_cs_audiodeident.tt.txt (56 B)
      • A_pt.tt.txt (2 kB)
      • D_es.mp3 (3 MB)
      • A_pt.mp3 (3 MB)
      • A_pt_audiodeident.tt.txt (27 B)
      • C_en.tt.txt (1 kB)
      • B_sk.tt.txt (2 kB)
      • A_en.tt.txt (2 kB)
      • C_cs_corrected.tt.txt (1 kB)
      • D_es.tt.txt (1 kB)
      • D_en.tt.txt (1 kB)
      • E_cs.mp3 (3 MB)
      • E_cs_corrected.tt.txt (5 kB)
      • B_en.tt.txt (1 kB)
      • E_en.tt.txt (3 kB)
      • E_cs.tt.txt (3 kB)
      • C_cs_audiodeident.tt.txt (27 B)
      • C_cs.tt.txt (1 kB)
    • cs_cs_zh_1
      • A_cs.tt.txt (5 kB)
      • C_en_corrected.tt.txt (10 kB)
      • C_zh_audiodeident.tt.txt (774 B)
      • C_en.tt.txt (10 kB)
      • C_zh.tt.txt (9 kB)
      • C_zh.mp3 (5 MB)
      • B_en.tt.txt (3 kB)
      • A_cs_audiodeident.tt.txt (380 B)
      • B_cs.mp3 (5 MB)
      • B_cs_audiodeident.tt.txt (385 B)
      • B_cs_corrected.tt.txt (4 kB)
      • B_cs.tt.txt (4 kB)
      • A_en.tt.txt (4 kB)
      • A_cs.mp3 (5 MB)
      • A_cs_corrected.tt.txt (5 kB)
      • C_zh_corrected.tt.txt (8 kB)
    • cs_hy_1
      • A_hy.tt.txt (5 kB)
      • B_en_corrected.tt.txt (9 kB)
      • minutes.txt (1 kB)
      • B_en.tt.txt (12 kB)
      • B_cs.mp3 (8 MB)
      • B_cs_corrected.tt.txt (10 kB)
      • B_cs.tt.txt (10 kB)
      • A_en.tt.txt (4 kB)
      • A_hy.mp3 (8 MB)
    • uk_vi_1
      • B_vi_corrected.tt.txt (6 kB)
      • B_en_corrected.tt.txt (4 kB)
      • B_vi.tt.txt (6 kB)
      • B_vi.mp3 (4 MB)
      • A_en_corrected.tt.txt (6 kB)
      • A_uk.mp3 (4 MB)
      • minutes.txt (2 kB)
      • A_uk.tt.txt (8 kB)
    • cs_mr_1
      • B_mr.mp3 (11 MB)
      • B_en.tt.txt (5 kB)
      • A_en.orig.tt.txt (6 kB)
      • A_cs_audiodeident.tt.txt (101 B)
      • A_cs.mp3 (5 MB)
      • B_mr.tt.txt (8 kB)
      • A_cs.tt.txt (9 kB)
      • A_en.tt.txt (6 kB)
    • ru_zh_2
      • A_ru-B_zh_corrected.txt (12 kB)
      • A_ru-B_zh.mp3 (17 MB)
      • A_ru-B_zh.diarization_corrected.tt.txt (3 kB)
      • minutes.txt (2 kB)
    • README.md (6 kB)
    • metadata.ods (38 kB)
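Most transcript and translation files in the listing appear to follow a pattern: a speaker letter, a language tag (optionally with a region, as in `pt-BR`), an optional suffix such as `corrected` or `audiodeident`, and the extension `.tt.txt`. The Python sketch below parses that pattern. It is inferred from this file listing only, and the `README.md` inside the archive remains the authoritative description. The names `TRACK_RE`, `TrackFile`, and `parse_track_name` are hypothetical. Files that do not fit the assumed pattern, such as `minutes.txt`, the combined `A_ru-B_zh_corrected.txt`, or `A_en.orig.tt.txt`, return `None`.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the file listing, e.g.
# "A_ru.tt.txt", "B_en_corrected.tt.txt", "C_zh_audiodeident.tt.txt",
# "B_pt-BR_corrected.tt.txt".
TRACK_RE = re.compile(
    r"^(?P<speaker>[A-Z])_(?P<lang>[a-z]{2}(?:-[A-Z]{2})?)"
    r"(?:_(?P<variant>corrected|audiodeident))?\.tt\.txt$"
)


class TrackFile(NamedTuple):
    speaker: str             # speaker letter, e.g. "A"
    lang: str                # language tag, e.g. "cs" or "pt-BR"
    variant: Optional[str]   # "corrected", "audiodeident", or None if no suffix


def parse_track_name(name: str) -> Optional[TrackFile]:
    """Split a file name into its parts, or return None if it does not match."""
    m = TRACK_RE.match(name)
    if m is None:
        return None
    return TrackFile(m["speaker"], m["lang"], m["variant"])
```

For instance, applying `parse_track_name` to each `Path.name` in one meeting directory gives a quick way to group a speaker's files before pairing them with that speaker's `.mp3` track.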