
Migrant Stories

Please use the following text to cite this item:
Hájek, Martin; Mírovský, Jiří and Hladká, Barbora, 2022, Migrant Stories, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-4818.
Date issued
2022-10-22
Size
1017 entries
Description
Migrant Stories is a corpus of 1017 short biographic narratives of migrants, supplemented with meta information about the countries of origin and destination, the migrant's gender, GDP per capita of the respective countries, etc. The corpus was compiled as teaching material for data analysis.
 Files in this item
Name
Migrant_Stories.zip
Size
761.94 KB
Format
application/zip
Description
Migrant Stories distribution
MD5
d0b899ae9bc071f01f4c077b9fcb119f
Name
README.TXT
Size
2.95 KB
Format
text/plain
Description
Migrant Stories description
MD5
b1f9ac74f0b13c1f4296a21f6ab111b7
Preview
  File Preview
    ===============
    Migrant Stories
    ===============
    
    
    Authors
    =======
    
    Martin Hájek (martin.hajek@fsv.cuni.cz)
    Jiří Mírovský (mirovsky@ufal.mff.cuni.cz)
    Barbora Hladká (hladka@ufal.mff.cuni.cz)
    
    Introduction
    ============
    
    Migrant Stories is a corpus of 1017 short biographic narratives of migrants
    originally published on https://iamamigrant.org/stories/.
    For the original site, the narratives were adapted by the people or organizations
    that submitted each story and were then selected for publication by the
    International Organization for Migration (IOM, the UN organization providing
    help for migrants). The result is a very heterogeneous sample of migrants' stories
    that cannot be taken as a representative or unbiased sample of migrant experiences
    around the world.
    
    In the Migrant Stories corpus, the narratives have been supplemented with meta
    information about the countries of origin and destination, the migrant's gender,
    GDP per capita of the respective countries, etc.; see below for details.
    
    The Migrant Stories corpus was compiled for students in the course NPFL134 (Data
    Analytics for Students of Social Studies and Humanities) at the Institute of
    Formal and Applied Linguistics in the summer semester of 2022
    (https://ufal.mff.cuni.cz/courses/npfl134), as teaching material for data analysis.
    
    Data Format
    ===========
    
    The data are distributed in a single TSV (tab-separated values) file.
    Each story is represented by a single row containing the following fields (columns):
    
    - id_story (a numerical id from 1 to 1017)
    - name (the name of the migrant)
    - country_or (the country of origin)
    - country_de (the destination country)
    - conti_or (the continent, or part of a continent, of origin; A for Africa, E for Europe,
    I for Asia, LA for Latin America, M for Middle East, NA for North America, O for other)
    - conti_de (the destination continent, or part of a continent)
    - distance (classification of the distance from the origin to the destination
    into two classes: close, far)
    - country_or_gdp (GDP per capita of the original . . .
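
The fields described above could be read with Python's standard `csv` module. The following is a minimal sketch, not part of the distribution. It assumes the TSV file starts with a header row using the field names listed above, and that quote characters inside the narratives should be taken literally. The helper names (`read_stories`, `count_origin_continents`) and the two sample rows are hypothetical and only stand in for the real file.

```python
import csv
import io
from collections import Counter

# Continent codes as given in the README field descriptions.
CONTINENT_CODES = {
    "A": "Africa",
    "E": "Europe",
    "I": "Asia",
    "LA": "Latin America",
    "M": "Middle East",
    "NA": "North America",
    "O": "other",
}


def read_stories(handle):
    """Parse tab-separated rows into dicts keyed by the header names.

    Assumption: the first row is a header and quotes are literal text.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def count_origin_continents(rows):
    """Count stories per continent of origin, expanding codes to names."""
    return Counter(
        CONTINENT_CODES.get(row["conti_or"], row["conti_or"]) for row in rows
    )


# Hypothetical two-row sample with made-up values and only a subset of
# the columns; it is NOT taken from the corpus.
SAMPLE_TSV = (
    "id_story\tname\tcountry_or\tcountry_de\tconti_or\tconti_de\tdistance\n"
    "1\tExample A\tCanada\tFrance\tNA\tE\tfar\n"
    "2\tExample B\tKenya\tUganda\tA\tA\tclose\n"
)

rows = read_stories(io.StringIO(SAMPLE_TSV))
counts = count_origin_continents(rows)
for continent, n in sorted(counts.items()):
    print(continent, n)
```

To read the actual file, pass an open file handle instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the TSV file extracted from Migrant_Stories.zip (UTF-8 encoding is an assumption here). If you load the data with pandas instead, note that `read_csv` treats the string `NA` as a missing value by default; passing `keep_default_na=False` keeps the North America code intact.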