
Datasets and R scripts for modelling Czech translation counterparts of Romance causative constructions

Please use the following text to cite this item:
Štichauer, Pavel and Čermák, Petr, 2026, Datasets and R scripts for modelling Czech translation counterparts of Romance causative constructions, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5841.
Date issued
2026-03-16
Size
1.4 MB
Language(s)
Description
This repository contains the datasets and code used in the study “Predicting translation counterparts in causative constructions.” The datasets consist of annotated examples of Italian and Spanish causative constructions and their Czech translation counterparts. The repository includes (i) full annotated datasets for Italian and Spanish, (ii) revised datasets used for statistical modelling, and (iii) the R script used to estimate Bayesian multinomial regression models using the brms package (Stan backend). The models estimate the probability of selecting a Czech translation counterpart (TYPE) as a function of verb valency (VALENCY) and complement class (COMP_CLASS), with random effects for VERB and TRANSLATOR. The repository also contains summaries of the fitted models.
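As a hedged illustration of what a categorical (multinomial logit) model of this kind computes, the probability of translation type k for observation i can be written as below. The notation (eta, alpha, beta, u, s, t) and the reference category r are assumptions introduced here for illustration; they are not taken from the study.

```latex
\begin{aligned}
P(\mathrm{TYPE}_i = k) &= \frac{\exp(\eta_{ik})}{\sum_{j} \exp(\eta_{ij})},
  \qquad \eta_{ir} = 0 \ \text{for the reference category } r, \\
\eta_{ik} &= \alpha_k + \mathbf{x}_i^{\top}\boldsymbol{\beta}_k
  + u_{k,\mathrm{VERB}[i]} + \mathbf{v}_i^{\top}\mathbf{s}_{k,\mathrm{VERB}[i]}
  + t_{k,\mathrm{TRANSLATOR}[i]}
\end{aligned}
```

Here x_i codes VALENCY and COMP_CLASS, v_i codes VALENCY only, u and s stand for the by-verb random intercepts and VALENCY slopes, and t stands for the by-translator random intercepts.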
Acknowledgement
This item is Publicly Available
and licensed under:
 Files in this item
Name
Causatives_table_full_Spanish.xlsx
Size
752.69 KB
Format
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Description
MD5
48cb5346c5d691d09723099e97d4428c
Name
Causatives_table_full_Italian.xlsx
Size
397.49 KB
Format
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Description
MD5
250e3367f4c728567123f50a2f8aa8f4
Name
README_causatives_repository.txt
Size
1.41 KB
Format
text/plain
Description
MD5
d84644028ea2f287973b3f9a37bf65dc
Preview
  File Preview
    README – Supplementary materials
    
    This repository contains the data and code used in the article:
    
    “Predicting translation counterparts in causative constructions”
    
    FILES INCLUDED
    
    DATA
    - Causatives_table_full_Spanish.xlsx
      Full annotated dataset for Spanish causative constructions and their Czech translation counterparts.
    
    - Causatives_table_full_Italian.xlsx
      Full annotated dataset for Italian causative constructions and their Czech translation counterparts.
    
    - causatives_es_revised.csv
      Revised Spanish dataset used for statistical modelling, including the variable COMP_CLASS.
    
    - causatives_it_revised.csv
      Revised Italian dataset used for statistical modelling, including the variable COMP_CLASS.
    
    CODE
    - brms_causatives_models.R
      R script used to estimate the Bayesian multinomial regression models reported in the paper.
      The models were estimated using the brms package (Stan backend).
    
    MODEL OUTPUT
    - brms_summary_causatives.txt
      Summaries of the fitted Bayesian multinomial models for both datasets.
    
    MODEL SPECIFICATION
    
    The models estimate the probability of selecting a Czech translation counterpart (TYPE)
    as a function of:
    
    - VALENCY (valency of the base verb)
    - COMP_CLASS (class of the complement)
    
    with random effects for:
    
    - VERB (random intercepts and slopes for VALENCY)
    - TRANSLATOR (random intercepts)
    
    The models were fitted using Hamiltonian Monte Carlo as implemented in Stan
    via the brms package in R.
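The specification above can be sketched as a brms call. This is a minimal sketch, not the contents of brms_causatives_models.R: the formula, the data name `it`, the model name `brms_it_noanim`, and the chain settings are taken from the printed summary in brms_summary_causatives.txt, while the helper `fit_causatives`, the seed, the use of default priors, and the assumption that the CSV is comma-separated with these column names are hypothetical.

```r
# Sketch only: reconstructed from the printed model summary,
# not copied from brms_causatives_models.R.
library(brms)

# Formula as printed in the summary of brms_it_noanim.
causative_formula <- bf(
  TYPE ~ VALENCY + COMP_CLASS + (1 + VALENCY | VERB) + (1 | TRANSLATOR)
)

# Hypothetical helper: fits the categorical model with brms default priors
# (an assumption; the original script may set its own priors).
fit_causatives <- function(data, seed = 1) {
  brm(
    formula = causative_formula,
    data    = data,
    family  = categorical(link = "logit"),
    chains  = 4,      # as reported in the summary
    iter    = 4000,   # as reported in the summary
    warmup  = 1000,   # as reported in the summary
    seed    = seed    # assumed; not reported in the source
  )
}

csv_path <- "causatives_it_revised.csv"
if (file.exists(csv_path)) {
  # Assumes a comma-separated file whose columns match the formula terms.
  it <- read.csv(csv_path, stringsAsFactors = TRUE)
  brms_it_noanim <- fit_causatives(it)
  print(summary(brms_it_noanim))
}
```

The fit is wrapped in a function and guarded by `file.exists()` so that sourcing the sketch without the data file only defines the formula and the helper. Sampling with these settings can take a long time.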
    
Name
brms_causatives_models.R
Size
1.08 KB
Format
application/octet-stream
Description
MD5
704d59f4a2ec2dc48f083eccef06d331
Name
brms_summary_causatives.txt
Size
20.13 KB
Format
text/plain
Description
MD5
d225ad3c0e22d05bcc5034cdea76dbe6
Preview
  File Preview
    > summary(brms_it_noanim)
     Family: categorical 
      Links: muB = logit; muC = logit; muD = logit; muE = logit; muF = logit; muX = logit 
    Formula: TYPE ~ VALENCY + COMP_CLASS + (1 + VALENCY | VERB) + (1 | TRANSLATOR) 
       Data: it (Number of observations: 1394) 
      Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
             total post-warmup draws = 12000
    
    Multilevel Hyperparameters:
    ~TRANSLATOR (Number of levels: 15) 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    sd(muB_Intercept)     0.41      0.25     0.03     0.97 1.00     2192     2556
    sd(muC_Intercept)     0.76      0.29     0.32     1.44 1.00     4258     5629
    sd(muD_Intercept)     0.51      0.35     0.03     1.33 1.00     3006     4035
    sd(muE_Intercept)     0.87      0.51     0.10     2.11 1.00     2721     2697
    sd(muF_Intercept)     0.39      0.26     0.02     1.00 1.00     2508     3392
    sd(muX_Intercept)     0.86      0.33     0.34     1.66 1.00     3778     4805
    
    ~VERB (Number of levels: 63) 
                                                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    sd(muB_Intercept)                                   1.43      0.22     1.05     1.91 1.00     3231     5719
    sd(muB_VALENCYreflexive)                            1.47      0.97     0.08     3.74 1.00     3007     3738
    sd(muB_VALENCYtransitive)                           0.73      0.52     0.03     1.93 1.01     1353     2253
    sd(muC_Intercept)                                   1.87      0.32     1.29     2.57 1.00     2879     5073
    sd(muC_VALENCYreflexive)                            0.85      0.72     0.04     2.65 1.00     2437     3604
    sd(muC_VALENCYtransitive)                           0.67      0.53     0.03     1.99 1.00     1439     1685
    sd(muD_Intercept)                                   1.97      0.46     1.20     3.05 1.00     3270     5731
    sd(muD_VALENCYreflexive)                            1.83      1.39     0.08     5.17 1.00     3972     5000
    sd(muD_VALENC . . .
Name
causatives_es_revised.csv
Size
98.96 KB
Format
text/csv
Description
MD5
4f3bba1e0c12890b58d4dd3de7088d98
Name
causatives_it_revised.csv
Size
93.68 KB
Format
text/csv
Description
MD5
7a319ccd13a3269c26da517408cf7bde