
Debiasing Algorithm through Model Adaptation

Please use the following text to cite this item:
Limisiewicz, Tomasz; Mareček, David and Musil, Tomáš, 2025, Debiasing Algorithm through Model Adaptation, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5847.
Date issued
2025-01-31
Language(s)
Description
Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is applied to specific modules prone to conveying gender bias, as shown by causal tracing. Our novel method effectively reduces gender bias in LLaMA models in three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). The method does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and a diverse set of downstream tasks is almost unaffected. This package contains the source code together with English, English-to-Czech, and English-to-German datasets.
Acknowledgement
This item is Publicly Available
and licensed under:
Files in this item
Name
DAMA.zip
Size
8.73 MB
Format
application/zip
Description
Code and data for DAMA
MD5
b58bcac6e5bdf7e54c5adb4afc7a36a1
File preview
  • DAMA
    • scripts
      • eval_mt_new_models.sh (1 kB)
      • run_adapt_model_7B.sh (721 B)
      • run_coreference_evaluation_13B.sh (1 kB)
      • eval_peft.sh (853 B)
      • run_generation_evaluation_7B.sh (965 B)
      • run_causal_lm_evaluation_13B.sh (7 kB)
      • run_generation_evaluation_30B.sh (943 B)
      • run_adapt_model_30B.sh (752 B)
      • eval_memit.sh (851 B)
      • run_coreference_evaluation_30B.sh (1 kB)
      • eval_new_models.sh (2 kB)
      • eval_ft.sh (836 B)
      • run_causal_lm_evaluation_30B.sh (941 B)
      • run_ft.sh (495 B)
      • run_adapt_model_65B.sh (737 B)
      • run_coreference_evaluation_7B.sh (1 kB)
      • run_causal_lm_evaluation_7B.sh (948 B)
      • run_generation_evaluation_13B.sh (9 kB)
      • run_adapt_model_13B.sh (827 B)
    • LICENSE (1 kB)
    • examples
      • llama_7B_l9_iter_postl_gen_bn_on.json (876 B)
      • train_dama_tiny.json (8 kB)
      • test_dama_tiny.json (415 B)
    • .gitignore (3 kB)
    • README.md (2 kB)
    • data
      • test_dama.json (36 kB)
      • cs_train_llama2.json (315 kB)
      • cs_variants.json (11 kB)
      • convert_english.py (361 B)
      • cs_train_1.json (24 kB)
      • cs_train_llama3.json (317 kB)
      • cs_train_2.json (17 kB)
      • cs_moretok_train.json (233 kB)
      • cs_train.json (232 kB)
      • de_train_llama2.json (150 kB)
      • de_moretok_train.json (110 kB)
      • de_train_llama3.json (150 kB)
      • de_variants.json (4 kB)
      • de_train_1.json (11 kB)
      • de_train.json (109 kB)
      • multilingual_prompts
        • en-cs_stereo_prompts.json (555 kB)
        • en-de_stereo_prompts.json (541 kB)
        • en-de_factual_prompts.json (110 kB)
        • en-cs_factual_prompts.json (113 kB)
      • tokenize_and_filter_variants.py (4 kB)
      • de_train_2.json (10 kB)
      • train_dama.json (439 kB)
      • en_train.json (399 kB)
      • professions.json (7 kB)
    • requirements.txt (216 B)
    • src
      • __init__.py (0 B)
      • noise.py (8 kB)
      • rome
        • rome_main.py (5 kB)
        • compute_u.py (3 kB)
        • __init__.py (73 B)
        • rome_hparams.py (675 B)
        • compute_v.py (8 kB)
      • causal_tracing
        • causal_trace.py (26 kB)
        • __init__.py (0 B)
        • utils.py (937 B)
        • gender_trace.py (1 kB)
      • dama_l
        • dama_l_main.py (11 kB)
        • dama_l_hparams.py (590 B)
        • __init__.py (45 B)
      • hf_upload_model.py (3 kB)
      • globals.yml (323 B)
      • memit
        • memit_main.py (10 kB)
        • compute_ks.py (1 kB)
        • __init__.py (77 B)
        • compute_z.py (9 kB)
        • memit_hparams.py (884 B)
      • evaluation
        • generation.py (5 kB)
        • __init__.py (315 B)
        • causal_lm.py (1 kB)
        • evaluate.py (1 kB)
        • perplexity.py (4 kB)
        • translation.py (4 kB)
        • stereoset.py (15 kB)
        • coreference.py (3 kB)
        • qa.py (3 kB)
      • ft
        • ft_hparams.py (672 B)
        • ft_main.py (6 kB)
        • __init__.py (65 B)
      • generate_adapt_data.py (5 kB)
      • trace.py (5 kB)
      • adapt_model.py (11 kB)
      • utils
        • __init__.py (33 B)
        • logit_lens.py (2 kB)
        • layer_stats.py (5 kB)
        • repr_tools.py (8 kB)
        • runningstats.py (63 kB)
        • globals.py (435 B)
        • knowns.py (953 B)
        • tok_dataset.py (3 kB)
        • notebooks_utils.py (26 kB)
        • constants.py (1 kB)
        • hparams.py (435 B)
        • nethook.py (15 kB)
        • model_utils.py (8 kB)
        • generate.py (5 kB)
      • evaluate_model.py (6 kB)
      • dama
        • compute_us_dama.py (1 kB)
        • __init__.py (73 B)
        • compute_v_dama.py (12 kB)
        • dama_hparams.py (858 B)
        • dama_main.py (25 kB)
      • notebooks
        • globals.yml (334 B)
        • averaged_casual_tracing_severing_mlps.ipynb (436 kB)
        • llama2_mt_adaptations_factual_stereotypical_traces.ipynb (995 kB)
        • collect_results.ipynb (133 kB)
        • factual_stereotypical_traces_severing_mlps.ipynb (10 MB)