Debiasing Algorithm through Model Adaptation
Please use the following text to cite this item:
Limisiewicz, Tomasz; Mareček, David and Musil, Tomáš, 2025, Debiasing Algorithm through Model Adaptation, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5847.
Date issued
2025-01-31
Description
Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is applied to the specific modules that causal tracing identifies as prone to conveying gender bias. The method reduces gender bias in LLaMA models on three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). It does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and on a diverse set of downstream tasks is almost unaffected. This package contains the source code together with English, English-to-Czech, and English-to-German datasets.
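To make the claim about unchanged architecture and parameter count concrete, here is a minimal sketch of one generic way to "guard" a signal: projecting a single direction out of a layer's output by editing its weight matrix in place. This is an illustration only, not the DAMA implementation shipped in DAMA.zip. The function name `project_out`, the use of a single direction, and the toy 2x2 matrix are all assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def project_out(weight: Matrix, direction: List[float]) -> Matrix:
    """Return P @ weight, where P = I - d d^T / (d^T d).

    `weight` has shape (out_dim, in_dim) and `direction` has length out_dim.
    After the edit, the layer output W' x has no component along `direction`
    for any input x. The result has the same shape as `weight`, so the
    parameter count and the cost of applying the layer stay the same.
    (Hypothetical helper written for illustration.)
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    n_rows, n_cols = len(weight), len(weight[0])
    if len(direction) != n_rows:
        raise ValueError("direction length must match the number of rows")
    # Coefficient of each column of `weight` along `direction`.
    coeffs = [
        sum(direction[i] * weight[i][j] for i in range(n_rows)) / norm_sq
        for j in range(n_cols)
    ]
    # Subtract that component from every column.
    return [
        [weight[i][j] - direction[i] * coeffs[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: remove the first output coordinate from a 2x2 matrix.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
edited = project_out(toy_weight, [1.0, 0.0])
print(edited)
```

Because the edit only replaces the values of an existing matrix, no layers or parameters are added. How DAMA chooses which modules to edit and which signals to guard is defined by the code in the archive, not by this sketch.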
Acknowledgement
Czech Science Foundation
Project code: GA23-06912S
Project name: Identification and Prevention of Unwanted Gender Bias in Neural Language Models
Files in this item
- Name: DAMA.zip
- Size: 8.73 MB
- Format: application/zip
- Description: Code and data for DAMA
- MD5: b58bcac6e5bdf7e54c5adb4afc7a36a1
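The MD5 value listed for DAMA.zip can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library. The local path `DAMA.zip` and the helper names `md5_of_file` and `verify_archive` are assumptions for this example and are not part of the package.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record for DAMA.zip.
EXPECTED_MD5 = "b58bcac6e5bdf7e54c5adb4afc7a36a1"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("DAMA.zip")  # assumed location of the downloaded file
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually means the download was truncated or corrupted and should be repeated.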

- DAMA
- scripts
- eval_mt_new_models.sh (1 kB)
- run_adapt_model_7B.sh (721 B)
- run_coreference_evaluation_13B.sh (1 kB)
- eval_peft.sh (853 B)
- run_generation_evaluation_7B.sh (965 B)
- run_causal_lm_evaluation_13B.sh (7 kB)
- run_generation_evaluation_30B.sh (943 B)
- run_adapt_model_30B.sh (752 B)
- eval_memit.sh (851 B)
- run_coreference_evaluation_30B.sh (1 kB)
- eval_new_models.sh (2 kB)
- eval_ft.sh (836 B)
- run_causal_lm_evaluation_30B.sh (941 B)
- run_ft.sh (495 B)
- run_adapt_model_65B.sh (737 B)
- run_coreference_evaluation_7B.sh (1 kB)
- run_causal_lm_evaluation_7B.sh (948 B)
- run_generation_evaluation_13B.sh (9 kB)
- run_adapt_model_13B.sh (827 B)
- LICENSE (1 kB)
- examples
- llama_7B_l9_iter_postl_gen_bn_on.json (876 B)
- train_dama_tiny.json (8 kB)
- test_dama_tiny.json (415 B)
- .gitignore (3 kB)
- README.md (2 kB)
- data
- test_dama.json (36 kB)
- cs_train_llama2.json (315 kB)
- cs_variants.json (11 kB)
- convert_english.py (361 B)
- cs_train_1.json (24 kB)
- cs_train_llama3.json (317 kB)
- cs_train_2.json (17 kB)
- cs_moretok_train.json (233 kB)
- cs_train.json (232 kB)
- de_train_llama2.json (150 kB)
- de_moretok_train.json (110 kB)
- de_train_llama3.json (150 kB)
- de_variants.json (4 kB)
- de_train_1.json (11 kB)
- de_train.json (109 kB)
- multilingual_prompts
- en-cs_stereo_prompts.json (555 kB)
- en-de_stereo_prompts.json (541 kB)
- en-de_factual_prompts.json (110 kB)
- en-cs_factual_prompts.json (113 kB)
- tokenize_and_filter_variants.py (4 kB)
- de_train_2.json (10 kB)
- train_dama.json (439 kB)
- en_train.json (399 kB)
- professions.json (7 kB)
- requirements.txt (216 B)
- src
- __init__.py (0 B)
- noise.py (8 kB)
- rome
- rome_main.py (5 kB)
- compute_u.py (3 kB)
- __init__.py (73 B)
- rome_hparams.py (675 B)
- compute_v.py (8 kB)
- causal_tracing
- causal_trace.py (26 kB)
- __init__.py (0 B)
- utils.py (937 B)
- gender_trace.py (1 kB)
- dama_l
- dama_l_main.py (11 kB)
- dama_l_hparams.py (590 B)
- __init__.py (45 B)
- hf_upload_model.py (3 kB)
- globals.yml (323 B)
- memit
- memit_main.py (10 kB)
- compute_ks.py (1 kB)
- __init__.py (77 B)
- compute_z.py (9 kB)
- memit_hparams.py (884 B)
- evaluation
- generation.py (5 kB)
- __init__.py (315 B)
- causal_lm.py (1 kB)
- evaluate.py (1 kB)
- perplexity.py (4 kB)
- translation.py (4 kB)
- stereoset.py (15 kB)
- coreference.py (3 kB)
- qa.py (3 kB)
- ft
- ft_hparams.py (672 B)
- ft_main.py (6 kB)
- __init__.py (65 B)
- generate_adapt_data.py (5 kB)
- trace.py (5 kB)
- adapt_model.py (11 kB)
- utils
- __init__.py (33 B)
- logit_lens.py (2 kB)
- layer_stats.py (5 kB)
- repr_tools.py (8 kB)
- runningstats.py (63 kB)
- globals.py (435 B)
- knowns.py (953 B)
- tok_dataset.py (3 kB)
- notebooks_utils.py (26 kB)
- constants.py (1 kB)
- hparams.py (435 B)
- nethook.py (15 kB)
- model_utils.py (8 kB)
- generate.py (5 kB)
- evaluate_model.py (6 kB)
- dama
- compute_us_dama.py (1 kB)
- __init__.py (73 B)
- compute_v_dama.py (12 kB)
- dama_hparams.py (858 B)
- dama_main.py (25 kB)
- notebooks
- globals.yml (334 B)
- averaged_casual_tracing_severing_mlps.ipynb (436 kB)
- llama2_mt_adaptations_factual_stereotypical_traces.ipynb (995 kB)
- collect_results.ipynb (133 kB)
- factual_stereotypical_traces_severing_mlps.ipynb (10 MB)
- scripts

