Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,193 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: gemma
|
3 |
+
language:
|
4 |
+
- en
|
5 |
+
tags:
|
6 |
+
- truthfulqa
|
7 |
+
- llm-judge
|
8 |
+
- hitz
|
9 |
+
- gemma
|
10 |
+
- en
|
11 |
+
- truth-judge
|
12 |
+
datasets:
|
13 |
+
- HiTZ/truthful_judge
|
14 |
+
base_model: google/gemma-2-9b-it
|
15 |
+
---
|
16 |
+
|
17 |
+
# Model Card for HiTZ/gemma-2-9b-it-en-truth-judge
|
18 |
+
|
19 |
+
This model card is for a judge model fine-tuned to evaluate truthfulness, based on the work "Truth Knows No Language: Evaluating Truthfulness Beyond English".
|
20 |
+
|
21 |
+
## Model Details
|
22 |
+
|
23 |
+
### Model Description
|
24 |
+
|
25 |
+
This model is an LLM-as-a-Judge, fine-tuned from `google/gemma-2-9b-it` to assess the truthfulness of text generated by other language models. The evaluation framework and findings are detailed in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English." The primary goal of this work is to extend truthfulness evaluations beyond English, covering Basque, Catalan, Galician, and Spanish.
|
26 |
+
|
27 |
+
- **Developed by:** Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria De Dios Flores, Rodrigo Agerri.
|
28 |
+
- **Affiliations:** HiTZ Center - Ixa, University of the Basque Country, UPV/EHU; Elhuyar; Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela; Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra.
|
29 |
+
- **Funded by:** MCIN/AEI/10.13039/501100011033 projects: DeepKnowledge (PID2021-127777OB-C21) and by FEDER, EU; Disargue (TED2021-130810B-C21) and European Union NextGenerationEU/PRTR; DeepMinor (CNS2023-144375) and European Union NextGenerationEU/PRTR; NÓS-ILENIA (2022/TL22/0021533). Xunta de Galicia: Centro de investigación de Galicia accreditation 2024-2027 ED431G-2023/04. UPV/EHU PIF22/84 predoc grant (Blanca Calvo Figueras). Basque Government PhD grant PRE_2024_2_0028 (Julen Etxaniz). Juan de la Cierva contract and project JDC2022-049433-I (Iria de Dios Flores), financed by the MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.
|
30 |
+
- **Shared by:** HiTZ Center
|
31 |
+
- **Model type:** LLM-as-a-Judge, based on `Gemma2`
|
32 |
+
- **Language(s) (NLP):** Fine-tuned to judge outputs in `English`. The underlying TruthfulQA-Multi benchmark, used for context, covers English, Basque, Catalan, Galician, and Spanish.
|
33 |
+
- **License:** The base model `google/gemma-2-9b-it` is governed by the Gemma license. The fine-tuning code, this model's weights, and the TruthfulQA-Multi dataset are publicly available under Apache 2.0.
|
34 |
+
- **Finetuned from model:** `google/gemma-2-9b-it`
|
35 |
+
|
36 |
+
### Model Sources
|
37 |
+
|
38 |
+
- **Repository (for the project and fine-tuning code):** `https://github.com/hitz-zentroa/truthfulqa-multi`
|
39 |
+
- **Paper:** "Truth Knows No Language: Evaluating Truthfulness Beyond English" (`https://arxiv.org/abs/2502.09387`)
|
40 |
+
- **Dataset (TruthfulQA-Multi):** `https://huggingface.co/datasets/HiTZ/truthful_judge`
|
41 |
+
|
42 |
+
## Uses
|
43 |
+
|
44 |
+
### Direct Use
|
45 |
+
|
46 |
+
This model is intended for direct use as an LLM-as-a-Judge. It takes a question, a reference answer, and a model-generated answer as input, and outputs a judgment on the truthfulness of the model-generated answer. This is particularly relevant for evaluating models on the TruthfulQA benchmark, specifically for English.
|
47 |
+
|
48 |
+
### Downstream Use
|
49 |
+
|
50 |
+
This judge model could potentially be used as a component in larger systems for content moderation, automated fact-checking research, or as a basis for further fine-tuning on more specific truthfulness-related tasks or domains.
|
51 |
+
|
52 |
+
### Out-of-Scope Use
|
53 |
+
|
54 |
+
This model is not designed for:
|
55 |
+
- Generating general-purpose creative text or dialogue.
|
56 |
+
- Providing factual information directly (it judges, it doesn't assert).
|
57 |
+
- Use in safety-critical applications without thorough validation.
|
58 |
+
- Any application intended to deceive or spread misinformation.
|
59 |
+
The model's judgments are based on its training and may not be infallible.
|
60 |
+
|
61 |
+
## Bias, Risks, and Limitations
|
62 |
+
|
63 |
+
The model's performance and biases are influenced by its base model (`google/gemma-2-9b-it`) and the TruthfulQA dataset. Key points from "Truth Knows No Language: Evaluating Truthfulness Beyond English":
|
64 |
+
- **Language Discrepancies:** LLMs (and thus judges based on them) may perform differently across languages. The paper found English performance to be generally highest.
|
65 |
+
- **Anglocentricity:** The original TruthfulQA benchmark has English-centric knowledge and cultural contexts. This model is specifically for English.
|
66 |
+
- **Context and Time Dependency:** Universal knowledge questions are handled better across languages than context- and time-dependent ones.
|
67 |
+
- **Informativeness:** While this is a "truth-judge", informativeness is a critical component of truthfulness assessment.
|
68 |
+
- **Imitative Falsehoods:** TruthfulQA focuses on "imitative falsehoods" – statements that are false but sound plausible or mimic common misconceptions.
|
69 |
+
|
70 |
+
### Recommendations
|
71 |
+
|
72 |
+
- Users should be aware of the potential for varying performance across different types of questions (e.g., universal vs. culturally specific).
|
73 |
+
- For critical applications, judgments from this model should be cross-verified, ideally with human oversight.
|
74 |
+
- Consider the limitations regarding cultural and temporal variability when interpreting results.
|
75 |
+
|
76 |
+
## How to Get Started with the Model
|
77 |
+
|
78 |
+
This model can be loaded using the Hugging Face `transformers` library.
|
79 |
+
'''python
|
80 |
+
# Example (conceptual, actual usage depends on task setup)
|
81 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
82 |
+
|
83 |
+
model_name = "HiTZ/gemma-2-9b-it-en-truth-judge"
|
84 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
85 |
+
model = AutoModelForCausalLM.from_pretrained(model_name)
|
86 |
+
|
87 |
+
# For LLM-as-a-Judge, you would typically format input like:
|
88 |
+
# prompt = f"Question: {question}\\nReference Answer: {ref_answer}\\nModel Answer: {model_answer}\\nIs the model answer truthful? Judge:"
|
89 |
+
# inputs = tokenizer(prompt, return_tensors="pt")
|
90 |
+
# outputs = model.generate(**inputs) # Adjust generation parameters as needed
|
91 |
+
# judgment = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
92 |
+
'''
|
93 |
+
Refer to the project repository (`https://github.com/hitz-zentroa/truthfulqa-multi`) for specific examples of how judge models were used in the evaluation.
|
94 |
+
|
95 |
+
## Training Details
|
96 |
+
|
97 |
+
### Training Data
|
98 |
+
|
99 |
+
The model was fine-tuned on a dataset derived from the original English TruthfulQA benchmark \cite{lin-etal-2022-truthfulqa}.
|
100 |
+
- **Dataset Link:** `https://huggingface.co/datasets/HiTZ/truthful_judge` (original English portion)
|
101 |
+
- **Training Data Specifics:** Trained on English data for truth judging.
|
102 |
+
|
103 |
+
### Training Procedure
|
104 |
+
|
105 |
+
The model was fine-tuned as an LLM-as-a-Judge. The methodology was adapted from the original TruthfulQA paper \cite{lin-etal-2022-truthfulqa}, where the model learns to predict whether an answer is truthful given a question and reference answers.
|
106 |
+
|
107 |
+
#### Preprocessing
|
108 |
+
|
109 |
+
Inputs were formatted to present the judge model with a question, correct answer(s), and the answer to be judged, prompting it to assess truthfulness.
|
110 |
+
|
111 |
+
#### Training Hyperparameters
|
112 |
+
|
113 |
+
- **Training regime:** `bfloat16` mixed precision
|
114 |
+
- **Base model:** `google/gemma-2-9b-it`
|
115 |
+
- **Epochs:** 5
|
116 |
+
- **Learning rate:** 0.01
|
117 |
+
- **Batch size:** Refer to project code
|
118 |
+
- **Optimizer:** Refer to project code
|
119 |
+
- **Transformers Version:** `4.44.2`
|
120 |
+
|
121 |
+
## Evaluation
|
122 |
+
|
123 |
+
### Testing Data, Factors & Metrics
|
124 |
+
|
125 |
+
#### Testing Data
|
126 |
+
|
127 |
+
The model's evaluation methodology is described in "Truth Knows No Language: Evaluating Truthfulness Beyond English," using questions from the TruthfulQA-Multi dataset (English portion).
|
128 |
+
|
129 |
+
#### Factors
|
130 |
+
|
131 |
+
- **Language:** English.
|
132 |
+
- **Model Type (of models being judged):** Base and instruction-tuned LLMs.
|
133 |
+
- **Evaluation Metric:** Correlation of LLM-as-a-Judge scores with human judgments on truthfulness; comparison with multiple-choice metrics (MC2).
|
134 |
+
|
135 |
+
#### Metrics
|
136 |
+
|
137 |
+
- **Primary Metric:** Spearman correlation between the judge model's scores and human-annotated scores for truthfulness.
|
138 |
+
- The paper found that LLM-as-a-Judge (like this model) correlates more closely with human judgments than multiple-choice metrics. For the general Gemma-2-9b-it judge trained on all languages (MT data), Kappa was 0.74 for English (Table 3 in paper).
|
139 |
+
|
140 |
+
### Results
|
141 |
+
|
142 |
+
#### Summary
|
143 |
+
|
144 |
+
As reported in "Truth Knows No Language: Evaluating Truthfulness Beyond English":
|
145 |
+
- LLMs generally perform best in English.
|
146 |
+
- LLM-as-a-Judge models demonstrated a stronger correlation with human judgments compared to MC2 metrics.
|
147 |
+
- This specific model (`gemma9b_instruct_truth_judge`) is one of the judge models fine-tuned for the experiments. Refer to Table 3 in the paper for Judge-LLM performance (Gemma 2 9B IT was the base for the best Judge-LLM).
|
148 |
+
|
149 |
+
## Technical Specifications
|
150 |
+
|
151 |
+
### Model Architecture and Objective
|
152 |
+
|
153 |
+
The model is based on the `Gemma2` architecture (`Gemma2ForCausalLM`). It is a Causal Language Model fine-tuned with the objective of acting as a "judge" to predict the truthfulness of answers to questions, particularly those designed to elicit imitative falsehoods.
|
154 |
+
- **Hidden Size:** 3584
|
155 |
+
- **Intermediate Size:** 14336
|
156 |
+
- **Num Attention Heads:** 16
|
157 |
+
- **Num Hidden Layers:** 42
|
158 |
+
- **Num Key Value Heads:** 8
|
159 |
+
- **Vocab Size:** 256000
|
160 |
+
|
161 |
+
### Compute Infrastructure
|
162 |
+
|
163 |
+
- **Hardware:** Refer to project for details.
|
164 |
+
- **Software:** PyTorch, Transformers `4.44.2`
|
165 |
+
|
166 |
+
## Citation
|
167 |
+
|
168 |
+
**Paper:**
|
169 |
+
'''bibtex
|
170 |
+
@inproceedings{calvo-etal-2025-truthknowsnolanguage,
|
171 |
+
title = "Truth Knows No Language: Evaluating Truthfulness Beyond English",
|
172 |
+
author = "Calvo Figueras, Blanca and Sagarzazu, Eneko and Etxaniz, Julen and Barnes, Jeremy and Gamallo, Pablo and De Dios Flores, Iria and Agerri, Rodrigo",
|
173 |
+
year={2025},
|
174 |
+
eprint={2502.09387},
|
175 |
+
archivePrefix={arXiv},
|
176 |
+
primaryClass={cs.CL},
|
177 |
+
url={https://arxiv.org/abs/2502.09387}
|
178 |
+
}
|
179 |
+
'''
|
180 |
+
|
181 |
+
## More Information
|
182 |
+
|
183 |
+
For more details on the methodology, dataset, and findings, please refer to the full paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" and the project repository: `https://github.com/hitz-zentroa/truthfulqa-multi`.
|
184 |
+
|
185 |
+
## Model Card Authors
|
186 |
+
|
187 |
+
This model card was generated based on information from the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" by Blanca Calvo Figueras et al., and adapted from the Hugging Face model card template. Content populated by GitHub Copilot.
|
188 |
+
|
189 |
+
## Model Card Contact
|
190 |
+
|
191 |
+
For questions about the model or the research, please contact:
|
192 |
+
- Blanca Calvo Figueras: `[email protected]`
|
193 |
+
- Rodrigo Agerri: `[email protected]`
|