gemma-2-9b-it-en-truth-judge / README.md

Update README.md

ce2bac7 verified 4 months ago

10.3 kB

	---
	license: gemma
	language:
	- en
	tags:
	- truthfulqa
	- llm-judge
	- hitz
	- gemma
	- en
	- truth-judge
	datasets:
	- HiTZ/truthful_judge
	base_model: google/gemma-2-9b-it
	---

	# Model Card for HiTZ/gemma-2-9b-it-en-truth-judge

	This model card is for a judge model fine-tuned to evaluate truthfulness, based on the work "Truth Knows No Language: Evaluating Truthfulness Beyond English".

	## Model Details

	### Model Description

	This model is an LLM-as-a-Judge, fine-tuned from `google/gemma-2-9b-it` to assess the truthfulness of text generated by other language models. The evaluation framework and findings are detailed in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English." The primary goal of this work is to extend truthfulness evaluations beyond English, covering Basque, Catalan, Galician, and Spanish.

	- Developed by: Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria De Dios Flores, Rodrigo Agerri.
	- Affiliations: HiTZ Center - Ixa, University of the Basque Country, UPV/EHU; Elhuyar; Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela; Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra.
	- Funded by: MCIN/AEI/10.13039/501100011033 projects: DeepKnowledge (PID2021-127777OB-C21) and by FEDER, EU; Disargue (TED2021-130810B-C21) and European Union NextGenerationEU/PRTR; DeepMinor (CNS2023-144375) and European Union NextGenerationEU/PRTR; NÓS-ILENIA (2022/TL22/0021533). Xunta de Galicia: Centro de investigación de Galicia accreditation 2024-2027 ED431G-2023/04. UPV/EHU PIF22/84 predoc grant (Blanca Calvo Figueras). Basque Government PhD grant PRE_2024_2_0028 (Julen Etxaniz). Juan de la Cierva contract and project JDC2022-049433-I (Iria de Dios Flores), financed by the MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.
	- Shared by: HiTZ Center
	- Model type: LLM-as-a-Judge, based on `Gemma2`
	- Language(s) (NLP): Fine-tuned to judge outputs in `English`. The underlying TruthfulQA-Multi benchmark, used for context, covers English, Basque, Catalan, Galician, and Spanish.
	- License: The base model `google/gemma-2-9b-it` is governed by the Gemma license. The fine-tuning code, this model's weights, and the TruthfulQA-Multi dataset are publicly available under Apache 2.0.
	- Finetuned from model: `google/gemma-2-9b-it`

	### Model Sources

	- Repository (for the project and fine-tuning code): `https://github.com/hitz-zentroa/truthfulqa-multi`
	- Paper: "Truth Knows No Language: Evaluating Truthfulness Beyond English" (`https://arxiv.org/abs/2502.09387`)
	- Dataset (TruthfulQA-Multi): `https://huggingface.co/datasets/HiTZ/truthful_judge`

	## Uses

	### Direct Use

	This model is intended for direct use as an LLM-as-a-Judge. It takes a question, a reference answer, and a model-generated answer as input, and outputs a judgment on the truthfulness of the model-generated answer. This is particularly relevant for evaluating models on the TruthfulQA benchmark, specifically for English.

	### Downstream Use

	This judge model could potentially be used as a component in larger systems for content moderation, automated fact-checking research, or as a basis for further fine-tuning on more specific truthfulness-related tasks or domains.

	### Out-of-Scope Use

	This model is not designed for:
	- Generating general-purpose creative text or dialogue.
	- Providing factual information directly (it judges, it doesn't assert).
	- Use in safety-critical applications without thorough validation.
	- Any application intended to deceive or spread misinformation.
	The model's judgments are based on its training and may not be infallible.

	## Bias, Risks, and Limitations

	The model's performance and biases are influenced by its base model (`google/gemma-2-9b-it`) and the TruthfulQA dataset. Key points from "Truth Knows No Language: Evaluating Truthfulness Beyond English":
	- Language Discrepancies: LLMs (and thus judges based on them) may perform differently across languages. The paper found English performance to be generally highest.
	- Anglocentricity: The original TruthfulQA benchmark has English-centric knowledge and cultural contexts. This model is specifically for English.
	- Context and Time Dependency: Universal knowledge questions are handled better across languages than context- and time-dependent ones.
	- Informativeness: While this is a "truth-judge", informativeness is a critical component of truthfulness assessment.
	- Imitative Falsehoods: TruthfulQA focuses on "imitative falsehoods" – statements that are false but sound plausible or mimic common misconceptions.

	### Recommendations

	- Users should be aware of the potential for varying performance across different types of questions (e.g., universal vs. culturally specific).
	- For critical applications, judgments from this model should be cross-verified, ideally with human oversight.
	- Consider the limitations regarding cultural and temporal variability when interpreting results.

	## How to Get Started with the Model

	This model can be loaded using the Hugging Face `transformers` library.
	```python
	# Example (conceptual, actual usage depends on task setup)
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "HiTZ/gemma-2-9b-it-en-truth-judge"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	# For LLM-as-a-Judge, you would typically format input like:
	# prompt = f"Question: {question}\\nReference Answer: {ref_answer}\\nModel Answer: {model_answer}\\nIs the model answer truthful? Judge:"
	# inputs = tokenizer(prompt, return_tensors="pt")
	# outputs = model.generate(**inputs) # Adjust generation parameters as needed
	# judgment = tokenizer.decode(outputs[0], skip_special_tokens=True)
	```
	Refer to the project repository (`https://github.com/hitz-zentroa/truthfulqa-multi`) for specific examples of how judge models were used in the evaluation.

	## Training Details

	### Training Data

	The model was fine-tuned on a dataset derived from the original English TruthfulQA benchmark \cite{lin-etal-2022-truthfulqa}.
	- Dataset Link: `https://huggingface.co/datasets/HiTZ/truthful_judge` (original English portion)
	- Training Data Specifics: Trained on English data for truth judging.

	### Training Procedure

	The model was fine-tuned as an LLM-as-a-Judge. The methodology was adapted from the original TruthfulQA paper \cite{lin-etal-2022-truthfulqa}, where the model learns to predict whether an answer is truthful given a question and reference answers.

	#### Preprocessing

	Inputs were formatted to present the judge model with a question, correct answer(s), and the answer to be judged, prompting it to assess truthfulness.

	#### Training Hyperparameters

	- Training regime: `bfloat16` mixed precision
	- Base model: `google/gemma-2-9b-it`
	- Epochs: 5
	- Learning rate: 0.01
	- Batch size: Refer to project code
	- Optimizer: Refer to project code
	- Transformers Version: `4.44.2`

	## Evaluation

	### Testing Data, Factors & Metrics

	#### Testing Data

	The model's evaluation methodology is described in "Truth Knows No Language: Evaluating Truthfulness Beyond English," using questions from the TruthfulQA-Multi dataset (English portion).

	#### Factors

	- Language: English.
	- Model Type (of models being judged): Base and instruction-tuned LLMs.
	- Evaluation Metric: Correlation of LLM-as-a-Judge scores with human judgments on truthfulness; comparison with multiple-choice metrics (MC2).

	#### Metrics

	- Primary Metric: Spearman correlation between the judge model's scores and human-annotated scores for truthfulness.
	- The paper found that LLM-as-a-Judge (like this model) correlates more closely with human judgments than multiple-choice metrics. For the general Gemma-2-9b-it judge trained on all languages (MT data), Kappa was 0.74 for English (Table 3 in paper).

	### Results

	#### Summary

	As reported in "Truth Knows No Language: Evaluating Truthfulness Beyond English":
	- LLMs generally perform best in English.
	- LLM-as-a-Judge models demonstrated a stronger correlation with human judgments compared to MC2 metrics.
	- This specific model (`gemma9b_instruct_truth_judge`) is one of the judge models fine-tuned for the experiments. Refer to Table 3 in the paper for Judge-LLM performance (Gemma 2 9B IT was the base for the best Judge-LLM).

	## Technical Specifications

	### Model Architecture and Objective

	The model is based on the `Gemma2` architecture (`Gemma2ForCausalLM`). It is a Causal Language Model fine-tuned with the objective of acting as a "judge" to predict the truthfulness of answers to questions, particularly those designed to elicit imitative falsehoods.
	- Hidden Size: 3584
	- Intermediate Size: 14336
	- Num Attention Heads: 16
	- Num Hidden Layers: 42
	- Num Key Value Heads: 8
	- Vocab Size: 256000

	### Compute Infrastructure

	- Hardware: Refer to project for details.
	- Software: PyTorch, Transformers `4.44.2`

	## Citation

	Paper:
	```bibtex
	@inproceedings{calvo-etal-2025-truthknowsnolanguage,
	title = "Truth Knows No Language: Evaluating Truthfulness Beyond English",
	author = "Calvo Figueras, Blanca and Sagarzazu, Eneko and Etxaniz, Julen and Barnes, Jeremy and Gamallo, Pablo and De Dios Flores, Iria and Agerri, Rodrigo",
	year={2025},
	eprint={2502.09387},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2502.09387}
	}
	```

	## More Information

	For more details on the methodology, dataset, and findings, please refer to the full paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" and the project repository: `https://github.com/hitz-zentroa/truthfulqa-multi`.

	## Model Card Authors

	This model card was generated based on information from the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" by Blanca Calvo Figueras et al., and adapted from the Hugging Face model card template. Content populated by GitHub Copilot.

	## Model Card Contact

	For questions about the model or the research, please contact:
	- Blanca Calvo Figueras: `[email protected]`
	- Rodrigo Agerri: `[email protected]`