	---
	library_name: transformers
	license: cc-by-nc-sa-4.0
	pipeline_tag: text-ranking
	tags:
	- reranker
	- sequence-classification
	- qwen3
	- multilingual
	- bfloat16
	- 32k
	base_model: ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b
	model_type: qwen3
	---

	# Contextual AI Reranker v2 1B — SequenceClassification (single-logit) Converted Model

	This repository contains a drop-in SequenceClassification version of the original ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b.
	It exposes a single logit per input (one score) that is numerically equivalent to the original model’s last-token `vocab_id=0` logit (`next_logits[:, 0]`). That means you can use standard text-classification/CrossEncoder tooling for fast, simple reranking—without custom logits processors—while preserving the original scores and ranking order.

	> What changed? We copy the LM head’s row 0 vector into a 1-logit classification head (`score.weight ← lm_head.weight[0]`), set bias to 0 (or the matching bias row if present), and keep tokenizer/padding behavior aligned with the original. Result: `SequenceClassification` output ≡ original `next_logits[:, 0]`.
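
The equivalence follows from how a linear head works: each output logit is the dot product of one weight row with the final hidden state, so a head that keeps only row 0 reproduces logit 0. A minimal sketch in plain Python, using made-up toy weights and a hypothetical `matvec` helper (this is an illustration, not the actual conversion script):

```python
def matvec(weight: list[list[float]], hidden: list[float]) -> list[float]:
    """Multiply a (rows x dim) weight matrix by a dim-sized vector, as a bias-free linear head does."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


# Toy stand-ins (assumed values): a 4-token "vocabulary" and hidden size 3.
lm_head_weight = [
    [0.5, -1.0, 2.0],    # row 0: the row the reranker reads its score from
    [1.5, 0.25, -0.75],
    [-2.0, 1.0, 0.5],
    [0.0, 3.0, 1.0],
]
last_hidden = [1.0, 2.0, -0.5]  # hidden state at the last token position

# Original path: project onto the full vocabulary, then read index 0.
next_logits = matvec(lm_head_weight, last_hidden)

# Converted path: a 1-row head whose weights are a copy of row 0.
score_weight = [list(lm_head_weight[0])]
score = matvec(score_weight, last_hidden)[0]

assert score == next_logits[0]
print(score)  # -2.5
```

Both paths perform the same multiply-adds on the same numbers, which is why the toy values match exactly here.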

	---

	## Highlights

	* Parity with the original: The score from this model equals the original ID=0 logit at the very last token position (use the same prompt template and left-padding).
	* Frictionless integration: Works out-of-the-box with Sentence-Transformers CrossEncoder and standard Transformers classification interfaces.
	* Fast & memory-light: Computes a single logit (`hidden_size × 1`) instead of a full vocabulary projection.
	* Multilingual and long-context (inherits capabilities from the base reranker).
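
To put the "single logit" point in numbers, the sketch below counts multiply-adds for the output head only. The `hidden_size` and `vocab_size` values are placeholders chosen for illustration, not figures taken from this model; read the real ones from the model's config.

```python
def head_multiply_adds(hidden_size: int, num_outputs: int) -> int:
    """Multiply-adds for one linear head applied to a single hidden state."""
    return hidden_size * num_outputs


# Placeholder sizes for illustration only (assumptions, not this model's config).
hidden_size = 2048
vocab_size = 150_000

full_projection = head_multiply_adds(hidden_size, vocab_size)  # original LM head
single_logit = head_multiply_adds(hidden_size, 1)              # converted head

print(f"full vocabulary projection: {full_projection:,} multiply-adds")
print(f"single-logit head:          {single_logit:,} multiply-adds")
print(f"ratio: {full_projection // single_logit:,}x")
```

The saving applies to the output head only. The transformer layers do the same work in both variants, so the end-to-end speedup is smaller than this ratio.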

	---

	## Model Overview

	* Type: Text Reranking (single-logit SequenceClassification)
	* Base: `ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b` (Qwen3 CausalLM)
	* Languages: 100+ (inherited)
	* Params: ~1B (inherited)
	* Context Length: up to 32K (inherited)
	* Scoring definition: single logit ≡ original `next_logits[:, 0]`

	---

	## Input Formatting (keep this template)

	```text
	Check whether a given document contains information helpful to answer the query.
	<Document> {document}
	<Query> {query}{optional_instruction} ??
	```

	* Use left padding so the last token aligns across a batch.
	* If the tokenizer has no `pad_token`, set `pad_token = eos_token`.
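
Why left padding matters can be seen on toy token ids. The sketch below uses a hypothetical `pad_batch` helper and made-up ids (it is not the tokenizer's implementation): with left padding, index `-1` of every row is the real final prompt token, whereas right padding puts a pad token there for shorter rows.

```python
def pad_batch(sequences: list[list[int]], pad_id: int, side: str = "left"):
    """Pad token-id lists to equal length; returns (input_ids, attention_mask)."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (width - len(seq))
        pad_mask = [0] * len(padding)
        real_mask = [1] * len(seq)
        if side == "left":
            input_ids.append(padding + seq)
            attention_mask.append(pad_mask + real_mask)
        else:
            input_ids.append(seq + padding)
            attention_mask.append(real_mask + pad_mask)
    return input_ids, attention_mask


PAD = 0  # hypothetical pad id
batch = [[11, 12, 13, 14], [21, 22]]  # made-up ids; 14 and 22 end each prompt

left_ids, left_mask = pad_batch(batch, PAD, side="left")
right_ids, _ = pad_batch(batch, PAD, side="right")

print([row[-1] for row in left_ids])   # [14, 22]
print([row[-1] for row in right_ids])  # [14, 0]
```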

	---

	## Updated Usage

	Below are drop-in examples for the converted model. These mirror the original card’s behavior but through SequenceClassification.

	### Updated Sentence Transformers Usage (CrossEncoder)

	```python
	from sentence_transformers import CrossEncoder

	MODEL_ID = "sigridjineth/ctxl-rerank-v2-1b-seq-cls" # or local folder

	def format_prompts(query: str, instruction: str, docs: list[str]) -> list[str]:
	    inst = f" {instruction}" if instruction else ""
	    return [
	        "Check whether a given document contains information helpful to answer the query.\n"
	        f"<Document> {d}\n"
	        f"<Query> {query}{inst} ??"
	        for d in docs
	    ]

	query = "Which is a domestic animal?"
	docs = ["Cats are pets.", "The moon is made of cheese.", "Dogs are loyal companions."]

	ce = CrossEncoder(MODEL_ID, max_length=8192)

	# Ensure original padding behavior
	if ce.tokenizer.pad_token is None:
	    ce.tokenizer.pad_token = ce.tokenizer.eos_token
	ce.tokenizer.padding_side = "left"

	prompts = format_prompts(query, "", docs)
	scores = ce.predict(prompts) # one logit per doc (higher = more relevant)

	ranked = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)
	for s, d in ranked:
	    print(f"{s:.4f} | {d}")
	```

	### Updated Transformers Usage (SequenceClassification)

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	MODEL_ID = "sigridjineth/ctxl-rerank-v2-1b-seq-cls" # or local folder
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

	def format_prompts(query: str, instruction: str, docs: list[str]) -> list[str]:
	    inst = f" {instruction}" if instruction else ""
	    return [
	        "Check whether a given document contains information helpful to answer the query.\n"
	        f"<Document> {d}\n"
	        f"<Query> {query}{inst} ??"
	        for d in docs
	    ]

	tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
	if tok.pad_token is None:
	    tok.pad_token = tok.eos_token
	tok.padding_side = "left"

	model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device).eval()

	query = "Which is a domestic animal?"
	docs = ["Cats are pets.", "The moon is made of cheese."]
	prompts = format_prompts(query, "", docs)

	enc = tok(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
	with torch.no_grad():
	    logits = model(**enc).logits.squeeze(-1) # [batch]
	# Optional: exact parity rounding with original BF16 readout
	scores = logits.to(torch.bfloat16).float().cpu().tolist()

	ranked = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)
	for s, d in ranked:
	    print(f"{s:.4f} | {d}")
	```

	> Note on parity: Casting the output logit to bf16 then back to float matches the original card’s BF16 rounding step.
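
For intuition about what that round trip does to a score: bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a float32. The sketch below emulates the conversion in plain Python with round-to-nearest-even, for finite values only. `bf16_round_trip` is a hypothetical helper for illustration; in practice the `.to(torch.bfloat16).float()` call shown above is what you would use.

```python
import struct


def bf16_round_trip(x: float) -> float:
    """Round a finite value to the nearest bfloat16 and return it as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    # Round to nearest, ties to even, on the 16 bits that bfloat16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(bf16_round_trip(-2.5))     # -2.5 (exactly representable)
print(bf16_round_trip(3.14159))  # 3.140625
print(bf16_round_trip(3.1416))   # 3.140625
```

Scores that differ by less than the bfloat16 resolution collapse to the same value, as the last two lines show, so the round trip can introduce ties that the unrounded logits would not have.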

	---

	## (Reference) Original Transformers Usage (CausalLM)

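	If you prefer to call the original model directly, take the `vocab_id=0` logit at the last token position, `logits[:, -1, 0]` (equivalently `next_logits[:, 0]` with `next_logits = logits[:, -1, :]`), as specified in the base card.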

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	BASE_ID = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b"
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

	def format_prompts(q: str, inst: str, docs: list[str]) -> list[str]:
	    inst = f" {inst}" if inst else ""
	    return [
	        "Check whether a given document contains information helpful to answer the query.\n"
	        f"<Document> {d}\n"
	        f"<Query> {q}{inst} ??"
	        for d in docs
	    ]

	tok = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
	if tok.pad_token is None:
	    tok.pad_token = tok.eos_token
	tok.padding_side = "left"

	lm = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=dtype).to(device).eval()

	docs = ["Cats are pets.", "The moon is made of cheese."]
	prompts = format_prompts("Which is a domestic animal?", "", docs)
	enc = tok(prompts, return_tensors="pt", padding=True, truncation=True).to(device)

	with torch.no_grad():
	    out = lm(**enc).logits[:, -1, :] # [batch, vocab]
	scores = out[:, 0].to(torch.bfloat16).float().cpu().tolist()

	for s, d in sorted(zip(scores, docs), key=lambda x: x[0], reverse=True):
	    print(f"{s:.4f} | {d}")
	```

	---

	## Conversion Details

	* Architecture: `Qwen3ForSequenceClassification(num_labels=1)`
	* Head initialization:
	  * `score.weight ← lm_head.weight[0]` (row for `vocab_id=0`)
	  * `score.bias ← 0` (or the corresponding bias term if present in LM head)
	* Tokenizer/Config:
	  * Ensure `pad_token` exists (`pad_token = eos_token` if missing)
	  * Set `padding_side="left"`
	  * Propagate `pad/eos/bos` IDs into the model `config` for correct batching
	* Parity check:
	  * Verified that `SequenceClassification` logit ≡ original `next_logits[:, 0]`
	  * Optional BF16 round-trip on the score for exact rounding parity
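
A parity check of this kind could be reproduced along the following lines. This is a sketch with a hypothetical `check_parity` helper and made-up scores, not the script used for the conversion; in practice the two lists would come from running the CausalLM and SequenceClassification snippets on identical prompts.

```python
def check_parity(original: list[float], converted: list[float], atol: float = 0.0):
    """Return (within_tolerance, max_abs_diff, same_ranking) for two score lists."""
    if len(original) != len(converted):
        raise ValueError("score lists must have the same length")
    max_abs_diff = max((abs(a - b) for a, b in zip(original, converted)), default=0.0)

    def ranking(scores: list[float]) -> list[int]:
        # Document indices by descending score; sorted() is stable, so ties keep input order.
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    same_ranking = ranking(original) == ranking(converted)
    return max_abs_diff <= atol, max_abs_diff, same_ranking


# Made-up scores standing in for the two models' outputs on the same prompts.
original_scores = [2.5, -1.25, 0.75]
converted_scores = [2.5, -1.25, 0.75]

ok, gap, same_ranking = check_parity(original_scores, converted_scores)
print(ok, gap, same_ranking)  # True 0.0 True
```

Checking the ranking separately from the numeric gap is a deliberate choice: a reranker can tolerate tiny numeric drift as long as the document order is unchanged, so a nonzero `atol` paired with `same_ranking` gives a looser but still useful test.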

	---

	## Intended Use & Limitations

	* Use: Document reranking for search/QA/multilingual scenarios; batch scoring of `(query, document)` prompts.
	* Not for: Open-ended generation; the model emits a single score per input.
	* License constraints: Non-commercial & Share-Alike. If you redistribute derivatives, include attribution and the same license.
	* Bias & safety: Inherits all limitations and potential biases of the base model; evaluate before deployment.

	---

	## Requirements

	* Transformers ≥ 4.51.0
	* PyTorch with BF16 support recommended on GPU
	* Long inputs: set `max_length` accordingly (up to the inherited context window)
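
A quick way to confirm the Transformers requirement before loading the model is sketched below. `version_tuple` and `meets_minimum` are hypothetical helpers that compare only the numeric release components, so pre-release suffixes such as `.dev0` are ignored; the `packaging` library offers a stricter comparison if you have it installed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Leading numeric release components, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, required: str = "4.51.0") -> bool:
    """True if `installed` is at least `required` (numeric components only)."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.51' and '4.51.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("transformers")
    print(f"transformers {installed}: ok = {meets_minimum(installed)}")
except PackageNotFoundError:
    print("transformers is not installed")
```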

	---

	## Citation

	If you use this converted model, please cite the original work:

	```bibtex
	@misc{ctxl_rerank_v2_instruct_multilingual,
	title = {Contextual AI Reranker v2},
	author = {George Halal and Sheshansh Agrawal and Bo Han and Arnav Palkhiwala},
	year = {2025},
	url = {https://contextual.ai/blog/rerank-v2}
	}
	```

	---

	## License

	This repository follows the original Creative Commons Attribution Non Commercial Share Alike 4.0 (CC-BY-NC-SA-4.0) license.
	You must provide attribution, may not use it commercially, and must distribute derivatives under the same license.

	---

	## Acknowledgements

	All modeling, training, and evaluation credit goes to Contextual AI for the original `ctxl-rerank-v2` family.
	This repository provides a compatibility conversion to a single-logit `SequenceClassification` interface for easier integration and deployment.