---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: text-ranking
tags:
- reranker
- sequence-classification
- qwen3
- multilingual
- bfloat16
- 32k
base_model: ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b
model_type: qwen3
---

# Contextual AI Reranker v2 1B: **SequenceClassification (single-logit) Converted Model**

This repository contains a **drop-in SequenceClassification** version of the original **ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b**.
It exposes a **single logit per input** (one score) that is **numerically equivalent** to the original model’s last-token **`vocab_id=0`** logit (`next_logits[:, 0]`, where `next_logits` holds the logits at the final position). You can therefore use standard **text-classification/CrossEncoder** tooling for fast, simple reranking, with no custom logits processors, while preserving the original scores and ranking order.

> **What changed?** We copy the LM head’s **row 0** vector into a 1-logit classification head (`score.weight ← lm_head.weight[0]`), set the bias to 0 (or to the matching bias row if the LM head has one), and keep tokenizer/padding behavior aligned with the original. Result: `SequenceClassification` output ≡ original `next_logits[:, 0]`.

---

## Highlights

* **Parity with the original**: The score from this model equals the original **ID=0** logit at the last token position, provided you use the same prompt template and left padding.
* **Frictionless integration**: Works out of the box with **Sentence-Transformers CrossEncoder** and the standard **Transformers** classification interfaces.
* **Fast & memory-light**: Computes a single logit (a `hidden_size × 1` projection) instead of a full vocabulary projection.
* **Multilingual** and long-context (capabilities inherited from the base reranker).

---

## Model Overview

* **Type**: Text Reranking (single-logit SequenceClassification)
* **Base**: `ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b` (Qwen3 CausalLM)
* **Languages**: 100+ (inherited)
* **Params**: ~1B (inherited)
* **Context Length**: up to 32K (inherited)
* **Scoring definition**: single logit ≡ original `next_logits[:, 0]`

---

## Input Formatting (keep this template)

```text
Check whether a given document contains information helpful to answer the query.
<Document> {document}
<Query> {query}{optional_instruction} ??
```

* Use **left padding** so the **last token** aligns across a batch.
* If the tokenizer has no `pad_token`, set `pad_token = eos_token`.
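
To illustrate why left padding matters, here is a minimal, dependency-free sketch. It pads toy token-ID lists by hand instead of calling the real tokenizer; `PAD_ID`, `pad_batch`, and the IDs themselves are made up for the example. With left padding, the final column of the batch always holds each sequence's real last token, which is the position the score is read from.

```python
PAD_ID = 0  # hypothetical pad token ID, for illustration only

def pad_batch(seqs: list[list[int]], side: str = "left") -> list[list[int]]:
    """Pad every sequence to the longest length on the given side."""
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [PAD_ID] * (width - len(s))
        padded.append(fill + s if side == "left" else s + fill)
    return padded

batch = [[11, 12, 13, 99], [21, 99]]  # 99 stands in for the final "??" token

left = pad_batch(batch, side="left")
right = pad_batch(batch, side="right")

# With left padding, index -1 is the real last token of every row.
print([row[-1] for row in left])   # [99, 99]
# With right padding, the shorter row ends in padding instead.
print([row[-1] for row in right])  # [99, 0]
```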

---

## Updated Usage

Below are **drop-in** examples for the converted model. They mirror the original card’s behavior, but go through **SequenceClassification**.

### Updated Sentence Transformers Usage (CrossEncoder)

```python
from sentence_transformers import CrossEncoder

MODEL_ID = "sigridjineth/ctxl-rerank-v2-1b-seq-cls"  # or local folder

def format_prompts(query: str, instruction: str, docs: list[str]) -> list[str]:
    inst = f" {instruction}" if instruction else ""
    return [
        "Check whether a given document contains information helpful to answer the query.\n"
        f"<Document> {d}\n"
        f"<Query> {query}{inst} ??"
        for d in docs
    ]

query = "Which is a domestic animal?"
docs = ["Cats are pets.", "The moon is made of cheese.", "Dogs are loyal companions."]

ce = CrossEncoder(MODEL_ID, max_length=8192)

# Ensure original padding behavior
if ce.tokenizer.pad_token is None:
    ce.tokenizer.pad_token = ce.tokenizer.eos_token
ce.tokenizer.padding_side = "left"

prompts = format_prompts(query, "", docs)
scores = ce.predict(prompts)  # one logit per doc (higher = more relevant)

ranked = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)
for s, d in ranked:
    print(f"{s:.4f} | {d}")
```

### Updated Transformers Usage (SequenceClassification)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "sigridjineth/ctxl-rerank-v2-1b-seq-cls"  # or local folder
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

def format_prompts(query: str, instruction: str, docs: list[str]) -> list[str]:
    inst = f" {instruction}" if instruction else ""
    return [
        "Check whether a given document contains information helpful to answer the query.\n"
        f"<Document> {d}\n"
        f"<Query> {query}{inst} ??"
        for d in docs
    ]

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
tok.padding_side = "left"

model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device).eval()

query = "Which is a domestic animal?"
docs = ["Cats are pets.", "The moon is made of cheese."]
prompts = format_prompts(query, "", docs)

enc = tok(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
with torch.no_grad():
    logits = model(**enc).logits.squeeze(-1)  # [batch]
    # Optional: exact parity rounding with original BF16 readout
    scores = logits.to(torch.bfloat16).float().cpu().tolist()

ranked = sorted(zip(scores, docs), key=lambda x: x[0], reverse=True)
for s, d in ranked:
    print(f"{s:.4f} | {d}")
```

> **Note on parity**: Casting the output logit to **bf16 and then back to float** matches the original card’s BF16 rounding step.
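
To make the effect of that round trip concrete, the sketch below emulates a float32 → bfloat16 → float32 cast with the standard library only. It assumes round-to-nearest-even (which is our understanding of how PyTorch performs this cast) and handles finite values only; `bf16_round_trip` is a hypothetical helper, not part of any library.

```python
import struct

def bf16_round_trip(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value and return it as a float.

    Assumes round-to-nearest-even; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                   # round to nearest, ties to even
    bits &= 0xFFFF0000                                    # keep only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(bf16_round_trip(3.14159))  # 3.140625
print(bf16_round_trip(3.14))     # 3.140625
```

Because bfloat16 keeps only 8 bits of precision, two nearby scores can collapse to the same value, so exact ties between documents are possible after the round trip.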

---

## (Reference) Original Transformers Usage (CausalLM)

If you prefer to call the original model directly, compute `next_logits[:, 0]`, where `next_logits = logits[:, -1, :]`, as specified in the base card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_ID = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

def format_prompts(q: str, inst: str, docs: list[str]) -> list[str]:
    inst = f" {inst}" if inst else ""
    return [
        "Check whether a given document contains information helpful to answer the query.\n"
        f"<Document> {d}\n"
        f"<Query> {q}{inst} ??"
        for d in docs
    ]

tok = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
tok.padding_side = "left"

lm = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=dtype).to(device).eval()

docs = ["Cats are pets.", "The moon is made of cheese."]
prompts = format_prompts("Which is a domestic animal?", "", docs)
enc = tok(prompts, return_tensors="pt", padding=True, truncation=True).to(device)

with torch.no_grad():
    out = lm(**enc).logits[:, -1, :]  # [batch, vocab]
    scores = out[:, 0].to(torch.bfloat16).float().cpu().tolist()

for s, d in sorted(zip(scores, docs), key=lambda x: x[0], reverse=True):
    print(f"{s:.4f} | {d}")
```

---

## Conversion Details

* **Architecture**: `Qwen3ForSequenceClassification(num_labels=1)`
* **Head initialization**:
  * `score.weight ← lm_head.weight[0]` (row for `vocab_id=0`)
  * `score.bias ← 0` (or the corresponding bias term if present in LM head)
* **Tokenizer/Config**:
  * Ensure `pad_token` exists (`pad_token = eos_token` if missing)
  * Set `padding_side="left"`
  * Propagate `pad/eos/bos` IDs into the model `config` for correct batching
* **Parity check**:
  * Verified that `SequenceClassification` logit ≡ original `next_logits[:, 0]`
  * Optional BF16 round-trip on the score for exact rounding parity
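
The head initialization above can be illustrated with a toy, dependency-free sketch. This is not the actual conversion script: it uses plain Python lists in place of tensors, made-up sizes (`HIDDEN`, `VOCAB`), random weights, and no bias. It only shows that a 1-logit head holding row 0 of the LM head reproduces entry 0 of the full vocabulary projection, at a fraction of the multiply-adds.

```python
import random

random.seed(0)
HIDDEN, VOCAB = 8, 50  # toy sizes, far smaller than the real model

# Toy LM head: one weight row per vocabulary entry (no bias).
lm_head = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
# Toy final hidden state at the last token position.
h_last = [random.uniform(-1, 1) for _ in range(HIDDEN)]

def dot(row: list[float], vec: list[float]) -> float:
    return sum(w * x for w, x in zip(row, vec))

# Original readout: project onto the whole vocabulary, then keep entry 0.
next_logits = [dot(row, h_last) for row in lm_head]
original_score = next_logits[0]

# Converted readout: the 1-logit head holds only row 0 of the LM head.
score_weight = list(lm_head[0])
converted_score = dot(score_weight, h_last)

assert converted_score == original_score
print(f"multiply-adds: full head = {VOCAB * HIDDEN}, single-logit head = {HIDDEN}")
```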

---

## Intended Use & Limitations

* **Use**: Document reranking for search/QA/multilingual scenarios; batch scoring of `(query, document)` prompts.
* **Not for**: Open-ended generation; the model emits a **single score** per input.
* **License constraints**: Non-commercial & Share-Alike. If you redistribute derivatives, include attribution and the same license.
* **Bias & safety**: Inherits all limitations and potential biases of the base model; evaluate before deployment.

---

## Requirements

* **Transformers** ≥ 4.51.0
* **PyTorch** with BF16 support recommended on GPU
* Long inputs: set `max_length` accordingly (up to the inherited context window)
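
One thing to watch with long inputs: Hugging Face tokenizers truncate from the right by default, and in this template the query and the trailing `??` sit at the end of the prompt. A possible workaround, sketched below under loud assumptions, is to trim the document text before filling the template. `build_prompt` and `max_doc_words` are hypothetical names, and whitespace-separated words are only a crude stand-in for tokens, so leave headroom below your real `max_length`.

```python
TEMPLATE = (
    "Check whether a given document contains information helpful to answer the query.\n"
    "<Document> {document}\n"
    "<Query> {query}{instruction} ??"
)

def build_prompt(query: str, document: str, instruction: str = "",
                 max_doc_words: int = 4000) -> str:
    """Trim the document (not the prompt tail) to a word budget, then fill the template."""
    words = document.split()
    if len(words) > max_doc_words:
        document = " ".join(words[:max_doc_words])
    inst = f" {instruction}" if instruction else ""
    return TEMPLATE.format(document=document, query=query, instruction=inst)

long_doc = "cat " * 10_000
prompt = build_prompt("Which is a domestic animal?", long_doc, max_doc_words=50)
print(prompt.endswith("??"))  # True
```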

---

## Citation

If you use this converted model, please cite the original work:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
  title  = {Contextual AI Reranker v2},
  author = {George Halal and Sheshansh Agrawal and Bo Han and Arnav Palkhiwala},
  year   = {2025},
  url    = {https://contextual.ai/blog/rerank-v2}
}
```

---

## License

This repository follows the original **Creative Commons Attribution Non Commercial Share Alike 4.0 (CC-BY-NC-SA-4.0)** license.
You **must** provide attribution, **may not** use it commercially, and **must** distribute derivatives under the same license.

---

## Acknowledgements

All modeling, training, and evaluation credit goes to **Contextual AI** for the original `ctxl-rerank-v2` family.
This repository provides a **compatibility conversion** to a single-logit `SequenceClassification` interface for easier integration and deployment.