# WeDLM-8B

WeDLM-8B is a diffusion language model that performs parallel decoding under standard causal attention, initialized from Qwen3-8B.

This is the base (pretrained) version. For the instruction-tuned version, see WeDLM-8B-Instruct.

📄 Paper (Coming Soon) | 🌐 Project Page | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute        | Value    |
|------------------|----------|
| Initialized From | Qwen3-8B |
| Parameters       | 8B       |
| Context Length   | 32,768   |

## Quick Start (Recommended)

For fast inference, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

# Load the model into the wedlm inference engine.
llm = LLM(model="tencent/WeDLM-8B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))

print(outputs[0]["text"])
```
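
The engine exposes a vLLM-style `LLM`/`SamplingParams` API. As a sketch, assuming the `SamplingParams` fields follow that convention (an assumption, not confirmed by this card), batched generation with tuned sampling would look like:

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-8B")

# Assumed fields: temperature/top_p are not documented here and mirror
# vLLM's SamplingParams by analogy -- check the WeDLM repo for the real API.
params = SamplingParams(
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
)

prompts = [
    "The theory of relativity states that",
    "In number theory, a prime number is",
]
outputs = llm.generate(prompts, params)  # one output per prompt
for out in outputs:
    print(out["text"])
```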

## HuggingFace Transformers

For training or simple forward passes, you can load via Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)  # forward pass; outputs.logits: [batch, seq_len, vocab_size]
```

> ⚠️ **Note:** The HuggingFace interface is intended for training and simple forward passes. For optimized inference throughput, use the `wedlm` engine above.
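
Continuing the example above, a training-style forward pass would typically follow the standard Transformers pattern of passing `labels` to obtain a loss. This is a generic causal-LM sketch under that assumption; the custom model code loaded via `trust_remote_code` may define its own diffusion objective, so consult the WeDLM repository for the actual training recipe:

```python
import torch

# Assumption: the custom model class accepts `labels` like standard
# Transformers causal LMs. Its real diffusion training objective may differ.
batch = tokenizer(
    ["The theory of relativity states that energy and mass are equivalent."],
    return_tensors="pt",
).to(model.device)

out = model(**batch, labels=batch["input_ids"])
loss = out.loss
loss.backward()  # gradients ready for an optimizer step
```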

## Performance

| Benchmark          | Qwen3-8B | WeDLM-8B |
|--------------------|----------|----------|
| ARC-C (0-shot)     | 92.66    | 92.92    |
| GSM8K (3-shot)     | 85.97    | 90.20    |
| MATH (4-shot)      | 50.80    | 53.60    |
| HumanEval (4-shot) | 68.90    | 75.00    |
| MMLU (5-shot)      | 74.03    | 75.46    |
| **Average**        | 72.61    | 74.72    |

## Citation

Coming soon.

## License

Apache 2.0
