---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: text-generation
library_name: transformers
tags:
- chat
---

# Qwen-2.5-7B-ConsistentChat

## 1. Introduction

**Qwen-2.5-7B-ConsistentChat** is a 7B instruction-tuned chat model focused on *multi-turn consistency*. It is fine-tuned from the **Qwen/Qwen2.5-7B** base model on the **ConsistentChat** dataset, which is built with a *skeleton-guided* pipeline that explicitly models human conversational intent to reduce topic drift and improve goal completion in long dialogues.

The dataset contains ~15K multi-turn conversations and ~224K utterances. Compared with generic SFT data, ConsistentChat emphasizes cross-turn consistency: it first models one of nine conversation intent trajectories, then generates a query "skeleton," and finally fills in responses, leading to substantially better consistency and task success on the Light, TopDial, and MT-Eval benchmarks.

**This repo contains the instruction-tuned 7B ConsistentChat model**, with the following base specs inherited from Qwen2.5-7B:

* Type: Causal Language Model
* Training Stage: Pretraining + Supervised Fine-Tuning (this repo)
* Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
* Parameters: 7.61B (6.53B non-embedding)
* Layers: 28
* Attention Heads (GQA): 28 for Q, 4 for KV
* Context Length: up to 131,072 tokens

---

## 2. What makes "ConsistentChat" different?

* **Skeleton-Guided Multi-Turn Synthesis.** Conversations are generated by first modeling human intent and information flow, then building a query skeleton, and finally generating responses. This reduces topic drift across turns.
* **Nine intent trajectories.** The dataset covers nine common interaction patterns (summarized from real conversational datasets, e.g., problem-solving and educational tutoring), each with curated information-flow rules that enforce global coherence.
* **Empirically better consistency & success.** Fine-tuning with ConsistentChat yields **20–30%** consistency gains and up to **+15%** task-success improvements on multi-turn benchmarks vs. common SFT datasets (ShareGPT, ChatAlpaca, UltraChat, LMSYS-Chat, etc.).

Representative MT-Eval (judge: Qwen-2.5-72B-Instruct) results:

* **Qwen-2.5-7B-ConsistentChat:** ST avg **8.07**, MT avg **8.38**
* Outperforms variants trained on ShareGPT / UltraChat / LMSYS-Chat in both single-turn and multi-turn settings. The improvements on commonsense tasks are more pronounced, with particularly strong gains in reasoning (↑ 30.4%) and coding (↑ 33.7%).

---

## 3. Requirements

We recommend the latest **transformers**. Using `transformers < 4.37.0` will raise:

```
KeyError: 'qwen2'
```

---

## 4. Quickstart

The snippet below shows how to load the tokenizer and model and chat via `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jiawei-ucas/Qwen-2.5-7B-ConsistentChat"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Plan a weekend trip to Kyoto, and remember I’m vegetarian."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
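Since the model targets multi-turn consistency, the natural next step is to keep the conversation going by appending turns to `messages`. The following is a minimal sketch reusing `model`, `tokenizer`, `messages`, and `response` from the quickstart above; the follow-up prompt and the `follow_up` variable are just illustrations, not part of any fixed API.

```python
# Multi-turn continuation: append the assistant's reply, then add a new user turn.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Great. Now keep the whole food budget under 100 USD."})

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
follow_up = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# A consistency-tuned model should still honor the vegetarian constraint here.
```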
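For interactive local testing you may prefer streamed output. This is a small sketch using the built-in `TextStreamer` from `transformers` (nothing model-specific; any checkpoint that works with `generate` can stream this way), reusing `model`, `tokenizer`, and `model_inputs` from the quickstart.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, streamer=streamer, max_new_tokens=512)
```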
### vLLM

We recommend using `vLLM` for more efficient inference.

```bash
pip install vllm==0.8.5.post1
vllm serve /path/to/Qwen-2.5-7B-ConsistentChat --port 8080 --served-model-name Qwen-2.5-7B-ConsistentChat
```

**API endpoint**

By default, the OpenAI-compatible server is exposed at: `http://<your-server-ip>:8080/v1`

**Use with [Open-WebUI](https://github.com/open-webui/open-webui) (recommended for visual chat)**

1. Launch Open-WebUI.
2. Go to **Settings → Connections / Providers → Add OpenAI-Compatible**.
3. Set **Base URL** to `http://<your-server-ip>:8080/v1`.
4. Set **API Key** to any non-empty string (e.g., `sk-local`).

You can also test via cURL:

```bash
curl http://<your-server-ip>:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-local" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen-2.5-7B-ConsistentChat",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ],
    "max_tokens": 128
  }'
```

---

## 5. Intended Use

* **Use cases:** Assistants requiring stable behavior across many turns (planning, tutoring, troubleshooting, role-play with a consistent persona or constraints).
* **Out-of-scope:** Safety-critical advice, legal/medical counsel, or contexts where factual guarantees and up-to-date knowledge are required.

---

## Citation

If you find our work helpful, feel free to cite us.

```bibtex
@misc{chen2025consistentchat,
      title={ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch},
      author={Jiawei Chen and Xinyan Guan and Qianhao Yuan and Guozhao Mo and Weixiang Zhou and Yaojie Lu and Hongyu Lin and Ben He and Le Sun and Xianpei Han},
      year={2025},
      eprint={2506.03558},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.03558},
}
```