---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- codeqwen
- Qwen-Coder
- Qwen2.5-Coder-14B-Qiskit
---

# Qwen-2.5-Coder-14B-Qiskit

## Introduction

Qwen-2.5-Coder-14B-Qiskit is a model specialized in Qiskit coding, built on the code-specific Qwen family of large language models. In particular, it is based on the 14-billion-parameter Qwen2.5-Coder model. The model has been trained with **Qiskit version 2.0**, ensuring compatibility with its APIs and syntax.

Main improvements over previous models specialized in Qiskit code:

- Significant improvements in **code generation**, **code reasoning**, and **code fixing**.
- A more comprehensive foundation for real-world applications such as **Code Agents**, enhancing coding capabilities while maintaining strengths in mathematics and general competencies.
- **Long-context support** up to 128K tokens.

The model **Qwen-2.5-Coder-14B-Qiskit** has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 14.7B
- Number of Parameters (Non-Embedding): 13.1B
- Number of Layers: 48
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: Full 131,072 tokens

## Requirements

Qwen-2.5-Coder-14B-Qiskit is compatible with the latest Hugging Face `transformers`, and we advise you to use the latest version. With `transformers<4.37.0`, you will encounter the following error:

```
KeyError: 'qwen2'
```

## Quickstart

Here we provide a code snippet with `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qiskit/Qwen-2.5-Coder-14B-Qiskit"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Generate a random quantum circuit with 5 qubits."
messages = [
    {"role": "system", "content": "You are Qiskit Code Assistant. You are a helpful coding assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

### Processing Long Texts

The current `config.json` is set for a context length of up to 32,768 tokens. To handle inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.

For supported frameworks, you can add the following to `config.json` to enable YaRN:

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

For deployment, we recommend using vLLM. Please refer to the [Qwen Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
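As a starting point, a minimal vLLM offline-inference sketch along the following lines should work; the sampling settings and `max_model_len` shown here are illustrative assumptions, not recommended values, so adjust them to your hardware and use case.

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_name = "Qiskit/Qwen-2.5-Coder-14B-Qiskit"

# Build the chat prompt with the same template used in the Quickstart above.
tokenizer = AutoTokenizer.from_pretrained(model_name)
messages = [
    {"role": "system", "content": "You are Qiskit Code Assistant. You are a helpful coding assistant."},
    {"role": "user", "content": "Generate a random quantum circuit with 5 qubits."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Load the model with vLLM; max_model_len and tensor_parallel_size are
# example settings, tune them to your GPUs and expected input lengths.
llm = LLM(model=model_name, max_model_len=32768, tensor_parallel_size=1)

outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0].outputs[0].text)
```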
Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.

### Comparison of Qiskit models across benchmarks
## Training Data

- **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on