Improve model card for Hunyuan-MT: Add pipeline tag, license, paper abstract, and comprehensive content from GitHub

#14
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +369 -26
README.md CHANGED
@@ -1,7 +1,4 @@
  ---
- library_name: transformers
- tags:
- - translation
  language:
  - zh
  - en
@@ -39,8 +36,18 @@ language:
  - kk
  - mn
  - ug
  ---

  <p align="center">
  <img src="https://dscache.tencent-cloud.cn/upload/uploader/hunyuan-64b418fd052c033b228e04bc77bbc4b54fd7f5bc.png" width="400"/> <br>
@@ -49,38 +56,48 @@ language:
  <p align="center">
  🤗&nbsp;<a href="https://huggingface.co/collections/tencent/hunyuan-mt-68b42f76d473f82798882597"><b>Hugging Face</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
- 🤖&nbsp;<a href="https://modelscope.cn/collections/Hunyuan-MT-2ca6b8e1b4934f"><b>ModelScope</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
  </p>

  <p align="center">
- 🖥️&nbsp;<a href="https://hunyuan.tencent.com"><b>Official Website</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
- 🕹️&nbsp;<a href="https://hunyuan.tencent.com/modelSquare/home/list"><b>Demo</b></a>&nbsp;&nbsp;&nbsp;&nbsp;
  </p>

  <p align="center">
- <a href="https://github.com/Tencent-Hunyuan/Hunyuan-MT"><b>GITHUB</b></a>
  </p>

  ## Model Introduction

- The Hunyuan Translation Model comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model is used to translate source text into the target language, while the ensemble model integrates multiple translation outputs to produce a higher-quality result. It primarily supports mutual translation among 33 languages, including five ethnic minority languages in China.

- ### Key Features and Advantages

  - In the WMT25 competition, the model achieved first place in 30 out of the 31 language categories it participated in.
  - Hunyuan-MT-7B achieves industry-leading performance among models of comparable scale
  - Hunyuan-MT-Chimera-7B is the industry’s first open-source translation ensemble model, elevating translation quality to a new level
- - A comprehensive training framework for translation models has been proposed, spanning from pretrain → cross-lingual pretraining (CPT) → supervised fine-tuning (SFT) → translation enhancement → ensemble refinement, achieving state-of-the-art (SOTA) results for models of similar size

  ## Related News
  * 2025.9.1 We have open-sourced **Hunyuan-MT-7B**, **Hunyuan-MT-Chimera-7B** on Hugging Face.
  <br>

  &nbsp;

- ## 模型链接
  | Model Name | Description | Download |
  | ----------- | ----------- |-----------
  | Hunyuan-MT-7B | Hunyuan 7B translation model |🤗 [Model](https://huggingface.co/tencent/Hunyuan-MT-7B)|
@@ -91,42 +108,35 @@ The Hunyuan Translation Model comprises a translation model, Hunyuan-MT-7B, and
  ## Prompts

  ### Prompt Template for ZH<=>XX Translation.
-
  ```
-
  把下面的文本翻译成<target_language>,不要额外解释。

  <source_text>
-
  ```

- ### Prompt Template for XX<=>XX Translation, excluding ZH<=>XX.
  ```
-
  Translate the following segment into <target_language>, without additional explanation.

  <source_text>
-
  ```

- ### Prompt Template for Hunyuan-MT-Chmeria-7B
  ```
-
  Analyze the following multiple <target_language> translations of the <source_language> segment surrounded in triple backticks and generate a single refined <target_language> translation. Only output the refined translation, do not explain.

  The <source_language> segment:
  ```<source_text>```

- The multiple <target_language> translations:
  1. ```<translated_text1>```
  2. ```<translated_text2>```
  3. ```<translated_text3>```
  4. ```<translated_text4>```
  5. ```<translated_text5>```
  6. ```<translated_text6>```
-
  ```

  &nbsp;
@@ -134,13 +144,13 @@ The multiple <target_language> translations:
  ### Use with transformers
  First, install transformers; we recommend v4.56.0.
  ```SHELL
- pip install transformers==v4.56.0
  ```

- The following code snippet shows how to use the transformers library to load and apply the model.
-
  *!!! To load the FP8 model with transformers, rename "ignored_layers" to "ignore" in config.json and upgrade compressed-tensors to v0.11.0.*

  We use tencent/Hunyuan-MT-7B as an example.

  ```python
@@ -152,7 +162,9 @@ model_name_or_path = "tencent/Hunyuan-MT-7B"
  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here
  messages = [
- {"role": "user", "content": "Translate the following segment into Chinese, without additional explanation.\n\nIt’s on the house."},
  ]
  tokenized_chat = tokenizer.apply_chat_template(
  messages,
@@ -176,6 +188,8 @@ We recommend using the following set of parameters for inference. Note that our
  }
  ```

  Supported languages:
  | Languages | Abbr. | Chinese Names |
  |-------------------|---------|-----------------|
@@ -219,6 +233,331 @@ Supported languages:
  | Cantonese | yue | 粤语 |

  Citing Hunyuan-MT:

  ```bibtex
@@ -231,4 +570,8 @@ Citing Hunyuan-MT:
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.05209},
  }
- ```
  ---
  language:
  - zh
  - en

  - kk
  - mn
  - ug
+ library_name: transformers
+ pipeline_tag: translation
+ license: apache-2.0
  ---

+ # Hunyuan-MT Technical Report
+
+ The model was presented in the paper [Hunyuan-MT Technical Report](https://arxiv.org/abs/2509.05209).
+
+ ## Paper Abstract
+
+ In this report, we introduce Hunyuan-MT-7B, our first open-source multilingual translation model, which supports bidirectional translation across 33 major languages and places a special emphasis on translation between Mandarin and several ethnic minority languages as well as dialects. Furthermore, to serve and address diverse translation scenarios and enhance model performance at test time, we introduce Hunyuan-MT-Chimera-7B, a translation model inspired by the slow thinking mode. This model integrates multiple outputs generated by the Hunyuan-MT-7B model under varying parameter settings, thereby achieving performance superior to that of conventional slow-thinking models based on Chain-of-Thought (CoT). The development of our models follows a holistic training process specifically engineered for multilingual translation, which begins with general and MT-oriented pre-training to build foundational capabilities, proceeds to Supervised Fine-Tuning (SFT) for task-specific adaptation, and culminates in advanced alignment through Reinforcement Learning (RL) and weak-to-strong RL. Through comprehensive experimentation, we demonstrate that both Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B significantly outperform all translation-specific models of comparable parameter size and most of the SOTA large models, particularly on the task of translation between Mandarin and minority languages as well as dialects. In the WMT2025 shared task (General Machine Translation), our models demonstrate state-of-the-art performance, ranking first in 30 out of 31 language pairs. This result highlights the robustness of our models across a diverse linguistic spectrum, encompassing high-resource languages such as Chinese, English, and Japanese, as well as low-resource languages including Czech, Marathi, Estonian, and Icelandic.
  <p align="center">
  <img src="https://dscache.tencent-cloud.cn/upload/uploader/hunyuan-64b418fd052c033b228e04bc77bbc4b54fd7f5bc.png" width="400"/> <br>

  <p align="center">
  🤗&nbsp;<a href="https://huggingface.co/collections/tencent/hunyuan-mt-68b42f76d473f82798882597"><b>Hugging Face</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
+ <img src="https://avatars.githubusercontent.com/u/109945100?s=200&v=4" width="16"/>&nbsp;<a href="https://modelscope.cn/collections/Hunyuan-MT-2ca6b8e1b4934f"><b>ModelScope</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
  </p>

  <p align="center">
+ 🖥️&nbsp;<a href="https://hunyuan.tencent.com" style="color: red;"><b>Official Website</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
+ 🕹️&nbsp;<a href="https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=hunyuan-mt-7b"><b>Demo</b></a>&nbsp;&nbsp;&nbsp;&nbsp;
  </p>

  <p align="center">
+ <a href="https://github.com/Tencent-Hunyuan/Hunyuan-MT"><b>GITHUB</b></a>&nbsp;&nbsp;|&nbsp;&nbsp;
+ <a href="https://www.arxiv.org/pdf/2509.05209"><b>Technical Report</b></a>
  </p>

  ## Model Introduction

+ Hunyuan-MT comprises a translation model, Hunyuan-MT-7B, and an ensemble model, Hunyuan-MT-Chimera. The translation model translates source text into the target language, while the ensemble model integrates multiple translation outputs to produce a higher-quality result. It primarily supports mutual translation among 33 languages, including five ethnic minority languages in China.

+ ## Key Features and Advantages

  - In the WMT25 competition, the model achieved first place in 30 out of the 31 language categories it participated in.
  - Hunyuan-MT-7B achieves industry-leading performance among models of comparable scale
  - Hunyuan-MT-Chimera-7B is the industry’s first open-source translation ensemble model, elevating translation quality to a new level
+ - A comprehensive training framework for translation models has been proposed, spanning pretrain → continued pretraining (CPT) → supervised fine-tuning (SFT) → translation RL → ensemble RL, achieving state-of-the-art (SOTA) results for models of similar size

  ## Related News
  * 2025.9.1 We have open-sourced **Hunyuan-MT-7B**, **Hunyuan-MT-Chimera-7B** on Hugging Face.
  <br>

+ ## Performance
+
+ <div align='center'>
+ <img src="imgs/overall_performance.png" width="80%" />
+ </div>
+ You can refer to our technical report for more experimental results and analysis.
+
+ <a href="https://www.arxiv.org/pdf/2509.05209"><b>Technical Report</b></a>
+
  &nbsp;

+ ## Model Links
  | Model Name | Description | Download |
  | ----------- | ----------- |-----------
  | Hunyuan-MT-7B | Hunyuan 7B translation model |🤗 [Model](https://huggingface.co/tencent/Hunyuan-MT-7B)|
 
  ## Prompts

  ### Prompt Template for ZH<=>XX Translation.
  ```
  把下面的文本翻译成<target_language>,不要额外解释。

  <source_text>
  ```

+ ### Prompt Template for XX<=>XX Translation, excluding ZH<=>XX.
  ```
  Translate the following segment into <target_language>, without additional explanation.

  <source_text>
  ```

+ ### Prompt Template for Hunyuan-MT-Chimera-7B

  ```
  Analyze the following multiple <target_language> translations of the <source_language> segment surrounded in triple backticks and generate a single refined <target_language> translation. Only output the refined translation, do not explain.

  The <source_language> segment:
  ```<source_text>```

+ The multiple `<target_language>` translations:
  1. ```<translated_text1>```
  2. ```<translated_text2>```
  3. ```<translated_text3>```
  4. ```<translated_text4>```
  5. ```<translated_text5>```
  6. ```<translated_text6>```
  ```
 
144
  ### Use with transformers
145
  First, please install transformers, recommends v4.56.0
146
  ```SHELL
147
+ pip install transformers==4.56.0
148
  ```
149
 
 
 
150
  *!!! If you want to load fp8 model with transformers, you need to change the name"ignored_layers" in config.json to "ignore" and upgrade the compressed-tensors to compressed-tensors-0.11.0.*
151
 
152
+ The following code snippet shows how to use the transformers library to load and apply the model.
153
+
154
  we use tencent/Hunyuan-MT-7B for example
155
 
156
  ```python
 
162
  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
163
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto") # You may want to use bfloat16 and/or move to GPU here
164
  messages = [
165
+ {"role": "user", "content": "Translate the following segment into Chinese, without additional explanation.
166
+
167
+ It’s on the house."},
168
  ]
169
  tokenized_chat = tokenizer.apply_chat_template(
170
  messages,
 
188
  }
189
  ```
190
 
191
+ &nbsp;
192
+
193
  Supported languages:
194
  | Languages | Abbr. | Chinese Names |
195
  |-------------------|---------|-----------------|
 
233
  | Cantonese | yue | 粤语 |
234
 
235
 
236
+ ### Training Data Format
237
+
238
+ If you need to fine-tune our Instruct model, we recommend processing the data into the following format.
239
+
240
+ ```python
241
+
242
+ messages = [
243
+ {"role": "system", "content": "You are a helpful assistant."},
244
+ {"role": "user", "content": "Why is seawater salty?" },
245
+ {"role": "assistant", "content": "Seawater is primarily saline due to dissolved salts and minerals. These substances come from the chemical materials in rocks and soil on the Earth's surface, which are carried into the ocean over time. When seawater evaporates, the water vapor leaves, but the salts and minerals remain, making the seawater saltier. Therefore, the salinity of seawater is determined by the amount of salts and minerals it contains."}
246
+ ]
247
+
248
+ from transformers import AutoTokenizer
249
+ tokenizer = AutoTokenizer.from_pretrained("your_tokenizer_path", trust_remote_code=True)
250
+ train_ids = tokenizer.apply_chat_template(messages)
251
+ ```
252
+
253
+ &nbsp;
254
+
255
+ ### Train with LLaMA-Factory
256
+
257
+ In the following chapter, we will introduce how to use `LLaMA-Factory` to fine-tune the `Hunyuan` model.
258
+
259
+ #### Prerequisites
260
+
261
+ Verify installation of the following dependencies:
262
+ - **LLaMA-Factory**: Follow [official installation guide](https://github.com/hiyouga/LLaMA-Factory)
263
+ - **DeepSpeed** (optional): Follow [official installation guide](https://github.com/deepspeedai/DeepSpeed#installation)
264
+ - **Transformer Library**: Use the companion branch (Hunyuan-submitted code is pending review)
265
+ ```
266
+ pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
267
+ ```
268
+
269
+ #### Data preparation
270
+
271
+ We need to prepare a custom dataset:
272
+ 1. Organize your data in `json` format and place it in the `data` directory in `LLaMA-Factory`. The current implementation uses the `sharegpt` dataset format, which requires the following structure:
273
+ ```
274
+ [
275
+ {
276
+ "messages": [
277
+ {
278
+ "role": "system",
279
+ "content": "System prompt (optional)"
280
+ },
281
+ {
282
+ "role": "user",
283
+ "content": "Human instruction"
284
+ },
285
+ {
286
+ "role": "assistant",
287
+ "content": "Model response"
288
+ }
289
+ ]
290
+ }
291
+ ]
292
+ ```
293
+ Refer to the [Data Format](#training-data-format) section mentioned earlier for details.
294
+
295
+ 2. Define your dataset in the data/dataset_info.json file using the following format:
296
+ ```
297
+ "dataset_name": {
298
+ "file_name": "dataset.json",
299
+ "formatting": "sharegpt",
300
+ "columns": {
301
+ "messages": "messages"
302
+ },
303
+ "tags": {
304
+ "role_tag": "role",
305
+ "content_tag": "content",
306
+ "user_tag": "user",
307
+ "assistant_tag": "assistant",
308
+ "system_tag": "system"
309
+ }
310
+ }
311
+ ```
312
+
313
+ #### Training execution
314
+
315
+ 1. Copy all files from the `llama_factory_support/example_configs` directory to the `example/hunyuan` directory in `LLaMA-Factory`.
316
+ 2. Modify the model path and dataset name in the configuration file `hunyuan_full.yaml`. Adjust other configurations as needed:
317
+ ```
318
+ ### model
319
+ model_name_or_path: [!!!add the model path here!!!]
320
+
321
+ ### dataset
322
+ dataset: [!!!add the dataset name here!!!]
323
+ ```
324
+ 3. Execute training commands:
325
+ *​​Single-node training​​
326
+ Note: Set the environment variable DISABLE_VERSION_CHECK to 1 to avoid version conflicts.
327
+ ```
328
+ export DISABLE_VERSION_CHECK=1
329
+ llamafactory-cli train examples/hunyuan/hunyuan_full.yaml
330
+ ```
331
+ *Multi-node training​​
332
+ Execute the following command on each node. Configure NNODES, NODE_RANK, MASTER_ADDR, and MASTER_PORT according to your environment:
333
+ ```
334
+ export DISABLE_VERSION_CHECK=1
335
+ FORCE_TORCHRUN=1 NNODES=${NNODES} NODE_RANK=${NODE_RANK} MASTER_ADDR=${MASTER_ADDR} MASTER_PORT=${MASTER_PORT} \
336
+ llamafactory-cli train examples/hunyuan/hunyuan_full.yaml
337
+ ```
338
+
339
+ &nbsp;
340
+
341
+
342
+ ## Quantization Compression
343
+ We used our own [AngelSlim](https://github.com/tencent/AngelSlim) compression tool to produce FP8 and INT4 quantization models. `AngelSlim` is a toolset dedicated to creating a more user-friendly, comprehensive and efficient model compression solution.
344
+
345
+ ### FP8 Quantization
346
+ We use FP8-static quantization, FP8 quantization adopts 8-bit floating point format, through a small amount of calibration data (without training) to pre-determine the quantization scale, the model weights and activation values will be converted to FP8 format, to improve the inference efficiency and reduce the deployment threshold. We you can use AngelSlim quantization, you can also directly download our quantization completed open source model to use [AngelSlim](https://huggingface.co/AngelSlim).
347
+
348
+
349
+ ## Deployment
350
+
351
+ For deployment, you can use frameworks such as **TensorRT-LLM**, **vLLM**, or **SGLang** to serve the model and create an OpenAI-compatible API endpoint.
352
+
353
+ image: https://hub.docker.com/r/hunyuaninfer/hunyuan-7B/tags
354
+
355
+
356
+ ### TensorRT-LLM
357
+
358
+ #### Docker Image
359
+
360
+ We provide a pre-built Docker image based on the latest version of TensorRT-LLM.
361
+
362
+ We use tencent/Hunyuan-7B-Instruct for example
363
+ - To get started:
364
+
365
+ ```
366
+ docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-7b:hunyuan-7b-trtllm
367
+ ```
368
+ ```
369
+ docker run --privileged --user root --name hunyuanLLM_infer --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all hunyuaninfer/hunyuan-7b:hunyuan-7b-trtllm
370
+ ```
371
+
372
+ - Prepare Configuration file:
373
+
374
+ ```
375
+ cat >/path/to/extra-llm-api-config.yml <<EOF
376
+ use_cuda_graph: true
377
+ cuda_graph_padding_enabled: true
378
+ cuda_graph_batch_sizes:
379
+ - 1
380
+ - 2
381
+ - 4
382
+ - 8
383
+ - 16
384
+ - 32
385
+ print_iter_log: true
386
+ EOF
387
+ ```
388
+
389
+
390
+ - Start the API server:
391
+
392
+
393
+ ```
394
+ trtllm-serve \
395
+ /path/to/HunYuan-7b \
396
+ --host localhost \
397
+ --port 8000 \
398
+ --backend pytorch \
399
+ --max_batch_size 32 \
400
+ --max_num_tokens 16384 \
401
+ --tp_size 2 \
402
+ --kv_cache_free_gpu_memory_fraction 0.6 \
403
+ --trust_remote_code \
404
+ --extra_llm_api_options /path/to/extra-llm-api-config.yml
405
+ ```
406
+
407
+
408
+ ### vllm
409
+
410
+ #### Start
411
+ Please use vLLM version v0.10.0 or higher for inference.
412
+
413
+ First, please install transformers. We will merge it into the main branch later.
414
+ ```SHELL
415
+ pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
416
+ ```
417
+
418
+ We use tencent/Hunyuan-7B-Instruct for example
419
+ - Download Model file:
420
+ - Huggingface: will download automicly by vllm.
421
+ - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-7B-Instruct`
422
+
423
+ - model download by huggingface:
424
+ ```shell
425
+ export MODEL_PATH=tencent/Hunyuan-7B-Instruct
426
+ ```
427
+
428
+ - model downloaded by modelscope:
429
+ ```shell
430
+ export MODEL_PATH=/root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-7B-Instruct/
431
+ ```
432
+
433
+ - Start the API server:
434
+
435
+ ```shell
436
+ python3 -m vllm.entrypoints.openai.api_server \
437
+ --host 0.0.0.0 \
438
+ --port 8000 \
439
+ --trust-remote-code \
440
+ --model ${MODEL_PATH} \
441
+ --tensor-parallel-size 1 \
442
+ --dtype bfloat16 \
443
+ --quantization experts_int8 \
444
+ --served-model-name hunyuan \
445
+ 2>&1 | tee log_server.txt
446
+ ```
447
+ - After running service script successfully, run the request script
448
+ ```shell
449
+ curl http://0.0.0.0:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{
450
+ "model": "hunyuan",
451
+ "messages": [
452
+ {
453
+ "role": "system",
454
+ "content": [{"type": "text", "text": "You are a helpful assistant."}]
455
+ },
456
+ {
457
+ "role": "user",
458
+ "content": [{"type": "text", "text": "请按面积大小对四大洋进行排序,并给出面积最小的洋是哪一个?直接输出结果。"}]
459
+ }
460
+ ],
461
+ "max_tokens": 2048,
462
+ "temperature":0.7,
463
+ "top_p": 0.6,
464
+ "top_k": 20,
465
+ "repetition_penalty": 1.05,
466
+ "stop_token_ids": [127960]
467
+ }'
468
+ ```
469
+ #### Quantitative model deployment
470
+ This section describes the process of deploying a post-quantization model using vLLM.
471
+
472
+ Default server in BF16.
473
+
474
+ ##### Int8 quantitative model deployment
475
+ Deploying the Int8-weight-only version of the HunYuan-7B model only requires setting the environment variables
476
+
477
+ Next we start the Int8 service. Run:
478
+ ```shell
479
+ python3 -m vllm.entrypoints.openai.api_server \
480
+ --host 0.0.0.0 \
481
+ --port 8000 \
482
+ --trust-remote-code \
483
+ --model ${MODEL_PATH} \
484
+ --tensor-parallel-size 1 \
485
+ --dtype bfloat16 \
486
+ --served-model-name hunyuan \
487
+ --quantization experts_int8 \
488
+ 2>&1 | tee log_server.txt
489
+ ```
490
+
491
+
492
+ ##### Int4 quantitative model deployment
493
+ Deploying the Int4-weight-only version of the HunYuan-7B model only requires setting the environment variables , using the GPTQ method
494
+ ```shell
495
+ export MODEL_PATH=PATH_TO_INT4_MODEL
496
+ ```
497
+ Next we start the Int4 service. Run
498
+ ```shell
499
+ python3 -m vllm.entrypoints.openai.api_server \
500
+ --host 0.0.0.0 \
501
+ --port 8000 \
502
+ --trust-remote-code \
503
+ --model ${MODEL_PATH} \
504
+ --tensor-parallel-size 1 \
505
+ --dtype bfloat16 \
506
+ --served-model-name hunyuan \
507
+ --quantization gptq_marlin \
508
+ 2>&1 | tee log_server.txt
509
+ ```
510
+
511
+ ##### FP8 quantitative model deployment
512
+ Deploying the W8A8C8 version of the HunYuan-7B model only requires setting the environment variables
513
+
514
+
515
+ Next we start the FP8 service. Run
516
+ ```shell
517
+ python3 -m vllm.entrypoints.openai.api_server \
518
+ --host 0.0.0.0 \
519
+ --port 8000 \
520
+ --trust-remote-code \
521
+ --model ${MODEL_PATH} \
522
+ --tensor-parallel-size 1 \
523
+ --dtype bfloat16 \
524
+ --served-model-name hunyuan \
525
+ --kv-cache-dtype fp8 \
526
+ 2>&1 | tee log_server.txt
527
+ ```
528
+
529
+
530
+
531
+
532
+ ### SGLang
533
+
534
+ #### Docker Image
535
+
536
+ We also provide a pre-built Docker image based on the latest version of SGLang.
537
+
538
+ We use tencent/Hunyuan-7B-Instruct for example
539
+
540
+ To get started:
541
+
542
+ - Pull the Docker image
543
+
544
+ ```
545
+ docker pull lmsysorg/sglang:latest
546
+ ```
547
+
548
+ - Start the API server:
549
+
550
+ ```
551
+ docker run --entrypoint="python3" --gpus all \
552
+ --shm-size 32g \
553
+ -p 30000:30000 \
554
+ --ulimit nproc=10000 \
555
+ --privileged \
556
+ --ipc=host \
557
+ lmsysorg/sglang:latest \
558
+ -m sglang.launch_server --model-path hunyuan/huanyuan_7B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000
559
+ ```
560
+
561
  Citing Hunyuan-MT:
562
 
563
  ```bibtex
 
570
  primaryClass={cs.CL},
571
  url={https://arxiv.org/abs/2509.05209},
572
  }
573
+ ```
574
+
575
+ ## Contact Us
576
+
577
+ If you would like to leave a message for our R&D and product teams, Welcome to contact our open-source team . You can also contact us via email ([email protected]).