---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- text-generation-inference
---
|
|
|
# Qwen2.5-VL-7B-Instruct-gptqmodel-int8
|
|
|
This is a GPTQ INT8 quantized version of [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), produced with the [GPTQModel](https://github.com/ModelCloud/GPTQModel) toolkit.
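
## How to use

The quantized checkpoint loads like the original model through `transformers`. The snippet below is a minimal sketch, assuming the INT8 checkpoint is available locally (for example, produced by the steps in the next section) and that `gptqmodel` (with its `optimum` integration) is installed so the GPTQ layers can be loaded; the checkpoint path, image URL, and prompt are placeholders.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# placeholder path: point this at the quantized output directory (or a Hub repo id)
model_path = "/path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

# placeholder image: any RGB image works
image = Image.open(requests.get("https://example.com/image.jpg", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "generate a caption for this image"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64)
# strip the prompt tokens before decoding so only the generated caption is printed
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```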
|
|
|
## How to quantize
|
|
|
### Install
|
|
|
```bash
# Python 3.10.x or above
pip3 install -v "gptqmodel>=2.2.0" --no-build-isolation
```
|
|
|
### Quantize
|
|
|
```bash
# usage: gptqmodel_quantize.py <source model dir> <output dir> <bits>
python3 gptqmodel_quantize.py /path/to/Qwen2.5-VL-7B-Instruct/ /path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8 8
```
|
|
|
```python
# gptqmodel_quantize.py

import os

import fire
from datasets import load_dataset

from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.models.definitions.base_qwen2_vl import BaseQwen2VLGPTQ

# match CUDA device indices to PCI bus order, reduce allocator fragmentation,
# and force UTF-8 mode
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["PYTHONUTF8"] = "1"


def format_qwen2_vl_dataset(image, assistant):
    # one calibration sample: a user turn with the image and a caption request,
    # answered by the reference caption
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "generate a caption for this image"},
            ],
        },
        {"role": "assistant", "content": assistant},
    ]


def prepare_dataset(format_func, n_sample: int = 20) -> list[list[dict]]:
    # image URLs and captions from the LAION 220k GPT4Vision caption set
    dataset = load_dataset(
        "laion/220k-GPT4Vision-captions-from-LIVIS", split=f"train[:{n_sample}]"
    )
    return [
        format_func(sample["url"], sample["caption"])
        for sample in dataset
    ]


def get_calib_dataset(model):
    if isinstance(model, BaseQwen2VLGPTQ):
        return prepare_dataset(format_qwen2_vl_dataset, n_sample=256)
    raise NotImplementedError(f"Unsupported MODEL: {model.__class__}")


def quantize(model_path: str,
             output_path: str,
             bit: int):
    quant_config = QuantizeConfig(bits=bit, group_size=128)

    model = GPTQModel.load(model_path, quant_config)
    calibration_dataset = get_calib_dataset(model)

    # increase `batch_size` to match gpu/vram specs to speed up quantization
    model.quantize(calibration_dataset, batch_size=8)

    model.save(output_path)

    # test post-quant inference
    model = GPTQModel.load(output_path)
    result = model.generate("Uncovering deep insights begins with")[0]  # tokens
    print(model.tokenizer.decode(result))  # string output


if __name__ == "__main__":
    fire.Fire(quantize)
```
|