# MiniCPM-V-4.5-abliterated-int4
This is a 4-bit quantized version of [huihui-ai/Huihui-MiniCPM-V-4_5-abliterated](https://huggingface.co/huihui-ai/Huihui-MiniCPM-V-4_5-abliterated) using bitsandbytes NF4 quantization.
## Model Details
- **Base Model**: huihui-ai/Huihui-MiniCPM-V-4_5-abliterated
- **Quantization**: 4-bit (NF4) using bitsandbytes
- **Model Size**: ~6.4 GB (85.8% reduction from original 45.28 GB)
- **Compute dtype**: float16
- **Double quantization**: Disabled (the NF4 scaling constants are kept unquantized, trading a slightly larger footprint for faster dequantization)
## Quantization Configuration
```json
{
"load_in_4bit": true,
"bnb_4bit_compute_dtype": "float16",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": false,
"llm_int8_skip_modules": ["out_proj", "kv_proj", "lm_head"],
"quant_method": "bitsandbytes"
}
```
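For reference, the same settings can be expressed as a `BitsAndBytesConfig` and applied while loading the original abliterated model. This is a minimal sketch for re-quantizing from the base checkpoint yourself; the quantized repository above already ships these settings in its config, so it is not required for normal use.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 settings matching the JSON above; out_proj, kv_proj and lm_head stay in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    llm_int8_skip_modules=["out_proj", "kv_proj", "lm_head"],
)

# Quantize the base model on the fly while loading
model = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-MiniCPM-V-4_5-abliterated",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
```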
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# The checkpoint ships its bitsandbytes config, so no extra quantization
# arguments are needed; trust_remote_code is required because MiniCPM-V
# uses a custom model implementation.
model = AutoModelForCausalLM.from_pretrained(
    "wavespeed/MiniCPM-V-4_5-abliterated-int4",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(
    "wavespeed/MiniCPM-V-4_5-abliterated-int4",
    trust_remote_code=True,
)
```
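Assuming the abliterated checkpoint keeps the upstream MiniCPM-V `chat()` interface provided by its remote code (as in [openbmb/MiniCPM-V-4_5-int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)), image question answering looks roughly like the sketch below; the exact keyword arguments are defined by the model's remote code and may differ.

```python
from PIL import Image

# Hypothetical example image path; replace with your own file
image = Image.open("example.jpg").convert("RGB")
question = "Describe this image."

# MiniCPM-V style message format: a list of turns whose content mixes images and text
msgs = [{"role": "user", "content": [image, question]}]

# chat() is provided by the model's trust_remote_code implementation
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```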
## Requirements
- transformers
- bitsandbytes
- torch
- accelerate
## Note on File Size
The model files total roughly 6.4 GB even though most weights are 4-bit. This is expected for a serialized bitsandbytes checkpoint: the linear layers are stored as packed NF4 blocks together with their quantization constants, while the modules listed in `llm_int8_skip_modules` remain in float16. At runtime the weights occupy roughly the same amount of GPU memory, plus overhead for activations and the KV cache; NF4 weights are dequantized to float16 on the fly only for the duration of each matrix multiplication.
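To check what the quantized weights actually occupy on your hardware, you can query the loaded model directly. This is a small sketch using standard transformers/PyTorch utilities and assumes the model was loaded as in the Usage section above.

```python
import torch

# Size of all parameters and buffers as currently stored
# (packed NF4 blocks plus the float16 skip modules)
print(f"Weight footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")

# Peak GPU memory allocated so far (weights plus any activations from prior calls)
if torch.cuda.is_available():
    print(f"Peak CUDA memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```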
## License
Same as the original model - please refer to the base model's license.
## Acknowledgments
- Original model by [huihui-ai](https://huggingface.co/huihui-ai)
- Quantization approach inspired by [openbmb/MiniCPM-V-4_5-int4](https://huggingface.co/openbmb/MiniCPM-V-4_5-int4)