if __name__ == '__main__':
    from vllm import LLM, SamplingParams

    # Prompts to complete.
    prompts = [
        "Hello, my name is",
    ]
    # temperature=0.6 / top_p=0.95 match DeepSeek-R1's recommended sampling settings.
    sampling_params = SamplingParams(temperature=0.6, top_p=0.95)

    # Load the int4 AutoRound-quantized checkpoint on a single GPU.
    # enforce_eager=True skips CUDA graph capture: faster startup at some cost
    # to steady-state throughput.
    model_name = "Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound"
    llm = LLM(model=model_name, tensor_parallel_size=1, enforce_eager=True)

    # Generate completions and print each prompt with its output.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")