Mistral-7B - Q4_K_M GGUF

This is a GGUF version of mistralai/Mistral-7B-v0.3, quantized to Q4_K_M with llama.cpp's quantization tool.

Model Info

  • Base model: Mistral-7B-v0.3
  • Quantization: Q4_K_M
  • Format: GGUF (compatible with llama.cpp and llama-cpp-python)
  • Size: ~4.2 GB
  • Use case: general-purpose text generation, chatbots, assistants, and instruction-following tasks, with the quantization chosen for efficient CPU inference
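
Every valid GGUF file starts with the four-byte ASCII magic "GGUF", so a quick way to confirm a download is intact is to read those bytes. The snippet below is a minimal sketch; the filename is the one listed under Files and is assumed to sit in your working directory.

# Sanity check: a GGUF file begins with the ASCII magic b"GGUF".
with open("Mistral-7B-v0.3-Q4_K_M.gguf", "rb") as f:
    magic = f.read(4)
print("Looks like GGUF" if magic == b"GGUF" else f"Unexpected magic: {magic!r}")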

📦 Files

  • Mistral-7B-v0.3-Q4_K_M.gguf: The quantized model
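
If you are pulling the file from the Hub rather than copying it manually, huggingface_hub can fetch it into the local cache. This is a sketch assuming the huggingface_hub package is installed; the repo id is the one this model page is published under.

# Download the quantized file from the Hub (pip install huggingface_hub)
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Arivukkarasu/Mistral-7B-v0.3-GGUF",
    filename="Mistral-7B-v0.3-Q4_K_M.gguf",
)
print(model_path)  # local path you can pass to Llama(model_path=...)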

How to Use (with llama-cpp-python)

from llama_cpp import Llama

# Point model_path at the downloaded GGUF file (named as listed under Files)
llm = Llama(model_path="Mistral-7B-v0.3-Q4_K_M.gguf")

# Completion-style call; the result is a dict in an OpenAI-like format
output = llm("Explain quantum computing in simple terms.", max_tokens=200)
print(output["choices"][0]["text"])
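
For longer generations you may prefer to stream tokens as they are produced. llama-cpp-python supports this by passing stream=True, which turns the call into an iterator of partial chunks; the sketch below reuses the llm object created above.

# Streaming variant: stream=True yields incremental chunks instead of one dict
for chunk in llm("Explain quantum computing in simple terms.", max_tokens=200, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()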

Recommended Settings

  • Context size: 4096 tokens (set via n_ctx; larger values may work depending on your llama.cpp version and available memory)
  • Hardware: optimized for CPU (AVX2 or better recommended); also runs efficiently on GPU when llama.cpp is compiled with CUDA or Metal support (see the sketch below)
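
In llama-cpp-python these settings map onto constructor arguments. The values below are illustrative, not definitive: the thread count should match your machine, and n_gpu_layers only has an effect when llama.cpp was built with CUDA or Metal support.

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-7B-v0.3-Q4_K_M.gguf",
    n_ctx=4096,        # context window, as recommended above
    n_threads=8,       # CPU threads; match your physical core count
    n_gpu_layers=-1,   # offload all layers when built with CUDA/Metal; 0 = CPU only
)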

Credits

  • Base model: mistralai/Mistral-7B-v0.3 by Mistral AI
  • Quantization: performed with llama.cpp's quantization tool
