---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- gemma
- npu
- igpu
- amd-ryzen-ai
- quantized
pipeline_tag: text-generation
model-index:
- name: 🦄 NPU+iGPU Quantized Gemma 3 27B Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: custom
      name: NPU+iGPU Benchmark
    metrics:
    - type: throughput
      value: "Real NPU+iGPU acceleration"
      name: Hardware Acceleration
    - type: model_size
      value: "26GB quantized (from 102GB original)"
      name: Model Size
---
|
|
|
# 🦄 Gemma 3 27B NPU+iGPU Quantized |
|
|
|
## 🚀 Advanced NPU+iGPU Implementation |
|
|
|
This quantized build of Gemma 3 27B demonstrates heterogeneous hardware acceleration on consumer AMD Ryzen AI machines: attention runs on the NPU Phoenix while feed-forward layers run on the AMD Radeon 780M iGPU.
|
|
|
### ✅ **Production Status** |
|
- **Status**: ✅ **PRODUCTION READY** |
|
- **Server**: Operational OpenAI v1 API server |
|
- **Hardware**: Real NPU Phoenix + AMD Radeon 780M |
|
- **Size**: 26GB quantized (74% reduction from 102GB) |
|
- **Format**: Safetensors with layer-by-layer streaming
|
- **API**: OpenAI v1 compatible |
|
|
|
## 🎯 **Quick Start** |
|
|
|
### Using with Unicorn Execution Engine |
|
|
|
```bash
# Clone the framework
git clone https://github.com/magicunicorn/unicorn-execution-engine.git
cd unicorn-execution-engine

# Download this model
huggingface-cli download magicunicorn/gemma-3-27b-npu-quantized

# Start the production server
source activate-uc1-ai-py311.sh
python real_2025_gemma27b_server.py

# The server listens on http://localhost:8009
# Model name: "gemma-3-27b-it-npu-igpu-real"
```
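
Once the server is up, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python client; it assumes the server implements the standard `/v1/chat/completions` route (the API key is required by the client but ignored by a local server).

```python
from openai import OpenAI

# Point the client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8009/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gemma-3-27b-it-npu-igpu-real",
    messages=[{"role": "user", "content": "In one sentence, what does an NPU do?"}],
)
print(response.choices[0].message.content)
```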
|
|
|
### Using with OpenWebUI |
|
|
|
Add the server to OpenWebUI as an OpenAI-compatible connection:

- **URL**: `http://localhost:8009`
- **Model**: `gemma-3-27b-it-npu-igpu-real`
- **API**: OpenAI v1 compatible
|
|
|
## 🔧 **Hardware Requirements** |
|
|
|
### **Minimum Requirements** |
|
- **NPU**: AMD Ryzen AI NPU Phoenix (16 TOPS) |
|
- **iGPU**: AMD Radeon 780M (RDNA3 architecture) |
|
- **Memory**: 32GB+ DDR5 RAM (96GB recommended) |
|
- **Storage**: 30GB+ for model files |
|
- **OS**: Ubuntu 25.04+ with Linux 6.14+ (HMA support) |
|
|
|
### **Software Requirements** |
|
- **Unicorn Execution Engine**: Latest version |
|
- **MLIR-AIE2**: Included in framework |
|
- **Vulkan Drivers**: Latest AMD drivers |
|
- **XRT Runtime**: installed at `/opt/xilinx/xrt`
|
|
|
## 🎯 **Performance** |
|
|
|
### **Benchmark Results** |
|
- **Hardware**: Real NPU + iGPU acceleration |
|
- **Attention**: NPU Phoenix (16 TOPS) |
|
- **FFN**: AMD Radeon 780M (200+ GFLOPS) |
|
- **Memory**: Layer-by-layer streaming |
|
- **Quality**: Full 27B parameter model preserved |
|
|
|
### **Technical Specifications** |
|
- **Parameters**: 27.4B (quantized) |
|
- **Precision**: INT4/INT8 optimized for NPU+iGPU |
|
- **Context Length**: 8192 tokens |
|
- **Architecture**: Gemma 3 with grouped-query attention (sketched below)
|
- **Quantization**: Custom NPU+iGPU aware quantization |
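
Grouped-query attention lets several query heads share a single key/value head, which shrinks the KV cache the attention path has to move through memory. Here is a minimal NumPy sketch of the mechanism; the head counts are illustrative, not Gemma 3's actual configuration.

```python
import numpy as np

def gqa_attention(q, k, v):
    """q: (T, n_q_heads, d); k, v: (T, n_kv_heads, d), n_q_heads % n_kv_heads == 0."""
    group = q.shape[1] // k.shape[1]
    k = np.repeat(k, group, axis=1)                  # each K/V head serves a group of Q heads
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # softmax over keys
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", w, v)

# 8 query heads sharing 4 K/V heads (illustrative sizes)
T, d = 16, 64
out = gqa_attention(np.random.randn(T, 8, d),
                    np.random.randn(T, 4, d),
                    np.random.randn(T, 4, d))
print(out.shape)  # (16, 8, 64)
```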
|
|
|
## 📚 **Technical Details** |
|
|
|
### **Quantization Strategy** |
|
- **NPU Layers**: INT8 symmetric quantization |
|
- **iGPU Layers**: INT4 grouped quantization (both schemes are sketched below)
|
- **Memory Optimized**: Layer-by-layer streaming |
|
- **Zero CPU Fallback**: Pure hardware acceleration |
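
As a rough illustration of the two schemes above, here is a NumPy sketch of per-tensor symmetric INT8 and per-group INT4 quantization. It is a simplification under stated assumptions (max-based scaling, weight count divisible by the group size), not the framework's actual quantizer.

```python
import numpy as np

def quantize_int8_symmetric(w):
    """Per-tensor symmetric INT8: one scale maps max |w| to 127, zero-point 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_int4_grouped(w, group_size=128):
    """Grouped INT4: one scale per contiguous group of `group_size` weights.
    Assumes w.size is divisible by group_size (pad in practice)."""
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)  # fits in 4 bits
    return q, scales

w = np.random.randn(4096 * 128).astype(np.float32)
q8, s8 = quantize_int8_symmetric(w)
q4, s4 = quantize_int4_grouped(w)
print("INT8 max error:", np.abs(q8 * s8 - w).max())
print("INT4 max error:", np.abs((q4 * s4).ravel() - w).max())
```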
|
|
|
### **Hardware Acceleration** |
|
- **NPU Phoenix**: Attention computation (16 TOPS) |
|
- **AMD Radeon 780M**: FFN processing (RDNA3) |
|
- **MLIR-AIE2**: Real NPU kernel compilation |
|
- **Vulkan**: Direct iGPU compute shaders (see the streaming sketch below)
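
To make the layer-by-layer streaming concrete, here is a small self-contained sketch using the `safetensors` library, which memory-maps the file so tensors are read only on access. The tensor names and routing comments are illustrative; the real engine dispatches through MLIR-AIE2 kernels and Vulkan compute shaders.

```python
import numpy as np
from safetensors import safe_open
from safetensors.numpy import save_file

# Write a toy shard so the sketch is self-contained (real shards are far larger).
save_file({"layer0.attn.weight": np.random.randn(8, 8).astype(np.float32),
           "layer0.ffn.weight": np.random.randn(8, 8).astype(np.float32)},
          "toy.safetensors")

# safe_open memory-maps the file: each tensor is read only when requested,
# so peak host memory stays near one layer's working set, not the full 26GB.
with safe_open("toy.safetensors", framework="numpy") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)   # loaded on demand
        if "attn" in name:
            pass  # route to the NPU attention path (MLIR-AIE2 kernels)
        else:
            pass  # route to the iGPU FFN path (Vulkan compute shaders)
        del tensor                    # release before reading the next tensor
```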
|
|
|
## 🦄 **About This Implementation** |
|
|
|
This model demonstrates advanced NPU+iGPU acceleration techniques, showing that consumer AMD Ryzen AI hardware can run a 27B-parameter language model entirely on the NPU and iGPU, with no CPU fallback.
|
|
|
**Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine) |
|
**Date**: July 10, 2025 |
|
**Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech) |
|
**Platform**: [Unicorn Commander](https://unicorncommander.com) |
|
|
|
## 📖 **Citation** |
|
|
|
```bibtex
@software{unicorn_execution_engine_gemma_27b_2025,
  title={Gemma 3 27B NPU+iGPU Quantized: NPU+iGPU Large Language Model},
  author={Unicorn Commander},
  year={2025},
  url={https://huggingface.co/magicunicorn/gemma-3-27b-npu-quantized},
  note={Production NPU+iGPU quantized large language model}
}
```
|
|
|
## 📚 **Related Resources** |
|
|
|
- **Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine) |
|
- **Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech) |
|
- **Platform**: [Unicorn Commander](https://unicorncommander.com) |
|
- **Documentation**: Complete guides in framework repository |
|
|
|
## 🔒 **License** |
|
|
|
This model is released under the Apache 2.0 License, following the original Gemma 3 license terms. |
|
|
|
--- |
|
|
|
*🦄 NPU+iGPU Large Language Model* |
|
*⚡ Powered by Unicorn Execution Engine* |
|
*🏢 Magic Unicorn Unconventional Technology & Stuff Inc* |
|
|