---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- gemma
- npu
- igpu
- amd-ryzen-ai
- quantized
pipeline_tag: text-generation
model-index:
- name: 🦄 NPU+iGPU Quantized Gemma 3 27B Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: custom
      name: NPU+iGPU Benchmark
    metrics:
    - type: throughput
      value: "Real NPU+iGPU acceleration"
      name: Hardware Acceleration
    - type: model_size
      value: "26GB quantized (from 102GB original)"
      name: Model Size
---
|
|
|
# 🦄 Gemma 3 27B NPU+iGPU Quantized |
|
|
|
## 🚀 Advanced NPU+iGPU Implementation |
|
|
|
This quantized build of Gemma 3 27B demonstrates heterogeneous hardware acceleration on consumer AMD Ryzen AI machines: attention runs on the NPU Phoenix while feed-forward layers run on the AMD Radeon 780M iGPU.
|
|
|
### ✅ **Production Status** |
|
- **Status**: ✅ **PRODUCTION READY** |
|
- **Server**: Operational OpenAI v1 API server |
|
- **Hardware**: Real NPU Phoenix + AMD Radeon 780M |
|
- **Size**: 26GB quantized (74% reduction from 102GB) |
|
- **Format**: Safetensors with layer-by-layer streaming
|
- **API**: OpenAI v1 compatible |
|
|
|
## 🎯 **Quick Start** |
|
|
|
### Using with Unicorn Execution Engine |
|
|
|
```bash
# Clone the framework
git clone https://github.com/magicunicorn/unicorn-execution-engine.git
cd unicorn-execution-engine

# Download this model
huggingface-cli download magicunicorn/gemma-3-27b-npu-quantized

# Start the production server
source activate-uc1-ai-py311.sh
python real_2025_gemma27b_server.py

# The server listens on http://localhost:8009
# Model name: "gemma-3-27b-it-npu-igpu-real"
```
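
Once the server is up, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python client; it assumes the server implements the standard `/v1/chat/completions` route (the API key is required by the client but ignored by a local server).

```python
from openai import OpenAI

# Point the client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8009/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gemma-3-27b-it-npu-igpu-real",
    messages=[{"role": "user", "content": "In one sentence, what does an NPU do?"}],
)
print(response.choices[0].message.content)
```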
|
|
|
### Using with OpenWebUI |
|
|
|
Add the server to OpenWebUI as an OpenAI-compatible connection:

- **URL**: `http://localhost:8009`
- **Model**: `gemma-3-27b-it-npu-igpu-real`
- **API**: OpenAI v1 compatible
|
|
|
## 🔧 **Hardware Requirements** |
|
|
|
### **Minimum Requirements** |
|
- **NPU**: AMD Ryzen AI NPU Phoenix (16 TOPS) |
|
- **iGPU**: AMD Radeon 780M (RDNA3 architecture) |
|
- **Memory**: 32GB+ DDR5 RAM (96GB recommended) |
|
- **Storage**: 30GB+ for model files |
|
- **OS**: Ubuntu 25.04+ with Linux 6.14+ (HMA support) |
|
|
|
### **Software Requirements** |
|
- **Unicorn Execution Engine**: Latest version |
|
- **MLIR-AIE2**: Included in framework |
|
- **Vulkan Drivers**: Latest AMD drivers |
|
- **XRT Runtime**: installed at `/opt/xilinx/xrt`
|
|
|
## 🎯 **Performance** |
|
|
|
### **Benchmark Results** |
|
- **Hardware**: Real NPU + iGPU acceleration |
|
- **Attention**: NPU Phoenix (16 TOPS) |
|
- **FFN**: AMD Radeon 780M (200+ GFLOPS) |
|
- **Memory**: Layer-by-layer streaming |
|
- **Quality**: Full 27B parameter model preserved |
|
|
|
### **Technical Specifications** |
|
- **Parameters**: 27.4B (quantized) |
|
- **Precision**: INT4/INT8 optimized for NPU+iGPU |
|
- **Context Length**: 8192 tokens |
|
- **Architecture**: Gemma 3 with grouped-query attention (sketched below)
|
- **Quantization**: Custom NPU+iGPU aware quantization |
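
Grouped-query attention lets several query heads share a single key/value head, which shrinks the KV cache the attention path has to move through memory. Here is a minimal NumPy sketch of the mechanism; the head counts are illustrative, not Gemma 3's actual configuration.

```python
import numpy as np

def gqa_attention(q, k, v):
    """q: (T, n_q_heads, d); k, v: (T, n_kv_heads, d), n_q_heads % n_kv_heads == 0."""
    group = q.shape[1] // k.shape[1]
    k = np.repeat(k, group, axis=1)                  # each K/V head serves a group of Q heads
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # softmax over keys
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", w, v)

# 8 query heads sharing 4 K/V heads (illustrative sizes)
T, d = 16, 64
out = gqa_attention(np.random.randn(T, 8, d),
                    np.random.randn(T, 4, d),
                    np.random.randn(T, 4, d))
print(out.shape)  # (16, 8, 64)
```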
|
|
|
## 📚 **Technical Details** |
|
|
|
### **Quantization Strategy** |
|
- **NPU Layers**: INT8 symmetric quantization |
|
- **iGPU Layers**: INT4 grouped quantization (both schemes are sketched below)
|
- **Memory Optimized**: Layer-by-layer streaming |
|
- **Zero CPU Fallback**: Pure hardware acceleration |
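
As a rough illustration of the two schemes above, here is a NumPy sketch of per-tensor symmetric INT8 and per-group INT4 quantization. It is a simplification under stated assumptions (max-based scaling, weight count divisible by the group size), not the framework's actual quantizer.

```python
import numpy as np

def quantize_int8_symmetric(w):
    """Per-tensor symmetric INT8: one scale maps max |w| to 127, zero-point 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_int4_grouped(w, group_size=128):
    """Grouped INT4: one scale per contiguous group of `group_size` weights.
    Assumes w.size is divisible by group_size (pad in practice)."""
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)  # fits in 4 bits
    return q, scales

w = np.random.randn(4096 * 128).astype(np.float32)
q8, s8 = quantize_int8_symmetric(w)
q4, s4 = quantize_int4_grouped(w)
print("INT8 max error:", np.abs(q8 * s8 - w).max())
print("INT4 max error:", np.abs((q4 * s4).ravel() - w).max())
```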
|
|
|
### **Hardware Acceleration** |
|
- **NPU Phoenix**: Attention computation (16 TOPS) |
|
- **AMD Radeon 780M**: FFN processing (RDNA3) |
|
- **MLIR-AIE2**: Real NPU kernel compilation |
|
- **Vulkan**: Direct iGPU compute shaders (see the streaming sketch below)
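
To make the layer-by-layer streaming concrete, here is a small self-contained sketch using the `safetensors` library, which memory-maps the file so tensors are read only on access. The tensor names and routing comments are illustrative; the real engine dispatches through MLIR-AIE2 kernels and Vulkan compute shaders.

```python
import numpy as np
from safetensors import safe_open
from safetensors.numpy import save_file

# Write a toy shard so the sketch is self-contained (real shards are far larger).
save_file({"layer0.attn.weight": np.random.randn(8, 8).astype(np.float32),
           "layer0.ffn.weight": np.random.randn(8, 8).astype(np.float32)},
          "toy.safetensors")

# safe_open memory-maps the file: each tensor is read only when requested,
# so peak host memory stays near one layer's working set, not the full 26GB.
with safe_open("toy.safetensors", framework="numpy") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)   # loaded on demand
        if "attn" in name:
            pass  # route to the NPU attention path (MLIR-AIE2 kernels)
        else:
            pass  # route to the iGPU FFN path (Vulkan compute shaders)
        del tensor                    # release before reading the next tensor
```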
|
|
|
## 🦄 **About This Implementation** |
|
|
|
This model demonstrates advanced NPU+iGPU acceleration techniques, showing that consumer AMD Ryzen AI hardware can run a 27B-parameter language model entirely on the NPU and iGPU, with no CPU fallback.
|
|
|
**Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine) |
|
**Date**: July 10, 2025 |
|
**Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech) |
|
**Platform**: [Unicorn Commander](https://unicorncommander.com) |
|
|
|
## 📖 **Citation** |
|
|
|
```bibtex
@software{unicorn_execution_engine_gemma_27b_2025,
  title={Gemma 3 27B NPU+iGPU Quantized: NPU+iGPU Large Language Model},
  author={Unicorn Commander},
  year={2025},
  url={https://huggingface.co/magicunicorn/gemma-3-27b-npu-quantized},
  note={Production NPU+iGPU quantized large language model}
}
```
|
|
|
## 📚 **Related Resources** |
|
|
|
- **Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine) |
|
- **Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech) |
|
- **Platform**: [Unicorn Commander](https://unicorncommander.com) |
|
- **Documentation**: Complete guides in framework repository |
|
|
|
## 🔒 **License** |
|
|
|
This model is released under the Apache 2.0 License, following the original Gemma 3 license terms. |
|
|
|
--- |
|
|
|
*🦄 NPU+iGPU Large Language Model* |
|
*⚡ Powered by Unicorn Execution Engine* |
|
*🏢 Magic Unicorn Unconventional Technology & Stuff Inc* |
|
|