Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters per token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359
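
For intuition, here is a minimal sketch of sparse top-k expert routing in PyTorch: a router scores the experts for each token and only the top-k experts run, so only a fraction of the parameters is active per token. The sizes below (8 experts, top-2 routing, hidden width 1024) are illustrative placeholders and do not reflect the actual PowerMoE-3B configuration; see the paper for the real architecture.

# Illustrative sparse-MoE layer; all sizes are hypothetical, not PowerMoE-3B settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token to its top-k experts
        weights = F.softmax(weights, dim=-1)             # normalize the selected scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(4, 1024)                                 # 4 tokens, hypothetical hidden size
print(SparseMoE()(x).shape)                              # torch.Size([4, 1024])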

This is a GGUF quantized version of the model (3.51B total parameters, granite architecture).

Usage

Requires the latest llama.cpp to run.

Generation

This is a simple example of how to run the PowerMoE GGUF with llama-cli:

./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
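
Alternatively, the same GGUF file should load through the llama-cpp-python bindings. This is a minimal sketch, assuming a recent llama-cpp-python release (built against a llama.cpp version that supports the granite architecture) is installed via pip install llama-cpp-python; the context size and sampling parameters are placeholders.

# Minimal Python sketch using llama-cpp-python (assumed installed and up to date).
from llama_cpp import Llama

llm = Llama(model_path="PowerMoE4x800M_q3km.gguf", n_ctx=2048)
output = llm("How about a snack?", max_tokens=64)
print(output["choices"][0]["text"])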
