|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- MOE |
|
|
- Mixture of Experts |
|
|
- Mixtral |
|
|
- 4X8 |
|
|
- 2X8 |
|
|
- deepseek |
|
|
- reasoning |
|
|
- reason |
|
|
- thinking |
|
|
- all use cases |
|
|
- bfloat16 |
|
|
- float32 |
|
|
- float16 |
|
|
- role play |
|
|
- sillytavern |
|
|
- backyard |
|
|
- lmstudio |
|
|
- Text Generation WebUI |
|
|
- llama 3 |
|
|
- mistral |
|
|
- llama 3.1 |
|
|
- qwen 2.5 |
|
|
- context 128k |
|
|
- mergekit |
|
|
- merge |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
<H2>How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts</H2> |
|
|
|
|
|
This document discusses how to set/change the number of active experts in Mixture of Experts (MOE) models in various LLM/AI apps, and includes links to additional MOE models and other helpful resources.
|
|
|
|
|
--- |
|
|
|
|
|
<h2>LINKS:</h2> |
|
|
|
|
|
--- |
|
|
|
|
|
<H2>Mixture Of Expert Models - including Reasoning/Thinking:</H2> |
|
|
|
|
|
[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ] |
|
|
|
|
|
<h2>Additional:</h2> |
|
|
|
|
|
<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</b> |
|
|
|
|
|
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ] |
|
|
|
|
|
<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (source code to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):</b>
|
|
|
|
|
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ] |
|
|
|
|
|
<B>#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:</b> |
|
|
|
|
|
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ] |
|
|
|
|
|
These collections will update over time. Newest items are usually at the bottom of each collection. |
|
|
|
|
|
--- |
|
|
|
|
|
<H2>Main Document - Setting Mixture Of Experts in LLM/AI apps</H2> |
|
|
|
|
|
--- |
|
|
|
|
|
<B>Experts Activation / Models used to build this model:</B> |
|
|
|
|
|
Special thanks to all the model makers for the models used in this MOE model:
|
|
|
|
|
To be updated... |
|
|
|
|
|
The number of experts is set to 4, but you can use 1, 2, 3, or 4.
|
|
|
|
|
This "team" has a Captain (first listed model), and then all the team members contribute to the to "token" |
|
|
choice billions of times per second. Note the Captain also contributes too. |
|
|
|
|
|
Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you. |
|
|
|
|
|
This results in higher quality generation. |
|
|
|
|
|
In many cases this also results in higher quality instruction following.
|
|
|
|
|
That means the power of every model is available during instruction and output generation. |
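For context, here is a simplified, illustrative sketch of what the router in a MOE layer does (this is NOT this model's actual code; the function and variable names are made up for illustration): a gate scores every expert for the current token, the top-k experts are activated, and their outputs are blended together. Setting "experts" to 1, 2, 3, or 4 in the apps below simply changes how many experts are activated per token.

```python
import numpy as np

def moe_forward(token_state, expert_fns, gate_weights, num_experts_used=4):
    """Illustrative MOE routing: score every expert, activate the top-k,
    and blend their outputs using the re-normalized gate scores."""
    # One gate logit per expert for this token.
    logits = token_state @ gate_weights            # shape: (num_experts,)
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()                         # softmax over experts

    # Keep only the k highest-scoring experts ("the team" for this token).
    top_k = np.argsort(scores)[-num_experts_used:]
    weights = scores[top_k] / scores[top_k].sum()  # re-normalize over the team

    # Weighted blend of the active experts' outputs.
    return sum(w * expert_fns[i](token_state) for i, w in zip(top_k, weights))

# Toy example: 8 tiny "experts", 4 of them active per token.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.standard_normal((16, 16)): x @ W for _ in range(8)]
gate = rng.standard_normal((16, 8))
out = moe_forward(rng.standard_normal(16), experts, gate, num_experts_used=4)
```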
|
|
|
|
|
NOTE: |
|
|
|
|
|
You can use one "expert" too ; however this means the model will randomly select an expert to use EACH TIME, resulting |
|
|
in very different generation for each prompt / regen of a prompt. |
|
|
|
|
|
CHANGING THE NUMBER OF EXPERTS: |
|
|
|
|
|
You can set the number of experts in LM Studio (https://lmstudio.ai) on the model "load" screen, and in other LLM apps via a setting such as "Experts" or "Number of Experts".
|
|
|
|
|
For Text-Generation-WebUI (https://github.com/oobabooga/text-generation-webui), you set the number of experts on the model loading page.
|
|
|
|
|
For KoboldCPP (https://github.com/LostRuins/koboldcpp) version 1.8+, click on "TOKENS" on the load screen; you can set the number of experts on that page, then launch the model.
|
|
|
|
|
For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ), add the following to the command line used to start the "llama.cpp server" (CLI):
|
|
|
|
|
"--override-kv llama.expert_used_count=int:3" |
|
|
|
|
|
(no quotes, where "3" is the number of experts to use) |
|
|
|
|
|
When using "API", you set the "num_experts_used" in the JSON payload (this maybe different for different back ends). |
|
|
|
|
|
<B>SUGGESTION:</B> |
|
|
|
|
|
The MOE models in my repo:
|
|
|
|
|
[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ] |
|
|
|
|
|
These contain various examples, including example generations showing 2, 4, and 8 experts.
|
|
|
|
|
This will give you a better idea of what changes to expect when adjusting the number of experts, and of the effect on generation.
|
|
|