---
license: apache-2.0
language:
- en
tags:
- MOE
- Mixture of Experts
- Mixtral
- 4X8
- 2X8
- deepseek
- reasoning
- reason
- thinking
- all use cases
- bfloat16
- float32
- float16
- role play
- sillytavern
- backyard
- lmstudio
- Text Generation WebUI
- llama 3
- mistral
- llama 3.1
- qwen 2.5
- context 128k
- mergekit
- merge
pipeline_tag: text-generation
---

<H2>How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts</H2>

This document discusses how to set or change the number of active experts in Mixture of Experts (MOE) models in various LLM/AI apps, and includes links to additional MOE models and other helpful resources.

---

<h2>LINKS:</h2>

---

<H2>Mixture Of Expert Models - including Reasoning/Thinking:</H2>

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

<h2>Additional:</h2>

<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (source files for generating GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

<B>#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections will be updated over time. The newest items are usually at the bottom of each collection.

---

<H2>Main Document - Setting Mixture Of Experts in LLM/AI apps</H2>

---

<B>Experts Activation / Models used to build this model:</B>

Special thanks to all the model makers for the models used in this MOE model:

To be updated...

The mixture of experts is set at 4 experts, but you can use 1, 2, 3, or 4.

This "team" has a Captain (the first listed model), and all the team members contribute to each "token" choice. Note that the Captain contributes as well.

Think of 2, 3 or 4 (or more) master chefs in the kitchen, all competing to make the best dish for you.

This results in higher quality generation.

In many cases, this also results in higher quality instruction following.

That means the power of every model is available during instruction following and output generation.
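
Assuming a Mixtral-style MOE architecture (the kind mergekit builds), the "number of experts" setting corresponds to the k in top-k routing: for each token, at each MOE layer, a small router scores every expert, the k highest-scoring experts run, and their outputs are blended using the router weights renormalized over those k experts. The sketch below illustrates that general technique with plain Python numbers; the router scores and expert outputs are made-up values, and the helper names (`top_k_route`, `moe_output`) are hypothetical, not any app's API. Real inference engines do this on tensors inside every layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    if not 1 <= k <= len(router_logits):
        raise ValueError(f"k must be between 1 and {len(router_logits)}, got {k}")
    probs = softmax(router_logits)
    # Indices of the k largest router probabilities, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


def moe_output(expert_outputs, router_logits, k):
    """Blend the chosen experts' outputs (plain numbers here, tensors in a real model)."""
    return sum(weight * expert_outputs[i] for i, weight in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Hypothetical router scores and per-expert outputs for ONE token at ONE layer.
    router_logits = [2.0, 1.0, 0.5, -1.0]
    expert_outputs = [10.0, 20.0, 30.0, 40.0]
    for k in (1, 2, 3, 4):
        experts = [i for i, _ in top_k_route(router_logits, k)]
        print(f"k={k}: experts {experts} -> {moe_output(expert_outputs, router_logits, k):.3f}")
```

With `k=1` only the single highest-scoring expert for that token is used; raising k blends in more experts, and with k equal to the total number of experts every expert contributes to every token.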
					
						

NOTE:

You can also use just one "expert"; however, this means the model will select only one expert to use EACH TIME, which can result in very different generations for each prompt / regeneration of a prompt.

CHANGING THE NUMBER OF EXPERTS:

You can set the number of experts in LMStudio (https://lmstudio.ai) on the "load" screen, and in other LLM apps via a setting named "Experts" or "Number of Experts".

For Text-Generation-WebUI (https://github.com/oobabooga/text-generation-webui), set the number of experts on the model loading screen.

For KoboldCpp (https://github.com/LostRuins/koboldcpp) version 1.8+, on the load screen click "TOKENS", set the number of experts on that page, and then launch the model.

For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md),
add the following to the command line when starting the llama.cpp server (CLI):

"--override-kv llama.expert_used_count=int:3"

(no quotes, where "3" is the number of experts to use)
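
If you script server launches (for example, to compare expert counts), the override above can be assembled programmatically. This is a minimal sketch, assuming a Mixtral-style (llama-architecture) GGUF whose metadata key is `llama.expert_used_count`; other MOE architectures use their own key prefix, which the hypothetical `arch` parameter covers. `llama_server_args` is a hypothetical helper, the model file name is a placeholder, `max_experts=4` assumes the 4-expert model described here, and `-m` is llama-server's model-path flag.

```python
import shlex


def llama_server_args(model_path, experts, arch="llama", max_experts=4, extra_args=()):
    """Build an argv list for llama-server that overrides the active-expert count.

    Assumes the GGUF metadata key follows "<arch>.expert_used_count",
    e.g. "llama.expert_used_count" for Mixtral-style MOE models.
    """
    if not 1 <= experts <= max_experts:
        raise ValueError(f"experts must be between 1 and {max_experts}, got {experts}")
    return [
        "llama-server",
        "-m", model_path,
        # KEY=TYPE:VALUE form used by --override-kv
        "--override-kv", f"{arch}.expert_used_count=int:{experts}",
        *extra_args,
    ]


if __name__ == "__main__":
    argv = llama_server_args("my-moe-model.Q4_K_M.gguf", 3)  # placeholder file name
    print(shlex.join(argv))
    # llama-server -m my-moe-model.Q4_K_M.gguf --override-kv llama.expert_used_count=int:3
```

The argv list can be passed directly to `subprocess.run(argv)`, which avoids shell-quoting issues; `shlex.join` is only used here to show the equivalent command line.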
					
						

When using an API, set "num_experts_used" in the JSON payload (the field name may be different for different back ends).
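
As a sketch of the API route, the snippet below builds (but does not send) a JSON request that carries the expert count. It assumes a llama.cpp-style server on the default `http://127.0.0.1:8080` with a `/completion` endpoint that accepts `prompt` and `n_predict`; the `num_experts_used` field name is taken from the text above and may be named differently, or be ignored, on your back end. `build_completion_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_completion_request(prompt, experts,
                             url="http://127.0.0.1:8080/completion",
                             n_predict=256):
    """Build a POST request whose JSON body includes an expert-count field."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        # Field name from this document; verify it against your server's API docs.
        "num_experts_used": experts,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("Write a short story opening.", 2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once you have confirmed the field name your server expects, pass the request to `urllib.request.urlopen(req)` to send it.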
					
						

<B>SUGGESTION:</B>

The MOE models in my repo:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

contain various examples, including example generations showing 2, 4, and 8 experts.

This will give you a better idea of what changes to expect, and of the effect on generation, when adjusting the number of experts.