---
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
tags:
- mixtral
- moe
- reasoning
---
# Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

This repository contains model checkpoints from the paper [Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks](https://huggingface.co/papers/2508.18672).

For more details, including code and evaluation procedures, please refer to the official GitHub repository: [https://github.com/rioyokotalab/optimal-sparsity](https://github.com/rioyokotalab/optimal-sparsity)
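
## How to use

The checkpoints are standard `transformers` causal language models (Mixtral-style MoE, per the tags above), so they should load with the usual `AutoModelForCausalLM` API. Below is a minimal loading sketch; the model ID is a placeholder, since the exact repository name and recommended dtype are not stated in this card.

```python
# Minimal sketch: load a checkpoint from this collection with transformers.
# Assumptions: the repo ID below is a placeholder, and bf16 weights fit on your device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-checkpoint's-repo-id>"  # placeholder: replace with the actual Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to the checkpoint's actual precision
    device_map="auto",           # requires `accelerate`; remove to load on a single device
)

prompt = "Question: A train travels 60 km in 45 minutes. What is its average speed in km/h?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

See the GitHub repository linked above for the exact evaluation prompts and procedures used in the paper.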
## How to cite

If you find our work helpful, please feel free to cite the paper:
```bibtex
@article{nakamura2025optimalsparsitymixtureofexpertslanguage,
  title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
  author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
  year={2025},
  eprint={2508.18672},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.18672},
}
```