---
library_name: mlc-llm
base_model: nvidia/OpenCodeReasoning-Nemotron-1.1-14B
tags:
- mlc-llm
- web-llm
---
# OpenCodeReasoning-Nemotron-1.1-14B-q3f16_1-MLC

This is the [OpenCodeReasoning-Nemotron-1.1-14B](https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-1.1-14B) model in MLC format `q3f16_1`.
The model can be used with [MLC-LLM](https://github.com/mlc-ai/mlc-llm) and [WebLLM](https://github.com/mlc-ai/web-llm).

## Example Usage

Before running the examples, please follow the [installation guide](https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages).
### Chat CLI

```bash
mlc_llm chat HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-14B-q3f16_1-MLC
```
### REST Server

```bash
mlc_llm serve HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-14B-q3f16_1-MLC
```
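The server exposes an OpenAI-compatible `/v1/chat/completions` endpoint. Below is a minimal standard-library sketch of querying it, assuming the default local address `http://127.0.0.1:8000`; the helper names `build_payload` and `chat` are illustrative, not part of `mlc_llm`.

```python
import json
import urllib.request

MODEL = "HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-14B-q3f16_1-MLC"


def build_payload(model: str, prompt: str) -> dict:
    # OpenAI-style chat-completion request body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt: str, base_url: str = "http://127.0.0.1:8000") -> str:
    # POST the request to the endpoint served by `mlc_llm serve`.
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(MODEL, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a Python quicksort.")` returns the model's reply as a string.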
### Python API

```python
from mlc_llm import MLCEngine

model = "HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-14B-q3f16_1-MLC"
engine = MLCEngine(model)

# Stream the response, printing each token delta as it arrives.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()
```
## Documentation

For more on MLC LLM, visit the [documentation](https://llm.mlc.ai/docs/) and [GitHub repo](https://github.com/mlc-ai/mlc-llm).