|
--- |
|
language: |
|
- en |
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
pipeline_tag: text-generation |
|
--- |
|
# Model Card for Meta-Llama-3.1-8B-Instruct ONNX GenAI INT4 DirectML
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to INT4 in the ONNX Runtime GenAI format and optimized for Microsoft DirectML on Windows.
|
|
|
## Model Details |
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 format with Microsoft DirectML optimization.
|
|
|
### Model Description |
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 format with Microsoft DirectML optimization:<br>
|
https://onnxruntime.ai/docs/genai/howto/install.html#directml |
|
|
|
Created using ONNX Runtime GenAI's builder.py<br> |
|
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py |
|
|
|
INT4 accuracy level: FP32 (float32)<br> |
|
8-bit quantization for MoE layers |
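
A model like this one can be produced with builder.py roughly as follows. This is a sketch, not the exact command used: the flag set, and the `int4_accuracy_level` value corresponding to FP32, may differ across onnxruntime-genai versions, so check `python builder.py --help` for your install.

```shell
rem Download builder.py from the onnxruntime-genai repository first, then:
pip install onnxruntime-genai-directml

python builder.py ^
  -m meta-llama/Meta-Llama-3.1-8B-Instruct ^
  -o .\llama31-8b-instruct-int4-dml ^
  -p int4 ^
  -e dml ^
  --extra_options int4_accuracy_level=1
```

The output folder then contains the quantized ONNX weights plus genai_config.json, which is what the onnxruntime-genai runtime loads.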
|
|
|
- **Developed by:** Mochamad Aris Zamroni |
|
- **Model type:** Instruction-tuned causal language model (text generation), quantized to ONNX Runtime GenAI INT4

- **Language(s) (NLP):** English

- **License:** Llama 3.1 Community License (inherited from the base model)

- **Quantized from model:** [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
|
|
|
### Model Sources [optional] |
|
- **Repository (base model):** https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
|
- **Paper [optional]:** [More Information Needed] |
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
This model is optimized for Windows DirectML.
|
|
|
Prerequisites:<br> |
|
1. Install Python 3.10 from Windows Store:<br> |
|
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US |
|
|
|
2. Open a command prompt (cmd.exe)



3. Create a Python virtual environment and install onnxruntime-genai-directml:<br>
|
mkdir c:\temp<br> |
|
cd c:\temp<br> |
|
python -m venv dmlgenai<br> |
|
dmlgenai\Scripts\activate.bat<br> |
|
pip install onnxruntime-genai-directml |
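
To confirm the install succeeded, you can run this inside the activated environment (it should print the installed package version):

```shell
python -c "import onnxruntime_genai; print(onnxruntime_genai.__version__)"
```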
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
Load the model folder (the one containing genai_config.json and the ONNX weights) with the onnxruntime-genai Python API installed in the prerequisites above.
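
A minimal sketch of streaming generation, assuming the model files sit in a hypothetical local folder; the loop follows the onnxruntime-genai examples, and method names such as `compute_logits` have changed across releases, so adjust to your installed version:

```python
import onnxruntime_genai as og

# Hypothetical path: the folder holding genai_config.json and the ONNX weights
model = og.Model(r"c:\temp\llama31-8b-instruct-int4-dml")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Llama 3.1 instruct prompt format (single turn, no system message)
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is DirectML?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

# Generate token by token, decoding each one to text as it arrives
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```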
|
|
|
#### Preprocessing [optional] |
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Speeds, Sizes, Times [optional] |
|
About 15 tokens/s on a Radeon 780M with 8 GB of dedicated RAM.
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
#### Summary |
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
[More Information Needed] |
|
|
|
## Technical Specifications [optional] |
|
|
|
### Model Architecture and Objective |
|
|
|
[More Information Needed] |
|
|
|
### Compute Infrastructure |
|
|
|
Microsoft Windows DirectML |
|
|
|
#### Hardware |
|
|
|
- AMD Ryzen 7840U with integrated Radeon 780M GPU

- 32 GB RAM

- 8 GB shared VRAM
|
|
|
#### Software |
|
|
|
Microsoft Windows DirectML |
|
|
|
## Model Card Authors [optional] |
|
Mochamad Aris Zamroni |
|
|
|
## Model Card Contact |
|
|
|
https://www.linkedin.com/in/zamroni/ |