---
language:
- en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---
# Model Card for Meta-Llama-3.1-8B-Instruct ONNX GenAI INT4 (DirectML)
<!-- Provide a quick summary of what the model is/does. -->
An INT4-quantized ONNX Runtime GenAI build of meta-llama/Meta-Llama-3.1-8B-Instruct, optimized for Microsoft DirectML on Windows.
## Model Details
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 with Microsoft DirectML optimization.
### Model Description
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 with Microsoft DirectML optimization:<br>
https://onnxruntime.ai/docs/genai/howto/install.html#directml
Created using ONNX Runtime GenAI's builder.py:<br>
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py
INT4 accuracy level: FP32 (float32)<br>
8-bit quantization for MoE layers
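For reference, an export along these lines would reproduce an INT4 DirectML build with the builder linked above. The output path is a placeholder, and the `int4_accuracy_level` value is an assumption (the builder maps small integers to accumulation precisions; verify the mapping for float32 against the builder's `--help` output before relying on it):

```shell
python -m onnxruntime_genai.models.builder ^
  -m meta-llama/Meta-Llama-3.1-8B-Instruct ^
  -o .\llama31-8b-int4-dml ^
  -p int4 ^
  -e dml ^
  --extra_options int4_accuracy_level=1
```

Downloading the source weights requires accepting the Llama 3.1 license and authenticating with Hugging Face.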
- **Developed by:** Mochamad Aris Zamroni
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
This is a Windows DirectML-optimized model.
Prerequisites:<br>
1. Install Python 3.10 from the Microsoft Store:<br>
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US
2. Open a command prompt (cmd.exe).
3. Create a Python virtual environment and install onnxruntime-genai-directml:
```shell
mkdir c:\temp
cd c:\temp
python -m venv dmlgenai
dmlgenai\Scripts\activate.bat
pip install onnxruntime-genai-directml
```
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
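In the absence of an official snippet, here is a minimal generation sketch using the onnxruntime-genai Python API. The model directory path is a placeholder, and the exact API surface is an assumption: method names have shifted across onnxruntime-genai releases, so check the docs for the version you installed.

```python
def generate(model_dir: str, prompt: str, max_length: int = 256) -> str:
    """Run generation with onnxruntime-genai on DirectML.

    The package is imported lazily so this file can be loaded even
    without onnxruntime-genai-directml installed.
    """
    import onnxruntime_genai as og

    model = og.Model(model_dir)        # reads genai_config.json + ONNX weights
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    # Generate until EOS or max_length is reached.
    while not generator.is_done():
        generator.generate_next_token()

    return tokenizer.decode(generator.get_sequence(0))
```

Usage, assuming the repository was downloaded to a hypothetical local folder: `print(generate(r"C:\temp\llama31-8b-int4-dml", "What is DirectML?"))`. For an instruct model, wrapping the prompt in the Llama 3.1 chat template generally gives better results.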
#### Preprocessing [optional]
[More Information Needed]
#### Speeds, Sizes, Times [optional]
Approximately 15 tokens/s on a Radeon 780M iGPU with 8 GB of shared VRAM.
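To reproduce a figure like this, time the token-generation loop (e.g. with `time.perf_counter()`) and divide new tokens by elapsed seconds; a tiny helper (the function name is mine) makes the arithmetic explicit:

```python
def tokens_per_second(n_new_tokens: int, elapsed_s: float) -> float:
    """Generation throughput: newly generated tokens per wall-clock second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_new_tokens / elapsed_s

# e.g. 30 new tokens generated in 2.0 s -> 15.0 tok/s,
# in line with the Radeon 780M figure reported above.
print(tokens_per_second(30, 2.0))  # 15.0
```

Exclude prompt processing (prefill) from the timed window if you want a pure decode-throughput number.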
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
Microsoft Windows DirectML
#### Hardware
- AMD Ryzen 7840U with integrated Radeon 780M GPU
- 32 GB RAM, 8 GB shared as VRAM
#### Software
Microsoft Windows DirectML
## Model Card Authors [optional]
Mochamad Aris Zamroni
## Model Card Contact
https://www.linkedin.com/in/zamroni/