---
language:
- en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to INT4 in the ONNX Runtime GenAI format, optimized for Microsoft DirectML.

### Model Description
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to INT4 in the ONNX Runtime GenAI format, optimized for Microsoft DirectML.<br>
https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py<br>
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

INT4 accuracy level: FP32 (float32), i.e. the quantized matrix multiplications accumulate in float32<br>
8-bit quantization for MoE layers
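
The conversion can be reproduced with a command along these lines. This is a sketch, not the exact command used for this repository: the flag set depends on the builder.py version, and `int4_accuracy_level=1` is an assumption based on the MatMulNBits `accuracy_level` convention, where 1 denotes float32 computation.

```shell
pip install onnxruntime-genai-directml torch transformers
python builder.py ^
  -m meta-llama/Meta-Llama-3.1-8B-Instruct ^
  -o .\llama31-8b-instruct-int4-dml ^
  -p int4 ^
  -e dml ^
  --extra_options int4_accuracy_level=1
```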

- **Developed by:** Mochamad Aris Zamroni
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use
This is a Windows DirectML-optimized model.

Prerequisites:<br>
1. Install Python 3.10 from the Microsoft Store:<br>
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US

2. Open a command prompt (cmd.exe).

3. Create a Python virtual environment and install onnxruntime-genai-directml:<br>
mkdir c:\temp<br>
cd c:\temp<br>
python -m venv dmlgenai<br>
dmlgenai\Scripts\activate.bat<br>
pip install onnxruntime-genai-directml


## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
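
In the meantime, a minimal generation loop with the onnxruntime-genai Python API might look like the sketch below. The model folder path is a placeholder (point it at the directory containing this repository's `genai_config.json`), and the exact API surface varies slightly between onnxruntime-genai releases, so treat this as a starting point rather than a verified script.

```python
# Sketch: streaming text generation with onnxruntime-genai on DirectML.
# Run inside the virtual environment created in the Direct Use section.
import onnxruntime_genai as og

# Placeholder path: the folder this repository was downloaded into.
model = og.Model(r"c:\temp\Meta-Llama-3.1-8B-Instruct-int4-dml")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Single-turn prompt in the Llama 3.1 Instruct chat format.
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Why is the sky blue?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Decode and print tokens as they are produced.
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
```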

#### Preprocessing [optional]

[More Information Needed]


#### Speeds, Sizes, Times [optional]
Approximately 15 tokens/s on a Radeon 780M with 8 GB of shared VRAM

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

Microsoft Windows DirectML

#### Hardware

- AMD Ryzen 7 7840U with integrated Radeon 780M GPU
- 32 GB RAM
- 8 GB shared VRAM

#### Software

Microsoft Windows DirectML

## Model Card Authors [optional]
Mochamad Aris Zamroni

## Model Card Contact

https://www.linkedin.com/in/zamroni/