|
--- |
|
language: |
|
- en |
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
pipeline_tag: text-generation |
|
--- |
|
# Model Card for Meta-Llama-3.1-8B-Instruct ONNX GenAI INT4 DirectML
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to INT4 in the ONNX Runtime GenAI format and optimized for Microsoft DirectML on Windows.
|
|
|
## Model Details |
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 format with Microsoft DirectML optimization.
|
|
|
### Model Description |
|
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX Runtime GenAI INT4 format with Microsoft DirectML optimization:<br>
|
https://onnxruntime.ai/docs/genai/howto/install.html#directml |
|
|
|
Created using ONNX Runtime GenAI's builder.py<br> |
|
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py |
|
|
|
INT4 accuracy level: FP32 (float32)<br> |
|
8-bit quantization for MoE layers |
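
A model like this one can be produced with builder.py roughly as follows. This is a sketch, not the exact command used: the flag set, and the `int4_accuracy_level` value corresponding to FP32, may differ across onnxruntime-genai versions, so check `python builder.py --help` for your install.

```shell
rem Download builder.py from the onnxruntime-genai repository first, then:
pip install onnxruntime-genai-directml

python builder.py ^
  -m meta-llama/Meta-Llama-3.1-8B-Instruct ^
  -o .\llama31-8b-instruct-int4-dml ^
  -p int4 ^
  -e dml ^
  --extra_options int4_accuracy_level=1
```

The output folder then contains the quantized ONNX weights plus genai_config.json, which is what the onnxruntime-genai runtime loads.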
|
|
|
- **Developed by:** Mochamad Aris Zamroni |
|
- **Model type:** Instruction-tuned causal language model (text generation), quantized to ONNX Runtime GenAI INT4

- **Language(s) (NLP):** English

- **License:** Llama 3.1 Community License (inherited from the base model)

- **Quantized from model:** [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
|
|
|
### Model Sources [optional] |
|
- **Repository (base model):** https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
|
- **Paper [optional]:** [More Information Needed] |
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
This model is optimized for Windows DirectML.
|
|
|
Prerequisites:<br> |
|
1. Install Python 3.10 from Windows Store:<br> |
|
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US |
|
|
|
2. Open a command prompt (cmd.exe)



3. Create a Python virtual environment and install onnxruntime-genai-directml:<br>
|
mkdir c:\temp<br> |
|
cd c:\temp<br> |
|
python -m venv dmlgenai<br> |
|
dmlgenai\Scripts\activate.bat<br> |
|
pip install onnxruntime-genai-directml |
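
To confirm the install succeeded, you can run this inside the activated environment (it should print the installed package version):

```shell
python -c "import onnxruntime_genai; print(onnxruntime_genai.__version__)"
```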
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
Load the model folder (the one containing genai_config.json and the ONNX weights) with the onnxruntime-genai Python API installed in the prerequisites above.
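
A minimal sketch of streaming generation, assuming the model files sit in a hypothetical local folder; the loop follows the onnxruntime-genai examples, and method names such as `compute_logits` have changed across releases, so adjust to your installed version:

```python
import onnxruntime_genai as og

# Hypothetical path: the folder holding genai_config.json and the ONNX weights
model = og.Model(r"c:\temp\llama31-8b-instruct-int4-dml")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Llama 3.1 instruct prompt format (single turn, no system message)
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is DirectML?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

# Generate token by token, decoding each one to text as it arrives
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```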
|
|
|
#### Preprocessing [optional] |
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Speeds, Sizes, Times [optional] |
|
About 15 tokens/s on a Radeon 780M with 8 GB of dedicated RAM.
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
#### Summary |
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
[More Information Needed] |
|
|
|
## Technical Specifications [optional] |
|
|
|
### Model Architecture and Objective |
|
|
|
[More Information Needed] |
|
|
|
### Compute Infrastructure |
|
|
|
Microsoft Windows DirectML |
|
|
|
#### Hardware |
|
|
|
- AMD Ryzen 7840U with integrated Radeon 780M GPU

- 32 GB RAM

- 8 GB shared VRAM
|
|
|
#### Software |
|
|
|
Microsoft Windows DirectML |
|
|
|
## Model Card Authors [optional] |
|
Mochamad Aris Zamroni |
|
|
|
## Model Card Contact |
|
|
|
https://www.linkedin.com/in/zamroni/ |