---
tags:
- llama
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Meta-Llama-3.1-8B-openhermes-2.5
results: []
license: apache-2.0
language:
- en
library_name: transformers
datasets:
- teknium/OpenHermes-2.5
---
# Model Card for Meta-Llama-3.1-8B-openhermes-2.5
This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset.
## Model Details
### Model Description
This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks.
- **Developed by:** artificialguybr
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B
---
### 🌐 Website
You can find more of my models, projects, and information on my official website:
- **[artificialguy.com](https://artificialguy.com/)**
### 💖 Support My Work
If you find this model useful, please consider supporting my work. It helps me cover server costs and dedicate more time to new open-source projects.
- **Patreon:** [Support on Patreon](https://www.patreon.com/user?u=81570187)
- **Ko-fi:** [Buy me a Ko-fi](https://ko-fi.com/artificialguybr)
- **Buy Me a Coffee:** [Buy me a Coffee](https://buymeacoffee.com/jvkape)
### Model Sources
- **Repository:** https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5
## Uses
This model can be used for various natural language processing tasks, particularly those involving instruction following and general language understanding.
### Direct Use
The model can be used for tasks such as text generation, question answering, and other language-related applications.
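A minimal generation sketch with 🤗 Transformers is shown below. It assumes the repository's tokenizer ships a chat template (the `chatml` tag suggests ChatML-style prompts); the generation settings are illustrative, not the author's exact configuration.

```python
# Minimal generation sketch (assumed settings; not the author's exact configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain instruction tuning in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```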
### Out-of-Scope Use
The model should not be used for generating harmful or biased content. Users should be aware of potential biases in the training data.
## Training Details
### Training Data
The model was fine-tuned on the teknium/OpenHermes-2.5 dataset.
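Purely for orientation, the dataset can be inspected with 🤗 Datasets; the snippet makes no assumptions about its schema and simply prints what is there.

```python
# Quick look at the fine-tuning data; prints the columns and one raw example.
from datasets import load_dataset

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds)     # row count and column names
print(ds[0])  # one raw example
```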
### Training Procedure
#### Training Hyperparameters
- **Training regime:** BF16 mixed precision
- **Optimizer:** AdamW
- **Learning rate:** ~2.49e-6 at the start (decaying over training)
- **Batch size:** Not specified (gradient accumulation steps: 8)
- **Training steps:** 13,368
- **Evaluation strategy:** Steps, with `eval_steps = 0.1667` (a fraction below 1 is interpreted as a ratio of the total training steps, i.e. roughly every sixth of training)
- **Gradient checkpointing:** Enabled
- **Weight decay:** 0
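The run itself was configured through Axolotl. Purely as an illustration, the settings above map roughly onto the following 🤗 `TrainingArguments`; the per-device batch size and optimizer name are assumptions, not values taken from the original config.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# The actual run used Axolotl; the per-device batch size is an assumption.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                      # BF16 mixed precision
    optim="adamw_torch",            # AdamW
    learning_rate=2.49e-6,          # starting value; decays over training
    per_device_train_batch_size=1,  # not reported; assumed
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    weight_decay=0.0,
    max_steps=13_368,
    eval_strategy="steps",          # `evaluation_strategy` in older Transformers releases
    eval_steps=1 / 6,               # a float < 1 is treated as a ratio of total steps
)
```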
#### Hardware and Software
- **Hardware:** NVIDIA A100-SXM4-80GB (1 GPU)
- **Software Framework:** 🤗 Transformers, Axolotl
## Evaluation
### Metrics
- **Loss:** 0.6727 (evaluation)
- **Perplexity:** Not provided (it can be derived from the loss; see the note below)
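Since perplexity is the exponential of the cross-entropy loss, the evaluation loss above implies a value of roughly 1.96:

```python
# Perplexity derived from the reported evaluation loss.
import math

print(math.exp(0.6727))  # ≈ 1.96
```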
### Results
- **Evaluation runtime:** 2,676.42 seconds (≈ 44.6 minutes)
- **Samples per second:** 18.711
- **Steps per second:** 18.711
## Model Architecture
- **Model Type:** LlamaForCausalLM
- **Hidden size:** 4,096
- **Intermediate size:** 14,336
- **Number of attention heads:** Not specified (32 for the base Llama-3.1-8B; see the config sketch below)
- **Number of layers:** Not specified (32 for the base Llama-3.1-8B)
- **Activation function:** SiLU
- **Vocabulary size:** 128,256
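The values marked "Not specified" can be read directly from the hosted configuration; a quick sketch, assuming the standard `LlamaConfig` attribute names:

```python
# Inspect architecture details from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5")
print(config.num_hidden_layers)    # transformer layers
print(config.num_attention_heads)  # attention heads
print(config.hidden_size, config.intermediate_size, config.vocab_size)
```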
## Limitations and Biases
More information is needed about specific limitations and biases of this model.