# OpenLLaMA 7Bv2 Model Card
## Model Description
OpenLLaMA 7Bv2 is an open-source, 7-billion-parameter language model trained to deliver high-quality, contextually relevant text predictions. It was trained on a diverse composite dataset that includes web-crawled data, code, scholarly articles, books, and question-answer pairs, giving it broad domain coverage and applicability.
## Training Data
The model was trained on a composite dataset that includes:
- The Falcon RefinedWeb dataset
- The StarCoder datasets
- Wikipedia, for encyclopedic knowledge
- Academic papers from arXiv, for scientific understanding
- A large collection of books spanning multiple genres
- Stack Exchange data curated by RedPajama
## Training Procedure
- **Learning Rate:** A maximum learning rate of 3e-4, decaying to a minimum learning rate of 3e-5.
- **Batch Size:** A batch size of 4 million tokens, balancing training efficiency and performance.
- **Learning Rate Scheduler:** The learning rate schedule closely follows the one used in Llama2, applying gradual adjustments for stable convergence (see the sketch after this list).
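
As a rough illustration of this schedule, the sketch below implements a Llama2-style cosine decay with linear warmup between the stated maximum (3e-4) and minimum (3e-5) learning rates. The warmup length and total step count are placeholder values chosen for illustration, not figures from the actual training run.

```python
import math

# The model card specifies only the max/min learning rates; the warmup
# length and total step count below are illustrative assumptions.
MAX_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 2_000
TOTAL_STEPS = 250_000


def lr_at_step(step: int) -> float:
    """Cosine schedule with linear warmup, decaying from MAX_LR to MIN_LR."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return MAX_LR * step / WARMUP_STEPS
    # Cosine decay from MAX_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


if __name__ == "__main__":
    # Inspect the schedule at a few points in training.
    for s in (0, 1_000, 2_000, 125_000, 250_000):
        print(f"step {s:>7}: lr = {lr_at_step(s):.2e}")
```

With these placeholder values, the learning rate rises linearly to 3e-4 over the warmup phase and then follows a cosine curve down to 3e-5 by the final step.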