The Smol Training Playbook 📚 Space The secrets to building world-class LLMs • Running on CPU Upgrade • 2.72k
Datasets for Pretrained Thai LLM Collection List of datasets for pretrained Thai LLMs by PyThaiNLP • 25 items • Updated Aug 5 • 14
Thai instruction dataset list Collection High-quality Thai instruction datasets that are not machine-translated with Google Translate (which gives low quality) • 14 items • Updated Oct 9 • 2
HuggingFaceFW/fineweb-edu-classifier Text Classification • 0.1B • Updated Nov 17, 2024 • 13.6k • 202
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42