Community: Discord (https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB) · Website: https://calmacatai.draklor.ru

License

This model is licensed under the MIT License.

CalmaCatLM-1.5-mini

🚧 Experimental, under-trained model (~12M parameters) based on a custom 6-layer/6-head Transformer decoder architecture.
Primarily supports English 🇬🇧. This is my third model.

📖 Description

CalmaCatLM is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.

⚙️ Model Details

  • Architecture: Custom Transformer Decoder (6 layers, 6 attention heads)
  • Model size: ~12M parameters
  • Training approach: Pre-trained from scratch on my own dataset
  • Languages: Primarily English
  • License: MIT
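As a sanity check on the stated size, the parameter count of a 6-layer/6-head decoder can be estimated from the layer shapes. The card does not state the hidden size, vocabulary size, or whether embeddings are tied, so the values below are assumptions (using the common `d_model = 64 × heads` convention) chosen to illustrate how ~12M parameters arises:

```python
# Rough parameter-count sketch for a 6-layer, 6-head Transformer decoder.
# d_model, vocab_size, and embedding tying are ASSUMPTIONS, not from the card.
n_layers = 6
n_heads = 6
d_model = 64 * n_heads   # assumed: 384
vocab_size = 4096        # assumed tokenizer size
max_seq_len = 128        # from the training details

# Per layer: attention (Q, K, V, output projections with biases),
# a 4x-expansion MLP, and two LayerNorms (weight + bias each).
attn = 4 * (d_model * d_model + d_model)
mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
norms = 2 * 2 * d_model
per_layer = attn + mlp + norms

# Embeddings: token table (assumed tied with the output head) + learned positions.
embeddings = vocab_size * d_model + max_seq_len * d_model

total = n_layers * per_layer + embeddings
print(f"~{total / 1e6:.1f}M parameters")  # ~12.3M under these assumptions
```

Under these assumptions the estimate lands at roughly 12.3M, consistent with the ~12M figure above; a larger vocabulary or untied output head would push it somewhat higher.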

🏋️ Training Details

  • Dataset: my own custom dataset
  • Hardware: Single AMD RX 7700 XT (12GB VRAM)
  • Training Status: Very early checkpoint (Under-trained)
  • Epochs: 100
  • Batch size: 32
  • Optimizer: AdamW, lr = 3e-4
  • Max sequence length: 128 tokens
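The batch size and sequence length above determine the token throughput per optimizer step, which is useful when comparing this run to other small-model training setups. A minimal arithmetic sketch using only the values stated in the card:

```python
# Values taken directly from the training details above.
batch_size = 32      # sequences per optimizer step
max_seq_len = 128    # tokens per sequence
lr = 3e-4            # AdamW learning rate

# Each optimizer step processes at most batch_size * max_seq_len tokens.
tokens_per_step = batch_size * max_seq_len
print(tokens_per_step)  # 4096
```

At 4096 tokens per step, total training exposure depends on the dataset size and the 100 epochs, neither of which translates to a token count without knowing the dataset length.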