---
license: apache-2.0
tags:
- text-generation
- language-model
- LLM
- CosmicFish
- 90M
- transformer
language: en
datasets:
- CosmicSet-1.0
- akkiisfrommars/TreeCorpusCleanedmodel
model_type: CosmicFish
---

# CosmicFish-90M

A 90M-parameter language model developed by Mistyoz AI, built on a transformer architecture with modern improvements.

## Quick Start

**The easiest way to chat with CosmicFish is with our `chat.py` script:**

```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py
# Install dependencies
pip install torch transformers huggingface-hub termcolor
# Run the chat interface (downloads the model automatically)
python chat.py
```

The `chat.py` script handles model loading and generation, and provides the best chat experience, with live streaming output, a repetition penalty, and conversation commands.

## Model Details

- **Parameters**: 91.6M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens
- **Training Data**: CosmicSet 1.0
- **Developer**: Mistyoz AI
- **Repository**: MistyozAI/CosmicFish-90M

## Usage

### Installation

```bash
pip install torch transformers huggingface-hub termcolor
```

### Loading the Model

```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
import torch
import json
import os

# Download the model files from the Hugging Face Hub (returns the local snapshot directory)
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# Load the GPT-2 tokenizer (50,257-token vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the model config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load the model weights (a state dict mapping parameter names to tensors) onto the CPU
state_dict = torch.load(os.path.join(cache_dir, "pytorch_model.bin"), map_location="cpu")

# Note: the CosmicFish model class is not part of transformers; the full class is in this repository.
# After building the model, load the weights with model.load_state_dict(state_dict) and call model.eval().
print("Model downloaded and ready for use!")
```
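
As a quick sanity check after loading, you can total the elements of every tensor in `state_dict` and compare the result with the 91.6M figure under Model Details. The `count_parameters` helper below is a hypothetical illustration, not part of this repository. The exact total depends on how the checkpoint was saved (for example, whether tied input/output embeddings appear under two keys, or whether non-trainable buffers are included), so expect a number close to, but not necessarily identical to, the published count.

```python
import torch


def count_parameters(state_dict: dict, dedupe_shared: bool = True) -> int:
    """Total number of tensor elements in a state dict (hypothetical helper).

    With dedupe_shared=True, tensors that share memory (e.g. tied embedding
    and output weights saved under two keys) are counted only once.
    """
    total = 0
    seen_ptrs = set()
    for value in state_dict.values():
        if not isinstance(value, torch.Tensor):
            continue  # skip non-tensor entries such as step counters
        if dedupe_shared:
            ptr = value.data_ptr()
            if ptr in seen_ptrs:
                continue  # same memory already counted under another key
            seen_ptrs.add(ptr)
        total += value.numel()
    return total
```

For example, `print(f"{count_parameters(state_dict) / 1e6:.1f}M parameters")` prints the total in millions.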

### Advanced Generation with Repetition Penalty

```python
import torch

def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100,
                                     temperature=0.7, penalty=1.2, block_size=512):
    """Sample up to max_tokens new tokens, penalizing tokens already in the sequence.

    Assumes model(idx) returns a (logits, loss) tuple and that block_size
    matches the model's 512-token context length.
    """
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()
    for _ in range(max_tokens):
        # Feed only the most recent block_size tokens so the input fits the context window
        idx_cond = generated[:, -block_size:]
        with torch.no_grad():
            logits, _ = model(idx_cond)
        next_token_logits = logits[:, -1, :] / temperature
        # Apply repetition penalty: shrink positive logits, push non-positive logits further down
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty
        # Convert logits to probabilities and sample one token
        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        # Stop at the end-of-text token
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated = torch.cat([generated, next_token], dim=1)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
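
The per-token Python loop in the repetition penalty is easy to read but runs once per distinct token on every generation step. A vectorized equivalent, sketched here as a hypothetical `apply_repetition_penalty` helper, reads the logits of all previously seen tokens at once with `torch.gather`, rescales them with the same sign-dependent rule, and writes them back with `Tensor.scatter_`. It assumes the same `(batch, vocab)` logits and `(batch, seq)` token-id shapes as the generation code.

```python
import torch


def apply_repetition_penalty(logits: torch.Tensor, token_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    """Return logits (batch, vocab) with already-seen tokens penalized.

    Same rule as the loop: positive logits are divided by `penalty`,
    non-positive logits are multiplied by it. `token_ids` is (batch, seq)
    of dtype int64. The input tensor is never modified in place.
    """
    if penalty <= 1.0:
        return logits
    # Logits of every token that already appears in the sequence
    seen = torch.gather(logits, 1, token_ids)
    seen = torch.where(seen > 0, seen / penalty, seen * penalty)
    # Write the penalized values back; repeated ids all carry the same value,
    # so each distinct token is penalized exactly once
    return logits.clone().scatter_(1, token_ids, seen)
```

Inside the generation loop, `next_token_logits = apply_repetition_penalty(next_token_logits, generated, penalty)` could replace the `for token_id in ...` block.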

### Chat Interface

```python
def chat_with_model():
    # Requires `model`, `tokenizer`, and generate_with_repetition_penalty() from the sections above
    conversation = []  # list of (user message, assistant reply) pairs
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break
        # Build the prompt: fixed preamble, all previous turns, then the new message
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"
        # Generate a response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )
        # The decoded text includes the prompt: keep the first line after the last "Assistant:"
        response = response.split("Assistant:")[-1].strip().split('\n')[0].strip()
        print(f"CosmicFish: {response}")
        conversation.append((user_input, response))


if __name__ == "__main__":
    chat_with_model()
```
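
Because the context window is only 512 tokens, the prompt built in `chat_with_model` eventually outgrows it as the conversation gets longer. One option is to drop whole old turns before generating. The sketch below is a hypothetical helper, `build_prompt`, that keeps the preamble, the newest message, and as many recent turns as fit in a token budget. The 512-token window comes from this card; the 150-token reserve for the reply and the `count_tokens` callable are assumptions. Counts are approximate, since BPE token counts of concatenated strings can differ slightly from the sum of the parts.

```python
from typing import Callable, List, Tuple

PREAMBLE = "Below is a conversation between a human and an AI assistant.\n\n"


def build_prompt(
    conversation: List[Tuple[str, str]],
    user_input: str,
    count_tokens: Callable[[str], int],
    context_length: int = 512,
    reserve_for_reply: int = 150,
) -> str:
    """Hypothetical helper: keep the newest turns that fit in the token budget.

    Uses the same Human/Assistant format as chat_with_model. Older turns are
    dropped whole, so the preamble and the latest message are always kept.
    """
    budget = context_length - reserve_for_reply
    tail = f"Human: {user_input}\nAssistant:"
    kept: List[str] = []
    used = count_tokens(PREAMBLE) + count_tokens(tail)
    # Walk backwards from the most recent turn; stop at the first one that does not fit
    for human, ai in reversed(conversation):
        turn = f"Human: {human}\nAssistant: {ai}\n\n"
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    # Restore chronological order of the kept turns
    return PREAMBLE + "".join(reversed(kept)) + tail
```

With the GPT-2 tokenizer, `context = build_prompt(conversation, user_input, lambda s: len(tokenizer.encode(s)))` could replace the three prompt-building statements in `chat_with_model`.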
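
## Architecture

CosmicFish uses several modern improvements over the standard transformer:

- **RoPE (Rotary Position Embeddings)**: encodes position by rotating query and key vectors, so attention scores depend on the relative distance between tokens instead of on absolute position embeddings added to the input
- **GQA (Grouped-Query Attention)**: groups of query heads share key/value heads, reducing attention memory; CosmicFish uses 4 query groups
- **SwiGLU**: a gated feed-forward activation (Swish-gated linear unit) that has been reported to outperform ReLU/GELU feed-forward layers
- **RMSNorm**: normalizes by the root mean square only, without mean-centering or a bias term, making it simpler and cheaper to compute than LayerNorm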
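
The four components listed above can each be sketched in a few lines of PyTorch. These are generic textbook versions written for illustration, not CosmicFish's own implementation: the dimensions, head counts, `eps`, and the RoPE base of 10000 are assumptions, and the RoPE sketch uses the interleaved-pair convention (some implementations rotate the two halves of the head dimension instead). The actual modules are in the model class provided in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale by the reciprocal root mean square; no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed value
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x / rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (batch, heads, seq, head_dim) by position-dependent angles."""
    _, _, seq_len, head_dim = x.shape
    # One frequency per channel pair: base^(-2i / head_dim)
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # 2-D rotation of each (x1, x2) pair, then interleave back into head_dim
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)


def expand_kv_heads(kv: torch.Tensor, n_query_heads: int) -> torch.Tensor:
    """GQA: repeat each key/value head (dim 1) so a group of query heads shares it."""
    n_kv_heads = kv.shape[1]
    return kv.repeat_interleave(n_query_heads // n_kv_heads, dim=1)
```

Because each rotation preserves vector length, RoPE changes only the angle between queries and keys, which is what makes the attention score depend on relative position.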

## Training

- **Dataset**: CosmicSet 1.0
- **Sequence Length**: 512 tokens
- **Training Steps**: ~300K iterations
- **Hardware**: 1× NVIDIA A40

## Performance

- **Speed**: Varies by hardware (not benchmarked)
- **Memory**: ~500MB RAM (FP16)
- **File Size**: 243MB
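
As a rough cross-check of the memory figure above, the weights alone take the parameter count times the bytes per parameter. The hypothetical helper below does that arithmetic for the 91.6M parameters listed under Model Details. It counts weights only, so actual RAM use is higher once activations, the tokenizer, and framework overhead are included.

```python
# Bytes needed to store one parameter in each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_megabytes(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_megabytes(91.6e6, dtype):.1f} MB")
# fp32: 366.4 MB
# fp16: 183.2 MB
```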

## Limitations

- At 90M parameters, the model is small and may give less accurate responses than larger models
- 512-token context limit
- Knowledge is limited to the training data, so the training data cutoff applies
- May generate incorrect information
- Cannot browse the internet or access real-time data

## License

Apache 2.0; see the LICENSE file.

## Credit

If you use CosmicFish-90M, please credit Mistyoz AI.