---
license: apache-2.0
tags:
- text-generation
- language-model
- causal-lm
- cosmicfish
- 90m
- transformer
- rope
- gqa
- swiglu
- rmsnorm
language: en
datasets:
- CosmicSet-2.0-mini
- akkiisfrommars/TreeCorpusCleanedmodel
model_type: CosmicFish
pipeline_tag: text-generation
---
# CosmicFish-90M
A 90M-parameter language model with modern architectural improvements, developed by Mistyoz AI.
## Quick Start
**The easiest way to chat with CosmicFish is using our chat.py script:**
```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py
# Install dependencies
pip install transformers huggingface-hub termcolor safetensors
# Run the chat interface (automatically downloads model)
python chat.py
```
The `chat.py` script handles model loading and generation, and provides the best chat experience: live streaming, a repetition penalty, and conversation commands.
## Model Details
- **Parameters**: 91.6M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens
- **Training Data**: CosmicSet 2.0 mini
- **Developer**: Mistyoz AI
- **Repository**: MistyozAI/CosmicFish-90M
- **Format**: Safetensors
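These figures can be checked against the shipped `config.json`; a minimal sketch for printing it:
```python
import json
import os
from huggingface_hub import snapshot_download

# Download (or reuse the cached) model snapshot and print its config
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
with open(os.path.join(cache_dir, "config.json")) as f:
    print(json.dumps(json.load(f), indent=2))
```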
## Usage
### Installation
```bash
pip install transformers huggingface-hub termcolor safetensors torch
```
### Downloading the Model
```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
import torch
import json
import os
# Download model from Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# Load config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)
# Load model weights from safetensors
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))
# Note: Full model class available in the repository
print("Model downloaded and ready for use!")
```
### Advanced Generation with Repetition Penalty
```python
def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()
    for _ in range(max_tokens):
        with torch.no_grad():
            # Crop to the model's 512-token context window so long prompts don't overflow it
            logits, _ = model(generated[:, -512:])
        next_token_logits = logits[:, -1, :] / temperature
        # Apply repetition penalty: dampen logits of tokens that already appeared
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty
        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated = torch.cat([generated, next_token], dim=1)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
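A quick usage sketch, assuming `model` and `tokenizer` have been loaded as shown in the next section (the prompt text is illustrative):
```python
prompt = (
    "Below is a conversation between a human and an AI assistant.\n\n"
    "Human: What is the capital of France?\nAssistant:"
)
output = generate_with_repetition_penalty(
    model, tokenizer, prompt,
    max_tokens=50, temperature=0.7, penalty=1.2
)
print(output)
```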
### Loading Model with Safetensors
```python
from safetensors.torch import load_file
from modeling_cosmicfish import CosmicFish, CosmicConfig
import json
import os

def load_cosmicfish_model(model_path):
    # Load config
    with open(os.path.join(model_path, "config.json")) as f:
        config_dict = json.load(f)

    # Create model config
    config = CosmicConfig(
        vocab_size=config_dict["vocab_size"],
        block_size=config_dict["block_size"],
        n_layer=config_dict["n_layer"],
        n_head=config_dict["n_head"],
        n_embd=config_dict["n_embd"],
        bias=config_dict["bias"],
        dropout=0.0,
        use_rotary=config_dict["use_rotary"],
        use_swiglu=config_dict["use_swiglu"],
        use_gqa=config_dict["use_gqa"],
        n_query_groups=config_dict["n_query_groups"]
    )

    # Create model
    model = CosmicFish(config)

    # Load weights from safetensors (secure format)
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))

    # Handle weight tying (lm_head.weight shares storage with transformer.wte.weight)
    if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
        state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']

    model.load_state_dict(state_dict)
    model.eval()
    return model
```
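End to end, a sketch that assumes `modeling_cosmicfish.py` from this repository is importable (here by adding the downloaded snapshot directory to `sys.path` before the imports above run):
```python
import sys
from huggingface_hub import snapshot_download
from transformers import GPT2Tokenizer

cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
sys.path.insert(0, cache_dir)  # makes `modeling_cosmicfish` importable (assumption: the file ships in the snapshot)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = load_cosmicfish_model(cache_dir)
```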
### Chat Interface
```python
def chat_with_model():
    conversation = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"
        # Generate response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )
        # Extract just the assistant's response
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")
        conversation.append((user_input, response))

chat_with_model()
```
## Architecture
CosmicFish uses several modern improvements over standard transformers:
- **RoPE (Rotary Position Embeddings)**: Better position encoding than absolute positions
- **GQA (Grouped-Query Attention)**: Reduces memory usage with 4 query groups
- **SwiGLU**: Gated feed-forward activation, typically more effective than ReLU/GELU
- **RMSNorm**: Simpler, more stable normalization than LayerNorm (see the sketch below)
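As an illustration of the last point, a generic RMSNorm fits in a few lines; this is a textbook sketch, not necessarily the exact module CosmicFish uses:
```python
import torch

class RMSNorm(torch.nn.Module):
    """Scale by 1/RMS(x) with a learned per-feature gain; no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # x / sqrt(mean(x^2) + eps), then the learned scale
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```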
## Training
- **Dataset**: CosmicSet 2.0 mini
- **Sequence Length**: 512 tokens
- **Training Steps**: ~200K iterations
- **Hardware**: Nvidia A40 x1
## Performance
- **Speed**: Varies by hardware (not benchmarked)
- **Memory**: ~256MB RAM
- **File Size**: 185MB
- **Loading**: Fast and secure with safetensors
## Limitations
- Small model size (90M parameters): responses may be less accurate and less coherent than those of larger models
- 512 token context limit
- English only
- Training data cutoff applies
- May generate incorrect information
- Cannot browse internet or access real-time data
## License
Apache 2.0 - see LICENSE file.
## Credit
If you use CosmicFish-90M, please credit Mistyoz AI.