---
license: apache-2.0
tags:
- text-generation
- language-model
- causal-lm
- cosmicfish
- 90m
- transformer
- rope
- gqa
- swiglu
- rmsnorm
language: en
datasets:
- CosmicSet-2.0-mini
- akkiisfrommars/TreeCorpusCleanedmodel
model_type: CosmicFish
pipeline_tag: text-generation
---

# CosmicFish-90M

A 90M-parameter language model with modern architectural improvements, developed by Mistyoz AI.

## Quick Start

**The easiest way to chat with CosmicFish is using our chat.py script:**

```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py

# Install dependencies (torch is needed to run the model)
pip install transformers huggingface-hub termcolor safetensors torch

# Run the chat interface (automatically downloads the model)
python chat.py
```

The `chat.py` script handles model loading and generation, and provides the best chat experience: live streaming, a repetition penalty, and conversation commands.

## Model Details

- **Parameters**: 91.6M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens
- **Training Data**: CosmicSet 2.0 mini
- **Developer**: Mistyoz AI
- **Repository**: MistyozAI/CosmicFish-90M
- **Format**: Safetensors

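These values can be cross-checked against the repository's `config.json`. A quick, illustrative way to inspect it:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only the config file and print the architecture hyperparameters
config_path = hf_hub_download(repo_id="MistyozAI/CosmicFish-90M", filename="config.json")
with open(config_path) as f:
    print(json.dumps(json.load(f), indent=2))
```
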
## Usage

### Installation

```bash
pip install transformers huggingface-hub termcolor safetensors torch
```

### Downloading the Model

```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
import torch
import json
import os

# Download the model repository from the Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# Load the tokenizer (CosmicFish uses the GPT-2 vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load the model weights from safetensors
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))

# Note: the full model class is available in this repository;
# see "Loading Model with Safetensors" below
print("Model downloaded and ready for use!")
```

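Since `snapshot_download` fetches the whole repository, the model class file should already be present in `cache_dir`. A minimal sketch for importing it from there (this assumes the file is named `modeling_cosmicfish.py` and sits at the top level of the repo, as the loading example below implies):

```python
import sys

# Make the downloaded snapshot importable (assumption: modeling_cosmicfish.py
# lives at the top level of the repository snapshot)
sys.path.insert(0, cache_dir)
from modeling_cosmicfish import CosmicFish, CosmicConfig
```
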
### Advanced Generation with Repetition Penalty

```python
import torch

def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()

    for _ in range(max_tokens):
        with torch.no_grad():
            # Feed only the most recent tokens so the input stays within
            # the model's 512-token context window
            logits, _ = model(generated[:, -512:])

        next_token_logits = logits[:, -1, :] / temperature

        # Apply repetition penalty: dampen the logits of tokens already generated
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty

        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        if next_token.item() == tokenizer.eos_token_id:
            break

        generated = torch.cat([generated, next_token], dim=1)

    return tokenizer.decode(generated[0], skip_special_tokens=True)
```

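For example, with `model` and `tokenizer` loaded as shown in the next section (the prompt and sampling settings here are only illustrative):

```python
text = generate_with_repetition_penalty(
    model, tokenizer, "The Sun is", max_tokens=60, temperature=0.7, penalty=1.2
)
print(text)
```
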
### Loading Model with Safetensors

```python
from safetensors.torch import load_file
from modeling_cosmicfish import CosmicFish, CosmicConfig
import json
import os

def load_cosmicfish_model(model_path):
    # Load config
    with open(os.path.join(model_path, "config.json")) as f:
        config_dict = json.load(f)

    # Create model config
    config = CosmicConfig(
        vocab_size=config_dict["vocab_size"],
        block_size=config_dict["block_size"],
        n_layer=config_dict["n_layer"],
        n_head=config_dict["n_head"],
        n_embd=config_dict["n_embd"],
        bias=config_dict["bias"],
        dropout=0.0,
        use_rotary=config_dict["use_rotary"],
        use_swiglu=config_dict["use_swiglu"],
        use_gqa=config_dict["use_gqa"],
        n_query_groups=config_dict["n_query_groups"]
    )

    # Create model
    model = CosmicFish(config)

    # Load weights from safetensors (secure format)
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))

    # Handle weight sharing (lm_head.weight shares with transformer.wte.weight)
    if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
        state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']

    model.load_state_dict(state_dict)
    model.eval()

    return model
```

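Putting the pieces together, loading then follows the download example above:

```python
from huggingface_hub import snapshot_download
from transformers import GPT2Tokenizer

# Download the repository, build the model, and load the GPT-2 tokenizer
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
model = load_cosmicfish_model(cache_dir)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
```
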
### Chat Interface

```python
def chat_with_model():
    # Requires `model` and `tokenizer` to be loaded (see the sections above)
    conversation = []

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break

        # Rebuild the prompt from the full conversation history
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"

        # Generate a response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )

        # Keep only the assistant's latest reply
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")

        conversation.append((user_input, response))

chat_with_model()
```

## Architecture

CosmicFish uses several modern improvements over the standard transformer:

- **RoPE (Rotary Position Embeddings)**: better position encoding than learned absolute positions
- **GQA (Grouped-Query Attention)**: reduces memory usage by sharing key/value heads across 4 query groups
- **SwiGLU**: a more effective feed-forward activation than ReLU/GELU
- **RMSNorm**: simpler, more stable normalization than LayerNorm (see the sketch below)

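As a rough illustration of one of these components, here is a generic RMSNorm layer (a common formulation; the exact implementation in this repository's model code may differ):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Generic RMSNorm: rescale features by their root-mean-square, with a
    learned gain and no mean subtraction or bias term (unlike LayerNorm)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```
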
## Training

- **Dataset**: CosmicSet 2.0 mini
- **Sequence Length**: 512 tokens
- **Training Steps**: ~200K iterations
- **Hardware**: 1x NVIDIA A40

## Performance

- **Speed**: varies by hardware (not benchmarked)
- **Memory**: ~256MB RAM
- **File Size**: 185MB
- **Loading**: fast and secure with safetensors

## Limitations

- Small model size (90M parameters) may produce less accurate responses
- 512-token context limit
- English only
- Training data cutoff applies
- May generate incorrect information
- Cannot browse the internet or access real-time data

## License

Apache 2.0 - see the LICENSE file.

## Credit

If you use CosmicFish-90M, please credit Mistyoz AI.