---
license: apache-2.0
tags:
- text-generation
- language-model
- causal-lm
- cosmicfish
- 90m
- transformer
- rope
- gqa
- swiglu
- rmsnorm
language: en
datasets:
- CosmicSet-2.0-mini
- akkiisfrommars/TreeCorpusCleanedmodel
model_type: CosmicFish
pipeline_tag: text-generation
---

# CosmicFish-90M

A 90M-parameter language model with modern architectural improvements, developed by Mistyoz AI.

## Quick Start

**The easiest way to chat with CosmicFish is using our chat.py script:**

```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py

# Install dependencies (torch is needed to run the model)
pip install transformers huggingface-hub termcolor safetensors torch

# Run the chat interface (automatically downloads the model)
python chat.py
```

The `chat.py` script handles model loading and generation, and provides the best chat experience: live streaming, a repetition penalty, and conversation commands.

## Model Details

- **Parameters**: 91.6M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens
- **Training Data**: CosmicSet 2.0 mini
- **Developer**: Mistyoz AI
- **Repository**: MistyozAI/CosmicFish-90M
- **Format**: Safetensors

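These values can be cross-checked against the repository's `config.json`. A quick, illustrative way to inspect it:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch only the config file and print the architecture hyperparameters
config_path = hf_hub_download(repo_id="MistyozAI/CosmicFish-90M", filename="config.json")
with open(config_path) as f:
    print(json.dumps(json.load(f), indent=2))
```
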
## Usage

### Installation

```bash
pip install transformers huggingface-hub termcolor safetensors torch
```

### Downloading the Model

```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
import torch
import json
import os

# Download the model repository from the Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# Load the tokenizer (CosmicFish uses the GPT-2 vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load the model weights from safetensors
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))

# Note: the full model class is available in this repository;
# see "Loading Model with Safetensors" below
print("Model downloaded and ready for use!")
```

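Since `snapshot_download` fetches the whole repository, the model class file should already be present in `cache_dir`. A minimal sketch for importing it from there (this assumes the file is named `modeling_cosmicfish.py` and sits at the top level of the repo, as the loading example below implies):

```python
import sys

# Make the downloaded snapshot importable (assumption: modeling_cosmicfish.py
# lives at the top level of the repository snapshot)
sys.path.insert(0, cache_dir)
from modeling_cosmicfish import CosmicFish, CosmicConfig
```
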
### Advanced Generation with Repetition Penalty

```python
import torch

def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()

    for _ in range(max_tokens):
        with torch.no_grad():
            # Feed only the most recent tokens so the input stays within
            # the model's 512-token context window
            logits, _ = model(generated[:, -512:])

        next_token_logits = logits[:, -1, :] / temperature

        # Apply repetition penalty: dampen the logits of tokens already generated
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty

        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        if next_token.item() == tokenizer.eos_token_id:
            break

        generated = torch.cat([generated, next_token], dim=1)

    return tokenizer.decode(generated[0], skip_special_tokens=True)
```

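For example, with `model` and `tokenizer` loaded as shown in the next section (the prompt and sampling settings here are only illustrative):

```python
text = generate_with_repetition_penalty(
    model, tokenizer, "The Sun is", max_tokens=60, temperature=0.7, penalty=1.2
)
print(text)
```
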
### Loading Model with Safetensors

```python
from safetensors.torch import load_file
from modeling_cosmicfish import CosmicFish, CosmicConfig
import json
import os

def load_cosmicfish_model(model_path):
    # Load config
    with open(os.path.join(model_path, "config.json")) as f:
        config_dict = json.load(f)

    # Create model config
    config = CosmicConfig(
        vocab_size=config_dict["vocab_size"],
        block_size=config_dict["block_size"],
        n_layer=config_dict["n_layer"],
        n_head=config_dict["n_head"],
        n_embd=config_dict["n_embd"],
        bias=config_dict["bias"],
        dropout=0.0,
        use_rotary=config_dict["use_rotary"],
        use_swiglu=config_dict["use_swiglu"],
        use_gqa=config_dict["use_gqa"],
        n_query_groups=config_dict["n_query_groups"]
    )

    # Create model
    model = CosmicFish(config)

    # Load weights from safetensors (secure format)
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))

    # Handle weight sharing (lm_head.weight shares with transformer.wte.weight)
    if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
        state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']

    model.load_state_dict(state_dict)
    model.eval()

    return model
```

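Putting the pieces together, loading then follows the download example above:

```python
from huggingface_hub import snapshot_download
from transformers import GPT2Tokenizer

# Download the repository, build the model, and load the GPT-2 tokenizer
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
model = load_cosmicfish_model(cache_dir)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
```
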
### Chat Interface

```python
def chat_with_model():
    # Requires `model` and `tokenizer` to be loaded (see the sections above)
    conversation = []

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break

        # Rebuild the prompt from the full conversation history
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"

        # Generate a response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )

        # Keep only the assistant's latest reply
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")

        conversation.append((user_input, response))

chat_with_model()
```

## Architecture

CosmicFish uses several modern improvements over the standard transformer:

- **RoPE (Rotary Position Embeddings)**: better position encoding than learned absolute positions
- **GQA (Grouped-Query Attention)**: reduces memory usage by sharing key/value heads across 4 query groups
- **SwiGLU**: a more effective feed-forward activation than ReLU/GELU
- **RMSNorm**: simpler, more stable normalization than LayerNorm (see the sketch below)

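As a rough illustration of one of these components, here is a generic RMSNorm layer (a common formulation; the exact implementation in this repository's model code may differ):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Generic RMSNorm: rescale features by their root-mean-square, with a
    learned gain and no mean subtraction or bias term (unlike LayerNorm)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```
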
## Training

- **Dataset**: CosmicSet 2.0 mini
- **Sequence Length**: 512 tokens
- **Training Steps**: ~200K iterations
- **Hardware**: 1x NVIDIA A40

## Performance

- **Speed**: varies by hardware (not benchmarked)
- **Memory**: ~256MB RAM
- **File Size**: 185MB
- **Loading**: fast and secure with safetensors

## Limitations

- Small model size (90M parameters) may produce less accurate responses
- 512-token context limit
- English only
- Training data cutoff applies
- May generate incorrect information
- Cannot browse the internet or access real-time data

## License

Apache 2.0 - see the LICENSE file.

## Credit

If you use CosmicFish-90M, please credit Mistyoz AI.