akkiisfrommars committed on
Commit
8d4d0f0
·
verified ·
1 Parent(s): 98922ff

Update README.md

  1. README.md +181 -3
README.md CHANGED
@@ -1,3 +1,181 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - text-generation
+ - language-model
+ - LLM
+ - CosmicFish
+ - 90M
+ - transformer
+ language: en
+ datasets:
+ - CosmicSet-1.0
+ - akkiisfrommars/TreeCorpusCleanedmodel
+ model_type: CosmicFish
+ ---
+
+ # CosmicFish-90M
+
+ A 90M-parameter language model from Mistyoz AI that uses modern architecture improvements (RoPE, GQA, SwiGLU, RMSNorm).
+
+ ## Quick Start
+
+ **The easiest way to chat with CosmicFish is the `chat.py` script from this repository:**
+
+ ```bash
+ # Download the chat script from this repository
+ wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py
+
+ # Install dependencies
+ pip install transformers huggingface-hub termcolor
+
+ # Run the chat interface (automatically downloads model)
+ python chat.py
+ ```
+
+ The `chat.py` script handles model loading and generation, and adds live streaming, a repetition penalty, and conversation commands.
+
+ ## Model Details
+
+ - **Parameters**: 91.6M
+ - **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
+ - **Context Length**: 512 tokens
+ - **Vocabulary**: 50,257 tokens
+ - **Training Data**: CosmicSet 1.0
+ - **Developer**: Mistyoz AI
+ - **Repository**: MistyozAI/CosmicFish-90M
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ # torch is imported by the Python snippets below
+ pip install torch transformers huggingface-hub termcolor
+ ```
+
+ ### Quick Chat Interface
+
+ ```python
+ from transformers import GPT2Tokenizer
+ from huggingface_hub import snapshot_download
+ import torch
+ import json
+ import os
+
+ # Download model from Hugging Face Hub
+ cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")
+
+ # Load tokenizer
+ tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+
+ # Load config
+ with open(os.path.join(cache_dir, "config.json")) as f:
+     config_dict = json.load(f)
+
+ # Load model weights
+ state_dict = torch.load(os.path.join(cache_dir, "pytorch_model.bin"), map_location="cpu")
+
+ # Note: this loads the weights only; the full model class is available in the repository
+ print("Model files downloaded and loaded.")
+ ```
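As a quick sanity check on a downloaded checkpoint, the parameter count can be recomputed from tensor shapes and compared with the 91.6M figure listed under Model Details. The following is a minimal sketch in plain Python; `count_parameters` and the toy shapes are hypothetical names and values for illustration, not CosmicFish's real layer names or sizes.

```python
from math import prod


def count_parameters(shapes):
    """Sum the element counts of every tensor shape in the mapping."""
    return sum(prod(shape) for shape in shapes.values())


# Hypothetical toy shapes, NOT the real CosmicFish layers.
toy_shapes = {
    "embed.weight": (100, 8),
    "block.0.norm.weight": (8,),
    "block.0.mlp.weight": (8, 32),
}

total = count_parameters(toy_shapes)
print(f"{total:,} parameters ({total / 1e6:.4f}M)")
```

With the `state_dict` loaded above, the same function could be fed `{name: tuple(t.shape) for name, t in state_dict.items()}`. Tensors shared between layers (for example tied embeddings) may be counted differently depending on how the checkpoint was saved.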
+
+ ### Advanced Generation with Repetition Penalty
+
+ ```python
+ import torch
+
+ def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.7, penalty=1.2, context_length=512):
+     # Encode the prompt as a batch containing one sequence
+     input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
+     generated = input_ids.clone()
+
+     for _ in range(max_tokens):
+         with torch.no_grad():
+             # Feed at most the last `context_length` tokens (the model's 512-token limit)
+             logits, _ = model(generated[:, -context_length:])
+
+         next_token_logits = logits[:, -1, :] / temperature
+
+         # Apply repetition penalty to every token already in the sequence
+         if penalty > 1.0:
+             for token_id in set(generated[0].tolist()):
+                 # Dividing a negative logit would raise it, so multiply instead
+                 if next_token_logits[0, token_id] > 0:
+                     next_token_logits[0, token_id] /= penalty
+                 else:
+                     next_token_logits[0, token_id] *= penalty
+
+         probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
+         next_token = torch.multinomial(probs, num_samples=1)
+
+         # Stop at the end-of-sequence token
+         if next_token.item() == tokenizer.eos_token_id:
+             break
+
+         generated = torch.cat([generated, next_token], dim=1)
+
+     # Returns the prompt followed by the generated continuation
+     return tokenizer.decode(generated[0], skip_special_tokens=True)
+ ```
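To make the penalty step easier to inspect in isolation, here is a small sketch of the same rule on a plain Python list of logits, with no model or torch involved. The helper names `apply_repetition_penalty` and `softmax` and the example numbers are assumptions made for illustration.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty   # shrink positive logits
        else:
            out[token_id] *= penalty   # push non-positive logits further down
    return out


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Tokens 0 and 3 start with the same logit; only token 0 has been seen.
logits = [2.0, -1.0, 0.5, 2.0]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1, 1], penalty=2.0)
probs = softmax(penalized, temperature=0.7)
print(penalized)
print([round(p, 3) for p in probs])
```

Because both the temperature and the penalty are positive scalings, applying the penalty before or after dividing by the temperature gives the same logits. The sign check matters: dividing a negative logit by the penalty would make a repeated token more likely, not less.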
+
+ ### Chat Interface
+
+ ```python
+ # Assumes `model` and `tokenizer` are already loaded and that
+ # generate_with_repetition_penalty (defined above) is in scope
+ def chat_with_model():
+     conversation = []
+
+     while True:
+         user_input = input("You: ")
+         if user_input.lower() in ['quit', 'exit']:
+             break
+
+         context = "Below is a conversation between a human and an AI assistant.\n\n"
+         for human, ai in conversation:
+             context += f"Human: {human}\nAssistant: {ai}\n\n"
+         context += f"Human: {user_input}\nAssistant:"
+
+         # Generate response with repetition penalty
+         response = generate_with_repetition_penalty(
+             model, tokenizer, context,
+             max_tokens=150, temperature=0.7, penalty=1.2
+         )
+
+         # Extract just the assistant's reply to this turn: take the text after the
+         # prompt's final "Assistant:" marker, even if the model invents extra turns
+         turn = context.count("Assistant:")
+         response = response.split("Assistant:")[turn].split('\n')[0].strip()
+         print(f"CosmicFish: {response}")
+
+         conversation.append((user_input, response))
+
+ chat_with_model()
+ ```
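The chat loop above keeps appending turns, so a long conversation will eventually exceed the 512-token context. One possible way to handle this, sketched below, is to drop the oldest turns until the prompt fits a token budget. `build_prompt`, `whitespace_count`, and the budget values are hypothetical; the whitespace counter is only a stand-in for a real tokenizer.

```python
def build_prompt(history, user_input, count_tokens, max_prompt_tokens=512):
    """Build the chat prompt, dropping the oldest turns until it fits the budget."""
    header = "Below is a conversation between a human and an AI assistant.\n\n"
    tail = f"Human: {user_input}\nAssistant:"
    turns = [f"Human: {human}\nAssistant: {ai}\n\n" for human, ai in history]

    # Remove turns from the front (oldest first) while the prompt is too long.
    while turns and count_tokens(header + "".join(turns) + tail) > max_prompt_tokens:
        turns.pop(0)

    return header + "".join(turns) + tail


def whitespace_count(text):
    """Stand-in token counter for this sketch: counts whitespace-separated words."""
    return len(text.split())


history = [("hi", "hello"), ("how are you", "fine")]
full = build_prompt(history, "bye", whitespace_count, max_prompt_tokens=512)
trimmed = build_prompt(history, "bye", whitespace_count, max_prompt_tokens=21)
print(trimmed)
```

With the GPT-2 tokenizer used above, `count_tokens` could be `lambda text: len(tokenizer.encode(text))`, and the budget would need to leave room for the tokens to be generated (for example 512 minus `max_tokens`).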
+
+ ## Architecture
+
+ CosmicFish uses several modern improvements over the standard transformer:
+
+ - **RoPE (Rotary Position Embeddings)**: Better position encoding than absolute positions; it rotates query and key vectors so that attention scores depend on relative position
+ - **GQA (Grouped-Query Attention)**: Reduces memory usage by sharing key/value heads across query heads, with 4 query groups
+ - **SwiGLU**: Gated feed-forward activation, more effective than ReLU/GELU
+ - **RMSNorm**: Simpler, more stable normalization than LayerNorm; it rescales by the root mean square without subtracting the mean
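The four components listed above can each be illustrated in a few lines. The sketch below uses plain Python lists instead of tensors and is not taken from the CosmicFish source; every function name, the 8-query-head count, and the `theta` value are assumptions chosen for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square; no mean subtraction."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) multiplied elementwise with the 'up' projection."""
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for_query_head(q_head, n_query_heads, n_kv_heads):
    """GQA: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return q_head // group_size


def rope_rotate_pair(x0, x1, position, theta):
    """RoPE: rotate one (x0, x1) feature pair by an angle proportional to position."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
# 8 query heads is a hypothetical count; 4 groups comes from the list above.
mapping = [kv_head_for_query_head(q, 8, 4) for q in range(8)]
q, k = (1.0, 2.0), (3.0, 4.0)
score_a = dot(rope_rotate_pair(*q, 5, 0.5), rope_rotate_pair(*k, 3, 0.5))
score_b = dot(rope_rotate_pair(*q, 9, 0.5), rope_rotate_pair(*k, 7, 0.5))
print(normed, mapping, score_a, score_b)
```

The last two values illustrate the relative-position property of RoPE: shifting both positions by the same offset leaves the query-key score unchanged.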
+
+ ## Training
+
+ - **Dataset**: CosmicSet 1.0
+ - **Sequence Length**: 512 tokens
+ - **Training Steps**: ~300K iterations
+ - **Hardware**: 1× NVIDIA A40
+
+ ## Performance
+
+ - **Speed**: Varies by hardware (not benchmarked)
+ - **Memory**: ~500 MB RAM (FP16)
+ - **File Size**: 243 MB
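For context on the memory figure, the space taken by the weights alone follows from the parameter count times the bytes per parameter. This is a back-of-the-envelope sketch; `weight_megabytes` is a hypothetical helper, and it deliberately ignores activations, caches, and framework overhead, so actual runtime memory will be higher than these numbers.

```python
PARAMS = 91_600_000  # "Parameters: 91.6M" from Model Details


def weight_megabytes(n_params, bytes_per_param):
    """Memory for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


fp16_mb = weight_megabytes(PARAMS, 2)  # 2 bytes per FP16 value
fp32_mb = weight_megabytes(PARAMS, 4)  # 4 bytes per FP32 value
print(f"FP16 weights: {fp16_mb:.1f} MB, FP32 weights: {fp32_mb:.1f} MB")
```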
+
+ ## Limitations
+
+ - Small model size (90M parameters) may produce less accurate responses
+ - 512-token context limit
+ - Knowledge is limited to the training data (training data cutoff applies)
+ - May generate incorrect information
+ - Cannot browse the internet or access real-time data
+
+ ## License
+
+ Apache 2.0. See the LICENSE file.
+
+ ## Credit
+
+ If you use CosmicFish-90M, please credit Mistyoz AI.