Update README.md
README.md
@@ -54,36 +54,42 @@ The pruned 15 B student is distilled from a calibrated 24 B teacher using a
### Results

Up to 40 % parameter reduction (24 B → 15 B) delivers 2× lower time‑to‑first‑token (TTFT) and ≈ 40 % higher tokens/s than the uncompressed teacher, while matching its perplexity and divergence metrics. This validates SimplePrune as an effective route to deploying KafkaLM in memory‑constrained, sparsity‑accelerated environments.

| Metric | Mistral‑24B | **KafkaLM‑15B** | Δ |
|--------|-------------|-----------------|---|
| Time‑to‑First‑Token | 4.91 s | **2.46 s** | −50% |
| Prompts / s | 4.70 | **6.55** | +38% |
| Tokens / s | 579 | **812** | +40% |

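For reference, the Δ column follows directly from the two raw columns. The sketch below uses only the rounded figures printed in the table, so small deviations from the reported Δ values are expected.

```python
# Sanity-check of the results table's Δ column, using only the rounded figures above.
teacher = {"ttft_s": 4.91, "prompts_per_s": 4.70, "tokens_per_s": 579}  # Mistral-24B
student = {"ttft_s": 2.46, "prompts_per_s": 6.55, "tokens_per_s": 812}  # KafkaLM-15B

ttft_speedup = teacher["ttft_s"] / student["ttft_s"]                    # ≈ 2.0× lower TTFT
prompts_gain = student["prompts_per_s"] / teacher["prompts_per_s"] - 1  # ≈ +39 %
tokens_gain = student["tokens_per_s"] / teacher["tokens_per_s"] - 1     # ≈ +40 %

print(f"TTFT: {ttft_speedup:.2f}x lower ({student['ttft_s'] / teacher['ttft_s'] - 1:+.0%})")
print(f"Prompts/s: {prompts_gain:+.0%}   Tokens/s: {tokens_gain:+.0%}")
```
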
### Training scalability (distillation run, MI300A cluster)

| Nodes | Tokens / s | Speed‑up |
|-------|------------|----------|
| 4 | 1 461 | – |
| 8 | 3 327 | 2.3 × |
| 16 | 7 423 | 5.1 × |
| 32 | 15 286 | 10.5 × |
| 64 | 25 455 | 17.4 × |

Near‑linear scaling thanks to sharded ZeRO‑3 + RCCL optimisations.
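
The card does not include the training launch configuration. As a rough, illustrative sketch only: if the ZeRO‑3 sharding mentioned above was driven by DeepSpeed (the usual home of ZeRO‑3), it would typically be expressed in a config of this shape. Every value below is a placeholder rather than a setting from this run, and `ds_config` is a hypothetical name. On ROCm systems such as MI300A, the collectives behind such a config go through RCCL via PyTorch's `nccl` process-group backend, so RCCL needs no keys of its own here.

```python
# Illustrative DeepSpeed ZeRO-3 config (placeholder values, NOT the settings of this run).
# It would be passed as `config=ds_config` to deepspeed.initialize(); on ROCm the "nccl"
# process group that DeepSpeed uses underneath maps to RCCL.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                                # shard params, gradients, and optimizer state
        "overlap_comm": True,                      # overlap all-gather/reduce-scatter with compute
        "contiguous_gradients": True,
        "reduce_bucket_size": 5.0e8,
        "stage3_prefetch_bucket_size": 5.0e8,
        "stage3_param_persistence_threshold": 1.0e6,
    },
}
```

Because stage 3 shards parameters, gradients, and optimizer states across all ranks, per-device memory stays roughly flat as nodes are added, which is what makes the near-linear tokens/s scaling above possible.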

## Citation

```bibtex
@misc{deepcoder2025,
  title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
  author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
  howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
  note={Notion Blog},
  year={2025}
}
```

```bibtex
@misc{kafkalm2025,
  title={Evaluating AMD's MI300A APU: Performance Insights on LLM Training via Knowledge Distillation},
  author={Dennis Dickmann and Philipp Offenhäuser and Rishabh Saxena and George S. Markomanolis and Alessandro Rigazzi and Patrick Keller and Dennis Hoppe},
  howpublished={Cray User Group Conference, 2025},
  note={to be published},
  year={2025}
}
```