doubledsbv committed (verified)
Commit ce00daa · Parent: 6865a82

Update README.md

Files changed (1): README.md (+26 -20)

README.md (updated section below):
 
### Results

Up to 40 % parameter reduction (24 B → 15 B) halves time‑to‑first‑token (TTFT) and delivers ≈ 40 % higher tokens/s than the uncompressed teacher while matching its perplexity and divergence metrics, validating SimplePrune as an effective route to deploying KafkaLM in memory‑constrained, sparsity‑accelerated environments.

| Metric | Mistral‑24B | **KafkaLM‑15B** | Δ |
|--------|-------------|-----------------|---|
| Time‑to‑First‑Token | 4.91 s | **2.46 s** | −50% |
| Prompts / s | 4.70 | **6.55** | +38% |
| Tokens / s | 579 | **812** | +40% |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645ded34a45b4182d7f5c385/4rDhaeC-1GMj6KWbB27f9.png)
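
Time‑to‑first‑token, prompts/s and tokens/s are wall‑clock quantities. The card does not describe its benchmark harness, so the following is only a minimal sketch of how such metrics are typically collected, with a hypothetical `stream_generate` callable standing in for the actual serving stack:

```python
import time
from typing import Callable, Iterable

def measure(prompt: str, stream_generate: Callable[[str], Iterable[str]]) -> dict:
    """Collect TTFT, tokens/s and prompts/s for one streamed generation.

    `stream_generate` is a hypothetical stand-in for whatever serving stack
    produced the numbers above; it must yield tokens as they are generated.
    """
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    for _ in stream_generate(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # time-to-first-token
        n_tokens += 1
    total = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "tokens_per_s": n_tokens / total,
        "prompts_per_s": 1.0 / total,  # single prompt; a batched run divides batch size by wall time
    }

if __name__ == "__main__":
    # Toy streamer standing in for a real endpoint: 20 tokens, ~5 ms apart.
    def toy_stream(prompt):
        for i in range(20):
            time.sleep(0.005)
            yield f"tok{i}"

    print(measure("Hello KafkaLM", toy_stream))
```
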
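
"Matching perplexity and divergence metrics" refers to the student's language‑modelling loss and its divergence from the teacher's output distribution. A small PyTorch sketch of both computations, using random tensors as stand‑ins for real teacher/student logits (the card does not say which divergence it tracks; KL(teacher ∥ student) is assumed here):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len = 32_000, 128

# Random stand-ins: in practice these come from running both models on the same tokens.
teacher_logits = torch.randn(seq_len, vocab_size)
student_logits = teacher_logits + 0.05 * torch.randn(seq_len, vocab_size)  # near-identical student
labels = torch.randint(vocab_size, (seq_len,))

# Perplexity of the student on the reference tokens.
perplexity = torch.exp(F.cross_entropy(student_logits, labels))

# Mean KL(teacher || student) per position, computed from log-probabilities.
kl = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.log_softmax(teacher_logits, dim=-1),
    log_target=True,
    reduction="batchmean",
)
print(f"perplexity: {perplexity.item():.1f}  KL(teacher||student): {kl.item():.4f} nats")
```
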
 
### Training scalability (distillation run, MI300A cluster)

| Nodes | Tokens / s | Speed‑up |
|-------|------------|----------|
| 4 | 1 461 | – |
| 8 | 3 327 | 2.3 × |
| 16 | 7 423 | 5.1 × |
| 32 | 15 286 | 10.5 × |
| 64 | 25 455 | 17.4 × |

Near‑linear scaling up to 64 nodes, thanks to sharded ZeRO‑3 training and RCCL communication optimisations.
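
The Speed‑up column is simply throughput relative to the 4‑node baseline; re‑deriving it, together with per‑node throughput, from the figures above:

```python
# Tokens/s per node count, copied from the scalability table above.
tokens_per_s = {4: 1461, 8: 3327, 16: 7423, 32: 15286, 64: 25455}

baseline = tokens_per_s[4]
for nodes, throughput in tokens_per_s.items():
    speedup = throughput / baseline   # matches the Speed-up column (2.3x, 5.1x, 10.5x, 17.4x)
    per_node = throughput / nodes     # throughput contributed by each MI300A node
    print(f"{nodes:2d} nodes: {speedup:4.1f}x speed-up, {per_node:5.0f} tokens/s per node")
```
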
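
The card attributes the scaling to "sharded ZeRO‑3 + RCCL" without naming the training framework. ZeRO‑3 sharding is most commonly driven through DeepSpeed, so the sketch below assumes DeepSpeed and uses placeholder values rather than the run's actual settings; on ROCm the `nccl` backend of `torch.distributed` is provided by RCCL, so the communication layer needs no special configuration.

```python
import torch
import deepspeed

# Minimal ZeRO-3 sketch: DeepSpeed is an assumption (the card does not name the
# framework) and every value below is a placeholder, not the run's setting.
# Launch with the `deepspeed` launcher; on ROCm the "nccl" backend maps to RCCL.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,               # shard parameters, gradients and optimizer state
        "overlap_comm": True,     # overlap collectives with compute
        "contiguous_gradients": True,
    },
}

model = torch.nn.Linear(4096, 4096)  # tiny stand-in for the 15 B student
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```
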
 
## Citation

@misc{kafkalm2025,
  title={Evaluating AMD's MI300A APU: Performance Insights on LLM Training via Knowledge Distillation},
  author={Dennis Dickmann and Philipp Offenhäuser and Rishabh Saxena and George S. Markomanolis and Alessandro Rigazzi and Patrick Keller and Dennis Hoppe},
  howpublished={Cray User Group Conference, 2025},
  note={to be published},
  year={2025}
}