rpand002 committed
Commit e927733 · verified · 1 Parent(s): 688b292

Update README.md

Files changed (1)
  1. README.md +3 -7

README.md CHANGED
@@ -10,13 +10,12 @@ tags:
 
 **Model Summary:**
 
-Granite-4.0-Tiny-Base-Preview is a 7B-parameter decoder-only language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with a softmax mechanism for enhanced expressiveness, and utilizes the NoPE for positional information encoding for better length generalization.
+Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, combined with softmax attention for enhanced expressiveness, and uses no positional encoding for better length generalization.
 
 
 
 
 - **Developers:** Granite Team, IBM
-- **GitHub Repository:** [ibm-granite/granite-4.0-language-models](https://github.com/ibm-granite/granite-4.0-language-models)
 - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
 - **Release Date**: May 2nd, 2025
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
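
A minimal generation sketch for the base model described in the summary above. The repo id and runtime details are assumptions rather than facts stated in this diff: it presumes the checkpoint is published as `ibm-granite/granite-4.0-tiny-base-preview` and that a sufficiently recent `transformers` release supports the Granite 4.0 hybrid architecture.

```python
# Sketch only: the model id and transformers support for the hybrid architecture are assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-4.0-tiny-base-preview"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # dtype not pinned by the card; bf16 is a common choice
    device_map="auto",           # requires the `accelerate` package
)
model.eval()

# Base (non-instruct) model: phrase the prompt as text to be continued, not as chat turns.
prompt = "The Mamba-2 architecture differs from standard attention in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```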
@@ -163,7 +162,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
-<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny</th>
+<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny-Base-Preview</th>
 </tr></thead>
 <tbody>
 <tr>
@@ -224,7 +223,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
-<td style="text-align:center; background-color: #DAE8FF; color: black;">NoPE</td>
+<td style="text-align:center; background-color: #DAE8FF; color: black;">None</td>
 </tr>
 <tr>
 <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
@@ -247,14 +246,11 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 </tbody></table>
 
 **Training Data:**
-<todo>Need to check if this is correct</todo>
 This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
 * Stage 1 data: The data for stage 1 is sourced from diverse domains such as web, code, academic sources, books, and math data.
 * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
-<!-- A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 4.0 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
-
 **Infrastructure:**
 We train Granite 4.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
 
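On the "Position embedding" row changed above (NoPE → None): the summary describes attention with no positional encoding, so query–key scores depend on token content only, whereas RoPE rotates queries and keys by their positions. Below is a toy sketch of that difference; it is purely illustrative and not the model's actual implementation.

```python
# Illustrative toy only: contrasts content-only (NoPE) attention logits with RoPE,
# where logits also depend on relative position. Not the Granite implementation.
import torch

def apply_rope(x, base=10000.0):
    """Rotate (seq_len, n_heads, head_dim) vectors by position (GPT-NeoX-style RoPE)."""
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)           # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None]  # (seq, half)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention_logits(q, k, use_rope):
    """Pre-softmax attention scores; use_rope=False corresponds to NoPE."""
    if use_rope:
        q, k = apply_rope(q), apply_rope(k)
    return torch.einsum("qhd,khd->hqk", q, k) / q.shape[-1] ** 0.5

torch.manual_seed(0)
k = torch.randn(4, 2, 8)          # (seq_len, heads, head_dim)
k[3] = k[1]                       # keys at positions 1 and 3 carry identical content
q = torch.randn(1, 2, 8)          # a single query

nope = attention_logits(q, k, use_rope=False)
rope = attention_logits(q, k, use_rope=True)
# NoPE cannot tell the two identical keys apart; RoPE (almost surely) can.
print(torch.allclose(nope[..., 1], nope[..., 3]))   # True
print(torch.allclose(rope[..., 1], rope[..., 3]))   # False
```

Dropping explicit positions from the attention layers is what the table's "None" entry refers to; the summary ties this choice to better generalization beyond the training context length.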