Update README.md
README.md
CHANGED
@@ -10,13 +10,12 @@ tags:
 
 **Model Summary:**
 
-Granite-4.0-Tiny-Base-Preview is a 7B-parameter
+Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with softmax attention for enhanced expressiveness, and uses no positional encoding for better length generalization.
 
 
 
 
 - **Developers:** Granite Team, IBM
-- **GitHub Repository:** [ibm-granite/granite-4.0-language-models](https://github.com/ibm-granite/granite-4.0-language-models)
 - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
 - **Release Date**: May 2nd, 2025
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
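A minimal completion sketch for the updated summary, assuming the checkpoint is published as `ibm-granite/granite-4.0-tiny-base-preview` and loads through the standard `transformers` causal-LM classes:

```python
# Illustrative sketch: plain next-token completion with the base (non-instruct) model.
# The repo id below is an assumption; substitute the actual Hugging Face path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-4.0-tiny-base-preview"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",           # let accelerate place the hybrid Mamba-2/attention blocks
    torch_dtype=torch.bfloat16,  # bf16 halves memory relative to fp32 for the 7B MoE weights
)
model.eval()

prompt = "The two-stage training strategy of Granite 4.0 consists of"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because this is a base model, prompts are treated as plain text to be continued rather than as chat turns.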
@@ -163,7 +162,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
-<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny</th>
+<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny-Base-Preview</th>
 </tr></thead>
 <tbody>
 <tr>
@@ -224,7 +223,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
-<td style="text-align:center; background-color: #DAE8FF; color: black;">
+<td style="text-align:center; background-color: #DAE8FF; color: black;">None</td>
 </tr>
 <tr>
 <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
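The new `None` entry matches the no-positional-encoding (NoPE) setup described in the summary: attention receives queries and keys without a rotary or absolute position signal, which the hybrid design can afford because the Mamba-2 layers process tokens sequentially. A small PyTorch sketch (illustration only, not the model's implementation) contrasting a RoPE rotation with the position-free case:

```python
# Illustration only: how RoPE injects position into attention inputs, versus NoPE (a no-op).
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, head_dim) by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # per-pair frequencies
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def apply_nope(x: torch.Tensor) -> torch.Tensor:
    """NoPE: queries/keys pass through unchanged; no position signal is added."""
    return x

q = torch.randn(8, 64)                            # (seq_len, head_dim)
print(apply_rope(q).shape, apply_nope(q).shape)   # shapes stay (8, 64); only RoPE encodes order
```

Dropping the rotation removes any fixed, trained notion of position from attention, which is the intuition behind the length-generalization claim in the summary.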
@@ -247,14 +246,11 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 </tbody></table>
 
 **Training Data:**
-<todo>Need to check if this is correct</todo>
 This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
 * Stage 1 data: The data for stage 1 is sourced from diverse domains, such as web, code, academic sources, books, and math data.
 * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model's performance on specific tasks.
 contains a recitation of the related paragraph before the answer.
 
-<!-- A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 4.0 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
-
 **Infrastructure:**
 We train Granite 4.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 