Update README.md
README.md
CHANGED
@@ -10,13 +10,12 @@ tags:
 
 **Model Summary:**
 
-Granite-4.0-Tiny-Base-Preview is a 7B-parameter
+Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with softmax attention for enhanced expressiveness, and uses no positional encoding for better length generalization.
 
 
 
 
 - **Developers:** Granite Team, IBM
-- **GitHub Repository:** [ibm-granite/granite-4.0-language-models](https://github.com/ibm-granite/granite-4.0-language-models)
 - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
 - **Release Date**: May 2nd, 2025
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
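A minimal completion sketch for the updated summary, assuming the checkpoint is published as `ibm-granite/granite-4.0-tiny-base-preview` and loads through the standard `transformers` causal-LM classes:

```python
# Illustrative sketch: plain next-token completion with the base (non-instruct) model.
# The repo id below is an assumption; substitute the actual Hugging Face path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-4.0-tiny-base-preview"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",           # let accelerate place the hybrid Mamba-2/attention blocks
    torch_dtype=torch.bfloat16,  # bf16 halves memory relative to fp32 for the 7B MoE weights
)
model.eval()

prompt = "The two-stage training strategy of Granite 4.0 consists of"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because this is a base model, prompts are treated as plain text to be continued rather than as chat turns.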
@@ -163,7 +162,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">2B Dense</th>
 <th style="text-align:center; background-color: #001d6c; color: white;">8B Dense</th>
-<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny</th>
+<th style="text-align:center; background-color: #001d6c; color: white;">Granite-4.0-Tiny-Base-Preview</th>
 </tr></thead>
 <tbody>
 <tr>
@@ -224,7 +223,7 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
 <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
-<td style="text-align:center; background-color: #DAE8FF; color: black;">
+<td style="text-align:center; background-color: #DAE8FF; color: black;">None</td>
 </tr>
 <tr>
 <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
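The new `None` entry matches the no-positional-encoding (NoPE) setup described in the summary: attention receives queries and keys without a rotary or absolute position signal, which the hybrid design can afford because the Mamba-2 layers process tokens sequentially. A small PyTorch sketch (illustration only, not the model's implementation) contrasting a RoPE rotation with the position-free case:

```python
# Illustration only: how RoPE injects position into attention inputs, versus NoPE (a no-op).
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, head_dim) by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # per-pair frequencies
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def apply_nope(x: torch.Tensor) -> torch.Tensor:
    """NoPE: queries/keys pass through unchanged; no position signal is added."""
    return x

q = torch.randn(8, 64)                            # (seq_len, head_dim)
print(apply_rope(q).shape, apply_nope(q).shape)   # shapes stay (8, 64); only RoPE encodes order
```

Dropping the rotation removes any fixed, trained notion of position from attention, which is the intuition behind the length-generalization claim in the summary.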
@@ -247,14 +246,11 @@ Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer archi
 </tbody></table>
 
 **Training Data:**
-<todo>Need to check if this is correct</todo>
 This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
 * Stage 1 data: The data for stage 1 is sourced from diverse domains, such as web, code, academic sources, books, and math data.
 * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model's performance on specific tasks.
 contains a recitation of the related paragraph before the answer.
 
-<!-- A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 4.0 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
-
 **Infrastructure:**
 We train Granite 4.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 