Update README.md
# TituLM-1B-ENBN-V1
TituLM-1B-ENBN-V1 is a large language model trained specifically for generating and understanding English and Bangla text. Built on a decoder-style transformer architecture, it has been trained on a dataset comprising __(will disclose later)__ billion Bangla and English tokens. The model is part of Hishab's iterative train-and-release series of bilingual LLMs.
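
To make the intended usage concrete, here is a minimal text-generation sketch using the Hugging Face `transformers` API. The repository id `hishab/titulm-1b-enbn-v1` is an assumption based on the model name, and `trust_remote_code=True` is included only in case the checkpoint ships a custom (llm-foundry/MPT-style) architecture; neither detail is confirmed by this card.

```python
# Minimal usage sketch -- the repo id below is an assumption, not confirmed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hishab/titulm-1b-enbn-v1"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The model is bilingual, so Bangla or English prompts should both work.
prompt = "Dhaka is the capital of"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```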

The training process was managed using MosaicML's [llm-foundry](https://github.com/mosaicml/llm-foundry) framework. Throughout the training phase, TituLM-1B-ENBN-V1 went through a total of 59 training iterations, allowing for iterative refinement and optimization.

Notable training configs (a config sketch follows the list):

- n_heads: 16
- n_layers: 24
- attn_impl: flash
- Trained on 8 H100 GPUs on GCP
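
For readers who know llm-foundry, the sketch below shows how the settings listed above would typically sit in the `model` block of a training config, written as a Python dict because the actual YAML has not been published. The `mpt_causal_lm` model name and the exact key names are assumptions based on llm-foundry's MPT conventions.

```python
# Illustrative only: an llm-foundry-style model block carrying the settings
# listed above. The real training config has not been released, and the
# "mpt_causal_lm" model name is an assumption.
model_cfg = {
    "name": "mpt_causal_lm",                 # assumed llm-foundry model class
    "n_heads": 16,                           # attention heads, from this card
    "n_layers": 24,                          # transformer blocks, from this card
    "attn_config": {"attn_impl": "flash"},   # FlashAttention, as noted above
}
```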
## Datasets