Update README.md

---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- uonlp/CulturaX
- wikipedia
language:
- en
- bn
pipeline_tag: text-generation
---

# TituLM-1B-ENBN-V1
TituLM-1B-ENBN-V1 is a large language model trained to generate and understand both English and Bangla text. Built on a decoder-style transformer architecture, it has been extensively trained on a dataset comprising __(will disclose later)__ billion Bangla tokens. This model is part of an iterative series of bilingual LLMs trained and released by Hishab.

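Since the card declares `pipeline_tag: text-generation`, the model can be tried with the Hugging Face `transformers` text-generation pipeline. The sketch below makes two assumptions not stated above: the repository id `hishab/titulm-1b-enbn-v1` is a placeholder for the actual repo, and `trust_remote_code=True` is passed because llm-foundry (MPT-style) checkpoints typically ship custom modeling code.

```python
from transformers import pipeline

# Placeholder repo id -- substitute the actual Hugging Face repository for this model.
model_id = "hishab/titulm-1b-enbn-v1"

# trust_remote_code=True is assumed for an llm-foundry (MPT-style) checkpoint that
# ships its own modeling code.
generator = pipeline("text-generation", model=model_id, trust_remote_code=True)

# Works the same way for English and Bangla prompts.
print(generator("Dhaka is the capital of", max_new_tokens=30)[0]["generated_text"])
```
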
## Training

The training process was managed with the framework provided by MosaicML's llm-foundry repository. Throughout the training phase, titulm-1b-enbn-v1 went through a total of 42 iterations, allowing for iterative refinement and optimization. Notable training configs (see the snippet after this list for reading these values back from the released checkpoint):

- n_heads: 16
- n_layers: 24
- max_sequence_length: 2048
- vocab_size: 72000
- attn_impl: flash
- Trained on 8 H100 GPUs on GCP

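The same hyperparameters should be visible in the published checkpoint's configuration. The snippet below is a minimal sketch of how to read them back with `transformers`; the repository id is an assumption, and the attribute names (`n_heads`, `n_layers`, `max_seq_len`, `vocab_size`) assume an MPT-style config as produced by llm-foundry.

```python
from transformers import AutoConfig

# Hypothetical repo id used for illustration -- substitute the actual Hugging Face repository.
model_id = "hishab/titulm-1b-enbn-v1"

# trust_remote_code=True is assumed because llm-foundry (MPT-style) checkpoints
# usually ship their own configuration/modeling code.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# For an MPT-style config these attributes should mirror the values listed above:
# n_heads=16, n_layers=24, max_seq_len=2048, vocab_size=72000.
for field in ("n_heads", "n_layers", "max_seq_len", "vocab_size"):
    print(field, getattr(config, field, "not present in this config"))
```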