Update README.md

---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- uonlp/CulturaX
- wikipedia
language:
- en
- bn
pipeline_tag: text-generation
---

# TituLM-1B-ENBN-V1
TituLM-1B-ENBN-V1 is a large language model trained to generate and understand both English and Bangla text. Built on a decoder-style transformer architecture, it has been extensively trained on a dataset comprising __(will disclose later)__ billion Bangla tokens. This model is part of an iterative series of bilingual LLMs trained and released by Hishab.

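Since the card declares `pipeline_tag: text-generation`, the model can be tried with the Hugging Face `transformers` text-generation pipeline. The sketch below makes two assumptions not stated above: the repository id `hishab/titulm-1b-enbn-v1` is a placeholder for the actual repo, and `trust_remote_code=True` is passed because llm-foundry (MPT-style) checkpoints typically ship custom modeling code.

```python
from transformers import pipeline

# Placeholder repo id -- substitute the actual Hugging Face repository for this model.
model_id = "hishab/titulm-1b-enbn-v1"

# trust_remote_code=True is assumed for an llm-foundry (MPT-style) checkpoint that
# ships its own modeling code.
generator = pipeline("text-generation", model=model_id, trust_remote_code=True)

# Works the same way for English and Bangla prompts.
print(generator("Dhaka is the capital of", max_new_tokens=30)[0]["generated_text"])
```
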
## Training

The training process was managed with the framework provided by MosaicML's llm-foundry repository. Throughout the training phase, titulm-1b-enbn-v1 went through a total of 42 iterations, allowing for iterative refinement and optimization. Notable training configs (see the snippet after this list for reading these values back from the released checkpoint):

- n_heads: 16
- n_layers: 24
- max_sequence_length: 2048
- vocab_size: 72000
- attn_impl: flash
- Trained on 8 H100 GPUs on GCP

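The same hyperparameters should be visible in the published checkpoint's configuration. The snippet below is a minimal sketch of how to read them back with `transformers`; the repository id is an assumption, and the attribute names (`n_heads`, `n_layers`, `max_seq_len`, `vocab_size`) assume an MPT-style config as produced by llm-foundry.

```python
from transformers import AutoConfig

# Hypothetical repo id used for illustration -- substitute the actual Hugging Face repository.
model_id = "hishab/titulm-1b-enbn-v1"

# trust_remote_code=True is assumed because llm-foundry (MPT-style) checkpoints
# usually ship their own configuration/modeling code.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# For an MPT-style config these attributes should mirror the values listed above:
# n_heads=16, n_layers=24, max_seq_len=2048, vocab_size=72000.
for field in ("n_heads", "n_layers", "max_seq_len", "vocab_size"):
    print(field, getattr(config, field, "not present in this config"))
```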