Gemma-3-270M Dhivehi
A compact Dhivehi (ދިވެހި) pretrained model based on google/gemma-3-270m, trained on a large corpus of Dhivehi text including news articles, Wikipedia content, and general web text.
Note: This model is pretrained specifically on Dhivehi text. It provides a strong foundation for further fine-tuning on downstream tasks and can also be used directly for text generation.
Model details
- Base: google/gemma-3-270m
- Language: Dhivehi
- Training data: a large corpus combining multiple Dhivehi datasets:
  - News articles (Dhivehi news corpus)
  - Wikipedia articles
  - General web content (FineWeb-2)
  - Glot500 Dhivehi dataset
- Training method: pretraining with a causal language modeling objective (see the sketch after this list)
- Supported tasks:
  - Text generation in Dhivehi
  - Foundation for fine-tuning on specific tasks
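As a rough sketch of the causal language modeling objective (an illustration only, not the project's actual training code; the example text is just the prompt from the usage section below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: compute the causal LM (next-token prediction) loss
# that pretraining minimizes. Passing labels=input_ids makes the model
# shift labels internally so each position predicts the following token.
tokenizer = AutoTokenizer.from_pretrained("alakxender/gemma-3-270m-dhivehi-pt")
model = AutoModelForCausalLM.from_pretrained("alakxender/gemma-3-270m-dhivehi-pt")

batch = tokenizer("ދިވެހިރާއްޖެއަކީ", return_tensors="pt")
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])
print(out.loss)  # cross-entropy over next-token predictions
```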
Training Data
The model was pretrained on a comprehensive corpus of Dhivehi text data:
- Random Articles: news articles from different sources
- Dhivehi News Corpus: General news content in Dhivehi
- Wikipedia (dv): Dhivehi Wikipedia articles
- FineWeb-2: Filtered web content in Dhivehi
- Glot500 (div-thaa): Multilingual dataset with Dhivehi content
Intended use
- As a foundation model for fine-tuning on specific Dhivehi tasks (see the sketch after this list)
- For general text generation in Dhivehi
- For research and development of Dhivehi language models
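A minimal fine-tuning sketch using the Hugging Face Trainer; the data file, tokenization settings, and hyperparameters below are placeholders, not part of this model card:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Hypothetical dataset: a plain-text file with one Dhivehi example per line.
dataset = load_dataset("text", data_files={"train": "dhivehi_task.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-dv-finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False keeps the causal LM objective (labels = shifted inputs)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```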
How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
)
# Method 1: Direct text generation
prompt = "ދިވެހިރާއްޖެއަކީ"
# Tokenize input
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=True,
)
# Move inputs to the same device as the model (works on CPU or GPU)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
# Generate content
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        # num_return_sequences=1,
        # temperature=0.8,
        # do_sample=True,
        # pad_token_id=tokenizer.eos_token_id,
        # eos_token_id=tokenizer.eos_token_id,
        # repetition_penalty=1.1,
    )
# Decode generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Extract only the newly generated part
generated_only = generated_text[len(prompt):].strip()
print(f"Generated: {generated_only}")
# Sample continuations of the prompt, pretrained vs. base model:
# `gemma-3-270m-dhivehi-pt`: އަޅުގަނޑުމެންގެ ގޮނޑުދޮށްތަކާއި ފަރުބަދަތަކާއި އުތުރު ހިންދުސްތާނުގެ ވަކިވަކި ހިސާބުގައި ދިރިއުޅޭ ދިވެހިންގެ މެދުގައި އޮންނަ ގުދުރަތީ ކަންކަން ދެކިލުމުގެ ފުރުޞަތު އޮތް ޤައުމެއްކަމުގައި ދުވަހަކުވެސް ހިޔެއްނުކުރާނެއެވެ. 33 އަހަރުގެ ވެރިކަމުން މިދެންނެވި
# `google/gemma-3-270m`: ވެސް އޭނާގެ ރާއްޖެއެއްކަމަށް ވެރިކަން ނުވަތަ ޓީމް ނޭޝަނަލްގެ ބައިވެރިވަރމް އޭޝިޔާގެ ހައިސިއްޔަތް ބޯޑިޔަށް ނުހުންދާ ކަމަށް ލިޔުއްވައިގައެވެ. ދިވެހިންނާއި މިއީ އެއާ އޭޝިޔާގެ ބަޔާންކޮށް ނިޒާމް ނެތުމުގައި ދާއިރާއަށް އޭޝިޔާއަށް ދެއްވައިދޭ މަލިވާރުކަމަށް
```
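Method 2: the same generation through the transformers pipeline API. A minimal sketch; the sampling values here are illustrative, not recommendations from the model card:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="alakxender/gemma-3-270m-dhivehi-pt",
    torch_dtype="auto",
    device_map="auto",
)

result = generator(
    "ދިވެހިރާއްޖެއަކީ",
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```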
Generation Parameters
- max_new_tokens: controls the length of the generated continuation (64-512 recommended)
- temperature: controls randomness (0.1-1.0; higher = more creative)
- top_p: nucleus sampling parameter (0.1-1.0)
- top_k: top-k sampling parameter (1-100)
- do_sample: boolean flag to enable/disable sampling
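Putting these together, a sampled generation call might look like this (the values are illustrative; `inputs`, `model`, and `tokenizer` come from the usage snippet above):

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=256,      # length of the continuation
    do_sample=True,          # enable sampling instead of greedy decoding
    temperature=0.8,         # higher = more creative
    top_p=0.9,               # nucleus sampling
    top_k=50,                # top-k sampling
    repetition_penalty=1.1,  # discourage verbatim loops
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```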
Limitations
- Generated content may not always be factually accurate
- Quality depends on the clarity and specificity of input prompts
- The context window limits how much of a very long input the model can use
- The model is specifically pretrained on Dhivehi text and may require fine-tuning for specific tasks
- No instruction-following capabilities (this is a pretrained model, not instruction-tuned)