Gemma-3-270M Dhivehi

Compact Dhivehi (ދިވެހި) pretrained model based on google/gemma-3-270m, trained on a large corpus of Dhivehi text data including news articles, Wikipedia content, and general web text.

Note: This model is pretrained specifically on Dhivehi text and provides a strong foundation for further fine-tuning on specific tasks or for direct use in text generation.

Model details

  • Base: google/gemma-3-270m
  • Language: Dhivehi
  • Parameters: 268M (BF16 safetensors weights)
  • Training data: Large corpus of Dhivehi text including:
    • News articles (Dhivehi news corpus)
    • Wikipedia articles
    • General web content (FineWeb-2)
    • Glot500 Dhivehi dataset
  • Training method: Pretraining with causal language modeling (see the sketch after this list)
  • Training data size: Large corpus combining multiple Dhivehi datasets
  • Supported tasks:
    • Text generation in Dhivehi
    • Foundation for fine-tuning on specific tasks
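
The training method above is the standard causal language-modeling objective (next-token prediction). A minimal sketch of how that objective is evaluated with this checkpoint, using only the model ID and prompt from this card:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# The prompt from the usage example below; any Dhivehi text works here.
text = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(text, return_tensors="pt")

# For causal language modeling the labels are the input ids themselves;
# the model shifts them internally so each position predicts the next token.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Loss: {outputs.loss.item():.3f}  Perplexity: {torch.exp(outputs.loss).item():.1f}")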

Training Data

The model was pretrained on a comprehensive corpus of Dhivehi text data (a loading sketch for the publicly available sources follows the list):

  • Random Articles: News articles from different sources
  • Dhivehi News Corpus: General news content in Dhivehi
  • Wikipedia (dv): Dhivehi Wikipedia articles
  • FineWeb-2: Filtered web content in Dhivehi
  • Glot500 (div-thaa): Multilingual dataset with Dhivehi content
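
Some of the sources above are public Hugging Face datasets. A minimal inspection sketch for the FineWeb-2 Dhivehi subset; the "div_Thaa" config name is an assumption based on FineWeb-2's language_script naming scheme and is not confirmed by this card:

from datasets import load_dataset

# Stream the Dhivehi (Thaana script) subset of FineWeb-2 for inspection.
fineweb_dv = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="div_Thaa",   # assumed config name; check the dataset card if it differs
    split="train",
    streaming=True,
)

for i, doc in enumerate(fineweb_dv):
    print(doc["text"][:200])
    if i == 2:  # look at the first three documents only
        break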

Intended use

  • As a foundation model for fine-tuning on specific Dhivehi tasks (see the sketch after this list)
  • For general text generation in Dhivehi
  • For research and development of Dhivehi language models
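
As a sketch of the fine-tuning path mentioned above, the checkpoint can be trained further with the standard Trainer API. This is a minimal outline, assuming a hypothetical line-per-example text file (train.txt) and untuned hyperparameters:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Hypothetical Dhivehi task data, one example per line; replace with your own.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal LM objective, matching the pretraining setup.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gemma-3-270m-dhivehi-ft",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
    bf16=True,  # use bf16 on supported hardware; drop otherwise
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()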

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Load model
model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
)

# Direct text generation
prompt = "ދިވެހިރާއްޖެއަކީ"

# Tokenize input
inputs = tokenizer(
    prompt, 
    return_tensors="pt", 
    padding=True
)

# Move inputs to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate content
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        # num_return_sequences=1,
        # temperature=0.8,
        # do_sample=True,
        # pad_token_id=tokenizer.eos_token_id,
        # eos_token_id=tokenizer.eos_token_id,
        # repetition_penalty=1.1,
    )

# Decode generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract only the newly generated part
generated_only = generated_text[len(prompt):].strip()

print(f"Generated: {generated_only}")

# `gemma-3-270m-dhivehi-pt`: އަޅުގަނޑުމެންގެ ގޮނޑުދޮށްތަކާއި ފަރުބަދަތަކާއި އުތުރު ހިންދުސްތާނުގެ ވަކިވަކި ހިސާބުގައި ދިރިއުޅޭ ދިވެހިންގެ މެދުގައި އޮންނަ ގުދުރަތީ ކަންކަން ދެކިލުމުގެ ފުރުޞަތު އޮތް ޤައުމެއްކަމުގައި ދުވަހަކުވެސް ހިޔެއްނުކުރާނެއެވެ. 33 އަހަރުގެ ވެރިކަމުން މިދެންނެވި
# `google/gemma-3-270m`:  ވެސް އޭނާގެ ރާއްޖެއެއްކަމަށް ވެރިކަން ނުވަތަ ޓީމް ނޭޝަނަލްގެ ބައިވެރިވަރމް އޭޝިޔާގެ ހައިސިއްޔަތް ބޯޑިޔަށް ނުހުންދާ ކަމަށް ލިޔުއްވައިގައެވެ. ދިވެހިންނާއި މިއީ އެއާ އޭޝިޔާގެ ބަޔާންކޮށް ނިޒާމް ނެތުމުގައި ދާއިރާއަށް އޭޝިޔާއަށް ދެއްވައިދޭ މަލިވާރުކަމަށް
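
The pipeline helper imported in the snippet above offers a shorter path for quick experiments. A minimal sketch reusing the already loaded model and tokenizer:

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "ދިވެހިރާއްޖެއަކީ",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])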

Generation Parameters

  • max_new_tokens: Controls the length of generated text (64-512 recommended)
  • temperature: Controls randomness (0.1-1.0, higher = more creative)
  • top_p: Nucleus sampling parameter (0.1-1.0)
  • top_k: Top-k sampling parameter (1-100)
  • do_sample: Boolean flag to enable/disable sampling
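
For example, a sampled generation call could combine these parameters as follows (values are illustrative, not tuned):

# Reusing `model`, `tokenizer`, and `inputs` from the "How to use" section above.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,        # enable sampling rather than greedy decoding
        temperature=0.8,       # higher values give more varied text
        top_p=0.95,            # nucleus sampling
        top_k=50,              # sample only from the 50 most likely tokens
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))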

Limitations

  • Generated content may not always be factually accurate
  • Quality depends on the clarity and specificity of input prompts
  • Context window limitations for very long inputs
  • The model is specifically pretrained on Dhivehi text and may require fine-tuning for specific tasks
  • No instruction-following capabilities (this is a pretrained model, not instruction-tuned)