Gemma-3-270M Dhivehi

Compact Dhivehi (ދިވެހި) pretrained model based on google/gemma-3-270m, trained on a large corpus of Dhivehi text data including news articles, Wikipedia content, and general web text.

Note: This model is pretrained specifically on Dhivehi text and provides a strong foundation for further fine-tuning on specific tasks or for direct use in text generation.

Model details

  • Base: google/gemma-3-270m
  • Language: Dhivehi
  • Parameters: 268M (BF16 safetensors weights)
  • Training data: Large corpus of Dhivehi text including:
    • News articles (Dhivehi news corpus)
    • Wikipedia articles
    • General web content (FineWeb-2)
    • Glot500 Dhivehi dataset
  • Training method: Pretraining with causal language modeling (see the sketch after this list)
  • Training data size: Large corpus combining multiple Dhivehi datasets
  • Supported tasks:
    • Text generation in Dhivehi
    • Foundation for fine-tuning on specific tasks
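
The training method above is the standard causal language-modeling objective (next-token prediction). A minimal sketch of how that objective is evaluated with this checkpoint, using only the model ID and prompt from this card:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# The prompt from the usage example below; any Dhivehi text works here.
text = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(text, return_tensors="pt")

# For causal language modeling the labels are the input ids themselves;
# the model shifts them internally so each position predicts the next token.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Loss: {outputs.loss.item():.3f}  Perplexity: {torch.exp(outputs.loss).item():.1f}")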

Training Data

The model was pretrained on a comprehensive corpus of Dhivehi text data (a loading sketch for the publicly available sources follows the list):

  • Random Articles: News articles from different sources
  • Dhivehi News Corpus: General news content in Dhivehi
  • Wikipedia (dv): Dhivehi Wikipedia articles
  • FineWeb-2: Filtered web content in Dhivehi
  • Glot500 (div-thaa): Multilingual dataset with Dhivehi content
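
Some of the sources above are public Hugging Face datasets. A minimal inspection sketch for the FineWeb-2 Dhivehi subset; the "div_Thaa" config name is an assumption based on FineWeb-2's language_script naming scheme and is not confirmed by this card:

from datasets import load_dataset

# Stream the Dhivehi (Thaana script) subset of FineWeb-2 for inspection.
fineweb_dv = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="div_Thaa",   # assumed config name; check the dataset card if it differs
    split="train",
    streaming=True,
)

for i, doc in enumerate(fineweb_dv):
    print(doc["text"][:200])
    if i == 2:  # look at the first three documents only
        break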

Intended use

  • As a foundation model for fine-tuning on specific Dhivehi tasks (see the sketch after this list)
  • For general text generation in Dhivehi
  • For research and development of Dhivehi language models
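
As a sketch of the fine-tuning path mentioned above, the checkpoint can be trained further with the standard Trainer API. This is a minimal outline, assuming a hypothetical line-per-example text file (train.txt) and untuned hyperparameters:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Hypothetical Dhivehi task data, one example per line; replace with your own.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal LM objective, matching the pretraining setup.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gemma-3-270m-dhivehi-ft",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
    bf16=True,  # use bf16 on supported hardware; drop otherwise
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()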

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Load model
model_path = "alakxender/gemma-3-270m-dhivehi-pt"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
)

# Direct text generation
prompt = "ދިވެހިރާއްޖެއަކީ"

# Tokenize input
inputs = tokenizer(
    prompt, 
    return_tensors="pt", 
    padding=True
)

# Move inputs to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate content
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        # num_return_sequences=1,
        # temperature=0.8,
        # do_sample=True,
        # pad_token_id=tokenizer.eos_token_id,
        # eos_token_id=tokenizer.eos_token_id,
        # repetition_penalty=1.1,
    )

# Decode generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract only the newly generated part
generated_only = generated_text[len(prompt):].strip()

print(f"Generated: {generated_only}")

# `gemma-3-270m-dhivehi-pt`: އަޅުގަނޑުމެންގެ ގޮނޑުދޮށްތަކާއި ފަރުބަދަތަކާއި އުތުރު ހިންދުސްތާނުގެ ވަކިވަކި ހިސާބުގައި ދިރިއުޅޭ ދިވެހިންގެ މެދުގައި އޮންނަ ގުދުރަތީ ކަންކަން ދެކިލުމުގެ ފުރުޞަތު އޮތް ޤައުމެއްކަމުގައި ދުވަހަކުވެސް ހިޔެއްނުކުރާނެއެވެ. 33 އަހަރުގެ ވެރިކަމުން މިދެންނެވި
# `google/gemma-3-270m`:  ވެސް އޭނާގެ ރާއްޖެއެއްކަމަށް ވެރިކަން ނުވަތަ ޓީމް ނޭޝަނަލްގެ ބައިވެރިވަރމް އޭޝިޔާގެ ހައިސިއްޔަތް ބޯޑިޔަށް ނުހުންދާ ކަމަށް ލިޔުއްވައިގައެވެ. ދިވެހިންނާއި މިއީ އެއާ އޭޝިޔާގެ ބަޔާންކޮށް ނިޒާމް ނެތުމުގައި ދާއިރާއަށް އޭޝިޔާއަށް ދެއްވައިދޭ މަލިވާރުކަމަށް
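
The pipeline helper imported in the snippet above offers a shorter path for quick experiments. A minimal sketch reusing the already loaded model and tokenizer:

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "ދިވެހިރާއްޖެއަކީ",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])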

Generation Parameters

  • max_new_tokens: Controls the length of generated text (64-512 recommended)
  • temperature: Controls randomness (0.1-1.0, higher = more creative)
  • top_p: Nucleus sampling parameter (0.1-1.0)
  • top_k: Top-k sampling parameter (1-100)
  • do_sample: Boolean flag to enable/disable sampling
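
For example, a sampled generation call could combine these parameters as follows (values are illustrative, not tuned):

# Reusing `model`, `tokenizer`, and `inputs` from the "How to use" section above.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,        # enable sampling rather than greedy decoding
        temperature=0.8,       # higher values give more varied text
        top_p=0.95,            # nucleus sampling
        top_k=50,              # sample only from the 50 most likely tokens
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))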

Limitations

  • Generated content may not always be factually accurate
  • Quality depends on the clarity and specificity of input prompts
  • Context window limitations for very long inputs
  • The model is specifically pretrained on Dhivehi text and may require fine-tuning for specific tasks
  • No instruction-following capabilities (this is a pretrained model, not instruction-tuned)