# Granite-3.3-2B-Avg-SliceWeighted

Granite-3.3-2B-Avg-SliceWeighted is a merge of the following models using LazyMergekit:

* [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
* [powermove72/granite-3.3-2b-Hermes3dataset](https://huggingface.co/powermove72/granite-3.3-2b-Hermes3dataset)

## 🧩 Configuration

```yaml
# ----------------------------------------------------------------------
# merge_weighted_average_40layers.yaml
#   Slice‑wise weighted‑average merge for a 40‑layer LLM.
#   – Different contribution per layer range.
# ----------------------------------------------------------------------
merge_method: linear               # per-tensor weighted average

# ----------------------------------------------------------------------
# Global merge options
# ----------------------------------------------------------------------
dtype: bfloat16                               # preferred dtype on modern GPUs
parameters:
  normalize: true                           # make each slice’s weights sum to 1.0
  low_cpu_mem_usage: true                    # stream weights, don’t load everything into RAM

seed: 2025                                    # reproducibility
deterministic: true                           # torch‑cudnn deterministic mode

# ----------------------------------------------------------------------
# Metadata (helps with provenance & experiment tracking)
# ----------------------------------------------------------------------
metadata:
  model_name: Granite-3.3-2B-Avg-SliceWeighted
  version: v1.0
  date: 2025-08-15
  notes: |
    - 40‑layer model (indices 0‑39).
    - Three slices:
        * Layers 0‑12   → 80 % granite-3.3-2b-instruct, 20 % Hermes3dataset
        * Layers 13‑25  → 50 % each (mid‑point)
        * Layers 26‑39  → 20 % granite-3.3-2b-instruct, 80 % Hermes3dataset
    - Normalised weights are enforced by `parameters.normalize`.
    - Uses the granite-3.3-2b-Hermes3dataset tokenizer for token‑id alignment.

# ----------------------------------------------------------------------
# Tokenizer – both source models share the same one, so we can safely force it.
# ----------------------------------------------------------------------
tokenizer_source: powermove72/granite-3.3-2b-Hermes3dataset

# ----------------------------------------------------------------------
# Slice definitions (non‑overlapping, each covers a contiguous block of layers)
# ----------------------------------------------------------------------
slices:
  # --------------------------------------------------------------
  # Slice 1: Layers 0‑12 (the first 13 transformer blocks)
  # --------------------------------------------------------------
  - sources:
      - model: ibm-granite/granite-3.3-2b-instruct
        layer_range: [0, 13]               # layers 0‑12 (end index exclusive)
        parameters:
          weight: 0.8                       # granite-3.3-2b-instruct dominant
      - model: powermove72/granite-3.3-2b-Hermes3dataset
        layer_range: [0, 13]
        parameters:
          weight: 0.2                       # Hermes3dataset minority share

  # --------------------------------------------------------------
  # Slice 2: Layers 13‑25 (the middle 13 transformer blocks)
  # --------------------------------------------------------------
  - sources:
      - model: ibm-granite/granite-3.3-2b-instruct
        layer_range: [13, 26]
        parameters:
          weight: 0.5                       # balanced
      - model: powermove72/granite-3.3-2b-Hermes3dataset
        layer_range: [13, 26]
        parameters:
          weight: 0.5

  # --------------------------------------------------------------
  # Slice 3: Layers 26‑39 (the last 14 transformer blocks)
  # --------------------------------------------------------------
  - sources:
      - model: ibm-granite/granite-3.3-2b-instruct
        layer_range: [26, 40]
        parameters:
          weight: 0.2                       # granite-3.3-2b-instruct minority share
      - model: powermove72/granite-3.3-2b-Hermes3dataset
        layer_range: [26, 40]
        parameters:
          weight: 0.8                       # Hermes3dataset dominant
```
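
For intuition, `merge_method: linear` with `normalize: true` reduces to a per-tensor weighted average within each slice. Below is a minimal sketch of the arithmetic for slice 1, using illustrative stand-in tensors rather than mergekit's actual implementation:

```python
import torch

# Stand-ins for the same layer-0 weight tensor from each source model
# (shapes are illustrative; the real tensors come from the checkpoints).
w_granite = torch.randn(2048, 2048)   # ibm-granite/granite-3.3-2b-instruct
w_hermes  = torch.randn(2048, 2048)   # powermove72/granite-3.3-2b-Hermes3dataset

weights = torch.tensor([0.8, 0.2])
weights = weights / weights.sum()     # what `parameters.normalize: true` guarantees

# Slice 1 result: 80 % granite-instruct, 20 % Hermes3dataset, per tensor
merged = weights[0] * w_granite + weights[1] * w_hermes
print(merged.shape)                   # torch.Size([2048, 2048])
```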

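To reproduce the merge outside the LazyMergekit notebook, the config above can be fed to mergekit directly. A sketch, assuming the YAML is saved as `merge_weighted_average_40layers.yaml` (the filename from the header comment); available flags may vary by mergekit version:

```python
# Hypothetical reproduction in a notebook; adjust paths and flags to your setup.
!pip install -qU mergekit
!mergekit-yaml merge_weighted_average_40layers.yaml ./Granite-3.3-2B-Avg-SliceWeighted --cuda
```
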
## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "powermove72/Granite-3.3-2B-Avg-SliceWeighted"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
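
The config's tokenizer note relies on both source models sharing a vocabulary. A quick sanity check (hypothetical, not part of the card) before trusting `tokenizer_source`:

```python
from transformers import AutoTokenizer

tok_a = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
tok_b = AutoTokenizer.from_pretrained("powermove72/granite-3.3-2b-Hermes3dataset")

# Identical vocabularies mean token ids line up, so forcing one tokenizer is safe.
assert tok_a.get_vocab() == tok_b.get_vocab(), "vocabularies differ; tokenizer_source is unsafe"
print("Tokenizers are aligned:", len(tok_a))
```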