# Granite-3.3-2B-Slerp
Granite-3.3-2B-Slerp is a merge of the following models using LazyMergekit:

* [powermove72/granite-3.3-2b-Hermes3dataset](https://huggingface.co/powermove72/granite-3.3-2b-Hermes3dataset)
* [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
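The merge can be reproduced locally with [mergekit](https://github.com/arcee-ai/mergekit), which LazyMergekit wraps, using the configuration shown below. A minimal sketch, assuming the YAML is saved as `config.yaml` and a CUDA GPU is available; the output path and flags are illustrative, so check `mergekit-yaml --help` for the options in your installed version:

```python
# Notebook-style commands, mirroring the Usage section below.
# Assumes the YAML from the Configuration section is saved as config.yaml.
!pip install -qU mergekit
!mergekit-yaml config.yaml ./Granite-3.3-2B-Slerp --cuda --copy-tokenizer
```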
## 🧩 Configuration
```yaml
# ----------------------------------------------------------------------
# Granite-3.3-2B-Slerp – 40-layer variant (v1.3-40L)
# ----------------------------------------------------------------------
# Goal: produce a stronger merged model when the underlying architecture
# has 40 transformer layers.
# ----------------------------------------------------------------------
slices:
  - sources:
      - model: powermove72/granite-3.3-2b-Hermes3dataset
        layer_range: [0, 40]   # now 40 layers
      - model: ibm-granite/granite-3.3-2b-instruct
        layer_range: [0, 40]
merge_method: slerp
base_model: powermove72/granite-3.3-2b-Hermes3dataset
parameters:
  t:
    - filter: self_attn
      value: &self_attn_t
        # Cosine-annealed schedule (49 values, interpolated across the 40 layers)
        - 1.000
        - 0.991
        - 0.967
        - 0.928
        - 0.876
        - 0.812
        - 0.739
        - 0.658
        - 0.572
        - 0.483
        - 0.393
        - 0.304
        - 0.218
        - 0.138
        - 0.067
        - 0.008
        - 0.000
        - 0.008
        - 0.067
        - 0.138
        - 0.218
        - 0.304
        - 0.393
        - 0.483
        - 0.572
        - 0.658
        - 0.739
        - 0.812
        - 0.876
        - 0.928
        - 0.967
        - 0.991
        - 1.000
        - 0.991
        - 0.967
        - 0.928
        - 0.876
        - 0.812
        - 0.739
        - 0.658
        - 0.572
        - 0.483
        - 0.393
        - 0.304
        - 0.218
        - 0.138
        - 0.067
        - 0.008
        - 0.000
    - filter: mlp
      value: &mlp_t
        # Complementary schedule (1 - self_attn)
        - 0.000
        - 0.009
        - 0.033
        - 0.072
        - 0.124
        - 0.188
        - 0.261
        - 0.342
        - 0.428
        - 0.517
        - 0.607
        - 0.696
        - 0.782
        - 0.862
        - 0.933
        - 0.992
        - 1.000
        - 0.992
        - 0.933
        - 0.862
        - 0.782
        - 0.696
        - 0.607
        - 0.517
        - 0.428
        - 0.342
        - 0.261
        - 0.188
        - 0.124
        - 0.072
        - 0.033
        - 0.009
        - 0.000
        - 0.009
        - 0.033
        - 0.072
        - 0.124
        - 0.188
        - 0.261
        - 0.342
        - 0.428
        - 0.517
        - 0.607
        - 0.696
        - 0.782
        - 0.862
        - 0.933
        - 0.992
        - 1.000
    - value: 0.5   # Fallback for tensors not matched by the filters above (e.g. embeddings, norms)
dtype: bfloat16
seed: 42
deterministic: true
metadata:
  model_name: Granite-3.3-2B-Slerp
  version: v1.3-40L
  date: 2025-08-15
  git_hash: c7e9a4f7
  notes: |
    - Updated for 40 transformer layers.
    - Cosine-annealed per-layer t vectors (self_attn & mlp) ensure a smooth transition between the two models.
    - Deterministic SLERP (seed=42) for reproducibility.
    - Evaluation hook runs MMLU & HELM after the merge.
    - Optional: uncomment `t_amplitude: 0.6` to increase the contrast between the two models.
    - Optional: add `post_merge: quantize` for inference-only int8 deployment.
post_merge:
  - name: eval_benchmarks
    command: |
      python -m eval.run --model Granite-3.3-2B-Slerp --tasks mmlu,helm --precision bfloat16 --output results/2025-08-15-40L.json
```
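The `self_attn` and `mlp` `t` vectors above follow a cosine-annealed 1 → 0 → 1 → 0 pattern, with the MLP schedule as the exact complement of the attention schedule. A minimal sketch of how such a schedule can be generated; this is illustrative only, and the exact values in the configuration above may have been produced with a slightly different formula:

```python
import numpy as np

def cosine_schedule(n_points: int, half_periods: int = 3) -> np.ndarray:
    """Cosine-annealed curve starting at 1.0 and ending at 0.0 after `half_periods` half-cosines."""
    x = np.linspace(0.0, half_periods * np.pi, n_points)
    return 0.5 * (1.0 + np.cos(x))

self_attn_t = cosine_schedule(49)   # 1 -> 0 -> 1 -> 0 over 49 points, like the list above
mlp_t = 1.0 - self_attn_t           # complementary schedule (1 - self_attn), as in the config

print(np.round(self_attn_t[:5], 3))  # ~[1.0, 0.99, 0.962, 0.916, 0.854], close to the config values
```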
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "powermove72/Granite-3.3-2B-Slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build a chat-formatted prompt, then generate with the text-generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
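Since this is the 40-layer variant, an optional sanity check is to confirm the merged checkpoint reports the expected number of transformer layers (assuming the model is available on the Hub under the name used above):

```python
from transformers import AutoConfig

# The layer_range [0, 40] in the merge config should yield 40 hidden layers.
config = AutoConfig.from_pretrained("powermove72/Granite-3.3-2B-Slerp")
print(config.num_hidden_layers)  # expected: 40
```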