Qwen3 GRPO-trained w/ thinksafe
- Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-500 (Text Generation)
- Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-1000 (Text Generation)
- Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-1500 (Text Generation)
- Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-2000 (Text Generation)