26

devilman

devilmanz

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago

magiccodingman/Qwen3-4B-Thinking-2507-Unsloth-MagicQuant-Hybrid-GGUF

liked a model 3 months ago

Qwen/Qwen3-VL-4B-Thinking

reacted to codelion's post with 🔥 5 months ago

I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering. Logic puzzle accuracy: 61% → 84%. Training took 3 hours on a single GPU. 🧠 It uses GRPO, where the model generates multiple responses and learns to prefer better reasoning. This works surprisingly well for making smaller models more transparent.

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: https://huggingface.co/codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora
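
The GRPO idea mentioned in the post (sample several responses to one prompt, then favor the better ones) rests on a group-relative advantage: each response's reward is compared with the mean of its own group. The sketch below illustrates only that normalization step. It is not taken from the ellora recipe; the function name, the example reward values, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Hypothetical helper: each reward is centered on the group mean and
    scaled by the group standard deviation, so responses that beat their
    siblings get a positive advantage and the rest get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses sampled from the same prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

In a full training loop these advantages would weight the policy-gradient update for each response's tokens; how the rewards themselves are produced (the post's notebook title suggests a self-rewarding scheme) is outside this sketch.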

Organizations

None yet

Collections 1

models 0

None public yet

datasets 0

None public yet