ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-F16 Reinforcement Learning • 8B • Updated Mar 25 • 4.62k • 87
Open-Reasoner-Zero/Open-Reasoner-Zero-Critic-32B Reinforcement Learning • 32B • Updated Apr 7 • 7 • 6
NousResearch/DeepHermes-AscensionMaze-RLAIF-8b-Atropos Reinforcement Learning • 8B • Updated Apr 29 • 84 • 7
mradermacher/Qwen3-14B-ARPO-DeepSearch-GGUF Reinforcement Learning • 15B • Updated 30 days ago • 2.09k • 2