MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 13 days ago • 9
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 13 days ago • 10
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 13 days ago • 10
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 13 days ago • 7
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 13 days ago • 9
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 13 days ago • 7
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 13 days ago • 10
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 13 days ago • 7
MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 13 days ago • 7
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_294 Reinforcement Learning • 2B • Updated about 8 hours ago • 19
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_588 Reinforcement Learning • 2B • Updated about 1 hour ago • 6
AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1 Reinforcement Learning • 2B • Updated 1 day ago • 3