Train for one epoch of SFT on UltraChat200K
Zizhuo Zhang PRO
resistz
Recent Activity
Updated a model 4 days ago: resistz/GT-GRPO_Llama-3.2-3B-Instruct_NQ-HotpotQA
Published a model 4 days ago: resistz/GT-GRPO_Llama-3.2-3B-Instruct_NQ-HotpotQA
Updated a model 4 days ago: TMLR-Group-HF/Co-rewarding-III-Llama-3.2-3B-Instruct-DAPO14k