---
library_name: transformers
license: apache-2.0
datasets:
- PrimeIntellect/Reverse-Text-SFT
base_model:
- PrimeIntellect/Qwen3-0.6B
---

# Qwen3-0.6B-Reverse-Text-SFT

A debug model fine-tuned on `willcb/R1-reverse-wikipedia-paragraphs-v1-1000`. Intended as a warm-started model for RL in `vf-reverse-text`.

Created with the following training command from [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl) (commit hash: `8262560`):

```bash
uv run torchrun --nproc-per-node 8 src/prime_rl/trainer/sft/train.py \
  --model.name PrimeIntellect/Qwen3-0.6B \
  --data.name willcb/R1-reverse-wikipedia-paragraphs-v1-1000 \
  --max-steps 100 \
  --data.batch-size 16 \
  --data.micro-batch-size 1 \
  --data.seq-len 4096 \
  --optim.lr 2e-5
```

Check out the run on [W&B](https://wandb.ai/primeintellect/mika/runs/odsfiekx?nw=nwusermikasenghaas_).
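
## Usage

A minimal inference sketch with `transformers` is shown below. The repo id `PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT` and the prompt wording are assumptions inferred from the model name and task; adjust them to the actual Hub path and prompt format used during SFT.

```python
# Minimal sketch: load the fine-tuned checkpoint and ask it to reverse a short text.
# The repo id below is assumed from the model name; adjust to the actual Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The exact prompt format from the SFT dataset is not documented here; this is an assumption.
messages = [{"role": "user", "content": "Reverse the following text: The quick brown fox."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```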