Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLM, Code&Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity

upvoted 2 papers about 23 hours ago

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published 1 day ago • 110

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 175

upvoted a paper 3 days ago

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published 4 days ago • 72

liked 2 datasets 3 days ago

Pageshift-Entertainment/LongPage

Viewer • Updated 7 days ago • 300 • 8.66k • 40

jupyter-agent/jupyter-agent-dataset

Viewer • Updated 1 day ago • 95.8k • 2.95k • 125

New activity in hkust-nlp/WebExplorer-QA 3 days ago

Will the full train dataset be open sourced in the future?

#2 opened 3 days ago by cppowboy

liked a dataset 3 days ago

hkust-nlp/WebExplorer-QA

Viewer • Updated 3 days ago • 100 • 76 • 4

upvoted a paper 4 days ago

Why Language Models Hallucinate

Paper • 2509.04664 • Published 7 days ago • 151

liked a model 6 days ago

openbmb/MiniCPM4.1-8B

Text Generation • 8B • Updated about 18 hours ago • 846 • 283

liked 2 datasets 11 days ago

MathArena/hmmt_feb_2025

Viewer • Updated May 14 • 30 • 1.25k • 4

nvidia/OpenScienceReasoning-2

Viewer • Updated Jul 31 • 803k • 1.83k • 38

upvoted a paper 14 days ago

rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published 15 days ago • 102

upvoted a paper 16 days ago

Hermes 4 Technical Report

Paper • 2508.18255 • Published 18 days ago • 35

New activity in r2e-edits/SweSmith-RL-Dataset 17 days ago

Are these docker images publicly available?

#2 opened 17 days ago by cppowboy

liked a model 17 days ago

openbmb/MiniCPM-V-4_5

Image-Text-to-Text • 9B • Updated 27 minutes ago • 50.7k • 912

New activity in SWE-bench/SWE-smith 18 days ago

Hello, why are the FAIL_TO_PASS files missing from the image?

#6 opened about 1 month ago by ray075hl

New activity in nebius/SWE-rebench 18 days ago

Could this dataset be repurposed for LLM training?

#7 opened 18 days ago by cppowboy

liked a dataset 20 days ago

Alibaba-NLP/WebShaper

Viewer • Updated Jul 22 • 500 • 7.4k • 20

liked a dataset 22 days ago

inclusionAI/ASearcher-train-data

Preview • Updated 30 days ago • 702 • 12

upvoted a paper 22 days ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published 28 days ago • 8
