-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 121 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 19 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2505.10320
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 31 -
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 11 -
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 24 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 69 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 121 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 121 -
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Paper • 2505.09694 • Published • 19 -
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23
-
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 23 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 69 -
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 121 -
Scaling Reasoning can Improve Factuality in Large Language Models
Paper • 2505.11140 • Published • 7
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 31 -
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 11 -
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 84
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 24 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16