Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published Oct 2, 2025 • 28
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models Paper • 2506.06485 • Published Jun 6, 2025 • 5
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 13