Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation • Paper • 2506.05062 • Published Jun 5 • 15
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games • Paper • 2506.05309 • Published Jun 5 • 15
CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature • Paper • 2505.20779 • Published May 27 • 15
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation • Paper • 2504.17502 • Published Apr 24 • 56
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models • Paper • 2504.01137 • Published Apr 1 • 21
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG • Paper • 2503.04388 • Published Mar 6 • 17