DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4 Paper • 2305.14702 • Published May 24, 2023
MeetingBank: A Benchmark Dataset for Meeting Summarization Paper • 2305.17529 • Published May 27, 2023
InFoBench: Evaluating Instruction Following Ability in Large Language Models Paper • 2401.03601 • Published Jan 7, 2024
SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs Paper • 2402.10979 • Published Feb 15, 2024
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives Paper • 2406.12084 • Published Jun 17, 2024
TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning Paper • 2508.20374 • Published 12 days ago
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs Paper • 2508.18264 • Published 15 days ago
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published 19 days ago
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published Feb 6