Zhang Yuanhan

ZhangYuanhan

https://zhangyuanhan-ai.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 7 days ago

Streaming Video Instruction Tuning

upvoted a paper 13 days ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

upvoted a paper about 1 month ago

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Organizations

authored a paper 6 months ago

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Paper • 2507.15028 • Published Jul 20, 2025 • 21

authored a paper 10 months ago

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5, 2025 • 46

authored a paper 12 months ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23, 2025 • 23

authored 12 papers over 1 year ago

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Paper • 2404.01258 • Published Apr 1, 2024 • 12

Learning without Forgetting for Vision-Language Models

Paper • 2305.19270 • Published May 30, 2023

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

Paper • 2203.07845 • Published Mar 15, 2022

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24, 2024 • 33

authored 2 papers about 2 years ago

OtterHD: A High-Resolution Multi-modality Model

Paper • 2311.04219 • Published Nov 7, 2023 • 34

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 38

authored 2 papers over 2 years ago

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

Paper • 2306.05425 • Published Jun 8, 2023 • 11

Otter: A Multi-Modal Model with In-Context Instruction Tuning

Paper • 2305.03726 • Published May 5, 2023 • 6

