Update README.md
README.md (changed)
---
license: apache-2.0
---
# Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

[[📖 Paper](https://arxiv.org/abs/2506.15068)]

## About Open-Ended R1 Training

As open-ended long-form generation gains traction, reliably judging the quality of multi-sentence and paragraph-length outputs has become a major hurdle—traditional overlap metrics like ROUGE-L and BERTScore often miss nuances of coherence, style, and relevance, and can be skewed by pretraining biases. This leaves a critical gap in evaluation methods for guiding and training models that produce lengthy, free-form text.
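As a rough illustration of this gap, the sketch below (ours, not taken from either paper) scores a semantically equivalent paraphrase with ROUGE-L using the Hugging Face `evaluate` package; the lexical-overlap score stays low even though the answer preserves the meaning of the reference.

```python
# Hypothetical illustration: n-gram overlap under-rates a correct but reworded answer.
# Assumes the `evaluate` and `rouge_score` packages are installed.
import evaluate

rouge = evaluate.load("rouge")

reference = ["The medication lowers blood pressure by relaxing the blood vessels."]
paraphrase = ["It reduces hypertension because it makes the vessels dilate."]  # same meaning, different words

scores = rouge.compute(predictions=paraphrase, references=reference)
print(scores["rougeL"])  # small score despite semantic equivalence
```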
# VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

[[📖 Paper](https://arxiv.org/abs/2505.01481)] [[🤗 Dataset](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)] [[🌍 Website](https://wuxiyang1996.github.io/videohallu_page/)]
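To explore the benchmark locally, one option (our suggestion, not an official instruction from this repo) is to pull the dataset files with `huggingface_hub`:

```python
# Hypothetical helper: download the VideoHallu dataset repository to the local cache.
# See the dataset card for the actual file layout and intended usage.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="IntelligenceLab/VideoHallu", repo_type="dataset")
print(local_dir)  # path containing the downloaded videos and annotations
```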
## 👀 About VideoHallu

With the recent success of video generation models such as [Sora](https://openai.com/sora/), [Veo2](https://veo2.ai), and [Kling](https://www.klingai.com/global/), the visual quality of generated videos has reached new heights, making evaluation more challenging and pushing it beyond traditional metrics like frame consistency, resolution, and realism. However, we find that MLLMs struggle to detect abnormalities in generated videos, a capability that is crucial for developing reliable automatic video evaluation methods.

We also use GRPO (Group Relative Policy Optimization) to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on a subset of our dataset and show improved understanding of generated videos.
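For readers who want a starting point for reproducing this setup, a minimal sketch for loading the same base model is shown below; it assumes a recent `transformers` release with Qwen2.5-VL support, and the full video-prompting pipeline should follow the model card.

```python
# Hypothetical starting point: load the base MLLM used above for video QA.
# Assumes a transformers version that ships Qwen2.5-VL support.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
# Build chat-template prompts with video inputs as described on the model card.
```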
## 🔥 News