---
license: apache-2.0
datasets:
- lmms-lab/LLaVA-Video-178K
language:
- en
base_model:
- Qwen/Qwen2-VL-7B
tags:
- qwen2_vl
- multimodal
- conversational
---
# Model Card
This model was obtained by fine-tuning [Qwen2-VL-7B-Base](https://huggingface.co/Qwen/Qwen2-VL-7B) on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K). It serves as a comparison baseline in the [LiveCC](https://showlab.github.io/livecc) project.
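As a quick-start sketch, the snippet below builds a video-QA chat message in the format used by upstream Qwen2-VL processors; the repo id, video path, and question are placeholders not specified by this card, and the commented loading code assumes the standard `transformers` Qwen2-VL API.

```python
# Minimal sketch: build a Qwen2-VL style multimodal chat message for
# video QA. The repo id and video path below are placeholders, not
# values confirmed by this model card.

def build_video_qa_messages(video_path: str, question: str) -> list:
    """One user turn mixing a video reference and a text question,
    following the chat format used by Qwen2-VL processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text", "text": question},
            ],
        }
    ]

# With transformers installed, the messages can be rendered and fed to
# the model roughly as follows (heavyweight, so shown as comments):
#   from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
#   model = Qwen2VLForConditionalGeneration.from_pretrained(
#       REPO_ID, torch_dtype="auto", device_map="auto")
#   processor = AutoProcessor.from_pretrained(REPO_ID)
#   text = processor.apply_chat_template(
#       build_video_qa_messages("demo.mp4", "What happens in this video?"),
#       tokenize=False, add_generation_prompt=True)
```

The helper is dependency-free so the chat structure can be inspected without downloading the checkpoint; the actual generation call depends on your hardware setup.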
# Performance
# Acknowledgement
[Joya Chen](https://chenjoya.github.io/) built the training code, and Yiqi Lin trained the model. The QA evaluation was done by [Joya Chen](https://chenjoya.github.io/), and the CC evaluation by Ziyun Zeng. Infrastructure support was provided by the company.